To simplify analyzing my DSLR vs camera phone pictures, I'd like a reliable way of pulling two key pieces of metadata from a photo: the name of the capturing device and the timestamp of when it was taken. A typical JPG stores this information in exif data, so it shouldn't be hard to access. Consider these four sample photos:
ImageMagick's identify command extracts exif entries with ease. Here's what I see when I run it against my sample photos and look for interesting keywords:
$ for f in *.jpg *.JPG ; do echo $f ; identify -verbose $f | grep exif: | egrep -ie '(date|canon|samsung|eos)' ; done
20230709_050311.jpg
    exif:DateTime: 2023:07:09 05:03:11
    exif:DateTimeDigitized: 2023:07:09 05:03:11
    exif:DateTimeOriginal: 2023:07:09 05:03:11
    exif:Make: samsung
20230709_055335.jpg
    exif:DateTime: 2023:07:09 05:53:35
    exif:DateTimeDigitized: 2023:07:09 05:53:35
    exif:DateTimeOriginal: 2023:07:09 05:53:35
    exif:Make: samsung
IMG_0029.JPG
    exif:DateTime: 2023:07:09 09:00:00
    exif:DateTimeDigitized: 2023:07:09 09:00:00
    exif:DateTimeOriginal: 2023:07:09 09:00:00
    exif:Make: Canon
    exif:Model: Canon EOS Rebel T6s
IMG_0045.JPG
    exif:DateTime: 2023:07:09 09:02:59
    exif:DateTimeDigitized: 2023:07:09 09:02:59
    exif:DateTimeOriginal: 2023:07:09 09:02:59
    exif:Make: Canon
    exif:Model: Canon EOS Rebel T6s
Extracting a Device Name
It looks like exif:Make or exif:Model is going to be the best source for the camera name. Focusing on these fields, I see:
$ for f in *.jpg *.JPG ; do echo $f ; identify -verbose $f | grep exif: | egrep -ie 'exif:(Make|Model):' ; done
20230709_050311.jpg
    exif:Make: samsung
    exif:Model: SM-S908U1
20230709_055335.jpg
    exif:Make: samsung
    exif:Model: SM-S908U1
IMG_0029.JPG
    exif:Make: Canon
    exif:Model: Canon EOS Rebel T6s
IMG_0045.JPG
    exif:Make: Canon
    exif:Model: Canon EOS Rebel T6s
SM-S908U1 doesn't mean anything to me, and Canon EOS Rebel T6s is a bit verbose. But mapping these exif:Model values to names that are easy to work with requires only a bit of trivial shell scripting:
case "$model" in
  SM-S908U1) echo "s22" ;;
  *T6s*)     echo "t6s" ;;
  *)         echo "unknown" ;;
esac
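To show how that case statement might be wired up to a file, here is a minimal sketch. The function names friendly_device_name and device_for_file are my own placeholders, not necessarily what photoassist uses, and I'm assuming ImageMagick's `%[EXIF:Model]` format escape as a shortcut for the grep pipeline above.

```shell
# Map a raw exif:Model value to a short device name.
# (Function names here are hypothetical, chosen for illustration.)
friendly_device_name() {
  local model="$1"
  case "$model" in
    SM-S908U1) echo "s22" ;;      # Galaxy S22 reports this model code
    *T6s*)     echo "t6s" ;;      # matches "Canon EOS Rebel T6s"
    *)         echo "unknown" ;;  # anything not mapped yet
  esac
}

# Read exif:Model from a photo with ImageMagick, then map it.
# Assumes identify is on the PATH; a file without the tag maps to "unknown".
device_for_file() {
  local model
  model=$(identify -format '%[EXIF:Model]' "$1" 2>/dev/null)
  friendly_device_name "$model"
}
```

Keeping the mapping separate from the identify call means a new camera only needs one more case branch.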
Extracting a Timestamp
On the surface, it looks like DateTime contains exactly the timestamp I'm looking for. It uses :'s instead of -'s to delineate the date, but that's trivial to fix:
$ for f in *.jpg *.JPG ; do echo $f ; identify -verbose $f | grep exif:DateTime: | sed -r 's/([0-9]{4}):([0-9]{2}):([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2})/\1-\2-\3 \4:\5:\6/' ; done
20230709_050311.jpg
    exif:DateTime: 2023-07-09 05:03:11
20230709_055335.jpg
    exif:DateTime: 2023-07-09 05:53:35
IMG_0029.JPG
    exif:DateTime: 2023-07-09 09:00:00
IMG_0045.JPG
    exif:DateTime: 2023-07-09 09:02:59
While these are all seemingly valid timestamps, there's a problem: these photos were all taken in the early morning of July 9th. Why do some have the timestamp of 5am and some 9am? Something's not right.
According to the Google Photos UI, they were all taken between 5am and 6am:
The photos from my Galaxy S22 have exif timestamps that match Google's UI. But the DSLR pics are totally off. What gives?
First, I spent time closely analyzing the photo metadata from the DSLR's pics. I could see no timestamp that matched what the Photos UI reported.
$ identify -verbose IMG_0029.JPG |egrep -ie '(date|time|stamp)'
    date:create: 2023-11-08T06:59:29-05:00
    date:modify: 2023-11-08T03:58:16-05:00
    exif:DateTime: 2023:07:09 09:00:00
    exif:DateTimeDigitized: 2023:07:09 09:00:00
    exif:DateTimeOriginal: 2023:07:09 09:00:00
    exif:ExposureTime: 1/125
    exif:SubSecTime: 41
    exif:SubSecTimeDigitized: 41
    exif:SubSecTimeOriginal: 41
  User time: 0.210u
  Elapsed time: 0:01.329
Then I cursed out Google: how could they show me a correct timestamp on the web, but give me back a random timestamp in the image itself?
And then I remembered that I shot these photos with my camera's internal clock set incorrectly. I used a clever feature offered by Google Photos to shift the image capture time by a relative amount:
Google's almost certainly returning the original exif timestamp to me, not the one that I shifted on Google Photos. While this isn't the behavior I want, it is reasonable behavior. To deal with this, I've implemented my own time shifting logic. My approach is to describe in a text file the original and corrected timestamp for a given photo. Any other photos taken that day will be corrected by the same offset. For example, I can describe my DSLR's timestamp correction for July 9th, 2023 as:
device|date|exif_timestamp|google_photos_timestamp
t6s|2023-07-09|09:00|05:46
With this approach, I can issue a correction once and have it apply to the hundreds of photos taken on a given day.
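The offset arithmetic can be sketched roughly as follows. This is an illustration under assumptions, not the actual photoassist code: the function names are hypothetical, it relies on GNU date (for -d), it treats the date column as the exif date, and it assumes the correction doesn't cross midnight.

```shell
# Print the offset in seconds (corrected time minus exif time) for a
# device and exif date, or 0 when the corrections file has no matching row.
# The header row never matches because its first field is "device".
correction_offset() {
  local file="$1" device="$2" day="$3"
  local dev d exif_time fixed_time from to
  while IFS='|' read -r dev d exif_time fixed_time; do
    if [ "$dev" = "$device" ] && [ "$d" = "$day" ]; then
      # -u keeps the math in UTC so DST changes can't skew the difference.
      from=$(date -u -d "$day $exif_time" +%s)
      to=$(date -u -d "$day $fixed_time" +%s)
      echo $(( to - from ))
      return 0
    fi
  done < "$file"
  echo 0
}

# Shift a "YYYY-MM-DD HH:MM:SS" exif timestamp by the recorded offset.
corrected_timestamp() {
  local file="$1" device="$2" stamp="$3"
  local day="${stamp%% *}"   # date part, used to look up the correction
  local offset epoch
  offset=$(correction_offset "$file" "$device" "$day")
  epoch=$(date -u -d "$stamp" +%s)
  date -u -d "@$(( epoch + offset ))" '+%Y-%m-%d %H:%M:%S'
}
```

With the correction row above saved in a file (say corrections.txt, a made-up name), corrected_timestamp corrections.txt t6s "2023-07-09 09:02:59" prints 2023-07-09 05:48:59: the row encodes a shift of 3 hours 14 minutes backwards, and every t6s photo from that day moves by the same amount. Devices with no matching row, like the s22, pass through unchanged.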
I've packaged up the friendly device name mapping and timestamp correction logic into a shell script named photoassist. Using this script, I can now access scrubbed metadata.
$ for f in *.jpg *.JPG ; do d=$(photoassist -a device -i $f) ; t=$(photoassist -a timestamp -i $f); echo "file=$f, device=$d, timestamp=$t" ; done
file=20230709_050311.jpg, device=s22, timestamp=2023-07-09 05:03:11
file=20230709_055335.jpg, device=s22, timestamp=2023-07-09 05:53:35
file=IMG_0029.JPG, device=t6s, timestamp=2023-07-09 05:46:00
file=IMG_0045.JPG, device=t6s, timestamp=2023-07-09 05:48:59
These timestamps now agree with Google Photos and the device names are far easier to recognize and work with.
Next up, I want to use this data to annotate and organize a day's worth of photos so I can clearly see what my DSLR is bringing to the table.