A few weeks ago, the evercookie project had its initial release and caused quite a stir among web privacy experts. It allows website owners to track site visitors even after they have deleted their cookies, by storing the cookie information in a dozen other in-browser hideouts.
Although the sinister power of evercookie scared me quite a bit, I have to admit that cookies can be a handy thing. Not only in the web context, but also when dealing with digital images. What if a cookie wasn’t used to identify a (returning) web site visitor, but a JPEG file from a digital camera and all the downscaled, converted or edited .JPG, .PSD or .TIF versions derived from it?
Quite a few times I have received requests like “Hey, I really like that picture on your picasa album. Can you send me the hi-res version so that I can order a poster of it?” and then spent hours looking for the requested “source” file. Inspired by evercookie’s success, I tried to port some of its ideas to the world of digital images, with the focus on easing file handling rather than on posing a privacy threat.
A PhotoCookie is a special SHA-1 checksum of a JPEG file that can identify not only the file itself, but also edited versions derived from it. Unlike common MD5 or SHA-1 checksums of the whole file, PhotoCookies are also “resistant” to metadata changes. The PhotoCookie won’t change if you add IPTC Copyright Information or an EXIF Geotag to a JPEG file.
After its calculation, the PhotoCookie hash is stored as an IPTC Keyword. Since most image editors, converters or online photosharing services retain these keywords either by cloning them to the converted file or exporting them as tags, the photocookie is automatically “propagated” to all derived versions of the JPEG file.
Thus, the photocookie tool is able to find the source photo of a downsized version (e.g. one hosted at flickr or picasa) without having to rely on fragile filename matches.
More than that, the photocookie tool can also tell unmodified source files apart from their edited versions by simply comparing their stored PhotoCookie hash against a PhotoCookie hash calculated from their file content. If the two PhotoCookies match, it’s the “original” file. If they don’t, it’s a derived version.
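The comparison can be sketched in a few lines of Python. This is only an illustration under assumptions, not the tool's actual code: the function names `stored_cookie` and `classify` are hypothetical, the keywords are assumed to have been read from the file already, and the calculated hash is passed in as a string.

```python
import hashlib

HEX_DIGITS = set("0123456789abcdef")


def stored_cookie(keywords):
    """Return the first keyword that looks like a PhotoCookie, or None.

    A PhotoCookie is assumed to be a tilde followed by the 40 hex digits
    of a SHA-1 hexdigest, i.e. 41 characters in total.
    """
    for keyword in keywords:
        if (len(keyword) == 41 and keyword.startswith("~")
                and set(keyword[1:]) <= HEX_DIGITS):
            return keyword
    return None


def classify(keywords, calculated):
    """Compare the stored PhotoCookie with the freshly calculated one."""
    stored = stored_cookie(keywords)
    if stored is None:
        return "unknown"      # no PhotoCookie was ever set on this file
    if stored == calculated:
        return "original"     # image data unchanged since the cookie was set
    return "derived"          # image data was resized, edited or re-encoded


# Stand-in hashes for illustration; real ones come from the JPEG image data.
source_hash = "~" + hashlib.sha1(b"original image data").hexdigest()
edited_hash = "~" + hashlib.sha1(b"resized image data").hexdigest()

print(classify(["holiday", source_hash], source_hash))
print(classify(["holiday", source_hash], edited_hash))
```

An edited copy still carries the keyword of its source, so the stored value stays the same while the calculated one changes.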
How to prepare a PhotoCookie
- read all data between the JPEG markers SOS (Start of Scan) and EOI (End of Image)
- calculate an SHA-1 hash of this data, output its hexdigest and prefix it with a tilde (~). Because the tilde sorts after all letters and digits in ASCII, the photocookie will be listed after all other alphanumerical keywords assigned by the user, which reduces visual clutter.
- store this hash as an IPTC Keyword and, optionally, in additional metadata locations of the image: EXIF User Comment, JPEG comment. This allows the photocookie tool to restore the hash if the IPTC information got lost during the conversion process but other metadata segments were retained.
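The first two steps can be sketched in Python roughly as follows. This is a minimal sketch, not the actual photocookie source: the names `scan_data` and `photocookie` are hypothetical, and it assumes that "between SOS and EOI" means everything after the first SOS marker up to the last EOI marker. It walks the segment headers instead of searching for the first `FF DA` byte pair, because an embedded EXIF thumbnail carries its own SOS and EOI markers.

```python
import hashlib
import struct

SOI = b"\xff\xd8"   # Start of Image
EOI = b"\xff\xd9"   # End of Image
SOS = 0xDA          # Start of Scan (second byte of the marker)


def scan_data(jpeg: bytes) -> bytes:
    """Return the bytes after the first SOS marker, up to the last EOI marker."""
    if jpeg[:2] != SOI:
        raise ValueError("not a JPEG stream (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = jpeg[pos + 1]
        if marker == 0xFF:            # fill byte before a marker, skip it
            pos += 1
            continue
        if marker == SOS:
            end = jpeg.rfind(EOI)
            if end < pos + 2:
                raise ValueError("missing EOI marker")
            return jpeg[pos + 2:end]
        # Every other segment before SOS is assumed to carry a 2-byte
        # big-endian length that counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        pos += 2 + length
    raise ValueError("no SOS marker found")


def photocookie(jpeg: bytes) -> str:
    """SHA-1 hexdigest of the scan data, prefixed with a tilde."""
    return "~" + hashlib.sha1(scan_data(jpeg)).hexdigest()


def _segment(marker: int, payload: bytes) -> bytes:
    """Build a synthetic metadata segment for the demo below."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


# Two synthetic streams with identical scan data but different metadata.
scan = b"\xff\xda" + b"entropy-coded-bytes" + EOI
plain = SOI + _segment(0xE0, b"JFIF\x00") + scan
tagged = SOI + _segment(0xE1, b"Exif\x00\x00\xff\xda thumb \xff\xd9") + scan

print(photocookie(plain) == photocookie(tagged))  # True
```

Both streams yield the same hash because only the metadata segment differs. Writing the result into the IPTC Keyword field needs a metadata library and is left out here.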
The photocookie tool
Currently I’m developing a command-line tool to get, set and manipulate PhotoCookies. The photocookie tool is written in Python and offers the following commands:
photocookie get [jpegfile or jpegdir]
gets the photocookie from a jpeg file or all files in jpegdir
photocookie set [jpegfile or jpegdir]
sets the photocookie of a jpeg file or all files in the directory jpegdir. Use the option -f or --fallback to activate fallback storage (the photocookie is also stored in the JPEG comment and EXIF User Comment).
photocookie restore [jpegfile or jpegdir]
tries to restore the photocookie hash for jpegfile or all files in jpegdir from the fallback locations if the IPTC keyword got lost. This only makes sense when the PhotoCookie was set with the fallback option.
photocookie match [photocookie_hash] [directory]
lists all files in the given directory (or the given file) that match photocookie_hash
photocookie group [jpegdir]
groups all files in jpegdir by their photocookie hash and outputs the lists of jpegfiles that share a photocookie_hash, separated by a blank line.
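To illustrate the output described for the group command, here is a small Python sketch. It is an assumption about how such grouping could work, not the tool's implementation: `group_by_cookie` and `format_groups` are hypothetical names, and the mapping from filename to PhotoCookie is assumed to have been collected beforehand.

```python
from collections import defaultdict


def group_by_cookie(cookies_by_file):
    """Group filenames that share a PhotoCookie; files without one are skipped."""
    groups = defaultdict(list)
    for filename, cookie in sorted(cookies_by_file.items()):
        if cookie is not None:
            groups[cookie].append(filename)
    # Return the groups in a stable order, sorted by cookie value.
    return [groups[cookie] for cookie in sorted(groups)]


def format_groups(groups):
    """One filename per line, groups separated by a blank line."""
    return "\n\n".join("\n".join(group) for group in groups)


# Shortened stand-in hashes for readability; real ones are 41 characters.
cookies = {
    "beach.jpg": "~aa11",
    "beach_small.jpg": "~aa11",
    "sunset.jpg": "~bb22",
    "scan.jpg": None,
}

print(format_groups(group_by_cookie(cookies)))
```

Here beach.jpg and beach_small.jpg end up in one group and sunset.jpg in another, while scan.jpg is left out because it has no PhotoCookie.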
An example, please
- upload a JPEG file with a PhotoCookie set to your flickr or picasa account. The photocookie will be converted to a tag, so running a search for the photocookie hash on flickr or picasa will yield all uploaded versions of this photo.
- or resize/edit it using IrfanView, Google Picasa or Adobe Photoshop. If you save the edited image as a JPEG file again, the photocookie will be retained (assuming you didn’t change the default “save as” settings).
How to get PhotoCookie
I’ve set up a public repository for PhotoCookie on bitbucket. Feel free to add comments and contact me with questions.