PhotoCookie: an easy version tracker for JPEG files

A few weeks ago, the evercookie project had its initial release and caused quite a stir among web privacy experts. It allows website owners to track site visitors even after they have deleted their cookies, by storing the cookie information in a dozen other in-browser hideouts.

Although the sinister power of evercookie scared me quite a bit, I have to admit that cookies can be a handy thing. Not only in the web context, but also when dealing with digital images. What if a cookie wasn’t used to identify a (returning) web site visitor, but a JPEG file from a digital camera and all the downscaled, converted or edited .JPG, .PSD or .TIF versions derived from it?

Quite a few times I received requests like “Hey, I really like that picture on your picasa album. Can you send me the hi-res version so that I can order a poster of it?” and then spent hours looking up the requested “source” file. Inspired by evercookie’s success, I tried to port some of its ideas to the world of digital images, with the focus on easing file handling rather than on posing a privacy threat.

Introducing PhotoCookie

A PhotoCookie is a special SHA-1 checksum of a JPEG file that can identify not only the file itself, but also edited versions derived from it. Unlike common MD5 or SHA-1 checksums, PhotoCookies are also “resistant” to metadata changes. The PhotoCookie won’t change if you add IPTC Copyright Information or an EXIF Geotag to a JPEG file.

After its calculation, the PhotoCookie hash is stored as an IPTC Keyword. Since most image editors, converters or online photosharing services retain these keywords either by cloning them to the converted file or exporting them as tags, the photocookie is automatically “propagated” to all derived versions of the JPEG file.

Thus, photocookie is able to retrieve the source photo of a downsized version e.g. hosted at flickr or picasa without having to rely on fragile filename matches.

More than that, the photocookie tool can also tell apart unmodified source files from their edited versions by simply comparing their stored PhotoCookie hash with a PhotoCookie hash calculated from their file content. If the two PhotoCookies match, it’s the “original” file. If they don’t, it’s a derived version.

How to prepare a PhotoCookie

  1. read all data between the JPEG segments SOS (start of scan) and EOI (end of image)
  2. calculate an SHA-1 hash of this data, output its hexdigest and prefix it with a tilde (~). With the tilde prefix, the photocookie will be listed after all other alphanumerical keywords assigned by the user, which reduces visual clutter.
  3. store this hash as an IPTC Keyword, and optionally in further metadata locations of the image: EXIF User Comment, JPEG comment. This allows the photocookie tool to restore the hash if the IPTC information got lost during the conversion process but other metadata segments were retained.
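
Steps 1 and 2 above can be sketched in a few lines of Python. This is only a minimal sketch under assumptions, not the code of the photocookie tool: the function names scan_data and photocookie are made up for illustration, the hashed range is assumed to start right after the SOS marker and to end before the last EOI marker, and step 3 is left out because writing an IPTC keyword needs a metadata library.

```python
import hashlib
import struct

SOI = b"\xff\xd8"   # start of image
SOS = 0xDA          # second byte of the start-of-scan marker
EOI = b"\xff\xd9"   # end of image


def scan_data(jpeg_bytes):
    """Return the bytes between the SOS marker and the last EOI marker."""
    if jpeg_bytes[:2] != SOI:
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = jpeg_bytes[pos + 1]
        if marker == SOS:
            end = jpeg_bytes.rfind(EOI)
            if end < pos + 2:
                raise ValueError("no EOI marker after SOS")
            return jpeg_bytes[pos + 2:end]
        # Segments before SOS (APP1/EXIF, APP13/IPTC, ...) carry a 2-byte
        # big-endian length, so they can be skipped without being parsed.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        pos += 2 + length
    raise ValueError("no SOS marker found")


def photocookie(jpeg_bytes):
    """SHA-1 hexdigest of the scan data, prefixed with a tilde."""
    return "~" + hashlib.sha1(scan_data(jpeg_bytes)).hexdigest()
```

Because all metadata segments sit in front of the SOS marker and are skipped, two files that differ only in their EXIF or IPTC blocks end up with the same hash in this sketch.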

The photocookie tool

Currently I’m developing a command-line tool to get, set and manipulate PhotoCookies. The photocookie tool is written in Python and offers the following commands:

photocookie get [jpegfile or jpegdir]
gets the photocookie from a jpeg file or all files in jpegdir

photocookie set [jpegfile or jpegdir]
sets the photocookie of a jpeg file or all files in the directory jpegdir. Use the option -f or --fallback to activate fallback storage (the photocookie is also stored in the JPEG comment and EXIF User Comment)

photocookie restore [jpegfile or jpegdir]
tries to restore the photocookie hash for jpegfile or all files in jpegdir if the IPTC keyword got lost during a conversion. This only makes sense when the PhotoCookie was set with the fallback option.

photocookie match [photocookie_hash] [directory]
lists all files in the given directory or filename that match photocookie_hash

photocookie group [jpegdir]
group all files in jpegdir by their photocookie hash, outputs lists of jpegfiles with the same photocookie_hash, separated by a blank line.
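
The grouping behind the group command can be pictured with a small Python sketch. It is not taken from the photocookie tool; the function names and the input (a dictionary mapping filenames to photocookie hashes that have already been read) are assumptions made for illustration.

```python
from collections import defaultdict


def group_by_cookie(cookies):
    """Map each photocookie hash to the sorted list of files carrying it."""
    groups = defaultdict(list)
    for filename, cookie in sorted(cookies.items()):
        groups[cookie].append(filename)
    return dict(groups)


def format_groups(groups):
    """One filename per line, groups separated by a blank line."""
    return "\n\n".join("\n".join(files) for _, files in sorted(groups.items()))
```

Matching a single hash, as the match command does, is then just a lookup in the dictionary returned by group_by_cookie.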

An example, please


This image has a photocookie of ~18f4942f0ffddc4391bf6a5823602533fe0fec88. You can download it and …

  • upload it to your flickr or picasa account. The photocookie will be converted to a tag. So running a search for the photocookie hash (flickr, picasa) will yield all uploaded versions of this photo.
  • or resize/edit it using IrfanView (Sample), Google Picasa (Sample) or Adobe Photoshop. If you save that edited image again as JPEG file, the photocookie will be retained (assuming you didn’t change the default “save as” settings)

How to get PhotoCookie

I’ve set up a public repository for PhotoCookie on bitbucket. Feel free to add comments and contact me with questions.

Posted in EXIF-Hacks | View Comments

How to speed up Datamatrix decoding in Python

But now for something completely different: in my latest project scan.tag.is I’m not dealing with EXIF, IPTC and other “textual” metadata residing in photo file headers, but with metadata consisting of pixels.

scan.tag.is (among other things) tries to decode datamatrix codes embedded in scanned documents in order to process them further based on the decoded information. As scan.tag.is is written in Python, the excellent pydmtx wrapper for libdmtx was the logical choice for this task.


This image displays a tiny part of a photo scan that scan.tag.is will process. The datamatrix code on the left is extended with a textual description and a URL. A click on the image gives you a full-res version of the scan which can be used as sample data for this tutorial.

The simple, but slow basics


Using pydmtx is quite straightforward: first you open the to-be-decoded image with the Python Imaging Library (PIL) to determine its dimensions. Then you create an instance of pydmtx’s DataMatrix class and call its decode() method with the obtained dimensions and the image data. The extra step of opening the image through PIL astonished me a bit, but decode() needs the image dimensions as mandatory parameters. Furthermore, we will use PIL later to do some preprocessing. The sample script above tries to detect datamatrix codes in an image given as first command line parameter.

If you need a spontaneous coffee break, you can test this simple decoder with the sample scan from above. Most likely, the decoder will run for ages and consume about 90% of your system resources. Not really production-ready…

Performance boost #1: Limit the number of codes to detect

Believe it or not, with one tiny extra parameter we can radically speed up the decoding process. By default, pydmtx assumes that there are n datamatrix codes hidden in the image and thus scans the image thoroughly so as not to miss any of them. While pydmtx usually detects the actual datamatrix codes first and quite fast, there might always be “suspicious” areas in an image that loosely resemble a datamatrix code. Checking these potential matches is what slows pydmtx down. Because we know how many datamatrix codes are in the image, we can simply tell pydmtx to stop after the first or n-th successful detection with the max_count parameter:

dm_read = DataMatrix(max_count = 1)
Performance gain: groundbreaking. Detection time dropped from endless (I had to kill my script after 43 minutes) to 2.0 secs

Performance boost #2: restrict the detection time

It seems that we have already reached our goal: the pydmtx detection now takes about two seconds, and we can still fine-tune the process. But what if pydmtx has difficulties decoding a datamatrix code that is actually embedded? Then the “endless loop” scenario from our first example might come into play again. Luckily, pydmtx has a control parameter for this problem too: by simply setting timeout to a certain number of milliseconds, we restrict the total detection time per image:

dm_read = DataMatrix(max_count = 1, timeout = 1500)
Performance gain: invisible… effectively prevents the “endless loop”-like scenarios from above

Additional tweaks: specify shape, squareness and number of error corrections

Datamatrix codes can be differentiated by their shape (square or rectangular) and their module count (number of black/white “cells” that build the code). The module count also defines the maximum length of the encoded message. To speed up the decoding, we can provide the decoder with the expected shape of the Datamatrix code. The DataMatrix class defines enumerated constants for this purpose, ranging from DmtxSymbolRectAuto (-3) to DmtxSymbol16x48 (29); the most common choice is DmtxSymbolSquareAuto. Thus, we can specify the expected code type like so:

dm_read = DataMatrix(max_count = 1, timeout = 1500, shape = DataMatrix.DmtxSymbolSquareAuto)

The shape parameter didn’t bring any significant speed gain in my tests. My test objects were scanned images, but if you want to decode photos or even videos, these parameters can become interesting. Panoramic distortion and noise make decoding more difficult for pydmtx, so a precise configuration can really be valuable. There are a few other parameters such as edge threshold, error corrections or gap size. If you want to tweak them all, I recommend checking out Simon Wood’s pyDatamatrixScanner, an enhanced GUI for pydmtx that can even detect Datamatrix codes in webcam video.

Performance boost #3: Scale down and/or crop the image

You can further speed up the processing time by scaling down the image before sending it to pydmtx. PIL’s thumbnail method seems the ideal choice for this, because it takes care of the scaling ratio and lets you specify the scaling method easily. However, too drastic scaling will decrease the detection rate, so you might have to do some tweaking before you add it to your workflow.

If you know the approximate position of the datamatrix code in the image, you can use PIL’s crop method to remove the non-relevant parts before detection. Here is a sample script that proportionally scales down the input image to 1060 x 1500 pixels and then uses crop to send just the lower third to pydmtx:

These simple tweaks further reduced the total processing time from 1.9 secs to 0.41 secs, that’s a performance gain of over 75%!
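
The geometry of this step can be worked out in plain Python before any imaging library gets involved. The following is a sketch under assumptions: the helper names are invented, the rounding is only an approximation of what PIL’s thumbnail method does, and the 2480 x 3508 input size in the usage note (an A4 page scanned at 300 dpi) is a made-up example. The returned 4-tuple uses the (left, upper, right, lower) order that PIL’s crop method expects.

```python
def fit_within(width, height, max_width=1060, max_height=1500):
    """Scale (width, height) down proportionally to fit the given bounds."""
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def lower_third_box(width, height):
    """Crop box (left, upper, right, lower) covering the lower third."""
    return (0, height - height // 3, width, height)


def relative_gain(before, after):
    """Fraction of processing time saved, e.g. 0.5 for a halved runtime."""
    return (before - after) / before
```

With these helpers, fit_within(2480, 3508) gives (1060, 1499), lower_third_box(1060, 1499) gives (0, 1000, 1060, 1499), and relative_gain(1.9, 0.41) is roughly 0.78, which matches the “over 75%” quoted above.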

Bonus Tip: Increase the detection rate by adding an “artificial” quiet zone

The Datamatrix code specification requires a “quiet” zone (actually a white pixel border) around each datamatrix code. Datamatrix codes generated with pydmtx perfectly adhere to this spec, so I was quite surprised about the low code detection rate in my scans. After some research I found the culprit: as I had placed my datamatrix code at the very bottom of the scanning glass, my scanner driver ruined this quiet zone by cropping away the seemingly needless pixels at the bottom of the page. As a consequence, the detection rate dropped drastically.

Luckily, PIL came to the rescue again: its ImageOps module offers an easy function to draw a white border around an image, which perfectly serves as a “fake quiet zone”. Here is a version of the above script with this enhancement:

But remember: this tip only makes sense if the datamatrix code is located at the border of the image.

Conclusion

pydmtx integrates nicely with Python and is easy to install. However, its documentation is lacking, which can cause programmer frustration. I hope that this tutorial has eliminated the most common performance pitfalls with pydmtx and encourages more projects to use this fantastic library.

Posted in Uncategorized | View Comments

pyexiv2 – the best choice for photo metadata manipulation in python

Some days ago, the Python community received an early Easter gift: version 0.2.0 of the image metadata library pyexiv2 was released. First of all, I want to congratulate project maintainer Olivier Tilloy and all his contributors on this important milestone. Since geotagging, copyright assignment, timestamp fixing and other image metadata manipulation techniques are commonly used by photographers, it is important to have free software tools in this field.

While there are a lot of command line tools for metadata manipulation, only a few Python libraries deal with this subject. A recent article on EXIF extraction in Python introduces some of them, but unfortunately doesn’t cover the newly released pyexiv2. So I decided to spread the word about it.

What is pyexiv2?

pyexiv2 is a Python binding to exiv2, the C++ library for manipulation of EXIF, IPTC and XMP image metadata. As mentioned above, it allows you to perform many metadata manipulation tasks, such as correcting wrong photo timestamps, assigning GPS coordinates to your photos or simply dumping out the EXIF metadata generated by your camera.

Why should I use pyexiv2 instead of EXIF.py or similar libs?

  • pyexiv2 also supports Camera RAW (.NEF, .CRW, .ORF, etc) files and even gives you access to their embedded preview images, which allows you to perform very fast RAW-to-JPEG conversions.
  • It is able to decode many important Makernote Tags, such as LensTypes and ProgramModes
  • Being C++-based, pyexiv2 offers ultra-fast read and write support for metadata. In my tests, pyexiv2’s underlying library exiv2 was orders of magnitude faster than exiftool when it came to writing metadata
  • It provides a convenient Python API, no need to call external tools (as with various python-to-exiftool wrappers)
  • pyexiv2 automatically converts many tag values to appropriate Python datatypes (e.g. Exif.Image.DateTime returns a datetime object), freeing you from parsing date strings like '2004-07-13T21:23:44Z'. Of course, you can also access the raw value of a tag.
  • [UPDATE]: I almost forgot to mention the most important merit of pyexiv2: it has a very robust metadata parser that is tested against hundreds of sample photos from different camera models. While I appreciate the various pure-Python EXIF parsers for educational purposes, they never came close to pyexiv2′s stability.
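
To illustrate the datatype conversion mentioned in the list above, here is the kind of manual parsing that a raw date string would otherwise require. This sketch uses only Python’s standard library and deliberately does not call pyexiv2; the function name is made up, and the format string is an assumption that fits the example value quoted above.

```python
from datetime import datetime


def parse_raw_timestamp(raw):
    """Turn a raw string like '2004-07-13T21:23:44Z' into a datetime."""
    return datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ")


taken = parse_raw_timestamp("2004-07-13T21:23:44Z")
print(taken.year, taken.hour)
```

Every tag with a different layout would need its own format string, which is exactly the chore an automatic conversion removes.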

What is new in version 0.2.0 ?

pyexiv2 0.2.0 handles XMP metadata and supports reading images from a stream using the from_buffer method. This means that you can provide pyexiv2 with a variable holding the image data; in prior versions, only the file path of the image was accepted. Furthermore, pyexiv2 is now compiled against libexiv2 0.19, which brought huge performance improvements for reading metadata. Finally, it exposes all recent enhancements of the libexiv2 API, such as preview image access.
pyexiv2 is a complete, non-backwards-compatible rewrite and features an improved, better documented API. It was compiled and tested under Linux and Windows.

Installation & first steps

Windows folks can download the .exe installer (requires Python 2.6 or newer on your system); Linux folks have to get the source code and install the required dependencies before compiling pyexiv2 0.2.0. There are also pre-built packages in debian experimental, but these are not suitable for production environments.

The excellent introductory tutorial on pyexiv2′s website will make you quickly proficient with the library.

Posted in EXIF-Hacks | View Comments

GearsZipper Part2: Adding support for real files and canvas elements

In the first part of my GearsZipper tutorial I showed you how to create zip archives within Gears with the help of JSZip. This already worked fine, but our only possible content was Javascript strings. What if we want to add “real”, external files to our zip? If we take the use case of our CSS Editor, this could be static graphic assets like PNGs for border or shadow effects. We certainly don’t want the user to download these separately.

Extension #1: Adding support for “real” files

Luckily, Gears helps us again. The already mentioned ResourceStore class also has a captureBlob method that allows you to convert an arbitrary URL to a Blob. A Blob represents the binary content of a file, thus we can use its getBytes method to convert the file content to a format that JSZip understands. getBytes returns the file content as an array of integers; with Javascript’s built-in String.fromCharCode method we can convert this integer array to a byte string that JSZip understands. Voila, now we are able to include “real” files.
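
The conversion from an integer array to a byte string is easy to picture outside the browser as well. The following Python lines are only an analogy, not Gears or JSZip code: the function names are invented, and the sample values are the eight signature bytes that open every PNG file.

```python
def ints_to_bytes(values):
    """Pack a list of integers (0-255) into a bytes object."""
    return bytes(values)


def ints_to_js_style_string(values):
    """Mimic String.fromCharCode: one character per integer."""
    return "".join(chr(v) for v in values)


png_signature = [137, 80, 78, 71, 13, 10, 26, 10]
print(ints_to_bytes(png_signature))
```

The second helper mirrors the Javascript approach, where every byte becomes one character of a string; encoding that string as Latin-1 gives back the original bytes.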

However, this feature also introduces a little drawback: since captureBlob is asynchronous, we also need to make our utility function asynchronous, but this shouldn’t make our code too complicated. Here is a demo showing the embedding of real files.

Photo uploads and their problems

A high percentage of all uploading activity is related to photos. Online photo galleries, photo sharing sites or even blogs offer specialized upload features for photos. There are quite a few problems related to photo uploading:

  1. Resize before upload: Inexperienced users often want to upload the original photos from their camera without resizing them first. This leads to enormous upload times and/or failed uploads.
  2. Upload of multiple photos: Thanks to the openFiles() method of Gears’ Desktop class, it’s quite easy to implement multi-file uploads on the client side. The headaches start on the server side: the uploaded photos arrive at the server bit by bit, so you need to send a session id with each photo to assign it to a certain user. Additionally, an increment counter (e.g. “photo 5 of 10”) must be transmitted. Furthermore, it’s also more complicated to notify the user of the upload progress.

There are many Java Applets that target this use case, but if you want minimal dependencies, then it’s better to stick with Gears. As I already mentioned, Gears is like a time machine that brings the future features of HTML 5 to today’s browsers. Even its API is modelled closely on the HTML 5 spec, so once your browser supports HTML 5 natively, it will cost you only little effort to port your uploading solution from Gears to HTML 5. At the time of this writing, Firefox 3.6, the first browser with a complete HTML 5 implementation, is ante portas and the others will soon follow. Thus, sooner or later you will have a fully standardized solution without any external dependencies.

Batch Upload of resized photos in a zip file

We try to tackle the above scenario with a simplistic approach: all selected photos are resized and added to one zip archive, which can then be processed at once on the server side. Of course this approach has drawbacks, but it saves us from coding a complex server backend and might be applicable to some uploading use cases. In order to create the zip archive, each selected photo has to undergo the following process:

  1. Determine the width and height of the selected photo: To maintain the aspect ratio of the image, I first determine the width and height of the original image by calling the desktop.extractMetaData() method. It takes a Blob as parameter and gives you all the metainformation it knows about it. For images, it returns height and width.
  2. Resize each selected photo to a maximum width of x pixels: This is possible with the Gears Canvas API. Contrary to HTML 5’s Canvas element, the Gears Canvas is an offscreen image processor. It doesn’t draw anything on your screen, but offers resize and crop manipulations for existing images. In my demo, I wanted to resize all images to a maximum width of 400 pixels, so I calculated the new proportional height with the formula Math.round(400/(metadata.imageWidth/metadata.imageHeight))
  3. Encode the resized image: We want to add the contents of the resized file to our zip, so we call the Canvas.encode() method. Encoding here actually means that the raw pixels of a canvas image are written to a specific file format. With the mimeType and options parameters of Canvas.encode() you specify the file format (JPEG or PNG) and the compression rate. Canvas.encode() returns a blob with the contents of the file, which we can pass on to GearsZipper.
  4. Add the blob to the zip: Now we can feed the generated blob with a filename to GearsZipper. Internally, GearsZipper encodes the blob to a byte string and feeds it to JSZip. The result is a .zip archive containing resized versions of all selected images.
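
The height formula from step 2 can be checked quickly with a few lines of Python. This is a sketch, not the demo’s code: the function name is invented, and the rule of leaving images that are already narrow enough untouched is an assumption. Since Python’s built-in round() rounds halves to the nearest even number, the sketch adds 0.5 and rounds down, which matches Math.round for positive values.

```python
import math


def resized_size(width, height, max_width=400):
    """Proportional target size for a photo limited to max_width pixels."""
    if width <= max_width:
        return width, height  # assumption: never enlarge small images
    new_height = int(math.floor(max_width / (width / height) + 0.5))
    return max_width, new_height
```

For a 1600 x 1200 photo this yields 400 x 300, for a 3000 x 2000 photo 400 x 267.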

Further optimisations (exercise to the reader)

If you tried the batch upload demo with a few photos, you might have experienced an unpleasant behaviour: your browser froze for minutes and the famous Abort this script? message appeared. The reason: the computationally expensive processing of the zip file happened in the main “thread” of the browser that is also responsible for drawing the UI and responding to user input. It would be highly desirable if the zip creation happened in the background, and Gears’ WorkerPool API is the perfect way to achieve this. I haven’t found time to implement this, so I leave it as a nice exercise to the reader…

Conclusions

I hope this series showed you the true potential of Gears and also some interesting applications of client-side zipping. At first sight these use cases might not be obvious, but some client-centered applications could be coded much easier if also the “data export” part resided on the client side. Apps like Themeroller, Picnik and others would certainly benefit from such an approach. Furthermore, the advent of HTML 5 and its file api could also break the last barrier for bringing many desktop apps to the web.

Posted in Uncategorized | View Comments

GearsZipper: Creating ZIP Archives in Gears – Part 1

Imagine that you just coded a HTML/CSS Editor in Javascript that leaves nothing to be desired. Your app knows about the most arcane CSS properties, offers instant preview and has a GUI that simply rocks. But there is one weak point: when the user wants to save his creation, he has to copy/paste the code from TEXTAREAs and create the necessary html/css files on his own.

The Woes of server-side zipping

Wouldn’t it be nice if your app could serve a ZIP archive of all created files? “But this requires a server-side script!” I hear you complaining. And as soon as the server side gets involved, the problems start: you have to construct quite complex AJAX requests to submit the user’s code and preserve its structure (Themeroller users might know what I’m talking about). You’d also have to implement some security measures to prevent abuse of your script. This could go as far as forcing your users to log in before they can use your app, making it less attractive for “drive-by” users.

Welcome to client-side zipping

In this blog post, I present a recipe for client-side zipping, implemented almost entirely in pure Javascript. Our first ingredient is Stuart Knightley’s fantastic JSZip library. It takes a few Javascript strings, stuffs them as individual files into a zip archive and returns the binary content of the archive. Note that JSZip doesn’t perform any compression; it simply concatenates the string data and adds some information about the file organisation. When you extract the archive, the specified files are created and filled with their content.
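
The “store, don’t compress” idea is not specific to JSZip. As a rough analogy (this is Python’s standard zipfile module, not the JSZip API, and the function name and file names are made up), an uncompressed archive can be built from a handful of strings entirely in memory:

```python
import io
import zipfile


def zip_strings(files):
    """Pack {path: text} into an uncompressed (stored) zip, return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        for path, text in files.items():
            archive.writestr(path, text)  # text is written as UTF-8
    return buffer.getvalue()


archive_bytes = zip_strings({
    "index.html": "<html></html>",
    "css/style.css": "body { color: red; }",
})
```

Because nothing is compressed, the file contents appear verbatim inside the archive bytes, surrounded only by the headers that describe the file organisation.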

The strengths and weaknesses of JSZip

The only weak point of Stuart’s approach is the delivery of the zips to the user: he dumps out a base-64-encoded version of the zip archive to the browser’s address bar. This concept is called data URI and causes the browser to trigger a zip download. Unfortunately, data uris have some disadvantages: the required base-64 encoding increases the file size by about 33%. The poor cross-browser support for data URIs (IE 7: nada, IE8: length limited to 32 kb) makes this approach a no-go for our purpose.
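
The 33% figure follows directly from how base-64 works: every 3 bytes of input become 4 characters of output. A short Python sketch makes this tangible; the function names are invented for illustration and the application/zip MIME type in the data URI is an assumption.

```python
import base64


def data_uri(zip_bytes):
    """Build a data URI carrying the archive as base-64 text."""
    return "data:application/zip;base64," + base64.b64encode(zip_bytes).decode("ascii")


def base64_overhead(num_bytes):
    """Relative size increase caused by base-64 encoding num_bytes bytes."""
    encoded = base64.b64encode(bytes(num_bytes))
    return len(encoded) / num_bytes - 1
```

For a payload of 30000 bytes the encoded form has 40000 characters, so base64_overhead(30000) comes out at one third.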

Gears to the rescue

Our second ingredient, Gears, comes to the rescue. Among other features, Gears acts like an emergency battery for web applications: it offers offline continuation for web applications by hosting all required files on a tiny webserver called LocalServer. Luckily, we can add new files to LocalServer programmatically via Javascript and thus make Gears serve our dynamically created zip archive. Contrary to the data URI approach of JSZip, we don’t need to base-64-encode our resulting zip archive. This results in a smaller file size and, much more importantly, in a faster zip creation time. Furthermore, Gears doesn’t impose limits on the size of the generated zip archive and thus makes our solution work across all modern browsers (Note: if your browser uses an older or less advanced Javascript engine, you will surely experience performance impacts. However, GearsZipper is not designed to create multi-gigabyte zips).

The basic recipe as pseudo-code

Here is the basic recipe of my approach as pseudo-code. I admit that a certain knowledge of the Gears Javascript API is required to fully understand it, but nothing prevents you from diving into it.

  1. Invoke JSZip with all the Javascript strings and desired filepaths for the zip archive. Use JSZip.generate(true) to avoid the base-64 encoding of the result string.
  2. Convert the binary result string of JSZip to a Gears Blob using the BlobBuilder class
  3. Store the created Blob on Gears’ LocalServer by using the captureBlob method of LocalServer’s ResourceStore class. captureBlob also allows you to specify a URL under which the Blob content will be served.
  4. After the creation process is done, insert the generated URL into the webpage or redirect to it to cause an immediate download.

The Demo, please!!!

After digesting all this technical stuff, you really deserve a cool demo. Note that this Demo already includes features that I will introduce in the second part of this tutorial: the inclusion of “real” files and canvas items to the archive.

Posted in Gears Adventures | View Comments

ThermoTagging: Evaluating the CameraTemperature

Last week, I was blogging about exiftool’s newest tag discovery, the CameraTemperature. I assumed that the CameraTemperature roughly equals the environment temperature and sketched some really exciting applications based on this idea (e.g. assigning keywords or conducting searches using this tag).

Unfortunately, some open questions remained: I couldn’t evaluate my assumption, because I don’t own a PowerShot. Another interesting, yet unanswered question is how quickly the CameraTemperature adapts to the environment temperature. Luckily, I spent this weekend at my mother’s house and could conduct the necessary tests with her Powershot A2000 IS.

My test setup

My test setup was fairly simple: turn on the Powershot for 5 minutes and take a picture of my digital thermometer every 20 seconds. I planned to conduct this procedure inside my apartment (approx. 24 degrees Celsius room temperature) and outside (approx. 3 degrees Celsius). Before running my tests I allowed all devices an acclimation time of 10 minutes. I deliberately chose this period to be rather short, because I wanted to test how fast the CameraTemperature adapts to a changing environment temperature (think of quickly leaving your warm room to take some pictures of the snowman the kids have built).

Although the Powershot A2000 IS is just pocket size, I didn’t put it in my jacket, to avoid any accidental heating by my body.

Interior Test

The interior test ended quite promisingly: I took 15 pictures, the first 7 yielded a camera temperature of 20 degrees, the other 8 of 21 degrees. My thermometer was constantly showing 23.3 degrees. Wow! This means a difference of just 2-3 degrees.

The thermometer reports 3.7 degrees celsius, but the CameraTemperature yields 18 degrees for this picture.


The disappointing Exterior Test

The exterior test caused more headaches (not just because of the cold October ;-) ). Even after the acclimation period of 10 minutes, my reference thermometer was still showing the interior temperature of 23.3 degrees. When I “rebooted” it by removing the batteries, the temperature first dropped down to 8, then to 3 degrees. This was the accurate temperature according to the local weather report.

However, for my total 20-minutes-stay outside, the camera temperature never dropped below 17 degrees. Quite far away from the 3 degrees actual exterior temperature.

previous interior temperature: 23.7 C, exterior temperature: 3.7 C

This diagram shows the slow acclimation time for the CameraTemperature. Coming from a previous interior temperature of 23.7 C, it was exposed to an exterior temperature of 3.7 C for 10 minutes. After that, numerous shots were taken at the same temperature. The points indicate the recorded CameraTemperature for each shot. After 20 minutes, the CameraTemperature just dropped to 17C

Conclusions

The results from the exterior test seem to indicate that the camera needs a long acclimation time until its temperature reaches the external temperature. In my tests, the temperature dropped by just 5 degrees in the first 20 minutes, which equals 25% of the total temperature difference of 20 degrees. If we assume a linear progression, the camera needs to be exposed to an environment for at least 4*20 = 80 minutes until its temperature equals the environment temperature.
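
The extrapolation above is simple enough to spell out as code. The sketch below only restates the arithmetic of the paragraph under the same linear assumption; the function name is made up for illustration.

```python
def minutes_to_acclimate(observed_drop, observed_minutes, total_difference):
    """Linear extrapolation: time until the whole difference is covered."""
    rate = observed_drop / observed_minutes  # degrees per minute
    return total_difference / rate


fraction_covered = 5 / 20                    # 5 of 20 degrees
estimate = minutes_to_acclimate(5, 20, 20)   # in minutes
print(fraction_covered, estimate)
```

With the measured values (a drop of 5 degrees in 20 minutes, a total difference of 20 degrees) the estimate is 80 minutes.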

Posted in EXIF-Hacks | View Comments

Take a (Power)Shot at ThermoTagging

Phil Harvey recently published Exiftool 7.97, which comes with a well hidden, but exciting feature: it can decode the camera temperature for Canon PowerShot Models. All Canon Powershots built 2005 or later store this information in their EXIF header, albeit in a proprietary, previously unknown format. According to Phil, the camera temperature corresponds more or less to the external temperature, if the device was exposed for a certain time to an environment. This opens interesting possibilities for image searches: you can quickly filter out pictures taken on a hot beach or during a snowball fight in winter. This article shows you how to access the camera temperature and explores the possibilities of thermo tagging.

UPDATE: Quite a few Canon IXUS and EOS models as well as many Pentax cams also seem to write this tag.

filtering the temperature of your photos

Hot or cold? - The CameraTemperature tag recorded by Canon Powershot cameras allows you to conduct temperature-based searches.

Footage (c) Flickr users Orangesmell and Aduki

Got footage?

If you don’t own a PowerShot, you can quickly download the necessary footage: Phil Harvey hosts a huge “Meta Information Repository” that contains sample images from many camera manufacturers, including Canon. Simply download the Canon Archive and extract it to your hard disk. Note: to save bandwidth, the actual image has been replaced by a tiny placeholder, but the original metadata is still in place. This is fully sufficient for our experiments.

As I don’t own a PowerShot by myself, I also made use of the footage in the repository. If you are a PowerShot user and want to create footage by yourself, be warned: don’t expose your camera to extreme temperatures as this might rapidly shorten its life expectancy.

Extracting the Camera Temperature

First make sure that you’ve installed Exiftool 7.97 or newer on your system. When in doubt, simply run

exiftool -ver

from your shell or command prompt – this command displays exiftool’s version number. Then navigate to a folder containing unmodified photos taken by a Canon Powershot camera and run

exiftool -CameraTemperature -ext JPG .

and you should get a list like

Camera Temperature : 23 C
======== CanonPowerShotSD890IS.jpg
Camera Temperature : 26 C
======== CanonPowerShotSD900.jpg
Camera Temperature : 25 C
======== CanonPowerShotSD950IS.jpg
Camera Temperature : 30 C

Exiftool lists the camera temperature in degrees celsius. The -ext JPG . part of the command line makes exiftool look for all JPG files in the current directory, no matter if the extension is upper- or lowercase. By appending -r you can expand the search to subdirectories.

Filtering by temperature

With a bit of command-line-magic, you can also filter your photos by temperature:

exiftool -cameratemperature -if '$cameratemperature# < 20' -ext JPG .

will list all photos that were taken at a camera temperature below 20 degrees celsius, whereas

exiftool -cameratemperature -if '$cameratemperature# >= 20' -ext JPG .

displays a list of all photos taken at 20 degrees or more.

These two filtering commands deserve some explanation: the -if parameter tells exiftool to process only files that meet a certain condition. The inner part of the if-clause is a bit special: '$cameratemperature# < 20' contains the tag name cameratemperature (the prepended $ is needed to identify a tag name within a string). The appended # tells exiftool to skip its human-readable print conversion and use the raw numeric value (e.g. 19 instead of '19 C'). The rest of the clause is quite easy: '< 20' stands for “all values smaller than 20”, while '>= 20' means “greater than or equal to 20”. (In the Windows command prompt, use double quotes instead of single quotes around the condition.)
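To see why the # matters, here is a small sketch of the same comparison done by hand in the shell. It only imitates the visible effect (dropping the " C" unit so that the value can be compared as a number); exiftool itself works on the stored value instead of trimming text. The names numeric_value and is_below are hypothetical.

```shell
# Imitates the effect of the "#" suffix on a printed value such as "19 C":
# keep only the part before the first space so it compares as a number.
# (exiftool does not trim text; it skips the print conversion altogether.)
numeric_value() {
    printf '%s\n' "${1%% *}"
}

# Succeeds (exit status 0) if the temperature in $1 is below the limit in $2.
is_below() {
    [ "$(numeric_value "$1")" -lt "$2" ]
}
```

For example, is_below '19 C' 20 succeeds, while is_below '23 C' 20 fails. Comparing the unmodified string '19 C' numerically would be an error in the shell, which is the same problem the # avoids inside the if-clause.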

ThermoTagging hot or cool photos

As exiftool “decodes a riddle wrapped in a mystery inside an enigma” (quote Phil Harvey), it can extract esoteric tags such as Canon:CameraTemperature that are buried in the depths of proprietary Makernotes. Most (if not all) other metadata readers and image management applications fail to decode this information, so it would be nice to convert the camera temperature to a format that other applications can access. The most common way to do so is to create an IPTC Keyword for the information contained in the tag. IPTC Keywords are read by most image editors, including Photoshop, and many photo-sharing services (including flickr and picasa) auto-convert them to tags during photo upload.
Converted Thermotags at flickr.com

Storing descriptive tags

exiftool -keywords+=cold -if '$cameratemperature# < 15' -ext JPG .

adds the IPTC keyword "cold" to every photo whose camera temperature was below 15 degrees. Just modify the keyword and the if-clause to tag "hot" photos ;-)
To test your results, you can use


exiftool -keywords -cameratemperature -if '$keywords eq "cool"' *.jpg
======== CanonEOS_KissX2.jpg
Keywords : cool
Camera Temperature : 18 C
======== CanonIXY_DIGITAL20IS.jpg
Keywords : cool
Camera Temperature : 17 C
[...]

This command lists all photos whose keyword is “cool” and prints their keywords and camera temperature. Replace "cool" with whichever keyword you stored, e.g. "cold".
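The “modify keyword and if-clause” step for "hot" photos could look like the sketch below. The "cold" rule (below 15 degrees) is the one used above; the limit of 30 degrees for "hot" is purely my assumption, and the function names thermotag and tag_directory are made up for illustration.

```shell
# Which keyword would a given temperature (a plain number) receive?
# "cold" below 15 follows the command above; "hot" from 30 up is an assumption.
thermotag() {
    if [ "$1" -lt 15 ]; then
        echo cold
    elif [ "$1" -ge 30 ]; then
        echo hot
    fi
}

# Apply both rules to the JPG files in directory $1 using exiftool.
# Only defined here, not run. By default exiftool keeps a backup copy of
# each changed file with "_original" appended to its name.
tag_directory() {
    exiftool -keywords+=cold -if '$cameratemperature# < 15'  -ext JPG "$1"
    exiftool -keywords+=hot  -if '$cameratemperature# >= 30' -ext JPG "$1"
}
```

thermotag lets you check the thresholds without touching any files (it prints nothing for temperatures in between), while tag_directory . would actually write the keywords.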

Storing precise values

If you want to store the precise degree value, you can use:

exiftool '-cameratemperature+>keywords' *.jpg

which copies the Camera Temperature value directly to an IPTC Keyword:

exiftool -keywords CanonPowerShotSX200IS.jpg
Keywords : cool, 18 C

Is the camera temperature really identical to the outside temperature?

Although I don’t own a Canon PowerShot, I wanted to verify Phil’s assumptions about Canon’s CameraTemperature. In short, he claims that the CameraTemperature is close to the outside temperature if the device was left in an environment for a certain time. According to him, taking a lot of images in a short time additionally heats up the camera by some degrees. While my tests can’t prove that the two temperatures are equal, they indicate a strong correlation (maybe with some offset):

  • Range of the CameraTemperature: exiftool reported values between 12 and 39 degrees Celsius for Phil’s sample images. This indicates that the camera temperature is maybe a bit above the outside temperature (why are there no pictures with values below 10 degrees?).
  • Correlation of creation date and CameraTemperature: when I compared the temperature with the creation date (Exif:DateTimeOriginal), it turned out that all photos with a value below 20 degrees were taken in autumn or winter.
  • Visual correlation: I extracted a few thumbnails from the “hottest” (> 35 degrees) and “coldest” (< 15 degrees) photos and tried to find visual hints of heat or cold (e.g. snow, summer clothes etc.). My verdict is that there are no contradictory elements, but you can judge for yourself by looking at the hottest or the coldest thumbnails.
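The range check from the first bullet can be reproduced with a few lines of awk. This is a rough sketch, not the exact procedure I used: temperature_range is a made-up helper name, and it assumes the "Camera Temperature : NN C" lines that exiftool printed in the listing further up.

```shell
# Sketch: print the lowest and highest camera temperature found in an
# exiftool listing read from standard input ("Camera Temperature : NN C").
# temperature_range is a hypothetical helper name.
temperature_range() {
    awk '
        /^Camera Temperature/ {
            sub(/^[^:]*:[ \t]*/, "")   # drop the label, leaving e.g. "26 C"
            t = $1 + 0                 # first field as a number
            if (n == 0 || t < min) min = t
            if (n == 0 || t > max) max = t
            n++
        }
        END { if (n > 0) print min, max }
    '
}
```

Run over a whole folder, e.g. exiftool -CameraTemperature -ext JPG -r . | temperature_range, it prints the two extremes on one line and prints nothing if no temperature was found.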

Acknowledgements
I want to thank Phil Harvey for developing Exiftool and for his suggestions on this article. Vesa Kivisto also deserves an honorable mention for decoding the CameraTemperature tag.

Posted in EXIF-Hacks

Shutter Count – the mileage indicator for your DSLR

So you are about to buy a used DSLR cam from eBay. The model isn’t top-notch, but it is still solid quality and was released 2 years ago. The pictures of the cam on the auction page look good, but was the camera really rarely used, as stated in the auction text? Or are you bidding on an over-used device that already suffers from wear-and-tear “diseases” like wobbly buttons?

Estimating the “true age” of a camera

The date of purchase isn’t a very reliable indicator, as the usage intensity of cameras can vary greatly. A 2-year-old DSLR that is mainly used on holidays or at family events is surely in better condition than one that serves as a bi-weekly party snapshot device.

It would be nice to have something like a mileage indicator for DSLRs that more or less counts every picture ever taken with a particular cam. Luckily, such a magic indicator already exists, although its information isn’t always easy to obtain: the Shutter Count measures the number of shutter releases during the lifetime of your camera. A shutter is a movable device that exposes the camera sensor to light while the picture is taken. Roughly speaking, the number of shutter releases corresponds to the number of pictures taken with your camera.

Accessing the Shutter Count with exiftool

Unfortunately, the Shutter Count isn’t very easy to access. Only a few DSLR models from Nikon, Canon, Pentax and Samsung (green model names) dump this information out with their EXIF metadata. If you intend to buy such a camera, you can ask the seller to mail you an unmodified photo directly taken from the camera.
Jeffrey Friedl's Exif Viewer reports the Shutter count

You can then either upload the photo to Jeffrey Friedl’s Exif viewer or install the famous ExifTool metadata extractor on your system (Win/Mac/Linux). Open a shell/console, navigate to the directory where you stored the photo and type

exiftool THE_UNMODIFIED_IMAGE.JPG | find "Shutter Count"

on Windows or

exiftool THE_UNMODIFIED_IMAGE.JPG | grep "Shutter Count"

on Linux or Mac OS.

If a line like
Shutter Count: 2391

appears at your command prompt, your attempt was successful. If nothing appears, your model likely doesn’t include the shutter count in its metadata. NOTE: Only a few EXIF viewers will extract the Shutter Count value, as it is stored in the so-called Makernotes (proprietary extensions of the EXIF standard). To get reliable results, please use the tools I mentioned above.
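If you need just the number (for instance to compare several cameras), the grep step can be swapped for a small awk filter. This is only a sketch under the assumption that the line looks like the "Shutter Count: 2391" example above; shutter_count and read_shutter_count are hypothetical helper names.

```shell
# Sketch: read exiftool output on standard input and print only the number
# from the first "Shutter Count" line. Prints nothing if there is no such line.
shutter_count() {
    awk -F': *' '/^Shutter Count *:/ { print $2; exit }'
}

# Convenience wrapper around the pipeline shown above (defined, not run here).
read_shutter_count() {
    exiftool "$1" | shutter_count
}
```

Calling read_shutter_count THE_UNMODIFIED_IMAGE.JPG would then print something like 2391, or nothing at all if the camera model doesn’t store the value.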

The Firmware way

This Olympus E-500 has a Shutter Count value of 2265 (screenshot of cam display, yellow highlighted)



Anyway, there is still hope: a few models, such as the popular Olympus E-Series cameras, allow you to access the shutter count by sniffing the camera’s firmware. The downside: you need physical access to the camera and some patience (or a reliable AND skilled person who does this for you). This guide shows you how to display the shutter count value on the screen of your Olympus E-Series camera.

Interpreting the numbers

You had a hard time retrieving the shutter count, but what does the value mean? Is the cam still a teenager or already almost in heaven? Oleg Kikin maintains a Shutter Count life expectancy database. Users report their shutter count numbers and whether the cam is still alive or already trash.

Life expectancy of a Nikon D70 based on the shutter count value (clicks). Only 37% of all cameras with more than 100,000 pictures are still working.


While these numbers can give you some orientation, don’t forget that a lot of factors affect the life expectancy of a DSLR: outdoor usage, storage or maintenance intervals. You should also be aware that shutter counts are not immutable. EXIF-based shutter counts can be manipulated quite easily, and the firmware values aren’t immune either: firmware upgrades or repairs may alter the shutter count.

Posted in EXIF-Hacks

Ultra-fast RAW to JPEG conversion with exiv2

Finally you escaped from your dusty apartment and spent a wonderful day shooting photos in the wild with your brand-new SLR. To capture all the magnificent details of nature, you decided to shoot in Camera RAW, and not in crummy old JPEG format. While waiting for the train home, you quickly want to upload the best shots to your flickr account. Since each RAW photo has a file size of 10 megabytes, this isn’t the most bandwidth-friendly upload format. Furthermore, your tiny netbook offers only little CPU horsepower. So you are desperately looking for an easy and fast way to convert dozens of RAW photos to JPEG.

Imagemagick to the rescue?

When it comes to batch-file processing, Imagemagick is often the obvious choice. No nagging dialogs or splash screens, just a pure and simple command line interface.
In fact, converting a bunch of Camera RAW files to JPEG is as simple as typing:

mogrify -format jpg *.*

in a shell/console window (of course you need to cd to your RAW photo directory first). Unfortunately, this simple approach has some downsides:

  • Imagemagick depends on UFRaw to be able to process RAW photos. If this package is not installed, Imagemagick exits with a rather cryptic error message.
  • On my quite up-to-date laptop (Intel Core2 Duo T9300, 3 GB RAM, Ubuntu 9.04, ImageMagick 6.5.0-2), the conversion took 65 seconds for 7 RAW files at almost 100% CPU load. That’s nearly 10 seconds per RAW photo. On a low-end netbook you might have to multiply these numbers by 4 or 5, since its CPU and memory are much weaker.

So, waiting 40 seconds for one photo to convert is not your preferred choice?

Why so slow?

Don’t blame Imagemagick for its poor performance. Converting RAW photos to other formats is actually a computationally intensive task, since the RAW format doesn’t store ready-to-display pixel/color-channel data, but just the information captured by the photo sensor of the camera. You can think of it as some sort of “digital negative” that needs a huge amount of processing, before it becomes a viewable image. The benefit of this “raw” approach is that you can massively influence the development process. With JPEG coming out of your camera, the possibilities are much more limited.

Speeding up the conversion

But back to our problem. If the conversion process is that CPU/memory-hungry, how can we improve it?
The answer is easy: by circumvention. If we look closer at the contents of a RAW image file, we find

“[..] optionally a reduced-size image in JPEG format which can be used for a quick and less computing-intensive preview[..].”

Luckily, “optionally” means de-facto-standard, since the RAW image formats of Nikon, Canon, Olympus, Pentax and many other manufacturers support this feature.
So we don’t need to compute the JPEG representation of our RAW file, we can simply “snip out” an already embedded JPEG from the RAW file. We? Let’s delegate this to a specialized program.

Almost-realtime “conversion” with exiv2

Since version 0.18, the excellent exiv2 utility can not only extract EXIF/IPTC information from an image, but also dump out embedded JPEG preview files from RAW photos.
The workflow is straightforward: Open a shell/console window and cd to your RAW photo directory.
First we need to find out which previews are embedded in our RAW file:

exiv2 -pp RAW_CANON_350D.CR2

exiv2’s response tells us that there are 2 embedded preview images: one thumbnail with 160×120 pixels and a larger preview with 1536×1024 pixels:

Preview 1: image/jpeg, 160x120 pixels, 5164 bytes
Preview 2: image/jpeg, 1536x1024 pixels, 151637 bytes

Preview #2 is just fine for our purpose (uploading to flickr), so we extract it with

exiv2 -ep2 RAW_CANON_350D.CR2

which creates a file called RAW_CANON_350D-preview2.jpg in the current directory. The option ‘-ep2’ simply means “extract preview #2”. To batch-process all RAW files in the current directory, we can type:

exiv2 -ep2 *.CR2

because RAW files from the same camera generally share the same preview image structure (files from other models may number their previews differently, so check with -pp first). Don’t forget to adapt the file extension to the one of your manufacturer, e.g. *.ORF for Olympus, *.NEF for Nikon.
If you want to extract the preview images to some other directory, exiv2’s ‘-l’ option is perfect:

exiv2 -ep2 -l previews *.CR2

would extract the JPEGs to the already existing subdirectory ‘previews’.
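If your RAW files come from different cameras, a fixed preview number may not fit every file. One possible workaround, sketched below, is to ask exiv2 for the preview list of each file and pick the biggest entry. Both function names are my own, and the parsing assumes the "Preview N: image/jpeg, WxH pixels, SIZE bytes" lines shown above.

```shell
# Sketch: read the output of "exiv2 -pp FILE" on standard input and print
# the number of the preview with the most bytes. Assumes lines such as
# "Preview 2: image/jpeg, 1536x1024 pixels, 151637 bytes".
largest_preview() {
    awk '
        /^Preview [0-9]+:/ {
            idx = $2; sub(/:$/, "", idx)   # "2:" becomes "2"
            bytes = $(NF - 1) + 0          # the field just before "bytes"
            if (bytes > max) { max = bytes; best = idx }
        }
        END { if (best != "") print best }
    '
}

# Extract the largest preview of every given RAW file into the already
# existing subdirectory "previews" (defined here, not run automatically).
extract_largest_previews() {
    for f in "$@"; do
        n=$(exiv2 -pp "$f" | largest_preview)
        if [ -n "$n" ]; then
            exiv2 "-ep$n" -l previews "$f"
        fi
    done
}
```

You would call it as extract_largest_previews *.CR2 *.NEF. Files without any embedded preview are silently skipped.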

Downsides

exiv2’s extraction approach is ultra-fast, but has the downside that it’s limited to the embedded previews of the RAW file. You can’t generate JPEGs of an arbitrary size using exiv2. However, it is possible to process the extracted previews with Imagemagick (e.g. to scale them down further or apply better compression). Furthermore, you have to forgo the power of RAW processing. Still, the method described here is just right when you need a quick conversion and/or only have limited CPU horsepower.

Posted in EXIF-Hacks

picurl 0.0.1 released

The picurl team is proud to announce the release of picurl 0.0.1, which is available from our Downloads page.
picurl 0.0.1 is available for *NIX systems, a Windows version is in the works.

Our development wiki tells you what picurl is and why you might want to use it.

The QuickStart-Tutorial helps you with setting up and using picurl. If you want to see picurl in action, check our ShowMeDo Demo Video (Flash player required).

Posted in Uncategorized