John Watson

Checking the integrity of all JPG files in a directory

I recently had a hard drive go bad on me, which actually turned out to be a motherboard going bad. The hard drive may be fine, but I don’t want to take any chances, and anyway, you can buy 1 terabyte now for about $70 US. There was definitely something goofy going on with access to the drive. I keep daily backups, but a perfect backup of a corrupted source is no good. My most important files are my digital photos, mostly JPG images. I did some visual spot checking, but once I hit 8,500 photos in 2005 I looked for a better alternative. Here’s what I came up with. (This works on Linux computers. You’ll have to alter the process on Windows and Macs, but the principles are the same.)

First, find a nifty little utility called jpeginfo and install it. jpeginfo is a command line program that attempts to quickly decompress a JPG file and tells you what happens. It’s not infallible, but I’m betting that if a JPG file decompresses successfully then the file is good.

Then you just open a shell, cd to your photos directory and run jpeginfo -c on all of your files. Like so:

    cd photos
    find . -iname "*.jpg" -print0 | xargs -0 jpeginfo -c | grep -e WARNING -e ERROR

Any corrupted JPG files will be listed.

How it works: the find command lists every file whose name ends in .jpg (case insensitive). The -print0 option separates the names with null characters, so file names containing spaces survive the trip through the pipe. xargs -0 reads that list and runs jpeginfo -c on the files. The grep command then shows just the lines marked WARNING or ERROR. Instead of piping to grep, you could redirect the output to a file so that you can see all of the results.
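If you want both the full results and the short list of problems, something like the following might do. It is only a sketch: the function name check_jpgs and the log file argument are made up for illustration, and it assumes jpeginfo is installed and on your PATH.

```shell
# check_jpgs DIR LOGFILE  (hypothetical helper, not part of jpeginfo)
# Runs jpeginfo -c on every .jpg file under DIR, saves the complete
# results to LOGFILE, then prints only the WARNING and ERROR lines.
check_jpgs() {
    dir=$1
    log=$2
    # -print0 and -0 keep file names with spaces intact.
    find "$dir" -type f -iname '*.jpg' -print0 |
        xargs -0 jpeginfo -c > "$log" 2>&1
    # grep exits non-zero when nothing matches, meaning no problems found.
    grep -e WARNING -e ERROR "$log"
}
```

You would call it as check_jpgs photos results.txt and then look through results.txt at your leisure. Keeping the full log means you can search it again later without re-reading 35,000 photos off the disk.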

I found 4 corrupted JPGs in my collection of over 35,000 photos. I knew about some of these. I think at least 2 were corrupted during transfer off the memory card. JPG images are surprisingly resilient: even with bad data in the file, a lot of the image could still be decoded in 3 of the 4 images.