Determine if an image is from "blank" paper


We are making a tool that will allow a chain of photos to be taken. Mixed in with real photos will be photos of blank sheets of paper. I want to separate the series of photos by identifying the images of blank pages.

I'm trying to find a way to identify the blank sheets, either by counting colors or by some other method. Maybe file size?

I've got GraphicsMagick, so maybe there's something useful there, and the code will be in PHP, but could be in anything if it works well.

You may do fine with the number of colours, but I am somewhat uneasy about that working well - though it is hard to say without more sample images. So if you run into difficulties with that, you might like to look at the histograms of the two items - paper and "not paper".


[Figure: histogram of the "Not Paper" image]

You can see that the paper histogram has very steep sides and no tails, whereas the "not paper" histogram has fatter tails. The kurtosis of the image is a measure of exactly that - the fatness of the tails. Higher kurtosis means more of the variance in the image is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations. So you would expect the "not paper" to have a higher kurtosis because it has "lumps" of other stuff in the image rather than the fairly uniform paper.
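
To make the idea concrete, here is a minimal sketch in plain Python of what that number measures. It assumes the figure ImageMagick reports is the excess kurtosis (fourth standardized moment minus 3), which the negative values further down suggest but which I have not checked against its source. The function name and the two pixel lists are made up for illustration; they are not taken from the sample photos.

```python
def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (so a normal distribution gives 0)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    # Second and fourth central moments (population form, divide by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; kurtosis is undefined")
    return m4 / (m2 * m2) - 3.0


if __name__ == "__main__":
    # Hypothetical grey levels: an even spread within a narrow band,
    # a histogram with steep sides and no tails.
    flat = list(range(180, 221))
    # Hypothetical grey levels: mostly one value plus a few extreme dark pixels,
    # i.e. rare but large deviations.
    spiky = [200] * 95 + [0] * 5

    print("flat :", excess_kurtosis(flat))   # negative
    print("spiky:", excess_kurtosis(spiky))  # positive
```

The evenly spread sample comes out negative and the one with a few extreme pixels comes out positive, which is the same direction as the paper and "not paper" figures.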

If you get ImageMagick to report the kurtosis of the two images (one value each for the red, green and blue channels, followed by the overall figure), you can see the marked difference.

identify -verbose notpaper.jpg | grep -E "kurtosis:|Red:|Green:|Blue:|Overall"
      kurtosis: 1.03434
      kurtosis: 1.22576
      kurtosis: 0.593927
      kurtosis: 1.49035

And for the paper...

identify -verbose paper.jpg | grep -E "kurtosis:|Red:|Green:|Blue:|Overall"
      kurtosis: -0.953723
      kurtosis: -0.980636
      kurtosis: -1.06634
      kurtosis: -0.0151458
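
If you want to automate the comparison, something along these lines might do as a starting point. It is only a sketch: it assumes the last `kurtosis:` line in the `identify -verbose` output is the overall figure, and the cut-off of 0.5 is a hypothetical value picked to sit between the two samples above, so it would need tuning on more images. The function names are my own.

```python
import re
import subprocess

# Matches lines such as "      kurtosis: -0.953723".
KURTOSIS_RE = re.compile(r"kurtosis:\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def identify_verbose(path):
    """Run `identify -verbose` on an image and return its text output."""
    result = subprocess.run(
        ["identify", "-verbose", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def kurtosis_values(identify_output):
    """Return every kurtosis figure found in the output, in order of appearance."""
    return [float(v) for v in KURTOSIS_RE.findall(identify_output)]


def looks_blank(identify_output, threshold=0.5):
    """Guess "blank paper" when the last (assumed overall) kurtosis is below threshold.

    The threshold is a hypothetical starting point, not a calibrated value.
    """
    values = kurtosis_values(identify_output)
    if not values:
        raise ValueError("no kurtosis lines found in identify output")
    return values[-1] < threshold


if __name__ == "__main__":
    import sys
    for image_path in sys.argv[1:]:
        verdict = "blank" if looks_blank(identify_verbose(image_path)) else "photo"
        print(image_path, verdict)
```

Keeping the parsing separate from the call to `identify` means you can try different thresholds against saved output without re-reading every photo.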

As I said, you may do fine with the number of colours, but maybe consider this if you run into problems.