We are making a tool that will allow a chain of photos to be taken. Mixed in with real photos will be photos of blank sheets of paper. I want to separate the series of photos by identifying the images of blank pages.
I'm trying to find a way to identify the blank sheet, either by counting colors or by some other method. Maybe file size?
I've got GraphicsMagick, so maybe there's something useful there, and the code will be in PHP, but could be in anything if it works well.
You may do fine with the number of colours, but I am somewhat uneasy about how well that will work, though it is hard to say without more sample images. If you run into difficulties with that, you might like to look at the histograms of the two items, paper and "not paper".
You can see that the paper histogram has very steep sides and no tails, whereas the "not paper" histogram has fatter tails. The kurtosis of the image is a measure of exactly that - the fatness of the tails. Higher kurtosis means more of the variance in the image is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations. So you would expect the "not paper" to have a higher kurtosis because it has "lumps" of other stuff in the image rather than the fairly uniform paper.
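To make the idea concrete, here is a minimal sketch of the calculation in plain Python. It computes excess kurtosis (a normal distribution scores 0), which I am assuming is the convention ImageMagick uses, since it can report negative values. The function name and the two pixel lists are made up for illustration; they are not taken from real images.

```python
def excess_kurtosis(values):
    """Return the excess kurtosis (fourth standardised moment minus 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return m4 / (m2 * m2) - 3.0


# Hypothetical grey levels: a flat-topped spread with steep sides and no tails,
# roughly what evenly lit paper might look like.
paper_like = list(range(200, 211)) * 10

# Hypothetical grey levels: mostly mid-grey with a few very dark and very
# bright pixels, i.e. infrequent extreme deviations.
photo_like = [128] * 90 + [0] * 5 + [255] * 5

print(excess_kurtosis(paper_like))  # negative: no tails
print(excess_kurtosis(photo_like))  # positive: fat tails
```

The sign is what matters here: the tail-free list comes out negative and the list with rare extremes comes out positive.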
If you get ImageMagick to report the kurtosis of the two images you can see the marked difference.
    identify -verbose notpaper.jpg | grep -E "kurtosis:|Red:|Green:|Blue:|Overall"
    Red:
      kurtosis: 1.03434
    Green:
      kurtosis: 1.22576
    Blue:
      kurtosis: 0.593927
    Overall:
      kurtosis: 1.49035
And for the paper...
    identify -verbose paper.jpg | grep -E "kurtosis:|Red:|Green:|Blue:|Overall"
    Red:
      kurtosis: -0.953723
    Green:
      kurtosis: -0.980636
    Blue:
      kurtosis: -1.06634
    Overall:
      kurtosis: -0.0151458
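If you want to automate the comparison, something along these lines might do as a starting point. It is only a sketch: it parses text shaped like the output above and applies a rule I have made up from these two samples alone (call it blank when every colour channel's kurtosis is below 0). The function names and the threshold are hypothetical, and you would want to tune the threshold on more images.

```python
import re

# A channel header is a line holding only the channel name and a colon.
CHANNEL_RE = re.compile(r"^(Red|Green|Blue|Gray|Overall):$")
KURTOSIS_RE = re.compile(r"^kurtosis:\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)$")


def parse_kurtosis(identify_output):
    """Map channel name to kurtosis from `identify -verbose` style text."""
    result = {}
    channel = None
    for raw in identify_output.splitlines():
        line = raw.strip()
        header = CHANNEL_RE.match(line)
        if header:
            channel = header.group(1)
            continue
        value = KURTOSIS_RE.match(line)
        if value and channel is not None:
            result[channel] = float(value.group(1))
    return result


def looks_blank(kurtosis_by_channel, threshold=0.0):
    """Hypothetical rule: blank if every colour channel is below threshold."""
    colour = [v for name, v in kurtosis_by_channel.items() if name != "Overall"]
    return bool(colour) and all(v < threshold for v in colour)


# The two outputs quoted above, pasted in as text.
PAPER = """Red:
  kurtosis: -0.953723
Green:
  kurtosis: -0.980636
Blue:
  kurtosis: -1.06634
Overall:
  kurtosis: -0.0151458"""

NOT_PAPER = """Red:
  kurtosis: 1.03434
Green:
  kurtosis: 1.22576
Blue:
  kurtosis: 0.593927
Overall:
  kurtosis: 1.49035"""

print(looks_blank(parse_kurtosis(PAPER)))
print(looks_blank(parse_kurtosis(NOT_PAPER)))
```

In practice you would capture the text from `identify -verbose` yourself, for example with `subprocess.run` in Python or `shell_exec` in PHP, rather than pasting it in. I used the per-channel values rather than "Overall" because they are further apart in these two samples.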
As I said, you may do fine with the number of colours, but maybe consider this if you run into problems.