OCR is not working well, even on images that are almost entirely text.
171k images. 86k have a TIFF file called "largest"; 84k have a TIFF file called "larger." The two differ in resolution.
Jeff: need to develop ranges to process
Would be nice to know how long it would take to run the full corpus, to make the case for using supercomputers.
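
One rough way to get that number is to time OCR on a small sample and scale up linearly to the full corpus. The sketch below is only an illustration: the function names, the `process` callable (a stand-in for whatever OCR step is actually used), and the `workers` parameter are all hypothetical, and it assumes the sample is representative and that the work splits evenly across workers.

```python
import time
from typing import Callable, Sequence


def extrapolate_hours(sample_seconds: float, sample_count: int,
                      corpus_size: int, workers: int = 1) -> float:
    """Scale a timed sample up to the full corpus (linear assumption)."""
    if sample_count <= 0 or workers <= 0:
        raise ValueError("sample_count and workers must be positive")
    per_image = sample_seconds / sample_count   # average seconds per image
    return per_image * corpus_size / workers / 3600


def time_sample(paths: Sequence[str], process: Callable[[str], object]) -> float:
    """Run the (hypothetical) OCR callable on each path; return elapsed seconds."""
    start = time.perf_counter()
    for path in paths:
        process(path)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Illustrative numbers only, not measurements: 50 images in 100 seconds.
    print(extrapolate_hours(100.0, 50, 171_000))              # → 95.0
    print(extrapolate_hours(100.0, 50, 171_000, workers=19))  # → 5.0
```

Because the "largest" and "larger" TIFFs differ in resolution, it is probably worth timing a sample from each group separately and adding the two estimates.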