So, how do we rescue millions of historical weather observations that are not currently available to science, such as those in this table? Surely, ML can do this by now... 🧵
Have you tried something like Tabula? tabula.technology
Write a grant to get them scanned
What happens if you use the human-read answers from previous digitization efforts as training data for a bespoke ML effort?
I've had luck with reading files into R. Haven't used this package, but it claims to be able to do it from PNG. www.r-bloggers.com/2016/11/the-... As an aside, one of my first jobs included typing in water levels from handwritten records from the 1940s onward for USGS weirs. Good to know it's still needed :D
Optical character recognition (OCR) is the process of extracting written or typed text from images such as photos and scanned documents into machine-encoded text. The new rOpenSci package tesseract br...
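For anyone who wants to try the OCR route in Python rather than R, here is a minimal sketch, not a tested pipeline. It assumes the `pytesseract` wrapper, Pillow, and the Tesseract binary are installed. The function names `ocr_image` and `parse_rows` and the file name `page.png` are made up for illustration, and the number-matching rule is a guess at what a weather table might need.

```python
import re


def ocr_image(path):
    """Run Tesseract on one scanned page and return the raw text."""
    # Imported here so parse_rows below can be used without OCR installed.
    from PIL import Image
    import pytesseract

    # --psm 6 asks Tesseract to treat the page as one uniform block of text,
    # which is an assumption that may or may not suit a given table layout.
    return pytesseract.image_to_string(Image.open(path), config="--psm 6")


# Optional minus sign, digits, optional decimal part.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_rows(text):
    """Split OCR text into rows of floats, skipping lines with no numbers."""
    rows = []
    for line in text.splitlines():
        values = [float(m) for m in NUMBER.findall(line)]
        if values:
            rows.append(values)
    return rows


if __name__ == "__main__":
    # "page.png" is a hypothetical scan of one table page.
    for row in parse_rows(ocr_image("page.png")):
        print(row)
```

Handwritten or faded entries will likely come out garbled, so the parsed rows would still need checking against the scan.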
People > machines. Call for the Zooniverse.
OCR seems like a more practical tool than machine learning for this task
Economic historians have the same problem with tables (no good OCR as far as I know) and solve it by hiring people who do manual entry...
I would try Transkribus, which was specifically made to transcribe historical documents.