The OCR is @archive.org's own Tesseract output, so it won't match previous transcripts. We're thinking about fine-tuning better models and rerunning (as we've done for some of Chronicling America), if anyone has free machines and interest.
I like it, though I've not used it in a garden yet, just in the areas I mulch and don't want plants growing.
It's the legs I can't quite get over 😂
It would be interesting to start with an existing book's text and index and see how well NER and topic modeling reproduced the index and what the differences between the original and the algorithmic index were...
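A rough sketch of what that comparison might look like, assuming you already have both indexes as plain lists of headings. The names `normalize` and `compare_indexes` are made up for illustration, and exact matching after lowercasing is a simplifying assumption; a real comparison would probably need fuzzier matching for inverted names and cross-references.

```python
def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivially different headings match."""
    return " ".join(term.lower().split())


def compare_indexes(original, generated):
    """Compare two collections of index headings by exact match after normalization.

    `original` is the book's own index, `generated` is whatever the
    NER / topic-modeling step produced (both hypothetical inputs here).
    """
    orig = {normalize(t) for t in original}
    gen = {normalize(t) for t in generated}
    shared = orig & gen
    return {
        "shared": sorted(shared),
        "only_original": sorted(orig - gen),    # headings the algorithm missed
        "only_generated": sorted(gen - orig),   # headings the indexer never used
        # Share of generated headings that also appear in the original index.
        "precision": len(shared) / len(gen) if gen else 0.0,
        # Share of original headings that the algorithm recovered.
        "recall": len(shared) / len(orig) if orig else 0.0,
    }
```

The `only_original` and `only_generated` lists are probably the interesting part, since they show where a human indexer and the algorithm disagree about what is worth an entry.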
I am totally failing at logging into my Folger account tonight and too tired to sort out why, but if you are a Folger reader you should be able to get access to some of SPO through their digital resources.