Tag: ocr
Converting Books to JSON: A Digital Humanities Project
This post describes a recent project in which I scanned PDF issues of the Council of Literary Magazines and Presses (CLMP) Directory of Literary Magazines from 1995 to 2005 and converted those 10 directories into clean, well-structured JSON. The process had several stages, including PDF-to-text conversion, data cleaning, and data extraction using Python scripts…
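The post doesn't show what a directory entry looks like, so here is a minimal sketch of the cleaning and extraction stages under invented assumptions. The sample text, the field labels (`Editor`, `Founded`, `Circulation`), and the helpers `clean_text` and `extract_entries` are all hypothetical and are not the CLMP layout or the author's actual scripts. The PDF-to-text stage is left out because it depends on external OCR tools; the sketch starts from text that stage might produce.

```python
import json
import re

# Hypothetical OCR output. The real CLMP entry layout is not shown in the
# post, so these field labels are illustrative assumptions only.
SAMPLE_OCR_TEXT = """\
THE EXAMPLE REVIEW
Editor:   Jane Doe
Founded: 1987
Circulation: 1,200

ANOTHER QUAR-
TERLY
Editor: John Roe
Founded: 1992
"""

# Assumed set of labeled fields; a real directory would need its own list.
FIELD_RE = re.compile(r"^(Editor|Founded|Circulation):\s*(.+)$")


def clean_text(raw: str) -> str:
    """Cleaning stage: repair common OCR artifacts before extraction."""
    # Rejoin words hyphenated across line breaks ("QUAR-\nTERLY" -> "QUARTERLY").
    # Trade-off: this also joins genuinely hyphenated words that break at a line end.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Normalize curly quotes to plain ASCII quotes.
    for curly, plain in (("\u2018", "'"), ("\u2019", "'"), ("\u201c", '"'), ("\u201d", '"')):
        text = text.replace(curly, plain)
    # Collapse runs of spaces/tabs (newlines are kept as entry structure).
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


def extract_entries(text: str) -> list[dict]:
    """Extraction stage: one entry per blank-line-separated block."""
    entries = []
    for block in re.split(r"\n\s*\n", text):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        entry = {"title": lines[0]}  # assume the first line is the magazine name
        for line in lines[1:]:
            match = FIELD_RE.match(line)
            if match:
                value = match.group(2)
                # Store pure digit strings (e.g. years) as integers.
                entry[match.group(1).lower()] = int(value) if value.isdigit() else value
        entries.append(entry)
    return entries


if __name__ == "__main__":
    print(json.dumps(extract_entries(clean_text(SAMPLE_OCR_TEXT)), indent=2))
```

Keeping cleaning and extraction as separate functions means OCR fixes can be tuned and re-run without touching the field-mapping logic, and the resulting list of dicts serializes directly with `json.dumps`.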