find software to convert scanned docs -> searchable text #5
This relatively new software service from the non-profit Open Media Foundation helps small governments publish agendas as searchable text. I'll add that our open-source Councilmatic could be enhanced to take PDFs of agendas and publish them in structured formats. (The PDFs currently come mostly from Legistar, the government software vendor that is the primary publisher of city data.)
Here's one option I've tried out: Output for the San Leandro agenda for 7/17/2017 at 5:30pm (link here):
Some PDFs yield extracted text that contains newline characters, but others, such as this example, return a single string without any real structure. I don't know how easy this would be for the text wranglers to parse.
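For the flat-string case, one rough approach is to split the text on agenda item numbers and emit each item as a structured record (JSON here). This is only a sketch under assumptions: it assumes items are introduced by markers like `1.`, `2.A.` or `10.B.` preceded by whitespace, and the names `ITEM_MARKER` and `split_agenda` are hypothetical, not part of Councilmatic or Legistar. Real agendas will need their own patterns, and a sentence that happens to end in a number followed by a period would be misread as a new item.

```python
import json
import re

# Hypothetical marker pattern (an assumption, not a Legistar convention):
# an item number such as "1.", "3.A." or "10.B." that starts at the beginning
# of the text or after whitespace, and is itself followed by whitespace.
ITEM_MARKER = re.compile(r"(?<!\S)(\d{1,2}\.(?:[A-Z]\.)?)\s+")


def split_agenda(flat_text: str) -> list[dict]:
    """Split one unstructured agenda string into numbered items."""
    # Normalize every run of whitespace (including stray newlines) to one space,
    # so text with and without newlines is handled the same way.
    text = re.sub(r"\s+", " ", flat_text).strip()
    matches = list(ITEM_MARKER.finditer(text))
    items = []
    for i, match in enumerate(matches):
        # Each item runs from the end of its marker to the start of the next one.
        # Any preamble before the first marker (e.g. the agenda title) is skipped.
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        items.append({"number": match.group(1), "text": text[start:end].strip()})
    return items


if __name__ == "__main__":
    sample = (
        "CITY COUNCIL AGENDA 1. CALL TO ORDER 2. ROLL CALL "
        "3.A. Minutes of July 3, 2017 3.B. Budget update "
        "4. Approve $1.5 million contract"
    )
    print(json.dumps(split_agenda(sample), indent=2))
```

The lookbehind keeps numbers embedded in other tokens (such as `$1.5` or the `17` in `2017`) from being treated as item markers, which is the main source of false splits in flat agenda text.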
Are there open-source tools to convert the text in a scanned letter or agenda into searchable text?