It uses the Project Gutenberg CD of 600 books, containing some 3,583,389 sentences.
It runs through twice: first taking the first matching sentence found in the corpus (counting from zero to fifty-five thousand); second taking the shortest matching sentence (zero to forty-eight thousand).
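As a rough illustration of the two passes described above, here is a minimal Python sketch. It is not the actual gutencounter.py: the function names (`number_to_words`, `find_sentence`, `count_up`), the spelling convention (no "and" after "hundred"), the whole-phrase matching rule, and the choice to stop at the first number with no match are all assumptions made for the example.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out 0 <= n < 1,000,000 (assumed style: no 'and')."""
    if not 0 <= n < 1_000_000:
        raise ValueError("n out of range for this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + number_to_words(rest) if rest else ""
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = " " + number_to_words(rest) if rest else ""
    return number_to_words(thousands) + " thousand" + tail


def find_sentence(sentences, n, shortest=False):
    """Return the first (or shortest) sentence containing n in words."""
    # Naive rule: the phrase must not touch a letter, digit or hyphen,
    # so "one" does not match inside "twenty-one" or "someone".
    pattern = re.compile(
        r"(?<![\w-])" + re.escape(number_to_words(n)) + r"(?![\w-])",
        re.IGNORECASE,
    )
    matches = (s for s in sentences if pattern.search(s))
    if shortest:
        return min(matches, key=len, default=None)
    return next(matches, None)


def count_up(sentences, shortest=False):
    """Yield (n, sentence) from zero upward until a number has no match."""
    n = 0
    while True:
        sentence = find_sentence(sentences, n, shortest)
        if sentence is None:
            break
        yield n, sentence
        n += 1


# Tiny made-up corpus standing in for the Gutenberg sentences.
corpus = [
    "There were zero survivors, and one witness who would not speak.",
    "Zero.",
    "He had one chance.",
    "Two of them left at dawn.",
]

for n, sentence in count_up(corpus, shortest=True):
    print(n, sentence)
```

The matching here is deliberately simple: "one" still matches inside "one hundred men", so a real run would probably need a stricter rule for deciding where a number phrase begins and ends.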
Made something like this:
gutencounter --cache *.txt >> gutencounter-unsorted.md
gutencounter --sort --cache *.txt >> gutencounter-sorted.md
[leave running until have enough words]
cat gutencounter-unsorted.md > gutencounter.md
cat gutencounter-sorted.md >> gutencounter.md
grep "##" gutencounter.md > contents.txt
[hack contents.txt into links]
cat gutencounter.py >> gutencounter.md
wc -w gutencounter.md
[hack front matter and contents into gutencounter.md and <pre></pre> for source]
multimarkdown gutencounter.md > gutencounter.html
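The "hack contents.txt into links" step above was done by hand. Purely as an illustration, it could be scripted along these lines. The helper names and the anchor rule (lowercase the heading, drop spaces and most punctuation) are assumptions, and the anchors should be checked against the ids that multimarkdown actually writes into the HTML.

```python
import re


def heading_to_link(line):
    """Turn a Markdown heading line into a bulleted contents link."""
    title = line.lstrip("#").strip()
    # Assumed anchor rule; it may not match multimarkdown's own ids.
    anchor = re.sub(r"[^a-z0-9:_.-]", "", title.lower())
    return f"* [{title}](#{anchor})"


def build_contents(markdown_text):
    """Collect a link for every line that starts with '##'."""
    return [
        heading_to_link(line)
        for line in markdown_text.splitlines()
        if line.startswith("##")
    ]


# Made-up headings for the example.
sample = "## First sentences\nSome text.\n## Shortest sentences\nMore text."
print("\n".join(build_contents(sample)))
```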
I would like to see the numerical sentences closer together, without chapter headings.
Perhaps the numbers could be in bold?
For me, it's too broken up. The layout keeps each sentence in its original isolation. Pushing them together would let us see them as a sequence, as your algorithm suggests.
Almost forgot about this. Code and output were created before the deadline; the PDF was knocked up and everything uploaded afterwards.
What happens if we want to find each sequential number, in words, in a big corpus?
This is what happens.
The HTML was then printed to PDF using Chrome. Big thanks to @moonmilk for the CSS.
Source: https://github.com/hugovk/gutengrep/blob/gh-pages/gutencounter.py