Naked Fear, Loathing, Pride, Prejudice, and Brunch at Tiffany's (in Las Vegas). #34

Open
hornc opened this issue Nov 4, 2020 · 0 comments

hornc commented Nov 4, 2020

This is going to be a continuation of my ideas from last year in NaNoGenMo/2019#65.

The idea: treat vocabularies as numbering systems, and the works composed from them as large numbers that can be manipulated.
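To make the idea concrete, here is a minimal sketch (not the code from the 2019 project) that reads a token sequence as a number whose base is the vocabulary size. It assumes a fixed vocabulary list and uses bijective base-k numeration, where digits run from 1 to k with no zero, so every token sequence maps to exactly one integer and every non-negative integer decodes back to some text. The names `text_to_int` and `int_to_text` are hypothetical.

```python
def text_to_int(tokens: list[str], vocab: list[str]) -> int:
    # Bijective base-k: each token's digit is its 1-based index in vocab,
    # so there is no zero digit and no two sequences share a value.
    k = len(vocab)
    digit = {word: i + 1 for i, word in enumerate(vocab)}
    n = 0
    for tok in tokens:
        n = n * k + digit[tok]
    return n


def int_to_text(n: int, vocab: list[str]) -> list[str]:
    # Peel off digits least significant first; a remainder of 0 stands for digit k.
    k = len(vocab)
    tokens = []
    while n > 0:
        d = n % k or k
        tokens.append(vocab[d - 1])
        n = (n - d) // k
    tokens.reverse()
    return tokens


vocab = ["a", "b", "c"]
n = text_to_int(["b", "a", "c"], vocab)
print(n)                       # 24
print(int_to_text(n, vocab))   # ['b', 'a', 'c']
```

Here "b a c" has digits 2, 1, 3, so it encodes to (2 × 3 + 1) × 3 + 3 = 24. Because there is no zero digit, any integer produced by an arithmetic operation still decodes to valid text.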

Following some very good advice last year, I switched focus towards the end of the month to making sure I actually had 50k words in a readable format, rather than bug-free code that was pure and true to a half-baked concept that only I was judging. It was a good exercise in project management: focus on the results that matter.

I was happy enough with the results last year. Some of the bugs and issues with the tokenisation of the source material seemed to make the output more interesting, and my attempts to fix them resulted (if I remember correctly) in less interesting output, so I embraced the glitches and accomplished the goal of producing a generated novel using a simple arithmetic operation on a text.

This round I want to:

  • Generalise the tokenisation to be robust against many kinds of input (I'll be using a mix of properly edited text and some OCR'd source content)
  • Work on formalising the tokenisation algorithm so it is repeatable / comprehensible
  • Overcome the challenge of converting a text of more than 100K words, like Pride and Prejudice, into an integer. With the current code this requires more than 4 GB of RAM
  • Build a shared vocabulary across more than one source work (four in total) and do some more interesting averaging or combinations.
  • Figure out if there is a conceptually pure way to make the text output interesting, or whether the output will really be as interesting as reading a large integer.
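The shared-vocabulary and averaging goals could be sketched as follows, under loudly stated assumptions: `tokenize` is a hypothetical stand-in (lowercased words plus single punctuation marks), the vocabulary is the sorted union of all source works, and "averaging" is taken to mean the integer mean of the encoded works. The encoder swaps the one-digit-at-a-time loop for a divide-and-conquer split based on value(left + right) = value(left) · k^len(right) + value(right), which mostly multiplies similarly sized numbers and is faster for book-length texts. It is not claimed to fix the 4 GB memory use, whose cause isn't described here.

```python
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical tokeniser: lowercase words (with apostrophes) and
    # single non-space punctuation characters.
    return re.findall(r"[a-z']+|[^\sa-z']", text.lower())


def build_vocab(*works: list[str]) -> list[str]:
    # Sorted union of every work's tokens, so digit assignment is repeatable.
    return sorted(set().union(*works))


def encode(tokens: list[str], vocab: list[str]) -> int:
    # Bijective base-k value of the token sequence (digits 1..k).
    k = len(vocab)
    digit = {word: i + 1 for i, word in enumerate(vocab)}
    digits = [digit[t] for t in tokens]

    def value(lo: int, hi: int) -> int:
        # Short runs: plain Horner evaluation.
        if hi - lo <= 64:
            n = 0
            for d in digits[lo:hi]:
                n = n * k + d
            return n
        # Long runs: value(left + right) = value(left) * k**len(right) + value(right)
        mid = (lo + hi) // 2
        return value(lo, mid) * k ** (hi - mid) + value(mid, hi)

    return value(0, len(digits))


def decode(n: int, vocab: list[str]) -> list[str]:
    # Inverse of encode: peel off bijective base-k digits, least significant first.
    k = len(vocab)
    tokens = []
    while n > 0:
        d = n % k or k
        tokens.append(vocab[d - 1])
        n = (n - d) // k
    tokens.reverse()
    return tokens


def average_works(*works: list[str]) -> list[str]:
    # Encode every work over one shared vocabulary, take the integer mean, decode.
    vocab = build_vocab(*works)
    total = sum(encode(w, vocab) for w in works)
    return decode(total // len(works), vocab)


a = tokenize("It is a truth universally acknowledged.")
b = tokenize("We were somewhere around Barstow.")
print(" ".join(average_works(a, b)))
```

Since the encoded value grows with text length, the average of works of very different lengths is dominated by the longest one; any other arithmetic combination can be substituted at the `total // len(works)` step. Decoding is left as the simple loop, which is still quadratic for very large integers.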
hornc changed the title from "Naked Fear, Loathing, Pride, Predjudice, and Brunch at Tiffany's (in Las Vegas)." to "Naked Fear, Loathing, Pride, Prejudice, and Brunch at Tiffany's (in Las Vegas)." on Nov 4, 2020