The Quantum Supposition of Oz #137
Comments
Nice technique for handling the punctuation -- it does make it more coherent(-seeming) than a run-of-the-mill Markov chain. It should be possible to clean up the intervening spaces with a postprocessor... I wrote one (here) for my own novel, but admittedly I didn't have quotation marks to deal with.
I ran into a similar issue with punctuation last year and ended up solving it with a postprocessing step. I'm starting to think that it makes sense to have the generator emit marked-up XML or something and then run clean-up on it as a matter of course.
Outputting some kind of tree structure (like XML) and then flattening it (sensibly) is a good approach. On the other hand, this level of punctuation/spacing messiness is nothing a few rewriting rules can't clean up. Given that this seems to be a "problem" that several participants have encountered, I'm working on generalizing the code I wrote into a proper reusable tool of some sort. (nice change to be doing engineering again after all that Here's what it does, so far, on an excerpt from The Quantum Supposition of Oz:
(I love that last line :) I don't know how long I'll spend on perfectionistically engineering this, but I'm hoping to end up with something like BeautifulSoup except for plain text. If I'm happy with it before 11 more months have passed, I'll announce it on next year's Resources issue :)
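For readers wondering what "a few rewriting rules" for spacing cleanup might look like, here is a minimal sketch in Python. It is not the tool described in the comment above: the `RULES` list, the `tidy` function, and the choice of regular expressions are all assumptions made for illustration, and it only handles straight double quotes that open and close on the same line.

```python
import re

# Hypothetical rewriting rules, applied in order. Each entry is a
# (compiled pattern, replacement) pair. None of this comes from the thread.
RULES = [
    # Drop whitespace that precedes closing punctuation: "word ," -> "word,"
    (re.compile(r"[ \t]+([,.;:!?])"), r"\1"),
    # Drop whitespace just inside parentheses: "( word )" -> "(word)"
    (re.compile(r"\([ \t]+"), "("),
    (re.compile(r"[ \t]+\)"), ")"),
    # Pair up straight double quotes left to right on a line and trim
    # the whitespace just inside each pair: '" word "' -> '"word"'
    (re.compile(r'"[ \t]*(.*?)[ \t]*"'), r'"\1"'),
    # Collapse any runs of spaces left behind by the rules above.
    (re.compile(r"[ \t]{2,}"), " "),
]


def tidy(text):
    """Apply every rewriting rule once, in order, and return the result."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(tidy('" Hello , " said Dorothy ( smiling ) .'))
```

The quote rule is the fragile one: an unbalanced quotation mark on a line will pair with the wrong partner, which is presumably why quotation marks are called out as the hard case earlier in the thread.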
I've had good luck in the past treating punctuation as its own token, then On Tue Dec 02 2014 at 6:04:28 AM Chris Pressey notifications@github.com
A different approach to Markov tokenization - I've worked with punctuation before in different ways, but for text blobs, so I never had to worry about the spacing. I appreciated the links to Racter/PBiHC, since I hadn't seen the template details before.
You're welcome. It's surprising there's so little information about Racter out there (and according to Google, I appear to be one of the experts about Racter---sigh). The source to Racter is out there, but what is there appears to be the post-processed output from INRAC, a custom language used to write Racter. It's bizarre (http://boston.conman.org/2008/06/18.2).
That... is actually a pretty nifty control structure. "Find all labels that match this pattern, then pick one of those labels at random and call it."
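The control structure quoted above can be sketched in a few lines of Python. This is only an illustration of the idea, not INRAC syntax or Racter's actual behavior: the `call_matching` name, the use of a dictionary of callables as "labels", and shell-style glob patterns are all assumptions.

```python
import fnmatch
import random


def call_matching(labels, pattern, rng=random):
    """Find all labels matching `pattern`, pick one at random, and call it.

    `labels` maps label names to zero-argument callables (a hypothetical
    stand-in for labelled sections of a program). `pattern` is a
    shell-style glob such as "greet_*".
    """
    matches = sorted(name for name in labels
                     if fnmatch.fnmatchcase(name, pattern))
    if not matches:
        raise LookupError("no label matches %r" % pattern)
    return labels[rng.choice(matches)]()


if __name__ == "__main__":
    labels = {
        "greet_formal": lambda: "Good evening.",
        "greet_casual": lambda: "Hi there!",
        "farewell": lambda: "Goodbye.",
    }
    print(call_matching(labels, "greet_*"))
```

Sorting the matches before choosing keeps the result reproducible when a seeded `random.Random` is passed in as `rng`.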
An order-3 Markov chain based on the Oz novels written by L. Frank Baum (14 novels in total). The only unusual thing here is that I treated punctuation marks, as well as the end-of-paragraph, as "words", so that you don't get a "wall of text" but something that is a bit more readable (even if the punctuation is separated by spaces when it shouldn't be).
The code is on GitHub: https://github.com/spc476/NaNoGenMo-2014 and the sample novel can be read at https://github.com/spc476/NaNoGenMo-2014/blob/master/TheQuantumSuppositionOfOz.txt
And my blog entry that goes into more detail about how it works: http://boston.conman.org/2014/11/29.1
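As a rough illustration of the approach described in this post (an order-3 chain where punctuation marks and the end of a paragraph count as "words"), here is a minimal Python sketch. It is not the code from the linked repository: the `<P>` marker, the tokenizing regular expression, and the function names are assumptions chosen for the example.

```python
import random
import re
from collections import defaultdict

ORDER = 3
PARA = "<P>"  # hypothetical end-of-paragraph token

# A "word" is either a run of letters, digits, apostrophes and hyphens,
# or a single punctuation character standing on its own.
TOKEN_RE = re.compile(r"[A-Za-z0-9'-]+|[^\sA-Za-z0-9]")


def tokenize(text):
    """Split text into words, punctuation tokens, and paragraph markers."""
    tokens = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        tokens.extend(TOKEN_RE.findall(paragraph))
        tokens.append(PARA)
    return tokens


def build_model(tokens, order=ORDER):
    """Map each run of `order` tokens to the list of tokens seen after it."""
    model = defaultdict(list)
    for i in range(len(tokens) - order):
        model[tuple(tokens[i:i + order])].append(tokens[i + order])
    return model


def generate(model, length, rng=random):
    """Walk the chain from a random state, emitting up to `length` tokens."""
    state = rng.choice(sorted(model))
    out = list(state)
    while len(out) < length:
        followers = model.get(tuple(out[-len(state):]))
        if not followers:  # dead end: this state was never followed by anything
            break
        out.append(rng.choice(followers))
    return out


if __name__ == "__main__":
    sample = 'Hello, Toto!\n\n"Run," she said. "Run, Toto!"'
    model = build_model(tokenize(sample))
    print(" ".join(generate(model, 20)))
```

Joining the output with single spaces reproduces the symptom discussed in the comments: every punctuation token ends up surrounded by spaces, so a cleanup pass is still needed afterwards.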