RESOURCES! #11
A submission from @scottmadin: Python Markov chains: https://pypi.python.org/pypi/PyMarkovChain/ Also, similar things in NodeJS: https://npmjs.org/package/archive.org
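For anyone new to the technique: a word-level Markov chain records which words follow each run of `order` words in a source text, then random-walks those records to produce new text. Here is a minimal standard-library sketch of that idea. It is not PyMarkovChain's API, and `build_chain` and `generate` are hypothetical names.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` consecutive words to every word that follows it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        # Duplicates are kept on purpose: frequent successors get picked more often.
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Random-walk the chain, stopping early at a state with no recorded successor."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    output = list(state)
    order = len(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            break
        output.append(rng.choice(followers))
        state = tuple(output[-order:])
    return " ".join(output)


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the dog sat on the cat"
    print(generate(build_chain(corpus, order=1), length=12, seed=42))
```

Raising `order` to 2 or 3 makes the output more coherent but closer to verbatim copying of the source, which is the usual trade-off with n-gram generators like the Ngram-based one below.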
I wrote a "Samsa bot" that uses Bing's Ngram database to generate text. You might find it and the associated libraries useful (all Ruby). https://github.com/willf/microsoft_ngram/blob/master/examples/samsabot.rb General library:
Since @willf is too humble to plug it, Wordnik is an indispensable resource for all things text-related: definitions, parts of speech, random words, rhymes, hypernyms, etc.:
Here's a dump of my notes about generating stories: @rfreebern researched this problem a few years back for this game project of his:
Fairy tales are really well-explored variants of the standard storytelling archetypes described by people like Joseph Campbell. There are a couple of systems for classifying fairy tales by their plot outlines (although not by their cultural or moral implications): Aarne-Thompson and Propp. http://en.wikipedia.org/wiki/Aarne-Thompson_classification_system
Propp's classification system has been used as the basis for a number of generators and is still the most-used mechanism in the academic literature for such things: http://en.wikipedia.org/wiki/Vladimir_Propp
Propp generators are things like: http://www.fdi.ucm.es/profesor/fpeinado/projects/kiids/apps/protopropp/ Clicking through to their later Bard system shows examples at the bottom, and that whole KIIDS thing is about interactive narrative and computational narratology, which are the academic terms for this sort of thing. (I call my work in this area automated storytelling with post-hoc computational narratives, as my use and implementation aren't for interaction.)
Mark Finlayson's work out of MIT is a little more recent: http://www.mit.edu/~markaf/research.html Plugging any of that research into Google Scholar and looking at recent citations of those papers is a good way to catch up.
The massively multiplayer video game Star Wars Galaxies tried something along these lines with its Dynamic Points of Interest, but they weren't well executed from either a design or a technical implementation perspective. They had a lot of potential, but Raph Koster describes their problems here: http://www.raphkoster.com/2010/04/30/dynamic-pois/
Outside of fairy tales, there are works like Plotto, which provide narrative guides to plot generation, and the monomyth-related works by Campbell, etc.: http://www.brainpickings.org/index.php/2012/01/06/plotto/ Plotto is actually in the public domain and can be found in the Internet Archive here: https://archive.org/details/plottonewmethodo00cook
And journalism is getting into it, too. A program at Northwestern that took sports stats and turned them into sports articles worked out so well that the group didn't publish much research at all and went right into a startup. The Wired article is here: http://www.wired.com/gadgetlab/2012/04/can-an-algorithm-write-a-better-news-story-than-a-human-reporter/all/1 The one paper I found by the Northwestern group cites one major paper from 1977 about "Tale-spin." You can look for citations of the Tale-spin article, which brings up some interesting recent work from elsewhere: http://scholar.google.com/scholar?cites=8316499405683938909&as_sdt=5,44&sciodt=0,44&hl=en
Finally, there's this failed Kickstarter: http://www.kickstarter.com/projects/storybricks/storybricks-the-mmorpg-storytelling-toolset Even more finally, I also found this PDF in a second set of notes: https://research.cc.gatech.edu/inc/content/sequential-recommendation-approach-interactive-personalized-story-generation
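The core structural claim behind Propp generators is that the "functions" a tale uses always appear in a fixed order, even though any given tale uses only some of them. Below is a minimal sketch of that idea. It is not how ProtoPropp or Bard work. The function list is an abridged subset of Propp's 31, and the one-line `TEMPLATES`, the `REQUIRED` set and the `DEPENDS_ON` table are all hypothetical assumptions for illustration.

```python
import random

# An abridged subset of Propp's 31 functions, kept in his canonical order.
PROPP_FUNCTIONS = [
    "absentation", "interdiction", "violation", "villainy", "departure",
    "testing_by_donor", "receipt_of_magical_agent", "struggle", "victory",
    "return", "recognition", "punishment", "wedding",
]

# Hypothetical one-line realizations; a real generator would vary these heavily.
TEMPLATES = {
    "absentation": "The king had to ride away to a distant war.",
    "interdiction": "The king's daughter was warned never to open the tower door.",
    "violation": "She opened it the very next night.",
    "villainy": "A dragon seized the princess and flew off to the mountains.",
    "departure": "A young shepherd set out to find her.",
    "testing_by_donor": "On the road, an old woman asked him for bread.",
    "receipt_of_magical_agent": "In return she gave him a sword that sang.",
    "struggle": "He found the dragon and fought it for three days.",
    "victory": "On the third day the dragon fell.",
    "return": "He carried the princess home.",
    "recognition": "The king knew him by the singing sword.",
    "punishment": "The false heroes who had claimed the reward were banished.",
    "wedding": "The shepherd and the princess were married.",
}

# Assumptions for this sketch: these functions always appear, and some only
# make sense if an earlier function was also chosen.
REQUIRED = {"villainy", "departure", "struggle", "victory", "return"}
DEPENDS_ON = {
    "violation": "interdiction",
    "receipt_of_magical_agent": "testing_by_donor",
    "recognition": "receipt_of_magical_agent",
}


def choose_functions(seed=None, keep_prob=0.5):
    """Pick a subset of functions while preserving Propp's order."""
    rng = random.Random(seed)
    chosen = []
    for f in PROPP_FUNCTIONS:  # walking the ordered list keeps the sequence intact
        wanted = f in REQUIRED or rng.random() < keep_prob
        prerequisite = DEPENDS_ON.get(f)
        if wanted and (prerequisite is None or prerequisite in chosen):
            chosen.append(f)
    return chosen


def tell_tale(seed=None):
    return " ".join(TEMPLATES[f] for f in choose_functions(seed))


if __name__ == "__main__":
    print(tell_tale(seed=11))
```

Because selection only ever filters the ordered list, no output can put, say, the wedding before the villainy, which is the property Propp's morphology guarantees.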
Thanks, @vitorio! That looks helpful.
(OK, I made a GitHub account.)
SC Chen's Simple HTML DOM Parser for PHP.
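Simple HTML DOM Parser is PHP. For anyone scraping in Python, the standard library's `html.parser` can handle simple jobs without extra dependencies. This is a minimal sketch that swaps in that module rather than SC Chen's library. `ParagraphExtractor` and `extract_paragraphs` are hypothetical names, and the parser only collects text from `<p>` elements.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the whitespace-normalized text of every <p> element."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as &
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def _flush(self):
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self._in_p:  # HTML lets a new <p> implicitly close the previous one
                self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)

    def close(self):
        super().close()
        if self._in_p:  # flush a trailing <p> that was never closed
            self._flush()
            self._in_p = False


def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

Fetching the page itself could be done with `urllib.request.urlopen(url).read().decode("utf-8")` before passing it to `extract_paragraphs`. For anything messier than this, a full DOM library is the better choice.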
I'm hanging out in #nanogenmo on FreeNode if anyone wants to join. We can toss ideas around on a casual basis there.
For those who aren't super IRC-literate, or just don't want to install an IRC client, you can go here, pick a username, and visit #nanogenmo from your web browser:
The Bard project looks awesome. Thanks @vitorio!
Some Python resources:
An article about a generator of recursive fairy tales in Haskell (in Russian): http://habrahabr.ru/post/136007/ Google Translate: http://translate.google.com/translate?hl=en&sl=ru&tl=en&u=http%3A%2F%2Fhabrahabr.ru%2Fpost%2F136007%2F
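One way to make a tale "recursive" is a frame story: a character in the tale tells a tale of their own, to some bounded depth. The following is a minimal sketch of that general idea only. It is not the Haskell program from the article, whose approach isn't reproduced here. The names in `TELLERS` and `PLACES` are hypothetical, and so are the parameters `max_depth` and `nest_prob`.

```python
import random

# Hypothetical stock characters and settings.
TELLERS = ["Ivan the miller", "Marya the weaver", "Grey Wolf", "the old soldier Pyotr"]
PLACES = ["in a dark forest", "beside the grey sea", "under the mountain"]


def tell(rng, depth=0, max_depth=3, nest_prob=0.7):
    """Return the lines of a tale that may contain a tale told by its own character."""
    indent = "  " * depth
    teller = rng.choice(TELLERS)
    lines = [f"{indent}Once, {rng.choice(PLACES)}, there lived {teller}."]
    if depth < max_depth and rng.random() < nest_prob:
        lines.append(f"{indent}One evening, {teller} began a story:")
        lines.extend(tell(rng, depth + 1, max_depth, nest_prob))  # the nested tale
    lines.append(f"{indent}And so ended the tale of {teller}.")
    return lines


def recursive_tale(seed=None, max_depth=3):
    return "\n".join(tell(random.Random(seed), max_depth=max_depth))


if __name__ == "__main__":
    print(recursive_tale(seed=7))
```

The depth bound guarantees termination. Without it, `nest_prob` alone would still end the recursion with probability 1, but with no upper limit on length.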
Not strictly related, but there are several story-based/narrative-focused roleplaying games that could be used/formalised into a system for generating overall plot structures. I'm currently looking at Microscope, Fiasco and FATE Core as potential systems for having characters 'play' through a game and recording what they do and what actions they take to generate stories.
Here's some of my Python code for generating sentences based on supplied text. None of the Twitter-related code has been tested with v1.1 of the Twitter API, but it worked fine on v1.
The Dada Engine, which powers the infamous Postmodernism Generator, might come in handy. There's an online manual and a clone on GitHub.
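The Dada Engine generates text by recursively expanding grammar rules. Here is a minimal Python sketch of that general technique, random context-free grammar expansion. It does not use the Dada Engine's own rule syntax. The toy `GRAMMAR`, and the convention that each rule's first alternative never recurses, are assumptions made for illustration.

```python
import random

# Hypothetical toy grammar. Convention for this sketch: the first alternative of
# every rule never recurses, so it is a safe fallback once max_depth is reached.
GRAMMAR = {
    "SENTENCE": [
        ["The", "NOUN", "of", "THEORIST", "is", "ADJ", "."],
        ["THEORIST", "argues that", "NOUN", "is", "ADJ", "."],
    ],
    "NOUN": [["discourse"], ["narrative"], ["ADJ", "NOUN"]],
    "ADJ": [["textual"], ["postcapitalist"], ["dialectic"]],
    "THEORIST": [["Foucault"], ["Derrida"], ["Lyotard"]],
}


def expand(symbol, rng, depth=0, max_depth=8):
    """Recursively replace nonterminals with a randomly chosen alternative."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: emit as-is
    alternatives = GRAMMAR[symbol]
    choice = alternatives[0] if depth >= max_depth else rng.choice(alternatives)
    words = []
    for part in choice:
        words.extend(expand(part, rng, depth + 1, max_depth))
    return words


def generate(seed=None, start="SENTENCE"):
    text = " ".join(expand(start, random.Random(seed)))
    return text.replace(" .", ".")  # attach the full stop to the last word


if __name__ == "__main__":
    for s in range(3):
        print(generate(seed=s))
```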
Not a resource, but a suggestion: when you complete a novel, change the title of your issue to "$NovelTitle by $Author", so that we can easily browse them. (Yeah, someone is now going to actually title their novel "$NovelTitle".) If I were an over-organizational nerd, I would suggest setting up appropriate issue tags ("In Progress", "Complete", "Stupid Ideas", etc.). But I leave that up to whether Darius is an over-organizational nerd.
I agree with you @erkyrath -- I'll try and prod people to do that when they're done. Issue tags... I might start labeling things myself!
Okay, I opened a new issue (#42) for general discussion. This thread remains the place for technical resources; the other thread is open to everything else.
Ficly ( http://ficly.com/stories and its predecessor Ficlets http://ficlets.ficly.com/ ) is a very-short-story writing community with a 1024-character limit. There are lots of tiny stories on the site, but you can also fork any story and write prequels and sequels to it. Some stories have multiple prequels and sequels, like an unintentional choose-your-own-adventure. All of the Ficly and Ficlets content is licensed CC-BY-SA. In late May 2013, I scraped all of Ficly and dumped 13,144 stories, all of which had at least one prequel or sequel, into a matching number of JSON files (there should be no standalone 1k-character stories). Each JSON file records the ID, URL and title of the story; the author's avatar, name and URL; the IDs and URLs of prequels and sequels; and the story content in Markdown. The scraper (in Python) is probably a little prickly, as it's mostly uncommented, but the .zip of 13k JSON files could be dumped straight into a JSON document store and worked with directly. Perhaps someone wants to generate 50k words of choose-your-own-adventure stories or something.
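Because each file lists its sequels, the dump can be treated as a graph and walked to chain stories together. This is a minimal sketch that follows randomly chosen sequels from a starting story. The JSON key names `id`, `sequel_ids` and `content` are guesses, not the dump's confirmed schema, so check an actual file and adjust them.

```python
import json
import random
from pathlib import Path


def load_stories(directory):
    """Read every *.json file into a dict keyed by story ID (key names assumed)."""
    stories = {}
    for path in Path(directory).glob("*.json"):
        with path.open(encoding="utf-8") as f:
            story = json.load(f)
        stories[story["id"]] = story
    return stories


def random_path(stories, start_id, rng, max_steps=50):
    """Follow randomly chosen, not-yet-visited sequels until none remain."""
    path = [start_id]
    seen = {start_id}
    current = start_id
    while len(path) < max_steps:
        sequels = [s for s in stories[current].get("sequel_ids", [])
                   if s in stories and s not in seen]
        if not sequels:
            break
        current = rng.choice(sequels)
        path.append(current)
        seen.add(current)
    return path


def assemble(stories, path):
    """Join the Markdown bodies of a path into one continuous text."""
    return "\n\n".join(stories[i]["content"] for i in path)
```

Repeating `random_path` from many starting points and concatenating the results until the word count passes 50,000 would be one way to reach the NaNoGenMo target. The `seen` set prevents loops when sequel links form a cycle.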
I've done some basic gathering of info over a few sources to generate a bunch of sentence structures using part-of-speech tagging while I've been researching. Others might find this useful, so you can find them here: https://github.com/darkliquid/NaNoGenMo/tree/master/data The data is basically one sentence to a line, each line containing a stream of space-separated part-of-speech tags. There are likely to be mistakes in the set, as I've hacked this together without any real understanding of what it is I'm doing or what I yet hope to achieve from it, but have at it and good luck!
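One way to use a file in that format is to pick a structure and replace each tag with a word of that part of speech. This minimal sketch assumes Penn Treebank tags. Its tiny `LEXICON` is hypothetical, and structures using tags it has no words for are simply skipped. A real version would need a much larger word list per tag.

```python
import random

# Hypothetical miniature lexicon keyed by Penn Treebank tags.
LEXICON = {
    "DT": ["the", "a"],
    "JJ": ["quiet", "electric", "forgotten"],
    "NN": ["novel", "robot", "garden"],
    "NNS": ["words", "machines"],
    "VBD": ["wrote", "dreamed", "devoured"],
    "IN": ["in", "beneath"],
    ".": ["."],
}


def parse_structures(text):
    """One sentence structure per line, tags separated by spaces; blank lines skipped."""
    return [line.split() for line in text.splitlines() if line.strip()]


def realize(tags, lexicon, rng):
    """Replace each tag with a random word of that part of speech."""
    sentence = " ".join(rng.choice(lexicon[tag]) for tag in tags)
    sentence = sentence.replace(" .", ".")
    return sentence[0].upper() + sentence[1:]


def fill_structures(text, lexicon=LEXICON, n=3, seed=None):
    rng = random.Random(seed)
    # Skip structures that use tags the lexicon has no words for.
    usable = [tags for tags in parse_structures(text) if all(t in lexicon for t in tags)]
    if not usable:
        return []
    return [realize(rng.choice(usable), lexicon, rng) for _ in range(n)]


if __name__ == "__main__":
    data = "DT JJ NN VBD IN DT NN .\nPRP VBZ .\nDT NNS VBD .\n"
    for line in fill_structures(data, n=3, seed=2):
        print(line)
```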
To be clear, @darkliquid's output can be interpreted by looking at this list of part-of-speech tags.
This might be inspiring for some folks: http://en.wikipedia.org/wiki/Postmodern_literature#Common_themes_and_techniques
It would be very difficult to use it in an automated way (and I realize it may be unpopular with some participants), but if you haven't heard of it, there's this site called TVTropes. It contains a vast array of, well, tropes (from fiction in general, mostly mass media but not exclusively television), pre-deconstructed for your convenience. For example, Applied Phlebotinum.
Speaking of part-of-speech tagging (cc @darkliquid), if you're literate in Objective-C, Apple's NSLinguisticTagger API is fantastic. (http://nshipster.com/nslinguistictagger/)
Wow, that is nice. Sadly it's of no use to me in the Linux world, but that looks like a much richer source of data for the kinds of analysis I'm looking to do. On another note, I've started annotating the part-of-speech tag definitions with example words and some extra rules for their use in sentences where applicable (which hopefully I can then use to scan my sentence structure list and bin structures that are grammatically incorrect). https://github.com/darkliquid/NaNoGenMo/blob/master/data/tag_types.txt
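Those per-tag rules could be applied as a filter over the structure list. Here is a minimal sketch using a few crude heuristics over Penn Treebank tags. The rules are hypothetical illustrations, not the annotations in tag_types.txt, and they will let some ungrammatical structures through while rejecting some valid ones. Note that in the Penn set the `.` tag covers all sentence-final punctuation (`.`, `!`, `?`).

```python
# Verb (and modal) tags from the Penn Treebank set.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "MD"}
PUNCTUATION_TAGS = {".", ",", ":"}


def violations(tags):
    """Return the hypothetical rules a tag sequence breaks (an empty list means keep it)."""
    problems = []
    if not any(t in VERB_TAGS for t in tags):
        problems.append("no verb")
    if tags and tags[-1] != ".":
        problems.append("no sentence-final punctuation")
    if tags and tags[0] in PUNCTUATION_TAGS:
        problems.append("starts with punctuation")
    if any(a == "DT" and b == "DT" for a, b in zip(tags, tags[1:])):
        problems.append("two determiners in a row")
    return problems


def bin_structures(lines):
    """Split structure lines into (kept, binned) lists, ignoring blank lines."""
    kept, binned = [], []
    for line in lines:
        tags = line.split()
        if tags:
            (binned if violations(tags) else kept).append(line)
    return kept, binned
```

Returning the list of broken rules, rather than a plain yes/no, makes it easier to see which heuristic is too strict when a good structure gets binned.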
WordNet can be coaxed into doing part-of-speech tagging (in addition to Tangentially, I have a resource to contribute.
@warnaars Philip M. Parker! I would love to see some of his novelistic output... I'd really love to see some of his code. I've got some more links on him at http://www.xradiograph.com/WordSalad/AutomaticForThePeople
"If the atoms have by chance formed so many sorts of figures, why did it never fall out that they made a house or a shoe? Why at the same rate should we not believe that an infinite number of Greek letters, strewed all over a certain place, might fall into the contexture of the Iliad?"
For that matter, how about a Library of Babel generator? (Not mine) http://dicelog.com/babel
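The constraints in Borges's story are concrete enough to sketch: 25 orthographic symbols, 410 pages per book, 40 lines per page, about 80 characters per line, with five shelves per wall and 32 books per shelf. This minimal sketch seeds a pseudo-random generator from a shelf address, so the same address always yields the same page. Unlike the real Library, it samples pages rather than enumerating every possible book. The choice of 22 letters and the address format are assumptions, and the code is unrelated to the dicelog generator linked above.

```python
import random
import string

# Borges specifies 25 symbols: 22 letters plus space, comma and period. He never
# lists the letters, so taking the first 22 of the Latin alphabet is an arbitrary choice.
ALPHABET = string.ascii_lowercase[:22] + " ,."
LINES_PER_PAGE = 40
CHARS_PER_LINE = 80  # "approximately eighty" in the story
LIMITS = {"wall": 4, "shelf": 5, "volume": 32, "page": 410}


def babel_page(hexagon, wall, shelf, volume, page):
    """Return a pseudo-random page that is identical every time for the same address."""
    for name, value in (("wall", wall), ("shelf", shelf), ("volume", volume), ("page", page)):
        if not 1 <= value <= LIMITS[name]:
            raise ValueError(f"{name} must be between 1 and {LIMITS[name]}")
    # A string seed makes random.Random deterministic across runs (for a given Python version).
    rng = random.Random(f"{hexagon}:{wall}:{shelf}:{volume}:{page}")
    return "\n".join(
        "".join(rng.choice(ALPHABET) for _ in range(CHARS_PER_LINE))
        for _ in range(LINES_PER_PAGE)
    )


if __name__ == "__main__":
    print(babel_page("a1b2", wall=1, shelf=2, volume=3, page=4))
```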
Not open source, but still! The Fiction Idea Generator is interesting: http://figapps.net/fig.html It's free this month (iTunes): https://itunes.apple.com/app/fiction-idea-generator-ef/id507536455?mt=8
Also, you might be interested in the works of Jean-Pierre Balpe.
In one issue here somewhere I obliquely suggested generating a graphic novel -- that is to say, a comic book. While I would love to try, I definitely won't have the time to do this in what remains of November, but here are some resources I found while researching it:
http://openclipart.org is a collection of SVG images, all in the public domain. It can also render them as PNGs for you, at the scale you choose. It has a JSON API: http://openclipart.org/developers If you wanted to use that JSON API on your own web page (perhaps to display these images on an HTML5 canvas element), you could use this generic JSONP proxy to make a mockery of the same-origin policy: http://jsonp.jit.su/
Here is a library of onomatopoeic sound effects: http://www.writtensound.com/index.php I'm not sure how easy it would be to scrape, but it probably wouldn't be hard to pick a random item from a desired category, like: http://www.writtensound.com/index.php?term=movement
Here is a list of catchphrases: https://en.wikipedia.org/wiki/List_of_catchphrases
And, just for that extra dadaist touch and in no way limited to graphic novels, here is a list of various abuses of the statistical meaning of p-value, collected from various academic papers: http://mchankins.wordpress.com/2013/04/21/still-not-significant-2/
I imagine the result of using these resources would look something like:
@catseye check out blotcomics and the graphic novel harsh noise. I can't shake the feeling that the end result of your automation, however, will end up looking like ELER.
If we're going graphical, I should probably mention the billion-year archives of the webcomic mezzacotta: http://www.mezzacotta.net/
You can take a look at the text of my Automated Lovecraft project here: https://github.com/bredfern/automated-lovecraft/blob/master/automated_lovecraft.md
The interesting thing I learned is that more firepower doesn't produce a better result: there's a sweet spot between the size of the data set and the number of layers, so to train on all of Lovecraft's text I got the best results using Torch with just 4 layers. Since I was running off char-rnn, most of the code I wrote was actually just bash scripts to run Torch processes. I want to get deeper into this stuff so I can go further with it, but it's exciting to see the training result, never having done this before.
@bredfern Wrong repo! This is the 2013 one; here's this year's: dariusk/NaNoGenMo-2015#1
This is an open issue where you can comment and add resources that might come in handy for NaNoGenMo.
NOTE: at some point I will turn this into a more organized document, probably on the wiki for this repo.