"Where I'm From" poem & novel generator #49

Open
marythought opened this issue Oct 29, 2015 · 15 comments

@marythought

Some considerations:
  • I'll be coding in JavaScript (not Ruby)
  • I'd like to try using the Goodreads API / Google Books API (or something similar) in some way
  • Use text from Gutenberg or scrape from the internet? I've yet to try scraping, so that could be interesting (plus Gutenberg has a pretty strict anti-robot policy, so texts would need to be downloaded)
  • My husband's idea: find an appropriate sci-fi novel and replace all instances of "snake people" with "millennials" (I am not making this, but somebody should)

@dariusk
Owner

dariusk commented Oct 29, 2015

he's such a card

@hugovk
Collaborator

hugovk commented Oct 29, 2015

You can download CDs and DVDs of Project Gutenberg books here:
https://www.gutenberg.org/wiki/Gutenberg:The_CD_and_DVD_Project

I didn't know there was a Google Books API; I'll have to check it out.

@marythought
Author

DAY ONE

In my teaching years, this poem was everywhere:

Where I'm From
(George Ella Lyon)

I am from clothespins,
from Clorox and carbon-tetrachloride.
I am from the dirt under the back porch.
(Black, glistening,
it tasted like beets.)
I am from the forsythia bush
the Dutch elm
whose long-gone limbs I remember
as if they were my own.

I'm from fudge and eyeglasses,
from Imogene and Alafair.
I'm from the know-it-alls
and the pass-it-ons,
from Perk up! and Pipe down!
I'm from He restoreth my soul
with a cottonball lamb
and ten verses I can say myself.

I'm from Artemus and Billie's Branch,
fried corn and strong coffee.
From the finger my grandfather lost
to the auger,
the eye my father shut to keep his sight.

Under my bed was a dress box
spilling old pictures,
a sift of lost faces
to drift beneath my dreams.
I am from those moments--
snapped before I budded --
leaf-fall from the family tree.

For my first trick, I'll be working on a poem generator (I know I know, we're building a novel, stay tuned ok) to identify the parts of speech at work here and generate new "I'm From" poems that mimic parts of speech and important sound patterns. This should be good practice in working with natural language processors in order to generate poem-length memoir-esque bits of text -- which I can then use as the base for further novel expansions.

marythought changed the title from "I'm in!" to "Where I'm From" poem & novel generator on Nov 2, 2015
@marythought
Author

Not a bad start! I got RiTa loaded and working, so that's a huge step in the right direction. Next I think I need to find some word banks / corpora for specific parts of the poem (example: nature words). RiTa's proper nouns are kind of cringe-y, but I'll run it more times and see if I need to substitute something else there. FYI for anyone getting started with RiTa, here's a list of the part-of-speech abbreviations:

[Image: screen shot 2015-11-01 at 6 52 34 pm]
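Since the screenshot doesn't survive as text, here is a small lookup of common tags as a stand-in. It assumes Penn Treebank-style abbreviations written in lowercase, which is the tag set RiTa reports; the selection is partial, and the helper name `describeTags` and the hand-tagged example are made up for illustration.

```javascript
// A partial, hand-typed table of Penn Treebank-style tags (lowercase).
// This is not the full list from the screenshot.
const posTags = new Map([
  ["nn", "noun, singular or mass"],
  ["nns", "noun, plural"],
  ["nnp", "proper noun, singular"],
  ["jj", "adjective"],
  ["vb", "verb, base form"],
  ["vbd", "verb, past tense"],
  ["vbg", "verb, gerund or present participle"],
  ["vbp", "verb, non-3rd person singular present"],
  ["rb", "adverb"],
  ["in", "preposition or subordinating conjunction"],
  ["dt", "determiner"],
  ["prp", "personal pronoun"],
]);

// Hypothetical helper: turn a list of tags into readable descriptions.
function describeTags(tags) {
  return tags.map((tag) => posTags.get(tag.toLowerCase()) || "unknown");
}

// "I am from clothespins", tagged by hand for this example.
console.log(describeTags(["prp", "vbp", "in", "nns"]));
```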

@dariusk
Owner

dariusk commented Nov 2, 2015 via email

@marythought
Author

DAY TWO

I spent a few hours this evening working on linking up random choices from custom word lists. I forked Darius's corpora repo linked in the NaNoGenMo resources and also found some good word lists on the internet for what I am looking for. Fun fact: as a middle school English teacher, I loved word lists, or "word pools" we would sometimes call them. The walls of my classroom were plastered with posters of color words, verbs, adjectives, sensory words, etc. (until mandatory testing took over the entire Spring and they had to be covered up).
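For anyone following along, "linking up random choices from custom word lists" can be sketched roughly like this. The list names, their contents, and the `pick` / `openingLines` helpers are invented for illustration and are not taken from the repo; the optional `rng` argument is only there so the output can be made repeatable.

```javascript
// Hypothetical word lists; the real project keeps its lists in separate files.
const wordLists = {
  objects: ["clothespins", "nightclubs", "birthdays"],
  brands: ["Clorox", "Big Mac"],
  abstractions: ["fundamentalism", "collectivization"],
};

// Pick one entry from a list. `rng` returns a number in [0, 1), like Math.random.
function pick(list, rng = Math.random) {
  return list[Math.floor(rng() * list.length)];
}

// Fill the first two lines of the poem's pattern with random picks.
function openingLines(lists, rng = Math.random) {
  return [
    `I am from ${pick(lists.objects, rng)},`,
    `from ${pick(lists.brands, rng)} and ${pick(lists.abstractions, rng)}.`,
  ].join("\n");
}

console.log(openingLines(wordLists));
```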

Sticking with the corpora format, the word lists are in JSON. JavaScript isn't my first programming language, so I had to google "how do I link a local JSON file to my javascript" and Y'ALL this should be a lot easier, doncha think? I did not want to involve html files or ajax requests (eeek) or jQuery (no!), at least not yet, so I cheated by just making my word list files .js files and then requiring them.

Like this: Fear the Repo

Shush, You, I'll DRY it up later.
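For what it's worth, here is a sketch of two other ways to get at a local JSON file when running under Node (not in the browser): `require()` will parse a `.json` file directly, and `fs.readFileSync` plus `JSON.parse` does the same by hand. The temp directory, file name, and word list are made up so the example is self-contained; this is not how the repo does it.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Write a tiny stand-in word list so the example can run on its own.
// The file name and contents are invented for illustration.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "wordlists-"));
const file = path.join(dir, "trees.json");
fs.writeFileSync(file, JSON.stringify({ trees: ["cottonwood", "birch", "yew"] }));

// Option 1: Node's require() parses .json files directly.
const viaRequire = require(file);

// Option 2: read the text and parse it yourself.
const viaReadFile = JSON.parse(fs.readFileSync(file, "utf8"));

console.log(viaRequire.trees, viaReadFile.trees);
```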

I'm pretty happy with how it's shaping up. I love using RiTa to control syllable length.

[Image: screen shot 2015-11-02 at 6 21 49 pm]
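To give a feel for filtering a word list by syllable count without pulling in a library, here is a rough stand-in. It is a naive vowel-group heuristic, not RiTa's syllabifier, and it will miscount plenty of words; the function names and the tree list are made up for the example.

```javascript
// Rough syllable estimate: count groups of vowels, ignoring a trailing
// silent "e". A heuristic only; it is not how RiTa counts syllables.
function roughSyllables(word) {
  const w = word.toLowerCase().replace(/[^a-z]/g, "");
  const silentE = w.length > 2 && w.endsWith("e") && !w.endsWith("le");
  const trimmed = silentE ? w.slice(0, -1) : w;
  const groups = trimmed.match(/[aeiouy]+/g);
  return groups ? groups.length : 1;
}

// Keep only the words whose estimated syllable count equals n.
function withSyllables(words, n) {
  return words.filter((word) => roughSyllables(word) === n);
}

const trees = ["birch", "tulip", "spruce", "cottonwood", "yew"];
console.log(withSyllables(trees, 1));
```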

As a reminder, the source poem is here.

I'm hoping to finish assembling the poem tomorrow, then I can figure out where I want to take it from there.

hugovk added the preview label on Nov 3, 2015
@marythought
Author

DAY THIRD -- oh it is very late make that DAY FORTH

Just checking in with some sample output. I wasn't happy with the trees and bushes lists available to me, so I'm just inventing some instead. :D Done through second stanza, two to go!

The names list is 1,000 randomly generated names from a list of random names -- if anyone wants to be added, I'm happy to add you! (P.S. the repo is here)

I am from nightclubs,
from Mart and fundamentalism.
I am from the aisle under the common room.
(Navy blue, feminine,
it smelled like cranberry.)
I am from the tulip spruce
the yellow corkbark birch
whose diverse caps I remember
as if they were my own.

I'm from parsnip and statistics,
from Wendie and Marcelle,
I'm from the slam poets
and the smart-alecks,
from 'well done' and 'what'!
I'm from 'He was born with a gift of laughter'
with a corrosive porcupine
and four I can say myself.

@marythought
Author

DAY FOUR (FOR REAL)

We have a completed poem!

Where I'm From

I am from birthdays,
from Big Mac and collectivization.
I am from the trot under the storm cellar.
(Cerise, thirdquarter,
it tasted like jackfruit.)
I am from the stinking cottonwood
the tan lilac yew
whose emerald hares I remember
as if they were my own.

I'm from celery and byproducts,
from Fransisca and Scottie.
I'm from the slam poets
and the mean girls,
from 'hallelujah' and 'just kidding'!
I'm from 'Call me Ishmael'
'It does not matter how slowly you go so long as you do not stop'
and four pamphlets I can say myself.

I'm from South Gate and Beaverton,
shredded banana squash and cooling smoothie.
From the neck my stepsister sewed
in a football game,
the thumb my mum trailed to keep their smell.

Above my tea cart was a aft box
holding soft frictions,
a sift of lost faces
to drift around my dreams.
I am from those moments--
brooded before I dabbled--
leaf-fall from the family tree.

For the next step I can go one of (at least) two ways:

  • Try to set this up via html with a "generate poem" button for sharing
  • Not do that^^, because JavaScript script requiring/sharing is ridiculous and I don't fully (or even partially, at this point) understand it. But I can get console output so I could just stop there and worry about the text and not the presentation.
  • But seriously, I'm looking for a job so I should get this out there in a presentable manner
  • ARGH JavaScript. ARRGH Binary trees!!! Did you see how I invented my own trees in the script above? Heh heh "stinking cottonwood." No binary trees allowed.
  • Ok, real talk, I'm going to explore methods of text generation based on a structure like, I dunno, Little House on the Prairie maybe, and including the keywords from the poems above. Then my "chapters" become the poem followed by a narrative, presumably by or about the speaker from the poem. Yes this seems doable!!

@MichaelPaulukonis

I like this.

@marythought
Author

DAY ... TEN?

Ok, after taking some time off to learn all the data structures and algorithms (or not learn, as the case may be), I needed a quick win so I came back to this and was able to publish a version of the poem generator!

Where I'm From

It's not very fancy, and probably breaks all the Node/Express rules (I am a very proficient Ruby on Rails developer seriously you should hire me), but it meets the prime objective of generating a new poem on demand.

I like this so much I am not sure how to translate it into a novel... but let's not call it "done" yet, because I'm going to sleep on that.

I found a couple of open-source texts that work well for "memoir" style (Anne of Green Gables is the frontrunner), so I played with using RiTa to markov it up. My idea was to start with the base text, and then see if there's any way to prioritize the keywords generated in the 'Where I'm From' poem (so it would be a poem followed by a short vignette featuring terms mentioned in that poem, and then more in that pattern).

[Image: screen shot 2015-11-10 at 8 01 09 pm]

It's interesting, but it isn't very readable in paragraph form. So I think I need to consider another method for text generation. Which puts me back at the starting line. :)
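For readers new to the idea, "markov it up" boils down to something like the following word-level chain. This is a bare-bones sketch written from scratch, not RiTa's implementation (which does more, such as working with longer n-grams and sentence boundaries); the function names are made up, and the source text is two lines from the poem quoted earlier.

```javascript
// Build a table mapping each word to the words that follow it in the source.
function buildChain(text) {
  const words = text.split(/\s+/).filter(Boolean);
  const chain = new Map();
  for (let i = 0; i < words.length - 1; i++) {
    if (!chain.has(words[i])) chain.set(words[i], []);
    chain.get(words[i]).push(words[i + 1]);
  }
  return chain;
}

// Walk the table from `start`, choosing a random follower at each step.
// Stops at maxWords or when the current word has no recorded follower.
function generate(chain, start, maxWords, rng = Math.random) {
  const out = [start];
  let current = start;
  while (out.length < maxWords && chain.has(current)) {
    const followers = chain.get(current);
    current = followers[Math.floor(rng() * followers.length)];
    out.push(current);
  }
  return out.join(" ");
}

const source =
  "I am from clothespins, from Clorox and carbon-tetrachloride. " +
  "I am from the dirt under the back porch.";
const chain = buildChain(source);
console.log(generate(chain, "I", 12));
```

With only one word of context, the output wanders quickly, which fits the "not very readable in paragraph form" experience.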

Maybe I'll just write more poems...? #NaPoGenMo! I'm not 100% invested in the novel form, at least not for my first experiment this year, but I'm shooting to adhere to the 50,000 word count...

@cpressey

+1 NaAnOfGreGaGenMo!

Ah, if only I wasn't already overcommitted...

(There really is a NaPoGenMo too btw, but it's held in April.)

@marythought
Author

DAY ELEVENTHEN

Some quick text to share: I'm playing with RiTa's RiLexicon to find near-replacement words for a classic poem (again with the poems!!! she just won't stop...). My goal here is to generate output that is clearly recognizable, but sounds bananas.

You might be curious: what is the difference between RiTa's RiLexicon methods similarBySound(), similarByLetter(), similarBySoundAndLetter(), and rhymes()? So glad you asked... let's take a look at each of these at play! Each method returns an array of matches, so the computer is choosing a random match (or the original word) each time.
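The "random match (or the original word)" step looks roughly like this. In this sketch `lookup` is a stand-in for whichever lexicon method is being tried, and the candidate words are typed in by hand from the sample outputs in this post, so no library is needed to run it. The names are made up, and capitalization of replaced words is not preserved.

```javascript
// `lookup` stands in for a lexicon method; it must return an array of
// candidate words (possibly empty) for a lowercase input word.
function replaceWords(line, lookup, rng = Math.random) {
  return line.replace(/[A-Za-z']+/g, (word) => {
    // The original word is always one of the options.
    const options = [word, ...lookup(word.toLowerCase())];
    return options[Math.floor(rng() * options.length)];
  });
}

// Hand-made candidates for illustration only.
const candidates = new Map([
  ["roads", ["reeds", "loads", "rods"]],
  ["yellow", ["fellow", "bellow", "mellow"]],
]);
const lookup = (word) => candidates.get(word) || [];

console.log(replaceWords("Two roads diverged in a yellow wood,", lookup));
```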

Similar by Sound

Compares the phonemes of the input word (using a version of the min-edit distance algorithm) to each word in the lexicon, returning the set of closest matches.

Two reeds divert in a yell good,
And soggy I could not trammel berth
And be one travels, pong I staid
And cooked dean one as far as I curd
To where it burnt in the undergrowth;

Similar by Letter

Compares the characters of the input string (using a version of the min-edit distance algorithm) to each word in the lexicon, returning the set of closest matches.

Two loads diverged in a fellow wood,
And sorry I could not travel booth
And be one traveler, song I stood
And cooked dawn one as far as I mould
To where it bet in the undergrowth;
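To make "min-edit distance" concrete, here is the textbook Levenshtein version applied to a five-word toy lexicon. The docs only say "a version of" the algorithm, so treat this as an approximation of the idea rather than RiTa's actual code; `closestByLetter` and the lexicon are invented for the example.

```javascript
// Classic Levenshtein distance: the fewest single-character insertions,
// deletions, or substitutions needed to turn `a` into `b`.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return every lexicon word tied for the smallest distance from `word`,
// skipping the word itself.
function closestByLetter(word, lexicon) {
  const scored = lexicon
    .filter((w) => w !== word)
    .map((w) => ({ w, d: editDistance(word, w) }));
  const best = Math.min(...scored.map((s) => s.d));
  return scored.filter((s) => s.d === best).map((s) => s.w);
}

const lexicon = ["loads", "rods", "reeds", "wood", "road"];
console.log(closestByLetter("roads", lexicon));
```

Running the same distance over phoneme arrays instead of letters would give the by-sound flavor.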

Similar by Sound and Letter

First calls similarBySound(), then filters the result set by the algorithm used in similarByLetter();

Two rods diverge in a bellow good,
And sorry I could not travel bath
And be one traveled, pong I stood
And cooked doan one as far as I could
To where it vent in the underwrote;

Rhyme

Two words are considered as rhyming if their final stressed vowel and all following phonemes are identical.

Two episodes diverged in a mellow likelihood,
And safari I could not travel both
And be one both, strong I sainthood
And overlooked clown one as far as I withstood
To where it dissent in the undergrowth;
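That rhyme rule can be sketched with a tiny hand-typed pronunciation table. The ARPAbet-style entries below are typed in for illustration (a real lexicon has tens of thousands), the helper names are made up, and "stressed" is taken to mean primary stress only, which is an assumption about how the rule is applied.

```javascript
// Hand-typed ARPAbet-style pronunciations; a digit on a vowel marks
// stress (1 = primary). Illustrative entries only.
const phones = new Map([
  ["wood", ["W", "UH1", "D"]],
  ["stood", ["S", "T", "UH1", "D"]],
  ["could", ["K", "UH1", "D"]],
  ["both", ["B", "OW1", "TH"]],
  ["growth", ["G", "R", "OW1", "TH"]],
]);

// Everything from the last primary-stressed vowel to the end of the word,
// or null if the word is unknown or has no primary stress marked.
function rhymePart(word) {
  const p = phones.get(word);
  if (!p) return null;
  const i = p.map((ph) => ph.endsWith("1")).lastIndexOf(true);
  return i === -1 ? null : p.slice(i).join(" ");
}

// Two different words rhyme when their rhyme parts are identical.
function rhymes(a, b) {
  const ra = rhymePart(a);
  return a !== b && ra !== null && ra === rhymePart(b);
}

console.log(rhymes("wood", "stood"), rhymes("wood", "both"));
```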

Verdict

I hadn't tried by letter before this little exercise (thinking the sound would be more important) but I actually like that output the best, here. It does seem to be keeping the sound and rhythm of the word as well. Linguistical coincidence? Edit-distance magick?

Rhyme clearly varies the most from the source text -- this could be fun to play with for replacing end words (or generating new rhyme words), but I won't use it in this "replace nearly every word" exercise.

Just for fun: Alliteration

Finds alliterations by comparing the phonemes of the input string to those of each word in the lexicon

Two razor diverged in a abuse wings,
And scratch I could not fanatic both
And be one entitled, rebuilding I ceases
And consolidates deathbed one as far as I consul
To where it billionaires in the injuries

^^Yikes, that's dark, RiTa! I won't be using this but watch this:

Two [roads organizational] diverged in a [yellow impugning] [wood whittle],
And [sorry confessing] I could not [travel teapot] [both both]
And be one [traveler tumbler], [long inflict] I [stood autistic]
And [looked apologized] [down down] one as far as I [could clawed]
To where it [bent bittersweet] in the [undergrowth incinerators];

These are the word pairs it's claiming for alliteration. Some are truly weird. I feel like this would need some human editing if you were to use it in text generation, or else I might just throw out anything that doesn't start with the same letter as the base word (those all seem to work well!).
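The "throw out anything that doesn't start with the same letter" idea is a one-line filter. Here it is applied to some of the pairs listed above (the variable and function names are made up for the example):

```javascript
// [base word, claimed alliteration] pairs copied from the output above.
const pairs = [
  ["roads", "organizational"],
  ["yellow", "impugning"],
  ["wood", "whittle"],
  ["sorry", "confessing"],
  ["travel", "teapot"],
  ["traveler", "tumbler"],
  ["stood", "autistic"],
  ["bent", "bittersweet"],
];

// Keep a pair only when both words start with the same letter.
const sameFirstLetter = ([base, candidate]) =>
  base[0].toLowerCase() === candidate[0].toLowerCase();

const kept = pairs.filter(sameFirstLetter);
console.log(kept.map(([base, candidate]) => `${base} / ${candidate}`));
```

One trade-off: comparing letters rather than sounds also drops pairs like sorry / confessing, where the spelling differs even though both words contain the same "s" sound.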

Signing off for now. I'm going to keep working on Bob Frost, then see what else I can do in RiTa.

@marythought
Author

Bonus: here's the whole poem w/ Similar By Sound replacements:

[Image: screen shot 2015-11-12 at 12 45 47 am]

@marythought
Author

DAY THE LAST

After debating what to do with my poor poem-that-is-not-a-novel, I decided to go ahead and use RiTa's markov functionality, but use it on the poem as source material. What results is an epic memoir poem that doesn't have much plot but generates some interesting language. Not bad for a first attempt!

My #NaNovGenMo2015 Submission

And here is the source code

How I made it:

  1. Generate a new "Where I'm From" poem over and over, saving each one in a source text variable until it reaches 50,000 words.
  2. Feed that source text to RiTa's markov and generate 5,000 sentences in an array (I started with 1,000 and that didn't seem like enough)
  3. Make a new empty text variable for the output
  4. Until the output text reaches 50,000 words:
    • generate a new poem and add it
    • add between 0-40 random lines from the markoved sentences
    • add between 0-20 sampled lines from a newly generated poem
    • add between 0-40 lines of markoved (again)
    • rinse and repeat
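In code, step 4 looks roughly like the loop below. This is a simplified reconstruction from the steps above, not the actual source: `generatePoem` and `markovSentences` are stubbed with hard-coded text, the helper names are made up, and sampling with replacement is an assumption.

```javascript
const countWords = (text) => text.split(/\s+/).filter(Boolean).length;

// Random integer from 0 to max, inclusive.
const randomInt = (max, rng) => Math.floor(rng() * (max + 1));

// Draw n items at random (with replacement, an assumption of this sketch).
const sample = (items, n, rng) =>
  Array.from({ length: n }, () => items[Math.floor(rng() * items.length)]);

// generatePoem: () => string, one poem with lines separated by "\n"
// markovSentences: string[], sentences already generated from the poems
function assemble({ generatePoem, markovSentences, targetWords, rng = Math.random }) {
  const parts = [];
  let words = 0;
  const add = (text) => {
    parts.push(text);
    words += countWords(text);
  };
  while (words < targetWords) {
    add(generatePoem());
    add(sample(markovSentences, randomInt(40, rng), rng).join(" "));
    add(sample(generatePoem().split("\n"), randomInt(20, rng), rng).join("\n"));
    add(sample(markovSentences, randomInt(40, rng), rng).join(" "));
  }
  // Drop empty chunks (a draw of 0 lines) before joining.
  return parts.filter(Boolean).join("\n\n");
}

// Stubs so the sketch runs on its own.
const stubPoem = () => "I am from birthdays,\nfrom Big Mac and collectivization.";
const stubSentences = [
  "I am from the trot under the storm cellar.",
  "I'm from the slam poets.",
];
const novel = assemble({
  generatePoem: stubPoem,
  markovSentences: stubSentences,
  targetWords: 200,
});
console.log(countWords(novel));
```

Writing it as a `while` loop rather than a recursive call keeps the call stack flat no matter how many rounds it takes to reach the word count.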

I was going to serve up the results through Express and Node just like with my poem generator, but as soon as I got close, I ran into a 'Maximum call stack size exceeded' error. So, eff that. Markdown it is! An interesting aspect of markdown is that it doesn't preserve all the line breaks. I played with this and ultimately decided that I liked the paragraphs/prose poem format for such a long text document, so I left it alone (for a formatted version, see my earlier attempt which does preserve line breaks). I did discover that RiTa will occasionally generate language I wouldn't want to use in an app, so I'm curious if anyone (Darius) has already made a filter for this.

This was fun! I still have Bob Frost to play with, and coincidentally a little project I'm working on called "Walk or Not" fits well with my Ritafied poem. I learned a bunch about natural language processing this month and feel much more comfortable working with RiTa and JavaScript.

Questions or comments? I will answer what I can... if I do it again, I'll be purposeful about chapter headings or something that can break up the 50,000 words to help the flow. At this point, though, I can tinker no more.

Thanks for the opportunity and see you next year!

@hugovk
Collaborator

hugovk commented Nov 28, 2015

Have a completed label!


Quoting @marythought: "I did discover that RiTa will occasionally generate language I wouldn't want to use in an app, so I'm curious if anyone (Darius) has already made a filter for this."

These are mainly aimed at bots, but should still be generally useful.

Here's a JavaScript, Python, Ruby and PHP word filter:
https://github.com/dariusk/wordfilter

Here's a headline filter:
https://github.com/molly/CyberPrefixer/blob/master/offensive.py

Tips on transphobic joke detection:
http://tinysubversions.com/notes/transphobic-joke-detection/

Some lists of bad words:
https://github.com/shutterstock/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words
https://gist.github.com/ryanlewis/a37739d710ccdb4b406d
http://www.bannedwordlist.com/lists/swearWords.txt

[Inactive] muted Twitter topics:
https://github.com/sjml/bot-innocence

Some general etiquette things:
http://tinysubversions.com/2013/03/basic-twitter-bot-etiquette/
http://www.crummy.com/2013/11/27/0
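As a rough idea of how any of these lists could be plugged in after generation, here is a plain blocklist filter. It is a from-scratch sketch, not the API of the wordfilter library linked above; the blocklist entries are placeholders and the function names are made up.

```javascript
// Placeholder entries; a real list would come from one of the resources above.
const blocklist = ["badword", "worseword"];

// True when the sentence contains none of the blocked terms.
// Matching is case-insensitive and by substring.
function isClean(sentence, blocked = blocklist) {
  const lower = sentence.toLowerCase();
  return !blocked.some((term) => lower.includes(term));
}

// Drop generated sentences that contain a blocked term.
function keepClean(sentences, blocked = blocklist) {
  return sentences.filter((s) => isClean(s, blocked));
}

console.log(keepClean(["I am from birthdays.", "I am from BadWord."]));
```

Substring matching errs on the side of caution: it catches variants like plurals, but it can also reject innocent words that happen to contain a blocked term.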
