Cheating pseudo-entry: Vocabulary mashup #72
This is delightful.

"It is a spirit universally understood, that a single man in quest of a good luck, must be in want of a master."
I wonder if you could legitimately use Vocabulary Mashup to take some obscure public domain works (obscure sci-fi novellas, say) and "remake" them by setting them in a different, more familiar genre (news stories about unicorns?). Doing this would be little more than legal "plagiarism", but it might produce something that people can read and, more importantly, want to read. (The reason they may want to read it, though, is that they are completely unfamiliar with the source material, so it seems new and exciting. Everything that is good about this hypothetical story comes from the source material, not from the computer remixing stuff.)
That's an interesting question, isn't it? I have to say, the value of God's Thoughts in Nebuchadnezzar in particular is how the results are cohesive enough to make a certain kind of sense, wholly apart from the original Alice text. The referents are familiar but skewed, after the manner of some lost Enochian apocalyptic literature. Taking an existing text and substituting new word choices is a very Oulipian approach to poetry. (Similar to S+7/N+7, only taken to a computational extreme.)
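(For anyone unfamiliar: S+7/N+7 is the Oulipo procedure of replacing each noun with the word seven entries later in a chosen dictionary. A minimal sketch of the idea in Python — the word list `lexicon` is a placeholder, and a faithful version would POS-tag and shift only nouns rather than every word:

```python
import bisect

def n_plus_7(tokens, dictionary_words):
    """Oulipo's N+7: replace each word with the one seven entries
    later in an alphabetized dictionary. This sketch shifts every
    alphabetic token; the real procedure applies only to nouns."""
    words = sorted(set(w.lower() for w in dictionary_words))
    out = []
    for tok in tokens:
        if tok.isalpha():
            i = bisect.bisect_left(words, tok.lower())
            out.append(words[(i + 7) % len(words)])
        else:
            out.append(tok)
    return out

# e.g. n_plus_7("a single man in want of a wife".split(), lexicon)
```

The word2vec approach swaps "seven entries down an alphabetical list" for "nearest neighbour in meaning-space", which is what makes the output cohere.)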
@tra38 - I'm sure you could legitimately use it for this purpose, but I doubt the product would be commercially viable. However, it might be a good first-draft approximation of where to go. UPDATE 2015.11.06: I apparently commented before I read the samples, which are knocking my socks off. If Philip M. Parker can publish more than 200,000 auto-generated "books" on Amazon, I don't see why this algorithm can't as well.
What are the stopwords for? Did it have issues with contradictions?
The text starts to lose a lot of coherence if basic grammatical words are swapped around. The list of stopwords is somewhat ad hoc, but it seems to strike a balance between keeping the text coherent and still changing its sense.
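To make that concrete, here is a rough sketch of the kind of substitution loop being described, using gensim's word2vec bindings. The file name, function name, and stopword list are all illustrative, not the actual script's — as noted, the real list is ad hoc:

```python
from gensim.models import KeyedVectors

# Illustrative path: the Google News vectors ship as a large binary
# file with the original word2vec release.
model = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# Stand-in stopword list: grammatical glue that stays untouched.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "that",
             "it", "is", "was", "i", "you", "he", "she", "not"}

def swap_vocabulary(tokens_a, vocab_b):
    """Replace each content word of Text A with the closest word
    (by cosine similarity) drawn from Text B's vocabulary."""
    candidates = [w for w in set(vocab_b)
                  if w not in STOPWORDS and w in model.key_to_index]
    out = []
    for tok in tokens_a:
        low = tok.lower()
        if low in STOPWORDS or low not in model.key_to_index:
            out.append(tok)  # leave stopwords and unknown words alone
        else:
            out.append(max(candidates,
                           key=lambda w: model.similarity(low, w)))
    return out
```

Skipping the stopwords is what preserves the sentence skeleton while the content words drift.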
The poetry in Alice comes out really wonderfully.
@mewo2 Which word2vec data files did you use? |
I used the "standard" Google News model for most stuff. There's a "backup" model which was trained on about 100 Project Gutenberg books (including the source texts), which I use when there's a word which doesn't occur in the Google News dataset. That's usually either an unusual proper name, or something archaic. |
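As a sketch of that fallback arrangement (file names are hypothetical, and note that the two models' vector spaces aren't aligned, so similarities have to be computed within whichever single model contains both words):

```python
from gensim.models import KeyedVectors, Word2Vec

news = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)
gutenberg = Word2Vec.load("gutenberg.w2v").wv  # locally trained backup

def model_for(word):
    """Prefer the big Google News model; fall back to the Gutenberg
    model for archaic words and unusual proper names."""
    if word in news.key_to_index:
        return news
    if word in gutenberg.key_to_index:
        return gutenberg
    return None
```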
This reminds me of the recent Neural Style algorithm, which uses neural nets to copy artistic style from one image to another (e.g. to make a photo look like a Picasso painting): https://github.com/jcjohnson/neural-style If anyone could figure out how to do the same thing with a character-level neural net... :)
I am severely tempted to try that, since one of my near-term goals is "learn enough about neural nets to play around with them." |
@mewo2 - Pretend I've never used word2vec before (and hardly use Python). How would I generate the datasets? Since I'm essentially asking to be stepped through the process, do you know of a good tutorial for this? (I've managed to get this all set up on Windows, amazingly enough.)
I've been messing with word2vec a bit, though I haven't done enough to be able to speak authoritatively. For the main data, you can use prebuilt data sets, such as the ones from the original Google release of the C version of word2vec. If you want to train your own, there are a couple of tutorials out there, though I haven't worked through them far enough to vouch for them yet.
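If you do want to train your own, a minimal gensim recipe looks roughly like this. The corpus file name is a placeholder, and the hyperparameters are common defaults rather than anything used for the samples above:

```python
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

# Placeholder corpus: plain text, e.g. concatenated Gutenberg books,
# one passage per line.
with open("gutenberg_corpus.txt", encoding="utf-8") as f:
    sentences = [simple_preprocess(line) for line in f]

model = Word2Vec(sentences,
                 vector_size=300,  # match Google News dimensionality
                 window=5,
                 min_count=5,
                 workers=4)
model.save("gutenberg.w2v")
```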
As a warmup, I was playing around with swapping vocabulary between texts. The idea is to replace words in Text A with words from Text B, subject to the following constraints:
The code is available here, although you'll need the word2vec data files to run it. There are also two example texts.
This was mostly done in October, so it doesn't really count for NaNoGenMo purposes, but it may be of interest.