Compiler pipeline + writers' techniques = a "proper novel" ::blink:: #11
That pretty much sums up my thoughts/goals too. Is it really so unrealistic?
Well, I guess we'll see, but yes I think it's incredibly unrealistic.
Perfect! And definitely a worthy goal.
I should maybe qualify those statements a bit. I do think the goal I stated is highly unrealistic, certainly with the techniques that I'm personally prepared to use. But the space of possible techniques is vast, so who knows? What I'm sort of getting at by choosing that goal is this: In 2013, I tried generating a "proper novel". Last year, I did a bunch of experiments closer to the so-called "conceptual writing" side of things. This year, I'm returning to the "proper novel", however quixotic any such attempt might be. Given that I've stated a goal that I admit is unrealistic, I suppose I do not expect myself to actually achieve it. But it will be interesting to see how I fail. At the same time, one need not have only one goal, so... After last NaNoGenMo, around January of this year, I started thinking a lot about how people write stories. I did a lot of research (if you can call reading article after article on TVTropes research) and I came to the conclusion that there are certainly some story-writing techniques that can be approximated with algorithms. So, one of my secondary goals is: To implement one or more story-writing techniques that human writers use. This is a much more realistic goal, I think. Heck, even The Swallows had a MacGuffin, but it wasn't really developed. I'd like to go a bit beyond that. I'll probably continue to expand on these thoughts in future posts to this issue.
::blink blink:: [updated as I had not pasted what I wanted to have pasted]
I can imagine a computer-generated book being easier to read than something like Naked Lunch or Finnegans Wake.
@YottaSecond
Mmmaybe... But I wager that if someone stops reading Finnegans Wake after chapter 2 it's almost certainly not because their brain went all "I see what you did there."
Hi, I'm going through and updating the titles on issues to make them more specific. Feel free to edit my edit if it's not to your liking. This is to make browsing issues a lot more pleasant.
While there will certainly be similarities, my third goal is to not just end up re-writing The Swallows. I was looking through that code yesterday, seeing how much of it could be re-used. Very little, I think. My background is programming languages, so I have a hard time not seeing a story generator as a kind of compiler. A typical compiler is structured as a pipeline with a number of phases. The process for writing a story is much messier, but in a broad sense it too is a "pipeline", from idea to outline to draft to finished work. In fact a story-writing pipeline is in some ways the inverse of a compiler pipeline. A compiler takes a readable text and turns it into an incoherent blob. A writer takes an incoherent blob and turns it into a readable text. One of the first things a compiler often does is strip comments from the source code and throw them away, because they're not crucial to the result. One of the last things a writer might do is add commentary that's not crucial to the story. One of the last things a compiler does is optimize the generated code to make it shorter and more efficient. One of the first things a writer might do is complicate the plot to make it longer and more interesting. Somewhere in the middle of the compiler, it might check that the program does not contain certain errors, like assigning a string value to an integer variable. Somewhere in the middle of writing a story, a writer might check that the characters are not doing something that, in that scene, would not be possible. And so forth. The similarities really are rather remarkable.
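Of the phases listed above, the "check for errors in the middle" phase maps most directly onto code. Below is a minimal Python sketch. It assumes a hypothetical event list in which a character may act in a scene only after entering it. The check plays the role a type checker plays in a compiler, except that it rejects impossible actions rather than string-to-integer assignments. The `Event` class and the enters/leaves convention are illustrative assumptions, not part of any generator discussed in this thread.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Event:
    actor: str   # the character performing the action
    action: str  # "enters", "leaves", or any other verb
    scene: str   # where the event takes place


def check_presence(events: List[Event]) -> List[str]:
    """A 'type check' for stories: flag characters acting where they are not.

    A character becomes present in a scene via "enters" and stops being
    present via "leaves"; every other action requires presence.
    """
    present: Dict[str, Set[str]] = {}  # scene -> characters currently there
    errors: List[str] = []
    for i, ev in enumerate(events):
        here = present.setdefault(ev.scene, set())
        if ev.action == "enters":
            here.add(ev.actor)
        elif ev.actor not in here:
            errors.append(f"event {i}: {ev.actor} {ev.action} in {ev.scene}, but is not there")
        elif ev.action == "leaves":
            here.discard(ev.actor)
    return errors


story = [
    Event("Joe", "enters", "cave"),
    Event("Joe", "speaks", "cave"),
    Event("Ann", "speaks", "cave"),  # Ann never entered the cave
]
print(check_presence(story))  # ['event 2: Ann speaks in cave, but is not there']
```

As in a compiler, a pass like this generates nothing; it only rejects. A generator could instead respond by repairing the plot, for example by inserting an "enters" event.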
Of course, since most of these operations are generative, if a single pass …
On Wed, Oct 28, 2015 at 6:33 AM, Chris Pressey <notifications@github.com> wrote:
Sure, except (continuing with the compiler analogy) most compilers aren't designed to take as input that which they generate as output. I certainly wasn't planning on building anything that could read a novel!
(Excuse my "designblogging" but it helps give me something to do to stop myself from jumping the gun and starting too early! Am chomping at the bit, can you tell? Trying to keep each post reasonably short.) If the "novel compiler" doesn't take a written text as input, I suppose that raises the question of what it does take as input. One answer could be "nothing, it's just a generator, you just run it," which might be literally true, but it doesn't really answer the question. A more satisfying answer would be that it takes an outline of a plot, in some kind of data format, as input, even if that outline is hardcoded or randomly generated in the compiler itself. It then refines that plot by iteratively rewriting it, stepwise, into increasingly detailed plots. Once it has a detailed enough plot, it rewrites that into a series of events, and in the end rewrites those into sentences. I suppose this is a top-down, plot-driven approach, as @TheCommieDuck described it. About these plots... (kind of thinking out loud here...) The "seed plot" that the compiler starts with could be as skeletal as The Hero's Journey. Or maybe even more basic, like, the "null story":
From there, you just keep inserting subplots into it. I'm still weighing ideas about exactly how to accomplish this process. I might write more about it later.
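One way to picture "keep inserting subplots" is as a rewriting pass on a flat list of plot beats. The Python sketch below is minimal and uses assumed names throughout. The seed plot, the subplot templates, and the rule of splicing between the first and last beats are hypothetical choices made for the example, not the design described above.

```python
import random
from typing import List

Plot = List[str]  # a plot is just an ordered list of beats here

# Hypothetical seed: about as skeletal as a plot can get.
SEED_PLOT: Plot = ["the characters are introduced", "the characters meet and laugh"]

# Hypothetical subplot templates, each spliced in as a unit.
SUBPLOTS: List[Plot] = [
    ["a villain appears", "the villain is defeated"],
    ["a friend is kidnapped", "the friend is rescued"],
    ["an object is lost", "the object is found"],
]


def insert_subplot(plot: Plot, subplot: Plot, rng: random.Random) -> Plot:
    """Return a new plot with `subplot` spliced in after the opening beat
    and before the closing beat. Because the splice point can fall inside
    an earlier subplot, complications nest naturally."""
    pos = rng.randint(1, len(plot) - 1)
    return plot[:pos] + subplot + plot[pos:]


def refine(plot: Plot, rounds: int, rng: random.Random) -> Plot:
    """Iteratively rewrite the plot into a longer, more detailed one."""
    for _ in range(rounds):
        plot = insert_subplot(plot, rng.choice(SUBPLOTS), rng)
    return plot


for beat in refine(SEED_PLOT, rounds=3, rng=random.Random(42)):
    print(beat)
```

Each round returns a new, longer plot, so this stage could be followed by separate passes that turn beats into events and events into sentences.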
One thing I was playing with in past projects was embedding metadata about the generation in the outputted text, and then performing a last cleanup phase before the actual final output. So there would be a bunch of bracketed tags scattered around marking things that could potentially be expanded. And the last step stripped the bracketed text out or reduced it to its default. I never fully implemented the idea, but it might be useful for your novel compiler.
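Here is a minimal Python sketch of that last cleanup phase. It assumes a made-up tag syntax: `[kind:default text]`, or a bare `[kind]` with no default. Tags that a later pass knows how to expand are replaced by the expansion; all others are reduced to their default or dropped. The tag syntax and the `expansions` mapping are assumptions for illustration.

```python
import re
from typing import Dict, Optional

# Hypothetical markup: [kind:default text] or [kind] (no default).
TAG = re.compile(r"\[(\w+)(?::([^\[\]]*))?\]")


def render(text: str, expansions: Optional[Dict[str, str]] = None) -> str:
    """Final pass: expand known tags, reduce the rest to their defaults."""
    expansions = expansions or {}

    def replace(match) -> str:
        kind, default = match.group(1), match.group(2)  # default is None for bare tags
        return expansions.get(kind, default or "")

    out = TAG.sub(replace, text)
    return re.sub(r" {2,}", " ", out).strip()  # tidy gaps left by dropped tags


draft = "Joe walked into [place:the cave] [aside] and sat down."
print(render(draft))                              # Joe walked into the cave and sat down.
print(render(draft, {"place": "the dark cave"}))  # Joe walked into the dark cave and sat down.
```

The space cleanup is needed because dropping a tag leaves the spaces on either side of it behind.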
One solution to the problem of passes being able to read their own output would be to take a leaf out of LLVM's book and have a single intermediate representation (e.g. a list of events), which most passes use for both input and output. You can munge this repeatedly until your novel is complex enough, then run a single final pass which converts to prose.
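Here is a minimal Python sketch of that LLVM-style arrangement. It assumes a hypothetical IR of `(actor, verb, object)` event tuples. Every pass takes the event list and returns a new one, so any pass can run after any other, any number of times, and only the final pass produces prose. The two example passes are made up for illustration.

```python
from typing import Callable, List, Tuple

Event = Tuple[str, str, str]  # (actor, verb, object)
Pass = Callable[[List[Event]], List[Event]]


def add_rival(events: List[Event]) -> List[Event]:
    """Example pass: a rival challenges whoever acts first."""
    hero = events[0][0]
    return events[:1] + [("the rival", "challenges", hero)] + events[1:]


def add_resolution(events: List[Event]) -> List[Event]:
    """Example pass: every challenge is answered by a defeat."""
    answers = [(obj, "defeats", actor) for actor, verb, obj in events if verb == "challenges"]
    return events + answers


def run_passes(events: List[Event], passes: List[Pass]) -> List[Event]:
    """Each pass reads and writes the same IR, so passes chain freely."""
    for p in passes:
        events = p(events)
    return events


def to_prose(events: List[Event]) -> str:
    """The single final pass that leaves the IR and emits text."""
    return " ".join(f"{a[:1].upper()}{a[1:]} {v} {o}." for a, v, o in events)


ir = run_passes([("Joe", "enters", "the cave")], [add_rival, add_resolution])
print(to_prose(ir))
# Joe enters the cave. The rival challenges Joe. Joe defeats the rival.
```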
@ikarth My understanding is that this (embedding structured data inside unstructured text) was one of the original use cases for XML, though it's probably under-used these days. I don't currently foresee myself having a huge need for this, but if it becomes desirable, I'll keep that idea in mind, thanks. @mewo2 I'm currently thinking of the individual passes as purely internal rewriting operations on whatever data structures happen to be convenient at that point in the pipeline. But if the whole novel-model becomes too much to hold in memory, I suppose I will have to think about reading and writing intermediate representations, yeah.
I note that many of the classic narrative generators generated their world + stories, and had another independent system that "translated" them into more natural language. For example, TALE-SPIN:
It's a lot more complicated than this, but I can't find an example/citation again right now. Multiple sentences about the current world-state would be combined (JOE WAS IN THE CAVE. JOE KNEW HE WAS IN THE CAVE. THE CAVE WAS DARK. THE CAVE HAD AN EXIT. JOE KNEW THE CAVE WAS DARK. JOE WANTED TO BE IN THE LIGHT. JOE KNEW THE CAVE HAD AN EXIT. => Joe wanted to get out of the cave and into the light.)
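The simplest piece of that combining step, aggregation, can be sketched in a few lines of Python. This is a toy under assumed inputs: facts are `(subject, predicate phrase)` pairs, and consecutive facts about the same subject are merged into one sentence. It makes no attempt at the inference in the example above, which turns what Joe knew and wanted into "wanted to get out of the cave". It is also not a claim about how TALE-SPIN itself was implemented.

```python
from itertools import groupby
from typing import List, Tuple

Fact = Tuple[str, str]  # (subject, predicate phrase)


def aggregate(facts: List[Fact]) -> str:
    """Merge runs of consecutive facts that share a subject into one sentence."""
    sentences = []
    for subject, run in groupby(facts, key=lambda fact: fact[0]):
        preds = [pred for _, pred in run]
        if len(preds) == 1:
            body = preds[0]
        else:
            body = ", ".join(preds[:-1]) + " and " + preds[-1]
        sentences.append(f"{subject} {body}.")
    return " ".join(sentences)


facts = [
    ("Joe", "was in the cave"),
    ("The cave", "was dark"),
    ("The cave", "had an exit"),
    ("Joe", "wanted to be in the light"),
]
print(aggregate(facts))
# Joe was in the cave. The cave was dark and had an exit. Joe wanted to be in the light.
```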
@MichaelPaulukonis the 2nd version is more pleasant to read, but the 1st is just that much closer to 50,000 words, isn't it? Participating, even reading all the issues for this year's edition, is clearly going to cut into what little time I already have. I'll keep these updates short and infrequent. I suppose I have a goal number 4, which is: don't use any libraries or corpuses or APIs except the bare minimum. Well, that's not a goal so much as a predilection. I enjoy writing code. I don't enjoy learning and futzing with the idiosyncrazies of Yet Another Dependency. But this gives you an indication of what the final result will be like here. I'm not planning on releasing any previews or code until the end, or at least until the result reaches a certain minimum quality (but see, I don't expect that to happen in November, so, like, until the end.)
DRAT! There goes my plan! As usual, I'm hoping to play with a bunch of different dependencies, and then see if anything sticks. Each to our own.
Update: it generates a story. It is terrible. I do hope Goal 1 didn't get anyone's hopes up. I did call it "unrealistic" and "incredibly unrealistic" in almost immediate succession... Actually, suppose we reframe Goal 1 slightly, with gradation instead of as a yes-or-no thing. How many words of the average NaNoGenMo text is the average reader willing to read, on average, before they give up? By "read" I of course mean, try to make sense of the words, not just look at them. For texts that are complete word salad, the number is probably well below 100. (and then you start skimming forward, maybe, looking for interesting nonsense.) For others, maybe higher. A couple of hundred, at a guess. Hard to say, without going to the ridiculous length of actually conducting experiments on it. Anecdotes welcome, though!
One of the reasons I did generative erotica is that people will, on …
On Wed, Nov 4, 2015 at 9:06 AM, Chris Pressey <notifications@github.com> wrote:
Several pages, meaning, what, about 1500 words?
Yeah, something like that. (I have a fairly high tolerance for this stuff, …
On Wed, Nov 4, 2015 at 9:15 AM, Chris Pressey <notifications@github.com> wrote:
For non-simulations, my guess is that the attention span starts drifting at '3*templates', where 'templates' is the number of words in the template in question. That's enough for the user to get bored, because they grasp the pattern. So if you get a template that is 500 words in length, then that would probably make your 1500 words.
It seems bots are excellent at generating text, but it's the humans who are trying to sift through the resulting nonsense to find actual meaning and worth. There has to be a mathematical formula that can be used to measure the 'fitness' of a text, allowing the bots to filter for how interesting* it is. This way, you can have the bots generate a bunch of words and then engage in automatic curation. *We can define 'interesting' perhaps by sentiment analysis or by how well it matches one of Vonnegut's plot curves. Or maybe pull in machine learning: you rate a passage the computer generates on a scale of 1-10, and with enough data, eventually the computer will find a pattern.
Brings to mind genetic algorithms. Has anyone tried that approach?
With regard to the 'mathematical formula', I suspect you could use …
On Wed, Nov 4, 2015 at 9:17 AM, Tariq Ali <notifications@github.com> wrote:
Humans figuring out patterns seems to be part of the interestingness metric. It seems to work on multiple scales: Grokking the central conceit in Aggressive Passive or Redwreath and Goldstar Have Traveled to Deathsgate takes a few minutes at most, which will give you the sense of the overall plot without reading it. (And then figuring out the puzzle of which question goes with which answer can take a lifetime.) Something like #72 or Alice's Adventures in the Whale takes a bit longer, because once you've grasped the pattern, the pleasure is in seeing the changes that were made in a familiar text. I suspect that simulations play by slightly different rules. Dwarf Fortress has certainly generated a lot of stories, though I'm not sure how many of them are interesting precisely because they were interactive. (Not to mention, most renditions are a retelling of the events, rather than a direct output.) I'm going to be watching this year's simulation results with interest. One pleasure that most generative works lack is a sense that an author intended them to happen this way. Not that you can't get a degree of intention-sense. I suspect that's why high-concept things like Aggressive Passive work so well: we can read the higher authorial intent, and that makes it easier to get closure and catharsis.
And at the very least, the programming language will stop me from doing anything too stupid, right?
YAY SOFTWARE
OK, I have written this thing. It's a bit long (maybe 10 or 15 minutes to read?), so I put it in a gist. https://gist.github.com/cpressey/6324fff6ef0dfdf69b96 I don't know how well I've succeeded, but I've tried to write it for a general intermediate-programmer audience, not assuming any knowledge of compilers or any advanced programming knowledge. You can also skip over the first and last sections without really missing anything.
The story compiler approach seems amenable to incorporating some of what you might call low-level plotting techniques: writers' approaches to how individual scenes get constructed. Like the bit about structuring the story from conflict here, or Jim Butcher's Scenes and Sequels technique. Or, on a larger level, things like this note card technique to structure the novel's chapters.
Yes! Which is exactly what I'm doing with Goal 2 -- trying to automate some of those techniques. Progress in that area has only been modest so far though. Thanks for the links, it will be interesting to compare the methods they describe with what I've got so far.
The peals of the half-time gong echoing in the distance, the month trundles along into its third week! Update: not so lucky with the time allocation this past weekend. Goal 1: semi-coherent story length is about 1500 words, with caveats. I also have a few ideas about how one could make the novel more readable (or rather, less unreadable) at 50K-word scale. They're gimmicky cheap ideas and I don't necessarily like them, but having stated Goal 1 the way I did, I guess I'm obligated to pursue them. Needing to choose my battles, I deem Goal 2 completed. I implemented the one writers' technique that I really wanted to implement, even if I didn't end up applying it in a particularly good way.
OK well I could keep tweaking this and tweaking this and tweaking this and inching closer and closer to Goal 1 but honestly I think it has reached the point of diminishing returns and/or the minimum quality level I referred to earlier, so - here it is!!!
A Time for Destiny: The Illustrious Career of Serenity Starlight Warhammer O'James during her First Three Years in the Space Fighters
Generator is here, has name in ALLCAPS in great tradition of names of story generators. Will mirror the code on GitHub in near future, time permitting. To reproduce this novel, run the generator with
This is honestly pretty good. Like, I'm having a hard time distinguishing …
On Fri, Nov 20, 2015 at 10:39 AM, Chris Pressey <notifications@github.com> wrote:
I read three full chapters before skimming. After the first two I was preparing to skip forward, but then Nebulon showed up, and things got interesting again! ...but after that I wondered what other plot events might happen, so I started scrolling... so I guess that's 2770 words read. Not bad.
The space-opera setting tempts me to swap out some texture in characters.py …
On Fri, Nov 20, 2015 at 11:16 AM, Greg Kennedy <notifications@github.com> wrote:
I think I know who Serenity O'James is.... As usual, your code leaves me gnashing my teeth, wishing mine were as robust. And handsome. You have such robust, handsome code. I love to run my keyboard over it....
Oh! Well! Glad it's well-received (increasingly creepy vibe I'm getting from these responses notwithstanding...)
Well, you're probably wrong, as the character is little more than a synthesis of a selection of these traits. I'll probably try to write something about that, and the implementation of writers' techniques, etc., over the weekend.
Don't act like you've never heard of LGoP before...
I'm looking forward to the write-up. The generator does a pretty good job with creating individual scenes that are at least nominally readable. Which is high praise in NaNoGenMo. The rather inane resolution of most of the plots is a weakness, of course. Though I suppose it's in-genre for your source material. And I suspect handling that better would be a major project in itself.
May I request that this generator be released under some kind of license (even the Unlicense), in case other people wish to use or expand on the MARY SUE program? One thing that I did find interesting in your original writeup and in your source code is that the actual starting plot is pretty basic: characters get introduced, then the characters meet up and laugh. That's it. Literally everything else is just the generator adding additional sub-plots and complications to increase the word count. The system works, of course, but it seems incredibly foreign to how a human writer would write a plot (come up with a central core plot first and then flesh/pad it out). Maybe this has to do with fundamental differences between man and machine: humans are trying to convey some underlying message or meaning to other humans, while bots don't really care at all what they write.
I'm not convinced that it's alien to the way human writers plot -- It's not how a professional author plans out plots. But, it's definitely …
On Fri, Nov 20, 2015 at 7:55 PM, Tariq Ali <notifications@github.com> wrote:
Didn't even have enough time over the weekend to do that. Might not have enough time to write up some single coherent thing. Might work in chunks. To wit, I can start with this: What I most wanted to implement was a story generator that could do Chekhov's Gun, i.e. foreshadowing an object which does not start out important in the story, but later on becomes important. Implementing Chekhov's Gun is actually really easy:
This is basically what is described in this article that @ikarth shared earlier -- see the "XXXX ADD GUN EARLIER XXXX" part. Not difficult, but notably also not something you can do with a simulation alone. Because you can't just output events as they happen. You have to go back and examine them & edit them selectively. So you have to keep the events around in some kind of data structure. This is what led to the compiler-like architecture. In fact it is very clumsily applied in the novel, but I don't really care, the goal was to implement it. As a sort of bonus I also made it add a mention of the object at the end, as a sort of reminder (I don't know if there's a name for this.)
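The general shape of that edit can be sketched in Python. This is a minimal illustration under assumed names, not the generator's actual code. Events are kept in a list, and a later event may record an object it depends on. A pass then walks back over the list to plant an unremarkable early mention of that object, plus a reminder at the end. The `Event` class, the `uses` field, and the template sentences are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    text: str
    uses: Optional[str] = None  # object this event depends on, if any


def plant_chekhovs_gun(events: List[Event], min_gap: int = 2) -> List[Event]:
    """Foreshadow the first object that becomes important late in the story.

    This only works because the whole event list is available: the pass has
    to look ahead to find the object, then edit earlier in the story.
    """
    for i, ev in enumerate(events):
        # Event 0 has nothing before it to plant in, so require i >= 1.
        if ev.uses is None or i < max(min_gap, 1):
            continue
        obj = ev.uses
        if any(obj in earlier.text for earlier in events[:i]):
            continue  # already mentioned before it matters; nothing to plant
        plant = Event(f"Nearby, almost unnoticed, lay {obj}.")
        reminder = Event(f"Nobody forgot {obj} after that.")
        # Mention it right after the opening event, well before it matters.
        return events[:1] + [plant] + events[1:] + [reminder]
    return events


story = [
    Event("Serenity boarded the ship."),
    Event("The crew set course for the nebula."),
    Event("A storm knocked out the engine."),
    Event("Serenity repaired the engine with the old spanner.", uses="the old spanner"),
]
for ev in plant_chekhovs_gun(story):
    print(ev.text)
```

The reminder appended at the end corresponds to the "bonus" mention described above.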
This will probably happen eventually (eventually meaning, sometime after November is over). Because I feel strongly about open-source and all that, and over the past umpteen years, I've put nearly everything I've done under a permissive license. But honestly, I have to wonder if it even means anything anymore in a world where appropriation artists sell printouts of screenshots of strangers' web pages for tens of thousands of dollars.
I would dare to suggest that the existence of appropriation artists depends …
On Mon, Nov 23, 2015 at 1:24 PM, Chris Pressey <notifications@github.com> wrote:
So... when Georg Baselitz, in The Painter's Equipment (1987), claims to have composed Fidelio (1814) when he was 6 years old - you would say there is nothing subversive about that? Interesting. I think that that is much more subversive than the conceptual parlour tricks that so many so-called artists produce these days. Not that being subversive is the only way to get attention. Not that getting attention is the purpose of art. Unless that's all that's left of art now, I guess.
Update: the code is now on GitHub, and in the public domain. Have been too ill in the past week to write anything up further. Really, it mostly comes down to this: The point of NaNoGenMo is to generate a novel. Even if you take that literally (as is my wont), there's nothing saying it has to be a good novel. So, what's a rich source of bad writing? Fanfiction. And what's the worst kind of writing in fanfiction? Reports vary, but it's hard to go wrong with a Mary Sue. This relieves the writer of many burdens (such as writing scenes in which the protagonist does not appear, giving the protagonist character flaws, making the other characters more than 2-dimensional, etc.) and provides much opportunity for padding (the salvation of every writer with a word quota), in this case in the form of poorly thought-through similes and waxing poetic about costume. (In fact, even having a Chekhov's Gun in a Mary Sue story is a bit out of place - it's too sophisticated - you'd almost expect things to be pulled out of the air without any foreshadowing instead. Well, this is how it ended up, anyway.)
Oi! You exceeded the dosage guidelines! There was a warning label! I can't be held responsible... Yeah, OK, I'm just pointing out that the warning label is the gimmicky cheap idea I referred to earlier. The idea being, you could probably slog through any NaNoGenMo novel, if you did it in small enough pieces and gave yourself enough time between pieces. Also, there's a sense in which I should've spent the final week adding as many new subplot choices as I could, so maybe you'd see "4. Stranded in Space" after reading the 3rd chapter, and maybe you'd read even more words, and... yeah, I probably should've, in order to strictly pursue Goal 1. But I was so tired of reading the escapades of Serenity Starlight by the time I released this (I'm sure I've read 50,000 words of it myself), and being sick didn't exactly help matters either. I will be content with it being a proof-of-concept for this approach to that goal. Lastly, I just want to emphasize that it's not just an anthology of short stories, too. (Because that wouldn't be a proper novel, would it?) There is a story arc... sort of. Thoughts on plot... I might write about... at some future point... maybe.
An anthology of short stories can also have a story arc (see "The Martian Chronicles", which was a bunch of short stories that still had an overarching plot). If I understand correctly, there are two central plots in this novel:
The problem is that unless you really paid attention to the novel itself, looking at how events change within each individual chapter, you would not really notice the story arcs in question. So while there is an arc, the reader may very well not notice it. (This problem does seem solvable, though. If you had added more subplots to hold the reader's attention, the reader might have stayed invested in the story long enough to detect the story trend.) EDIT: I also don't think the novel text itself is bad. It really reads more like a "Saturday Morning Cartoon" show than an annoying Mary Sue flaunting her status. I notice Serenity never seems to get kidnapped herself, though.
The 10 Rules of Writing a proper Novel
Novel: A Time for Destiny: The Illustrious Career of Serenity Starlight Warhammer O'James during her First Three Years in the Space Fighters
Code: on Bitbucket
Write-ups: Overview of a "Story Compiler"
Observation: It is very difficult for the average person to read a typical NaNoGenMo-generated novel in its entirety, from beginning to end.
It's because the brain begins to tire, right? It gets all "I see what you did there" and balks at facing yet more unpredictable stuff.
Goal: To write a generator that generates a novel that does not succumb to this effect.
You still might not be able to read the resulting novel to the end, but, if you stop reading after the first 2 chapters, it should be because the novel is just plain bad, not because its aura of generativeness is burning a hole in your attention span.
Downloading an existing novel from Project Gutenberg, or similarly trivial approaches, don't count.
This is, of course, a completely unrealistic goal. But one must have some goal, mustn't one?