prosaic

                               o
       _   ,_    __   ,   __,      __
     |/ \_/  |  /  \_/ \_/  |  |  /
     |__/    |_/\__/  \/ \_/|_/|_/\___/
    /|
    \|

prosaic

being a prose scraper & cut-up poetry generator

by nathanielksmith

using nltk

and licensed under the GPL.

what is prosaic?

prosaic is a tool for cutting up large quantities of text and rearranging it to form poetic works.

prerequisites

postgresql 9.0+
python 3.5+
linux (it probably works on a mac, i donno)
you might need some -dev libraries and/or gcc to get nltk to compile

database setup

Prosaic requires a postgresql database. Once you've got postgresql installed, run the following to create a database prosaic can access (assumes you're on linux; refer to google to perform steps like this on osx/windows):

sudo su postgres
createuser prosaic -P
# at password prompt, type prosaic and hit enter
createdb prosaic -O prosaic

quick start

sudo pip install prosaic
prosaic source new pride_and_prejudice pandp.txt
prosaic source new hackers hackers_screenplay.txt
prosaic corpus new pride_and_hackers
prosaic corpus link pride_and_hackers pride_and_prejudice
prosaic corpus link pride_and_hackers hackers
prosaic poem new -cpride_and_hackers -thaiku

and so I warn you.
We will know where we have gone
ALL: HACK THE PLANET

See the full tutorial for more detailed instruction. There is also a cli reference.

use as a library

This is a little complex right now; I'm working on a simpler API.

from io import StringIO
from prosaic.cfg import DEFAULT_DB
from prosaic.models import Database, Source, Corpus, get_session
from prosaic.parsing import process_text
from prosaic.generate import poem_from_template

db = Database(**DEFAULT_DB)

source = Source(name='some_name')
process_text(db, source, StringIO('some very long string of text'))

session = get_session(db)
corpus = Corpus(name='sweet corpus', sources=[source])
session.add(corpus)
session.commit()

# poem_from_template returns raw line dictionaries from the database:
poem_lines = poem_from_template(
    [{'syllables': 5}, {'syllables': 7}, {'syllables': 5}],
    db,
    corpus.id)

# pull raw text out of each line dictionary and print it:
for line in poem_lines:
    print(line[0])

use on the web

there is an extremely alpha web wrapper (currently being re-written) at prosaic.party.

write a template

Templates are currently stored as json files (or passed from within code as python dictionaries) that represent an array of json objects, each one describing a line of poetry.

A template describes a "desired" poem. Prosaic uses the template to approximate a piece given what text it has in its database. Running prosaic repeatedly with the same template will almost always yield different results.

You can see available templates with prosaic template ls, edit them with prosaic template edit <template name>, and add your own with prosaic template new <template name>.

The rules available are:

syllables: integer number of syllables you'd like on a line
alliteration: true or false; whether you'd like to see alliteration on a line
keyword: string containing a word you want to see on a line
fuzzy: string containing a keyword; you want to see a line that occurs near a source sentence containing it.
rhyme: define a rhyme scheme. For example, a couplet template would be: [{"rhyme":"A"}, {"rhyme":"A"}]
blank: if set to true, makes a blank line in the output. for making stanzas.
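
If you build templates by hand, a small check can catch misspelled rule names before you run prosaic. The sketch below is not part of prosaic: check_template and RULE_TYPES are hypothetical names, and the value types are only what the list above implies (integer, true/false, string). Prosaic itself may accept or reject things differently.

```python
# Hypothetical helper, not part of prosaic. It only knows the rule names
# and value types described in the list above.
RULE_TYPES = {
    "syllables": int,
    "alliteration": bool,
    "keyword": str,
    "fuzzy": str,
    "rhyme": str,
    "blank": bool,
}


def check_template(template):
    """Return a list of problems found; an empty list means none."""
    problems = []
    for number, line in enumerate(template, start=1):
        for rule, value in line.items():
            expected = RULE_TYPES.get(rule)
            if expected is None:
                problems.append(
                    "line {}: unknown rule {!r}".format(number, rule))
            # Exact type comparison, so True is not accepted as an integer.
            elif type(value) is not expected:
                problems.append(
                    "line {}: {!r} should be {}".format(
                        number, rule, expected.__name__))
    return problems


couplet = [{"rhyme": "A"}, {"rhyme": "A"}]
print(check_template(couplet))  # []
```

It uses str.format rather than f-strings so it runs on python 3.5.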

example template

[{"syllables": 10, "keyword": "death", "rhyme": "A"},
 {"syllables": 12, "fuzzy": "death", "rhyme": "B"},
 {"syllables": 10, "rhyme": "A"},
 {"syllables": 10, "rhyme": "B"},
 {"syllables": 8, "fuzzy": "death", "rhyme": "C"},
 {"syllables": 10, "rhyme": "C"}]
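
Since templates can also be passed from code, here is the same example written as a python list and round-tripped through json using only the standard library. This is a sketch for inspecting a template; the "-" placeholder for lines without a rhyme rule is just a convention of this snippet, and nothing here touches prosaic's own template storage.

```python
import json

# The example template above, as a python list of dictionaries.
template = [
    {"syllables": 10, "keyword": "death", "rhyme": "A"},
    {"syllables": 12, "fuzzy": "death", "rhyme": "B"},
    {"syllables": 10, "rhyme": "A"},
    {"syllables": 10, "rhyme": "B"},
    {"syllables": 8, "fuzzy": "death", "rhyme": "C"},
    {"syllables": 10, "rhyme": "C"},
]

# Serialize to json text and read it back; the result is an equal list.
text = json.dumps(template, indent=1)
loaded = json.loads(text)

# Summarize the rhyme scheme, using "-" for lines with no rhyme rule.
scheme = "".join(line.get("rhyme", "-") for line in loaded)
print(scheme)  # ABABCC
```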

full CLI reference

Check out the CLI reference documentation.

how does prosaic work?

prosaic is two parts: a text parser and a poem writer. a human selects text files to feed to prosaic, who will chunk the text up into phrases and tag them with metadata. the human then links each of these parsed text files to a corpus.

once a corpus is prepared, a human then writes (or reuses) a poem template (in json) that describes a desired poetic structure (number of lines, rhyme scheme, topic) and provides it to prosaic, who then uses the weltanschauung algorithm to randomly approximate a poem according to the template.
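
To give a feel for what "randomly approximate" can mean, here is a toy illustration. It is not the weltanschauung algorithm and does not use prosaic's code or database: PHRASES, pick_line and toy_poem are made up for this sketch, and loosening the syllable target when nothing matches is an assumption about one plausible way to approximate a template.

```python
import random

# Toy phrase store with hand-counted syllables. In prosaic the phrases and
# their metadata live in postgresql instead.
PHRASES = [
    {"text": "and so I warn you", "syllables": 5},
    {"text": "we will know where we have gone", "syllables": 7},
    {"text": "hack the planet", "syllables": 4},
]


def pick_line(rules, phrases, rng, max_slack=3):
    """Pick a random phrase matching the rules, loosening the syllable
    target one step at a time when nothing matches. Returns None if no
    phrase fits even at max_slack."""
    target = rules.get("syllables")
    keyword = rules.get("keyword")
    for slack in range(max_slack + 1):
        matches = [
            p for p in phrases
            if (target is None or abs(p["syllables"] - target) <= slack)
            and (keyword is None or keyword in p["text"])
        ]
        if matches:
            return rng.choice(matches)["text"]
    return None


def toy_poem(template, phrases, seed=None):
    """Build one line per template entry; only syllables and keyword
    rules are understood by this toy."""
    rng = random.Random(seed)
    return [pick_line(rules, phrases, rng) for rules in template]


haiku = [{"syllables": 5}, {"syllables": 7}, {"syllables": 5}]
for line in toy_poem(haiku, PHRASES):
    print(line)
```

With a store this small each haiku line has exactly one candidate, so the output never changes. With a real corpus there are many candidates per line, which is why repeated runs differ.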

my personal workflow is to build a highly thematic corpus (for example, thirty-one cyberpunk novels) and, for each poem, a custom template. I then run prosaic between five and twenty times, each time saving and discarding lines or whole stanzas. finally, I augment the piece with original lines and then clean up any grammar / pronoun agreement from what prosaic emitted. the end result is a human-computer collaborative work. you are, of course, welcome to use prosaic however you see fit.

developing

Patches are more than welcome if they come with tests. Tests should always be green in master; if not, please let me know! To run the tests:

# assuming you have pip install'd prosaic from source into an activated venv:
cd test
py.test

changelog

6.1.1
fix error handling; this was preventing sources from being made.
6.1.0
default to a system-wide nltk_data directory; won't download and install to ~ if found. the path is /usr/share/nltk_data. this is probably only useful on systems where prosaic is installed globally for multiple users (like on tilde.town).
not tied to a release, but the readme has database setup instructions now.
6.0.0
I guess I forgot to change-log 5.x, oops
process_text now takes a database config object as its first param, and a read()able thing instead of a string
parsing is faster but at the expense of less precision
slightly saner DB engine handling
4.0.0
Port to postgresql + sqlalchemy
Completely rewrite command line interface
Add a --verbose flag and muzzle the logging that used to happen unless it's present
Support a configuration file (~/.prosaic/prosaic.conf) for specifying database connections and default template
Rename some modules
Remove some vestigial features
3.5.4 - update nltk dependency so prosaic works on python 3.5
3.5.3 - mysterious release i don't know
3.5.2 - handle weird double escaping issues
3.5.1 - fix stupid typo
3.5.0 - prosaic now respects environment variables PROSAIC_DBNAME, PROSAIC_DBPORT and PROSAIC_DBHOST. These are used if not overridden from the command line. If neither environment variables nor CLI args are provided, static defaults are used (these are unchanged).
3.4.0 - flurry of improvements to text pre-processing which makes output much cleaner.
3.3.0 - blank rule; can now add blank lines to output for marking stanzas.
3.2.0 - alliteration support!
3.1.0 - can now install prosaic as a command line tool!! also docs!
3.0.0 - lateral port to python (sorry hy), but there are some breaking naming changes.
2.0.0 - shiny new CLI UI. run hy __init__.hy -h to see/explore the subcommands.
1.0.0 - it works
