
A simple text generation package, released on PyPI, that lets users easily explore the output of state-of-the-art NLP models published in the last few months.


ashmeet13/FilmyKeeda


FilmyKeeda

A simple text generation package that lets users easily explore the output of state-of-the-art NLP models published in the last few months.

The package currently supports two models -

  1. ULMFiT - From Fast.AI
  2. GPT2 - From OpenAI

Download -

pip install filmykeeda

Usage -

This repository currently contains four example scripts demonstrating how to use this package.

example0.py - Demonstrates how to download the corpus.

example1.py - Demonstrates how to generate a script using ULMFiT with the default tokenizer.

example2.py - Demonstrates how to generate a script using ULMFiT with the SentencePiece tokenizer.

example3.py - Demonstrates how to generate a script using GPT2.
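Both models generate a script the same basic way: given the text so far, the model predicts a probability distribution over the next token, one token is sampled, and the loop repeats. The sketch below shows that loop in miniature, assuming a hypothetical word-bigram model (`train_bigram`) as a stand-in for ULMFiT or GPT2, and a `temperature` parameter for controlling randomness. These helpers are illustrative only and are not part of the filmykeeda API.

```python
import math
import random
from collections import Counter, defaultdict


def train_bigram(text: str) -> dict[str, Counter]:
    """Count which word follows which (a toy stand-in for ULMFiT/GPT2)."""
    words = text.split()
    model: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model


def sample_next(counts: Counter, temperature: float, rng: random.Random) -> str:
    """Sample the next word; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    words = list(counts)
    # Scale log-counts by 1/temperature, then softmax (subtract the max for stability).
    logits = [math.log(counts[w]) / temperature for w in words]
    peak = max(logits)
    weights = [math.exp(logit - peak) for logit in logits]
    return rng.choices(words, weights=weights, k=1)[0]


def generate(model: dict[str, Counter], seed: str, n_words: int,
             temperature: float = 0.7, rng_seed: int = 0) -> str:
    """Autoregressive loop: repeatedly sample a continuation of the last word."""
    rng = random.Random(rng_seed)
    out = [seed]
    for _ in range(n_words):
        counts = model.get(out[-1])  # .get does not create empty entries
        if not counts:  # dead end: no observed continuation
            break
        out.append(sample_next(counts, temperature, rng))
    return " ".join(out)


# Usage on a tiny hypothetical corpus.
corpus = "what's next ? what's next is the debate . the debate is tonight ."
model = train_bigram(corpus)
print(generate(model, "what's", n_words=8))
```

Lowering `temperature` makes the output more predictable and repetitive; raising it makes it more varied.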

Data Used -

Both models were trained on an extensive corpus of scripts written by Aaron Sorkin. The works included in this project are -

  • The West Wing (Series)
  • The Social Network (Movie)
  • A Few Good Men (Movie)
  • The American President (Movie)

Model Details -

ULMFiT

The ULMFiT model was trained using the fastai library.

Because ULMFiT allows a model to be trained with a custom tokenizer,
this package includes two different ULMFiT models -

        1. Trained with default FastAI Tokenizer
        2. Trained with SentencePiece Tokenizer
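Roughly speaking, the default fastai tokenizer splits text into whole words (plus special marker tokens), while SentencePiece learns a subword vocabulary directly from the corpus, so rare words are broken into smaller, reusable pieces. SentencePiece's default algorithm is a unigram language model; the sketch below instead uses byte-pair encoding (BPE), a simpler subword method that SentencePiece also supports, purely to illustrate the idea. The functions `learn_bpe` and `encode` are hypothetical helpers, not part of either library.

```python
from collections import Counter


def _merge(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` with the fused symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn BPE merges: repeatedly fuse the most frequent adjacent symbol pair."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:  # every word is already a single symbol
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab: Counter = Counter()
        for symbols, freq in vocab.items():
            new_vocab[_merge(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Tokenize a word by applying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = _merge(symbols, pair)
    return list(symbols)


# Usage: learn merges on a few related words, then tokenize an unseen word.
words = "president presidential presidency press pressing".split()
merges = learn_bpe(words, num_merges=12)
print(encode("presidents", merges))  # unseen word, split into learned pieces
```

A word-level tokenizer would map an unseen word like "presidents" to a single unknown token, whereas a subword tokenizer can still represent it from known pieces.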

GPT2

The GPT2 model was trained using the gpt_2_simple library.

GPT2 does not allow an external tokenizer to be used, since the pretrained
model is tied to its own vocabulary; hence the model was simply fine-tuned on our corpus.
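gpt_2_simple fine-tunes on a single plain-text file, so the individual scripts have to be combined first. A minimal sketch, assuming each script is stored as its own UTF-8 text file and using the `<|startoftext|>` and `<|endoftext|>` delimiters that the gpt_2_simple documentation suggests for document-style datasets. Whether this project used those delimiters is an assumption, and `build_finetune_corpus` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path

# Delimiter convention assumed from the gpt_2_simple docs (not confirmed for this project).
START, END = "<|startoftext|>", "<|endoftext|>"


def build_finetune_corpus(script_paths: list[Path], out_path: Path) -> int:
    """Concatenate scripts into one text file, wrapping each in delimiters.

    Returns the number of non-empty scripts written.
    """
    count = 0
    with out_path.open("w", encoding="utf-8") as out:
        for path in script_paths:
            text = path.read_text(encoding="utf-8").strip()
            if not text:  # skip empty files
                continue
            out.write(f"{START}\n{text}\n{END}\n")
            count += 1
    return count


# Usage with two hypothetical script files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "the_american_president.txt").write_text("INT. OVAL OFFICE - NIGHT\n", encoding="utf-8")
    (tmp_dir / "empty.txt").write_text("", encoding="utf-8")
    corpus = tmp_dir / "sorkin_corpus.txt"
    n = build_finetune_corpus(sorted(tmp_dir.glob("*.txt")), corpus)
    print(n, "script(s) written")
    print(corpus.read_text(encoding="utf-8"))
```

The resulting file could then be passed as the `dataset` argument of `gpt_2_simple.finetune`.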

TODO -

  • Package this project as a pip library
  • Add evaluation metrics for generated scripts, such as ROUGE and perplexity
  • Clean up the ULMFiT-generated scripts
  • Allow users to train their own models
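Perplexity is the exponential of the average negative log-likelihood per token, so a model that is uniformly unsure among 4 tokens has a perplexity of 4; ROUGE-1 measures unigram overlap between a generated script and a reference. A minimal sketch of both, assuming whitespace tokenization and that per-token natural-log probabilities are already available from whichever model is being evaluated (obtaining them from ULMFiT or GPT2 is not shown). Full ROUGE implementations add options such as stemming and variants such as ROUGE-L; this shows only ROUGE-1, and the function names are hypothetical.

```python
import math
from collections import Counter


def perplexity(token_log_probs: list[float]) -> float:
    """exp of the average negative log-likelihood (natural log) per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def rouge_1(candidate: str, reference: str) -> dict[str, float]:
    """ROUGE-1 precision, recall and F1 from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # each word counted at most min(cand, ref) times
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Usage with made-up inputs.
print(perplexity([math.log(0.5), math.log(0.25)]))
print(rouge_1("the debate is tonight", "the debate is tomorrow night"))
```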
