This project creates a term-document tensor from a directory of files. The tensor can then be decomposed and its factor matrices clustered.
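This README does not spell out how the third mode of the tensor is formed. The following is a minimal sketch that assumes each document contributes one terms x terms slice of adjacent-word-pair (bigram) counts, which gives a documents x terms x terms tensor. The function name build_tensor and the lowercase whitespace tokenization are hypothetical and are not taken from vx.py.

```python
from collections import Counter
from typing import Dict, List, Tuple

Tensor = List[List[List[int]]]


def build_tensor(docs: List[str]) -> Tuple[Tensor, Dict[str, int]]:
    """Build a (documents x terms x terms) tensor of adjacent-word-pair counts.

    Hypothetical construction for illustration only; vx.py may differ.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # Shared vocabulary: every distinct term across all documents, sorted.
    terms = sorted({t for toks in tokenized for t in toks})
    vocab = {term: i for i, term in enumerate(terms)}
    n = len(vocab)
    # One n x n slice per document, initialised to zero.
    tensor = [[[0] * n for _ in range(n)] for _ in docs]
    for d, toks in enumerate(tokenized):
        # Count each adjacent pair (bigram) and store it in slice d.
        for (a, b), count in Counter(zip(toks, toks[1:])).items():
            tensor[d][vocab[a]][vocab[b]] = count
    return tensor, vocab


if __name__ == "__main__":
    tensor, vocab = build_tensor(["the cat sat", "the cat ran the cat"])
    print(vocab)
    print(tensor[1][vocab["the"]][vocab["cat"]])
```

A sparse representation (for example, a dict keyed by index triples) would scale better for real corpora. Nested lists keep the sketch dependency-free.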
Requirements
- Python 3.6
These instructions should work on macOS and Linux.
First, make sure Python 3.6 and pip are installed. Then install virtualenv:
pip install virtualenv
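Then create your virtualenv with one of the following commands (use the second one if your default python is Python 2, so that the environment uses Python 3), and activate it: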
virtualenv venv
virtualenv -p python3 venv
source venv/bin/activate
Next, install the project requirements and the tf-decompose package:
pip install -r requirements.txt
pip install git+https://github.com/MaxMcGlinnPoole/tf-decompose.git
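Before running vx.py, you can use a quick standard-library check to confirm that the interpreter is recent enough and that a virtualenv is active. This is a sketch. check_environment is a hypothetical helper name, and it does not verify that the packages above installed correctly.

```python
import sys
from typing import List, Tuple


def check_environment(min_version: Tuple[int, int] = (3, 6)) -> List[str]:
    """Return a list of setup problems; an empty list means none were found."""
    problems = []
    current = tuple(sys.version_info[:2])
    if current < min_version:
        problems.append(
            f"Python {min_version[0]}.{min_version[1]}+ required, "
            f"found {current[0]}.{current[1]}"
        )
    # In an active environment sys.prefix points at the env directory, while the
    # base interpreter's prefix is in sys.base_prefix (venv and virtualenv >= 20)
    # or sys.real_prefix (older virtualenv releases).
    base = getattr(sys, "real_prefix", getattr(sys, "base_prefix", sys.prefix))
    if base == sys.prefix:
        problems.append("no virtual environment is active")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks OK")
```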
The program currently accepts the following arguments and options:
vx.py [-h] [-d DIRECTORY_NAME] [CLUSTERING OPTIONS] (-b | -t) (-ngrams)
-d or --directory: name of the directory containing the files to parse
-b or -t: parse the files as binary or text files, respectively
-parafac or -tucker: decompose the tensor with PARAFAC or Tucker decomposition (more methods may be added later)
-o: generate output (work in progress)
-ngrams: use n-grams when building the term-document tensor
-heatmap: generate a heatmap of the cosine similarity matrix
-kmeans: run k-means clustering on one of the factor matrices
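The option list above maps naturally onto Python's argparse. The following is a hypothetical reconstruction, not vx.py's actual parser. Treating -o and -ngrams as simple on/off flags is an assumption, and the real options may take values. argparse accepts single-dash multi-character options such as -heatmap, and it matches them exactly before it considers -h or -t.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not vx.py's code)."""
    parser = argparse.ArgumentParser(prog="vx.py")
    parser.add_argument("-d", "--directory", help="directory containing the files to parse")
    # Exactly one of -b / -t must be given.
    file_type = parser.add_mutually_exclusive_group(required=True)
    file_type.add_argument("-b", dest="binary", action="store_true", help="parse files as binary")
    file_type.add_argument("-t", dest="text", action="store_true", help="parse files as text")
    # At most one decomposition method.
    method = parser.add_mutually_exclusive_group()
    method.add_argument("-parafac", action="store_true", help="PARAFAC decomposition")
    method.add_argument("-tucker", action="store_true", help="Tucker decomposition")
    # Assumed to be plain flags; the real options may take arguments.
    parser.add_argument("-o", dest="output", action="store_true", help="generate output")
    parser.add_argument("-ngrams", action="store_true", help="use n-grams")
    parser.add_argument("-heatmap", action="store_true", help="plot cosine similarity heatmap")
    parser.add_argument("-kmeans", action="store_true", help="k-means on a factor matrix")
    return parser


if __name__ == "__main__":
    # Parse the sample command line from this README rather than sys.argv.
    print(build_parser().parse_args(["-d", "myDirectory", "-heatmap", "-b"]))
```

Mutually exclusive groups let argparse itself reject invalid combinations such as passing both -b and -t.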
Sample usage (parse the files in myDirectory as binary files and generate a cosine similarity heatmap):
python3 vx.py -d myDirectory -heatmap -b
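The -heatmap option plots a cosine similarity matrix. This README does not say which vectors are compared. The sketch below assumes one row vector per document, such as a row of a document factor matrix. The function name cosine_similarity_matrix is hypothetical, and the plotting step (for example, with matplotlib) is left out.

```python
import math
from typing import List, Sequence


def cosine_similarity_matrix(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarity between row vectors (one row per document)."""
    norms = [math.sqrt(sum(x * x for x in r)) for r in rows]
    sim = []
    for i, a in enumerate(rows):
        row = []
        for j, b in enumerate(rows):
            dot = sum(x * y for x, y in zip(a, b))
            # Similarity is defined as 0.0 when either vector is all zeros.
            row.append(dot / (norms[i] * norms[j]) if norms[i] and norms[j] else 0.0)
        sim.append(row)
    return sim


if __name__ == "__main__":
    for row in cosine_similarity_matrix([[1, 0], [1, 1], [0, 2]]):
        print(["%.3f" % v for v in row])
```

The result is symmetric with ones on the diagonal (for non-zero rows), which is what a similarity heatmap displays.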
Contributing
- Make a local clone:
git clone https://github.com/MaxMcGlinnPoole/TermDocumentTensor.git
Choose your own destination path: the clone is created inside the directory where you run this command.
- Switch to the directory:
cd TermDocumentTensor
- Create your new branch:
git checkout -b <branch-name>
- Make necessary changes to this source code
- Stage your changes in the git index:
git add --all .
- Commit your changes:
git commit -am 'update description'
- Push to the branch:
git push -u origin <branch-name>
- Submit a new pull request
Meet our research team
- Special thanks to Dr. Charles Nicholas for his mentorship on this research project
- Dr. Tyler Simon, whose knowledge of the subject has been especially helpful
- Tamara G. Kolda and Brett W. Bader for their paper on tensor decomposition
- The National Science Foundation (NSF)