Digital Analytics, Causal Knowledge Acquisition and Reasoning for Technical Language Processing
extensions = ['sphinx.ext.intersphinx',
    "nbsphinx", # <- For Jupyter Notebook support
    "sphinx.ext.napoleon", # <- For Google style docstrings
]
templates_path = ['_templates']
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
source_suffix = [".rst", ".md"]
autoapi_dirs = ['../src']
import sphinx_rtd_theme
html_theme = 'sphinx_rtd_theme'
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
# -- NBSphinx options
# Do not execute the notebooks when building the docs
nbsphinx_execute = "never"
autodoc_inherit_docstrings = False
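The `autoapi_dirs` setting above is read by sphinx-autoapi only when its extension is enabled. A minimal sketch of a fuller extension list follows; the last two entries do not appear in the configuration shown here and are assumptions based on the packages installed for the docs build.

```python
# Sketch of conf.py extension settings. The first three entries come from the
# configuration above; the last two are assumptions, not confirmed project settings.
extensions = [
    "sphinx.ext.intersphinx",
    "nbsphinx",             # Jupyter Notebook support
    "sphinx.ext.napoleon",  # Google style docstrings
    "autoapi.extension",    # assumption: enables sphinx-autoapi so autoapi_dirs is used
    "sphinx_copybutton",    # assumption: enables the sphinx-copybutton package
]

# Directory scanned by sphinx-autoapi, relative to the docs folder.
autoapi_dirs = ["../src"]
```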
pip install sphinx sphinx_rtd_theme nbsphinx sphinx-copybutton sphinx-autoapi
conda install pandoc
cd docs
make html
cd _build/html
python3 -m http.server
Open your browser to: http://localhost:8000
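The serving step can also be done from a script without changing directories. This is a minimal sketch using only the standard library; the helper names `make_handler` and `serve` are hypothetical, and the default path assumes the script is run from the repository root.

```python
import functools
import http.server
from pathlib import Path


def make_handler(html_dir="docs/_build/html"):
    """Return a request handler bound to the built HTML directory."""
    return functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(Path(html_dir))
    )


def serve(html_dir="docs/_build/html", port=8000):
    """Serve the built docs on localhost until interrupted with Ctrl+C."""
    address = ("localhost", port)
    with http.server.ThreadingHTTPServer(address, make_handler(html_dir)) as httpd:
        httpd.serve_forever()
```

Calling `serve()` blocks until interrupted, just like `python3 -m http.server`.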
- Install dependency libraries
conda create -n dackar_libs python=3.11
conda activate dackar_libs
pip install spacy==3.5 textacy matplotlib nltk coreferee beautifulsoup4 networkx pysbd tomli numerizer autocorrect pywsd openpyxl quantulum3[classifier] numpy==1.26 scikit-learn pyspellchecker contextualSpellCheck pandas
- Download language model from spacy (this does not work on the INL network)
python -m spacy download en_core_web_lg
python -m coreferee install en
- Install required nltk data for similarity analysis
python -m nltk.downloader all
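After these steps it can help to confirm that the key packages are visible to the active environment. The following is a minimal sketch, not part of DACKAR itself: `find_missing` is a hypothetical helper, and the table lists only a subset of the packages above, mapped to their import names.

```python
import importlib.util

# pip distribution name -> importable module name (subset of the list above).
REQUIRED = {
    "spacy": "spacy",
    "nltk": "nltk",
    "coreferee": "coreferee",
    "beautifulsoup4": "bs4",
    "scikit-learn": "sklearn",
    "en_core_web_lg": "en_core_web_lg",
}


def find_missing(required=REQUIRED):
    """Return the pip names whose module cannot be located in this environment."""
    return sorted(
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = find_missing()
    print("All listed packages found." if not missing else "Missing: " + ", ".join(missing))
```

The check only locates the modules; it does not import them, so it will not catch a package that is installed but broken.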
- Download language model from spacy
Download en_core_web_lg-3.5.0-py3-none-any.whl from
python -m pip install ./en_core_web_lg-3.5.0-py3-none-any.whl
- Download coreferee model:
Download from
python -m pip install ./
- Run script DACKAR/ to download nltk data:
Alternatively, see installing_nltk_data_ for how to install nltk data manually. For this project, users can also try the following steps:
cd ~
mkdir nltk_data
cd nltk_data
mkdir corpora
mkdir taggers
mkdir tokenizers
Download wordnet, averaged_perceptron_tagger, and punkt
cp -r wordnet ~/nltk_data/corpora/
cp -r averaged_perceptron_tagger ~/nltk_data/taggers/
cp -r punkt ~/nltk_data/tokenizers
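The manual steps above can be sketched as a small Python script. It assumes the three resources have already been downloaded and unpacked into one folder; the function name `install_nltk_resources` and its arguments are hypothetical.

```python
import shutil
from pathlib import Path

# Resource folder name -> nltk_data subdirectory, as in the steps above.
SUBDIRS = {
    "wordnet": "corpora",
    "averaged_perceptron_tagger": "taggers",
    "punkt": "tokenizers",
}


def install_nltk_resources(download_dir, nltk_data=None):
    """Copy unpacked resources from download_dir into the nltk_data layout.

    Returns the names of the resources that were found and copied.
    """
    nltk_data = Path(nltk_data) if nltk_data else Path.home() / "nltk_data"
    installed = []
    for resource, subdir in SUBDIRS.items():
        src = Path(download_dir) / resource
        dest = nltk_data / subdir / resource
        # Create corpora/, taggers/ and tokenizers/ even if a resource is absent.
        dest.parent.mkdir(parents=True, exist_ok=True)
        if src.is_dir():
            shutil.copytree(src, dest, dirs_exist_ok=True)  # needs Python 3.8+
            installed.append(resource)
    return installed
```

For example, `install_nltk_resources(".")` copies whichever of the three folders exist in the current directory into `~/nltk_data`.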
- Install dependency libraries
conda create -n nlp_libs python=3.9
conda activate nlp_libs
pip install spacy==3.1 textacy matplotlib nltk coreferee beautifulsoup4 networkx pysbd tomli numerizer autocorrect pywsd openpyxl quantulum3[classifier] numpy==1.26 scikit-learn==1.2.2 pyspellchecker
scikit-learn 1.2.2 is required for quantulum3
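Because this setup depends on specific versions, a quick check of what is actually installed can save debugging time. The sketch below compares installed versions with the pins above; `release`, `matches_pin`, and `check_pins` are hypothetical helpers, and the comparison is a simplification that looks only at the numeric release part (so `1.26` and `1.26.0` count as equal, while `1.26.4` does not).

```python
import re
from importlib import metadata

# Pins taken from the pip command above.
PINS = {"spacy": "3.1", "numpy": "1.26", "scikit-learn": "1.2.2"}


def release(version):
    """Numeric release tuple with trailing zeros dropped, e.g. '1.26.0' -> (1, 26)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    parts = [int(p) for p in match.group().split(".")] if match else []
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def matches_pin(installed, pin):
    """True when the installed version equals the pin, ignoring trailing zeros."""
    return release(installed) == release(pin)


def check_pins(pins=PINS):
    """Map each package to (installed version or None, whether it matches the pin)."""
    report = {}
    for name, pin in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (installed, installed is not None and matches_pin(installed, pin))
    return report
```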
- You may need to install stemming for some unit parsing
pip install stemming
- Windows machines have an issue with pydantic (see explosion/spaCy#12659)
Install typing_extensions<4.6 as a workaround:
pip install typing_extensions==4.5.*
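To see whether the workaround applies to a given machine, the installed version can be checked against the 4.6 bound. This is a minimal sketch; `needs_pin` and `check_typing_extensions` are hypothetical helpers, and the only rule encoded is the one stated above (Windows with typing_extensions 4.6 or newer).

```python
import platform
from importlib import metadata


def needs_pin(version, system):
    """True on Windows when typing_extensions is at version 4.6 or newer."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return system == "Windows" and (major, minor) >= (4, 6)


def check_typing_extensions():
    """Apply needs_pin to the current machine; False if the package is absent."""
    try:
        version = metadata.version("typing_extensions")
    except metadata.PackageNotFoundError:
        return False
    return needs_pin(version, platform.system())
```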
- Required libraries and nltk data for similarity analysis
conda install -c conda-forge pandas
python -m nltk.downloader all
- Required library for preprocessing
pip install contextualSpellCheck