Merge pull request #1 from RaRe-Technologies/develop
Fix links & spaces in Quick start guide (piskvorky#1500)
VorontsovIE authored Jul 24, 2017
2 parents 12f36b5 + da383bf commit 4be1b7b
Showing 2 changed files with 73 additions and 23 deletions.
4 changes: 2 additions & 2 deletions docs/notebooks/Tensorboard_visualizations.ipynb
@@ -954,7 +954,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can refer to [this notebook](https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/lda_training_tips.ipynb) also before training the LDA model. It contains tips and suggestions for pre-processing the text data, and how to train the LDA model to get good results."
"You can refer to [this notebook](lda_training_tips.ipynb) also before training the LDA model. It contains tips and suggestions for pre-processing the text data, and how to train the LDA model to get good results."
]
},
{
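The relative links introduced in this commit resolve against whatever URL the notebook is rendered from. A minimal check with Python's standard `urllib.parse.urljoin`, assuming the notebooks are viewed on GitHub on the `develop` branch at the base URLs below (these base URLs are assumptions, modeled on the absolute links this commit removes), shows that the new `lda_training_tips.ipynb` link, and the `Topics_and_Transformations.ipynb` link changed further down in the Quick Start notebook, resolve to the same targets as the old absolute URLs:

```python
from urllib.parse import urljoin

# Assumed GitHub URLs of the two notebooks containing the edited links
# (modeled on the absolute links removed by this commit).
REPO = "https://github.com/RaRe-Technologies/gensim/blob/develop/"
tensorboard_nb = REPO + "docs/notebooks/Tensorboard_visualizations.ipynb"
quick_start_nb = REPO + "gensim%20Quick%20Start.ipynb"

# Resolve the relative targets exactly as written in the new notebooks.
lda_link = urljoin(tensorboard_nb, "lda_training_tips.ipynb")
topics_link = urljoin(quick_start_nb, "docs/notebooks/Topics_and_Transformations.ipynb")

print(lda_link)
print(topics_link)
```

Because the new links no longer hard-code the `develop` branch, they keep pointing at the matching file when the notebook is read from another branch, a tag, or a local checkout.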
@@ -1274,7 +1274,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
"version": "3.6.2"
}
},
"nbformat": 4,
92 changes: 71 additions & 21 deletions gensim Quick Start.ipynb
@@ -151,7 +151,7 @@
],
"source": [
"# Create a set of frequent words\n",
"stoplist = set('for a of the and to in'.split(' '))\n",
"stoplist = set('for a of the and to in'.split())\n",
"# Lowercase each document, split it by white space and filter out stopwords\n",
"texts = [[word for word in document.lower().split() if word not in stoplist]\n",
" for document in raw_corpus]\n",
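The `split(' ')` to `split()` change only matters when the stopword string contains something other than single spaces. `str.split(' ')` splits on every individual space, so a run of spaces yields empty strings and tabs or newlines are not treated as separators. `str.split()` with no argument splits on any run of whitespace and drops empty strings. A quick comparison (the irregular string is a made-up example):

```python
# The stoplist literal used in the notebook: both forms agree on it.
stop_words = 'for a of the and to in'
same = set(stop_words.split(' ')) == set(stop_words.split())
print(same)  # True

# A hypothetical string with a double space and a tab.
messy = 'for  a\tof'
print(messy.split(' '))  # ['for', '', 'a\tof']
print(messy.split())     # ['for', 'a', 'of']
```

So the edit does not change the notebook's output, but makes the stoplist robust to stray whitespace.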
@@ -178,13 +178,45 @@
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import logging"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"logging.basicConfig(level=logging.DEBUG)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:gensim.models.doc2vec:Slow version of gensim.models.doc2vec is being used\n",
"INFO:summa.preprocessing.cleaner:'pattern' package not found; tag filters are not available for English\n",
"INFO:gensim.corpora.dictionary:adding document #0 to Dictionary(0 unique tokens: [])\n",
"INFO:gensim.corpora.dictionary:built Dictionary(12 unique tokens: ['human', 'interface', 'computer', 'survey', 'user']...) from 9 documents (total 29 corpus positions)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dictionary(12 unique tokens: [u'minors', u'graph', u'system', u'trees', u'eps']...)\n"
"Dictionary(12 unique tokens: ['human', 'interface', 'computer', 'survey', 'user']...)\n"
]
}
],
@@ -215,14 +247,14 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{u'minors': 11, u'graph': 10, u'system': 6, u'trees': 9, u'eps': 8, u'computer': 1, u'survey': 5, u'user': 7, u'human': 2, u'time': 4, u'interface': 0, u'response': 3}\n"
"{'human': 0, 'interface': 1, 'computer': 2, 'survey': 3, 'user': 4, 'system': 5, 'response': 6, 'time': 7, 'eps': 8, 'trees': 9, 'graph': 10, 'minors': 11}\n"
]
}
],
@@ -239,16 +271,16 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(1, 1), (2, 1)]"
"[(0, 1), (2, 1)]"
]
},
"execution_count": 5,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
@@ -277,24 +309,24 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[[(0, 1), (1, 1), (2, 1)],\n",
" [(1, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)],\n",
" [(0, 1), (6, 1), (7, 1), (8, 1)],\n",
" [(2, 1), (6, 2), (8, 1)],\n",
" [(3, 1), (4, 1), (7, 1)],\n",
" [(2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)],\n",
" [(1, 1), (4, 1), (5, 1), (8, 1)],\n",
" [(0, 1), (5, 2), (8, 1)],\n",
" [(4, 1), (6, 1), (7, 1)],\n",
" [(9, 1)],\n",
" [(9, 1), (10, 1)],\n",
" [(9, 1), (10, 1), (11, 1)],\n",
" [(5, 1), (10, 1), (11, 1)]]"
" [(3, 1), (10, 1), (11, 1)]]"
]
},
"execution_count": 6,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
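Re-running the notebook under a Python 3 kernel changes two things in these outputs: the Python 2 `u''` string prefixes disappear, and the token IDs now follow the order in which each token first appears in the corpus (the old Python 2 outputs assigned them in a different order). The sketch below does not use gensim; it is a hypothetical pure-Python reimplementation that gives each unseen token the next free ID and counts tokens per document. The token lists are reconstructed from the bag-of-words output above; the order of tokens inside each document is an assumption. Under those assumptions it reproduces the new `token2id` mapping and corpus vectors:

```python
from collections import Counter

# Tokenized corpus reconstructed from the bag-of-words output in this diff
# (token order within each document is an assumption).
texts = [
    ['human', 'interface', 'computer'],
    ['survey', 'user', 'computer', 'system', 'response', 'time'],
    ['eps', 'user', 'interface', 'system'],
    ['system', 'human', 'system', 'eps'],
    ['user', 'response', 'time'],
    ['trees'],
    ['graph', 'trees'],
    ['graph', 'minors', 'trees'],
    ['graph', 'minors', 'survey'],
]

def build_token2id(docs):
    """Hypothetical helper: give each unseen token the next free integer ID."""
    token2id = {}
    for doc in docs:
        for token in doc:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id

def to_bow(doc, token2id):
    """Hypothetical helper: sorted (token_id, count) pairs, skipping unknown tokens."""
    counts = Counter(token2id[t] for t in doc if t in token2id)
    return sorted(counts.items())

token2id = build_token2id(texts)
bow_corpus = [to_bow(doc, token2id) for doc in texts]
print(token2id)
print(bow_corpus[3])  # [(0, 1), (5, 2), (8, 1)]
```

Since Python 3.7 dicts preserve insertion order, so printing `token2id` lists tokens in ID order, as in the new output. A real `gensim.corpora.Dictionary` tracks more state than this; the sketch only mirrors the ID assignment visible in the diff.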
@@ -321,16 +353,34 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:gensim.models.tfidfmodel:collecting document frequencies\n",
"INFO:gensim.models.tfidfmodel:PROGRESS: processing document #0\n",
"INFO:gensim.models.tfidfmodel:calculating IDF weights for 9 documents and 11 features (28 matrix non-zeros)\n"
]
},
{
"data": {
"text/plain": [
"[(6, 0.5898341626740045), (11, 0.8075244024440723)]"
"[(5, 0.5898341626740045), (11, 0.8075244024440723)]"
]
},
"execution_count": 7,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
@@ -349,7 +399,7 @@
"source": [
"The `tfidf` model again returns a list of tuples, where the first entry is the token ID and the second entry is the tf-idf weighting. Note that the ID corresponding to \"system\" (which occurred 4 times in the original corpus) has been weighted lower than the ID corresponding to \"minors\" (which only occurred twice).\n",
"\n",
"`gensim` offers a number of different models/transformations. See [Transformations and Topics](https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/Topics_and_Transformations.ipynb) for details."
"`gensim` offers a number of different models/transformations. See [Transformations and Topics](docs/notebooks/Topics_and_Transformations.ipynb) for details."
]
},
{
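The two weights in the new tf-idf output can be reproduced by hand. This sketch assumes gensim's default `TfidfModel` weighting (raw term count times log2(N / df), followed by L2 normalization of the vector) and a query containing "system" and "minors" once each. The document frequencies are read off the bag-of-words corpus above: `system` (ID 5) appears in 3 of the 9 documents and `minors` (ID 11) in 2. Note that the inverse document frequency depends on how many documents contain a token, not on its total number of occurrences:

```python
import math

N_DOCS = 9                     # documents in the corpus
doc_freq = {5: 3, 11: 2}       # token ID -> number of documents containing it
query_bow = [(5, 1), (11, 1)]  # assumed query: "system" and "minors", once each

def tfidf(bow, doc_freq, n_docs):
    """Sketch of tf-idf: raw count times log2(N/df), then L2-normalize."""
    weighted = [(tid, tf * math.log2(n_docs / doc_freq[tid])) for tid, tf in bow]
    norm = math.sqrt(sum(w * w for _, w in weighted))
    return [(tid, w / norm) for tid, w in weighted]

result = tfidf(query_bow, doc_freq, N_DOCS)
print(result)  # approximately [(5, 0.5898...), (11, 0.8075...)]
```

`system` ends up with the lower weight because it occurs in more documents (3 versus 2), which is the effect the notebook text describes.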
@@ -367,9 +417,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"display_name": "Python 3",
"language": "python",
"name": "python2"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -381,7 +431,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
"version": "3.6.2"
}
},
"nbformat": 4,