Merge pull request #1 from RaRe-Technologies/develop

Fix links & spaces in Quick start guide (piskvorky#1500)
VorontsovIE · Jul 24, 2017 · 4be1b7b
2 parents 12f36b5 + da383bf

Showing 2 changed files with 73 additions and 23 deletions.
diff --git a/docs/notebooks/Tensorboard_visualizations.ipynb b/docs/notebooks/Tensorboard_visualizations.ipynb
@@ -954,7 +954,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "You can refer to [this notebook](https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/lda_training_tips.ipynb) also before training the LDA model. It contains tips and suggestions for pre-processing the text data, and how to train the LDA model to get good results."
+    "You can refer to [this notebook](lda_training_tips.ipynb) also before training the LDA model. It contains tips and suggestions for pre-processing the text data, and how to train the LDA model to get good results."
    ]
   },
   {
@@ -1274,7 +1274,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.4.3"
+   "version": "3.6.2"
   }
  },
  "nbformat": 4,

diff --git a/gensim Quick Start.ipynb b/gensim Quick Start.ipynb
@@ -151,7 +151,7 @@
    ],
    "source": [
     "# Create a set of frequent words\n",
-    "stoplist = set('for a of the and to in'.split(' '))\n",
+    "stoplist = set('for a of the and to in'.split())\n",
     "# Lowercase each document, split it by white space and filter out stopwords\n",
     "texts = [[word for word in document.lower().split() if word not in stoplist]\n",
     "         for document in raw_corpus]\n",
@@ -178,13 +178,45 @@
   {
    "cell_type": "code",
    "execution_count": 3,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "import logging"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "logging.basicConfig(level=logging.DEBUG)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
    "metadata": {},
    "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "WARNING:gensim.models.doc2vec:Slow version of gensim.models.doc2vec is being used\n",
+      "INFO:summa.preprocessing.cleaner:'pattern' package not found; tag filters are not available for English\n",
+      "INFO:gensim.corpora.dictionary:adding document #0 to Dictionary(0 unique tokens: [])\n",
+      "INFO:gensim.corpora.dictionary:built Dictionary(12 unique tokens: ['human', 'interface', 'computer', 'survey', 'user']...) from 9 documents (total 29 corpus positions)\n"
+     ]
+    },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Dictionary(12 unique tokens: [u'minors', u'graph', u'system', u'trees', u'eps']...)\n"
+      "Dictionary(12 unique tokens: ['human', 'interface', 'computer', 'survey', 'user']...)\n"
      ]
     }
    ],
@@ -215,14 +247,14 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 6,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "{u'minors': 11, u'graph': 10, u'system': 6, u'trees': 9, u'eps': 8, u'computer': 1, u'survey': 5, u'user': 7, u'human': 2, u'time': 4, u'interface': 0, u'response': 3}\n"
+      "{'human': 0, 'interface': 1, 'computer': 2, 'survey': 3, 'user': 4, 'system': 5, 'response': 6, 'time': 7, 'eps': 8, 'trees': 9, 'graph': 10, 'minors': 11}\n"
      ]
     }
    ],
@@ -239,16 +271,16 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 7,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "[(1, 1), (2, 1)]"
+       "[(0, 1), (2, 1)]"
       ]
      },
-     "execution_count": 5,
+     "execution_count": 7,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -277,24 +309,24 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 8,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
        "[[(0, 1), (1, 1), (2, 1)],\n",
-       " [(1, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)],\n",
-       " [(0, 1), (6, 1), (7, 1), (8, 1)],\n",
-       " [(2, 1), (6, 2), (8, 1)],\n",
-       " [(3, 1), (4, 1), (7, 1)],\n",
+       " [(2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)],\n",
+       " [(1, 1), (4, 1), (5, 1), (8, 1)],\n",
+       " [(0, 1), (5, 2), (8, 1)],\n",
+       " [(4, 1), (6, 1), (7, 1)],\n",
        " [(9, 1)],\n",
        " [(9, 1), (10, 1)],\n",
        " [(9, 1), (10, 1), (11, 1)],\n",
-       " [(5, 1), (10, 1), (11, 1)]]"
+       " [(3, 1), (10, 1), (11, 1)]]"
       ]
      },
-     "execution_count": 6,
+     "execution_count": 8,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -321,16 +353,34 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
    "metadata": {},
    "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:gensim.models.tfidfmodel:collecting document frequencies\n",
+      "INFO:gensim.models.tfidfmodel:PROGRESS: processing document #0\n",
+      "INFO:gensim.models.tfidfmodel:calculating IDF weights for 9 documents and 11 features (28 matrix non-zeros)\n"
+     ]
+    },
     {
      "data": {
       "text/plain": [
-       "[(6, 0.5898341626740045), (11, 0.8075244024440723)]"
+       "[(5, 0.5898341626740045), (11, 0.8075244024440723)]"
       ]
      },
-     "execution_count": 7,
+     "execution_count": 9,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -349,7 +399,7 @@
    "source": [
     "The `tfidf` model again returns a list of tuples, where the first entry is the token ID and the second entry is the tf-idf weighting. Note that the ID corresponding to \"system\" (which occurred 4 times in the original corpus) has been weighted lower than the ID corresponding to \"minors\" (which only occurred twice).\n",
     "\n",
-    "`gensim` offers a number of different models/transformations. See [Transformations and Topics](https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/Topics_and_Transformations.ipynb) for details."
+    "`gensim` offers a number of different models/transformations. See [Transformations and Topics](docs/notebooks/Topics_and_Transformations.ipynb) for details."
    ]
   },
   {
@@ -367,9 +417,9 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 2",
+   "display_name": "Python 3",
    "language": "python",
-   "name": "python2"
+   "name": "python3"
   },
   "language_info": {
    "codemirror_mode": {
@@ -381,7 +431,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.1"
+   "version": "3.6.2"
   }
  },
  "nbformat": 4,
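
Review note: the tf-idf weights in the updated notebook output can be sanity-checked by hand. The sketch below is not gensim's implementation. It assumes the weighting is the raw term count times log2(N / df), followed by L2 normalization, and that the query is the bag-of-words for the tokens "system" (ID 5) and "minors" (ID 11) under the new token IDs. The function names are illustrative only.

```python
import math

# Bag-of-words corpus copied from the notebook output in the diff above.
corpus = [
    [(0, 1), (1, 1), (2, 1)],
    [(2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)],
    [(1, 1), (4, 1), (5, 1), (8, 1)],
    [(0, 1), (5, 2), (8, 1)],
    [(4, 1), (6, 1), (7, 1)],
    [(9, 1)],
    [(9, 1), (10, 1)],
    [(9, 1), (10, 1), (11, 1)],
    [(3, 1), (10, 1), (11, 1)],
]


def document_frequencies(bow_corpus):
    """Count, for each token ID, how many documents contain it."""
    df = {}
    for doc in bow_corpus:
        for token_id, _count in doc:
            df[token_id] = df.get(token_id, 0) + 1
    return df


def tfidf(bow, df, num_docs):
    """Weight each (token_id, count) pair by count * log2(N / df), then L2-normalize.

    Assumed formula, chosen because it reproduces the notebook's numbers.
    """
    weights = [(tid, count * math.log2(num_docs / df[tid])) for tid, count in bow]
    norm = math.sqrt(sum(w * w for _tid, w in weights))
    if norm == 0:
        return []
    return [(tid, w / norm) for tid, w in weights]


df = document_frequencies(corpus)
query = [(5, 1), (11, 1)]  # "system" and "minors" under the new token IDs
weights = tfidf(query, df, len(corpus))
print(weights)
```

Under these assumptions "system" appears in 3 of the 9 documents and "minors" in 2, so "minors" gets the larger weight, which agrees with the `[(5, 0.589...), (11, 0.807...)]` output recorded in the diff.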