Segmentation fault using build_vocab(..., update=True) for Doc2Vec #1019

Open
danoneata opened this issue Nov 14, 2016 · 28 comments
Labels
bug Issue described a bug difficulty medium Medium issue: required good gensim understanding & python skills feature Issue described a new feature

Comments

@danoneata

Hello!

I'm performing online learning for Doc2Vec: I learn an initial model on a set of tagged documents and then try to update the model on a new set of tagged documents. If the second set contains new tags (tags that were not present in the initial set of documents), then I usually get a segmentation fault (this behavior is not deterministic, but it happens most of the time).

Below is a toy example that reproduces the issue. I'm using Python 3.4.3 and Gensim 0.13.3.

I've debugged with gdb and I've got the following output:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff9a4f8700 (LWP 29422)]
__pyx_f_6gensim_6models_13doc2vec_inner_fast_document_dm_hs (__pyx_v_learn_hidden=1, __pyx_v_size=300, __pyx_v_work=0x7fff80001480, __pyx_v_alpha=0.0250000004, __pyx_v_syn1=0x1693ce0, __pyx_v_neu1=0x7fff80001a00, __pyx_v_word_code_len=6,
    __pyx_v_word_code=<optimized out>, __pyx_v_word_point=0x13fe410) at ./gensim/models/doc2vec_inner.c:2078

I'm willing to help fix this issue if someone can provide me with some guidance. Thanks!

Sample code that reproduces the issue:

import logging

from gensim.models.doc2vec import (
    Doc2Vec,
    TaggedDocument,
)

logging.basicConfig(
    format='%(asctime)s : %(threadName)s : %(levelname)s : %(message)s',
    level=logging.DEBUG,
)


def to_str(d):
    return ", ".join(d.keys())


SENTS = [
    "anecdotal using a personal experience or an isolated example instead of a sound argument or compelling evidence",
    "plausible thinking that just because something is plausible means that it is true",
    "occam razor is used as a heuristic technique discovery tool to guide scientists in the development of theoretical models rather than as an arbiter between published models",
    "karl popper argues that a preference for simple theories need not appeal to practical or aesthetic considerations",
    "the successful prediction of a stock future price could yield significant profit",
]

SENTS = [s.split() for s in SENTS]


def main():
    sentences_1 = [
        TaggedDocument(SENTS[0], tags=['SENT_0']),
        TaggedDocument(SENTS[1], tags=['SENT_0']),
        TaggedDocument(SENTS[2], tags=['SENT_1']),
    ]

    # the second batch reuses tag SENT_1 and introduces the new tag SENT_2
    sentences_2 = [
        TaggedDocument(SENTS[3], tags=['SENT_1']),
        TaggedDocument(SENTS[4], tags=['SENT_2']),
    ]

    model = Doc2Vec(min_count=1, workers=1)

    model.build_vocab(sentences_1)
    model.train(sentences_1)

    print("-- Base model")
    print("Vocabulary:", to_str(model.vocab))
    print("Tags:", to_str(model.docvecs.doctags))

    model.build_vocab(sentences_2, update=True)
    model.train(sentences_2)

    print("-- Updated model")
    print("Vocabulary:", to_str(model.vocab))
    print("Tags:", to_str(model.docvecs.doctags))


if __name__ == '__main__':
    main()
@tmylk tmylk added feature Issue described a new feature wishlist Feature request labels Nov 18, 2016
@tmylk
Contributor

tmylk commented Nov 18, 2016

Vocab expansion for doc2vec is not supported yet, so I labelled this as a new feature.

@korostelevm

I ran into this also. I was taking a look at how vocabulary updating works in the online word2vec code and tried to replicate the update for doc2vec's doctags.

It seems to work, in that I can train the model with a few examples, then load it, train it more, and it will return new doctags and vocabulary in the similarity functions. When storing the updated model I do have to give it a different filename, otherwise the segmentation fault still happens. But the weights look like they get updated too. Here are my edits to the original doc2vec.py:

In the DocvecsArray class:

Added a function that stores new doctags from new training in a new property, self.new_doctags = {}:

def note_newdoctag(self, key, document_no, document_length, model):
    if isinstance(key, int):
        self.max_rawint = max(self.max_rawint, key)
    else:
        if key in self.doctags:
            self.doctags[key] = self.doctags[key].repeat(document_length)
        else:
            self.doctags[key] = Doctag(len(self.offset2doctag), document_length, 1)
            self.new_doctags[key] = Doctag(len(self.offset2doctag), document_length, 1)
            self.offset2doctag.append(key)

    self.new_count = self.max_rawint + 1 + len(self.offset2doctag)

Also an update weights function:

def update_weights(self, model):
    gained_tags = len(self.new_doctags)
    newsyn0 = empty((gained_tags, model.vector_size), dtype=REAL)

    # randomize vectors for the newly gained tags
    for i in xrange(len(self.new_doctags), len(self.doctags)):
        # construct deterministic seed from word AND seed argument
        newsyn0[i - len(self.doctag_syn0)] = model.seeded_vector(i + model.seed)
    self.doctag_syn0 = vstack([self.doctag_syn0, newsyn0])
    self.doctag_syn0_lockf = ones(len(self.doctags), dtype=REAL)  # zeros suppress learning

In the Doc2Vec class, in the scan_vocab function, call note_newdoctag instead of note_doctag when build_vocab is called with update=True:

for document_no, document in enumerate(documents):
    ...
    if not update:
        for tag in document.tags:
            self.docvecs.note_doctag(tag, document_no, document_length, self)
    else:
        for tag in document.tags:
            self.docvecs.note_newdoctag(tag, document_no, document_length, self)
    ...

When finalize_vocab is called in the superclass it doesn't run my new update_weights in DocvecsArray, so I dropped finalize_vocab into Doc2Vec and added

self.docvecs.update_weights(self)

at the end of it.
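Roughly, that override amounts to the following (a sketch reconstructed from the description above, not copied verbatim from the gist; it assumes finalize_vocab takes an update flag, as in the word2vec online-training code):

def finalize_vocab(self, update=False):
    # same work as the superclass version...
    super(Doc2Vec, self).finalize_vocab(update=update)
    # ...plus growing doctag_syn0 so the newly noted doctags get vectors
    if update:
        self.docvecs.update_weights(self)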

Here is a link to the full file: https://gist.github.com/korostelevm/d48c80f296516deef045e5aa5dca1518
I just import doc2vec_online as doc2vec instead of from gensim.models import doc2vec.

Disclaimer: I may not know what I'm doing at all, which is why I'm posting here for someone to hopefully verify.

@gojomo
Collaborator

gojomo commented Mar 21, 2017

As @tmylk notes, the existing vocab-expansion feature (build_vocab(..., update=True)) wasn't yet designed/tested for Doc2Vec use, so it might work (because of the significant code overlap), or fail in either subtle or extreme ways (like a SegFault)... it's an unknown.

The times that it's not SegFaulting, there may still be silent corruption – just no memory accesses so bad that they trigger the fault.

Perhaps something in the Doc2Vec paths is still using lengths/references to data that wasn't refreshed by the build_vocab(..., update=True) call?

@korostelevm

That's what it seemed like to me. I forced it into the slow mode to debug it.
At the top of doc2vec.py:
At the top of doc2vec.py:

try:
    from gensim.models.doc2vec_inner import train_document_dbow, train_document_dm, train_document_dm_concat
    from gensim.models.word2vec_inner import FAST_VERSION  # blas-adaptation shared from word2vec
    logger.debug('Fast version of {0} is being used'.format(__name__))
    print asdf  # deliberate NameError, to force the slow pure-python path
# except ImportError:
except Exception:

Then I replaced the train function from word2vec and changed if FAST_VERSION < 0: to always run the Python threading path.

After this, instead of getting a segmentation fault, I get this in the traceback:

  File "/Users/mike/Dropbox/lsp/recommender/doc2vec_original.py", line 771, in worker_loop
    tally, raw_tally = self._do_train_job(sentences, alpha, (work, neu1))
  File "/Users/mike/Dropbox/lsp/recommender/doc2vec_original.py", line 912, in _do_train_job
    doctag_vectors=doctag_vectors, doctag_locks=doctag_locks)
  File "/Users/mike/Dropbox/lsp/recommender/doc2vec_original.py", line 115, in train_document_dbow
    context_locks=doctag_locks)
  File "/usr/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 269, in train_sg_pair
    l1 = context_vectors[context_index]  # input word (NN input/projection layer)
IndexError: index 10 is out of bounds for axis 0 with size 3

Which I think was trying to tell me that index 10 of my doctags is beyond the 3 I had there in the first round of training. So I did the stuff I mentioned above and it seemed to fix the issue. I put back the fast-mode flags and it still works.

@ArkadiyD

ArkadiyD commented Mar 23, 2017

I used ddd to debug the Cython code and it seemed that the segmentation fault appears at line 123 of doc2vec_inner.pyx: g = (1 - word_code[b] - f) * alpha. Then it turned out that the mistake comes from these lines:

if hs:
    codelens[i] = <int>len(predict_word.code)
    codes[i] = <np.uint8_t *>np.PyArray_DATA(predict_word.code)
    points[i] = <np.uint32_t *>np.PyArray_DATA(predict_word.point)

With the model's hs parameter set to 0 there are no errors (both Python 2 and 3, verified with ddd). So a proposed hotfix is to turn off hs mode when the model is updated.

@tmylk
Contributor

tmylk commented Mar 24, 2017

An appropriate hotfix would be to disable vocabulary expansion for doc2vec models, but a proper fix would be better

@gojomo
Collaborator

gojomo commented Mar 24, 2017

Yes, and the proper fix will require figuring out why the model, post-vocab-update, is using some older or incorrect arrays or sizes, and thus making an improper/illegal memory access.

@gojomo gojomo changed the title Segmentation fault when performing online learning for Doc2Vec Segmentation fault using build_vocab(..., update=True) for Doc2Vec Apr 8, 2017
@tmylk
Contributor

tmylk commented May 2, 2017

Current status: only works for hs=0.
Hotfix needed: disable for hs > 0.

@wjgan7

wjgan7 commented Jun 23, 2017

Looks like I'm still getting a segfault when hs=0. (Based on doc2vec.py:590, it looks like that is the default, though the docs say it's 1.)

def get_doc2vec():
    return Doc2Vec(size=200,
                   iter=1,
                   min_count=30,
                   workers=multiprocessing.cpu_count(),
                   dm=0)

def build_doc2vec(sentences, model=None, total_examples=None, i=0):
    tagged_documents = [TaggedDocument(d, [j]) for d, j in zip(sentences, range(i, i + len(sentences)))]
    if not model:
        model = get_doc2vec()
        model.build_vocab(tagged_documents)
    else:
        model.build_vocab(tagged_documents, update=True)
    model.train(tagged_documents, total_examples=model.corpus_count, epochs=model.iter)
    return (model, i + len(sentences))

Apologies if my code is unclear, but essentially I'm doing the same thing as others above. Any help would be much appreciated.

On a side note, I'm sure I'm using total_examples wrong; when I put in the real total_examples count across all training calls, it says something like the expected count doesn't match the count of sentences in my current call.
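For what it's worth, train() expects total_examples to describe only the corpus passed to that particular call, not a running total across calls, so the incremental step would look like this (a sketch; it does not address the segfault):

model.build_vocab(tagged_documents, update=True)
model.train(tagged_documents, total_examples=len(tagged_documents), epochs=model.iter)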

@rajivgrover009

Is it useful to call the train() function repeatedly on a Doc2Vec model without adding new vocabulary? Will the model get better for new data?

@gojomo
Collaborator

gojomo commented Jul 26, 2017

@rajivgrover009 Maybe. Whether it helps or hurts is probably dependent on your dataset, choice of parameters, and the relative contrast between your new texts and the earlier texts. The best-grounded course would be to mix new texts with old to make a new all-inclusive corpus, and continue training with that.
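Schematically (a sketch; old_docs and new_docs are stand-ins for your own corpora, and this assumes the new texts introduce no new vocabulary or tags, since that is exactly the broken case):

all_docs = old_docs + new_docs
# continue training the existing model over the combined corpus
model.train(all_docs, total_examples=len(all_docs), epochs=model.iter)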

@gojomo
Collaborator

gojomo commented Sep 8, 2017

There's another report from @mullenba in #1578, which includes a minimal triggering case.

@menshikh-iv menshikh-iv added bug Issue described a bug difficulty medium Medium issue: required good gensim understanding & python skills and removed wishlist Feature request labels Oct 2, 2017
@mino98

mino98 commented Jan 23, 2018

I'm trying to look into this. Here is a status update...

Previously, @tmylk reported that doc2vec's vocab expansion works as long as hs=0. This isn't correct: it crashes if either negative != 0 (default: 5) or hs != 0 (default: 0). In other words, it is useless for all practical purposes.

To debug and iterate quickly, I used this workflow:

  1. change doc2vec_inner.c into doc2vec_inner.pyx at this line of the setup script, so that cythonize is invoked automatically every time there's a change in the pyx file.
  2. build with CFLAGS='-Wall -O0 -g' python setup.py build then install.
  3. run gdb and cause the crash using the minimal triggering case in Doc2Vec Segmentation Fault Windows and Linux #1578

The coredump points at this line; apparently the index is out of the bounds of EXP_TABLE, which causes the segfault.

The equivalent piece of code for word2vec is here. I've read that vocab expansion is supposed to work for word2vec, so I was planning to use that as a guide to check the differences.
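For context, the hs inner loop guards the dot product f before indexing the precomputed sigmoid table; in rough Python paraphrase (not the literal Cython):

# paraphrase of the EXP_TABLE lookup in the hs code path
if f <= -MAX_EXP or f >= MAX_EXP:
    return  # outside the table's domain, the sample is skipped
f = EXP_TABLE[int((f + MAX_EXP) * (EXP_TABLE_SIZE / MAX_EXP / 2))]

Note that if f is NaN (say, computed from stale or corrupted weight rows after the vocab update), both comparisons are False, the guard passes, and the derived index is garbage; in C that is a raw out-of-bounds read rather than a catchable IndexError.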

Anyone want to join me in this debugging adventure? 😄


P.S.: by the way, I tried to deliberately run the "slow" pure-Python implementation of doc2vec to see if vocab expansion works. Same problem: it crashes here because doctag_vectors is apparently not expanded correctly and doctag_indexes goes out of bounds.

@gojomo
Collaborator

gojomo commented Jan 23, 2018

The pure-python path isn't actually core-dump 'crashing', is it? (I'd think it'd have to be a printed exception, instead.)

Note that segfault crashes are often caused by earlier memory-corruption, rather than the exact line where they're triggered.

@mino98

mino98 commented Jan 24, 2018

Note that segfault crashes are often caused by earlier memory-corruption, rather than the exact line where they're triggered.

Thanks, but in this case it seems that the index is indeed pointing outside of EXP_TABLE. I still have to trace it back, though.


The pure-python path isn't actually core-dump 'crashing', is it?

Right, it's not coredumping. As I said, it goes out of bounds when it reaches the first new doctag (i.e., "animals" at line 29 of this minimal code) as follows:

Traceback (most recent call last):
  File "/x/y/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/x/y/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/x/y/site-packages/gensim-3.2.0-py3.6-linux-x86_64.egg/gensim/models/word2vec.py", line 992, in worker_loop
    tally, raw_tally = self._do_train_job(sentences, alpha, (work, neu1))
  File "/x/y/site-packages/gensim-3.2.0-py3.6-linux-x86_64.egg/gensim/models/doc2vec.py", line 752, in _do_train_job
    doctag_vectors=doctag_vectors, doctag_locks=doctag_locks
  File "/x/y/site-packages/gensim-3.2.0-py3.6-linux-x86_64.egg/gensim/models/doc2vec.py", line 162, in train_document_dm
    l1 = np_sum(word_vectors[word2_indexes], axis=0) + np_sum(doctag_vectors[doctag_indexes], axis=0)
IndexError: index 1 is out of bounds for axis 0 with size 1

Please note that I had to add the line model.neg_labels = zeros(6) in order for the "slow" version to work at all.
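For reference, the Word2Vec pure-Python path normally prepares that array itself during train(); from memory it is something like this sketch (not verified against 3.2.0), with the first label set to 1 for the positive sample:

# word2vec.py prepares one label per sample: 1 for the true word,
# 0 for each of the `negative` noise words
if self.negative > 0:
    self.neg_labels = zeros(self.negative + 1)
    self.neg_labels[0] = 1.0

A doc2vec path that skips this setup fails before training even starts.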

@mino98

mino98 commented Jan 24, 2018

Pushed this fix for the "slow" version.

Regarding the cythonized version... I'd need more time (and help).

@gojomo
Collaborator

gojomo commented Jan 24, 2018

Sure, but why would the index be out of the expected, functioning range? Often because of some (arbitrarily-)earlier memory-corruption.

@menshikh-iv
Contributor

@gojomo I received one more report of this problem; maybe we should raise an exception for this case (when update=True), because this happens more and more often (at least until we fix the bug itself).
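A guard at the top of Doc2Vec.build_vocab would be enough (a sketch of the suggestion, not merged code):

class Doc2Vec(Word2Vec):
    def build_vocab(self, documents, update=False, **kwargs):
        if update:
            raise NotImplementedError(
                "online vocab expansion is not supported for Doc2Vec yet; "
                "see https://github.com/RaRe-Technologies/gensim/issues/1019")
        return super(Doc2Vec, self).build_vocab(documents, update=update, **kwargs)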

@khulasaandh

Hi, any update on this issue?

I am able to train a doc2vec model with new documents in 32-bit Python (on 64-bit Python it still crashes), but I cannot query model.docvecs.most_similar(["XXX"]) for newly added documents; it shows an index out of range error.

An online approach for doc2vec would be very helpful.

@menshikh-iv
Contributor

@khulasaandh as far as I know, you can call infer_vector for a new document & calculate the needed similarity values.

@khulasaandh

Hi @menshikh-iv , thanks for the reply.

I am using the same example posted by @danoneata, but have added a few more documents/lines in sentences_1 and sentences_2. As you mentioned, I am computing the inferred vector for the new document as shown below.

infer_vector = model.infer_vector(token_list)
print(model.docvecs.most_similar(positive=[infer_vector]))

It returns the most similar documents but gives nan values in place of the similarity coefficients:
[('SENT_0', nan), ('SENT_1', nan), ('SENT_2', nan)]

Am I doing this wrong?

@menshikh-iv
Contributor

@khulasaandh looks really suspicious (your code is correct). Can you share the data (trained model & token_list) for reproducing this error?

@gojomo
Collaborator

gojomo commented Mar 20, 2018

@khulasaandh @menshikh-iv A separate non-segfault anomaly with infer_vector() would be best diagnosed on the discussion list, or a new issue dedicated to that specific problem.

@khulasaandh

khulasaandh commented Mar 21, 2018

Hi @menshikh-iv and @gojomo, even on the 32-bit Python that I am using the segmentation fault still occurs sometimes, but most of the time the code runs.

My python version -
Python 3.5.1 (v3.5.1:37a07cee5969, Dec 6 2015, 01:38:48) [MSC v.1900 32 bit (Intel)] on win32

Please find the code below to replicate the issue.

import logging
from gensim.models.doc2vec import (
    Doc2Vec,
    TaggedDocument,
)

logging.basicConfig(
    format='%(asctime)s : %(threadName)s : %(levelname)s : %(message)s',
    level=logging.DEBUG,
)


def to_str(d):
    return ", ".join(d.keys())


SENTS = [
    "anecdotal using a personal experience or an isolated example instead of a sound argument or compelling evidence",
    "plausible thinking that just because something is plausible means that it is true",
    "occam razor is used as a heuristic technique discovery tool to guide scientists in the development of theoretical models rather than as an arbiter between published models",
    "karl popper argues that a preference for simple theories need not appeal to practical or aesthetic considerations",
    "the successful prediction of a stock future price could yield significant profit",
]

SENTS = [s.split() for s in SENTS]


def main():
    sentences_1 = [
        TaggedDocument(SENTS[0], tags=['SENT_0']),
        TaggedDocument(SENTS[1], tags=['SENT_1']),
        TaggedDocument(SENTS[2], tags=['SENT_2']),
    ]
    sentences_2 = [
        TaggedDocument(SENTS[3], tags=['SENT_3']),
        TaggedDocument(SENTS[4], tags=['SENT_4']),
    ]

    model = Doc2Vec(min_count=1, workers=4)

    model.build_vocab(sentences_1)
    model.train(sentences_1, total_examples=model.corpus_count, epochs=model.iter)

    print("-- Base model")
    print("Vocabulary:", to_str(model.wv.vocab))
    print("Tags:", to_str(model.docvecs.doctags))

    model.build_vocab(sentences_2, update=True)
    model.train(sentences_2, total_examples=model.corpus_count, epochs=model.iter)

    print("-- Updated model")
    print("Vocabulary:", to_str(model.wv.vocab))
    print("Tags:", to_str(model.docvecs.doctags))

    token_list = "the successful prediction of a stock future price could yield significant profit".split()
    infer_vector = model.infer_vector(token_list)
    print(model.docvecs.most_similar(positive=[infer_vector]))

if __name__ == '__main__':
    main()

@menshikh-iv
Contributor

Big thanks @khulasaandh, reproduced with Python 2.7.14 (default, Sep 23 2017, 22:06:14) [GCC 7.2.0] on linux2

Segfault moment

In [6]: model.train(sentences_2, total_examples=model.corpus_count, epochs=model.iter)
/home/ivan/.virtualenvs/math/bin/ipython:1: DeprecationWarning: Call to deprecated `iter` (Attribute will be removed in 4.0.0, use self.epochs instead).
  #!/home/ivan/.virtualenvs/math/bin/python
2018-03-28 02:18:17,204 : MainThread : INFO : training model with 4 workers on 68 vocabulary and 100 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
2018-03-28 02:18:17,207 : Thread-79 : DEBUG : job loop exiting, total 1 jobs
Segmentation fault (core dumped)

@muleyprasad

Does anyone have a workaround until this gets fixed?

@ConfusedMerlin

Hello,

I'm currently trying to get gensim to train a couple of TaggedDocument objects which originate from a non-static source of input data.
Or to put it differently: I need to add unpredictable TaggedDocument objects to my doc2vec model on a regular basis. And - you might have guessed it - I ran into the same problem as you did.

So it's gensim 3.8.0 on Linux Debian Buster, 64-bit.

The workaround offered by nsfinkelstein didn't work at all (besides, I do not know the size of my dictionary), which is sad... and probably caused by my poor Python experience (about... two weeks?). But (!) I noticed something:

If you are about to add new content to your dictionary, it will go straight into a segmentation fault if done the way one would expect: put new TaggedDocuments into the model using model.build_vocab(documents=newTD, update=True) and then call model.train(newTD).
But by implementing the workaround in a wrong way, I noticed that adding TaggedDocuments that are kind of identical to whatever is already present in the vocabulary won't trigger the segmentation fault.

here... look at these:

td1 = TaggedDocument(words=['1','2','3','4','5','6','7','8','9','10'], tags=[])
td2 = TaggedDocument(words=['11','12','13','14','15'], tags=[])

As you can see, the second one is a kind of logical extension of the first one. And as you might have observed, the dictionary will add one entry for every word, in roughly the order the words are put in.
So after td1 has been added to the vocab, asking for the vocab will yield
'1','2','3','4','5','6','7','8','9','10'
Now one would tend to add td2, but this will cause the segmentation fault as soon as we call model.train(td2).

But if you do it this way:

td1 = TaggedDocument(words=['1','2','3','4','5','6','7','8','9','10'], tags=[])
td2 = TaggedDocument(words=['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15'], tags=[])

you can actually train after adding td2 to the vocab.

It gets a bit harder when you need to insert words into the vocab:

td1 = TaggedDocument(words=['Im','very','confused','and','astonished','about','almost','all','and','everything'], tags=[])
td2 = TaggedDocument(words=['I','like','cats','and','dogs'], tags=[])

td1's vocabulary representation would omit the second 'and', so it would look like this:
'Im','very','confused','and','astonished','about','almost','all','everything'
If you want to repeat the effect from the numbers example I described, more work is needed. One needs to extract the existing vocabulary, add all words that are NOT already inside the vocab in the order they appear, and offer all of this as a new TaggedDocument:

td3 = TaggedDocument(words=['Im','very','confused','and','astonished','about','almost','all','everything','I','like','cats','dogs'], tags=[])

Offering this "build_vocab(td3, update=True)" will allow you to train the existing model with td2

But... yes, there is always a but... while this does work with text (documents/words), as soon as you try to add tags to the whole thing, it goes back to segfaulting itself to death. Not even the "offer a special TaggedDocument" trick can solve this :(

And this brought me to a dead end, because I really need those tags... Any chance someone might find a solution for this?

@raccoon-science

Hello,
@korostelevm
I tried to run your code with gensim 4.1.2 and it failed.
Perhaps you could share the environment you used to run this code?
