Segmentation fault in spaCy 2.0.5 / python 3.5 #1757

Closed
william-dowling opened this issue Dec 21, 2017 · 6 comments
Labels
bug Bugs and behaviour differing from documentation

Comments

william-dowling commented Dec 21, 2017

spaCy 2.0.5 is crashing with a core dump. No core dump from this code was seen with spaCy 1.6, 1.7.5, or 1.8.2. The crash happens whether or not I am running under the debugger.

Here is the complete code that causes the core dump:

#!/usr/bin/env python3

import os
import spacy

language = 'en'

print("Loading Language Model for '%s'..." % language)
nlp = spacy.load(language)
print("Language Model for '%s' loaded." % language)


doc = nlp('Inhalers can be used to treat persistent recurrent asthma')

c = doc[0]
p = None
if c.head != c and c.head != p:
    print('OK')

I see that if I replace "!=" with "is not" then the core dump does not happen.
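
For reference, this is the variant that does not crash for me (a minimal sketch of the workaround, reusing the doc object built above; only the comparisons are changed to identity checks):

# Identity comparisons never reach the Cython rich comparison with None
c = doc[0]
p = None
if c.head is not c and c.head is not p:
    print('OK')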

Info about spaCy

  • Python version: 3.5.2
  • spaCy version: 2.0.5
  • Models: en, en_core_web_md
  • Platform: Darwin-15.6.0-x86_64-i386-64bit
@ines added the bug label on Jan 3, 2018
@honnibal (Member)

Thanks! Passing None into Cython can sometimes cause problems if not caught.
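
Until that's caught inside the library, a call-site guard along these lines should keep None out of the Cython comparison (just a sketch of the workaround pattern, reusing doc from the snippet above; not the actual fix in spaCy):

c = doc[0]
p = None
# If p is None it can never equal a Token, so skip the != call entirely
if c.head != c and (p is None or c.head != p):
    print('OK')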

@fucking-signup

@honnibal I don't think this has solved the issue, sadly.
I am using the same platform (Darwin-17.3.0-x86_64-i386-64bit, i.e. macOS), and the newly added test against the segmentation fault is itself causing a segmentation fault.

Also could be related:
I am getting a segmentation fault when training new models at random. It could happen at any time or not happen at all. The cause is always update method in language.py. And it doesn't matter if the code itself is from examples of how to train a model or from cli/train.py. However, it seems like frequency of segmentation fault increases when there are more training examples (500+).

@apierleoni

Same problem here:
spaCy 2.0.5, macOS, Python 3.6.4.
I'm trying to learn new NER entities from 5000 examples, using update on the en_core_web_lg model.
It fails randomly with a segfault, usually after the second iteration (sometimes making it to the 8th).
Might be related to this (not so) old issue: #1335

nikeqiang commented Feb 1, 2018

I'm experiencing a similar problem training the NER on anything but a very small set of examples. Training on anything over 1000 examples throws the following error. Is this a memory error?

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

Info about spaCy
Python version: 3.6.3
spaCy version: 2.0.5
Models: en, en_core_web_sm
Platform: MacOS

I note that I got the same error when trying to train using each of (a) the Prodigy ner.batch-train recipe and (b) the regular spacy train_ner.py script.

Example error messages when running Prodigy:

line 1: 41665 Segmentation fault: 11 python -m prodigy "$@"
line 1: 49673 Segmentation fault: 11 python -m prodigy "$@"

adidier17 commented Feb 9, 2018

I'm also experiencing the same issue when training the English NER model. When training on about 100 examples there were no problems, but with 500+ I also get the error: "Segmentation fault: 11"

Environment

  • Operating System: OS Sierra 10.12.6
  • Python Version Used: 3.6.4
  • spaCy Version Used: 2.0.7
  • Models: en version 2.0.0

The error occurs on nlp.update after 2 or 3 iterations.

import random  # needed for random.shuffle below

other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
with nlp.disable_pipes(*other_pipes):  # train only the NER component
    optimizer = nlp.begin_training()
    for itn in range(n_iter):
        random.shuffle(train)
        losses = {}
        for text, annotations in train:
            nlp.update(
                [text],          # batch of texts
                [annotations],   # batch of annotations
                drop=dropout,    # make it harder to memorize data
                sgd=optimizer,   # update weights
                losses=losses)
        print(losses)

lock bot commented May 7, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators May 7, 2018