deadlock detected #328

Closed
yzj19870824 opened this issue Oct 11, 2019 · 3 comments · Fixed by #335
Labels
bug (Something isn't working)

Comments


yzj19870824 commented Oct 11, 2019

I'm using Fonduer to construct a knowledge base from a Chinese corpus.

spaCy doesn't support Chinese, so instead of downloading a model from spaCy I trained a Chinese model myself. My Chinese model cannot split sentences, so the files in the directory assigned in TextDocProcess are already split into sentences by my own script.

But when I run the featurizer or labeler apply step, it is unbearably slow, and errors are recorded in the PostgreSQL log file.

The deadlock is detected not only when applying the Labeler, but also when applying the Featurizer:

ERROR: deadlock detected
DETAIL: Process 31694 waits for Sharelock on transaction 71392072; blocked by process 31697.
        Process 31697 waits for Sharelock on transaction 71392071; blocked by process 31694.
        Process 31694: INSERT INTO Label_key(name, candidate_classes) VALUES('subAction_5', ARRAY['sub_action']) ON CONFLICT(name) DO UPDATE SET name=exclued.name, candidate_classes=exclued.candidate_classes;INSERT INTO Label_key(name, candidate_classes) VALUES('subAction_2', ARRAY['sub_action']) ON CONFLICT(name) DO UPDATE SET name=exclued.name, candidate_classes=exclued.candidate_classes;INSERT INTO Label_key(name, candidate_classes) VALUES('subAction_3', ARRAY['sub_action']) ON CONFLICT(name) DO UPDATE SET name=exclued.name, candidate_classes=exclued.candidate_classes;
        Process 31697: INSERT INTO Label_key(name, candidate_classes) VALUES('subAction_5', ARRAY['sub_action']) ON CONFLICT(name) DO UPDATE SET name=exclued.name, candidate_classes=exclued.candidate_classes;INSERT INTO Label_key(name, candidate_classes) VALUES('subAction_2', ARRAY['sub_action']) ON CONFLICT(name) DO UPDATE SET name=exclued.name, candidate_classes=exclued.candidate_classes

HINT: see server log for query details
CONTEXT: while inserting index tuple(1,66) in relation "label_key"
STATEMENT:INSERT INTO Label_key(name, candidate_classes) VALUES('subAction_5', ARRAY['sub_action']) ON CONFLICT(name) DO UPDATE SET name=exclued.name, candidate_classes=exclued.candidate_classes;INSERT INTO Label_key(name, candidate_classes) VALUES('subAction_2', ARRAY['sub_action']) ON CONFLICT(name) DO UPDATE SET name=exclued.name, candidate_classes=exclued.candidate_classes;INSERT INTO Label_key(name, candidate_classes) VALUES('subAction_3', ARRAY['sub_action']) ON CONFLICT(name) DO UPDATE SET name=exclued.name, candidate_classes=exclued.candidate_classes;

I don't know why this happens.
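
For context, the statements in the log above come from several parallel apply workers upserting the same key names into Label_key at the same time. Below is a minimal sketch of the single-worker workaround the author describes in the next comment, assuming a featurizer, a labeler, and a list of labeling functions (here called my_lfs, a placeholder name) have already been set up as in the Fonduer tutorials:

    # Run the key-writing steps with a single worker so that only one
    # process upserts into Feature_key / Label_key at a time.
    featurizer.apply(split=0, train=True, parallelism=1)
    labeler.apply(split=0, lfs=[my_lfs], train=True, parallelism=1)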

yzj19870824 (Author) commented:

I set parallelism to one when I apply the featurizer or labeler, and that speed is bearable.
But another problem appears:
Training the discriminative model (e.g. LR or LSTM) is too slow.
The only reason is the DataLoader, which takes almost 7 s to fetch each batch.
And when I try to set num_workers to 4 or any other number, an error like this occurs:

THCudaCheck FAIL THCCachingAllocator.cpp line=507 error=3: initialization error.
THCudaCheck FAIL THCCachingAllocator.cpp line=507 error=3: initialization error.
THCudaCheck FAIL THCCachingAllocator.cpp line=507 error=3: initialization error.
THCudaCheck FAIL THCCachingAllocator.cpp line=507 error=3: initialization error.
THCudaCheck FAIL THCCachingAllocator.cpp line=507 error=3: initialization error.
THCudaCheck FAIL THCCachingAllocator.cpp line=507 error=3: initialization error.

RuntimeError: Cuda runtime error (3): initialization error at THCCachingAllocator.cpp:507

The last frame of the error stack is X.cuda() in classifier.py.

The same problem occurs with both PyTorch 0.4 and 1.0.
I don't know how to speed up training.
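
A common cause of this THCCachingAllocator initialization error is that CUDA has already been initialized in the parent process before the DataLoader forks its num_workers worker processes. Below is a minimal sketch, not taken from Fonduer, of the usual workaround: keep the dataset on the CPU and only move each batch to the GPU inside the training loop. The tensors and model are placeholders.

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Placeholder tensors standing in for the featurized candidates; they stay on the CPU.
    X = torch.randn(10000, 128)
    y = torch.randint(0, 2, (10000,))

    loader = DataLoader(
        TensorDataset(X, y),
        batch_size=64,
        num_workers=4,    # worker processes never touch CUDA
        pin_memory=True,  # speeds up the .cuda() copies below
    )

    model = torch.nn.Linear(128, 2).cuda()
    for xb, yb in loader:
        xb, yb = xb.cuda(non_blocking=True), yb.cuda(non_blocking=True)
        logits = model(xb)
        # ... compute loss, backward, and optimizer step as usual

Alternatively, switching the multiprocessing start method to "spawn" (torch.multiprocessing.set_start_method("spawn")) before creating the DataLoader avoids forking a process that has already initialized CUDA.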

HiromuHota (Contributor) commented:

@senwu @lukehsiao assign this deadlock issue to me. I have an idea and am working on it.

lukehsiao added the bug label on Oct 17, 2019
HiromuHota (Contributor) commented:

@yzj19870824 The THCCachingAllocator.cpp error is different from the deadlock. I'd suggest creating a separate issue for it.
