Reproducing the OntoNotes results #6

Open
YeDeming opened this issue Mar 10, 2022 · 4 comments

Comments

@YeDeming

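Hello,

Thank you for open-sourcing the code with such thorough comments.

I reproduced the paper's results on CoNLL03, but ran into some difficulty on OntoNotes.

I used the data provided at https://github.com/yuchenlin/OntoNotes-5.0-NER-BIO and modified the following lines so that paths points to those files: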

BARTNER/train.py

Lines 124 to 126 in d54d331

elif dataset_name == 'en-ontonotes':
    paths = '../data/en-ontonotes/english'
    data_bundle = pipe.process_from_file(paths)

        paths = {
                 'train': "/home/yedeming/data/ontonotes/onto.train.ner",
                 'dev': "/home/yedeming/data/ontonotes/onto.development.ner",
                 'test': "/home/yedeming/data/ontonotes/onto.test.ner",
                 }

The run produced the following output:

Save cache to caches/data_facebook/bart-large_en-ontonotes_word.pt.
max_len_a:0.8, max_len:10
In total 3 datasets:
        train has 115812 instances.
        dev has 15680 instances.
        test has 12217 instances.
The number of tokens in tokenizer  50265
50283 50288
......
Best test performance(may not correspond to the best dev performance):{'Seq2SeqSpanMetric': {'f': 87.64999999999999, 'rec': 88.52, 'pre': 86.79, 'em': 0.8727}} achieved at Epoch:16.
Best test performance(correspond to the best dev performance):{'Seq2SeqSpanMetric': {'f': 87.36, 'rec': 88.28, 'pre': 86.47, 'em': 0.8717}} achieved at Epoch:26.

In Epoch:26/Step:68409, got best dev performance:
Seq2SeqSpanMetric: f=88.94, rec=89.79, pre=88.11, em=0.859

The resulting test F1 is 87.36, which differs considerably from the expected value, and I don't know what went wrong.
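
As a sanity check on the numbers in the log, the F1 values can be recomputed from the logged precision and recall. This is a minimal sketch, assuming `Seq2SeqSpanMetric` reports the usual F1 (the harmonic mean of precision and recall); `f1_score` and `runs` are illustrative names, not part of the repository.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the log above.
runs = {
    "test, epoch 26 (best dev)": (86.47, 88.28),
    "test, epoch 16 (best test)": (86.79, 88.52),
    "dev, epoch 26": (88.11, 89.79),
}

for name, (pre, rec) in runs.items():
    print(f"{name}: F1 = {f1_score(pre, rec):.2f}")
```

The recomputed values agree with the logged `f` fields to within rounding of the logged precision and recall, so the metric itself looks internally consistent and the gap has to come from somewhere else.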

Looking forward to your reply!
Ye Deming

@lollipopanddount


Has your problem been solved?

@YeDeming
Author

YeDeming commented Apr 7, 2022

Not solved yet.

@yhcc
Owner

yhcc commented Apr 9, 2022

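Sorry, I hadn't noticed this until now. The gap is most likely caused by a different dataset; my dataset statistics are below. You are probably using OntoNotes v12, whereas earlier papers generally used v4 (see https://github.com/yhcc/OntoNotes-5.0-NER).

In total 3 datasets:
        dev has 8528 instances.
        test has 8262 instances.
        train has 59924 instances.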
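
One way to tell which version of the data you have before training is to count the sentences in each split and compare them with the sizes quoted in this thread. The sketch below assumes the files are CoNLL-style, with one token per line and blank lines between sentences. `count_sentences`, `matches_expected`, and the two dictionaries are illustrative names, not part of BARTNER or fastNLP. The counts printed by the training script could also differ from raw sentence counts if the loader filters anything, so treat a mismatch as a hint rather than proof.

```python
# Split sizes quoted in this thread (instance counts printed by the training script).
MAINTAINER_SIZES = {"train": 59924, "dev": 8528, "test": 8262}
REPORTED_SIZES = {"train": 115812, "dev": 15680, "test": 12217}  # from the original post


def count_sentences(path):
    """Count blank-line-separated blocks in a CoNLL-style file (assumed format)."""
    count = 0
    in_sentence = False
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                # The first non-blank line of a block starts a new sentence.
                if not in_sentence:
                    count += 1
                    in_sentence = True
            else:
                in_sentence = False
    return count


def matches_expected(paths, expected):
    """Return {split: (actual, expected, actual == expected)} for each split in paths."""
    report = {}
    for split, path in paths.items():
        actual = count_sentences(path)
        report[split] = (actual, expected[split], actual == expected[split])
    return report
```

For example, `matches_expected({"train": "onto.train.ner"}, MAINTAINER_SIZES)` would show whether a local training file (the file name here is a placeholder) has the 59924 sentences of the maintainer's split.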

@wangyudong1997


Hello, where did you get the CoNLL-2003 dataset?
