question about your ChFi-nAnn dataset and bert model #6

pjfeng · 2019-12-02T13:52:41Z

I'm interested in how you created the ChFinAnn dataset and would like to know more details about the process. Also, when I run the BERT model, it doesn't work. Thank you.

shun-zheng · 2019-12-05T07:53:07Z

We created the ChFinAnn dataset by distant supervision; the processing details are given in the main paper (Section 4) and the supplementary material (Section A.1).
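
For readers unfamiliar with the term, here is a minimal, generic sketch of distant supervision by string matching. It is not the authors' pipeline (the paper sections cited above are the authoritative description). The function name `label_document`, the `key_roles` and `min_matched` parameters, and the example role names and sentences are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical knowledge-base record: role name -> argument string.
EventRecord = Dict[str, str]


def label_document(
    sentences: List[str],
    record: EventRecord,
    key_roles: List[str],
    min_matched: int,
) -> Optional[Dict[str, List[Tuple[int, int]]]]:
    """Map each matched role to (sentence index, char offset) pairs.

    Returns None when the record is judged not to describe the document.
    """
    matches: Dict[str, List[Tuple[int, int]]] = {}
    for role, value in record.items():
        if not value:
            continue  # skip empty arguments
        spans = []
        for i, sent in enumerate(sentences):
            start = sent.find(value)
            while start != -1:
                spans.append((i, start))
                start = sent.find(value, start + 1)
        if spans:
            matches[role] = spans
    # Assumed filtering rules: every key role must be found,
    # and enough roles overall must be matched.
    if any(role not in matches for role in key_roles):
        return None
    if len(matches) < min_matched:
        return None
    return matches


# Illustrative data only; not taken from ChFinAnn.
sentences = [
    "Company A pledged 1000000 shares to Bank B.",
    "The pledge starts on 2019-01-01.",
]
record = {
    "Pledger": "Company A",
    "PledgedShares": "1000000",
    "Pledgee": "Bank B",
    "EndDate": "2020-01-01",  # not mentioned in the text, so left unmatched
}
labels = label_document(
    sentences, record, key_roles=["Pledger", "PledgedShares"], min_matched=3
)
print(labels)
```

The matched spans can then serve as entity labels and the matched record as the document-level event label, with no manual annotation.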

I will refactor the BERT part later.

pjfeng · 2019-12-05T14:32:03Z

Thank you. I want to create an annual report dataset of China A-share companies.

shun-zheng · 2019-12-06T01:53:00Z

Sounds cool!
Currently, Doc2EDAG assumes that the input is a sequence of sentences, so it is suitable for event-related documents that are mainly expressed in natural language.
For annual reports, you may need to handle a lot of semi-structured information, such as tables and figures.
I would recommend Fonduer, which was pioneering work on extraction from richly formatted documents.
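
To make the "sequence of sentences" assumption concrete, here is a minimal sketch that turns plain Chinese text into a sentence list by splitting after sentence-ending punctuation. The function name `split_sentences` and the punctuation set are assumptions for illustration; this is not the preprocessing used in the Doc2EDAG repository. Tables and figures have no natural representation in this form, which is why annual reports are harder.

```python
import re
from typing import List

# Assumed set of Chinese sentence-ending punctuation marks.
_SENTENCE_END = re.compile(r"(?<=[。！？；])")


def split_sentences(text: str) -> List[str]:
    """Split text after each sentence-ending mark (requires Python 3.7+)."""
    parts = _SENTENCE_END.split(text)
    return [p.strip() for p in parts if p.strip()]


doc = split_sentences("公司A质押了股份。质押期限为一年！")
print(doc)
```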

pjfeng · 2019-12-06T06:18:11Z

Thank you for your advice.

I work in a finance department, and we have access to Bloomberg, Wind, and Thomson Reuters. We want to use text mining and NLP to process financial news, and to do news event identification, risk identification, and quantitative factor analysis based on the historical news data of individual stocks.

In this way, we can make predictions about a certain point in the future, which can be added to a quantitative trading strategy. One example of such a product is Kensho in the USA, which built a good product and was acquired by S&P Global (Goldman Sachs was an early investor).

Your work really impressed me. If possible, I would like to have a talk.

xiaocuigit · 2019-12-06T15:02:30Z

> Sounds cool!
> Currently, Doc2EDAG assumes that the input is a sequence of sentences, so it is suitable for event-related documents that are mainly expressed in natural language.
> For annual reports, you may need to handle a lot of semi-structured information, such as tables and figures.
> I would recommend Fonduer, which was pioneering work on extraction from richly formatted documents.

Hi,
Did you use Fonduer to build the knowledge base in your paper?

shun-zheng · 2019-12-11T03:48:15Z

@pjfeng What you have mentioned is a very challenging topic, and many startups have also worked on it in recent years.
Discussions are welcome.
Thanks for your interest in our work.

shun-zheng · 2019-12-11T03:51:14Z

@xiaocuigit Fonduer is about extracting inter-entity relations from richly formatted documents, while Doc2EDAG focuses on extracting event records (each with multiple entities) from a text document.
At present, Doc2EDAG does not support richly formatted documents.
But we think that is a meaningful direction to explore.
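
The difference between the two output types can be sketched with two small data structures. This is only an illustration of the distinction described above: the class names, field names, and example values are assumptions, not the actual schemas of Fonduer or Doc2EDAG.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Relation:
    """A link between two entities (hypothetical schema)."""
    head: str
    relation: str
    tail: str


@dataclass
class EventRecord:
    """One event with several role-filling entities; roles may be empty."""
    event_type: str
    arguments: Dict[str, Optional[str]] = field(default_factory=dict)

    def filled_roles(self) -> List[str]:
        # Roles whose argument was actually found in the document.
        return [role for role, value in self.arguments.items() if value is not None]


rel = Relation(head="Company A", relation="pledged_shares_to", tail="Bank B")
event = EventRecord(
    event_type="EquityPledge",
    arguments={"Pledger": "Company A", "Pledgee": "Bank B", "EndDate": None},
)
print(rel)
print(event.filled_roles())
```

A single document may yield several such event records, and the arguments of one record may be scattered across different sentences.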

pjfeng · 2019-12-18T11:39:53Z

@dolphin-zs Thank you. I have talked to my professor, who works on quantitative trading strategies using NLP. He is very interested in your research. Do you have time to talk about NLP in the finance field?

YuanEric88 · 2021-03-02T08:55:40Z

@dolphin-zs Could you show more details on how to use the DS-based method to generate labeled data? I am currently working on event extraction for news data, but I am stuck for lack of a data source. I would like to implement the method shown in the paper to generate a news-domain dataset.

KyrieIrving24 · 2021-08-04T11:54:15Z

> @dolphin-zs Could you show more details on how to use the DS-based method to generate labeled data? I am currently working on event extraction for news data, but I am stuck for lack of a data source. I would like to implement the method shown in the paper to generate a news-domain dataset.

Did you manage to build your dataset? I have recently been wanting to build one with DS as well.

BEILOP · 2021-11-17T10:18:50Z

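> @dolphin-zs Could you show more details on how to use the DS-based method to generate labeled data? I am currently working on event extraction for news data, but I am stuck for lack of a data source. I would like to implement the method shown in the paper to generate a news-domain dataset.

How are your results now? I am also working on event extraction in the news domain and would like to exchange ideas.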
