Hello, can this be used in English? textbsr doesn't work very well for English. #17

lhzwangwei · 2024-01-04T08:22:17Z

No description provided.

csxmli2016 · 2024-01-04T09:51:42Z

Yes. The released model is trained mainly on Chinese text. If your application scenario is English, it is better to fine-tune it on English letters and digits only. Chinese contains several thousand characters, which makes it more difficult than English.

wojiaoyanmin · 2024-03-13T08:04:13Z

Thanks. But what if both Chinese and English are used in the application scenario?

csxmli2016 · 2024-03-13T12:10:48Z

It is better to fine-tune the model on a mix of English and Chinese characters, with a relatively higher sampling probability for the English characters.
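A minimal sketch of such weighted sampling, assuming the training text is built one character at a time. The pools, the `english_prob` value of 0.7, and the function name `sample_mixed_text` are illustrative assumptions, not part of the repository:

```python
import random
import string

# Assumption: a tiny stand-in pool; a real Chinese corpus holds thousands of characters.
CHINESE_POOL = "的一是在不了有和人这中大为上个国我以要他"
ENGLISH_POOL = string.ascii_letters + string.digits


def sample_mixed_text(length, english_prob=0.7, rng=None):
    """Draw `length` characters, each one English with probability `english_prob`."""
    rng = rng or random.Random()
    chars = []
    for _ in range(length):
        # Pick the pool first, then a character from it, so English is favored
        # even though the Chinese character set is usually much larger.
        pool = ENGLISH_POOL if rng.random() < english_prob else CHINESE_POOL
        chars.append(rng.choice(pool))
    return "".join(chars)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_mixed_text(12, english_prob=0.7, rng=rng))
```

Choosing the pool before the character keeps the English share under direct control, whatever the relative sizes of the two character sets.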

wojiaoyanmin · 2024-03-14T02:22:12Z

The max character length of the model is 16. I think that's not enough for English text, and simply changing the max character length may not work well. Are there any solutions?

csxmli2016 · 2024-03-14T02:27:13Z

The max character length of 16 is designed mainly for Chinese characters, which occupy more space than English letters. So if you want to use it on English text, it is better to fine-tune with a longer max character length.
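As a rough back-of-the-envelope illustration of why a longer limit may be needed: if a Chinese glyph is treated as roughly square and a Latin letter as some fraction of that width, the same image width holds proportionally more English characters. The 0.5 width ratio and the helper name `estimate_max_chars` are assumptions for illustration only; the right value depends on the fonts in your data.

```python
import math


def estimate_max_chars(num_square_glyphs=16, latin_width_ratio=0.5):
    """Estimate how many Latin characters fit in the width of `num_square_glyphs` square glyphs.

    `latin_width_ratio` is the assumed width of a Latin letter relative to a
    square (Chinese-sized) glyph; 0.5 is a guess, not a measured value.
    """
    if not 0 < latin_width_ratio <= 1:
        raise ValueError("latin_width_ratio must be in (0, 1]")
    return math.floor(num_square_glyphs / latin_width_ratio)


if __name__ == "__main__":
    print(estimate_max_chars())          # Latin capacity of a 16-glyph-wide line
    print(estimate_max_chars(16, 1.0))   # ratio 1.0 reproduces the original limit
```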

wojiaoyanmin · 2024-03-19T03:08:04Z

Thank you! Have you ever tried this pipeline? Did it work well?

csxmli2016 · 2024-03-19T03:11:58Z

Only the character localization and classification depend on the character length.
As for the SR (super-resolution) process, we embed the generative structure prior into each LR (low-resolution) character. This process is independent of the character length.
So I think it would work well.

wojiaoyanmin · 2024-03-19T03:18:23Z

Thank you very much!!

wojiaoyanmin · 2024-03-30T03:04:39Z

Hi. Can you provide some suggestions on how to set these parameters when fine-tuning the model with more English characters?

csxmli2016 · 2024-03-30T03:12:02Z

Hi, just replace the corpus with English characters. See the get_text() function here. It would be much simpler than with Chinese characters.
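A minimal sketch of what an English-only text sampler could look like. This is a hypothetical stand-in, not the repository's get_text(): the name `get_english_text`, its signature, and the `max_len` default are assumptions, and the character set is limited to letters and digits as suggested earlier in this thread.

```python
import random
import string

# English letters and digits only.
ENGLISH_CHARSET = string.ascii_letters + string.digits


def get_english_text(max_len=16, rng=None):
    """Hypothetical corpus sampler: return a random string of 1..max_len English characters."""
    rng = rng or random.Random()
    length = rng.randint(1, max_len)  # randint is inclusive on both ends
    return "".join(rng.choice(ENGLISH_CHARSET) for _ in range(length))


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(5):
        print(get_english_text(max_len=16, rng=rng))
```

If you fine-tune with a longer character limit, raise `max_len` to match it so the sampled strings cover the full range of lengths.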

wojiaoyanmin · 2024-03-30T03:28:17Z

Sorry, I forgot to upload the picture. I mean the parameters below. My main goal in fine-tuning the model is to make it perform well in Chinese scenes at the same time.

csxmli2016 · 2024-03-30T03:37:42Z

The parameters look fine. The SR branch already performs well. You may want to pay more attention to the OCR and encoder branches, using a slightly larger learning rate for them.
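One way to express a per-branch learning rate is through parameter groups. The sketch below builds a list of dicts with "params" and "lr" keys, which is the per-parameter-group format that PyTorch optimizers accept. The helper name `build_param_groups`, the branch keys, the base learning rate, and the factor of 5 are all assumptions for illustration, not the project's actual settings.

```python
def build_param_groups(branches, base_lr=1e-5, boosted=("ocr", "encoder"), factor=5.0):
    """Return optimizer parameter groups with a larger learning rate for boosted branches.

    `branches` maps a branch name to an iterable of its parameters
    (for example, the result of calling .parameters() on a submodule).
    """
    groups = []
    for name, params in branches.items():
        # Boosted branches get base_lr * factor; all others keep base_lr.
        lr = base_lr * factor if name in boosted else base_lr
        groups.append({"params": list(params), "lr": lr})
    return groups


if __name__ == "__main__":
    # Placeholder strings stand in for real parameter tensors.
    demo = {"sr": ["sr.weight"], "ocr": ["ocr.weight"], "encoder": ["enc.weight"]}
    for group in build_param_groups(demo):
        print(group)
```

With real modules, the returned list could be passed as the first argument to an optimizer such as `torch.optim.Adam`.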

wojiaoyanmin · 2024-03-30T09:20:59Z

Thanks. Sent from my iPhone
