Don't translate Scripture references in non-verse text #573

johnml1135 · 2024-12-12T20:13:58Z

There are often Scripture references at the beginning or end of non-verse text. Currently we try to translate these, and the results are a mess. We should be able to do better. Here is one option:

Best proposal:

- Strip out all references when building the training/pretranslation data
- Re-insert them when writing the output back into USFM

Complications:

- How do we know something is a Scripture reference? Can we really write a regex that captures all references? What about multiple languages?
  - This one should work for English: https://stackoverflow.com/questions/66818564/regex-for-bible-references
- Re-insertion would require reworking the USFM updater, which may add some complication (but not too much, because there will be no ranges to deal with in non-verse Scripture text).
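
For illustration, the strip/re-insert idea could be sketched roughly as below. This is not the Serval implementation and not the regex from the linked answer: the pattern is a deliberately simple, English-only assumption, and `strip_references` / `reinsert_references` are hypothetical names. It only handles a reference at the very start or end of the text and rejoins with a single space, so the original spacing is not preserved.

```python
import re

# Hypothetical English-only pattern: optional book number, book name
# (optionally abbreviated with a dot), chapter:verse, optional verse range,
# optionally wrapped in parentheses. Real data will need something broader.
_REF = r"\(?(?:[1-3] )?[A-Z][a-z]+\.? \d+:\d+(?:-\d+)?\)?"
LEADING = re.compile(rf"^\s*({_REF})\s*")
TRAILING = re.compile(rf"\s*({_REF})\s*$")


def strip_references(text):
    """Return (text_without_refs, leading_ref, trailing_ref)."""
    leading = trailing = ""
    match = LEADING.match(text)
    if match:
        leading = match.group(1)
        text = text[match.end():]
    match = TRAILING.search(text)
    if match:
        trailing = match.group(1)
        text = text[:match.start()]
    return text, leading, trailing


def reinsert_references(translated, leading, trailing):
    """Put the untouched references back around the translated text."""
    return " ".join(part for part in (leading, translated, trailing) if part)


body, lead, trail = strip_references("The Birth of Jesus (Matt. 1:18-25)")
# Only `body` would be sent for translation; `lead` and `trail` are kept as-is.
restored = reinsert_references("El nacimiento de Jesús", lead, trail)
```

Even this small pattern shows the problem raised above: it knows nothing about non-English book names, chapter-only references, or lists of references.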

ddaspit · 2025-01-03T17:09:16Z

We should just be more careful about what markers we translate and what markers we don't translate.

johnml1135 · 2025-01-06T17:16:33Z

Yes - we could try some funny things with regexes - but I think a simple "translate these tags" / "don't translate these tags" split is the best way to get to 95%.
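
A minimal sketch of the marker-based split, assuming the input is a list of non-verse USFM lines that each start with a paragraph marker. The `DO_NOT_TRANSLATE` set is illustrative only (`\r`, `\mr` and `\sr` are the USFM markers for parallel-passage and section reference lines); it is not the list Serval uses, and `split_for_translation` is a hypothetical name.

```python
import re

# Illustrative only: paragraph markers whose content is normally a reference.
DO_NOT_TRANSLATE = {"r", "mr", "sr"}

# Matches a line such as "\s1 The Birth of Jesus": marker first, then text.
MARKER_LINE = re.compile(r"^\\([a-z]+\d*)\s+(.*)$")


def split_for_translation(usfm_lines):
    """Split lines into (to_translate, passthrough) lists of (index, marker, text)."""
    to_translate, passthrough = [], []
    for index, line in enumerate(usfm_lines):
        match = MARKER_LINE.match(line)
        if match is None:
            continue  # not a marker line with text; ignored in this sketch
        marker, text = match.groups()
        target = passthrough if marker in DO_NOT_TRANSLATE else to_translate
        target.append((index, marker, text))
    return to_translate, passthrough


lines = [
    r"\s1 The Birth of Jesus",
    r"\r (Luke 2:1-7)",
    r"\ip Introduction text",
]
to_translate, passthrough = split_for_translation(lines)
```

Keeping the line index with each entry means the passthrough lines can be copied back into place unchanged, without touching the references at all.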

johnml1135 added this to Serval Dec 12, 2024

github-project-automation bot moved this to 🆕 New in Serval Dec 12, 2024

johnml1135 self-assigned this Dec 12, 2024
