Skip to content

Conversation

@Enkidu93
Copy link
Collaborator

@Enkidu93 Enkidu93 commented Aug 5, 2025

Fixes #671, #693, and #664.

Relies on sillsdev/machine#316.


This change is Reviewable

@Enkidu93 Enkidu93 requested a review from ddaspit August 5, 2025 21:10
@Enkidu93
Copy link
Collaborator Author

Enkidu93 commented Aug 5, 2025

I still need to add a couple blurbs to the swagger docs.

@Enkidu93
Copy link
Collaborator Author

Enkidu93 commented Aug 5, 2025

I apologize for what are inevitably going to be merge conflicts. @pmachapman @ddaspit - I'll plan on letting Peter get his other PR in and then I'll rebase. But it would still be helpful to review now to confirm that, otherwise, the approach is good. I don't like how complicated the PretranslationService code is getting, but I think it's unfortunately inevitable because of how the parsing needs to be sequenced.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 12 of 12 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)


src/Serval/src/Serval.Translation/Services/PretranslationService.cs line 321 at r1 (raw file):

            new(sourceQuoteConvention, targetQuoteConvention);

        UsfmParser.Parse(usfm, quotationMarkDenormalizationFirstPass);

Could this pass be run before running the updater in GetUsfmAsync?


src/Serval/src/Serval.Translation/Contracts/PretranslationQuotationMarkBehavior.cs line 3 at r1 (raw file):

namespace Serval.Translation.Contracts;

public enum PretranslationQuotationMarkBehavior

I am toying with the idea of naming this more generically. The enum could be called PretranslationNormalizationBehavior or just NormalizationBehavior (this enum will only be used on pretranslation endpoints so Pretranslation is probably redundant). The values could be Normalized and Denormalized.

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 of 12 files reviewed, 2 unresolved discussions (waiting on @ddaspit)


src/Serval/src/Serval.Translation/Contracts/PretranslationQuotationMarkBehavior.cs line 3 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I am toying with the idea of naming this more generically. The enum could be called PretranslationNormalizationBehavior or just NormalizationBehavior (this enum will only be used on pretranslation endpoints so Pretranslation is probably redundant). The values could be Normalized and Denormalized.

Done


src/Serval/src/Serval.Translation/Services/PretranslationService.cs line 321 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Could this pass be run before running the updater in GetUsfmAsync?

I'm not sure what you mean. It needs to run on the USFM with the pretranslations inserted because that's the text that it is confirming can be denormalized, and since it's not implemented as an IUpdateBlockHandler, I can't pass it during the update itself. I don't like how complex this class is getting, but short-term, I'm not sure what else can be done. Any thoughts?

@codecov-commenter
Copy link

codecov-commenter commented Aug 11, 2025

Codecov Report

❌ Patch coverage is 72.22222% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.29%. Comparing base (9a08cf3) to head (7845bbe).

Files with missing lines Patch % Lines
...rval.Translation/Services/PretranslationService.cs 75.00% 16 Missing and 5 partials ⚠️
src/Serval/src/Serval.Client/Client.g.cs 0.00% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #721      +/-   ##
==========================================
+ Coverage   66.27%   66.29%   +0.02%     
==========================================
  Files         368      368              
  Lines       19588    19664      +76     
  Branches     2505     2511       +6     
==========================================
+ Hits        12981    13036      +55     
- Misses       5695     5713      +18     
- Partials      912      915       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 10 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)


src/Serval/src/Serval.Translation/Services/PretranslationService.cs line 321 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

I'm not sure what you mean. It needs to run on the USFM with the pretranslations inserted because that's the text that it is confirming can be denormalized, and since it's not implemented as an IUpdateBlockHandler, I can't pass it during the update itself. I don't like how complex this class is getting, but short-term, I'm not sure what else can be done. Any thoughts?

That answers my question. It would be nice to encapsulate the logic in DenormalizeQuotationMarks into a convenience function in Machine.py. Or we could implement some kind of parser pipeline class. We don't need to do that right now.


src/Serval/src/Serval.Translation/Services/PretranslationService.cs line 291 at r2 (raw file):

            remarks.AddRange(denormalizationRemarks);
        }
        var remarkUpdater = new UpdateUsfmParserHandler(remarks: remarks);

Would it be possible to get rid of this final parser pass? Maybe add the disclaimer and marker placement remark in the first updater pass and add the quotation denormalization remark in the denormalization pass.

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit)


src/Serval/src/Serval.Translation/Services/PretranslationService.cs line 321 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

That answers my question. It would be nice to encapsulate the logic in DenormalizeQuotationMarks into a convenience function in Machine.py. Or we could implement some kind of parser pipeline class. We don't need to do that right now.

I agree. OK, I can spin off an issue for later.


src/Serval/src/Serval.Translation/Services/PretranslationService.cs line 291 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Would it be possible to get rid of this final parser pass? Maybe add the disclaimer and marker placement remark in the first updater pass and add the quotation denormalization remark in the denormalization pass.

I had that originally. The issue is that it affects the order of the remarks. If we were to do that, the "This is an AI Draft" remark would not be first; rather, the quote denormalization remark would. I could pass them all in during quotation denormalization or else during the update, but 1) the logic gets a little uglier and 2) the DenormalizeQuotationMarks function would be doing double-duty. We could also change the remark-adding behavior to add after any existing remarks, but that would require changes to Machine. What do you think?

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 4 files at r3, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)

@Enkidu93 Enkidu93 force-pushed the denormalize_quotation_marks branch from 086cea1 to 640732c Compare August 13, 2025 21:20
Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 10 files at r4, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)

@Enkidu93 Enkidu93 merged commit 5e002e9 into main Aug 13, 2025
3 checks passed
@Enkidu93 Enkidu93 deleted the denormalize_quotation_marks branch August 13, 2025 21:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add remarks per chapter for punctuation denormalization?

4 participants