Skip to content

Conversation

@pmachapman
Copy link
Collaborator

@pmachapman pmachapman commented Aug 5, 2025

Resolves #665.

Requires sillsdev/machine/pull/316.


This change is Reviewable

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 41 of 41 files at r1, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @Enkidu93)


src/Serval/src/Serval.Shared/Models/CorpusAnalysis.cs line 5 at r1 (raw file):

public record CorpusAnalysis
{
    public required string CorpusRef { get; init; }

This should be ParallelCorpusRef so that the naming is consistent with the PretranslateCorpus model. Do we need to continue to support the obsolete Corpus data model with this new feature? I would prefer not to.


src/Serval/src/Serval.Grpc/Protos/serval/translation/v1/platform.proto line 77 at r1 (raw file):

}

message UpdateCorpusAnalysisRequest {

I would rename this and the associated result class to include Parallel so that it is clear that this is referring to a parallel corpus and not a monolingual corpus.


src/ServiceToolkit/src/SIL.ServiceToolkit/Services/ParallelCorpusPreprocessingService.cs line 8 at r1 (raw file):

    private const int Seed = 1234;

    public async Task AnalyseCorporaAsync(

Could this method just return the conventions rather than accept a delegate? We could just run this on a single corpus to simplify the return value.


src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs line 124 at r1 (raw file):

    {
        List<CorpusAnalysis> corpusAnalysis = [];
        await ParallelCorpusPreprocessingService.AnalyseCorporaAsync(

This is only needed for NMT jobs. I think this should be implemented in NmtPreprocessBuildJob.

Copy link
Collaborator Author

@pmachapman pmachapman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @ddaspit and @Enkidu93)


src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs line 124 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

This is only needed for NMT jobs. I think this should be implemented in NmtPreprocessBuildJob.

Done. To do this, I needed to change the base method to be virtual instead of `abstract.


src/Serval/src/Serval.Grpc/Protos/serval/translation/v1/platform.proto line 77 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I would rename this and the associated result class to include Parallel so that it is clear that this is referring to a parallel corpus and not a monolingual corpus.

Done.


src/Serval/src/Serval.Shared/Models/CorpusAnalysis.cs line 5 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

This should be ParallelCorpusRef so that the naming is consistent with the PretranslateCorpus model. Do we need to continue to support the obsolete Corpus data model with this new feature? I would prefer not to.

Done,


src/ServiceToolkit/src/SIL.ServiceToolkit/Services/ParallelCorpusPreprocessingService.cs line 8 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Could this method just return the conventions rather than accept a delegate? We could just run this on a single corpus to simplify the return value.

Done.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 26 of 26 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)


src/Serval/src/Serval.Translation/Services/TranslationPlatformServiceV1.cs line 305 at r2 (raw file):

            u =>
            {
                if (request.ParallelCorpusAnalysis.Count > 0)

I'm not sure exactly what happens if no update ops are called. It might result in an unnecessary Mongo call. Just to be safe, it would make sense to wrap the entire UpdateAsync call in this check.


src/Serval/src/Serval.Translation/Services/TranslationPlatformServiceV1.cs line 312 at r2 (raw file):

                            .ParallelCorpusAnalysis.Select(a => new ParallelCorpusAnalysis
                            {
                                ParallelCorpusRef = a.ParallelCorpusId,

Is SF still using the obsolete Corpus model? If the ParallelCorpusId is actually an obsolete Corpus id, then setting ParallelCorpusRef to ParallelCorpusId would be incorrect. To avoid this, we could check that any of the ParallelCorpusId values exists in Pretranslate.ParallelCorpusRef in the UpdateAsync filter. This would be a way of ensuring that the build is using the newer ParallelCorpus model. If it is using the obsolete model, then it simply wouldn't set the analysis.

Copy link
Collaborator Author

@pmachapman pmachapman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 39 of 43 files reviewed, 2 unresolved discussions (waiting on @ddaspit and @Enkidu93)


src/Serval/src/Serval.Translation/Services/TranslationPlatformServiceV1.cs line 305 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I'm not sure exactly what happens if no update ops are called. It might result in an unnecessary Mongo call. Just to be safe, it would make sense to wrap the entire UpdateAsync call in this check.

Done. Thank you!


src/Serval/src/Serval.Translation/Services/TranslationPlatformServiceV1.cs line 312 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Is SF still using the obsolete Corpus model? If the ParallelCorpusId is actually an obsolete Corpus id, then setting ParallelCorpusRef to ParallelCorpusId would be incorrect. To avoid this, we could check that any of the ParallelCorpusId values exists in Pretranslate.ParallelCorpusRef in the UpdateAsync filter. This would be a way of ensuring that the build is using the newer ParallelCorpus model. If it is using the obsolete model, then it simply wouldn't set the analysis.

Done. SF not using the obsolete Corpus model. I have updated the logic to check against the engine's list of parallel corpora.

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 16 of 41 files at r1, 23 of 26 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ddaspit)

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 4 files at r3, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @pmachapman)

@codecov-commenter
Copy link

Codecov Report

❌ Patch coverage is 66.18705% with 47 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.27%. Comparing base (0f659c0) to head (414b1d9).

Files with missing lines Patch % Lines
...hared/Services/ServalTranslationPlatformService.cs 0.00% 21 Missing ⚠️
...lation/Controllers/TranslationEnginesController.cs 20.00% 8 Missing ⚠️
...TranslationUpdateParallelCorpusAnalysisConsumer.cs 0.00% 5 Missing ⚠️
...Translation/Contracts/ParallelCorpusAnalysisDto.cs 0.00% 4 Missing ⚠️
...red/Services/ServalWordAlignmentPlatformService.cs 0.00% 3 Missing ⚠️
src/Serval/src/Serval.Client/Client.g.cs 25.00% 3 Missing ⚠️
....Shared/Configuration/IMachineBuilderExtensions.cs 0.00% 1 Missing ⚠️
...al.Machine.Shared/Models/ParallelCorpusAnalysis.cs 75.00% 1 Missing ⚠️
...src/Serval.Shared/Models/ParallelCorpusAnalysis.cs 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #718      +/-   ##
==========================================
- Coverage   66.27%   66.27%   -0.01%     
==========================================
  Files         364      368       +4     
  Lines       19450    19588     +138     
  Branches     2494     2505      +11     
==========================================
+ Hits        12891    12981      +90     
- Misses       5648     5695      +47     
- Partials      911      912       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@pmachapman pmachapman merged commit 9a08cf3 into main Aug 13, 2025
2 of 3 checks passed
@pmachapman pmachapman deleted the quote_analysis branch August 13, 2025 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Corpora - store and retrieve cachedAnalysis

5 participants