Skip to content

Conversation

@Enkidu93
Copy link
Collaborator

@Enkidu93 Enkidu93 commented May 30, 2025

@ddaspit, when porting this, is speed a concern? I've tried to follow machine.py as closely as possible with few exceptions, but there's definitely room for optimization. Is this something we should think about now or just something we should revisit later on?


This change is Reviewable

@Enkidu93 Enkidu93 requested review from ddaspit and isaac091 May 30, 2025 17:42
@codecov-commenter
Copy link

codecov-commenter commented May 30, 2025

Codecov Report

Attention: Patch coverage is 95.73864% with 15 lines in your changes missing coverage. Please review.

Project coverage is 70.75%. Comparing base (7ee1679) to head (7d39c0b).

Files with missing lines Patch % Lines
...hine/Corpora/PlaceMarkersUsfmUpdateBlockHandler.cs 95.39% 6 Missing and 7 partials ⚠️
src/SIL.Machine/Corpora/UpdateUsfmParserHandler.cs 95.23% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #306      +/-   ##
==========================================
+ Coverage   70.50%   70.75%   +0.25%     
==========================================
  Files         389      390       +1     
  Lines       32386    32721     +335     
  Branches     4545     4605      +60     
==========================================
+ Hits        22833    23153     +320     
- Misses       8503     8510       +7     
- Partials     1050     1058       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Enkidu93 Enkidu93 marked this pull request as ready for review June 2, 2025 18:24
@Enkidu93
Copy link
Collaborator Author

Enkidu93 commented Jun 2, 2025

Alright - this is ready for review 😁

Copy link
Collaborator

@isaac091 isaac091 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 0 of 2 files reviewed, all discussions resolved (waiting on @ddaspit)

@Enkidu93
Copy link
Collaborator Author

Enkidu93 commented Jun 5, 2025

@ddaspit This is ready for review

@Enkidu93 Enkidu93 force-pushed the place_markers_usfm_update_block_handler branch from 8d53798 to 5495c05 Compare June 6, 2025 12:53
@Enkidu93
Copy link
Collaborator Author

Enkidu93 commented Jun 6, 2025

Fixes #303

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 10 files at r3, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @Enkidu93)


src/SIL.Machine/Corpora/PlaceMarkersUsfmUpdateBlockHandler.cs line 61 at r4 (raw file):

            // Paragraph markers at the end of the block should stay there
            // Section headers should be ignored but re-inserted in the same position relative to other paragraph markers
            List<UsfmUpdateBlockElement> endElements = new List<UsfmUpdateBlockElement>();

You can use var for all of these variable declarations, since the type is explicit on the right hand side.


src/SIL.Machine/Corpora/PlaceMarkersUsfmUpdateBlockHandler.cs line 91 at r4 (raw file):

                        || (
                            element.Type == UsfmUpdateBlockElementType.Text
                            && element.Tokens[0].ToUsfm().Trim().Count() == 0

You should use Length instead of the LINQ Count function.


tests/SIL.Machine.Tests/Corpora/PlaceMarkersUsfmUpdateBlockHandlerTests.cs line 10 at r4 (raw file):

public class PlaceMarkersUsfmUpdateBlockHandlerTests
{
    private LatinWordTokenizer Tokenizer { get; set; }

This can probably just be a static read-only field.


src/SIL.Machine/Corpora/UpdateUsfmParserHandler.cs line 535 at r4 (raw file):

            }
            List<UsfmToken> tokens = updateBlock.GetTokens();
            paraElems.Reverse();

It is a small thing, but I would just use the LINQ Reverse function in the foreach loop instead of the in-place Reverse method on List. It avoids an extra iteration of the list. You can call the LINQ reverse like this Enumerable.Reverse(paraElems).

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Enkidu93 What kind of optimizations are you talking about?

Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @Enkidu93)

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing lots of Insert(index) and RemoveAt and IndexOf+Substring's as well as what looks like iterating through collections more than once. These make suspicious that we could improve the performance.

Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @ddaspit)


src/SIL.Machine/Corpora/PlaceMarkersUsfmUpdateBlockHandler.cs line 61 at r4 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

You can use var for all of these variable declarations, since the type is explicit on the right hand side.

Done. I thought we were using the Type name = new() syntax and avoiding var?


src/SIL.Machine/Corpora/PlaceMarkersUsfmUpdateBlockHandler.cs line 91 at r4 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

You should use Length instead of the LINQ Count function.

Done.


src/SIL.Machine/Corpora/UpdateUsfmParserHandler.cs line 535 at r4 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

It is a small thing, but I would just use the LINQ Reverse function in the foreach loop instead of the in-place Reverse method on List. It avoids an extra iteration of the list. You can call the LINQ reverse like this Enumerable.Reverse(paraElems).

Oooo, thank you! I only didn't do this because I didn't know it existed and was being stupid 😆. That makes sense since you can call Select().Reverse(). Done.


tests/SIL.Machine.Tests/Corpora/PlaceMarkersUsfmUpdateBlockHandlerTests.cs line 10 at r4 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

This can probably just be a static read-only field.

Done.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is any low hanging fruit, I would go ahead and do it. I will approve it now if you decide to hold off on the optimizations.

Reviewed 4 of 4 files at r5, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)


src/SIL.Machine/Corpora/PlaceMarkersUsfmUpdateBlockHandler.cs line 61 at r4 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Done. I thought we were using the Type name = new() syntax and avoiding var?

I wasn't sure if we could use that syntax in this assembly, since it targets .NET Standard 2.0, which is locked to an older version of C#.

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking through it again, I can definitely see room for optimizations but it would require enough reworking (which should then probably be back-ported to machine.py) that I've just spun off an issue for now in the interest of meeting the quarter deadline.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)


src/SIL.Machine/Corpora/PlaceMarkersUsfmUpdateBlockHandler.cs line 61 at r4 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I wasn't sure if we could use that syntax in this assembly, since it targets .NET Standard 2.0, which is locked to an older version of C#.

Yeah 👍. I wasn't sure if when we couldn't use the new() syntax, if you preferred we avoid var.

@Enkidu93 Enkidu93 merged commit 7b018c0 into master Jun 9, 2025
4 checks passed
@Enkidu93 Enkidu93 deleted the place_markers_usfm_update_block_handler branch June 9, 2025 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants