ICU-22781 Adding support for constant denominators (C++) #3337

Merged: younies merged 1 commit into unicode-org:main from the cpp-parse-constants branch on Jan 24, 2025

Member

@younies younies commented Jan 20, 2025

Adding support for constant denominators

  • Introduced a new field for constant denominators in MeasureUnitImpl.
  • Added methods to set and retrieve constant denominators in MeasureUnit.
  • Updated parsing logic to handle unit constants correctly.
  • Enhanced unit tests to cover new functionality, including constant denominator scenarios.
  • Improved code readability and consistency across MeasureUnit and SingleUnitImpl classes.

TODO in the following PRs

  1. Add support for unit constants during unit conversion.
  2. Add support for unit constants during unit formatting.

Checklist

  • Required: Issue filed: ICU-22781
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

@younies younies requested a review from richgillam January 20, 2025 12:04
@younies younies force-pushed the cpp-parse-constants branch from 7d2adcc to 0cda889 Compare January 20, 2025 19:51
@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • icu4c/source/i18n/measunit_impl.h is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@younies younies force-pushed the cpp-parse-constants branch from 9b2bce2 to 2e942fd Compare January 20, 2025 20:44
younies added a commit to younies/icu that referenced this pull request Jan 20, 2025
@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

younies added a commit to younies/icu that referenced this pull request Jan 21, 2025
@younies younies force-pushed the cpp-parse-constants branch from c7e55db to 135591d Compare January 21, 2025 13:53
@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@younies younies marked this pull request as ready for review January 21, 2025 13:54
younies added a commit to younies/icu that referenced this pull request Jan 21, 2025
@younies younies force-pushed the cpp-parse-constants branch from c22021a to af8f078 Compare January 21, 2025 14:01
@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@younies younies changed the title ICU-22781 Adding support for constant denominators ICU-22781 (C++) Adding support for constant denominators Jan 21, 2025
@younies younies requested a review from sffc January 21, 2025 14:06
younies added a commit to younies/icu that referenced this pull request Jan 21, 2025
@younies younies force-pushed the cpp-parse-constants branch from af8f078 to 401a1ff Compare January 21, 2025 14:09
@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

younies added a commit to younies/icu that referenced this pull request Jan 21, 2025
@younies younies force-pushed the cpp-parse-constants branch from 401a1ff to a8ed6c1 Compare January 21, 2025 14:11
@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@younies younies changed the title ICU-22781 (C++) Adding support for constant denominators ICU-22781 Adding support for constant denominators (C++) Jan 21, 2025
younies added a commit to younies/icu that referenced this pull request Jan 21, 2025
@younies younies force-pushed the cpp-parse-constants branch from a8ed6c1 to eb6ab8a Compare January 21, 2025 14:11
@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

younies added a commit to younies/icu that referenced this pull request Jan 21, 2025
@younies younies force-pushed the cpp-parse-constants branch from 1ee1c40 to c3757c8 Compare January 21, 2025 14:41
@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

younies added a commit to younies/icu that referenced this pull request Jan 21, 2025
@younies younies force-pushed the cpp-parse-constants branch from 3c6149c to 79bba58 Compare January 21, 2025 16:21
@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

Contributor

@richgillam richgillam left a comment

I think this generally looks okay, but I have a few questions.

This also doesn't actually do anything, but I see that you're planning to do that part in separate PRs, so that's okay for now.

// TODO: Consider split function as a utility function.
// Parse the given string to a unsigned long value.
// If the value is not positive integer, it will return `kUnitIdentifierSyntaxError`.
uint64_t parseStrigToLong(const StringPiece str, UErrorCode &status) {
Contributor

Should this be parseStringToLong()? (Missing n)

}

return result;
}
Contributor

Three comments about this whole function:

  1. Since this is guaranteed to be a decimal number using ASCII characters in a char string, couldn't you just use atoi() instead of writing the whole thing yourself? Afterward, you could just check the result from atoi() to decide whether it's valid (e.g., disallowing negative numbers).

  2. The logic to parse the mantissa and the logic to parse the exponent are basically the same (except maybe for which values are valid). It seems like you could use the same code to handle each piece. In fact, if atoi() doesn't support scientific notation, you could at least use it to separately parse the mantissa and exponent and then just have your own logic for handling the e and putting the pieces together (or use atof(), although that'll give you back a float, which probably isn't helpful).

  3. Come to think of it, I don't think the logic above works right with EBCDIC, although I don't know if we still care about that...
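
For illustration, the second point (one digit-parsing routine shared by the mantissa and the exponent) could look roughly like the sketch below. It is not the PR's code: `parseDigits` and `parseConstant` are hypothetical names, it uses C++17 `std::from_chars` rather than `atoi()` because `atoi()` has no way to report overflow, and it assumes a lowercase `e` separator with no sign on either part.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parse a run of ASCII decimal digits into a uint64_t.
// Returns std::nullopt for empty input, any non-digit character, or overflow.
std::optional<std::uint64_t> parseDigits(std::string_view s) {
    if (s.empty()) {
        return std::nullopt;
    }
    std::uint64_t value = 0;
    const char *end = s.data() + s.size();
    const auto [ptr, ec] = std::from_chars(s.data(), end, value);
    if (ec != std::errc() || ptr != end) {
        return std::nullopt;
    }
    return value;
}

// Hypothetical helper: parse "<digits>" or "<digits>e<digits>" into a positive
// integer. The same parseDigits() call handles both the mantissa and the exponent.
std::optional<std::uint64_t> parseConstant(std::string_view s) {
    const std::size_t ePos = s.find('e');
    const auto mantissa = parseDigits(s.substr(0, ePos));
    if (!mantissa || *mantissa == 0) {
        return std::nullopt;  // a constant denominator must be positive
    }
    if (ePos == std::string_view::npos) {
        return mantissa;
    }
    const auto exponent = parseDigits(s.substr(ePos + 1));
    if (!exponent) {
        return std::nullopt;
    }
    std::uint64_t result = *mantissa;
    for (std::uint64_t i = 0; i < *exponent; ++i) {
        // Report overflow instead of wrapping around.
        if (result > std::numeric_limits<std::uint64_t>::max() / 10) {
            return std::nullopt;
        }
        result *= 10;
    }
    return result;
}
```

Under these assumptions, `parseConstant("1e3")` yields 1000, while `"0"`, `"+10"`, `"1e"`, and `"1e30"` are all rejected.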

Member Author

@younies younies Jan 22, 2025

I have used StringToDoubleConverter for now to mimic the behavior in Java. However, we need to implement our own logic in a separate class so that a value exceeding the limit produces an error instead of an arbitrary number.

I am trying to implement the long converter here: #3339

char c = str.data()[i];

// handle sign
if (i == exponentIndex && c == '+') {
Contributor

Is + actually a legal character in a unit identifier?

Member Author

Only if it is part of the unit constant:

For example, "meter-per-+10".

However, it looks a bit odd, and the + is dropped when the user reads the identifier back.

This is now handled by the StringToDoubleConverter.

int32_t endOfConstantIndex = -1;
// If no match was found, we check if the token is a constant denominator.
// 1. find the first `-` from the `currentFIndex` to the end.
for (int32_t i = currentFIndex; i < fSource.length(); ++i) {
Contributor

Can't you use strchr() to do this?

Member Author

Sure, I found that we can use StringPiece::find instead.
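
As a rough sketch of that idea, the hand-written loop that scans for the next `-` can be replaced by a single `find` call. The example uses `std::string_view::find` as a stand-in for ICU's `StringPiece::find` so that it compiles without ICU; `nextDashDelimitedToken` is a hypothetical name, not a function from the PR.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns the token that starts at `start` and runs up to
// (but not including) the next '-', or to the end of the source if there is none.
std::string_view nextDashDelimitedToken(std::string_view source, std::size_t start) {
    if (start >= source.size()) {
        return {};
    }
    const std::size_t dash = source.find('-', start);
    if (dash == std::string_view::npos) {
        return source.substr(start);  // no further '-': the token is the rest
    }
    return source.substr(start, dash - start);
}
```

For the identifier `meter-per-1000-second`, starting at index 10 returns `1000`, which the parser could then try to interpret as a constant denominator.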

@@ -695,27 +907,30 @@ class Parser {
         bool atStart = fIndex == 0;
         Token token = nextToken(status);
         if (U_FAILURE(status)) {
-            return result;
+            return SingleUnitOrConstant::singleUnitValue(singleUnitResult);
Contributor

There are a lot of repetitions of this line in this function, and it's not immediately obvious that what you're doing here is just returning a default-constructed SingleUnitImpl when you get a parse error. It's both really verbose and not clear.

Actually, as I look at this more, it's not always a default-constructed SingleUnitImpl-- it's the actual result you're constructing, as far as you got before getting to the error. Is that really what you want to be doing?

I can't help thinking you might be better off creating a default-constructed SingleUnitOrConstant (or a SingleUnitOrConstant containing a default-constructed SingleUnitImpl) up front, calling it errorResult, and just returning errorResult everywhere you're currently returning SingleUnitOrConstant(singleUnitResult) to indicate a parse error.

Or maybe I'm just horribly misreading this function...

Member Author

The idea is that I have changed the return type from SingleUnitImpl to a new type called SingleUnitOrConstant.

Therefore, if we need to return a result, it needs to be packed in a SingleUnitOrConstant object. After each check for the status, if there is a failure, I need to encapsulate the result (which is SingleUnitImpl) in a SingleUnitOrConstant object.

Member Author

Okay, you are right. I have found a way to get rid of them by returning {}.

Contributor

I'm not questioning what you were actually doing; I was just saying it wound up being verbose and hard to read and was suggesting you try to find a more concise way of expressing it. I haven't looked at the changes yet, but on the face of it, {} sounds good to me.
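
To make the outcome of this thread concrete, here is a minimal sketch of the `return {}` pattern. The types are simplified, hypothetical stand-ins (the real `SingleUnitImpl` and `SingleUnitOrConstant` in ICU are laid out differently), and a `bool` flag takes the place of `UErrorCode`. The point is only that every error path returns an empty, default-constructed result instead of wrapping a half-built unit.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical, simplified stand-in for SingleUnitImpl.
struct SingleUnit {
    std::string name;
    int power = 1;
};

// Hypothetical stand-in for the PR's return type: holds a unit, a constant, or nothing.
struct SingleUnitOrConstant {
    std::optional<SingleUnit> singleUnit;
    std::optional<std::uint64_t> constant;

    static SingleUnitOrConstant singleUnitValue(SingleUnit unit) {
        SingleUnitOrConstant result;
        result.singleUnit = std::move(unit);
        return result;
    }
    static SingleUnitOrConstant constantValue(std::uint64_t value) {
        SingleUnitOrConstant result;
        result.constant = value;
        return result;
    }
};

// `failed` plays the role of UErrorCode in this sketch.
SingleUnitOrConstant parseToken(std::string_view token, bool &failed) {
    if (failed) {
        return {};  // an earlier step already failed: return an empty result
    }
    if (token.empty()) {
        failed = true;
        return {};  // syntax error: same empty result, no manual wrapping
    }
    if (token.find_first_not_of("0123456789") == std::string_view::npos) {
        std::uint64_t value = 0;
        for (const char c : token) {
            // Overflow checking is omitted to keep the sketch short.
            value = value * 10 + static_cast<std::uint64_t>(c - '0');
        }
        return SingleUnitOrConstant::constantValue(value);
    }
    return SingleUnitOrConstant::singleUnitValue(SingleUnit{std::string(token), 1});
}
```

With this shape, callers check the status first and never see a partially constructed unit on failure.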

@younies younies requested a review from richgillam January 23, 2025 00:11
@@ -483,7 +486,7 @@ class Token {

     static Token constantToken(StringPiece str, UErrorCode &status) {
         Token result;
-        auto value = result.parseStrigToLong(str, status);
+        auto value = Token::parseStrigToLong(str, status);
Contributor

You still have parseStrigToLong() instead of parseStriNgToLong().

Member Author

done, thanks for the note :)

richgillam previously approved these changes Jan 23, 2025
Contributor

@richgillam richgillam left a comment

This looks a lot better! Thank you for the changes.

@younies younies force-pushed the cpp-parse-constants branch from 58bb63b to 2e94665 Compare January 24, 2025 00:33
@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@younies younies requested a review from richgillam January 24, 2025 00:33
Contributor

@richgillam richgillam left a comment

Thank you!

@younies younies merged commit ba4d4d3 into unicode-org:main Jan 24, 2025
94 checks passed
@younies younies deleted the cpp-parse-constants branch January 24, 2025 01:06
2 participants