Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for "hasIndices" RegExp flag #968

Closed

Conversation

fabioh8010
Copy link
Contributor

Summary

This change adds support for hasIndices RegExp flag, according to ES2022 specifications.

References:
https://262.ecma-international.org/13.0/#sec-regexp-regular-expression-objects
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/hasIndices

This change is also related to this issue: #932

Test Plan

var str = "The Quick Brown Fox Jumps Over The Lazy Brown Dog";
var regex = /quick\s(?<color>brown).+?(?<action>jumps).+?(?<where>over_)*/dgi;
var res = regex.exec(str);

print(Object.getOwnPropertyNames(res)); // 0,1,2,3,length,index,input,groups,indices

print(Object.getOwnPropertyNames(res.indices)); // 0,1,2,3,length,groups
print(res.indices[0]); // 4,26
print(res.indices[1]); // 10,15
print(res.indices[2]); // 20,25
print(res.indices[3]); // undefined

print(Object.getOwnPropertyNames(res.indices.groups)); // color,action,where
print(res.indices.groups.constructor); // undefined
print(res.indices.groups.color); // 10,15
print(res.indices.groups.action); // 20,25
print(res.indices.groups.where); // undefined

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Apr 12, 2023
if (global)
result.push_back('g');
if (ignoreCase)
result.push_back('i');
if (multiline)
result.push_back('m');
if (dotAll)
result.push_back('s');
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did the order change? Do you think this is necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is just to align the order with ES2022 specification here.

@@ -318,6 +318,8 @@ STR(multiline, "multiline")
STR(unicode, "unicode")
STR(sticky, "sticky")
STR(dotAll, "dotAll")
STR(indices, "indices")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there two definitions needed indicates and hasIndicates?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are used here, here and here.

Handle<JSArray> indices) {
// If there are no capture groups, then set groups to undefined.
if (!mappingObj) {
auto executionStatus = JSObject::defineNewOwnProperty(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should refrain from using defineNewOwnProperty. You can only use this API when you know for certain that the property you're adding does not exist on the current object. To be safe, use defineOwnProperty.

Predefined::getSymbolID(Predefined::groups),
PropertyFlags::defaultNewNamedPropertyFlags(),
Runtime::getUndefinedValue());
assert(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These asserts should not be written this way. You should check the result of the defineProperty and return in an error in a regular if. Also remember to wrap these in the appropriate LLVM_(UN)LIKELY For example,

if (LLVM_UNLIKELY(JSObject::defineNewOwnProperty(...) == ExecutionStatus::EXCEPTION)){
  return ExecutionStatus::EXCEPTION;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to do this, but since my createIndicesArray is a void method this approach wouldn't work, so I followed the same way was done to createGroupsObject which is also a void method and uses assert. What's your advice? Change the method's signature to return something like CallResult<void> ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You change the type to ExecutionStatus.

indices,
runtime,
Predefined::getSymbolID(Predefined::groups),
PropertyFlags::defaultNewNamedPropertyFlags(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be getDefaultNewPropertyFlags.

matchObj,
runtime,
Predefined::getSymbolID(Predefined::indices),
PropertyFlags::defaultNewNamedPropertyFlags(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto-getDefaultNewPropertyFlags.

@@ -595,6 +658,13 @@ CallResult<Handle<JSArray>> directRegExpExec(
auto inputSHV = SmallHermesValue::encodeStringValue(*S, runtime);
JSObject::setNamedSlotValueUnsafe(*A, runtime, inputDesc, inputSHV);

// Create indices array.
auto indicesRes = JSArray::create(runtime, match.size(), match.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you create this unconditionally? This will only ever be used if hasIndices is true, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created at this point because I need this array already declared in order to be used inside the loop here, and would also need the same array outside the loop here. Do you think it's problematic?

@@ -2051,7 +2051,6 @@
"import-assertions",
"json-superset",
"new.target",
"regexp-match-indices",
Copy link
Contributor

@fbmal7 fbmal7 Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good to test to spec, but it should not be the only source of testing. Add some examples of using the d flag and making sure it's working in some basic tests cases in our lit tests. You can add these cases to test/hermes/regexp.js. For instance, I haven't had time to debug this further, but I am getting an error when running this code on top of your changes:

let re = /(?<keyword>let|const|var)\s+(?<id>[a-zA-Z_$][0-9a-zA-Z_$]*)/d;
let match = re.exec("let myVar");
print((match.indices.groups));

Something funny is happening here since if I instead change the line to the following:

print(JSON.stringify((match.indices.groups)));

everything works fine.

edit- actually I figured out what's going on here. The root cause is outside of the scope of your changes, so you don't have to address them. But there should still be some cases added to our lit tests.

If you're curious, the problem is because ordinaryToPrimitive in Operations.cpp will try to lookup either toString or valueOf on an object to convert it to something printable. But since you defined the indices.groups object with a null parent, it's impossible to find either of these functions on the prototype chain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I was going to say that, this behavior also happens to match.groups because it's a null-prototype object just like match.indices.groups is.

This issue also happens in other engines if you do like this for example (although they'll work if you just console.log the variable without string conversion):

let re = /(?<keyword>let|const|var)\s+(?<id>[a-zA-Z_$][0-9a-zA-Z_$]*)/d;
let match = re.exec("let myVar");
console.log('match.groups ' + match.groups); // Uncaught TypeError: Cannot convert object to primitive value
console.log('match.indices.groups ' + match.indices.groups); // Uncaught TypeError: Cannot convert object to primitive value

Some references:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec#return_value
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object#null-prototype_objects

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the testing, I'll implement some tests as requested 👍 . Should I implement tests for this "null-prototype object" case too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I don't think I would classify this as a bug. JSC and SpiderMonkey both exhibit the same behavior as Hermes in this case. You can test this by trying to print out the expression Object.create(null) in the different engines.

You should not write tests for this case.

Handle<JSObject> mappingObj,
Handle<JSArray> indices) {
// If there are no capture groups, then set groups to undefined.
if (!mappingObj) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nit- is there a point to flipping this conditional if both cases are written out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just preference I guess, I can flip it if you like :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again it's a nit, but I personally feel like it would be cleaner without the negation.

@fbmal7
Copy link
Contributor

fbmal7 commented Apr 13, 2023

Overall the code is looking good so far, thanks for taking this on :)

@fbmal7
Copy link
Contributor

fbmal7 commented Apr 17, 2023

This PR is still in a draft state. Please change the status of it so the rest of the team can review as well.

@fabioh8010 fabioh8010 marked this pull request as ready for review April 18, 2023 17:17
@fabioh8010
Copy link
Contributor Author

fabioh8010 commented Apr 19, 2023

Hi @fbmal7, I addressed all the changes requested and changed the PR status.

@fbmal7
Copy link
Contributor

fbmal7 commented Apr 24, 2023

Actually, I think this code could be cleaner if we consolidate the all the work related to populating the indices object into one method. This way we can also make the code read closer to the spec. We can also address the point about eagerly initializing auto indicesRes = JSArray::create(runtime, match.size(), match.size());. So, you can remove the check for hasIndices inside the loop (which would have been turned into 3 hasIndices checks to avoid the initialization.)

And instead, move all that logic into a new method makeMatchIndicesIndexPairArray, which will be aligned to the spec with the same name. Then we will only have one check for hasIndices. This means you will construct and populate Handle<JSArray> indices inside of makeMatchIndicesIndexPairArray. So it will also need to take in match as a parameter. Also, try to include inline comments in source code for the corresponding spec rules. I will leave a comment on a specific piece of code as an example.

return ExecutionStatus::EXCEPTION;
}
} else {
// If there are no capture groups, then set groups to undefined.
Copy link
Contributor

@fbmal7 fbmal7 Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, you can leave an inline comment that copy-pastes the spec. In this case it's coming from this definition.

// 7a. Else, Let groups be undefined.
if (LLVM_UNLIKELY(
            JSObject::defineOwnProperty(
                indices,
                runtime,
                Predefined::getSymbolID(Predefined::groups),
                DefinePropertyFlags::getDefaultNewPropertyFlags(),
                Runtime::getUndefinedValue()) == ExecutionStatus::EXCEPTION)) {
      return ExecutionStatus::EXCEPTION;
    }

This will make more sense when you refactor this code into a new method makeMatchIndicesIndexPairArray. At the top of the function definition, you can insert an inline comment saying 22.2.7.8 MakeMatchIndicesIndexPairArray.

@fabioh8010
Copy link
Contributor Author

fabioh8010 commented Apr 28, 2023

Hi @fbmal7 thanks for your review and thoughts. Yes, I agree, let's keep the implementation closer to the specification 👍
I will be OOO for some days and will return on May 4th to start working on these changes in the following days.

@fabioh8010
Copy link
Contributor Author

Hi @fbmal7 , I addressed the requested changes, tried to keep the implementation as close as possible to the specification, please review it again when you have some time.

@fabioh8010
Copy link
Contributor Author

Hi @fbmal7 , did you have a chance to take a look at the latest changes on this PR? Thanks!

}
auto matchIndexPair = runtime.makeHandle<JSArray>(*matchIndexPairRes);

JSArray::setElementAt(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, you may use unsafeSetExistingElementAt. You know that it's safe to write the element at that spot because you created it with the proper size above. You will also need to dereference the Handle in order to use that function.

0,
runtime.makeHandle(
HermesValue::encodeUntrustedNumberValue(mg->location)));
JSArray::setElementAt(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto- unsafeSetExistingElementAt


// Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(i)),
// matchIndexPair).
JSArray::setElementAt(A, runtime, idx, matchIndexPair);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto- unsafeSetExistingElementAt

// Let matchIndexPair be undefined.
// Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(i)),
// matchIndexPair).
JSArray::setElementAt(A, runtime, idx, Runtime::getUndefinedValue());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto- unsafeSetExistingElementAt. When you switch this, you should also use HermesValue::encodeUndefinedValue() instead of Runtime::getUndefinedValue().

@@ -595,6 +701,10 @@ CallResult<Handle<JSArray>> directRegExpExec(
auto inputSHV = SmallHermesValue::encodeStringValue(*S, runtime);
JSObject::setNamedSlotValueUnsafe(*A, runtime, inputDesc, inputSHV);

// If R contains any GroupName, then let hasGroups be true.
auto groupNames = regexp->getGroupNameMappings(runtime);
bool hasGroups = !!groupNames;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the double exclamation mark, bool hasGroups = (bool)groupNames;. Handle overloads the bool operator. It overloads it with explicit, which you can learn about the usage here.

@@ -595,6 +701,10 @@ CallResult<Handle<JSArray>> directRegExpExec(
auto inputSHV = SmallHermesValue::encodeStringValue(*S, runtime);
JSObject::setNamedSlotValueUnsafe(*A, runtime, inputDesc, inputSHV);

// If R contains any GroupName, then let hasGroups be true.
auto groupNames = regexp->getGroupNameMappings(runtime);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can write out the type here, it's not obvious from the name.

Handle<JSObject> groupNames = regexp->getGroupNameMappings(runtime);

auto clazzHandle = runtime.makeHandle(mappingObj->getClass(runtime));
auto groupsObjRes = JSObject::create(
runtime, Runtime::makeNullHandle<JSObject>(), clazzHandle);
auto groupsObj = runtime.makeHandle(groupsObjRes.get());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: groupsObj -> groups

@fabioh8010 fabioh8010 force-pushed the feature/regexp-indices-flag branch from 122558c to 57cf9ba Compare June 9, 2023 08:05
@fabioh8010
Copy link
Contributor Author

Hi @fbmal7 , thanks for the review. I've addressed all the changes, but I also had to use JSArray::setStorageEndIndex() in my arrays to be able to work with JSArray::unsafeSetExistingElementAt(), because without resizing the storage I was getting this kind of error.

Assertion failed: (index >= self->beginIndex_ && index < self->endIndex_ && "array index out of range"), function unsafeSetExistingElementAt, file JSArray.h, line 77.

Please let me know if it makes sense, thank you!

@fbmal7
Copy link
Contributor

fbmal7 commented Jun 20, 2023

Using setStorageEndIndex is expected.

Given that this PR will require a bytecode bump, the team is currently discussing the best way to merge these changes in.

@AndrewGable
Copy link

Hey @fbmal7 - We are trying to plan around this PR, would this be included in a new version of Hermes (0.12.1 ?) that would land with 0.73.x of React Native? Thanks in advanced!

@fbmal7
Copy link
Contributor

fbmal7 commented Jul 12, 2023

@AndrewGable We expect this PR to land in time for 0.73, barring any surprises. We will keep this thread updated.

@facebook-github-bot
Copy link
Contributor

@fbmal7 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@fbmal7 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@fbmal7 merged this pull request in 2afc7b0.

facebook-github-bot pushed a commit that referenced this pull request Nov 8, 2023
Summary:
Original Author: contact@fabiohenriques.dev
Original Git: 2afc7b0
Original Reviewed By: neildhar
Original Revision: D48396577

This change adds support for `hasIndices` RegExp flag, according to ES2022 specifications.

References:
https://262.ecma-international.org/13.0/#sec-regexp-regular-expression-objects
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/hasIndices

This change is also related to this issue: #932

Pull Request resolved: #968

Pulled By: fbmal7

Reviewed By: tmikov, neildhar

Differential Revision: D50992837

fbshipit-source-id: 6c29b930fcc17957b95fe049afc190cfc840ff05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed Do not delete this pull request or issue due to inactivity. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants