Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hasIndices flag for regex #932

Closed
roryabraham opened this issue Mar 3, 2023 · 3 comments
Closed

hasIndices flag for regex #932

roryabraham opened this issue Mar 3, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@roryabraham
Copy link

Problem

We were trying to use a regex with the -d / hasIndices flag, and noticed that it worked on web but not iOS/Android. I then discovered that it looks like this flag is just not yet implemented in the Hermes engine:

class SyntaxFlags {
private:
// Note these are encoded into bytecode files, so changing their values is a
// breaking change.
enum : uint8_t {
ICASE = 1 << 0,
GLOBAL = 1 << 1,
MULTILINE = 1 << 2,
UCODE = 1 << 3,
DOTALL = 1 << 4,
STICKY = 1 << 5
};
public:
// Note: Preserving the order here and ensuring it lines up with the order of
// the offsets above makes the generated assembly more efficient.
// Specifically, it makes the conversion to/from a byte almost free.
uint8_t ignoreCase : 1;
uint8_t global : 1;
uint8_t multiline : 1;
uint8_t unicode : 1;
uint8_t dotAll : 1;
uint8_t sticky : 1;
/// \return a byte representing the flags. Bits are set based on the offsets
/// specified above. This is used for serialising the flags to bytecode.
uint8_t toByte() const {
uint8_t ret = 0;
if (global)
ret |= GLOBAL;
if (ignoreCase)
ret |= ICASE;
if (multiline)
ret |= MULTILINE;
if (unicode)
ret |= UCODE;
if (sticky)
ret |= STICKY;
if (dotAll)
ret |= DOTALL;
return ret;
}
/// \return a SyntaxFlags struct generated from the given byte. Bit offsets
/// are specified above. This is used for deserialising the flags from
/// bytecode.
static SyntaxFlags fromByte(uint8_t byte) {
SyntaxFlags ret = {};
if (byte & GLOBAL)
ret.global = 1;
if (byte & ICASE)
ret.ignoreCase = 1;
if (byte & MULTILINE)
ret.multiline = 1;
if (byte & UCODE)
ret.unicode = 1;
if (byte & STICKY)
ret.sticky = 1;
if (byte & DOTALL)
ret.dotAll = 1;
return ret;
}
/// \return a string representing the flags
/// The characters are returned in the order given in ES 6 21.2.5.3
/// (specifically global, ignoreCase, multiline, unicode, sticky)
/// Note this may differ in order from the string passed in construction
llvh::SmallString<6> toString() const {
llvh::SmallString<6> result;
if (global)
result.push_back('g');
if (ignoreCase)
result.push_back('i');
if (multiline)
result.push_back('m');
if (unicode)
result.push_back('u');
if (sticky)
result.push_back('y');
if (dotAll)
result.push_back('s');
return result;
}
/// Given a flags string \p str, generate the corresponding SyntaxFlags
/// \return the flags if the string is valid, an empty optional otherwise
/// See ES 5.1 15.10.4.1 for description of the validation
static llvh::Optional<SyntaxFlags> fromString(
const llvh::ArrayRef<char16_t> flags) {
// A flags string may contain i,m,g, in any order, but at most once each
auto error = llvh::NoneType::None;
SyntaxFlags ret = {};
for (auto c : flags) {
switch (c) {
case u'i':
if (ret.ignoreCase)
return error;
ret.ignoreCase = 1;
break;
case u'm':
if (ret.multiline)
return error;
ret.multiline = 1;
break;
case u'g':
if (ret.global)
return error;
ret.global = 1;
break;
case u'u':
if (ret.unicode)
return error;
ret.unicode = 1;
break;
case u'y':
if (ret.sticky)
return error;
ret.sticky = 1;
break;
case u's':
if (ret.dotAll)
return error;
ret.dotAll = 1;
break;
default:
return error;
}
}
return ret;
}
};

Solution

Implement the -d / hasIndices flag for Hermes regex.

Additional Context

Our use case for the -d flag is described here

@roryabraham roryabraham added the enhancement New feature or request label Mar 3, 2023
facebook-github-bot pushed a commit that referenced this issue Aug 29, 2023
Summary:
This change adds support for `hasIndices` RegExp flag, according to ES2022 specifications.

References:
https://262.ecma-international.org/13.0/#sec-regexp-regular-expression-objects
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/hasIndices

This change is also related to this issue: #932

Pull Request resolved: #968

Test Plan:
``` javascript
var str = "The Quick Brown Fox Jumps Over The Lazy Brown Dog";
var regex = /quick\s(?<color>brown).+?(?<action>jumps).+?(?<where>over_)*/dgi;
var res = regex.exec(str);

print(Object.getOwnPropertyNames(res)); // 0,1,2,3,length,index,input,groups,indices

print(Object.getOwnPropertyNames(res.indices)); // 0,1,2,3,length,groups
print(res.indices[0]); // 4,26
print(res.indices[1]); // 10,15
print(res.indices[2]); // 20,25
print(res.indices[3]); // undefined

print(Object.getOwnPropertyNames(res.indices.groups)); // color,action,where
print(res.indices.groups.constructor); // undefined
print(res.indices.groups.color); // 10,15
print(res.indices.groups.action); // 20,25
print(res.indices.groups.where); // undefined
```

Reviewed By: neildhar

Differential Revision: D48396577

Pulled By: fbmal7

fbshipit-source-id: 11f2e52630d3ddbc7e34c3cdbcc80849df458bc5
@roryabraham
Copy link
Author

This was implemented in #968 and should land in the next hermes release I think?

@roryabraham
Copy link
Author

2afc7b0 does not appear to be included in a release yet, but hopefully soon 🙂

@fbmal7
Copy link
Contributor

fbmal7 commented Sep 11, 2023

@roryabraham This feature will be in RN 0.73.

@fbmal7 fbmal7 closed this as completed Sep 11, 2023
facebook-github-bot pushed a commit that referenced this issue Nov 8, 2023
Summary:
Original Author: contact@fabiohenriques.dev
Original Git: 2afc7b0
Original Reviewed By: neildhar
Original Revision: D48396577

This change adds support for `hasIndices` RegExp flag, according to ES2022 specifications.

References:
https://262.ecma-international.org/13.0/#sec-regexp-regular-expression-objects
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/hasIndices

This change is also related to this issue: #932

Pull Request resolved: #968

Pulled By: fbmal7

Reviewed By: tmikov, neildhar

Differential Revision: D50992837

fbshipit-source-id: 6c29b930fcc17957b95fe049afc190cfc840ff05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants