Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: refactor EndsInANumber in node_url.cc and adds IsIPv4NumberValid #46227

Merged
merged 14 commits into from
Jan 23, 2023
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
87 changes: 72 additions & 15 deletions src/node_url.cc
Original file line number Diff line number Diff line change
Expand Up @@ -407,29 +407,86 @@ int64_t ParseIPv4Number(const char* start, const char* end) {
return strtoll(start, nullptr, R);
}

// https://url.spec.whatwg.org/#ipv4-number-parser
bool IsIPv4NumberValid(std::string_view input) {
anonrig marked this conversation as resolved.
Show resolved Hide resolved
// If input is the empty string, then return failure.
anonrig marked this conversation as resolved.
Show resolved Hide resolved
if (input.empty()) {
return false;
}

// If input contains at least two code points..
if (input.size() >= 2) {
anonrig marked this conversation as resolved.
Show resolved Hide resolved
// and the first two code points are either "0X" or "0x", then:
if (input[0] == '0' && (input[1] == 'X' || input[1] == 'x')) {
anonrig marked this conversation as resolved.
Show resolved Hide resolved
if (input.size() == 2) {
return true;
}

// Remove the first two code points from input,
// radix-R is 16
// If input contains a code point that is not a radix-R digit, then return
// failure.
return input.find_first_not_of("0123456789abcdefABCDEF", 2) ==
anonrig marked this conversation as resolved.
Show resolved Hide resolved
std::string_view::npos;

// and the first code point is U+0030 (0), then:
} else if (input[0] == '0') {
anonrig marked this conversation as resolved.
Show resolved Hide resolved
if (input.size() == 1) {
anonrig marked this conversation as resolved.
Show resolved Hide resolved
return true;
}

// Remove the first code point from input.
// radix-R is 8
// If input contains a code point that is not a radix-R digit, then return
// failure.
return input.find_first_not_of("01234567", 1) == std::string_view::npos;
}
}

// If input contains a code point that is not a radix-R digit, then return
// failure. radix-R is 10
return std::all_of(input.begin(), input.end(), ::isdigit);
anonrig marked this conversation as resolved.
Show resolved Hide resolved
}

// https://url.spec.whatwg.org/#ends-in-a-number-checker
bool EndsInANumber(const std::string& input) {
std::vector<std::string> parts = SplitString(input, '.', false);
bool EndsInANumber(const std::string_view input) {
anonrig marked this conversation as resolved.
Show resolved Hide resolved
if (input.empty()) {
return false;
}

const std::string delimiter = ".";
anonrig marked this conversation as resolved.
Show resolved Hide resolved
auto pointer_start = input.begin();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use std::string_view::iterator it might solve the issues related to std::string_view construct on windows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows build is still failing : /

D:\a\node\node\src\node_url.cc(465,66): error C2440: '<function-style-cast>': cannot convert from 'initializer list' to 'std::string_view' [D:\a\node\node\libnode.vcxproj]
D:\a\node\node\src\node_url.cc(465,66): message : No constructor could take the source type, or constructor overload resolution was ambiguous [D:\a\node\node\libnode.vcxproj]
  node_util.cc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to know if I can go with &(*it); again

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@addaleax any recommendations?

Copy link
Contributor Author

@miguelteixeiraa miguelteixeiraa Jan 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'd look like this:
std::string_view(&(*pointer_start), pointer_end - pointer_start));
just to make the build pass on windows

Copy link
Contributor Author

@miguelteixeiraa miguelteixeiraa Jan 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What i understand is that iterators don't need to be pointers although they could be implemented as one. And some implementations of the STL (MSVC++, Mingw) implement interators with special classes (with no pre-difined casts to pointer), so that we cannot use them directly in the string_view constructor that we are using, which needs a charT*.

then we could use * which is overloaded to do what logically mean: convert the iterator type to its pointed-to object, then pass the address of this &(*it).

https://en.cppreference.com/w/cpp/named_req/RandomAccessIterator
https://stackoverflow.com/questions/32654108/c-stdvectoriterator-is-not-a-pointer-why/64910815#64910815

auto pointer_end = input.end();

if (parts.empty()) return false;
uint8_t parts_size = std::count(pointer_start, pointer_end, delimiter[0]);
++parts_size;

if (parts.back().empty()) {
if (parts.size() == 1) return false;
parts.pop_back();
// If the last item in parts is the empty string, then:
if (input.back() == delimiter[0]) {
// Remove the last item from parts.
--pointer_end;
--parts_size;
}

const std::string& last = parts.back();
// Let last be the last item in parts
if (parts_size > 1) {
pointer_start = std::find_end(
pointer_start, pointer_end, delimiter.begin(), delimiter.end());
++pointer_start;
}

// If last is non-empty and contains only ASCII digits, then return true
if (!last.empty() && std::all_of(last.begin(), last.end(), ::isdigit)) {
return true;
if (std::distance(pointer_start, pointer_end) == 0) {
return false;
}

const char* last_str = last.c_str();
int64_t num = ParseIPv4Number(last_str, last_str + last.size());
if (num >= 0) return true;
// If last is non-empty and contains only ASCII digits, then return true.
if (std::all_of(pointer_start, pointer_end, ::isdigit)) {
return true;
}

return false;
// If parsing last as an IPv4 number does not return failure, then return
// true.
return IsIPv4NumberValid(std::string(pointer_start, pointer_end));
addaleax marked this conversation as resolved.
Show resolved Hide resolved
}

void URLHost::ParseIPv4Host(const char* input, size_t length) {
Expand Down Expand Up @@ -1944,4 +2001,4 @@ MaybeLocal<Value> URL::ToObject(Environment* env) const {
} // namespace node

NODE_BINDING_CONTEXT_AWARE_INTERNAL(url, node::url::Initialize)
NODE_BINDING_EXTERNAL_REFERENCE(url, node::url::RegisterExternalReferences)
NODE_BINDING_EXTERNAL_REFERENCE(url, node::url::RegisterExternalReferences)
anonrig marked this conversation as resolved.
Show resolved Hide resolved