-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Problem with string literals in router. #166
Comments
Hi! I can't reproduce that problem on my Ubuntu 20.04 with gcc-9.4 (C++14), clang-13 (C++14, C++20) and gcc-11 (C++20). Could you provide a minimal example that reproduces this error and tell more about your OS, compiler and C++ standard? |
@eao197 I'm not sure why this is happening; I'm putting it through the debugger, and saw that for some reason it processes the P.S. I was trying a slightly modified pattern when I took the pictures, to see if it made any difference. |
I'll try to get one later. The weird thing is, I was rebasing an old PR and saw this, but it wasn't occurring before. I'm on macOS 12, using Clang. The parent project where RESTinio is being included is c++17. |
It looks like a memory corruption. Maybe you use a dangling pointer somewhere? |
I made a smaller test-case where I can't reproduce the issue, even though I'm using VERY similar code. I'm really at my wits' end. |
@eao197 I think this is a bug concerning From what I'm gathering... the problem presents itself in my app after I initialize our localization engine. I tried to get more specific and noticed the following: TranslationEngine::TranslationEngine() {
std::string_view route_sv(R"-(/:path(.+))-");
const std::regex main_path_regex{ R"((\\.)|(?:\:(\w+)(?:\(((?:\\.|[^\\()])+)\))?|\(((?:\\.|[^\\()])+)\))([+*?])?)" };
std::cregex_iterator token_it{
route_sv.data(),
route_sv.data() + route_sv.size(),
main_path_regex
};
initializeAllLanguages();
/* set all languages in configuration.
* They are kept untranslated internally which prevents having issues
* when changing languages (and need to update internal stuff with
* translations) */
populateLanguages(GetAllLanguages());
std::clog << "locale/debug: Checking if language is set in config." << std::endl;
auto prefValue = config["game/language"].getEnumName();
auto lang = getLanguageByHumanReadableName(prefValue);
setLanguage(lang); ///< this calls std::locale::global();
const std::regex main_path_regex2{ R"((\\.)|(?:\:(\w+)(?:\(((?:\\.|[^\\()])+)\))?|\(((?:\\.|[^\\()])+)\))([+*?])?)" };
std::cregex_iterator token_it4 {
route_sv.data(),
route_sv.data() + route_sv.size(),
main_path_regex
};
std::cregex_iterator token_it5 {
route_sv.data(),
route_sv.data() + route_sv.size(),
main_path_regex2
};
} Interestingly, looking at this in lldb: Using the first regex, instantiated before calling (lldb) print (*token_it)[0].str()
(std::sub_match<const char *>::string_type) $3 = ":path(.+)"
(lldb) print (*token_it)[2].str()
(std::sub_match<const char *>::string_type) $4 = "path" Using the same regex instance, after calling std::locale (lldb) print (*token_it4)[0].str()
(std::sub_match<const char *>::string_type) $5 = ":path(.+)"
(lldb) print (*token_it4)[2].str()
(std::sub_match<const char *>::string_type) $6 = "path" And, using a new regex instance for the same pattern, created after changing locale: (lldb) print (*token_it5)[0].str()
(std::sub_match<const char *>::string_type) $7 = ":path(.+"
(lldb) print (*token_it5)[2].str()
(std::sub_match<const char *>::string_type) $8 = "path(.+" |
I'm in doubt there is an error in #include <regex>
#include <string_view>
#include <locale>
#include <iostream>
int main()
{
std::string_view route_sv(R"-(/:path(.+))-");
const std::regex main_path_regex{ R"((\\.)|(?:\:(\w+)(?:\(((?:\\.|[^\\()])+)\))?|\(((?:\\.|[^\\()])+)\))([+*?])?)" };
std::cregex_iterator token_it{
route_sv.data(),
route_sv.data() + route_sv.size(),
main_path_regex
};
std::cout << "token_it[0]str: '" << (*token_it)[0].str() << "'" << std::endl;
std::cout << "token_it[2]str: '" << (*token_it)[2].str() << "'" << std::endl;
for( const auto ln : { "be_BY.utf8", "ru_RU.utf8", "en_NZ.utf8", "ru_UA.utf8", "POSIX" } )
{
std::cout << "*** " << ln << " ***" << std::endl;
std::locale::global( std::locale(ln) );
const std::regex main_path_regex2{ R"((\\.)|(?:\:(\w+)(?:\(((?:\\.|[^\\()])+)\))?|\(((?:\\.|[^\\()])+)\))([+*?])?)" };
std::cregex_iterator token_it4 {
route_sv.data(),
route_sv.data() + route_sv.size(),
main_path_regex
};
std::cout << "token_it4[0]str: '" << (*token_it4)[0].str() << "'" << std::endl;
std::cout << "token_it4[2]str: '" << (*token_it4)[2].str() << "'" << std::endl;
std::cregex_iterator token_it5 {
route_sv.data(),
route_sv.data() + route_sv.size(),
main_path_regex2
};
std::cout << "token_it5[0]str: '" << (*token_it5)[0].str() << "'" << std::endl;
std::cout << "token_it5[2]str: '" << (*token_it5)[2].str() << "'" << std::endl;
}
} And it produces the same results on my Linux machine (and, if I remove "POSIX", on Windows machine with VC++ from VS2022). So may be there is no need to modify RESTinio code? |
You're correct with that example however, there's a missing piece of the puzzle I forgot to mention: Boost.locale. If I just add Boost.locale so instead of std::locale::global( std::locale(ln) ); it becomes boost::locale::generator m_gen{};
std::locale::global( m_gen(ln) ); to your example, this is the output:
|
Have you tried boost_regex_engine_t for router? If Boost.Locale breaks std::regex then may be it doesn't breaks Boost.Regex? |
It doesn't break Boost.Regex indeed, but it's still a no-go, because the code causing the problem is in BTW: I'm working on fixing bugs with my PR. I'll re-submit it once it can pass all tests. |
It looks a bit weird: boost.locale breaks the normal work of std::regex, but changes are proposed for RESTinio. What if we will use std::regex somewhere inside RESTinio for other purposes? Should we take care about the incorrect behavior of boost.locale+std::regex? And what to do if there will be some other library that breaks std::regex (or something else from the standard library)? Rewrite a part of RESTinio yet again? |
I am proposing changes to RESTinio because, in wanting to support express-router, you opted to rely on a third party library, path-to-regexp (same as express.js does). Except, you're using an ancient version of the path-to-regexp code. I have spotted (and fixed) already several flaws in my PR. Still, at least (but probably not at most) one of the failures are because the original tests have been changed. For example, in path-to-regexp sometime before 2.0 (which is what I believe you used) [
'/:test?-bar',
null,
[
{
name: 'test',
prefix: '/',
delimiter: '/',
optional: true,
repeat: false,
partial: true,
asterisk: false,
pattern: '[^\\/]+?'
},
'-bar'
],
[
['/-bar', ['/-bar', undefined]],
['/foo-bar', ['/foo-bar', 'foo']]
],
[
[{ test: 'foo' }, '/foo-bar']
] Same test in path-to-regexp 6.2.1 (current version): [
"/:test?-bar",
undefined,
[
{
name: "test",
prefix: "/",
suffix: "",
modifier: "?",
pattern: "[^\\/#\\?]+?",
},
"-bar",
],
[
["-bar", ["-bar", undefined]],
["/-bar", null],
["/foo-bar", ["/foo-bar", "foo"]],
],
[
[undefined, "-bar"],
[{ test: "foo" }, "/foo-bar"],
],
], The syntax of the data structures is a bit different, but you can probably see the difference in matching behavior. |
The whole regex issue that motivated my working on this is essentially just a symptom. If you don't update your implementation of path2regex, the behavior between RESTinio and express.js will continue to differ. |
I'll open a new PR now, as I've corrected some errors I found in my port. There's still going to be breaking changes for some people, but they'll likely be things From documentation of path-to-regexp, main changes are (most are copied from the path-to-regexp release notes):
I've updated the test suite to reflect the changes, as well as added some new tests that had been added to the upstream |
Hi, @Lord-Kamina ! Thanks you very much for your work. However, I think that breaking compatibility in 0.6 isn't a good thing, even if that break closes a gap between RESTinio and the original path-to-regex. It seems that your work can be used for work on 0.7 branch when (and if) that work will be started. I'll take a closer look at your PR this week and will merge it in a separate branch. |
While I'd like to see this merged eventually, I'm currently using a heavily modified branch on my app and likely will for a while yet, so it's fine if you'll wait for a bigger release. |
I was previously using the express router, configured something like this:
Since updating to 0.6.17 (using libfmt9), I'm getting the following error: "non-escaped bracket ')' at pos 9: may be unmatched group finish"
If I disregard the fact that this is a raw string and thus nothing should need to be escaped, the error goes away but my rules don't match any request and they all end unhandled.
The text was updated successfully, but these errors were encountered: