Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WHATWG URL API format() function inserts '//' wrongly when 'unicode' is set #48759

Closed
jribbens opened this issue Jul 13, 2023 · 2 comments · Fixed by #49396
Closed

WHATWG URL API format() function inserts '//' wrongly when 'unicode' is set #48759

jribbens opened this issue Jul 13, 2023 · 2 comments · Fixed by #49396
Labels
url Issues and PRs related to the legacy built-in url module. v18.x Issues that can be reproduced on v18.x or PRs targeting the v18.x-staging branch.

Comments

@jribbens
Copy link

jribbens commented Jul 13, 2023

Version

v18.16.1

Platform

Linux foo 6.2.0-25-generic #25-Ubuntu SMP PREEMPT_DYNAMIC Fri Jun 16 17:05:07 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

url

What steps will reproduce the bug?

> const { format } = require('node:url')
undefined
> format(new URL('tel:123'))
'tel:123'
> format(new URL('tel:123'), { unicode: true })
'tel://123'
> format(new URL('doi:10.123/456'))
'doi:10.123/456'
> format(new URL('doi:10.123/456'), { unicode: true })
'doi://10.123/456'

How often does it reproduce? Is there a required condition?

100% reproducible

What is the expected behavior? Why is that the expected behavior?

The // characters should not be being inserted.

Three reasons why:

  • obviously the unicode flag should not be changing the output of the URLs I show
  • RFC 3986 section 3 says: 'When authority is not present, the path cannot begin with two slash characters ("//")'
  • RFC 3966 clearly shows that the tel: URL syntax (for example) never starts with //

What do you see instead?

URLs that are not in the slashedProtocol set should not be having // added to them.

Additional information

From manual inspection of the code, I am guessing that node/src/node_url.cc line 198 onwards is the issue:

  if (unicode) {
    out->host = ada::idna::to_unicode(out->get_hostname());
  }

I guess that ada::idna::to_unicode is transforming the hostname from a falsey value to an empty but truthy value and hence get_href is adding in the //. Perhaps the above lines should be checking that there actually is a hostname, before updating out->host.

@atlowChemi atlowChemi added whatwg-url Issues and PRs related to the WHATWG URL implementation. url Issues and PRs related to the legacy built-in url module. and removed whatwg-url Issues and PRs related to the WHATWG URL implementation. labels Jul 14, 2023
@atlowChemi
Copy link
Member

@nodejs/url cc

@lemire
Copy link
Member

lemire commented Jul 15, 2023

At a glance, @jribbens is likely correct.

The ada library returns the empty string when calling get_hostname() if there is no host. Prior to calling it, one might want to call has_hostname().

So it should be...

  if (unicode && out->has_hostname()) {
    out->host = ada::idna::to_unicode(out->get_hostname());
  }

Xstoudi added a commit to Xstoudi/node that referenced this issue Aug 9, 2023
@anonrig anonrig added the v18.x Issues that can be reproduced on v18.x or PRs targeting the v18.x-staging branch. label Aug 28, 2023
nodejs-github-bot pushed a commit that referenced this issue Aug 31, 2023
PR-URL: #49396
Fixes: #48759
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
UlisesGascon pushed a commit that referenced this issue Sep 10, 2023
PR-URL: #49396
Fixes: #48759
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
alexfernandez pushed a commit to alexfernandez/node that referenced this issue Nov 1, 2023
PR-URL: nodejs#49396
Fixes: nodejs#48759
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
targos pushed a commit that referenced this issue Nov 27, 2023
PR-URL: #49396
Fixes: #48759
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
sercher added a commit to sercher/graaljs that referenced this issue Apr 25, 2024
PR-URL: nodejs/node#49396
Fixes: nodejs/node#48759
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
sercher added a commit to sercher/graaljs that referenced this issue Apr 25, 2024
PR-URL: nodejs/node#49396
Fixes: nodejs/node#48759
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
url Issues and PRs related to the legacy built-in url module. v18.x Issues that can be reproduced on v18.x or PRs targeting the v18.x-staging branch.
Projects
None yet
4 participants