Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Search param removes values after ampersand, introduced in 2.18.2 #3106

Open
ezwc opened this issue Dec 12, 2024 · 1 comment
Open
Labels
bug something broken P2 considered for next cycle regression this used to work

Comments

@ezwc
Copy link

ezwc commented Dec 12, 2024

I pass to one of my pages a search string, like:
?param1=something&param2=something2

Accessing it uding:
def layout(**kwargs):

In 2.18.1, this works for values that included ampersand. For example:
?param1=something&bla&param2=something2
would result in
kwargs[param1]='something&bla'

With 2.18.2 I get just:
kwargs[param1]='something'
with anything after the ampersand removed.

I would guess this is related to #2991 .

To be clear, I specifically downgraded dash to 2.18.1 and the issue went away.

@gvwilson gvwilson added regression this used to work bug something broken P2 considered for next cycle labels Dec 12, 2024
@C0deBeez
Copy link
Contributor

C0deBeez commented Dec 16, 2024

Thank you for reporting this issue.

Root Cause

The behavior you’re experiencing stems from a change in the query string parsing logic introduced in Dash 2.18.2. Specifically, the _parse_query_string function uses unquote(search) in combination with parse_qs. While this combination adheres to general query string parsing rules, it introduces problems when handling unencoded reserved characters like & within values.

For example:

  • In ?param1=something&bla&param2=something2, parse_qs treats &bla as a new key (bla) instead of part of the value for param1.
  • As per the RFC 3986 standard, & is a reserved character and must be percent-encoded (%26) if it’s meant to be part of a value. So, the correctly encoded query string should be:
    ?param1=something%26bla&param2=something2

Observed Issue with unquote

Additionally, even a properly encoded query string like ?param1=something%26bla&param2=something2 results in the value bla being dropped in Dash 2.18.2. This happens because unquote(search) decodes %26 back to &, and parse_qs again interprets & as a delimiter rather than part of the value.

Proposed Solution

To address this issue, while supporting duplicate keys and empty values, _parse_query_string can be updated to use parse_qsl instead of parse_qs. The parse_qsl method offers better control, especially when handling encoded characters within query strings.

Updated _parse_query_string Implementation

from urllib.parse import parse_qsl

def _parse_query_string(search):
  if search and len(search) > 0 and search[0] == "?":
      search = search[1:]
  else:
      return {}

  parsed_qs = {}
  for k, v in parse_qsl(search, keep_blank_values=True):
      if k in parsed_qs:
          if isinstance(parsed_qs[k], list):
              parsed_qs[k].append(v)
          else:
              parsed_qs[k] = [parsed_qs[k], v]
      else:
          parsed_qs[k] = v
  return parsed_qs



query_strings = [
    "?param1=something&bla&param2=something2",
    "?param1=something%26bla&param2=something2",
    "?param1=value1&param1=value2&param2=value3",
    "?param1=&param2=something2",
    "",
]

for qs in query_strings:
    print(f"Query: {qs}")
    print(f"Parsed: {_parse_query_string(qs)}")

Output:


Query: ?param1=something&bla&param2=something2
Parsed: {'param1': 'something', 'bla': '', 'param2': 'something2'}

Query: ?param1=something%26bla&param2=something2
Parsed: {'param1': 'something&bla', 'param2': 'something2'}

Query: ?param1=value1&param1=value2&param2=value3
Parsed: {'param1': ['value1', 'value2'], 'param2': 'value3'}

Query: ?param1=&param2=something2
Parsed: {'param1': '', 'param2': 'something2'}

Query: 
Parsed: {}

Recommendations

  1. Encourage Proper URI Encoding
    According to the URI specification, reserved characters like & and = must be percent-encoded when used as part of values. For example:
    ?param1=something%26bla&param2=something2
    This ensures predictable parsing behavior.

  2. Discussions
    @gvwilson @T4rk1n Given the above, should Dash enforce URI encoding compliance, or should we handle these edge cases within the parsing logic instead?
    If we decide to enforce URI encoding compliance, parse_qsl seems to be a better solution as it provides better handling for encoded characters, duplicate keys, and empty values. On the other hand, if we choose not to enforce URI compliance, we may need to consider removing the search = unquote(search) line from _parse_query_string to prevent issues like misinterpreting %26 (decoded as &) as a delimiter rather than part of a value.

Looking forward to your thoughts on how best to proceed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug something broken P2 considered for next cycle regression this used to work
Projects
None yet
Development

No branches or pull requests

3 participants