reserved characters are treated inconsistently and not sensibly preserved

This has been a design flaw since the inception of the library, so, mea culpa on that.

Fundamentally, preserving, escaping, and encoding "reserved" characters is entirely the URL object's job, and it's failing at that.  Possibly the most succinct demonstration of the problem is this:

```python
>>> u = URL()
>>> u = u.child(u'/')
>>> u = u.asIRI()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    u = u.asIRI()
  File "/Users/glyph/Library/Python/2.7/lib/python/site-packages/hyperlink/_url.py", line 1116, in to_iri
    fragment=_percent_decode(self.fragment))
  File "/Users/glyph/Library/Python/2.7/lib/python/site-packages/hyperlink/_url.py", line 861, in replace
    userinfo=_optional(userinfo, self.userinfo),
  File "/Users/glyph/Library/Python/2.7/lib/python/site-packages/hyperlink/_url.py", line 606, in __init__
    for segment in path))
  File "/Users/glyph/Library/Python/2.7/lib/python/site-packages/hyperlink/_url.py", line 606, in <genexpr>
    for segment in path))
  File "/Users/glyph/Library/Python/2.7/lib/python/site-packages/hyperlink/_url.py", line 410, in _textcheck
    % (''.join(delims), name, value))
ValueError: one or more reserved delimiters /?# present in path segment: u'/'
>>>
```

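This is, obviously I hope, the wrong place to be failing with an error like this: `child()` accepted the segment without complaint, and the error only surfaces later, during `asIRI()`.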

There was previously *some* attempt to preserve these characters in the data model and escape them only upon stringification, but d26814c074c6f9787e62af907df17fbd68fde615 wrecked these semantics.  (In fairness: the attempt to do this was broken, and there are some places, like the scheme, where certain characters indeed *cannot* be represented, so this direction isn't entirely wrong.)

Fundamentally, if a user wants to encode slashes, question marks, hash signs, or whatever else a human might type into a text field, then it should be possible to do that.

We could fix this obvious manifestation of the problem by just putting back the escape-only-on-`asText` logic, but that still leaves an even more pernicious problem:

```python
>>> u = URL(path=tuple([u'%2525']))
>>> u.asText()
u'%2525'
>>> u.asIRI().asText()
u'%25'
>>> u.asIRI().asIRI().asText()
u'%'
>>> 
```

Clearly, multiple trips through `asIRI` should not be un-escaping the escape character. The idea is that `.asIRI()` is a normalization step that should be idempotent upon subsequent calls.
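
To make the idempotence point concrete, here is a minimal stdlib-only sketch (Python 3, not hyperlink's actual code). It contrasts naive repeated decoding with a normalization that only decodes escapes for unreserved ASCII characters and leaves every other escape, including `%25`, in place. The names `normalize_segment` and `UNRESERVED` are hypothetical, and the sketch assumes every `%` in the input begins a valid two-digit escape; it also makes no attempt at the non-ASCII decoding a real `asIRI` would do.

```python
import re
import string
from urllib.parse import unquote

# Naive decoding: every pass strips one more layer of escaping.
naive_once = unquote("%2525")      # '%25'
naive_twice = unquote(naive_once)  # '%'

# Hypothetical: the only characters whose escapes this sketch decodes.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_segment(text):
    """Decode escapes of unreserved ASCII characters; keep all others.

    Escapes that are kept are upper-cased, so '%2f' becomes '%2F'.
    Assumes every '%' in `text` starts a valid escape.
    """
    def replace(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        # Reserved characters, '%' itself, and non-ASCII bytes stay escaped.
        return "%" + match.group(1).upper()

    return _ESCAPE.sub(replace, text)
```

Because `%25` is never decoded, a second call has nothing left to strip: `normalize_segment("%2525")` stays `"%2525"` however many times it is applied.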

For the moment, I'm not sure exactly what the correct fix is here, but the property I'd really like to preserve is that for any `x`,

`URL.fromText(URL().child(x).<as many asIRI()s or asURI()s as you want>.asText()).<as many .asIRI()s as you want, although possibly not .asURI()s>.segments[0] == x`
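
As a rough illustration of that property, here is a sketch of the escape-only-on-stringification model using just the standard library (Python 3). `to_text` and `from_text` are hypothetical stand-ins for `asText` and `fromText`, handling nothing but a relative path; this is not a proposed patch.

```python
from urllib.parse import quote, unquote


def to_text(segments):
    # Segments are stored unescaped; every reserved character,
    # including "/" and "%", is escaped only at this point.
    return "/".join(quote(segment, safe="") for segment in segments)


def from_text(text):
    # Split on the literal delimiter first, then decode each segment once.
    return tuple(unquote(part) for part in text.split("/"))


# The round trip should return each segment unchanged.
samples = ["/", "%2525", "a?b#c", "caf\u00e9", "%"]
for x in samples:
    assert from_text(to_text((x,)))[0] == x
```

The key design choice is that the data model never holds escaped text, so there is only one place where escaping happens and one where it is undone.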

