Rewrite PackageFinder’s _sort_locations() and related parsing logic #5846
Comments
Related to my last review comment, it might be good to have solid end-to-end test coverage before doing the refactor. If the end-to-end test is structured well, you can add test cases to its list as you investigate various code paths and find edge cases, without much extra work. That will build confidence as you go.
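One way to keep such a list of cases cheap to extend is a table-driven check, sketched below. Everything in it is an assumption for illustration: `classify()` is a hypothetical stand-in for the code under test (it is not pip's `_sort_locations()`), and the rows are made-up inputs, not cases taken from pip's test suite.

```python
from urllib.parse import urlsplit


def classify(location):
    """Hypothetical stand-in for the code under test (not pip's real logic).

    Returns "url" for anything with a multi-character scheme, else "path".
    """
    scheme = urlsplit(location).scheme
    # A one-letter "scheme" is treated as a Windows drive letter.
    if len(scheme) > 1:
        return "url"
    return "path"


# Each row is (input, expected outcome). A newly found edge case is one new row.
CASES = [
    ("https://example.com/simple/", "url"),
    ("file:///tmp/wheels/", "url"),
    ("./wheels", "path"),
    ("C:\\wheels", "path"),
]


def run_cases(func, cases):
    """Return every row whose actual outcome differs from the expected one."""
    failures = []
    for location, expected in cases:
        actual = func(location)
        if actual != expected:
            failures.append((location, expected, actual))
    return failures
```

Because the harness only takes a function and a list, the same rows can be replayed against the old and the rewritten implementation to confirm that behaviour is unchanged.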
Also, for On a separate, more general note (for the larger task), one of Donald's suggestions was to take a sans-IO approach. That would give you another axis of separation when deciding what to extract and how (e.g. pure IO functions vs. pure processing).
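A minimal sketch of that IO/processing split, assuming the processing step is "extract link targets from an index page". The names `parse_links()` and `collect_links()` are hypothetical and do not come from pip; the point is only that the parser never touches the network or the filesystem.

```python
from html.parser import HTMLParser


class _AnchorCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def parse_links(html_text):
    """Pure processing: takes page text, returns hrefs, performs no IO."""
    collector = _AnchorCollector()
    collector.feed(html_text)
    collector.close()
    return collector.hrefs


def collect_links(fetch, url):
    """IO boundary: `fetch` is any callable that returns page text for a URL."""
    return parse_links(fetch(url))
```

With this shape, `parse_links()` can be tested with plain strings, and `collect_links()` can be exercised with a fake `fetch` instead of a real HTTP session.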
There are actually some tests around this in
Further to #5827, surely dropping a local directory path passed to
PEP 503 implies the endpoint should be specified by URLs. It is indeed more helpful to treat paths the same as
PEP 503 only specifies an HTTP-based protocol. It says nothing about server-side filesystem layout. The words "directory" and "index.html" do not appear anywhere. So what does the pip documentation mean by "a local directory laid out in the same format", when the PEP does not actually specify a format at all? The reasonable (and useful) interpretation is that pip will emulate the behavior of common web servers, namely:
@cjerdonek I think we can close this, but I'll let you have the fun of clicking close on this. :)
Given that I've not seen much activity from @cjerdonek lately, I'll take the liberty of closing this issue. :)
Related discussions:
#5800
#5827
#5836 (comment)
#5838 (comment)
#5838 (comment)
I am looking into how I can break apart find_all_candidates() and extract logic from it. The general logic in that function isn’t too difficult, but how inputs are grouped and interpreted is 🤯 The logic is scattered between _sort_locations() and the fetching of the HTML page. There are so many edge cases filtered out in one place or another that I don’t know how to extract anything without breaking others.
I decided to take another approach. First, I list all possible input variants and possible outcomes, and compile a table of what happens with each kind of input. Then I’ll try to rewrite one function at a time to match the behaviour. Once inner functions are replaced, I can deal with find_all_candidates() with confidence.
First up is _sort_locations(). Here’s the compiled table. Dependency links are not included, but @pradyunsg said they will be removed “pretty soon”, and I don’t think I can finish this before that happens.
Other notes:
mimetypes.guess_type().
schemes (e.g. git+https:) are allowed in _sort_locations(), but dropped later in _get_html_page().
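The two checks mentioned in the notes can be sketched with the standard library alone. This is an illustration under stated assumptions, not pip's code: the helper names are hypothetical, and the set of schemes treated as fetchable is a guess made for the example.

```python
import mimetypes
from urllib.parse import urlsplit


def looks_like_html(url):
    """Guess from the file name alone whether a URL points at an HTML page."""
    mime_type, _encoding = mimetypes.guess_type(url, strict=False)
    return mime_type == "text/html"


def is_fetchable(url):
    """Assumption: only plain http, https, and file URLs are fetched as pages."""
    return urlsplit(url).scheme in {"http", "https", "file"}
```

Under these assumptions, a URL such as git+https://example.com/repo.git passes any check that only asks "does it have a scheme?", yet `is_fetchable()` rejects it, which is the kind of two-stage filtering the note describes.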