-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: add utility to check whether a string is a filepath #1085
Conversation
This might fix #1084 (but we need to add a test) |
There's an import urllib.request
url = "https://raw.githubusercontent.com/Chicago/osd-bike-routes/master/data/Bikeroutes.geojson"
bikeroutes_json = urllib.request.urlopen(url).read() fails. |
Let's use that instead — it's a more direct test. |
I didn't merge this because (as I understand it), it's waiting for a test. If I should add it to 1.5.0, let me know. Otherwise, there will probably be only one release candidate before the final release (but technically two "rc" releases in GitHub because it looks like the first one didn't trigger). |
Nope; that's because Azure isn't configured to deploy anymore—it's handled by GitHub Actions now, and I had forgotten. |
Oh, I was thinking that this could just be a test that attempts to download through HTTP, and skips if it fails. I don't think it would be the only test that requires network access. That way, we don't need to add a copyright message for Chicago's data. We're really just wanting it for its size (which could also be an auto-generated JSON, if we really want to avoid network access). |
I've added a test, but I can't test the previous behaviour against this test unless I poke around with the CI. Do you happen to have a Windows machine to hand? |
Right, I forgot to motivate why I downloaded a sample; the data are quite large. It's not a slow download, but it's enough that I don't think we want to add it to the test suite unless required. Are you not comfortable with including the license file? It's not infectious, and it's only in the test suite. |
The CI is our only access (both of us).
I considered it small enough to use in a classroom setting, and the downloads in CI are a lot faster than they are at home. (The CI machines are well-connected!) I just downloaded it in 0.3 seconds at home. Generating simple JSON that is as large as the bikeroutes file would be sufficient. The bikeroutes sample is 2.3 MB, and a list of numbers up to 1 million is 7.8 MB: json.dumps(np.arange(1000000).tolist()) Generating it and interpreting it as an Awkward Array takes 0.2 seconds: ak.Array(json.dumps(np.arange(1000000).tolist())) Without your fix, the above line should fail in Windows (CI). |
If we want to test the Windows case, then we only need 260 chars of JSON, although I can see the merit in re-using a "large" dataset. I will update with your suggestions. |
Oh yeah, and although the license is not infectious, simplifying things to the extent that no one has to evaluate whether that is the case is beneficial. I have a bias against including anything that requires a license, which can be overcome by a good enough reason: Lark has a good enough reason, as does pybind11, as well as RapidJSON... I just want to keep the list short, and this one test is doable without having to add the dataset. |
I didn't know the threshold, but I did know that the bikeroutes exceeded it. That's the thinking behind the 7 MB of JSON. |
cb50306
to
2d8fc58
Compare
Oh, I ignored py2 here because I was thinking about v2. However, it makes sense to merge this now - thanks for backporting it |
I didn't follow the
isint
/isstr
convention: I think we should rename those in a subsequent PR