Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: add utility to check whether a string is a filepath #1085

Merged
merged 8 commits into from
Sep 27, 2021

Conversation

agoose77
Copy link
Collaborator

@agoose77 agoose77 commented Sep 8, 2021

I didn't follow the isint / isstr convention: I think we should rename those in a subsequent PR

@agoose77
Copy link
Collaborator Author

agoose77 commented Sep 8, 2021

This might fix #1084 (but we need to add a test)

@jpivarski
Copy link
Member

There's an ak._util.iswin to check to see if you're on Windows and we know that the bikeroutes JSON is too long to be a Windows path. Such a test would require an HTTP connection, which should probably skip if the attempt to download with

import urllib.request

url = "https://raw.githubusercontent.com/Chicago/osd-bike-routes/master/data/Bikeroutes.geojson"
bikeroutes_json = urllib.request.urlopen(url).read()

fails.

@agoose77
Copy link
Collaborator Author

agoose77 commented Sep 8, 2021

There's an ak._util.iswin to check to see if you're on Windows and we know that the bikeroutes JSON is too long to be a Windows path. Such a test would require an HTTP connection, which should probably skip if the attempt to download with

import urllib.request

url = "https://raw.githubusercontent.com/Chicago/osd-bike-routes/master/data/Bikeroutes.geojson"
bikeroutes_json = urllib.request.urlopen(url).read()

fails.

Let's use that instead — it's a more direct test.

@jpivarski
Copy link
Member

I didn't merge this because (as I understand it), it's waiting for a test. If I should add it to 1.5.0, let me know. Otherwise, there will probably be only one release candidate before the final release (but technically two "rc" releases in GitHub because it looks like the first one didn't trigger).

@jpivarski
Copy link
Member

Nope; that's because Azure isn't configured to deploy anymore—it's handled by GitHub Actions now, and I had forgotten.

@jpivarski
Copy link
Member

Oh, I was thinking that this could just be a test that attempts to download through HTTP, and skips if it fails. I don't think it would be the only test that requires network access.

That way, we don't need to add a copyright message for Chicago's data. We're really just wanting it for its size (which could also be an auto-generated JSON, if we really want to avoid network access).

@agoose77
Copy link
Collaborator Author

agoose77 commented Sep 9, 2021

I've added a test, but I can't test the previous behaviour against this test unless I poke around with the CI. Do you happen to have a Windows machine to hand?

@agoose77
Copy link
Collaborator Author

agoose77 commented Sep 9, 2021

I've added a test, but I can't test the previous behaviour against this test unless I poke around with the CI. Do you happen to have a Windows machine to hand?

Oh, I was thinking that this could just be a test that attempts to download through HTTP, and skips if it fails. I don't think it would be the only test that requires network access.

That way, we don't need to add a copyright message for Chicago's data. We're really just wanting it for its size (which could also be an auto-generated JSON, if we really want to avoid network access).

Right, I forgot to motivate why I downloaded a sample; the data are quite large. It's not a slow download, but it's enough that I don't think we want to add it to the test suite unless required. Are you not comfortable with including the license file? It's not infectious, and it's only in the test suite.

@jpivarski
Copy link
Member

Do you happen to have a Windows machine to hand?

The CI is our only access (both of us).

Right, I forgot to motivate why I downloaded a sample; the data are quite large. It's not a slow download, but it's enough that I don't think we want to add it to the test suite unless required.

I considered it small enough to use in a classroom setting, and the downloads in CI are a lot faster than they are at home. (The CI machines are well-connected!) I just downloaded it in 0.3 seconds at home.

Generating simple JSON that is as large as the bikeroutes file would be sufficient. The bikeroutes sample is 2.3 MB, and a list of numbers up to 1 million is 7.8 MB:

json.dumps(np.arange(1000000).tolist())

Generating it and interpreting it as an Awkward Array takes 0.2 seconds:

ak.Array(json.dumps(np.arange(1000000).tolist()))

Without your fix, the above line should fail in Windows (CI).

@agoose77
Copy link
Collaborator Author

agoose77 commented Sep 9, 2021

If we want to test the Windows case, then we only need 260 chars of JSON, although I can see the merit in re-using a "large" dataset. I will update with your suggestions.

@jpivarski
Copy link
Member

Oh yeah, and although the license is not infectious, simplifying things to the extent that no one has to evaluate whether that is the case is beneficial. I have a bias against including anything that requires a license, which can be overcome by a good enough reason: Lark has a good enough reason, as does pybind11, as well as RapidJSON... I just want to keep the list short, and this one test is doable without having to add the dataset.

@jpivarski
Copy link
Member

260 chars of JSON

I didn't know the threshold, but I did know that the bikeroutes exceeded it. That's the thinking behind the 7 MB of JSON.

@jpivarski jpivarski enabled auto-merge (squash) September 27, 2021 22:39
@jpivarski jpivarski merged commit 1d0231c into main Sep 27, 2021
@jpivarski jpivarski deleted the fix-test-path-win branch September 27, 2021 23:34
@agoose77
Copy link
Collaborator Author

Oh, I ignored py2 here because I was thinking about v2. However, it makes sense to merge this now - thanks for backporting it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"path too long for Windows" when calling ak.from_json with a long json string on Windows
2 participants