Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pathname objects are treated as IO and corrupt input on parse #2110

Closed
doriantaylor opened this issue Nov 11, 2020 · 5 comments
Closed

Pathname objects are treated as IO and corrupt input on parse #2110

doriantaylor opened this issue Nov 11, 2020 · 5 comments
Milestone

Comments

@doriantaylor
Copy link
Contributor

I just ran into a situation where Pathname objects get treated as IO instead of a file path.

The problem is that a Pathname object will be passed to io_read_callback where read is called on it, but since Pathname objects have no state, the first ~4000 (4096?) bytes of the file get read in over and over until silently kiboshed by the parser (assuming default flags).

The solution would be to put a branch in the parser logic to pull out Pathname objects for special treatment, either by flattening them to locations or explicitly opening them as file handles. This is a straightforward fix I am offering to do; I am registering this issue beforehand to gauge interest.

@flavorjones
Copy link
Member

Hi @doriantaylor! Thanks for opening this issue.

There's already a PR open -- see #1821 -- which describes the problem and also outlines some of the challenges in solving it well. Maybe we can have the conversation about this approach there?

@doriantaylor
Copy link
Contributor Author

ah damn shoulda checked. ;) okay schlepping over there…

@doriantaylor
Copy link
Contributor Author

Reopening this issue because that other patch is silly. Looks like master fails a bunch of tests. What should I branch against?

@flavorjones
Copy link
Member

@doriantaylor Hey again.

that other patch is silly

Well, it wasn't what I wanted and I said so, but I'd like to encourage you to avoid using terms that imply absolute judgement on other folks' work, particularly when it's a submission from a first-time contributor. I want people to have a thoroughly enjoyable time when they start contributing! Thanks for understanding.

Looks like master fails a bunch of tests

I'm not sure I understand what you mean, master is green except for experimental TruffleRuby builds that are non-blocking: https://ci.nokogiri.org/teams/nokogiri-core/pipelines/nokogiri

It looks like you submitted a PR at #2111 so we can continue the conversation there!

@doriantaylor
Copy link
Contributor Author

Apologies, I just meant there wasn't enough of it to save. As for master failing tests, that might just be my side. See you in #2111.

flavorjones pushed a commit to doriantaylor/nokogiri that referenced this issue Nov 12, 2020
flavorjones added a commit to doriantaylor/nokogiri that referenced this issue Nov 12, 2020
- no need to resolve Pathname in Nokogiri.parse
- added test coverage for XML::Document and HTML::Document#parse
- moved resolution of Pathname to after `url` is set in HTML::Document#parse
- open files with implict "r" because "b" assumes ASCII-8BIT encoding

Part of sparklemotion#2110, sparklemotion#1821
flavorjones added a commit to doriantaylor/nokogiri that referenced this issue Nov 12, 2020
@flavorjones flavorjones added this to the v1.11.0 milestone Nov 12, 2020
flavorjones pushed a commit to doriantaylor/nokogiri that referenced this issue Nov 12, 2020
flavorjones added a commit to doriantaylor/nokogiri that referenced this issue Nov 12, 2020
- no need to resolve Pathname in Nokogiri.parse
- added test coverage for XML::Document and HTML::Document#parse
- moved resolution of Pathname to after `url` is set in HTML::Document#parse
- open files with implict "r" because "b" assumes ASCII-8BIT encoding

Part of sparklemotion#2110, sparklemotion#1821
flavorjones added a commit to doriantaylor/nokogiri that referenced this issue Nov 12, 2020
- added test coverage for objects that respond to `#path`
- added test coverage for `Pathname` specifically

Related to sparklemotion#2110.
flavorjones added a commit to doriantaylor/nokogiri that referenced this issue Nov 12, 2020
flavorjones added a commit to doriantaylor/nokogiri that referenced this issue Nov 12, 2020
- added test coverage for objects that respond to `#path`
- added test coverage for `Pathname` specifically

Related to sparklemotion#2110.
flavorjones added a commit to doriantaylor/nokogiri that referenced this issue Nov 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants