Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion on removing the py dependency for path handling #6130

Closed
boxed opened this issue Nov 4, 2019 · 7 comments
Closed

Discussion on removing the py dependency for path handling #6130

boxed opened this issue Nov 4, 2019 · 7 comments
Labels
type: backward compatibility might present some backward compatibility issues which should be carefully noted in the changelog type: proposal proposal for a new feature, often to gather opinions or design the API around the new feature

Comments

@boxed
Copy link
Contributor

boxed commented Nov 4, 2019

I had a look again at pytest performance, and was struck again by how many stat calls pytest performs. This seems to boil down to using the path classes from py for checking file existence, walking directory structures and such. Unfortunately instances of these are passed to plugins so aren't internal implementation details.

I suggest we think of using plain strings as paths that are passed to plugins. This is obviously a breaking change so can't be done until pytest 6. It's not clear to me how we'd make deprecation warnings for this though :(

To take a concrete example of this being a problem, the test suite for the product I work on calls stat 79k times just for the collect phase. If I monkey patch stat to log the paths there are ~3k unique paths in the output. I can get a little performance boost by monkey patching stat to be cached:

orig_stat = os.stat

cache = {}


def monkey_stat(*args, **kwargs):
    a = args[0]
    if a in cache:
        return cache[a]
    r = orig_stat(*args, **kwargs)
    cache[a] = r

    return r

That this can improve the performance is pretty silly :P

It seems like pytest could just use os.walk once and then use that data for the rest of the run of the program...

@RonnyPfannschmidt
Copy link
Member

Killing the pylib dependency in general is a planned long term change

@blueyed
Copy link
Contributor

blueyed commented Nov 4, 2019

I suggest we think of using plain strings as paths that are passed to plugins.

We could have objects based on pathlib, with pylib methods mixed in that would issue a DeprecationWarning maybe.
But I am more in favor of just having a hard cut there.
If we have Path objects already it makes sense to pass them through, but using strings would be more future proof.

@Zac-HD Zac-HD added type: backward compatibility might present some backward compatibility issues which should be carefully noted in the changelog type: proposal proposal for a new feature, often to gather opinions or design the API around the new feature labels Nov 5, 2019
@boxed
Copy link
Contributor Author

boxed commented Nov 5, 2019

In the middle of the night I remembered that the hooks have DI parameters, so it should be simple to start passing new arguments (with new names) and if the signature contains the old parameter names, issue the deprecation warning there. I think it would be nice if we split the parameter names for dirs and files too. Currently it's "path" and plugins have to check if it's a file if they are unsure. If it was "dirname" and "filename" it would be clearer.

If we also make the instantiation of the Path objects lazy at the call site and the core code uses pure strings, then the cutoff could be pretty clean and if you have no deprecated usage you would mostly not have the performance hit of the old stuff.

I also had the idea that we could go through the entire filesystem from the very beginning (respecting ignore patterns?) and then look at an in-memory data structure instead of asking the filesystem over and over. I don't think pytest can, and wants to, support the filesystem changing significantly under its feet during a test session anyway.

@The-Compiler
Copy link
Member

IIRC we discussed something like this (mainly for tmpdir) at the last pytest sprint in 2016 already. I think there, the plan was to have a wrapper layer over pathlib objects which provides a py.path-like API?

For tmpdir, we ended up having a separate tmp_path fixture instead, also see the existing discussion in #2230, #3985, #3988. I do think that's the better solution, as people coming from outside of pytest will be more used to pathlib rather than py.path.

Of course it remains to be seen if pathlib is more performant than py.path, but I'd bet that it is.

@boxed
Copy link
Contributor Author

boxed commented Nov 5, 2019

The py.path API isn't very good, isn't well documented and worse than that it's very different from the rest of the python ecosystem, so keeping it seems like a bad idea long term.

@RonnyPfannschmidt
Copy link
Member

@The-Compiler after initial expieriments a few years back im absolutely certain we should let the pylib api die
there wont be a wrapper, just backward compat properties that will eventually go

@blueyed
Copy link
Contributor

blueyed commented Nov 5, 2019

@boxed

I also had the idea that we could go through the entire filesystem from the very beginning (respecting ignore patterns?) and then look at an in-memory data structure instead of asking the filesystem over and over. I don't think pytest can, and wants to, support the filesystem changing significantly under its feet during a test session anyway.

Yeah, that sounds like a good idea in general (separate issue - please consider creating / owning it then).

@blueyed blueyed closed this as completed Nov 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: backward compatibility might present some backward compatibility issues which should be carefully noted in the changelog type: proposal proposal for a new feature, often to gather opinions or design the API around the new feature
Projects
None yet
Development

No branches or pull requests

5 participants