-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Bootstrap.get_bootstrap_from_recipes() so it's smarter and deterministic, fixes #1875 #1887
Conversation
Thanks @Jonast for the pull request 🙏 |
@AndreMiras added it to the test! If I understand @opacam 's suggestion basically is "don't return the first alphabetical one but something better", but this is strictly speaking not related to the determinism issue and also something that doesn't concern |
tests/test_bootstrap.py
Outdated
["empty", "sdl2", "service_only", "webview"] | ||
expected_bootstraps = sorted( | ||
["empty", "service_only", "webview", "sdl2"] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor suggestion to improve the readability of this block:
listdir_ret = ["empty", "service_only", "webview", "sdl2"]
mock_listdir.return_value = listdir_ret
expected_bootstraps = sorted(listdir_ret)
That way we make it clear that we're talking about the same list once sorted once not. Because here you somehow shuffled your lists and at first sight we don't see they're the same or why they're shuffled.
tests/test_bootstrap.py
Outdated
|
||
# Make sure order remains stable even if listdir() order varies: | ||
mock_listdir.return_value = \ | ||
["empty", "webview", "sdl2", "service_only"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: then you could use listdir_ret.reverse()
base don the suggestion above
Yes @Jonast I think you have a fair point, thanks |
IMHO, It does not make sense to me to return the results sorted in this case. I think that we will only add useless code...so I would rather do it once and well done (with some logic I mean), based on recipes, something like:
If we apply the |
Yeah @opacam I can see your point, maybe |
@opacam ok I made something new which fixes |
Yes agree guys, I also thought this functionally should be a set. But then I started imagining all the refactor that would require and I thought "OK maybe not for now" 😛 |
'''Find all the available bootstraps and return them.''' | ||
forbidden_dirs = ('__pycache__', 'common') | ||
bootstraps_dir = join(dirname(__file__), 'bootstraps') | ||
result = set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes also thought "this should be a set". But the fact that it was an iterator before made me more cautious. But seeing the code I'm not seeing why not being an iterator would be an issue. The result
set would be anything enormous, right, so no risk to load it on the stack, correct?
tests/test_bootstrap.py
Outdated
# Test all alternatives are listed (they won't have dependencies | ||
# expanded since expand_dependencies() is too simplistic): | ||
expanded_result_2 = expand_dependencies([("pysdl2", "kivy")], self.ctx) | ||
assert([["pysdl2"], ["kivy"]] == expanded_result_2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: assert
is still a keyword, not a function 😄
If you like the function stile you can also use the self.assertEqual()
or self.assertTrue()
. That would also make it more consistent with the style used in this file
|
||
# Special rule: return "webview" if we depend on common web recipe: | ||
for possible_web_dep in known_web_packages: | ||
if have_dependency_in_recipes(possible_web_dep): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some thoughts without deep diving into it. So I'm not sure if we could get any better, but the time complexity of this one is concerning me.
Let me know if I'm seeing this correctly or not.
have_dependency_in_recipes()
itself seems O(n²)
because we have a in
in a for
loop.
Then we call have_dependency_in_recipes()
itself in another for loop so that gives us a O(n³)
.
If by any chance dep_list
is a set
then the if dep in dep_list:
is not O(n)
anymore, but O(1)
. So then the overall complexity would be O(n²)
.
pythonforandroid/bootstrap.py
Outdated
for name in listdir(bootstraps_dir): | ||
if name in forbidden_dirs: | ||
continue | ||
filen = join(bootstraps_dir, name) | ||
if isdir(filen): | ||
yield name | ||
result.add(name) | ||
return result | ||
|
||
@classmethod | ||
def get_bootstrap_from_recipes(cls, recipes, ctx): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feel like this method is big now and could be split furthermore by concerns, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I don't see it 😂 (this seems to be my default response to this 😂 😂 ) the previous one you suggested I finally split was due to nesting not to length, I think "it's big" is still a somewhat poor criteria (on its own) for a split. UNLESS you want to be able to test the sub parts separately of course, although I'm not sure that is necessary...? I would personally leave it as is 🤷♀️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes definitely big is a poor and arbitrary criteria, just like the 80 char lines of the pep. It probably started from some feelings and adjusted over time/researches. That "it's big" feeling would vary from one to another of course.
I wanted to share the feeling before looking for solution to see maybe luckily something super clever would have come to your or other people mind 😄
Edit:
So yes took a look at the metrics quickly. Before that method was 40 lines, and at the time of my comment it was 100. Hence my surprise. More specifically if I have to scroll to see the entirety of a method, I start to feel like something is off 😛
So definitely taking the inner function out would help I think. But probably there's more that could be done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then see taking a quick look I already see some possible split since the comments and the logs say it all, for instance:
# Find out which bootstraps are acceptable:
Then a big block and then:
info('Found {} acceptable bootstraps: {}
For me that definitely feels like a split. Not only it will make things more readable:
acceptable_bootstraps = find_acceptable_bootstraps(...)
But also make it more unit testable 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah and it also obscures the control flow and introduces interdependencies and more interfaces making changes later harder. I definitely see your point but I don't think chasing that far after tests is necessarily good, so if you ask me I think this is better off as it is UNTIL we actually want to use it separately, or it reaches a length where this is more warranted. It is also not such a deeply complex and core functionality that I'd think this hurts us a lot not being tested more granularly.
But as usual, I'm happy to leave the final decision to you guys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's seemed like an easy refactor to me, but OK 😢
As usual I'm nitpicking, but won't block a PR for something like this. So if you don't feel like it let's leave it for now. Maybe at some point it bugs so much that I'll do it myself.
Thanks for discussing it 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh no I haven't convinced you even in the slightest have I 😱 oh dear. maybe I'm wrong then, I'll split it up 👍 I mean I can see your side a little, and in the long run I feel like you had so far a slightly better grasp at writing nice code than I have
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's so nice of you 😊
Thank you @Jonast !
|
||
if prioritized_acceptable_bootstraps: | ||
info('Using the highest ranked/first of these: {}' | ||
.format(prioritized_acceptable_bootstraps[0].name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: The message say of these
so I was expecting to see the list. But we actually show the one being picked up, which is fine, but then the log wording should change a bit
prioritized_acceptable_bootstraps = sorted( | ||
list(acceptable_bootstraps), | ||
key=functools.cmp_to_key(bootstrap_priority) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very interesting helper function. I understand you didn't want to expose it because it's a util function only being used in get_bootstrap_from_recipes()
, but then:
- it makes the overall
get_bootstrap_from_recipes()
method larger - the
bootstrap_priority()
helper is not unit testable
Maybe we should define it as a private top function inside the file bootstrap.py
file? I have the feeling it would make the overall thing more readable and more testable
for entry in recipes: | ||
if not isinstance(entry, (tuple, list)) or len(entry) == 1: | ||
if isinstance(entry, (tuple, list)): | ||
entry = entry[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's something fishy here 🐟 😄
First it doesn't seem covered, second it means expand_dependencies()
would be a bit polymorphic in a way which can lead to unexpected behavior or code not so easy to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean with polymorphic? This is the usual behavior of how all graph lists work, in fact I did not introduce this at all but just stuck to it as it already behaved (see the lower loop.) So no, pretty sure this is as it should be. (Dependencies can be a string or a tuple of alternatives, that is how things are used everywhere)
The coverage is a good point though, let me look into that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reply. Great that you will look at the coverage, who knows maybe we will come across another old bug or dead code 😄
I made a first review round, your approach looks promising, thank you for giving it a try 👏 |
@Jonast, oh yeah, this looks far better, thanks for doing this 😄 Note: I have took a superficial look and it looks good, I only miss one thing, I think that maybe we should add some documentation about this new method to find the right bootstrap via user recipes...maybe a note in |
pythonforandroid/bootstrap.py
Outdated
|
||
|
||
def _cmp_bootstraps_by_priority(a, b): | ||
global default_recipe_priorities |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is confusing me, why do you need the global
keyword if you're not updating the reference of the object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohhh then I need to remove it!! I somehow thought it would be less confusing to make sure people realize it's not a local var, but I guess this is more confusing isn't it? I'll remove it 😀
self.assertTrue(_cmp_bootstraps_by_priority( | ||
Bootstrap.get_bootstrap("service_only", self.ctx), | ||
Bootstrap.get_bootstrap("sdl2", self.ctx) | ||
) < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome thanks!
Then we can write a simple test for covering the alphabetic sorting return (a.name - b.name) # alphabetic sort for determinism
So if basically if I understood correctly, we just need to comp two bootstraps that're not in default_recipe_priorities
list. And if we don't have any, we can just patch the list to make it empty.
Would that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, neat idea! I'll add that, thanks 😍
Okay so I think I worked pretty much everything in (if not I probably forgot, poke me). In my tests it worked, so maybe it would be a good time to check out if this is worth integrating, after all it is quite self-contained and does some nice little local code improvements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, thanks for your efforts and patience ❤️
This should fix
Bootstrap.get_bootstrap_from_recipes()
not being deterministic, which fixes #1875