-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
config fails to initialize #12
Comments
hint to reproduce: let's simulate that something is running on READTHEDOCS by setting the env variable , ensure the |
I've cloned this repo and executed the code according to the example and the steps to reproduce (with READTHEDOCS as env var). I got the same warning (6 times) I ran the same example using the serial executor and didn't get this warning. The number of newly created files is the same in both, parallel and serial. It seems the warning comes from here when content is 'None' (and it happens only when using parallel) |
does the error still happen if you don't set the READTHEDOCS? any conclusions why it happens with parallel? You might need to add some logging to debug this. |
so I noticed something, the error is saying the object is None, this happens when the YAML is an empty file. So looks like the file exists, but it's empty. maybe a race condition since multiple processes are trying to save the same file? |
The problem persists even without setting READTHEDOCS. I'm not sure where to begin to fix this. (I thought it will be in |
I think this might be a sklearn-evaluation-specific problem and have an idea of how to debug. Let me try something quick in the next few days and I'll post an update here |
update: this looks like a general problem. I thought it was only happening on sklearn-evaluation but it also happens when using ploomber. so please @yafimvo help us with this! |
@edublancas
So we can use it like this
It seems to fix these warnings. Some notes:
|
what is this lock doing? wouldn't it interfere with the parallel executor logic? i.e.g, if a task acquires a lock, other tasks won't run simultaneously? did you find out why this is happening? It seems to me that the error is in this package, and a fix should be applied here |
This lock should protect config.yaml and uid.yaml from being changed by other tasks. Even though it passed the existing tests, this implementation locks the entire task, so yes, I think this may affect the parallel executor (the lock will be released once the task has been completed).
I think it happens when 2 tasks run simultaneously, one creates the configuration file (config.yaml/uid.yaml), while the other task is trying to read from it. I think it completes the tasks successfully since we have this exception handler to re-load content from the dictionary
Can we just use the |
this is a big drawback for users, so we'll have to find another solution
ah. yes! I started overcomplicating, thinking we'd need a more elaborate solution, but you're right! This problem happens because the first process doesn't see the file, so it proceeds to create it, while the second sees the file but not its contents (since the writing operation is still in progress). However, under that scenario, it's safe to load I wanted to convince myself and tested the solution. I think it works but please review #12 |
Question: I noticed that configuration files might differ from task to task. When loading data directly from a file, are we sure it's the right configuration for this task? |
Yes, Config is an abstract class with an abstract core/src/ploomber_core/config.py Line 116 in e6e5cc3
|
I saw this in this example:
looks like the config fails to initialize and the file is never created
Maybe this is the right thing to do and it's just that some logic is wrong. I updated this package so it doesn't call posthog when it detects it's running on readthedocs (which we use to generate documentation) - perhaps that's causing the warning?
The text was updated successfully, but these errors were encountered: