A hidden bug in the function named `update_frontier_nodes` #27
Comments
Very nice catch! It's indeed a mistake. I think the meta-problem behind bugs like this is that we didn't properly unit test all the modules. Even though this is research code, this level of complexity already calls for proper testing. It was an oversight, and we should prevent it in future projects.
Thanks for your reply! Also, the training procedure takes too much time. I'm running on a server with 4 12-core CPUs, 256 GB of memory, and 2 Tesla P40s (48 GB of GPU memory in total). However, one epoch takes approximately 40 seconds (it takes less time without the GPU) with only one worker. Currently I'm optimizing the code to improve efficiency. If a more efficient version of Decima is released, please let me know!
We would suggest training on a smaller problem first, e.g., by reducing the problem size. Let us know if you find a more efficient implementation or figure out which parts of the code are the bottleneck. Feel free to submit pull requests too. Thanks a lot!
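As a rough way to locate the bottleneck, profiling a single epoch with Python's built-in cProfile is a reasonable first step. In the sketch below, `train_one_epoch` is only a placeholder for whatever call drives one epoch in your setup, not a function from this repository:

```python
# Hypothetical profiling harness; `train_one_epoch` is a stand-in for whatever
# callable runs one training epoch in your setup, not a function from this repo.
import cProfile
import pstats


def train_one_epoch():
    # Placeholder: replace with the actual per-epoch training call.
    pass


if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    train_one_epoch()
    profiler.disable()

    # Show the 20 most expensive call sites, sorted by cumulative time.
    stats = pstats.Stats(profiler).sort_stats("cumulative")
    stats.print_stats(20)
```

Sorting by cumulative time usually makes it clear which module dominates the epoch time.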
I think there is a bug in the function `update_frontier_nodes`, defined in spark_env/job_dag.py. What `self.frontier_nodes` stores are the nodes themselves, not their indices, although this did not have a significant effect on the training results.