Add column to crawl_history table to capture how long command took #679
Comments
@ankushduacodes Is this something you'd be interested in working on?
@vringar For sure! Do you have any pointers for me on where to look?
I'd suggest you first fix the watchdog before having a look at this. However, can you tell me what is unclear to you after reading the issue's description?
@vringar I am going to start working on this one now.
@vringar I added a new column named duration to both the SQL and Parquet schemas, but when I try to run the demo.py file, I get an SQL operational error. What could be the reason? How can I rebuild the schema?
You need to delete the existing database for the table to be rewritten. So consider deleting or moving the database on your Desktop. Then everything should work.
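To illustrate why an old database file can cause this kind of error, here is a minimal sketch. It assumes the table is created with a `CREATE TABLE IF NOT EXISTS` statement, which the thread does not confirm; the `command` column is hypothetical. In that case SQLite leaves an already existing table untouched, so the new column never appears until the old database is removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Old schema, as left behind by an earlier run (the command column is hypothetical).
conn.execute("CREATE TABLE IF NOT EXISTS crawl_history (command TEXT)")

# New schema with the extra column. Because the table already exists,
# this statement does nothing and the column is not added.
conn.execute(
    "CREATE TABLE IF NOT EXISTS crawl_history (command TEXT, duration INTEGER)"
)

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(crawl_history)")]
print(columns)  # ['command']

# Writing to the missing column raises sqlite3.OperationalError.
try:
    conn.execute("INSERT INTO crawl_history (command, duration) VALUES ('x', 1)")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)
```

Starting from a fresh database file lets the new schema take effect, which matches the advice above.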
@vringar Shouldn't the new duration column be of type
I think casting these
@vringar I have made a pull request regarding this. Please let me know if any changes are required.
I'd like to get some kind of idea of how long different commands are taking.
My motivation is to be able to debug "performance" when instrumenting lots of APIs. I'm thinking about a very crude Python timer added around here:
https://github.com/mozilla/OpenWPM/blob/a811af1c9c1f8efa4b9d2a008c6cbb3e5be6d1e8/automation/TaskManager.py#L469
Obviously this number will need to be understood carefully, but I still think it'll be useful.
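As a rough sketch of what such a crude timer could look like (this is not the code in TaskManager.py; `timed_call` and `fake_command` are hypothetical names used only for illustration), one could wrap the command execution and record the elapsed wall-clock time as an integer number of milliseconds. The choice of milliseconds is an assumption.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed_call(func: Callable[[], T]) -> Tuple[T, int]:
    """Run func and return (result, elapsed wall-clock time in whole milliseconds)."""
    start = time.perf_counter()
    result = func()
    elapsed_ms = int((time.perf_counter() - start) * 1000)
    return result, elapsed_ms


def fake_command() -> str:
    # Hypothetical stand-in for executing a single browser command.
    time.sleep(0.05)
    return "ok"


status, duration = timed_call(fake_command)
print(status, duration)
```

The measured value is wall-clock time, so it includes any waiting inside the call and not only the work done by the command itself, which is one reason the number needs careful interpretation.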
The other place I think it could be useful in the future is in quantifying the amount of time we spend in "finalize". Now that we can introspect this part of the crawl, it feels like an area we may want to tune in the future in order to optimize the cost of crawling.
For this issue to be closed, the following things need to be done:
Add a duration column of type int to the crawl_history table, both for the SQL and for the Parquet schema.
https://github.com/mozilla/OpenWPM/blob/a811af1c9c1f8efa4b9d2a008c6cbb3e5be6d1e8/automation/TaskManager.py#L469
https://github.com/mozilla/OpenWPM/blob/a811af1c9c1f8efa4b9d2a008c6cbb3e5be6d1e8/automation/TaskManager.py#L521-L537
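The SQL side of this could be sketched as follows, using Python's built-in sqlite3 module with an in-memory database. Only the crawl_history table name and the duration column come from this issue; the other columns, the example values, and the millisecond unit are assumptions made for illustration, and the Parquet schema is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, hypothetical version of the table; only duration is from the issue.
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS crawl_history (
        command TEXT,          -- hypothetical column
        command_status TEXT,   -- hypothetical column
        duration INTEGER       -- elapsed time; the unit (ms) is an assumption
    )
    """
)

# Store one finished command together with how long it took.
conn.execute(
    "INSERT INTO crawl_history (command, command_status, duration) VALUES (?, ?, ?)",
    ("GetCommand", "ok", 1234),
)

rows = conn.execute("SELECT command, duration FROM crawl_history").fetchall()
print(rows)  # [('GetCommand', 1234)]
```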