Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add column to crawl_history table to capture how long command took #679

Closed
birdsarah opened this issue Jun 3, 2020 · 10 comments · Fixed by #799
Closed

Add column to crawl_history table to capture how long command took #679

birdsarah opened this issue Jun 3, 2020 · 10 comments · Fixed by #799
Assignees
Labels
feature-request good-first-bug Bugs that are good for a first-time committer to tackle

Comments

@birdsarah
Copy link
Contributor

birdsarah commented Jun 3, 2020

I'd like to get some kind of idea of how long different commands are taking.
My motivation is to be able to debug "performance" when instrumenting lots of APIs. I'm thinking about a very crude python timer added around here:
https://github.com/mozilla/OpenWPM/blob/a811af1c9c1f8efa4b9d2a008c6cbb3e5be6d1e8/automation/TaskManager.py#L469
Obviously this number will need to be understood carefully, but I still think it'll be useful.

The other place I think it could be useful in the future is in quantifying the amount of time we spend in "finalize". Now that we can introspect this part of the crawl, it feels like an area we may want to tune in the future in order to optimize cost of crawling.

For this issue to be closed the following things need to be done:

  1. Add a duration column of type int to the crawl_history table both for the SQL as well as for the Parquet schema
  2. Add
    t1 = time.time()
    before
    https://github.com/mozilla/OpenWPM/blob/a811af1c9c1f8efa4b9d2a008c6cbb3e5be6d1e8/automation/TaskManager.py#L469
  3. Add another line here
    https://github.com/mozilla/OpenWPM/blob/a811af1c9c1f8efa4b9d2a008c6cbb3e5be6d1e8/automation/TaskManager.py#L521-L537
    saying
"duration": time.time()-t1
@birdsarah birdsarah self-assigned this Jun 5, 2020
@vringar vringar added good-first-bug Bugs that are good for a first-time committer to tackle feature-request labels Jun 29, 2020
@vringar

This comment has been minimized.

@vringar
Copy link
Contributor

vringar commented Nov 4, 2020

@ankushduacodes Is this something you'd be interested in working on?

@ankushduacodes
Copy link
Contributor

@vringar For sure!... Do you have any pointer for me to look for?

@vringar
Copy link
Contributor

vringar commented Nov 4, 2020

I'd suggest you first fix the watchdog before having a look at this. However can you tell me what is unclear to you after reading the issues description?

@vringar vringar assigned ankushduacodes and unassigned birdsarah Nov 4, 2020
@ankushduacodes
Copy link
Contributor

@vringar I am gonna start working on this one now.

@ankushduacodes
Copy link
Contributor

@vringar I added a new column named duration to both SQL and Parquet schema
and I have also added the time.time() lines in accordance with issue description.

But when I try to run demo.py file, I get a sql operational error saying <class 'sqlite3.OperationalError'> table crawl_history has no column named duration

What could be the reason? How can I rebuild the schema?

@vringar
Copy link
Contributor

vringar commented Nov 12, 2020

You need to delete the existing database for the table to be rewritten. So consider deleting or moving the database on your Desktop. Then everything should work.

@ankushduacodes
Copy link
Contributor

ankushduacodes commented Nov 12, 2020

@vringar Shouldn't the new duration column be of type float instead of int? Because return type of time.time() method is float

@vringar
Copy link
Contributor

vringar commented Nov 12, 2020

I think casting these floats to ints is acceptable since dealing with int values is a lot nicer when having to do data analysis.
Maybe use time_ns and then save the result in milliseconds.

@ankushduacodes
Copy link
Contributor

@vringar I have made a pull request regarding this, Please let me know if any changes are required....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request good-first-bug Bugs that are good for a first-time committer to tackle
Projects
Status: Done
3 participants