Use timestamp in aggregate table #2
base: master
Looks good
@zach - Sorry to retract my approval, but I just realized we're still using max_https_crawl_id in the internal repo. Is there a ddg.git PR to go along with this one?
Also, CONTRIBUTING.md in this repo still references max_https_crawl_id.
We will alter the table and restart the service(s). Shouldn't need an additional PR. Updated CONTRIBUTING.
My point was that some internal code still works with the removed field. If you drop https://dub.duckduckgo.com/duckduckgo/ddg/blob/bttf/components/https/create-https-queue.pl#L75
That specific script is on deck for an update beyond the ID. We can run it prior to merging the internal PR to give us a little leeway but, if necessary, the queue can be populated manually.
It sounds like it's expected that this change makes the schema incompatible with the internal scripts like
I get the motivation. It would be nice if the code was in a working state, at the very least because it makes it tougher for me to review. Would you like help cleaning it up? In particular, I'm thinking things like:
@aaronharsh Let me see if I can clear up the confusion.
No longer need the crawl ID since we are aggregating directly. We usually need to decode it into a timestamp anyhow.