Use timestamp in aggregate table #2

zachthompson · 2019-11-19T20:31:08Z

No longer need the crawl ID since we are aggregating directly. We usually need to decode it into a timestamp anyhow.

aaronharsh

Looks good

aaronharsh

@zach - Sorry to retract my approval, but I just realized we're still using max_https_crawl_id in the internal repo. Is there a ddg.git PR to go along with this one?

Also, CONTRIBUTING.md in this repo still references max_https_crawl_id.

zachthompson · 2019-11-20T23:46:19Z

We will alter the table and restart the service(s). Shouldn't need an additional PR.

Updated CONTRIBUTING.

aaronharsh · 2019-11-21T01:11:06Z

My point was that some internal code still works with the removed field. If you drop https_crawl_aggregate.max_https_crawl_id, won't that code break? For example:

https://dub.duckduckgo.com/duckduckgo/ddg/blob/bttf/components/https/create-https-queue.pl#L75
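
One way to check for lingering references before the column is dropped is to scan a local checkout for the field name. The following is only a minimal sketch: the function name `find_references`, the list of file suffixes, and the assumption that the relevant scripts are plain-text files under a single directory are all hypothetical and not taken from either repo.

```python
from pathlib import Path


def find_references(root, needle="max_https_crawl_id",
                    suffixes=(".pl", ".pm", ".sql", ".md")):
    """Return (path, line_number, line) for each line under root mentioning needle.

    Hypothetical helper: the suffix list is a guess at which file types
    might reference the column, not a list taken from the repo.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and file types we are not interested in.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # Replace undecodable bytes rather than failing on odd encodings.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

Running `find_references(".")` from the root of a checkout would list each file and line that still mentions the field, which could serve as a checklist for the follow-up changes.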

zachthompson · 2019-11-21T15:17:00Z

That specific script is on deck for an update beyond the ID. We can run it prior to merging the internal PR to give us a little leeway but, if necessary, the queue can be populated manually.

aaronharsh · 2019-11-21T17:05:14Z

It sounds like it's expected that this change makes the schema incompatible with internal scripts like create-https-queue.pl. And you're trying to rush this change out to the public repo so that anyone else who forks the repo will get the updated schema. Am I reading that right?

I get the motivation. It would be nice if the code were in a working state, at the very least because a non-working state makes it tougher for me to review. Would you like help cleaning it up? In particular, I'm thinking of things like:

Get rid of the duplicate https_crawl.pl in components/https
Make all the scripts work with the new schema
Get rid of any unused scripts

zachthompson · 2019-11-25T21:41:54Z

@aaronharsh Let me see if I can clear up the confusion.

The code is in a working state. You can test this on a local PostgreSQL instance and it works fine.
It doesn't affect anything internal yet. The integration PR has the submodule pointed at a previous commit with crawl ID still there.
Once this is merged, I will open another PR internally to address any legacy references to crawl ID, bump the submodule commit, and we can verify it end-to-end internally.

Ishcode63

.

tonkla10032533

[ ]

zachthompson added 2 commits November 19, 2019 12:35

Store timestamp in aggregate table instead of crawl id

07243ab

Remove crawl id from conflict values

37a371f

zachthompson assigned aaronharsh Nov 19, 2019

aaronharsh approved these changes Nov 20, 2019

aaronharsh requested changes Nov 20, 2019

Update aggregate description with updated column

f2f04cf

Ishcode63 reviewed Jan 12, 2023

tonkla10032533 approved these changes Aug 1, 2024

Nosense9 approved these changes Sep 11, 2024

Garfield-FatOrangeKat approved these changes Oct 29, 2024
