Use timestamp in aggregate table #2

Open: wants to merge 3 commits into base: master
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -210,8 +210,8 @@ Aggregate of [https_crawl](#https_crawl) that creates latest crawl sessions base
 |requests|Number of comparison requests actually made during the crawl session|integer||
 |session_request_limit|The number of comparisons wanted for the session|integer||
 |is_redirect|Whether the domain was actually crawled or is a redirect from another host in the table that was crawled|boolean||
-|max_https_crawl_id|https_crawl.id of last comparison made during crawl session|bigint||
 |redirect_hosts|key/value pairs of hosts and the number of redirects to it|jsonb||
+|updated|When last updated|timestamp with time zone||
 
 #### https_upgrade_metrics

16 changes: 8 additions & 8 deletions https_crawl.pl
@@ -333,7 +333,6 @@ sub crawl_sites{
 mixed_requests
 max_ss_diff
 redirects
-max_id
 requests
 is_redirect
 redirect_hosts'
@@ -412,18 +411,18 @@ sub prep_db {
 domain,
 https,
 http_and_https,
-https_errs, http,
+https_errs,
+http,
 unknown,
 autoupgrade,
 mixed_requests,
 max_screenshot_diff,
 redirects,
-max_https_crawl_id,
 requests,
 is_redirect,
 redirect_hosts,
 session_request_limit)
-values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,$CC{URLS_PER_SITE})
+values (?,?,?,?,?,?,?,?,?,?,?,?,?,$CC{URLS_PER_SITE})
 on conflict (domain) do update set (
 https,
 http_and_https,
@@ -434,11 +433,11 @@ sub prep_db {
 mixed_requests,
 max_screenshot_diff,
 redirects,
-max_https_crawl_id,
 requests,
 is_redirect,
 redirect_hosts,
-session_request_limit
+session_request_limit,
+updated
 ) = (
 EXCLUDED.https,
 EXCLUDED.http_and_https,
@@ -449,11 +448,12 @@ sub prep_db {
 EXCLUDED.mixed_requests,
 EXCLUDED.max_screenshot_diff,
 EXCLUDED.redirects,
-EXCLUDED.max_https_crawl_id,
 EXCLUDED.requests,
 EXCLUDED.is_redirect,
 EXCLUDED.redirect_hosts,
-EXCLUDED.session_request_limit)
+EXCLUDED.session_request_limit,
+now()
+)
 where
 EXCLUDED.is_redirect = false or
 https_crawl_aggregate.is_redirect = true
4 changes: 2 additions & 2 deletions sql/https_crawl_aggregate.sql
@@ -37,8 +37,8 @@ CREATE TABLE https_crawl_aggregate (
 requests integer NOT NULL,
 session_request_limit integer NOT NULL,
 is_redirect boolean DEFAULT false NOT NULL,
-max_https_crawl_id bigint NOT NULL,
-redirect_hosts jsonb
+redirect_hosts jsonb,
+updated timestamp with time zone DEFAULT now() NOT NULL
 );

