Query crawler #686

kompotkot · 2022-10-25T09:57:27Z

Changes

Fetch a query from the journal, validate it, execute it, and push the results to the bucket if required.
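For reviewers, the flow described above could be sketched roughly as follows. This is not the code in this PR: `validate_query`, `run_query_job`, the injected `fetch`/`execute`/`push` callables, and the SELECT-only validation rule are all hypothetical names and assumptions chosen for illustration.

```python
from typing import Any, Callable, Dict, List, Optional

Rows = List[Dict[str, Any]]


def validate_query(query: str) -> str:
    """Assumed rule for this sketch: accept a single read-only SELECT statement."""
    normalized = query.strip().rstrip(";")
    if not normalized.lower().startswith("select"):
        raise ValueError("only SELECT queries are allowed")
    if ";" in normalized:
        raise ValueError("multiple statements are not allowed")
    return normalized


def run_query_job(
    fetch: Callable[[str], str],
    execute: Callable[[str], Rows],
    push: Optional[Callable[[str, Rows], None]],
    query_name: str,
) -> Rows:
    """Fetch, validate, execute, and optionally push the result."""
    query = validate_query(fetch(query_name))  # fetch from the journal, then validate
    rows = execute(query)                      # run against the database
    if push is not None:                       # "push to bucket if required"
        push(query_name, rows)
    return rows
```

Passing the three steps in as callables keeps the sketch independent of the journal, database, and bucket clients, so each step can be stubbed in a local test.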

How to test these changes?

Tested locally

Related issues

kompotkot · 2022-10-25T10:29:34Z

@bugout-dev check

Add env MOONSTREAM_S3_DATA_BUCKET
Add env MOONSTREAM_S3_DATA_BUCKET_PREFIX
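A minimal sketch of how the two variables might be read at startup. Only the variable names come from this PR; `S3DataSettings`, `load_s3_settings`, `key_for`, and the choice to treat the bucket as required and the prefix as optional are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class S3DataSettings:
    bucket: str
    prefix: str

    def key_for(self, name: str) -> str:
        """Build an object key under the configured prefix (if any)."""
        return f"{self.prefix}/{name}" if self.prefix else name


def load_s3_settings(env: Optional[Mapping[str, str]] = None) -> S3DataSettings:
    """Read bucket settings from the environment, failing fast if the bucket is unset."""
    env = os.environ if env is None else env
    bucket = env.get("MOONSTREAM_S3_DATA_BUCKET", "").strip()
    if not bucket:
        raise RuntimeError("MOONSTREAM_S3_DATA_BUCKET must be set")
    # Assumption: the prefix is optional; surrounding slashes and spaces are dropped.
    prefix = env.get("MOONSTREAM_S3_DATA_BUCKET_PREFIX", "").strip("/ ")
    return S3DataSettings(bucket=bucket, prefix=prefix)
```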

zomglings

Reconsidered crawler architecture.

zomglings · 2022-10-25T11:44:33Z

crawlers/mooncrawl/mooncrawl/queries_crawler/cli.py

+logger = logging.getLogger(__name__)
+
+
+def parser_queries_execute_handler(args: argparse.Namespace) -> None:


After discussing with @kompotkot and @Andrei-Dolgolev:

We realized that we want this crawler to go through the Query API like any other user would.

The plan is to remove this mooncrawl/queries_crawler code and update the Moonstream Python client to provide this functionality, similar to how we used it in the autocorns biologist:
https://github.com/bugout-dev/autocorns/blob/cf00fb492de254821730a256d238d5a332810db6/autocorns/biologist.py#L371

We can remove the existing Moonstream Python client, bump the client version, and publish the new client.
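To illustrate what "going through the Query API like any other user" might look like from the client side, here is a generic request-then-poll sketch. It does not reproduce the linked autocorns code or the Moonstream Python client API: `wait_for_query_results` and its `request_refresh`/`fetch_results` callables are hypothetical stand-ins for whatever calls the updated client ends up exposing.

```python
import time
from typing import Callable, Optional


def wait_for_query_results(
    request_refresh: Callable[[], None],
    fetch_results: Callable[[], Optional[bytes]],
    max_attempts: int = 10,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Ask the API to refresh a query's data, then poll until results are available."""
    request_refresh()  # hypothetical: trigger query execution through the API
    for _ in range(max_attempts):
        data = fetch_results()  # hypothetical: returns None while results are not ready
        if data is not None:
            return data
        sleep(delay_seconds)
    raise TimeoutError(f"no results after {max_attempts} attempts")
```

The `sleep` parameter is injectable only so the loop can be exercised in tests without waiting.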

kompotkot · 2022-10-25T11:57:05Z

We need to cherry-pick from this PR later.

zomglings · 2022-10-25T11:59:04Z

Although we will not use queries_crawler to crawl public data, we will use it as a replacement for the existing Query API data producer (which is currently in mooncrawl/stats_worker/queries.py).

The queries_crawler should be a CLI that exposes the same functionality in a more modular way (and should be invoked from systemd on prod).
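One possible shape for such a CLI, sketched with argparse subcommands. The handler name `parser_queries_execute_handler` and the module-level `logger` appear in this PR's diff; the `execute` subcommand, its `--name` and `--upload` flags, `build_parser`, and `main` are assumptions for illustration only.

```python
import argparse
import logging
from typing import List, Optional

logger = logging.getLogger(__name__)


def parser_queries_execute_handler(args: argparse.Namespace) -> None:
    # Placeholder body: the real handler would fetch, validate, and run the query.
    logger.info("Executing query %s (upload=%s)", args.name, args.upload)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="queries_crawler CLI (sketch)")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Each subcommand binds its own handler, so new commands stay self-contained.
    execute = subparsers.add_parser("execute", help="Execute a stored query")
    execute.add_argument("-n", "--name", required=True, help="Query name (hypothetical flag)")
    execute.add_argument("--upload", action="store_true", help="Push results to the bucket")
    execute.set_defaults(func=parser_queries_execute_handler)
    return parser


def main(argv: Optional[List[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    args.func(args)
```

A systemd unit could then call the installed entry point with fixed arguments, for example `execute --name <query> --upload`, with the entry point name left to the package configuration.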

We will revisit this after our current batch of urgent work.

Query crawler

c3fd463

kompotkot added the crawlers Crawlers module label Oct 25, 2022

kompotkot requested a review from a team October 25, 2022 09:57

Output fix

a729fb5

Exec db query time log

e7fead3

zomglings reviewed Oct 25, 2022

kompotkot closed this Oct 25, 2022

kompotkot deleted the crawler-queries-cu branch October 25, 2022 11:55

zomglings restored the crawler-queries-cu branch October 25, 2022 11:57

zomglings reopened this Oct 25, 2022

kompotkot marked this pull request as draft October 28, 2022 12:22
