
Starknet updates and fixes #57

Open · wants to merge 2 commits into master
Conversation

@Wizard1209 (Collaborator) commented Apr 1, 2024

  1. add a raw ingester (to jsonl.gz)
  2. add a parquet writer that reads from the raw data
  3. fix data loss in the case where 0 chunks were filled
  4. backport the Starknet feature for the worker to start indexing from the last block that has data
  5. refactor _to_sync_gen usage and typing
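The raw-ingestion idea from points 1 and 2 can be sketched roughly as follows. This is a hedged illustration, not the PR's actual code: the function names (`write_raw_blocks`, `read_raw_blocks`) and the block shape are assumptions; the real ingester has its own API and would feed a parquet writer in a second pass.

```python
# Sketch of a raw ingester: append blocks as gzipped JSON Lines,
# then stream them back for a later conversion pass (e.g. to parquet).
# All names here are illustrative, not the PR's real interfaces.
import gzip
import json


def write_raw_blocks(path, blocks):
    # One JSON object per line; gzip keeps the raw archive compact.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for block in blocks:
            f.write(json.dumps(block) + "\n")


def read_raw_blocks(path):
    # Stream blocks back lazily, e.g. for a parquet-writing pass.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)
```

Splitting ingestion (append-only jsonl.gz) from conversion (parquet) means a crash during conversion never loses raw data; the parquet pass can simply be re-run.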

@@ -166,8 +166,8 @@ def flush():
                self._report()
                last_report = current_time

        if self._writer.buffered_bytes() > 0 and last_block == write_range[1]:
            flush()
            # NOTE: In case no chunks were made we still need to save data we received
Collaborator:
What is the reason for abandoning the check above?

Although I don't remember why it exists, I know that we don't want to write a half-sized chunk unless it is the last one in the specified range.

@Wizard1209 (Collaborator, Author):

Since the byte size is counted as the sum of chunks across all tables, if no chunks were created this check leads to data loss for smaller data volumes.
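The data-loss scenario under discussion can be sketched like this. This is a minimal stand-in, not the repository's real writer: `Writer`, `finalize`, and the `flushed` list are invented for illustration; only the end-of-range condition mirrors the diff above.

```python
# Sketch of the fix: flush any buffered bytes at the end of the write
# range, even when the data never filled a whole chunk. The Writer class
# and finalize() are illustrative stand-ins, not the PR's actual code.
class Writer:
    def __init__(self):
        self.buffered = 0
        self.flushed = []

    def buffered_bytes(self):
        return self.buffered

    def flush(self):
        # Persist whatever is buffered, however small.
        self.flushed.append(self.buffered)
        self.buffered = 0


def finalize(writer, last_block, write_range):
    # Old behaviour gated the flush on a full chunk having been produced,
    # so small data volumes were silently dropped. The fixed condition
    # only requires buffered data plus reaching the end of the range.
    if writer.buffered_bytes() > 0 and last_block == write_range[1]:
        writer.flush()


writer = Writer()
writer.buffered = 10  # far less than a chunk: no chunk was ever created
finalize(writer, last_block=5, write_range=(0, 5))
```

With the old chunk-based gate, those 10 bytes would never have been written; with the end-of-range check they are flushed once the range completes.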

@tmcgroul (Contributor) commented Apr 8, 2024

@eldargab
This PR follows the old approach, where RPC data ingestion and raw data ingestion live in a single program whose behaviour depends on CLI arguments. Will we keep Starknet as a unique case, or is it better to follow the "usual workflow", where the dump and ingest components are written in TS and use the same dependencies as the other networks?

3 participants