Block store duplicates lots of data #3085

Closed
4 tasks done
Tracked by #3038
AndrewSisley opened this issue Sep 30, 2024 · 0 comments · Fixed by #3096
Labels
area/db-system Related to the core system related components of the DB perf Performance issue or suggestion

AndrewSisley commented Sep 30, 2024

There is a lot of duplicate data held in the block store, and we can shrink it considerably if we want to.

Note: The block store accounts for the vast majority of Defra's storage requirements. Bigger blocks also mean slower writes and reads, and more network traffic.

These issues also permit structural data divergence: because the same data is stored on both the parent and the child, the copy on the parent may diverge from the copy on the child. This structure also prevents parents from containing more than one unique value (e.g. a composite of multiple fields, or multiple documents).
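To make the divergence risk concrete, here is a minimal sketch in Python. It does not reflect Defra's actual block layout or encoding: the `field`, `value`, and `child` keys, the JSON encoding, and the `link` and `diverged` helpers are all hypothetical and exist only for illustration. It compares a parent that repeats the child's data with a parent that only links to the child.

```python
import hashlib
import json


def link(block: dict) -> str:
    """Hypothetical content address: SHA-256 of the block's canonical JSON encoding."""
    encoded = json.dumps(block, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def diverged(parent: dict, child: dict) -> bool:
    """True when the parent carries its own copy of the value and it disagrees with the child."""
    return "value" in parent and parent["value"] != child["value"]


# Hypothetical child block holding a single field value.
child = {"field": "name", "value": "Alice"}

# Duplicating layout: the parent repeats the child's data next to the link,
# so nothing stops the two copies from disagreeing.
parent_with_copy = {"child": link(child), "field": "name", "value": "Alicia"}

# Link-only layout: the parent references the child and stores no copy.
parent_link_only = {"child": link(child)}

print(diverged(parent_with_copy, child))  # True
print(diverged(parent_link_only, child))  # False
```

In the link-only layout there is no second copy to fall out of sync, and the parent is also smaller.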

Testing also shows that property name length affects storage size under our current encoding. We may wish to reduce this too (or, less likely, change the encoding so that field name length has no impact on size).
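The effect of field name length can be sketched with any self-describing encoding that writes the field name into every block. The example below uses JSON purely as a stand-in, which is an assumption and not Defra's actual block encoding. The `encoded_size` helper and the field names are likewise hypothetical.

```python
import json


def encoded_size(field_name: str, value: object, count: int) -> int:
    """Total bytes for `count` blocks that each embed the field name (JSON as a stand-in)."""
    block = json.dumps({field_name: value}, separators=(",", ":")).encode("utf-8")
    return len(block) * count


blocks = 1000
long_total = encoded_size("customerShippingAddress", "x", blocks)
short_total = encoded_size("a", "x", blocks)

# Every extra character in the name is paid for once per block.
print(long_total - short_total)
```

Under this assumption the overhead grows linearly with the number of blocks, so shortening names (or moving them out of the block) helps most for fields that are written often.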

Tasks

  1. area/db-system code quality perf refactor
  2. area/db-system perf
  3. area/db-system perf
  4. area/db-system perf
    AndrewSisley
AndrewSisley added the perf and area/db-system labels Sep 30, 2024
AndrewSisley self-assigned this Sep 30, 2024
AndrewSisley added this to the DefraDB v0.14 milestone Sep 30, 2024