Block store duplicates lots of data #3085

Closed
4 tasks done
Tracked by #3038
AndrewSisley opened this issue Sep 30, 2024 · 0 comments · Fixed by #3096
Assignees: AndrewSisley
Labels: area/db-system (Related to the core system related components of the DB), perf (Performance issue or suggestion)
Milestone: DefraDB v0.14

Comments

AndrewSisley commented Sep 30, 2024

The block store holds a lot of duplicate data, and we can shrink it considerably if we want to.

Note: The block store accounts for the vast majority of Defra's storage requirements. Bigger blocks also mean slower writes and reads, and more network traffic.

These issues also permit structural data divergence: because the same data is stored on both a parent and its child, the parent's copy may diverge from the child's. The duplication also prevents parents from containing more than one unique value (e.g. a composite of multiple fields, or multiple documents).
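
One way to picture the divergence risk is the sketch below. It assumes a purely hypothetical layout in which a parent's link carries a `name` and the child block repeats the same information as `fieldName`; these dict shapes and the `find_divergent_links` helper are illustrative assumptions, not Defra's actual block format.

```python
def find_divergent_links(parent: dict, blocks_by_cid: dict) -> list:
    """Return the CIDs of links whose parent-side name disagrees with the child's own copy."""
    divergent = []
    for link in parent["links"]:
        child = blocks_by_cid[link["cid"]]
        # The same fact is stored twice, so nothing structurally forces the copies to agree.
        if link["name"] != child["fieldName"]:
            divergent.append(link["cid"])
    return divergent


# Hypothetical blocks: the second child's copy has drifted from the parent's copy.
blocks_by_cid = {
    "cid-a": {"fieldName": "age", "data": 30},
    "cid-b": {"fieldName": "name", "data": "John"},
}
parent = {
    "links": [
        {"name": "age", "cid": "cid-a"},
        {"name": "email", "cid": "cid-b"},
    ]
}

divergent = find_divergent_links(parent, blocks_by_cid)
```

If the value lived in only one place, a check like this would be unnecessary, because there would be no second copy to disagree with.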

Testing also shows that property name length does affect storage size under our current encoding. We may wish to reduce this too (or, less likely, change the encoding so that field name length has no impact on size).
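
The effect of property name length can be illustrated with any self-describing encoding that writes keys inline. The sketch below uses JSON from the Python standard library purely as a stand-in; it is not Defra's encoding, and the document shapes are made up for illustration.

```python
import json


def encoded_size(doc: dict) -> int:
    """Byte length of a compact JSON encoding of doc (stand-in for the real encoding)."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


# Two documents holding the same values under short and long property names.
short_doc = {"n": "John", "a": 30}
long_doc = {"customerFullName": "John", "customerAgeInYears": 30}

short_size = encoded_size(short_doc)
long_size = encoded_size(long_doc)

# With keys written inline, every extra character in a name is an extra byte per block.
extra_key_bytes = (len("customerFullName") - len("n")) + (len("customerAgeInYears") - len("a"))
print(short_size, long_size, extra_key_bytes)
```

In this stand-in the size difference equals exactly the extra key characters, and that cost is paid again in every block that repeats the names.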

Tasks

  1. area/db-system code quality perf refactor
  2. area/db-system perf
  3. area/db-system perf
  4. area/db-system perf
    AndrewSisley
AndrewSisley added the perf and area/db-system labels on Sep 30, 2024
AndrewSisley self-assigned this on Sep 30, 2024
AndrewSisley added this to the DefraDB v0.14 milestone on Sep 30, 2024
ChrisBQu pushed a commit to ChrisBQu/defradb that referenced this issue Feb 21, 2025
## Relevant issue(s)

Resolves sourcenetwork#3085 sourcenetwork#3089

Documents sourcenetwork#3056 sourcenetwork#3086 sourcenetwork#3087 (I'm going to close these on merge, no need
to have them littering the backlog)

## Description

Removes the duplication of head links from delete blocks.

PR also includes the following to save the hassle of multiple test-cid
updates:
- Removes `fieldName` from composite block deltas
- Removes the magic `_head` link name, and extracts head links to a new,
optional prop
- Documents the reasons for duplicating various bits of data in the
blockstore blocks as discussed in standup
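
The head-link change listed above can be pictured roughly as follows. This is a minimal sketch under stated assumptions: the dict layouts, the `heads` property name, and both helper functions are hypothetical, and compact JSON stands in for the real block encoding.

```python
import json


def encoded_size(block: dict) -> int:
    """Byte length of a compact JSON encoding (stand-in for the real block encoding)."""
    return len(json.dumps(block, separators=(",", ":")).encode("utf-8"))


def block_with_magic_head(delta: dict, heads: list, field_links: list) -> dict:
    """Hypothetical 'before' layout: heads share the links list, tagged with the magic `_head` name."""
    links = [{"name": "_head", "cid": cid} for cid in heads]
    links += [{"name": name, "cid": cid} for name, cid in field_links]
    return {"delta": delta, "links": links}


def block_with_heads_prop(delta: dict, heads: list, field_links: list) -> dict:
    """Hypothetical 'after' layout: heads live in their own property, omitted when empty."""
    links = [{"name": name, "cid": cid} for name, cid in field_links]
    block = {"delta": delta, "links": links}
    if heads:
        block["heads"] = list(heads)
    return block


delta = {"docID": "doc-1", "priority": 2}
heads = ["cid-of-previous-head"]
field_links = [("age", "cid-of-age-block")]

before = block_with_magic_head(delta, heads, field_links)
after = block_with_heads_prop(delta, heads, field_links)
no_head_block = block_with_heads_prop(delta, [], field_links)  # block with no previous head

print(encoded_size(before), encoded_size(after))
```

Keeping heads in a dedicated property means readers no longer need to filter links by a reserved name, and a block without heads pays nothing for the property.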

With the actions defined in
`TestQueryCommitsWithFieldIDFieldWithUpdate`, create block size has been
reduced by 4%, and update block size by 7%. This will vary a lot
depending on which fields are being updated, though: the test used for
the calculation was simply the first one I found that created one small
doc and updated a single field.

I recommend reviewing commit by commit. The test-cid changes have been
pulled out to their own commit.