Skip to content

Common dimensions in schema #361

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

benjeffery
Copy link
Contributor

Part of #351

Thought I would get a quick review here before I fix up the tests. I've had to add side effects to convert_local_allele_field_types so that it adds the dimensions it is using to the schema, feels a bit messy.

Copy link
Contributor

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a slight ick factor to modifying the schema like this, but I think this is a good step in the right direction. We do want a Dimension class though

bio2zarr/vcz.py Outdated

if dimensions is None:
self.dimensions = {
"variants": {"size": 0, "chunk_size": 1000},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's define a simple dataclass for Dimension rather than programming with nested dicts

@benjeffery benjeffery changed the title WIP - common dimensions in schema Ccommon dimensions in schema Apr 25, 2025
@benjeffery benjeffery changed the title Ccommon dimensions in schema Common dimensions in schema Apr 25, 2025
@benjeffery benjeffery marked this pull request as ready for review April 25, 2025 13:26
@benjeffery
Copy link
Contributor Author

Should be ready for review now.

@coveralls
Copy link
Collaborator

coveralls commented Apr 25, 2025

Coverage Status

coverage: 97.974% (-0.4%) from 98.414%
when pulling 8915f73 on benjeffery:dimensions-schema
into ac46a92 on sgkit-dev:main.

@benjeffery benjeffery mentioned this pull request Apr 25, 2025
Copy link
Contributor

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, but I think we need to fix the default chunk size distribution problem somehow

self.samples_chunk_size = samples_chunk_size
if dimensions is None:
dimensions = {
"variants": VcfZarrDimension(size=0, chunk_size=1000),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're distributing this default to three different locations around the code, I think we need to consolidate this somehow

@tomwhite
Copy link
Contributor

This should go in before #360 I think as it provides a better foundation for doing the work there. I've started changing #360 so it works with the changes in this PR (the two currently conflict).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants