Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring Storage #12119

Merged
merged 47 commits into from
Jan 27, 2025
Merged

Refactoring Storage #12119

merged 47 commits into from
Jan 27, 2025

Conversation

jdunkerley
Copy link
Member

Pull Request Description

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    TypeScript,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • Unit tests have been written where possible.
  • If meaningful changes were made to logic or tests affecting Enso Cloud integration in the libraries,
    or the Snowflake database integration, a run of the Extra Tests has been scheduled.
    • If applicable, it is suggested to paste a link to a successful run of the Extra Tests.

@jdunkerley jdunkerley added the CI: No changelog needed Do not require a changelog entry for this PR. label Jan 25, 2025
@jdunkerley jdunkerley marked this pull request as ready for review January 26, 2025 21:58
Comment on lines +95 to +103
} else if (storage instanceof ColumnDoubleStorage doubleStorage) {
long n = doubleStorage.getSize();
for (long i = 0; i < n; i++) {
if (storage.isNothing(i)) {
appendNulls(1);
} else {
appendDouble(doubleStorage.getItemAsDouble(i));
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About these: I'm a little bit slightly worried that because currently all (most?) storages are the 'base' e.g. DoubleStorage, this more 'abstract' variant will rarely run and so is not covered that much by tests.

This is a recurring problem that there are often more edge cases in the implementation of these low-level operations than seems feasible to have high-level tests (especially also ones shared with DB which are often slower).

Not for this PR but perhaps we need to add some 'unit' tests to std-table that test the internals more, to reinforce our correctness here?

Copy link
Member Author

@jdunkerley jdunkerley Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No DoubleStorage is a ColumnDoubleStorage so this will run.
All the typed storages are their specific ColumnStorage.

  • TypedStorage<T> => ColumnStorage<T>
  • BooleanStorage => ColumnStorage<Boolean> and ColumnBooleanStorage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree on some tests on the builders to check we get the correct interfaces out makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No DoubleStorage is a ColumnDoubleStorage so this will run. All the typed storages are their specific ColumnStorage.

No, because above it is a more specific branch that has a specialization for DoubleStorage specifically. Which is good. But that means that specialization above runs, not this generic code. That was my point above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok - sorry yes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It felt sensible to add these in at this point but will revisit as pull operations out one by one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These will be used pretty quickly as ConstantColumnDouble would be a ColumnDoubleStorage for example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should have some java unit tests for these type of internals. I had a similar problem when I was looking to change the expression lanaguage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number parser had a load of JUnit tests. @radeusgd would you be able to look at adding running JUnit tests at some point?

Comment on lines 46 to 50
@Override
protected SpecializedStorage<String> newInstance(String[] data, int size) {
return new StringStorage(data, size, type);
public TextType getType() {
// As the type is fixed, we can safely cast it.
return (TextType) super.getType();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tempted to make the Type also part of the generic arguments of SpecializedStorage to avoid these casts.

Please treat as a very optional suggestion

Comment on lines +285 to +284
@Override
public Storage<Long> appendNulls(int count) {
final AbstractLongStorage parent = this;
int size = (int) parent.getSize();
return new ComputedNullableLongStorage(size + count) {
@Override
protected Long computeItem(int idx) {
if (idx < size) {
return parent.getItemBoxed(idx);
} else {
return null;
}
}
};
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a problem with this PR as this flaw has existed before (so we can create a separate ticket if you prefer), but you made me realize it.

Every call to appendNulls creates a new ComputedNullableLongStorage layer. In our builders we are very often calling appendNulls(1) multiple times. This will create a lot of layers to actually get to the item.

I think we need to keep an overload inside of ComputedNullableLongStorage that will ensure such appendNulls will be kept flattened - instead of creating new 'layer', it should create a new ComputedNullableLongStorage that points to the original parent, thus keeping the 'layering depth' bounded at at most 1.

Essentially same logic as we plan with masks in the future.

Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Would be good to avoid the layering problem in ComputedNullableLongStorage::appendNulls, but this problem probably existed before so seems OK to handle it seprately.

@jdunkerley jdunkerley added the CI: Ready to merge This PR is eligible for automatic merge label Jan 27, 2025
@mergify mergify bot merged commit f25164d into develop Jan 27, 2025
41 of 44 checks passed
@mergify mergify bot deleted the wip/jd/operations-1 branch January 27, 2025 23:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants