Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update function libraries to use iteration with Vectors #3574

Closed
rcaudy opened this issue Mar 20, 2023 · 2 comments · Fixed by #3659
Closed

Update function libraries to use iteration with Vectors #3574

rcaudy opened this issue Mar 20, 2023 · 2 comments · Fixed by #3659
Assignees
Labels
core Core development tasks feature request New feature or request
Milestone

Comments

@rcaudy
Copy link
Member

rcaudy commented Mar 20, 2023

Once #3570 merges, we can and should change more or less all Vector iteration in our libraries to use iterator or stream methods, rather than looping on get calls. This is potentially far more efficient for some Vector types, especially those downstream of a groupBy, aggBy with Group, updateBy with RollingGroup, or rangeJoin with Group.

For example:

This code in Basic.ftl:

    static public <T> long countObj(ObjectVector<T> values) {
        if (values == null) {
            return NULL_LONG;
        }

        final long n = values.size();
        long count = 0;

        for (long i = 0; i < n; i++) {
            T c = values.get(i);

            if (!isNull(c)) {
                count++;
            }
        }

        return count;
    }

should be:

    static public <T> long countObj(ObjectVector<T> values) {
        if (values == null) {
            return NULL_LONG;
        }

        if (values.isEmpty()) {
            return 0;
        }

        long count = 0;
        for (final Iterator<T> vi = values.iterator(); vi.hasNext(); ){
            final T c = vi.next();

            if (!isNull(c)) {
                count++;
            }
        }

        return count;
    }

We could also write it like this, but it's probably worse:

    static public <T> long countObj(ObjectVector<T> values) {
        if (values == null) {
            return NULL_LONG;
        }

        if (values.isEmpty()) {
            return 0;
        }

        return values.iterator().stream().filter(c -> !isNull(c)).count();
    }

A related change we should also make: Avoid using Vector get when array access will do:

This code in Basic.ftl:

    static public <T> T[] replaceIfNull(ObjectVector<T> values, T replacement) {
        final int n = values.intSize("replaceIfNull");
        T[] result = values.toArray();

        for (int i = 0; i < n; i++) {
            result[i] = replaceIfNull(values.get(i), replacement);
        }

        return result;
    }

should be:

    static public <T> T[] replaceIfNull(ObjectVector<T> values, T replacement) {
        final int n = values.intSize("replaceIfNull");
        T[] result = values.toArray();

        for (int i = 0; i < n; i++) {
            result[i] = replaceIfNull(result[i], replacement);
        }

        return result;
    }
@rcaudy rcaudy added feature request New feature or request core Core development tasks labels Mar 20, 2023
@rcaudy rcaudy added this to the Apr 2023 milestone Mar 20, 2023
@rcaudy
Copy link
Member Author

rcaudy commented Mar 21, 2023

The pre-requisite PR has merged.

@rcaudy
Copy link
Member Author

rcaudy commented Apr 13, 2023

Loosely related to #3689

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core development tasks feature request New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants