Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Macro for creating record batch from literal slice #12846

Merged
merged 3 commits into from
Oct 13, 2024

Conversation

timsaucer
Copy link
Contributor

@timsaucer timsaucer commented Oct 10, 2024

Which issue does this PR close?

Closes #12574

Rationale for this change

This macro allows for easy creation of a record batch, useful for prototype and unit testing. A user requested this feature.

What changes are included in this PR?

Adds a macro to create a record batch. For example, the following creates a record batch with three arrays, a non nullable Int32, a nullable Float64, and a non nullable String.

let batch = create_batch!(
    ("a", Int32, vec![1, 2, 3]),
    ("b", Float64, vec![Some(4.0), None, Some(5.0)]),
    ("c", Utf8, vec!["alpha", "beta", "gamma"])
)?;

let ctx = SessionContext::new();

ctx.read_batch(batch)?.show().await

Yields:

+---+-----+-------+
| a | b   | c     |
+---+-----+-------+
| 1 | 4.0 | alpha |
| 2 |     | beta  |
| 3 | 5.0 | gamma |
+---+-----+-------+

Are these changes tested?

Tested both in the provided unit test and locally.

Are there any user-facing changes?

No

@github-actions github-actions bot added the common Related to common crate label Oct 10, 2024
@timsaucer timsaucer self-assigned this Oct 10, 2024
@alamb
Copy link
Contributor

alamb commented Oct 10, 2024

Maybe this is worth moving upstream to arrow-rs as well

@timsaucer timsaucer force-pushed the tsaucer/memtable-creation-macro branch from 50dfc06 to 9d6c45a Compare October 10, 2024 14:03
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍 Thanks @timsaucer

I think it would be worth filing a ticket to port this upstream to arrow-rs eventually FWIW

};
}

/// Creates a record batch from literal slice of values, suitable for rapid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is beautiful. We can probably use it to clean up a bunch of the testing setup as well

(Float64, $values: expr) => {
std::sync::Arc::new(arrow::array::Float64Array::from($values))
};
(Utf8, $values: expr) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a bunch of other types that might be worth supporting too LargeUtf8, Utf8View, etc

Full list here:
https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html

We could definitely do it as a follow on PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added issue to track:
#12895

("c", Utf8, vec!["alpha", "beta", "gamma"])
);

assert!(batch.is_ok());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we assert the contents too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in last push

Comment on lines 416 to 420
let batch = create_batch!(
("a", Int32, vec![1, 2, 3]),
("b", Float64, vec![Some(4.0), None, Some(5.0)]),
("c", Utf8, vec!["alpha", "beta", "gamma"])
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

/// ("c", Utf8, vec!["alpha", "beta", "gamma"])
/// );
/// ```
#[macro_export]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this affect how the macro is imported? do we need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do need it so we can use it in downstream repos as well.

@timsaucer
Copy link
Contributor Author

I put in a couple of small additions per feedback here and on the arrow-rs issue.

@alamb alamb merged commit ebfc155 into apache:main Oct 13, 2024
24 checks passed
@alamb
Copy link
Contributor

alamb commented Oct 13, 2024

Thanks agian -- this looks great

hailelagi pushed a commit to hailelagi/datafusion that referenced this pull request Oct 14, 2024
* Add macro for creating record batch, useful for unit test or rapid development

* Update docstring

* Add additional checks in unit test and rename macro per user input
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common Related to common crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Concise API to create DataFrame from collection
3 participants