Skip to content

Conversation

@marc-pydantic
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

As described in the issue, this is a low-effort QoL fix for now.

What changes are included in this PR?

Uses the existing function for naming fields to replace the hardcoded "is_set" with a field-dependent name. Example output:

Field {
    name: "first_value(records_partitioned.trace_id)[first_value]",
    data_type: Utf8View,
    nullable: true,
    dict_id: 0,
    dict_is_ordered: false,
    metadata: {},
},
Field {
    name: "first_value(records_partitioned.trace_id)[first_value_is_set]",
    data_type: Boolean,
    nullable: true,
    dict_id: 0,
    dict_is_ordered: false,
    metadata: {},
},
Field {
    name: "first_value(records_partitioned.value)[first_value]",
    data_type: Int32,
    nullable: true,
    dict_id: 0,
    dict_is_ordered: false,
    metadata: {},
},
Field {
    name: "first_value(records_partitioned.value)[first_value_is_set]",
    data_type: Boolean,
    nullable: true,
    dict_id: 0,
    dict_is_ordered: false,
    metadata: {},
},

Are these changes tested?

No tests have been added, hopefully it should be covered by existing changes.

Are there any user-facing changes?

There should not be any, I assume is_set is never user visible.

@github-actions github-actions bot added the functions Changes to functions implementation label Oct 27, 2025
Copy link
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense 👍

@Jefffrey
Copy link
Contributor

Looks like need to make some accompanying changes to test in the codebase

…ould still have at least 1 duplicate field left
@marc-pydantic
Copy link
Contributor Author

I made the change. test_duplicate_state_fields_for_dfschema_construct is a bit weaker for it, but it still has one duplicate field left, namely timestamp@0.

@github-actions github-actions bot added the core Core DataFusion crate label Oct 27, 2025
@Jefffrey Jefffrey added this pull request to the merge queue Oct 28, 2025
Merged via the queue into apache:main with commit 6ee019e Oct 28, 2025
28 checks passed
@Jefffrey
Copy link
Contributor

Thanks @marc-pydantic

tobixdev pushed a commit to tobixdev/datafusion that referenced this pull request Nov 2, 2025
…pache#18303)

## Which issue does this PR close?

- Closes apache#18302

## Rationale for this change

As described in the issue, this is a low-effort QoL fix for now.

## What changes are included in this PR?

Uses the existing function for naming fields to replace the hardcoded
`"is_set"` with a field-dependent name. Example output:

```
Field {
    name: "first_value(records_partitioned.trace_id)[first_value]",
    data_type: Utf8View,
    nullable: true,
    dict_id: 0,
    dict_is_ordered: false,
    metadata: {},
},
Field {
    name: "first_value(records_partitioned.trace_id)[first_value_is_set]",
    data_type: Boolean,
    nullable: true,
    dict_id: 0,
    dict_is_ordered: false,
    metadata: {},
},
Field {
    name: "first_value(records_partitioned.value)[first_value]",
    data_type: Int32,
    nullable: true,
    dict_id: 0,
    dict_is_ordered: false,
    metadata: {},
},
Field {
    name: "first_value(records_partitioned.value)[first_value_is_set]",
    data_type: Boolean,
    nullable: true,
    dict_id: 0,
    dict_is_ordered: false,
    metadata: {},
},
```

## Are these changes tested?

No tests have been added, hopefully it should be covered by existing
changes.

## Are there any user-facing changes?

There should not be any, I assume `is_set` is never user visible.
codetyri0n pushed a commit to codetyri0n/datafusion that referenced this pull request Nov 11, 2025
…pache#18303)

## Which issue does this PR close?

- Closes apache#18302

## Rationale for this change

As described in the issue, this is a low-effort QoL fix for now.

## What changes are included in this PR?

Uses the existing function for naming fields to replace the hardcoded
`"is_set"` with a field-dependent name. Example output:

```
Field {
    name: "first_value(records_partitioned.trace_id)[first_value]",
    data_type: Utf8View,
    nullable: true,
    dict_id: 0,
    dict_is_ordered: false,
    metadata: {},
},
Field {
    name: "first_value(records_partitioned.trace_id)[first_value_is_set]",
    data_type: Boolean,
    nullable: true,
    dict_id: 0,
    dict_is_ordered: false,
    metadata: {},
},
Field {
    name: "first_value(records_partitioned.value)[first_value]",
    data_type: Int32,
    nullable: true,
    dict_id: 0,
    dict_is_ordered: false,
    metadata: {},
},
Field {
    name: "first_value(records_partitioned.value)[first_value_is_set]",
    data_type: Boolean,
    nullable: true,
    dict_id: 0,
    dict_is_ordered: false,
    metadata: {},
},
```

## Are these changes tested?

No tests have been added, hopefully it should be covered by existing
changes.

## Are there any user-facing changes?

There should not be any, I assume `is_set` is never user visible.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate functions Changes to functions implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FirstValue and LastValue AggregateUDFImpl impls use unique name for data, but not flags

2 participants