Skip to content

Conversation

@AliRana30
Copy link

@AliRana30 AliRana30 commented Jan 30, 2026

Rationale for This Change

The SimpleTable constructor in cpp/src/arrow/table.cc can crash if the first column provided is null. This happens because the constructor attempts to dereference columns[0] to determine the row count before performing any validation.
This change ensures safe initialization when columns are empty or the first column is null.


What Changes Are Included in This PR?

A null-pointer safety check has been added to both SimpleTable constructors to prevent dereferencing a null column during initialization:

if (columns.size() == 0 || !columns[0]) {
  num_rows_ = 0;
} else {
  num_rows_ = columns[0]->length();
}

Are These Changes Tested?

Yes. The change has been verified using the existing unit test suite.

Are There Any User-Facing Changes?

No. This change only improves internal safety and does not affect public APIs or user-visible behavior.

@github-actions
Copy link

⚠️ GitHub issue #49079 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added the awaiting review Awaiting review label Jan 30, 2026
@AliRana30
Copy link
Author

AliRana30 commented Jan 31, 2026

Hi @kou ,can you have a look at this ??

@WillAyd
Copy link
Contributor

WillAyd commented Jan 31, 2026

Can you add a test for this?

@kou
Copy link
Member

kou commented Feb 1, 2026

I think that we must not accept columns[0] == nullptr. Caller must provide all valid chunked arrays or arrays.

If we want to check nullptr, we should do it in Table::Make() something like the following:

diff --git a/cpp/src/arrow/table.cc b/cpp/src/arrow/table.cc
index 68a8a1951f..2fd51d858e 100644
--- a/cpp/src/arrow/table.cc
+++ b/cpp/src/arrow/table.cc
@@ -260,12 +260,18 @@ std::vector<std::shared_ptr<Field>> Table::fields() const {
 std::shared_ptr<Table> Table::Make(std::shared_ptr<Schema> schema,
                                    std::vector<std::shared_ptr<ChunkedArray>> columns,
                                    int64_t num_rows) {
+  if (std::any_of(columns.begin(), columns.end(), [](const auto& column) {return bool(column);})) {
+    return Status::Invalid(...);
+  }
   return std::make_shared<SimpleTable>(std::move(schema), std::move(columns), num_rows);
 }
 
 std::shared_ptr<Table> Table::Make(std::shared_ptr<Schema> schema,
                                    const std::vector<std::shared_ptr<Array>>& arrays,
                                    int64_t num_rows) {
+  if (std::any_of(arrays.begin(), arrays.end(), [](const auto& array) {return bool(array);})) {
+    return Status::Invalid(...);
+  }
   return std::make_shared<SimpleTable>(std::move(schema), arrays, num_rows);
 }

@AliRana30
Copy link
Author

Hi @kou I have updated the PR to follow your suggestion. I've moved the validation to Table::Make using DCHECK to ensure that callers provide valid (non-null) arrays. I also reverted the change in the SimpleTable constructor to keep it consistent with the "no-nulls-accepted" policy.

Additionally, I've included a fix for the macOS CI failures caused by the missing pkg-config@0.29.2 formula in Homebrew.

@kou
Copy link
Member

kou commented Feb 1, 2026

Why did you use DCHECK instead of arrow::Status? We can use arrow::Status by changing return type of Table::Make to Result<std::shared_ptr<Table>> from std::shared_ptr<Table>. DCHECK is evaluated only with debug build. Is it intentional?

Additionally, I've included a fix for the macOS CI failures caused by the missing pkg-config@0.29.2 formula in Homebrew.

Could you work on it as a separated task (open a separated issue and PR) instead of mixing this PR? It's not related to this issue.

@AliRana30
Copy link
Author

AliRana30 commented Feb 1, 2026

Hi @kou,

Thank you for the feedback!

Regarding Table::Make and DCHECK:
You are absolutely right. I have refactored Table::Make (and Table::MakeEmpty) to return
Result<std::shared_ptr<Table>> instead of std::shared_ptr<Table>.

I have also replaced the DCHECK statements with explicit Status::Invalid returns to ensure validation happens in both debug and release builds. All call sites across the codebase have been updated to handle the new Result type, primarily using ARROW_ASSIGN_OR_RAISE, and .ValueOrDie() where appropriate in tests.

Regarding the macOS CI fix:
Apologies for mixing these changes in the same PR. I have removed the pkg-config related fix from this PR and will submit it as a separate issue and PR, as requested, to keep this refactor focused.

Please let me know if there are any other areas that need adjustments. Thanks again!

@AliRana30
Copy link
Author

Hi @WillAyd,

I've added the requested tests in cpp/src/arrow/table_test.cc to verify the fix.

Specifically, I added:

  • TestTable.MakeInvalidInputs: Verifies that Table::Make properly returns Status::Invalid (instead of crashing) when passed vectors containing nullptr.
  • Additional assertions in the AddColumn and SetColumn tests to ensure they also reject null inputs with Status::Invalid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants