Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigQuery returns NULLABLE mode for a field that has a REQUIRED mode (field created as a NOT NULL from the beginning) #3166

Closed
anneyxa opened this issue Feb 23, 2024 · 1 comment
Assignees
Labels
api: bigquery Issues related to the googleapis/java-bigquery API.

Comments

@anneyxa
Copy link

anneyxa commented Feb 23, 2024

Environment details

BigQuery Client Library for Java: google-cloud-bigquery version 2.37.2 and below

I'm using Java 11.

Steps to reproduce

  1. Create a table in BigQuery console (or in any other way) with a NOT NULL field, i.e. CREATE TABLE sample.TEST_TABLE ( C1 INT64 NOT NULL, C2 BIGNUMERIC );
  2. Use BigQuery Client Library for Java to execute select statement for that table. Received schema should report that C1 column has mode REQUIRED, while it reports the mode as NULLABLE.

Code example

// create a com.google.cloud.bigquery.BigQuery object, i.e.:
// BigQuery bigQuery = BigQueryOptions.newBuilder()......

String query = "SELECT * FROM sample.TEST_TABLE";
Job job = bigQuery.create(JobInfo.of(QueryJobConfiguration.newBuilder(query).build()));
TableResult result = job.getQueryResults();
Schema schema = result.getSchema();

// Print the schema, or print the fields:
List<Field> fields = schema.getFields();
for (Field field : fields) {
    System.out.println("Field Mode: " + field.getMode());
    // C1 column which was created as NOT NULL and has REQUIRED mode in BigQuery console 
    // is being returned here as mode NULLABLE
}

// OR

// create a com.google.cloud.bigquery.BigQuery object, i.e.:
// BigQuery bigQuery = BigQueryOptions.newBuilder()......

String query = "SELECT * FROM sample.TEST_TABLE";
QueryJobConfiguration.Builder configuration = QueryJobConfiguration.newBuilder(query).setDryRun(true);
Schema schema = ((QueryStatistics) bigQuery.create(JobInfo.of(configuration.build())).getStatistics()).getSchema();

System.out.println(schema);
// See at C1 column mode: Should be REQUIRED but is NULLABLE.

Prints out 2 nullable fields.

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/java-bigquery API. label Feb 23, 2024
@whuffman36 whuffman36 self-assigned this Oct 24, 2024
@whuffman36
Copy link
Contributor

When a query is sent to BigQuery to be run, an anonymous table is created to hold the query results. This anonymous table is separate from the existing table created in the console (or any other method). When the anonymous table is created, all fields are set to be NULLABLE by default as it is not always possible to guarantee that the values in the anonymous table would not be NULL. This anonymous table is created and used specifically to hold the results of a query, and does not affect or modify the existing table used to run the query.

In the above code examples, the schema is actually read from the anonymous table rather than the permanent existing table created in the console. We can see this as it is being constructed and copied from the job:

TableResult result = job.getQueryResults();

Because of this, the schema will always have all field modes set to NULLABLE, regardless of the schema of the original table. This is intended behavior.

In order to get the schema of the original table, you can use other methods that specify the dataset and table IDs. For example:

BigQuery bq = BigQueryOptions.newBuilder().setProjectId(<your-project-id>).build().getService();
Table table = bq.getTable(<your-dataset-id>, <your-table-id>);
Schema schema = Schema.fromPb(table.toPb().getSchema());

This schema refers to the original table and will be consistent with the schema specified in the console. Normally, the fact that the anonymous table is created in the background is not very relevant to the user, but it is important to note that the query results are stored in a table separate from the table used to actually run the query.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/java-bigquery API.
Projects
None yet
Development

No branches or pull requests

2 participants