[GCP] [BigQuery] Handle totalBytesProcessed NoneType #27474

Merged: 4 commits merged into apache:master on Jul 21, 2023

@ohaibbq ohaibbq commented Jul 12, 2023

fixes #22701

Some queries may not have access to totalBytesProcessed as a result of row-level security.

Per their docs:

BigQuery hides sensitive statistics on all queries against tables with row-level security.

If any maintainer has advice on a good place to add tests for this, please let me know :)


ohaibbq commented Jul 12, 2023

In addition to the report in #22701, we started seeing the same failure in our pipelines.

Error message from worker: Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 623, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1571, in apache_beam.runners.common._OutputHandler.handle_process_outputs
  File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1454, in process
    for part, size in self.restriction_provider.split_and_size(
  File "/usr/local/lib/python3.9/site-packages/apache_beam/transforms/core.py", line 331, in split_and_size
    for part in self.split(element, restriction):
  File "/usr/local/lib/python3.9/site-packages/apache_beam/io/iobase.py", line 1641, in split
    estimated_size = restriction.source().estimate_size()
  File "/usr/local/lib/python3.9/site-packages/apache_beam/io/gcp/bigquery.py", line 870, in estimate_size
    size = int(job.statistics.totalBytesProcessed)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
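
The last frame shows `int()` being called directly on `job.statistics.totalBytesProcessed`, which is `None` when BigQuery hides the statistic. A minimal sketch of the kind of guard that avoids this `TypeError` follows. The helper name `estimate_size_from_statistics` and the `SimpleNamespace` stand-ins are hypothetical and only illustrate the idea; this is not Beam's code or the exact change in this PR.

```python
from types import SimpleNamespace
from typing import Optional


def estimate_size_from_statistics(statistics) -> Optional[int]:
    """Return the processed-byte count as an int, or None if it is unavailable."""
    total_bytes = getattr(statistics, "totalBytesProcessed", None)
    if total_bytes is None:
        # Row-level security can hide this statistic, so report "unknown"
        # instead of passing None to int().
        return None
    return int(total_bytes)


# Hypothetical stand-ins for the statistics of a dry-run query job.
visible = SimpleNamespace(totalBytesProcessed="1048576")
hidden = SimpleNamespace(totalBytesProcessed=None)

print(estimate_size_from_statistics(visible))  # 1048576
print(estimate_size_from_statistics(hidden))   # None
```

Returning `None` rather than `0` keeps "unknown size" distinct from "empty result".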

@github-actions

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers`.

codecov bot commented Jul 12, 2023

Codecov Report

Merging #27474 (61225e0) into master (b54bf52) will increase coverage by 0.04%.
The diff coverage is 33.33%.

❗ Current head 61225e0 differs from pull request most recent head 8eddf8d. Consider uploading reports for the commit 8eddf8d to get more accurate results

@@            Coverage Diff             @@
##           master   #27474      +/-   ##
==========================================
+ Coverage   71.12%   71.17%   +0.04%     
==========================================
  Files         860      861       +1     
  Lines      104573   104523      -50     
==========================================
+ Hits        74378    74390      +12     
+ Misses      28638    28585      -53     
+ Partials     1557     1548       -9     
| Flag | Coverage Δ |
|---|---|
| python | 80.37% <33.33%> (+0.03%) ⬆️ |


| Impacted Files | Coverage Δ |
|---|---|
| sdks/python/apache_beam/io/gcp/bigquery.py | 70.46% <33.33%> (-0.16%) ⬇️ |

... and 28 files with indirect coverage changes


@github-actions

Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:

R: @AnandInguva for label python.
R: @ahmedabu98 for label io.

Available commands:

  • `stop reviewer notifications` - opt out of the automated review tooling
  • `remind me after tests pass` - tag the comment author after tests pass
  • `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).


@github-actions

Reminder, please take a look at this PR: @AnandInguva @ahmedabu98


@ahmedabu98 ahmedabu98 left a comment


This LGTM and is in line with BoundedSource documentation:

Returns:
estimated size of the source if the size can be determined, ``None``
otherwise.
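
For callers, that contract means `None` has to be treated as "size unknown" rather than converted to a number. A rough sketch of the idea, using hypothetical `FixedSizeSource`, `UnknownSizeSource`, and `describe_size` names that are not part of Beam:

```python
from typing import Optional


class FixedSizeSource:
    """Hypothetical source whose size is known up front."""

    def __init__(self, size_bytes: int) -> None:
        self._size_bytes = size_bytes

    def estimate_size(self) -> Optional[int]:
        return self._size_bytes


class UnknownSizeSource:
    """Hypothetical source whose size cannot be determined."""

    def estimate_size(self) -> Optional[int]:
        return None


def describe_size(source) -> str:
    """Format a size estimate, treating None as 'unknown' instead of failing."""
    size = source.estimate_size()
    if size is None:
        return "size unknown"
    return f"{size} bytes"


print(describe_size(FixedSizeSource(2048)))  # 2048 bytes
print(describe_size(UnknownSizeSource()))    # size unknown
```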

@Abacn Abacn merged commit 98cef8b into apache:master Jul 21, 2023

ohaibbq commented Jul 24, 2023

@ahmedabu98 @Abacn Thank you for the approval and merge ❤️
Do we have an estimate of when 2.50.0 will be released?


Abacn commented Jul 25, 2023

Hi, 2.50.0 is scheduled for early September.

cushon pushed a commit to cushon/beam that referenced this pull request May 24, 2024
* [GCP] [BigQuery] Handle totalBytesProcessed NoneType

* Update CHANGES.md

* lint / whitespace

---------

Co-authored-by: Yi Hu <yathu@google.com>
Successfully merging this pull request may close these issues.

[Bug]: BigQuery Row Level Security Reading Data