Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigQueryExampleGen failing in InteractiveContext on 1.14.0 #6307

Closed
colinjc opened this issue Sep 19, 2023 · 7 comments
Closed

BigQueryExampleGen failing in InteractiveContext on 1.14.0 #6307

colinjc opened this issue Sep 19, 2023 · 7 comments

Comments

@colinjc
Copy link

colinjc commented Sep 19, 2023

System information

  • Have I specified the code to reproduce the issue (Yes, No): yes
  • Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows),
    Interactive Notebook, Google Cloud, etc): Linux, Notebook
  • TensorFlow version:
  • TFX Version: 1.14.0
  • Python version: 2.9
  • Python dependencies (from pip freeze output):

Describe the current behavior

When using BigQueryExampleGen in a notebook using InteractiveContext it returns a JSON decoding error. This may also be true for pipelines executed on KFP, but I haven't tried it.

Describe the expected behavior

The query executes successfully

Standalone code to reproduce the issue

import tfx

example_gen = tfx.v1.extensions.google_cloud_big_query.BigQueryExampleGen(
    query="SELECT * FROM ...",
)
context.run(example_gen, enable_cache=False)

Name of your Organization (Optional)

Other info / logs

Adding custom_config="{}" works around this issue

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_1951224/1954825689.py in <cell line: 6>()
      4     query="SELECT * FROM sdp-stg-cti-data.intermediate.nits_training_set",
      5 )
----> 6 context.run(example_gen, enable_cache=False)

~/.pyenv/virtualenvs/cti-ml-xgboost/3.9.15/lib/python3.9/site-packages/tfx/orchestration/experimental/interactive/notebook_utils.py in run_if_ipython(*args, **kwargs)
     29       # __IPYTHON__ variable is set by IPython, see
     30       # https://ipython.org/ipython-doc/rel-0.10.2/html/interactive/reference.html#embedding-ipython.
---> 31       return fn(*args, **kwargs)
     32     else:
     33       logging.warning(

~/.pyenv/virtualenvs/cti-ml-xgboost/3.9.15/lib/python3.9/site-packages/tfx/orchestration/experimental/interactive/interactive_context.py in run(self, component, enable_cache, beam_pipeline_args)
    162         telemetry_utils.LABEL_TFX_RUNNER: runner_label,
    163     }):
--> 164       execution_id = launcher.launch().execution_id
    165 
    166     return execution_result.ExecutionResult(

~/.pyenv/virtualenvs/cti-ml-xgboost/3.9.15/lib/python3.9/site-packages/tfx/orchestration/launcher/base_component_launcher.py in launch(self)
    204       # be immutable in this context.
    205       # output_dict can still be changed, specifically properties.
--> 206       self._run_executor(execution_decision.execution_id,
    207                          copy.deepcopy(execution_decision.input_dict),
    208                          execution_decision.output_dict,

~/.pyenv/virtualenvs/cti-ml-xgboost/3.9.15/lib/python3.9/site-packages/tfx/orchestration/launcher/in_process_component_launcher.py in _run_executor(self, execution_id, input_dict, output_dict, exec_properties)
     71     # be immutable in this context.
     72     # output_dict can still be changed, specifically properties.
---> 73     executor.Do(
     74         copy.deepcopy(input_dict), output_dict, copy.deepcopy(exec_properties))

~/.pyenv/virtualenvs/cti-ml-xgboost/3.9.15/lib/python3.9/site-packages/tfx/components/example_gen/base_example_gen_executor.py in Do(self, input_dict, output_dict, exec_properties)
    277     logging.info('Generating examples.')
    278     with self._make_beam_pipeline() as pipeline:
--> 279       example_splits = self.GenerateExamplesByBeam(pipeline, exec_properties)
    280 
    281       # pylint: disable=expression-not-assigned, no-value-for-parameter

~/.pyenv/virtualenvs/cti-ml-xgboost/3.9.15/lib/python3.9/site-packages/tfx/components/example_gen/base_example_gen_executor.py in GenerateExamplesByBeam(self, pipeline, exec_properties)
    199         buckets.append(total_buckets)
    200       example_splits = (
--> 201           pipeline
    202           | 'InputToRecord' >>
    203           # pylint: disable=no-value-for-parameter

~/.pyenv/virtualenvs/cti-ml-xgboost/3.9.15/lib/python3.9/site-packages/apache_beam/transforms/ptransform.py in __ror__(self, pvalueish, _unused)
   1090 
   1091   def __ror__(self, pvalueish, _unused=None):
-> 1092     return self.transform.__ror__(pvalueish, self.label)
   1093 
   1094   def expand(self, pvalue):

~/.pyenv/virtualenvs/cti-ml-xgboost/3.9.15/lib/python3.9/site-packages/apache_beam/transforms/ptransform.py in __ror__(self, left, label)
    612     pvalueish = _SetInputPValues().visit(pvalueish, replacements)
    613     self.pipeline = p
--> 614     result = p.apply(self, pvalueish, label)
    615     if deferred:
    616       return result

~/.pyenv/virtualenvs/cti-ml-xgboost/3.9.15/lib/python3.9/site-packages/apache_beam/pipeline.py in apply(self, transform, pvalueish, label)
    664       old_label, transform.label = transform.label, label
    665       try:
--> 666         return self.apply(transform, pvalueish)
    667       finally:
    668         transform.label = old_label

~/.pyenv/virtualenvs/cti-ml-xgboost/3.9.15/lib/python3.9/site-packages/apache_beam/pipeline.py in apply(self, transform, pvalueish, label)
    710         transform.type_check_inputs(pvalueish)
    711 
--> 712       pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
    713 
    714       if type_options is not None and type_options.pipeline_type_check:

~/.pyenv/virtualenvs/cti-ml-xgboost/3.9.15/lib/python3.9/site-packages/apache_beam/runners/runner.py in apply(self, transform, input, options)
    199            ):
    200     # TODO(robertwb): Remove indirection once internal references are fixed.
--> 201     return self.apply_PTransform(transform, input, options)
    202 
    203   def apply_PTransform(self, transform, input, options):

~/.pyenv/virtualenvs/cti-ml-xgboost/3.9.15/lib/python3.9/site-packages/apache_beam/runners/runner.py in apply_PTransform(self, transform, input, options)
    203   def apply_PTransform(self, transform, input, options):
    204     # TODO(robertwb): Remove indirection once internal references are fixed.
--> 205     return transform.expand(input)
    206 
    207   def is_fnapi_compatible(self):

~/.pyenv/virtualenvs/cti-ml-xgboost/3.9.15/lib/python3.9/site-packages/apache_beam/transforms/ptransform.py in expand(self, pcoll)
    994       # Might not be a function.
    995       pass
--> 996     return self._fn(pcoll, *args, **kwargs)
    997 
    998   def default_label(self):

~/.pyenv/virtualenvs/cti-ml-xgboost/3.9.15/lib/python3.9/site-packages/tfx/extensions/google_cloud_big_query/example_gen/executor.py in _BigQueryToExample(pipeline, exec_properties, split_pattern)
     67   converter = _BigQueryConverter(split_pattern, project)
     68   big_query_custom_config = (
---> 69       json.loads(exec_properties['custom_config'])
     70       if 'custom_config' in exec_properties
     71       else None

~/.pyenv/versions/3.9.15/lib/python3.9/json/__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    337     else:
    338         if not isinstance(s, (bytes, bytearray)):
--> 339             raise TypeError(f'the JSON object must be str, bytes or bytearray, '
    340                             f'not {s.__class__.__name__}')
    341         s = s.decode(detect_encoding(s), 'surrogatepass')

TypeError: the JSON object must be str, bytes or bytearray, not NoneType
@singhniraj08
Copy link
Contributor

The BigQueryExampleGen executor code is written to accept custom_config in below format:
custom_config = json.dumps({'query_key': 'query_value'})

But, when we don't pass custom_config to BigQueryExampleGen, it fails with error: TypeError: the JSON object must be str, bytes or bytearray, not NoneType.

@colinjc, Thank you for bringing this to our attention.

@roseayeon , Loading BigQuery custom config fails when we don't pass custom_config param to BigQueryExampleGen or set custom_config=None or custom_config={}. Is this expected? Thanks

Ref: Screenshot
image

@lego0901
Copy link
Member

Oh, this is not an intended behavior. Let me fix it shortly.

@vamshikesireddy
Copy link

I am facing the same error while using BigQueryExampleGen. Any update on the fix?

@EagleDangar
Copy link

Facing the same error. Any update on the fix ?

@Saoussen-CH
Copy link

Saoussen-CH commented Jan 22, 2024

Hey, still not fixed yet ? @lego0901

Adding custom_config=json.dumps({}) works as a temporary fix

copybara-service bot pushed a commit that referenced this issue Jan 24, 2024
copybara-service bot pushed a commit that referenced this issue Jan 24, 2024
copybara-service bot pushed a commit that referenced this issue Feb 1, 2024
@lego0901
Copy link
Member

lego0901 commented Feb 1, 2024

This is fixed. Thanks for your all comments!

@lego0901 lego0901 closed this as completed Feb 1, 2024
copybara-service bot pushed a commit that referenced this issue Feb 1, 2024
Copy link
Contributor

github-actions bot commented Feb 1, 2024

Are you satisfied with the resolution of your issue?
Yes
No

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants