[BEAM-8123] Add cloudpickle as optional library #15472
Merged
60 commits
| Commit | Message | Author |
|---|---|---|
| 39a964d | wrapped pickler so that pickler is chosen | ryanthompson591 |
| 24b110a | Merge branch 'apache:master' into pickle-prototype | ryanthompson591 |
| c1f8608 | changes to pickler | ryanthompson591 |
| 954a46f | Update unit tests and correctly add global setter | ryanthompson591 |
| 9947720 | updated cloudpickle_pickler_test | ryanthompson591 |
| dfa434a | added scope test | ryanthompson591 |
| c8ff50c | updated some comments | ryanthompson591 |
| a6475e5 | Merge branch 'apache:master' into pickle-prototype | ryanthompson591 |
| 54d1c1b | Allow absl library not to be present. | ryanthompson591 |
| 36556fc | allow multiple error types | ryanthompson591 |
| 5d7de36 | removed absl flags import | ryanthompson591 |
| 9d5587b | add cloudpickle dependency | ryanthompson591 |
| 7c92db4 | Merge branch 'apache:master' into pickle-prototype | ryanthompson591 |
| 2d91d60 | applied valentyn comments. remove dill dependencies | ryanthompson591 |
| eaa17ab | linted file | ryanthompson591 |
| 9eed711 | change cloudpickle max requirement | ryanthompson591 |
| 62c7d0d | fixed pickle lib typo | ryanthompson591 |
| c5a9df0 | fixed pickle lib typo 2 | ryanthompson591 |
| 771235b | sets the default pickler | ryanthompson591 |
| 64b74f4 | revert last change | ryanthompson591 |
| 28abd0e | Merge branch 'apache:master' into pickle-prototype | ryanthompson591 |
| 4344529 | fixed pickle lib typo | ryanthompson591 |
| 6381aab | Merge branch 'apache:master' into pickle-prototype | ryanthompson591 |
| d842061 | fixed typo | ryanthompson591 |
| b8d249e | revert wordcount | ryanthompson591 |
| ea2cd96 | added pipeline options | ryanthompson591 |
| 87f39f0 | added pickler setting to worker | ryanthompson591 |
| 7339c29 | added setup options | ryanthompson591 |
| 9611520 | removed base_image_requirements to merge with master branch | ryanthompson591 |
| 2eea347 | Upgraded arguement names and simplify pickle changing interface in pi… | ryanthompson591 |
| 7cb995f | updated function name and param name in sdk_worker_main | ryanthompson591 |
| fcdb9c5 | Merge branch 'apache:master' into pickle-prototype | ryanthompson591 |
| 56304fa | Merge branch 'apache:master' into pickle-prototype | ryanthompson591 |
| a2012cb | updated base_image_requirements | ryanthompson591 |
| a9a9784 | Merge branch 'apache:master' into pickle-prototype | ryanthompson591 |
| b078766 | linted to remove space in default arg' | ryanthompson591 |
| db0c27e | Added cloudpickle to requirements | ryanthompson591 |
| adb7ae4 | linted | ryanthompson591 |
| 3400ef4 | change dill reference in coders | ryanthompson591 |
| 8cde88c | Merge branch 'apache:master' into pickle-prototype | ryanthompson591 |
| e848b7a | only import lock if it can be imported otherwise ignore | ryanthompson591 |
| e57e987 | Merge branch 'apache:master' into pickle-prototype | ryanthompson591 |
| dd56672 | linted tests removed unused variables changed line size | ryanthompson591 |
| 7fb9f53 | moved imports, reverted file that wasnt changed | ryanthompson591 |
| 9f08887 | changed import order | ryanthompson591 |
| 032c631 | trying a small fix | ryanthompson591 |
| 92e3072 | removed rlock pickling | ryanthompson591 |
| 1c66fc3 | Merge branch 'apache:master' into pickle-prototype | ryanthompson591 |
| 94a9c96 | added to change file | ryanthompson591 |
| 66c8069 | Merge branch 'apache:master' into pickle-prototype | ryanthompson591 |
| 4ec71c3 | Minor wording suggestion. | tvalentyn |
| f42051f | Merge branch 'apache:master' into pickle-prototype | ryanthompson591 |
| 915c5c6 | Merge branch 'apache:master' into pickle-prototype | ryanthompson591 |
| 165c8d0 | merged | ryanthompson591 |
| f7b2682 | Update CHANGES.md | ryanthompson591 |
| 31aba48 | merged change.md changes | ryanthompson591 |
| f20ce43 | merged | ryanthompson591 |
| 4f617b9 | linted again | ryanthompson591 |
| 4e0a330 | removed file that is also removed in head, not sure why git keeps bri… | ryanthompson591 |
| 232ecac | removed changes that shouldnt be relevant to this pr | ryanthompson591 |
111 changes: 111 additions & 0 deletions
sdks/python/apache_beam/internal/cloudpickle_pickler.py
@@ -0,0 +1,111 @@

    #
    # Licensed to the Apache Software Foundation (ASF) under one or more
    # contributor license agreements. See the NOTICE file distributed with
    # this work for additional information regarding copyright ownership.
    # The ASF licenses this file to You under the Apache License, Version 2.0
    # (the "License"); you may not use this file except in compliance with
    # the License. You may obtain a copy of the License at
    #
    #    http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    #

    """Pickler for values, functions, and classes.

    For internal use only. No backwards compatibility guarantees.

    Uses the cloudpickle library to pickle data, functions, lambdas
    and classes.

    dump_session and load_session are no-ops.
    """

    # pytype: skip-file

    import base64
    import bz2
    import io
    import threading
    import zlib

    import cloudpickle

    try:
      from absl import flags
    except (ImportError, ModuleNotFoundError):
      pass

    # Pickling, especially unpickling, causes broken module imports on Python 3
    # if executed concurrently, see: BEAM-8651, http://bugs.python.org/issue38884.
    _pickle_lock = threading.RLock()


    def dumps(o, enable_trace=True, use_zlib=False):
      # type: (...) -> bytes

      """For internal use only; no backwards-compatibility guarantees."""
      with _pickle_lock:
        with io.BytesIO() as file:
          pickler = cloudpickle.CloudPickler(file)
          try:
            pickler.dispatch_table[type(flags.FLAGS)] = _pickle_absl_flags
          except NameError:
            pass
          pickler.dump(o)
          s = file.getvalue()

      # Compress as compactly as possible (compresslevel=9) to decrease peak memory
      # usage (of multiple in-memory copies) and to avoid hitting protocol buffer
      # limits.
      # WARNING: Be cautious about compressor change since it can lead to pipeline
      # representation change, and can break streaming job update compatibility on
      # runners such as Dataflow.
      if use_zlib:
        c = zlib.compress(s, 9)
      else:
        c = bz2.compress(s, compresslevel=9)
      del s  # Free up some possibly large and no-longer-needed memory.

      return base64.b64encode(c)


    def loads(encoded, enable_trace=True, use_zlib=False):
      """For internal use only; no backwards-compatibility guarantees."""

      c = base64.b64decode(encoded)

      if use_zlib:
        s = zlib.decompress(c)
      else:
        s = bz2.decompress(c)

      del c  # Free up some possibly large and no-longer-needed memory.

      with _pickle_lock:
        unpickled = cloudpickle.loads(s)
        return unpickled


    def _pickle_absl_flags(obj):
      return _create_absl_flags, tuple([])


    def _create_absl_flags():
      return flags.FLAGS


    def dump_session(file_path):
      # It is possible to dump session with cloudpickle. However, since references
      # are saved it should not be necessary. See https://s.apache.org/beam-picklers
      pass


    def load_session(file_path):
      # It is possible to load_session with cloudpickle. However, since references
      # are saved it should not be necessary. See https://s.apache.org/beam-picklers
      pass
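The `dumps`/`loads` pair in this file layers three steps: pickle the object, compress the bytes (bz2 by default, zlib when `use_zlib=True`), then base64-encode the result. The following is a minimal sketch of that framing, not Beam's implementation. The names `encode`, `decode`, and `pickle_impl` are hypothetical, and the stdlib `pickle` module is used as a stand-in (an assumption made so the sketch runs without extra dependencies) when cloudpickle is not installed.

```python
import base64
import bz2
import pickle
import threading
import zlib

try:
  import cloudpickle as pickle_impl  # third-party; what the PR itself uses
except ImportError:
  pickle_impl = pickle  # stdlib stand-in (assumption); cannot pickle lambdas

# Mirrors the module-level RLock in the diff: (un)pickling is serialized.
_lock = threading.RLock()


def encode(obj, use_zlib=False):
  """pickle -> compress -> base64, the same order as dumps() in the diff."""
  with _lock:
    raw = pickle_impl.dumps(obj)
  if use_zlib:
    packed = zlib.compress(raw, 9)
  else:
    packed = bz2.compress(raw, compresslevel=9)
  return base64.b64encode(packed)


def decode(encoded, use_zlib=False):
  """base64 -> decompress -> unpickle, the reverse of encode()."""
  packed = base64.b64decode(encoded)
  raw = zlib.decompress(packed) if use_zlib else bz2.decompress(packed)
  with _lock:
    return pickle_impl.loads(raw)


value = {'step': 'ParDo', 'shards': [1, 2, 3]}
roundtrip_bz2 = decode(encode(value))
roundtrip_zlib = decode(encode(value, use_zlib=True), use_zlib=True)

# The payload does not record which compressor produced it, so a mismatched
# flag fails during decompression instead of returning wrong data.
try:
  decode(encode(value, use_zlib=True))  # zlib payload, bz2 decoder
  mismatch_failed = False
except (OSError, ValueError):
  mismatch_failed = True
```

Because the compressor choice is carried only by the `use_zlib` argument, both sides have to agree on it out of band, which is consistent with the warning in the diff about compressor changes affecting update compatibility.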
112 changes: 112 additions & 0 deletions
sdks/python/apache_beam/internal/cloudpickle_pickler_test.py
@@ -0,0 +1,112 @@

    #
    # Licensed to the Apache Software Foundation (ASF) under one or more
    # contributor license agreements. See the NOTICE file distributed with
    # this work for additional information regarding copyright ownership.
    # The ASF licenses this file to You under the Apache License, Version 2.0
    # (the "License"); you may not use this file except in compliance with
    # the License. You may obtain a copy of the License at
    #
    #    http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    #

    """Unit tests for the cloudpickle_pickler module."""

    # pytype: skip-file

    import sys
    import types
    import unittest

    from apache_beam.internal import module_test
    from apache_beam.internal.cloudpickle_pickler import dumps
    from apache_beam.internal.cloudpickle_pickler import loads


    class PicklerTest(unittest.TestCase):

      NO_MAPPINGPROXYTYPE = not hasattr(types, "MappingProxyType")

      def test_basics(self):
        self.assertEqual([1, 'a', (u'z', )], loads(dumps([1, 'a', (u'z', )])))
        fun = lambda x: 'xyz-%s' % x
        self.assertEqual('xyz-abc', loads(dumps(fun))('abc'))

      def test_lambda_with_globals(self):
        """Tests that the globals of a function are preserved."""

        # The point of the test is that the lambda being called after unpickling
        # relies on having the re module being loaded.
        self.assertEqual(['abc', 'def'],
                         loads(dumps(
                             module_test.get_lambda_with_globals()))('abc def'))

      def test_lambda_with_main_globals(self):
        self.assertEqual(unittest, loads(dumps(lambda: unittest))())

      def test_lambda_with_closure(self):
        """Tests that the closure of a function is preserved."""
        self.assertEqual(
            'closure: abc',
            loads(dumps(module_test.get_lambda_with_closure('abc')))())

      def test_class_object_pickled(self):
        self.assertEqual(['abc', 'def'],
                         loads(dumps(module_test.Xyz))().foo('abc def'))

      def test_class_instance_pickled(self):
        self.assertEqual(['abc', 'def'],
                         loads(dumps(module_test.XYZ_OBJECT)).foo('abc def'))

      def test_pickling_preserves_closure_of_a_function(self):
        self.assertEqual(
            'X:abc', loads(dumps(module_test.TopClass.NestedClass('abc'))).datum)
        self.assertEqual(
            'Y:abc',
            loads(dumps(module_test.TopClass.MiddleClass.NestedClass('abc'))).datum)

      def test_pickle_dynamic_class(self):
        self.assertEqual(
            'Z:abc', loads(dumps(module_test.create_class('abc'))).get())

      def test_generators(self):
        with self.assertRaises(TypeError):
          dumps((_ for _ in range(10)))

      def test_recursive_class(self):
        self.assertEqual(
            'RecursiveClass:abc',
            loads(dumps(module_test.RecursiveClass('abc').datum)))

      def test_function_with_external_reference(self):
        out_of_scope_var = 'expected_value'

        def foo():
          return out_of_scope_var

        self.assertEqual('expected_value', loads(dumps(foo))())

      @unittest.skipIf(NO_MAPPINGPROXYTYPE, 'test if MappingProxyType introduced')
      def test_dump_and_load_mapping_proxy(self):
        self.assertEqual(
            'def', loads(dumps(types.MappingProxyType({'abc': 'def'})))['abc'])
        self.assertEqual(
            types.MappingProxyType, type(loads(dumps(types.MappingProxyType({})))))

      # pylint: disable=exec-used
      @unittest.skipIf(sys.version_info < (3, 7), 'Python 3.7 or above only')
      def test_dataclass(self):
        exec(
            '''
    from apache_beam.internal.module_test import DataClass
    self.assertEqual(DataClass(datum='abc'), loads(dumps(DataClass(datum='abc'))))
        ''')


    if __name__ == '__main__':
      unittest.main()

tvalentyn marked a conversation on the `@unittest.skipIf(NO_MAPPINGPROXYTYPE, ...)` line as resolved.
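Several of the tests in this file (lambdas, closures, `MappingProxyType`) exercise objects that the standard-library `pickle` module rejects, which is the gap the cloudpickle-based pickler is meant to cover. The sketch below runs the same kinds of objects through stdlib `pickle` only, as a baseline for comparison. It is illustrative and not part of the PR; `pickle_outcome` and `make_closure` are hypothetical helpers, and the exact exception types for lambdas and closures can differ between Python versions, so the helper only reports the name of whatever was raised.

```python
import pickle
import types


def pickle_outcome(obj):
  """Returns 'ok', or the name of the exception stdlib pickle raised."""
  try:
    pickle.loads(pickle.dumps(obj))
    return 'ok'
  except Exception as exc:  # broad on purpose: exact types vary by version
    return type(exc).__name__


def make_closure(prefix):
  # Same shape as the closure-returning helper the tests call in module_test.
  return lambda: 'closure: %s' % prefix


outcomes = {
    'plain data': pickle_outcome([1, 'a', ('z', )]),
    'lambda': pickle_outcome(lambda x: 'xyz-%s' % x),
    'closure': pickle_outcome(make_closure('abc')),
    'mappingproxy': pickle_outcome(types.MappingProxyType({'abc': 'def'})),
    'generator': pickle_outcome(_ for _ in range(10)),
}

for name, outcome in outcomes.items():
  print('%-12s %s' % (name, outcome))
```

Generators are the one case the PR's tests also expect to fail under cloudpickle: `test_generators` asserts that `dumps` raises `TypeError` for a generator expression.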