Flaky tests #496
Comments
Windows & Python 3.7 Stack:

```
1: ================================== FAILURES ===================================
1: _____________________ test_dynamic_strings_with_all_nones _____________________
1:
1: lmdb_version_store = NativeVersionStore: Library: local.test_973_2023-06-27T15_10_01_939197, Primary Storage: lmdb_storage.
1:
1: def test_dynamic_strings_with_all_nones(lmdb_version_store):
1: df = pd.DataFrame({"x": [None, None]})
1: > lmdb_version_store.write("strings", df, dynamic_strings=True)
1:
1: tests\integration\arcticdb\version_store\test_basic_version_store.py:776:
1: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
1:
1: self = NativeVersionStore: Library: local.test_973_2023-06-27T15_10_01_939197, Primary Storage: lmdb_storage.
1: symbol = 'strings', data = x
1: 0 None
1: 1 None, metadata = None
1: prune_previous_version = False, pickle_on_failure = False
1: validate_index = False, kwargs = {'dynamic_strings': True}
1: proto_cfg = dynamic_strings: true
1: , dynamic_strings = True
1: recursive_normalizers = False, parallel = False, incomplete = False
1: coerce_columns = None, sparsify_floats = False
1:
1: def write(
1: self,
1: symbol: str,
1: data: Any,
1: metadata: Optional[Any] = None,
1: prune_previous_version: Optional[bool] = None,
1: pickle_on_failure: Optional[bool] = None,
1: validate_index: bool = False,
1: **kwargs,
1: ) -> Optional[VersionedItem]:
1: """
1: Write `data` to the specified `symbol`. If `symbol` already exists then a new version will be created to
1: reference the newly written data. For more information on versions see the documentation for the `read`
1: primitive.
1:
1: Pandas DataFrames, Pandas Series and Numpy NDArrays will be normalised into a common structure suitable for
1: storage. Data that cannot be normalised can be written by pickling the data, however pickled data
1: consumes more storage space, is less performant for reads and writes and does not support advanced query
1: features. Pickling is therefore only supported via the `pickle_on_failure` flag.
1:
1: Normalised data will be divided into segments that are deduplicated against storage prior to write. As a result,
1: if `data` contains only slight changes compared to pre-existing versions only the delta will be written.
1:
1: Note that `write` is not designed for multiple concurrent writers over a single symbol.
1:
1: Note: ArcticDB will use the 0-th level index of the Pandas DataFrame for its on-disk index.
1:
1: Any non-`DatetimeIndex` will converted into an internal `RowCount` index. That is, ArcticDB will assign each
1: row a monotonically increasing integer identifier and that will be used for the index.
1:
1: Parameters
1: ----------
1: symbol : `str`
1: Symbol name. Limited to 255 characters. The following characters are not supported in symbols:
1: "*", "&", "<", ">"
1: data : `Union[pd.DataFrame, pd.Series, np.array]`
1: Data to be written.
1: metadata : `Optional[Any]`, default=None
1: Optional metadata to persist along with the symbol.
1: prune_previous_version : `bool`, default=True
1: Removes previous (non-snapshotted) versions from the database.
1: pickle_on_failure: `bool`, default=False
1: Pickle `data` if it can't be normalized.
1: validate_index: bool, default=False
1: If True, will verify that the index of `data` supports date range searches and update operations. This in effect tests that the data is sorted in ascending order.
1: ArcticDB relies on Pandas to detect if data is sorted - you can call DataFrame.index.is_monotonic_increasing on your input DataFrame to see if Pandas believes the
1: data to be sorted
1: kwargs :
1: passed through to the write handler
1:
1: Returns
1: -------
1: Optional[VersionedItem]
1: Structure containing metadata and version number of the written symbol in the store.
1: The data attribute will not be populated.
1:
1: Raises
1: ------
1: UnsortedDataException
1: If data is unsorted, when validate_index is set to True.
1:
1: Examples
1: --------
1:
1: >>> df = pd.DataFrame({'column': [5,6,7]})
1: >>> lib.write("symbol", df, metadata={'my_dictionary': 'is_great'})
1: >>> lib.read("symbol").data
1: column
1: 0 5
1: 1 6
1: 2 7
1: """
1: self.check_symbol_validity(symbol)
1: proto_cfg = self._lib_cfg.lib_desc.version.write_options
1:
1: dynamic_strings = self._resolve_dynamic_strings(kwargs)
1:
1: pickle_on_failure = self.resolve_defaults(
1: "pickle_on_failure", proto_cfg, global_default=False, existing_value=pickle_on_failure, **kwargs
1: )
1: prune_previous_version = self.resolve_defaults(
1: "prune_previous_version", proto_cfg, global_default=False, existing_value=prune_previous_version, **kwargs
1: )
1: recursive_normalizers = self.resolve_defaults(
1: "recursive_normalizers", proto_cfg, global_default=False, uppercase=False, **kwargs
1: )
1: parallel = self.resolve_defaults("parallel", proto_cfg, global_default=False, uppercase=False, **kwargs)
1: incomplete = self.resolve_defaults("incomplete", proto_cfg, global_default=False, uppercase=False, **kwargs)
1:
1: # TODO remove me when dynamic strings is the default everywhere
1: if parallel:
1: dynamic_strings = True
1:
1: coerce_columns = kwargs.get("coerce_columns", None)
1: sparsify_floats = kwargs.get("sparsify_floats", False)
1:
1: _handle_categorical_columns(symbol, data, False)
1:
1: log.debug(
1: "Writing with pickle_on_failure={}, prune_previous_version={}, recursive_normalizers={}",
1: pickle_on_failure,
1: prune_previous_version,
1: recursive_normalizers,
1: )
1:
1: # Do a multi_key write if the structured is nested and is not trivially normalizable via msgpack.
1: if recursive_normalizers:
1: vit = self.try_flatten_and_write_composite_object(
1: symbol, data, metadata, pickle_on_failure, dynamic_strings
1: )
1: if isinstance(vit, VersionedItem):
1: return vit
1:
1: udm, item, norm_meta = self._try_normalize(
1: symbol, data, metadata, pickle_on_failure, dynamic_strings, coerce_columns
1: )
1: # TODO: allow_sparse for write_parallel / recursive normalizers as well.
1: if isinstance(item, NPDDataFrame):
1: if parallel:
1: self.version_store.write_parallel(symbol, item, norm_meta, udm)
1: return None
1: elif incomplete:
1: self.version_store.append_incomplete(symbol, item, norm_meta, udm)
1: return None
1: else:
1: vit = self.version_store.write_versioned_dataframe(
1: > symbol, item, norm_meta, udm, prune_previous_version, sparsify_floats, validate_index
1: )
1: E arcticdb_ext.exceptions.InternalException: (mdb_dbi_open: MDB_INCOMPATIBLE: Operation and DB incompatible, or DB flags changed)
```

Looking at the errors for this one, my recent LMDB fixes, which are in PR, might help resolve it.
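For quick local reproduction, here is a minimal sketch of the failing test as it appears in the trace. The `lmdb_version_store` fixture is the one used throughout the test suite; the read-back check is an illustrative assumption based on the quoted `write`/`read` docs, not necessarily the exact test body.

```python
import pandas as pd

def test_dynamic_strings_with_all_nones(lmdb_version_store):
    # Write an all-None object column with dynamic strings enabled - this is the
    # write() call that intermittently fails on Windows with MDB_INCOMPATIBLE.
    df = pd.DataFrame({"x": [None, None]})
    lmdb_version_store.write("strings", df, dynamic_strings=True)

    # Illustrative read-back check (assumption): every value should come back missing.
    result = lmdb_version_store.read("strings").data
    assert result["x"].isna().all()
```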
@qc00 To summarise, after what I'm going to call round 1 of the test fixing, we'll be left with:
Is that about right?
There's also
The cause was that `type_arithmetic_promoted_type` would return int64 as the common type for uint64 and any signed int. When we then do the `static_cast<WideType>(*ptr++)` before calling the `Is[Not]InOperator`, the uint64 is converted to int64 and the special overloads in the operators are never used.

Additional changes:
+ Ability to `only_test_encoding_version_v1` in a test/class/module
+ Attempt to fix another flaky test `test_read_ts`
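To illustrate the failure mode in Python/numpy terms (the actual bug and fix live in the C++ type-promotion and operator code, so this is only an analogy): a uint64 value above `2**63 - 1` wraps to a negative number once it is cast to int64, so a membership check against the original unsigned value stops matching.

```python
import numpy as np

big = np.uint64(2**63 + 5)          # cannot be represented in int64
wrapped = big.astype(np.int64)      # wraps to a negative value
print(int(wrapped))                 # -9223372036854775803

values_to_match = {int(big)}        # the "isin" / "isnotin" membership set
print(int(big) in values_to_match)      # True  - comparing the unsigned value
print(int(wrapped) in values_to_match)  # False - comparing after the bad cast
```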
PR #1087 is adding fixes for:
It is also adding xfails for:
We will continue to monitor the following and will not xfail them:
#### Reference Issues/PRs
Fix for issue #496
#### What does this implement or fix?
Change `tmpdir` to `tmp_path` because, according to the pytest docs, `tmpdir` is deprecated and `tmp_path` is the recommended way to create temporary paths that are safe to use in multi-process setups such as pytest-split and pytest-xdist.
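A rough before/after illustration of the change (the test names and data are hypothetical, not the fixtures actually touched by the PR):

```python
import pandas as pd

def test_roundtrip_old(tmpdir):
    # `tmpdir` is the legacy py.path.local fixture that pytest now discourages.
    path = tmpdir.join("data.csv")
    pd.DataFrame({"x": [1, 2, 3]}).to_csv(str(path), index=False)
    assert len(pd.read_csv(str(path))) == 3

def test_roundtrip_new(tmp_path):
    # `tmp_path` is a pathlib.Path; each test (and each xdist worker) gets its
    # own unique directory, which avoids collisions in parallel runs.
    path = tmp_path / "data.csv"
    pd.DataFrame({"x": [1, 2, 3]}).to_csv(path, index=False)
    assert len(pd.read_csv(path)) == 3
```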
#### Reference Issues/PRs
Part of issue #496
#### What does this implement or fix?
It looks like some unneeded xfails have crept into master, probably due to a bad merge on my part. This PR removes them.
```
> ???
E AssertionError: DataFrame.iloc[:, 0] (column name="a") are different
E
E DataFrame.iloc[:, 0] (column name="a") values are different (100.0 %)
E [index]: [0]
E [left]: [inf]
E [right]: [nan]
E At positional index 0, first diff: inf != nan
E Falsifying example: test_hypothesis_mean_agg(
E     lmdb_version_store=NativeVersionStore: Library: local.test.553_2023-12-18T18_04_05_970933_v2, Primary Storage: lmdb_storage.,
E     df=
E        grouping_column              a
E     0                0  1.586038e+307
E     1                0  1.797693e+308
E     ,
E )
E
E You can reproduce this example by temporarily adding @reproduce_failure('6.72.4', b'AAICAAIAAAC/xpX//////QIAAgAAAL////////6jAAEAAA==') as a decorator on your test case
```
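Two notes on the output above. The `inf` looks like the classic symptom of a naive sum overflowing float64 during a mean aggregation, which is the motivation for the aggregation-accuracy issue referenced later in this thread, and `@reproduce_failure` is Hypothesis's standard mechanism for replaying the exact falsifying example. Below is a rough numpy sketch of the overflow, not the ArcticDB aggregation code.

```python
import numpy as np
# from hypothesis import reproduce_failure  # standard Hypothesis API for replaying a failing example

a = np.float64(1.586038e307)
b = np.float64(1.797693e308)        # close to np.finfo(np.float64).max

naive_mean = (a + b) / 2            # a + b exceeds the float64 maximum, so the sum is inf
print(naive_mean)                   # inf

safer_mean = a / 2 + b / 2          # scaling first keeps the intermediate finite
print(safer_mean)                   # ~9.8e307

# To replay the exact falsifying example locally, temporarily decorate the test, e.g.:
# @reproduce_failure('6.72.4', b'AAICAAIAAAC/xpX//////QIAAgAAAL////////6jAAEAAA==')
# def test_hypothesis_mean_agg(lmdb_version_store, df):
#     ...
```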
It looks like this build was using a version of the test that doesn't contain the fix for exactly this issue.
Closing this epic (which is hard to track) in favour of the label |
Tracking ticket for flaky tests

- `test_restore_version` - frequent `DuplicateKeyException` in Conda build #469
- `test_engine.py::test_partial_write_hashed` #316
  -> Through debugging, the underlying issue is LMDB map size is too small on Windows #229
- -> Not trivially fixable/skippable. See Improve aggregation numerical accuracy #603
- `test_read_ts`
  -> Fix attempt bundled with Fix flaky test_filter_numeric_isnotin_unsigned (#496) #604
- `test_find_version`
  -> Looks similar in nature to `test_read_ts` above
- `test_append_with_cont_mem_problem`
  -> below: Flaky tests for defragmentation API #985
- `test_append_with_defragmentation`
  -> below: Flaky tests for defragmentation API #985
- `test_column_names_by_timestamp` (need to use `distinct_timestamps` on it; see the sketch after this list)
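On the last item, here is a loose sketch of the "use distinct timestamps" idea with Hypothesis. This is an illustration only, using a hypothetical strategy and test name, not the project's actual `distinct_timestamps` helper.

```python
import pandas as pd
from hypothesis import given, strategies as st

# Generate lists of *unique* timestamps so two writes can never share the same
# as-of time, which is a common source of flakiness in timestamp-keyed tests.
unique_timestamps = st.lists(
    st.integers(min_value=0, max_value=10**9),   # seconds since the epoch
    min_size=2,
    max_size=5,
    unique=True,
).map(lambda secs: [pd.Timestamp(s, unit="s") for s in sorted(secs)])

@given(unique_timestamps)
def test_timestamps_are_distinct(timestamps):
    assert len(set(timestamps)) == len(timestamps)
```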