-
Notifications
You must be signed in to change notification settings - Fork 655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TEST-#6830: Use local s3 server instead of public s3 buckets #6863
Conversation
…kets Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
@@ -44,6 +44,8 @@ def test_line_endings(): | |||
if any(i in subdir for i in [".git", ".idea", "__pycache__"]): | |||
continue | |||
for file in files: | |||
if file.endswith(".parquet"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In binary format there is no need to check these characters (\r\n
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sequence of characters may have a different meaning for binary formats. If, for example, you try to replace \r\n
with \n
, the arrow will give a message that parquet file is corrupted.
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
path, | ||
columns, | ||
use_threads=False, | ||
storage_options=kwargs["storage_options"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use of storage_options
parameter was missed
@@ -47,7 +47,9 @@ def _read(cls, path_or_buf, **kwargs): | |||
path_or_buf = stringify_path(path_or_buf) | |||
path_or_buf = cls.get_path_or_buffer(path_or_buf) | |||
if isinstance(path_or_buf, str): | |||
if not cls.file_exists(path_or_buf): | |||
if not cls.file_exists( | |||
path_or_buf, storage_options=kwargs.get("storage_options") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use of storage_options
parameter was missed
with OpenFile( | ||
path_or_buf, | ||
"rb", | ||
**(kwargs.get("storage_options", None) or {}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use of storage_options
parameter was missed
path_or_buf, | ||
"rb", | ||
kwargs.get("compression", "infer"), | ||
**(kwargs.get("storage_options", None) or {}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use of storage_options
parameter was missed
@@ -68,7 +68,9 @@ def _read(cls, filepath_or_buffer, **kwargs): | |||
reason=cls._file_not_found_msg(filepath_or_buffer), | |||
**kwargs, | |||
) | |||
filepath_or_buffer = cls.get_path(filepath_or_buffer) | |||
filepath_or_buffer = cls.get_path( | |||
filepath_or_buffer, kwargs.get("storage_options") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use of storage_options
parameter was missed
if storage_options is not None: | ||
new_storage_options = dict(storage_options) | ||
new_storage_options.pop("anon", None) | ||
else: | ||
new_storage_options = {} | ||
|
||
fs, _ = fsspec.core.url_to_fs(file_path, **new_storage_options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made by analogy with other functions
@@ -212,17 +216,19 @@ def _pandas_read_csv_glob(path, storage_options): | |||
reason=f"{Engine.get()} does not have experimental API", | |||
) | |||
@pytest.mark.parametrize( | |||
"storage_options", | |||
[{"anon": False}, {"anon": True}, {"key": "123", "secret": "123"}, None], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't use storage_options=None
(because of our S3 server), but the test coverage hasn't decreased, so I guess there's no problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we want to add a test case for storage_options=None
in future? What are we supposed to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least, do not use our own s3 server, since in this case we always need to explicitly specify endpoint.
@YarShev @dchigarev ready for review |
What do these changes do?
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
added andpassingdocs/development/architecture.rst
is up-to-date