fix JSON bug with data reading #691
@@ -1,6 +1,6 @@
"""Contains class for saving and loading spreadsheet data."""
from io import BytesIO, StringIO
-from typing import Any, Dict, List, Optional, Union, cast
+from typing import Any, Dict, List, Optional, Union
In the other function, the proper return type for a boolean that checks the argument's type is a `TypeGuard`.
No need to `cast` now.
@@ -92,18 +92,12 @@ def is_match(

# get current position of stream
if data_utils.is_stream_buffer(file_path):
-file_path = cast(
Static typing only, not a functional change; removed because of the `TypeGuard`.
starting_location = file_path.tell()

is_valid_avro = fastavro.is_avro(file_path)

# return to original position in stream
if data_utils.is_stream_buffer(file_path):
-file_path = cast(
Static typing only, not a functional change; removed because of the `TypeGuard`.
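The two hunks above save the stream position with `tell()` before `fastavro.is_avro` reads from the stream, then seek back afterwards. A minimal standard-library sketch of that peek-and-restore pattern; `peek_magic` is a hypothetical helper, and comparing against Avro's 4-byte container header is an assumed stand-in for `fastavro.is_avro`, not the project's code.

```python
from io import BytesIO
from typing import IO


def peek_magic(stream: IO[bytes], magic: bytes = b"Obj\x01") -> bool:
    """Hypothetical helper: compare the stream's next bytes to `magic`.

    The default is the Avro container header, used here as an assumed
    stand-in for fastavro.is_avro. The stream position is left unchanged.
    """
    starting_location = stream.tell()  # get current position of stream
    try:
        return stream.read(len(magic)) == magic
    finally:
        # return to original position in stream, even if read() raised
        stream.seek(starting_location)


buf = BytesIO(b"Obj\x01rest-of-file")
print(peek_magic(buf))  # True
print(buf.tell())  # 0: the position is unchanged
```

Using `try`/`finally` keeps the restore step from being skipped if the check raises, which matters when the caller reuses the same buffer for the next reader.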
-def is_stream_buffer(filepath_or_buffer: Any) -> bool:
+def is_stream_buffer(filepath_or_buffer: Any) -> TypeGuard[Union[StringIO, BytesIO]]:
Fix to use `TypeGuard`.
-open_method: str ="r",
-encoding: Optional[str]=None,
-seek_offset: Optional[int]=None,
-seek_whence: int=0,
+open_method: str = "r",
+encoding: Optional[str] = None,
+seek_offset: Optional[int] = None,
+seek_whence: int = 0,
formatting
-self.original_type: Union[Type[str], Type[StringIO], Type[BytesIO], Type[IO]] = type(filepath_or_buffer)
+self.original_type: Union[
+Type[str], Type[StringIO], Type[BytesIO], Type[IO]
+] = type(filepath_or_buffer)
format
-self._filepath_or_buffer = cast(TextIOWrapper, self._filepath_or_buffer) # guaranteed by self._is_wrapped
+self._filepath_or_buffer = cast(
+TextIOWrapper, self._filepath_or_buffer
+) # guaranteed by self._is_wrapped
format
-self._filepath_or_buffer = cast(IO, self._filepath_or_buffer) # can't be str due to conversion in __enter__
+self._filepath_or_buffer = cast(
+IO, self._filepath_or_buffer
+) # can't be str due to conversion in __enter__
format
-_data = _data.to_dict(orient="records", into=OrderedDict)
-for i, sample in enumerate(_data):
-_data[i] = json.dumps(
+data = self._get_data_as_df(data)
No longer keeps two variables for the data (`data` and `_data`), which also fixes the types accepted by the signature at the top.
"""
Extract the data as a json format.

:param data: raw data
:type data: list
:return: dataframe in json format
"""
-_data: Union[pd.DataFrame, List]
No longer keeps two variables for the data (`data` and `_data`), which also fixes the types accepted by the signature at the top.
@@ -349,6 +349,8 @@ def _convert_flat_to_nested_cols(cls, dic: Dict, separator: str = ".") -> Dict:
:return:
"""
for key in list(dic.keys()):
+if not isinstance(key, str):
Fix to allow the `[1]` format when reading JSON.
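Rows such as `[1]` can produce non-string keys like the `1` in the updated test dict, and string operations on an `int` key would fail. A minimal sketch of flat-to-nested conversion with that guard, written as a standalone function rather than the project's classmethod; `convert_flat_to_nested` is hypothetical, and leaving non-string keys untouched is an assumption, since only the `isinstance` check is visible in the diff.

```python
from typing import Any, Dict


def convert_flat_to_nested(dic: Dict[Any, Any], separator: str = ".") -> Dict[Any, Any]:
    """Hypothetical sketch: turn {"a.b": 1} into {"a": {"b": 1}} in place."""
    for key in list(dic.keys()):  # snapshot keys, since dic is mutated below
        if not isinstance(key, str):
            continue  # assumed behavior of the new guard: keep non-str keys as-is
        if separator not in key:
            continue  # already a top-level key
        value = dic.pop(key)
        *parents, leaf = key.split(separator)
        node: Dict[Any, Any] = dic
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate dicts on demand
        node[leaf] = value
    return dic


print(convert_flat_to_nested({"a.b": "ab", "a.d.f": "adf", "b": "b", 1: 3}))
# {'b': 'b', 1: 3, 'a': {'b': 'ab', 'd': {'f': 'adf'}}}
```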
@@ -392,14 +394,16 @@ def is_match(
return True
except (json.JSONDecodeError, UnicodeDecodeError):
data_file.seek(0)

+json_identifier_re = re.compile(r"(:|\[)")
Allows both `:` and `[` as JSON identifiers, distinguishing JSON from a plain string.
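A sketch of how such a pattern can serve as a cheap pre-check before attempting to parse a line; `looks_like_json` is a hypothetical helper, and how `is_match` actually applies the regex is not shown in this diff.

```python
import re

# Pattern from the diff: a ':' (object key/value) or a '[' (array opener).
json_identifier_re = re.compile(r"(:|\[)")


def looks_like_json(line: str) -> bool:
    """Hypothetical helper: True if the line contains ':' or '['."""
    return json_identifier_re.search(line) is not None


print(looks_like_json('{"a": 1}'))  # True
print(looks_like_json("[1]"))  # True
print(looks_like_json("plain string"))  # False
```

Matching only `:` would reject a bare array line such as `[1]`, which has no key/value separator; adding `\[` covers that case.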
@@ -12,11 +12,12 @@

class TestNestedJSON(unittest.TestCase):
def test_flat_to_nested_json(self):
-dic = {"a.b": "ab", "a.c": "ac", "a.d.f": "adf", "b": "b"}
+dic = {"a.b": "ab", "a.c": "ac", "a.d.f": "adf", "b": "b", 1: 3}
Updates the test to check keys that aren't strings.
@@ -56,6 +57,11 @@ def setUpClass(cls):
encoding="utf-8",
count=14,
),
+dict(
Adds a test that includes the new JSON format.
@@ -0,0 +1,3 @@
+[1]
New data for testing JSON reading.
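The new file's first line is `[1]`, a JSON array rather than an object. A minimal standard-library sketch of line-separated JSON reading in which arrays and objects are both accepted; `read_json_lines` is hypothetical and does not reflect the project's `JSONData` internals.

```python
import json
from typing import Any, List


def read_json_lines(text: str) -> List[Any]:
    """Hypothetical sketch: decode one JSON value per non-blank line.

    A line may hold an object ({"a": 1}) or an array ([1]); both are
    complete JSON values, so json.loads accepts either.
    """
    rows = []
    for line in text.splitlines():
        if line.strip():  # skip blank lines, e.g. a trailing newline
            rows.append(json.loads(line))
    return rows


print(read_json_lines('[1]\n{"a": 2}\n'))  # [[1], {'a': 2}]
```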
Previously, we could not read line-separated JSON arrays.
Now we can read:
This becomes a `JSONData` that wraps pandas as: