Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSV: correctly serialize booleans and dates. #3841

Merged
merged 6 commits into from
May 29, 2019
Merged

CSV: correctly serialize booleans and dates. #3841

merged 6 commits into from
May 29, 2019

Conversation

arikfr
Copy link
Member

@arikfr arikfr commented May 28, 2019

What type of PR is this? (check all applicable)

  • Bug Fix

Description

When downloading CSV version of a query result, it will:

  1. Serialize booleans as true/false (instead of True/False).
  2. Serialize date/time using the organization format for date/time.

Related Tickets & Documents

Closes #3736, closes #2751.

@arikfr arikfr requested a review from rauchy May 28, 2019 13:10
@@ -322,41 +319,6 @@ def store_result(cls, org, data_source, query_hash, query, data, run_time, retri
def groups(self):
return self.data_source.groups

def make_csv_content(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used the opportunity to move this serialization logic from redash.models to redash.serializers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! and +1 for tests!

query_data = json_loads(query_result.data)

fieldnames = []
bool_columns = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a tad too scripty for me. I'd prefer to have less details and noise here.
Perhaps this function could be split to several chunks?

  1. Calling a function to slice query_data['columns'] to three lists:
bool_columns, date_columns, datetime_columns = split_columns_by_type(query_data['columns'])
  1. Handling csv.DictWriter stuff and for each row:
    2a. Calling a function to format booleans
    3b. Calling a function to format dates
    4c. Calling a function to format date times

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've started with implementing 1 (5e0ccb9), but then when I tried to split it further I realized I will need to pass around the lists 🤢or convert this into a class 🤔. So ended up doing this instead: feda3c9.

I didn't do this initially as it felt too much, but it does clean the code and makes it easy to add support for new types later.


for col in query_data['columns']:
fieldnames.append(col['name'])
if col['type'] == TYPE_BOOLEAN:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These feel pretty mutually exclusive, so an elif might be in place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

]
}

def get_csv_content(factory):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not stuff this inside CsvSerializationTest and avoid the passing of factory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No good reason :) I just didn't realize at first it will need the factory, and then I moved it out it was easier to pass it the factory -> c6ff138.

@@ -322,41 +319,6 @@ def store_result(cls, org, data_source, query_hash, query, data, run_time, retri
def groups(self):
return self.data_source.groups

def make_csv_content(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! and +1 for tests!

@arikfr arikfr merged commit 9292ae8 into master May 29, 2019
@arikfr arikfr deleted the csv-fixes branch May 29, 2019 07:45
harveyrendell pushed a commit to pushpay/redash that referenced this pull request Nov 14, 2019
* CSV: correctly serialize booleans and dates.

Closes getredash#3736, closes getredash#2751.

* pep8 fixes

* Move column iteration to a helper function.

* Use elif, as types are mutually exclusive.

* Refactor parsing implementation.

* Move the csv generation fucntion
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exporting data should returns SQL types and not python types Datetime format on CSV is different.
2 participants