Excel & CSV query runner #2478

deecay · 2018-04-20T22:06:23Z

This PR adds Excel and CSV as possible data sources.

Datasource Name is the only configuration option for both data sources.
The query must be in YAML format.
Specify the path of the Excel/CSV file, either a local path or a URI, in the url parameter.
You may pass other parameters through to the pandas Excel/CSV parsing function.

Most of the code is taken from "conversion pandas.DataFrame to result and vice versa in python query (code attached)" (#2078); thanks to @dersphere.

Online Excel data file with parameters

Reference for parameters: CSV and Excel.

Same data without parameters (table is broken)

https://www.unicef.org/sowc2012/pdfs/U5MR-rank_FINAL.xls
TODO:

db-logo

deecay · 2018-09-03T07:50:02Z

Need db-logo for Excel...

arikfr

Thanks! Supporting both CSV and Excel is great.

My main concern with this implementation is the use of Pandas, as I always felt it's a "heavy" dependency that might introduce issues with setting up Redash for some users. Although I did look into it again and I'm no longer sure about this.

@jezdez do you happen to have any insight on Pandas requirements?

(and, @jezdez , of course it's another case for #2921 ;-))

deecay · 2018-11-20T09:39:34Z

@jezdez, any opinions? Maybe resort to requirements_excel_ds.txt?

jezdez · 2018-11-20T09:54:48Z

Yeah, installing Pandas and numpy is quite a lot for the purpose of reading and parsing Excel and CSV files alone. That's an additional ~23 MB (unpacked whl files on Linux for Python 2.7) for the Docker image and a higher maintenance burden given the high profile of Pandas and numpy for a relatively small, albeit useful feature.

Options forward:

1. find a different library to read and parse CSV/Excel files
2. ship the query runner as a separate package (Move some query runners to separate packages #2921) and document how users can install it on demand (my preference)
3. accept the extra dependency

A positive point is that both Pandas and numpy are available as precompiled whl files, so it's literally just a matter of downloading them.
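One way to keep a heavy dependency optional, sketched here under assumptions and not taken from the Redash code base, is to check whether the module can be imported and offer the runner only in that case. The helper name `dependency_available` and the `ENABLED` flag are hypothetical.

```python
import importlib.util


def dependency_available(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical flag: the runner would only be offered when pandas is installed.
ENABLED = dependency_available("pandas")
```

With a check like this, an installation without Pandas would simply not list the data source, which is roughly what "install it on demand" implies.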

denisov-vlad · 2018-11-20T10:06:09Z

Pandas is a must-have for the Python query runner.

Maybe it's heavy, but it does not require additional libraries that have to be installed via apt, unlike other data sources such as mssql.

jezdez · 2018-11-20T10:11:48Z

> Pandas is a must-have for the Python query runner.
>
> Maybe it's heavy, but it does not require additional libraries that have to be installed via apt, unlike other data sources such as mssql.

That may or may not be so, it's out of scope of this pull request review though.

deecay · 2018-11-20T11:39:02Z

Option 2 is acceptable for me too. I will wait until #2921 is done.

xlrd would be the candidate for Option 1, but this involves a decent amount of 'reinventing the wheel' to get the nice features that Pandas has (skiprows, usecols, etc.). Without these features, a "perfectly formatted" Excel file would be required, which is rare, as I have shown in the examples.
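To give a sense of what "reinventing the wheel" would mean, here is a minimal sketch that reimplements skiprows, skipfooter, usecols and names by hand on top of the standard csv module. The function `read_rows` is hypothetical, its semantics are simplified compared to Pandas (for example, blank lines count as rows), and the sample values are made up.

```python
import csv
import io


def read_rows(text, skiprows=0, skipfooter=0, usecols=None, names=None):
    """Parse CSV text into a list of dicts, trimming rows and columns by hand."""
    rows = list(csv.reader(io.StringIO(text)))
    # Drop leading and trailing rows (titles, notes, totals).
    end = len(rows) - skipfooter
    rows = rows[skiprows:end]
    # Keep only the requested column positions.
    if usecols is not None:
        rows = [[row[i] for i in usecols] for row in rows]
    # Without explicit names, treat the first remaining row as the header.
    if names is None:
        names, rows = rows[0], rows[1:]
    return [dict(zip(names, row)) for row in rows]


# Made-up sample: a title line, a blank line, two data rows, one footer line.
sample = "Report title\n\nCountry A,149,11,x\nCountry B,18,108,x\nSource: example\n"
result = read_rows(
    sample,
    skiprows=2,
    skipfooter=1,
    usecols=[0, 1, 2],
    names=["Country", "Mortality", "Rank"],
)
```

Even this small subset needs its own edge-case handling, and an Excel version would additionally have to deal with sheets, cell types and dates.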

arikfr · 2018-11-20T12:14:24Z

> Option 2 is acceptable for me too. I will wait until #2921 is done.

We don't have to extract all the query runners for you to be able to do this. But we will probably need to extract the query runners base classes and helper methods to their own package, so they can be used "externally". @jezdez , am I correct here?

> xlrd would be the candidate for Option 1, but this involves a decent amount of 'reinventing the wheel' to get the nice features that Pandas has (skiprows, usecols, etc.). Without these features, a "perfectly formatted" Excel file would be required, which is rare, as I have shown in the examples.

I missed the added functionality. That's actually nice :) I would suggest a different "query syntax" though, let's use YAML here, so the query becomes:

url: https://www.unicef.org/sowc2012/pdfs/U5MR-rank_FINAL.xls
names:
  - Country
  - Mortality
  - Rank
usecols: [0, 1, 2]
skiprows: 7
skipfooter: 2
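A query in this shape could be handled roughly as follows. This is a sketch under assumptions, not the code merged in this PR: `split_query` and `run_excel_query` are hypothetical names, and the only library calls relied on are `yaml.safe_load` from PyYAML and `pandas.read_excel`. The third-party imports are done lazily so the pure helper can be used without them.

```python
def split_query(params):
    """Separate the file location from the keyword arguments for the parser."""
    options = dict(params)  # copy so the caller's dict is left untouched
    url = options.pop("url")
    return url, options


def run_excel_query(query_text):
    """Parse a YAML query and load the sheet; needs PyYAML and pandas installed."""
    import yaml  # third-party, imported lazily
    import pandas as pd  # third-party, imported lazily

    url, options = split_query(yaml.safe_load(query_text))
    # Everything except `url` is forwarded as keyword arguments.
    return pd.read_excel(url, **options)
```

Because every key other than `url` is forwarded unchanged, any option accepted by the pandas reader (names, usecols, skiprows, skipfooter, and so on) becomes available in the query without extra code.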

deecay · 2018-11-20T15:28:31Z

@Arik, interesting idea. I'll take a look at yandex metrica query runner first, and see what I can do about yaml.

deecay · 2018-11-30T09:44:41Z

Done with the yaml part.

deecay · 2019-10-24T12:16:59Z

Is this okay to merge now (by pymapd), since we have both numpy and pandas in our standard install?

machzqcq · 2019-12-15T17:37:36Z

> Is this okay to merge now, since we have both numpy and pandas in our standard install?

Can someone confirm whether this is merged? Reading Excel files and pandas DataFrames as a data source will help a lot.

dersphere · 2019-12-15T18:45:55Z

Would be great to have my initial code somehow integrated (thanks @deecay) :)

xsvfat · 2020-02-24T07:08:44Z

Would be great to get this merged!

jezdez

I'm sorry this never went anywhere folks.

deecay · 2021-05-26T02:13:59Z

> I'm sorry this never went anywhere folks.

Hi @jezdez, what is blocking this? Or should I close this for some reason?

arikfr

Good news: we can merge this! @deecay can you please confirm this still works on latest master?

And sorry it took so long.

arikfr

Actually there are a few things missing:

Logos for each one of them.
Register them so they are loaded by default.
Add the local URLs filtering (let me know if you need a pointer for what I mean).

Thanks!

deecay · 2021-07-24T13:27:53Z

Logo for Excel added (the CSV logo was already there).
Updated the init file so both are enabled by default.

Could you point to the local urls filtering?

susodapop · 2021-07-26T16:58:01Z

> Could you point to the local urls filtering?

I think Arik is referring to this which was discussed when implementing the JSON data source.

deecay · 2021-07-27T03:07:41Z

redash/query_runner/excel.py

+            ua = args['user-agent']
+            args.pop('user-agent', None)
+
+            if is_private_address(path) and settings.ENFORCE_PRIVATE_ADDRESS_BLOCK:


@susodapop, I guess you're right. Done.
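For readers unfamiliar with this kind of filtering, the idea can be sketched as below. This is an illustration under assumptions, not Redash's `is_private_address`: the function name `points_to_private_address` is hypothetical, and treating a value without a host (a plain local path) as private is a choice made for this sketch only.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def points_to_private_address(url):
    """Return True if the URL's host resolves to a private or loopback address."""
    hostname = urlparse(url).hostname
    if hostname is None:
        # Assumption for this sketch: no host means a local path, treated as private.
        return True
    resolved = socket.gethostbyname(hostname)  # literal IPv4 addresses pass through
    return ipaddress.ip_address(resolved).is_private
```

A runner would call a check like this before fetching the file and refuse the query when it returns True and the blocking setting is enabled, so that users cannot use the data source to read from the server's internal network.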

arikfr · 2021-07-27T20:27:45Z

So it took over 3 years but it's merged now. Thank you for the continued contribution and immense patience 😅

dersphere · 2021-07-27T21:09:28Z

😃

ainsofs · 2021-09-28T04:01:56Z

CSV and Excel do not appear in my list of data sources. Is there something I need to do to make that happen? I am using the image redash/redash:8.0.0.b32245. I also tried the version 10 beta image, with the same result.

susodapop · 2021-09-28T13:57:21Z

@ainsofs You need to run the tip of master since this PR isn't merged into the V10 branch (yet). It will be before we release.

* Excel query runner
* Param handling for read_excel
* CSV query runner
* Fix wrong module name
* Use yaml as query language
* Use yaml as query language for CSV
* Added icon and required modules
* Local address filtering
* Fix syntax error

The following PR's were cherry-picked:

* Excel & CSV query runner (#2478)
* Pin python3 image version (#5570)
* Fix: Edit Source button disappeared for users without CanEdit perms (#5568)
* Fix: Specify the protobuf version (#5608)

Plus one additional change exclusive to this branch:

* Replace reference to yarn with NPM

This happened because we cherry-picked #5570 but did not also incorporate #5541 into V10.

Co-authored-by: deecay <deecay@users.noreply.github.com>
Co-authored-by: Levko Kravets <levko.ne@gmail.com>
Co-authored-by: zoomdot <gninggoon@gmail.com>

nixftw · 2021-10-18T09:35:17Z

@susodapop What do you mean by "run the tip of master"? I got Redash working, but CSV does not appear in the list.

nixftw · 2021-10-18T11:16:40Z

@ainsofs Did you manage to get it working? If so, please share the solution :)

commit 9c928bd
Author: Jesse Whitehouse <jesse@whitehouse.dev>
Date: Fri Oct 1 21:13:13 2021 -0500

    Bump version to 10.0.0

commit f312adf
Author: Jesse <jesse.whitehouse@databricks.com>
Date: Fri Oct 1 18:02:27 2021 -0500

    Apply V10 beta period feedback / fixes (getredash#5611)

    The following PR's were cherry-picked:

    * Excel & CSV query runner (getredash#2478)
    * Pin python3 image version (getredash#5570)
    * Fix: Edit Source button disappeared for users without CanEdit perms (getredash#5568)
    * Fix: Specify the protobuf version (getredash#5608)

    Plus one additional change exclusive to this branch:

    * Replace reference to yarn with NPM

    This happened because we cherry-picked getredash#5570 but did not also incorporate getredash#5541 into V10.

    Co-authored-by: deecay <deecay@users.noreply.github.com>
    Co-authored-by: Levko Kravets <levko.ne@gmail.com>
    Co-authored-by: zoomdot <gninggoon@gmail.com>

commit 92e5d78
Author: Jesse <jesse.whitehouse@databricks.com>
Date: Thu Jun 17 13:42:07 2021 -0500

    Update changelog details for snowflake (getredash#5519)

commit 0983e69
Author: Jesse <jesse.whitehouse@databricks.com>
Date: Thu Jun 17 12:45:17 2021 -0500

    update changelog for v10-beta (getredash#5517)

commit dec8879
Author: Jesse <jesse.whitehouse@databricks.com>
Date: Tue Jun 15 15:04:36 2021 -0500

    Fix: pagination is broken on the dashboard list page (getredash#5516)

    * Add test that reproduces issue getredash#5466
    * Fix: Duplicate dashboard rows were returned by Dashboard.all() (getredash#5466)

commit 64a1d7a
Author: Jesse Whitehouse <jesse@whitehouse.dev>
Date: Tue Jun 1 11:21:49 2021 -0500

    Update version for CircleCI build.

deecay added 3 commits April 17, 2018 22:37

Excel query runner

dbc0618

Param handling for read_excel

a0364a6

CSV query runner

0677e07

deecay changed the title ~~Excel query runner~~ Excel & CSV query runner Sep 3, 2018

Fix wrong module name

291e6a1

arikfr reviewed Oct 14, 2018

deecay added 3 commits November 30, 2018 17:33

remove pandas from all_ds

7050ba8

Use yaml as query language

29ccce7

Use yaml as query language for CSV

c89db78

deecay changed the title ~~Excel & CSV query runner~~ [WIP] Excel & CSV query runner Nov 30, 2018

weekly-digest bot mentioned this pull request Oct 28, 2019

Weekly Digest (21 October, 2019 - 28 October, 2019) #4308

Closed

weekly-digest bot mentioned this pull request Dec 16, 2019

Weekly Digest (9 December, 2019 - 16 December, 2019) #4451

Closed

deecay changed the title ~~[WIP] Excel & CSV query runner~~ Excel & CSV query runner Dec 26, 2019

deecay requested a review from jezdez December 26, 2019 01:11

weekly-digest bot mentioned this pull request Dec 30, 2019

Weekly Digest (23 December, 2019 - 30 December, 2019) #4508

Closed

weekly-digest bot mentioned this pull request Mar 2, 2020

Weekly Digest (24 February, 2020 - 2 March, 2020) #4702

Closed

hhokawa777 mentioned this pull request Jul 29, 2020

Datasource excel 20200729 deecay/redash#4

Open

1 task

jezdez reviewed Apr 30, 2021

arikfr approved these changes May 26, 2021

arikfr reviewed May 26, 2021

deecay added 2 commits July 24, 2021 13:53

Merge branch 'master' into datasource-excel

cc876a2

Added icon and required modules

37b8329

deecay added 2 commits July 27, 2021 11:52

Local address filtering

e945181

Fix syntax error

111e14e

deecay commented Jul 27, 2021

arikfr merged commit b9cb819 into getredash:master Jul 27, 2021

deecay deleted the datasource-excel branch July 28, 2021 10:10
