-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Excel & CSV query runner #2478
Excel & CSV query runner #2478
Conversation
Need db-logo for Excel... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Supporting both CSV and Excel is great.
My main concern with this implementation is the use of Pandas, as I always felt it's a "heavy" dependency that might introduce issues with setting up Redash for some users. Although I did look into it again and I'm no longer sure about this.
@jezdez do you happen to have any insight on Pandas requirements?
@jezdez, any opinions? Maybe resort to |
Yeah, installing Pandas and numpy is quite a lot for the purpose of reading and parsing Excel and CSV files alone. That's an additional ~23 MB (unpacked whl files on Linux for Python 2.7) for the Docker image and a higher maintenance burden given the high profile of Pandas and numpy for a relatively small, albeit useful feature. Options forward:
A positive point is that both Pandas and numpy are available as precompiled whl files, so it's literally just a matter of downloading it. |
Pandas is must have for python query runner. Maybe it's heavy but does not require additional libraries which should be installed via apt like other datasources like mssql. |
That may or may not be so, it's out of scope of this pull request review though. |
Option 2 is acceptable for me too. I will wait until #2921 is done.
|
We don't have to extract all the query runners for you to be able to do this. But we will probably need to extract the query runners base classes and helper methods to their own package, so they can be used "externally". @jezdez , am I correct here?
I missed the added functionality. That's actually nice :) I would suggest a different "query syntax" though, let's use YAML here, so the query becomes: url: https://www.unicef.org/sowc2012/pdfs/U5MR-rank_FINAL.xls
names:
- Country
- Mortality
- Rank
usecols: [0, 1, 2]
skiprows: 7
skipfooter: 2 |
@Arik, interesting idea. I'll take a look at yandex metrica query runner first, and see what I can do about yaml. |
Is this okay to merge now (by pymapd), since we have both numpy and pandas in our standard install? |
Can someone confirm this is merged, reading excel and pandas df as data source will help a lot |
Would be great to have my initial code somehow integrated (thanks @deecay) :) |
Would be great to get this merged! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry this never went anywhere folks.
Hi @jezdez, what is blocking this? Or should I close this for some reason? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good news: we can merge this! @deecay can you please confirm this still works on latest master?
And sorry it took so long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually there are few things missing:
- Logos for each one of them.
- Register them so they loaded by default.
- Add the local urls filtering (let me know if you need a pointer for what I mean).
Thanks!
Could you point to the local urls filtering? |
I think Arik is referring to this which was discussed when implementing the JSON data source. |
ua = args['user-agent'] | ||
args.pop('user-agent', None) | ||
|
||
if is_private_address(path) and settings.ENFORCE_PRIVATE_ADDRESS_BLOCK: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@susodapop , I guess you're right. Done.
So it took over 3 years but it's merged now. Thank you for the the continued contribution and immense patience 😅 |
😃 |
CSV and Excel do not appear on my list of data sources. Is there something I need to do to make that happen? I am using image redash/redash:8.0.0.b32245 . I also tried image version 10 beta with the same result |
@ainsofs You need to run the tip of |
* Excel query runner * Param handling for read_excel * CSV query runner * Fix wrong module name * Use yaml as query language * Use yaml as query language for CSV * Added icon and required modules * Local address filtering * Fix syntax error
The following PR's were cherry-picked: * Excel & CSV query runner (#2478) * Pin python3 image version (#5570) * Fix: Edit Source button disappeared for users without CanEdit perms (#5568) * Fix: Specify the protobuf version (#5608) Plus one additional change exclusive to this branch: * Replace reference to yarn with NPM This happened because we cherry-picked #5570 but did not also incorporate #5541 into V10. Co-authored-by: deecay <deecay@users.noreply.github.com> Co-authored-by: Levko Kravets <levko.ne@gmail.com> Co-authored-by: zoomdot <gninggoon@gmail.com>
@susodapop What do you mean by run the tip of master ? Got redash working , but CSV does not appear on the list. |
@ainsofs Did you manage do get it working ? If so , please share the solution :) |
commit 9c928bd Author: Jesse Whitehouse <jesse@whitehouse.dev> Date: Fri Oct 1 21:13:13 2021 -0500 Bump version to 10.0.0 commit f312adf Author: Jesse <jesse.whitehouse@databricks.com> Date: Fri Oct 1 18:02:27 2021 -0500 Apply V10 beta period feedback / fixes (getredash#5611) The following PR's were cherry-picked: * Excel & CSV query runner (getredash#2478) * Pin python3 image version (getredash#5570) * Fix: Edit Source button disappeared for users without CanEdit perms (getredash#5568) * Fix: Specify the protobuf version (getredash#5608) Plus one additional change exclusive to this branch: * Replace reference to yarn with NPM This happened because we cherry-picked getredash#5570 but did not also incorporate getredash#5541 into V10. Co-authored-by: deecay <deecay@users.noreply.github.com> Co-authored-by: Levko Kravets <levko.ne@gmail.com> Co-authored-by: zoomdot <gninggoon@gmail.com> commit 92e5d78 Author: Jesse <jesse.whitehouse@databricks.com> Date: Thu Jun 17 13:42:07 2021 -0500 Update changelog details for snowflake (getredash#5519) commit 0983e69 Author: Jesse <jesse.whitehouse@databricks.com> Date: Thu Jun 17 12:45:17 2021 -0500 update changelog for v10-beta (getredash#5517) commit dec8879 Author: Jesse <jesse.whitehouse@databricks.com> Date: Tue Jun 15 15:04:36 2021 -0500 Fix: pagination is broken on the dashboard list page (getredash#5516) * Add test that reproduces issue getredash#5466 * Fix: Duplicate dashboard rows were returned by Dashboard.all() (getredash#5466) commit 64a1d7a Author: Jesse Whitehouse <jesse@whitehouse.dev> Date: Tue Jun 1 11:21:49 2021 -0500 Update version for CircleCI build.
This PR adds Excel and CSV as possible datasource.
url
parameter.Online Excel data file with parameters
Same data without parameters (table is broken)
https://www.unicef.org/sowc2012/pdfs/U5MR-rank_FINAL.xls
TODO: