DataFrame LoadSql Method #5662

Open
Tracked by #6144
tzinckgraf opened this issue Mar 20, 2020 · 7 comments
Labels
Microsoft.Data.Analysis All DataFrame related issues and PRs

Comments

@tzinckgraf

I was playing around with the DataFrame library to start to build a SQL reconciliation tool, and I noticed that there is no easy way to go from SQL query to DataFrame.

I have some code that can create the DataFrame from a SqlDataReader, but we may want an easier-to-use API than that.

One idea is to follow the convention used for the LoadCsv function and make something similar to the pandas function read_sql.
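
As a rough illustration of what a read_sql-style loader does, here is a minimal sketch in Python, the language of the pandas function mentioned above. It runs a query, takes the column names from the cursor metadata, and accumulates values column by column, which is roughly what a reader-based loader would do when filling DataFrame columns. It uses only the standard-library sqlite3 module. The name `load_sql`, its parameters, the dict-of-lists return shape, and the `trades` table are all assumptions made for illustration; none of them is an API of pandas or Microsoft.Data.Analysis.

```python
import sqlite3


def load_sql(query, connection, params=()):
    """Run `query` and return the result as {column_name: [values...]}.

    Hypothetical helper for illustration only; not an API of pandas or
    Microsoft.Data.Analysis. Assumes result column names are unique.
    """
    cursor = connection.execute(query, params)
    # cursor.description holds one 7-item tuple per result column; item 0 is the name.
    names = [col[0] for col in cursor.description]
    columns = {name: [] for name in names}
    # Stream rows from the cursor and append each value to its column,
    # so the result is stored column-wise rather than row-wise.
    for row in cursor:
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


# Usage with an in-memory SQLite database (table and data are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (id INTEGER, symbol TEXT, qty REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [(1, "MSFT", 10.0), (2, "AAPL", None), (3, "MSFT", 2.5)],
)
frame = load_sql("SELECT id, symbol, qty FROM trades ORDER BY id", conn)
conn.close()
```

Passing values through `params` rather than formatting them into the query string keeps user input out of the SQL text. A .NET version would presumably follow the same shape, reading names and types from the data reader and appending to typed columns, but that design is exactly what this issue leaves open.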

I am happy to put a PR together depending on the approach chosen.

stephentoub transferred this issue from dotnet/runtime Mar 20, 2020
@pgovind

pgovind commented Mar 20, 2020

One idea is to follow the convention used for the LoadCsv function and make something similar to the pandas function read_sql.

I'm assuming this will take in a raw SQL query as an argument to LoadSql? We should get input from @eerhardt and possibly @tannergooding, who most recently worked on some SQL readers. In particular, how .NETy is it to take in a SQL query as a parameter? What's the alternative otherwise?

@eerhardt
Member

A thing we need to be conscious about is that the dependencies of the core DataFrame library don't grow too unwieldy. Whenever we add a new dependency to the core DataFrame library, we have to consider if it is beneficial for the majority of use cases.

I definitely see the value in an easy API to load a DataFrame from SQL, but I don't believe it is "core" to the library. For example, DataFrame is used in https://github.com/dotnet/spark, which wouldn't want/need a dependency on SQL libraries.

So I would see something like this as an extension to the DataFrame library that ships in a separate NuGet package. We've designed the DataFrame API such that external libraries can "fill" or "load" one up from other data sources (ex. SQL).

@rspaulino

Hi. I see that the code has been written for this. Is there a design conflict, or are just the unit tests missing before this can be merged to main?
Thanks

@eerhardt
Member

eerhardt commented Sep 3, 2020

Is there a design conflict, or are just the unit tests missing before this can be merged to main?

It looks like it is blocked on adding tests.

@rspaulino

@eerhardt thanks.

How extensive do the tests have to be? I got it running on my computer and have used it with SQLite. I will try to help so it can be merged to main.

@eerhardt
Member

eerhardt commented Sep 4, 2020

How extensive do the tests have to be?

The tests should exercise the main functionality to ensure it works, and doesn't get broken in the future.

I will try to help so it can be merged to main.

That would be great!

@rspaulino

👍🏾 Will try and give it a shot then.

pgovind transferred this issue from dotnet/corefxlab Mar 6, 2021
pgovind added the Microsoft.Data.Analysis (All DataFrame related issues and PRs) label Mar 6, 2021