Added shell for accessing odbc to supply data to Jupyter. Fixed a fe… #222

SternAndrew · 2016-06-23T13:23:26Z

Added shell for accessing odbc to supply data to Jupyter. Fixed a few issues by pulling in newer versions of rpy2 and ipywidgets.

Added freetds to allow access to database instances using odbc.
Create a shell .odbc.ini file that can be filled out to access a database instance.
Updated the version of rpy2 since the %R and %%R magic wouldn't work from the Python kernels so that R can be used in cells within a Python sheet.
Updated ipywidgets so that the widgets can be used within a Python3 sheet. Without this update it wouldn't show the controls in a Python3 sheet after running the sheet
Pinned the conda to jpeg 8 since the ggplot2 graphs wouldn't showup in R sheets. See unable to plot in jupyter notebook using R kernel #210

… issues by pulling in newer versions of rpy2 and ipywidgets. 1) Added freetds to allow access to database instances using odbc. 2) Create a shell .odbc.ini file that can be filled out to access a database instance. 3) Updated the version of rpy2 since the %R and %%R magic wouldn't work from the Python kernels so that R can be used in cells within a Python sheet. 4) Updated ipywidgets so that the widgets can be used within a Python3 sheet. Without this update it wouldn't show the controls in a Python3 sheet after running the sheet 5) Pinned the conda to jpeg 8 since the ggplot2 graphs wouldn't showup in R sheets. See jupyter#210

parente · 2016-06-23T13:46:41Z

@SternAndrew Thanks for the contribution.

I see there's quite a few fixes going into this PR, done by uninstalling packages and installing new ones here. It would probably be better to fix these issues in the base images so that we're not bloating the image with the old layers plus the new ones.

I'm not clear on how reusable the ODBC support is that you're adding. I see the sample configuration. Are users expected to provide their own? I'm asking to understand if this makes sense as an addition directly to the docker stack image, if it should be a recipe on the wiki, or if it should be a separate image entirely that just starts FROM jupyter/datascience-notebook.

SternAndrew · 2016-06-23T14:17:55Z

I expect at some point the base versions of both rpy2 and ipywidgets will get updated in conda and that these patch lines at the end can just be removed. Better to localize the fixes since we shouldn't need them in the long run. I feel the same way about the fix for the ggplot2 using the jpeg 8 setting. If these were long term additions I would agree fully with you but the desire to remove these as quickly as possible and to shrink the image should remain. This increases the desire to get the conda versions updated to fix the problems but until they are fixed there is no reason for new users trying out your docker image to experience these issues. It took quite some time to figure out how to fix these issues and others shouldn't have to go through the same process.

I added the .odbc.ini file but you are correct that.additional sections should be added by the user. This just provides a bit of documentation that will help new users setup and get information from odbc sources as not everyone is using sample data or csv files. The database provides some real usage results that I was able to demo. The addition of freetds and the .odbc.ini file will help others so that they only need to append to the file I created with their database configuration to access data within their environment. I expect them to use this as a base and just add some appended lines before having a running instance. By placing it somewhere else it just makes it harder for new users to find the information. The increase in space is minor but makes it much faster for users to see what Jupyter can really do for them.

I have attempted to also add Ruby to my running base image but the IRuby kernel dies so I didn't include it in the docker file. Although it would be nice to have a base image with Ruby available if any new users like that language since it is fairly popular.

My general impression is that trying out the docker image is the first experience with Jupyter and that it should just work and have everything they would need. It helps them quickly create demos and figure out how helpful this project could be as a tool. I expect that after the demo on the docker image they will create a fuller instance on hardware or create more complete docker images for corporate use.

jakirkham · 2016-06-23T14:41:32Z

To give you some background @SternAndrew, the jpeg problem is a bit more complex. We decided at conda-forge that jpeg 9 was necessary and trying to build with jpeg 8 was problematic. However, defaults hasn't caught up yet. This affects a wide range of things including qt. As a result, it will be a bit before defaults switches to jpeg 9. It seems totally sensible to me to set that constrain lower in the stack to ensure things like matplotlib don't get messed up (in short I agree with @parente). Now knowing this would you be ok with moving that lower in the stack?

SternAndrew · 2016-06-23T15:15:06Z

I have moved the fixes earlier in the stack. Thank you for the background it did help.

parente · 2016-06-23T16:46:43Z

It seems totally sensible to me to set that constrain lower in the stack to ensure things like matplotlib don't get messed up (in short I agree with @parente). Now knowing this would you be ok with moving that lower in the stack?

@jakirkham did you mean "into the base images" by lower in the stack?

jakirkham · 2016-06-23T16:49:24Z

Yes, sorry, that was a bit vague. Does that still seem reasonable to you or did I misunderstand something?

parente · 2016-06-23T16:59:41Z

Yes, sorry, that was a bit vague. Does that still seem reasonable to you or did I misunderstand something?

Yes. I think the PR might need to be split up a bit to target fixes in the right places. Otherwise, the datascience image has all the goodness / upgrades / fixes while the rest are left out in the cold.

Fixes aside, we also have decide if the ODBC support is something that belongs only in this image, in the base images, or outside docker-stacks as a recipe / other image. If it does wind up here, it at least needs some doc about what the user has to do starting from the template in order to us it.

SternAndrew · 2016-06-23T18:02:16Z

I have completed breaking up the fixes. I'm fine if you move the odbc stuff into an earlier base image.

parente · 2016-06-28T11:43:29Z

@SternAndrew I'd like to do a bit more cleanup on your PR before merging and experiment a bit with where the odbc support should go. I'm going to leave a couple notes here. I'll branch from your PR to make the follow-on edits myself in the near future (retaining your original commits of course). If you'd like to carry on with making the changes yourself, don't let me stop you.

parente · 2016-06-28T11:44:05Z

datascience-notebook/Dockerfile

    gcc && apt-get clean && \
    rm -rf /var/lib/apt/lists/*

+# Python pyodbc
+RUN pip install pyodbc


Can we get from conda instead?

Also, pin version.

Haven't tried that out but if it works it would be preferred over the non-conda. Do you have a database to test against? We should install the conda version for both Python2 and Python3.

When I run conda install for pyodbc I get v3.0.10. I get the same when running pip install. I'm assuming their equivalent.

Re: database to test against, I think that's something we're going to want to document in the README (see below). Linking to one of the official maria, mysql, postgres, etc. docker containers to do it is probably the simplest way.

Seems like a good idea. Perhaps the .odbc.ini file could be prepared to use the correct credentials for the maria, mysql image. Perhaps a base database per-populated and setup as a test odbc image preconnected to test the odbc connection. That way the automated build might be able to run a system test using this populated data so future changes can make sure the odbc stuff still is working.

Prepping the default odbc.ini to work with one of the official database images sounds like a good approach. Would you like to take a crack at documenting that setup in the README? If not, I can at some point.

I have a few items on my plate at this time. I think you might get to it before me.

SternAndrew · 2016-06-29T13:10:40Z

Note that I locked in verison 1.00.9 as the version to use but that my current setup is using freetds-1.00.6.tar.gz.

parente · 2017-06-10T02:49:39Z

Hi @SternAndrew. So clearly neither you nor I found the time to get this PR across the finish line and into a state that would make all the additions easy for users to understand and us to maintain. I think it's best to close it out at this point and reopen it if either of us, or someone else, has the time and interest in seeing it all through. Thoughts?

Moved fixes earlier in the stack.

9206a00

Fixed build by moving r-base install above rpy2

9730ff0

Andrew Stern added 2 commits June 23, 2016 16:07

Fixed string termination error.

b2fe424

Added python access to odbc using pyodbc.

89b16bf

parente reviewed Jun 28, 2016
View reviewed changes

Andrew Stern added 4 commits June 29, 2016 08:05

Added checksum for tar file.

6c95068

Added pinning for R packages dbi and scales

c7b1eb1

Updated to head

d3899eb

Replaced patched version of freetds with current released version.

114d1bc

Fixed untar

5cddd1b

parente closed this Jun 24, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added shell for accessing odbc to supply data to Jupyter. Fixed a fe… #222

Added shell for accessing odbc to supply data to Jupyter. Fixed a fe… #222

SternAndrew commented Jun 23, 2016

parente commented Jun 23, 2016

SternAndrew commented Jun 23, 2016

jakirkham commented Jun 23, 2016

SternAndrew commented Jun 23, 2016

parente commented Jun 23, 2016

jakirkham commented Jun 23, 2016

parente commented Jun 23, 2016

SternAndrew commented Jun 23, 2016

parente commented Jun 28, 2016

parente Jun 28, 2016

parente Jun 28, 2016

SternAndrew Jun 28, 2016

parente Jun 28, 2016

SternAndrew Jun 28, 2016

parente Jul 9, 2016

SternAndrew Jul 11, 2016

SternAndrew commented Jun 29, 2016

parente commented Jun 10, 2017

Added shell for accessing odbc to supply data to Jupyter. Fixed a fe… #222

Added shell for accessing odbc to supply data to Jupyter. Fixed a fe… #222

Conversation

SternAndrew commented Jun 23, 2016

parente commented Jun 23, 2016

SternAndrew commented Jun 23, 2016

jakirkham commented Jun 23, 2016

SternAndrew commented Jun 23, 2016

parente commented Jun 23, 2016

jakirkham commented Jun 23, 2016

parente commented Jun 23, 2016

SternAndrew commented Jun 23, 2016

parente commented Jun 28, 2016

parente Jun 28, 2016

Choose a reason for hiding this comment

parente Jun 28, 2016

Choose a reason for hiding this comment

SternAndrew Jun 28, 2016

Choose a reason for hiding this comment

parente Jun 28, 2016

Choose a reason for hiding this comment

SternAndrew Jun 28, 2016

Choose a reason for hiding this comment

parente Jul 9, 2016

Choose a reason for hiding this comment

SternAndrew Jul 11, 2016

Choose a reason for hiding this comment

SternAndrew commented Jun 29, 2016

parente commented Jun 10, 2017