-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added shell for accessing odbc to supply data to Jupyter. Fixed a fe… #222
Conversation
… issues by pulling in newer versions of rpy2 and ipywidgets. 1) Added freetds to allow access to database instances using odbc. 2) Create a shell .odbc.ini file that can be filled out to access a database instance. 3) Updated the version of rpy2 since the %R and %%R magic wouldn't work from the Python kernels so that R can be used in cells within a Python sheet. 4) Updated ipywidgets so that the widgets can be used within a Python3 sheet. Without this update it wouldn't show the controls in a Python3 sheet after running the sheet 5) Pinned the conda to jpeg 8 since the ggplot2 graphs wouldn't showup in R sheets. See jupyter#210
@SternAndrew Thanks for the contribution. I see there's quite a few fixes going into this PR, done by uninstalling packages and installing new ones here. It would probably be better to fix these issues in the base images so that we're not bloating the image with the old layers plus the new ones. I'm not clear on how reusable the ODBC support is that you're adding. I see the sample configuration. Are users expected to provide their own? I'm asking to understand if this makes sense as an addition directly to the docker stack image, if it should be a recipe on the wiki, or if it should be a separate image entirely that just starts |
I expect at some point the base versions of both rpy2 and ipywidgets will get updated in conda and that these patch lines at the end can just be removed. Better to localize the fixes since we shouldn't need them in the long run. I feel the same way about the fix for the ggplot2 using the jpeg 8 setting. If these were long term additions I would agree fully with you but the desire to remove these as quickly as possible and to shrink the image should remain. This increases the desire to get the conda versions updated to fix the problems but until they are fixed there is no reason for new users trying out your docker image to experience these issues. It took quite some time to figure out how to fix these issues and others shouldn't have to go through the same process. I added the .odbc.ini file but you are correct that.additional sections should be added by the user. This just provides a bit of documentation that will help new users setup and get information from odbc sources as not everyone is using sample data or csv files. The database provides some real usage results that I was able to demo. The addition of freetds and the .odbc.ini file will help others so that they only need to append to the file I created with their database configuration to access data within their environment. I expect them to use this as a base and just add some appended lines before having a running instance. By placing it somewhere else it just makes it harder for new users to find the information. The increase in space is minor but makes it much faster for users to see what Jupyter can really do for them. I have attempted to also add Ruby to my running base image but the IRuby kernel dies so I didn't include it in the docker file. Although it would be nice to have a base image with Ruby available if any new users like that language since it is fairly popular. My general impression is that trying out the docker image is the first experience with Jupyter and that it should just work and have everything they would need. It helps them quickly create demos and figure out how helpful this project could be as a tool. I expect that after the demo on the docker image they will create a fuller instance on hardware or create more complete docker images for corporate use. |
To give you some background @SternAndrew, the |
I have moved the fixes earlier in the stack. Thank you for the background it did help. |
@jakirkham did you mean "into the base images" by lower in the stack? |
Yes, sorry, that was a bit vague. Does that still seem reasonable to you or did I misunderstand something? |
Yes. I think the PR might need to be split up a bit to target fixes in the right places. Otherwise, the datascience image has all the goodness / upgrades / fixes while the rest are left out in the cold. Fixes aside, we also have decide if the ODBC support is something that belongs only in this image, in the base images, or outside docker-stacks as a recipe / other image. If it does wind up here, it at least needs some doc about what the user has to do starting from the template in order to us it. |
I have completed breaking up the fixes. I'm fine if you move the odbc stuff into an earlier base image. |
@SternAndrew I'd like to do a bit more cleanup on your PR before merging and experiment a bit with where the odbc support should go. I'm going to leave a couple notes here. I'll branch from your PR to make the follow-on edits myself in the near future (retaining your original commits of course). If you'd like to carry on with making the changes yourself, don't let me stop you. |
gcc && apt-get clean && \ | ||
rm -rf /var/lib/apt/lists/* | ||
|
||
# Python pyodbc | ||
RUN pip install pyodbc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get from conda instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, pin version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't tried that out but if it works it would be preferred over the non-conda. Do you have a database to test against? We should install the conda version for both Python2 and Python3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I run conda install for pyodbc I get v3.0.10. I get the same when running pip install. I'm assuming their equivalent.
Re: database to test against, I think that's something we're going to want to document in the README (see below). Linking to one of the official maria, mysql, postgres, etc. docker containers to do it is probably the simplest way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a good idea. Perhaps the .odbc.ini file could be prepared to use the correct credentials for the maria, mysql image. Perhaps a base database per-populated and setup as a test odbc image preconnected to test the odbc connection. That way the automated build might be able to run a system test using this populated data so future changes can make sure the odbc stuff still is working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prepping the default odbc.ini to work with one of the official database images sounds like a good approach. Would you like to take a crack at documenting that setup in the README? If not, I can at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few items on my plate at this time. I think you might get to it before me.
Note that I locked in verison 1.00.9 as the version to use but that my current setup is using freetds-1.00.6.tar.gz. |
Hi @SternAndrew. So clearly neither you nor I found the time to get this PR across the finish line and into a state that would make all the additions easy for users to understand and us to maintain. I think it's best to close it out at this point and reopen it if either of us, or someone else, has the time and interest in seeing it all through. Thoughts? |
Added shell for accessing odbc to supply data to Jupyter. Fixed a few issues by pulling in newer versions of rpy2 and ipywidgets.