Added shell for accessing odbc to supply data to Jupyter. Fixed a fe… #222

Closed
wants to merge 10 commits into from
5 changes: 5 additions & 0 deletions base-notebook/Dockerfile
@@ -64,6 +64,11 @@ RUN cd /tmp && \
$CONDA_DIR/bin/conda config --system --add channels conda-forge && \
conda clean -tipsy

# Fix the graphing issue with ggplot2
# from: https://github.com/jupyter/docker-stacks/issues/210
RUN mkdir -p $CONDA_DIR/conda-meta && \
echo "jpeg 8*" >> $CONDA_DIR/conda-meta/pinned

# Install Jupyter notebook as jovyan
RUN conda install --quiet --yes \
'notebook=4.2*' \
60 changes: 58 additions & 2 deletions datascience-notebook/Dockerfile
@@ -11,9 +11,16 @@ RUN apt-get update && \
apt-get install -y --no-install-recommends \
fonts-dejavu \
gfortran \
unixodbc-dev \
libtool-bin \
autoconf \
automake \
gcc && apt-get clean && \
rm -rf /var/lib/apt/lists/*

# Python pyodbc
RUN pip install pyodbc
Member

Can we get this from conda instead?

Member

Also, pin version.

Author

Haven't tried that out but if it works it would be preferred over the non-conda. Do you have a database to test against? We should install the conda version for both Python2 and Python3.

Member

When I run conda install for pyodbc I get v3.0.10. I get the same when running pip install. I'm assuming they're equivalent.

Re: database to test against, I think that's something we're going to want to document in the README (see below). Linking to one of the official maria, mysql, postgres, etc. docker containers to do it is probably the simplest way.
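
For what it's worth, a rough way to confirm which pyodbc version ended up in the image, and whether it matches a pin, could look like the sketch below. The helper names `satisfies_pin` and `installed_version` are made up for illustration, the glob match only approximates how conda treats a spec like `pyodbc=3.0*`, and `importlib.metadata` assumes Python 3.8 or newer.

```python
from fnmatch import fnmatchcase
from importlib import metadata


def satisfies_pin(installed, pin):
    """Return True if an installed version matches a glob-style pin such as '3.0*'.

    Hypothetical helper: this is a plain glob match, not conda's own spec matching.
    """
    return fnmatchcase(installed, pin)


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pyodbc")
    if version is None:
        print("pyodbc is not installed")
    else:
        print("pyodbc", version, "matches 3.0*:", satisfies_pin(version, "3.0*"))
```

Running this in both the Python 2 and Python 3 environments would need a different lookup on Python 2, since `importlib.metadata` is not available there.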

Author

Seems like a good idea. Perhaps the .odbc.ini file could be prepared with the correct credentials for the maria or mysql image. Perhaps a base database could be pre-populated and set up as a test odbc image, preconnected to test the odbc connection. That way the automated build might be able to run a system test using this populated data, so future changes can make sure the odbc stuff is still working.
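
A minimal sketch of what such a system test might look like with pyodbc, assuming the `SQL_DB` DSN from the sample .odbc.ini in this PR points at a reachable test database. The function names are hypothetical, the default credentials are the placeholders from the sample file rather than real values, and `SELECT 1` is assumed to be valid on the target database.

```python
def build_connection_string(dsn, uid, pwd):
    """Assemble an ODBC connection string from a DSN name and credentials."""
    return "DSN={};UID={};PWD={}".format(dsn, uid, pwd)


def odbc_smoke_test(dsn="SQL_DB", uid="DatabaseUserName", pwd="DatabasePassword"):
    """Connect through the DSN and run a trivial query; return True on success.

    The defaults are the placeholder values from the sample .odbc.ini and
    must be replaced with the credentials of an actual test database.
    """
    import pyodbc  # imported lazily so the helper above works without the driver

    conn = pyodbc.connect(build_connection_string(dsn, uid, pwd))
    try:
        row = conn.cursor().execute("SELECT 1").fetchone()
        return row[0] == 1
    finally:
        conn.close()
```

An automated build could call `odbc_smoke_test()` after starting the database container and fail the build if it raises or returns False.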

Member

Prepping the default odbc.ini to work with one of the official database images sounds like a good approach. Would you like to take a crack at documenting that setup in the README? If not, I can at some point.
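
One possible way to prep the file, sketched with Python's standard `configparser` instead of chained `echo` lines. The function name `render_odbc_ini`, the host name `db`, the port `1433`, and the database name `testdb` are illustrative assumptions, not values from this PR. Note that the driver built here is FreeTDS, which speaks the TDS protocol used by SQL Server and Sybase, so a MariaDB, MySQL, or Postgres container would need its own ODBC driver.

```python
import configparser
import io


def render_odbc_ini(server, port, database, dsn="SQL_DB",
                    driver="/home/jovyan/odbcdriver/lib/libtdsodbc.so"):
    """Render a minimal odbc.ini for one DSN and return it as a string."""
    parser = configparser.ConfigParser()
    # Keep the key case as written; ConfigParser lowercases option names by default.
    parser.optionxform = str
    parser["ODBC Data Sources"] = {dsn: "Sample Database Configuration"}
    parser[dsn] = {
        "Description": "Sample Database Configuration",
        "Driver": driver,
        "Server": server,
        "Port": str(port),
        "Database": database,
        "TDS_Version": "7.0",
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # "db", 1433 and "testdb" are hypothetical values for a linked container.
    print(render_odbc_ini(server="db", port=1433, database="testdb"))
```

Generating the file at container start rather than at image build time would let the host and credentials come from the linked container instead of being baked into the image.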

Author

I have a few items on my plate at this time. I think you might get to it before me.


# Julia dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
@@ -26,8 +33,15 @@ USER $NB_USER
# R packages including IRKernel which gets installed globally.
RUN conda config --add channels r && \
conda install --quiet --yes \
'rpy2=2.7*' \
'r-base=3.2*' \
'r-base=3.2*'

# Fix the version of rpy2 since the 2.7 is broken.
# Note that there is a bug with rpy2 where the %R and %%R don't work.
# We need a newer version, not yet available on the default conda channel, to fix this.
RUN conda install -c bioconda rpy2=2.7.8

# R packages including IRKernel which gets installed globally.
RUN conda install --quiet --yes \
'r-irkernel=0.5*' \
'r-plyr=1.8*' \
'r-devtools=1.9*' \
@@ -43,8 +57,13 @@ RUN conda config --add channels r && \
'r-nycflights13=0.1*' \
'r-caret=6.0*' \
'r-rcurl=1.95*' \
'r-dbi=0.3*' \
'r-scales=0.3*' \
'r-randomforest=4.6*' && conda clean -tipsy

# Install additional non-conda R packages
RUN R -e 'install.packages(c("ggthemes", "RODBC", "sendmailR", "tis"), repos="http://cran.utstat.utoronto.ca/")'

# Install IJulia packages as jovyan and then move the kernelspec out
# to the system share location. Avoids problems with runtime UID change not
# taking effect properly on the .local folder in the jovyan home dir.
@@ -56,3 +75,40 @@ RUN julia -e 'Pkg.add("IJulia")' && \
# Add essential packages
RUN echo "push!(Sys.DL_LOAD_PATH, \"$CONDA_DIR/lib\")" > /home/$NB_USER/.juliarc.jl && \
julia -e 'Pkg.add("Gadfly")' && julia -e 'Pkg.add("RDatasets")' && julia -F -e 'Pkg.add("HDF5")'

# Add odbc drivers to the install
RUN cd /home/jovyan && \
wget ftp://ftp.freetds.org/pub/freetds/stable/freetds-1.00.9.tar.gz && \
echo "dcd5e7589f955ced31269de63fb554562806da0133cb7f930b117588313eaf18 freetds-1.00.9.tar.gz" | sha256sum -c - && \
tar xvzf freetds-1.00.9.tar.gz && \
cd /home/jovyan/freetds-1.00.9 && \
./configure --prefix=/home/jovyan/odbcdriver && \
make && \
make install

# Add ODBC configuration file
RUN echo "[ODBC Data Sources]" >> ~/.odbc.ini && \
Member

Need to update README to describe what the user does to this. A sample that works out of the box when bound to a DB in another docker container would be good too. If we merge this, we have to support this, so we better have a way to ensure it works.

echo "SQL_DB = Sample Database Configuration" >> ~/.odbc.ini && \
echo "" >> ~/.odbc.ini && \
echo "[Default]" >> ~/.odbc.ini && \
echo "" >> ~/.odbc.ini && \
echo "[SQL_DB]" >> ~/.odbc.ini && \
echo "Description = Sample Database Configuration" >> ~/.odbc.ini && \
echo "Driver = /home/jovyan/odbcdriver/lib/libtdsodbc.so" >> ~/.odbc.ini && \
echo "Trace = No" >> ~/.odbc.ini && \
echo "TraceFile = /home/jovyan/prodms.log" >> ~/.odbc.ini && \
echo "Server = MachineIPAddress" >> ~/.odbc.ini && \
echo "Host = MachineHostName" >> ~/.odbc.ini && \
echo "Port = MachineDBPort" >> ~/.odbc.ini && \
echo "Database = DatabaseInstanceName" >> ~/.odbc.ini && \
echo "UID = DatabaseUserName" >> ~/.odbc.ini && \
echo "PWD = DatabasePassword" >> ~/.odbc.ini && \
echo "Protocol =" >> ~/.odbc.ini && \
echo "ReadOnly = No" >> ~/.odbc.ini && \
echo "RowVersioning = No" >> ~/.odbc.ini && \
echo "ShowSystemTables = No" >> ~/.odbc.ini && \
echo "ShowOidColumn = No" >> ~/.odbc.ini && \
echo "FakeOidIndex = No" >> ~/.odbc.ini && \
echo "ConnSettings =" >> ~/.odbc.ini && \
echo "TDS_Version = 7.0" >> ~/.odbc.ini

6 changes: 5 additions & 1 deletion scipy-notebook/Dockerfile
@@ -14,9 +14,13 @@ RUN apt-get update && \

USER $NB_USER

# Fix the ipywidgets since the base image has an older version of ipywidgets.
# The older version works with the Python2 instance but not the Python3 instance.
# Installing a newer version allows the widgets to work with both Python versions.
RUN conda install -c conda-forge ipywidgets=5.1.5
Member

@parente, Jun 28, 2016

No need to do this separately. conda-forge is configured as a system channel in the base-notebook (https://github.com/jupyter/docker-stacks/blob/master/base-notebook/Dockerfile#L64)

I confirmed locally that a regular install without specifying the channel picks up ipywidgets 5.1.5. (But would be nice if @jakirkham can confirm that this is the correct behavior.)

Member

So, I think we should start using a much newer version of conda. It appears to be 3.19.1 currently, but to get channel priority to work we need 4.1.x. Though this may come with some other surprises. That all being said, I pulled down the base-notebook and found it has 4.1.4 installed. So, this pinning is not actually respected. Something we should think about. Though I would recommend using something on the 4.1.x series and we should discuss fixing that pinning in a new PR.

In other words, conda-forge should be preferred without this channel argument.

Member

I'm not sure we ever intended to pin conda itself. The initial install has to point to some release. Thereafter we let conda upgrade itself. If the entire package mgmt system is busted I think we'd find out during the build process.


# Install Python 3 packages
RUN conda install --quiet --yes \
'ipywidgets=5.1*' \
'pandas=0.17*' \
'numexpr=2.5*' \
'matplotlib=1.5*' \