Commit

Merge pull request #15 from yacchin1205/feature/solr-r2
Switch backend from MongoDB to Solr
yacchin1205 authored May 16, 2022
2 parents 0c35017 + 8494453 commit 50f431f
Showing 67 changed files with 5,771 additions and 10,857 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -6,7 +6,7 @@ cache:
- $HOME/.cache/bower
- $HOME/.cache/pip
python:
- 3.7
- 3.9

env:
global:
57 changes: 45 additions & 12 deletions Dockerfile
@@ -1,30 +1,57 @@
FROM niicloudoperation/notebook@sha256:420a0be732993b86f1a4956ef730a30e92e03a1203cd12850fe376b5103f0395
FROM solr:8 AS solr

FROM niicloudoperation/notebook:latest

USER root

# Install MongoDB and lsyncd
RUN apt-get update && apt-get install -yq supervisor lsyncd uuid-runtime gnupg curl \
&& apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2930ADAE8CAF5059EE73BB4B58712A2291FA4AD5 \
&& echo "deb http://repo.mongodb.org/apt/debian jessie/mongodb-org/3.6 main" | tee /etc/apt/sources.list.d/mongodb-org-3.6.list \
&& apt-get update && apt-get install -yq mongodb-org \
# Install OpenJDK and lsyncd
RUN apt-get update && apt-get install -yq supervisor lsyncd uuid-runtime \
openjdk-11-jre gnupg curl tinyproxy \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* \
&& chown $NB_USER -R /var/log/mongodb /var/lib/mongodb
&& rm -rf /var/lib/apt/lists/*

# Solr
COPY --from=solr /opt /opt/
RUN mkdir -p /var/solr
COPY --from=solr /var/solr /var/solr
ENV SOLR_USER="jovyan" \
SOLR_GROUP="users" \
PATH="/opt/solr/bin:/opt/docker-solr/scripts:$PATH" \
SOLR_INCLUDE=/etc/default/solr.in.sh \
SOLR_HOME=/var/solr/data \
SOLR_PID_DIR=/var/solr \
SOLR_LOGS_DIR=/var/solr/logs \
LOG4J_PROPS=/var/solr/log4j2.xml
RUN chown jovyan:users -R /var/solr /run/tinyproxy

# MINIO
ENV MINIO_ACCESS_KEY=nbsearchak MINIO_SECRET_KEY=nbsearchsk
RUN mkdir -p /opt/minio/bin/ && \
curl -L https://dl.min.io/server/minio/release/linux-amd64/minio > /opt/minio/bin/minio && \
chmod +x /opt/minio/bin/minio && mkdir -p /var/minio && chown jovyan:users -R /var/minio

COPY . /tmp/nbsearch
RUN pip install /tmp/nbsearch jupyter_nbextensions_configurator
RUN pip install /tmp/nbsearch jupyter_nbextensions_configurator jupyter-server-proxy && \
jupyter serverextension enable --sys-prefix jupyter_server_proxy
RUN mkdir -p /usr/local/bin/before-notebook.d && \
cp /tmp/nbsearch/example/*.sh /usr/local/bin/before-notebook.d/ && \
chmod +x /usr/local/bin/before-notebook.d/*.sh && \
cp /tmp/nbsearch/example/update-index /usr/local/bin/ && \
chmod +x /usr/local/bin/update-index
chmod +x /usr/local/bin/update-index && \
mkdir -p /opt/nbsearch/ && \
cp -fr /tmp/nbsearch/solr /opt/nbsearch/

RUN mv /opt/conda/bin/jupyterhub-singleuser /opt/conda/bin/_jupyterhub-singleuser && \
mv /opt/conda/bin/jupyter-notebook /opt/conda/bin/_jupyter-notebook && \
# Boot scripts to perform /usr/local/bin/before-notebook.d/* on JupyterHub
RUN mkdir -p /opt/nbsearch/original/bin/ && \
mv /opt/conda/bin/jupyterhub-singleuser /opt/nbsearch/original/bin/jupyterhub-singleuser && \
mv /opt/conda/bin/jupyter-notebook /opt/nbsearch/original/bin/jupyter-notebook && \
cp /tmp/nbsearch/example/jupyterhub-singleuser /opt/conda/bin/ && \
cp /tmp/nbsearch/example/jupyter-notebook /opt/conda/bin/ && \
chmod +x /opt/conda/bin/jupyterhub-singleuser /opt/conda/bin/jupyter-notebook

# Configuration for Server Proxy
RUN cat /tmp/nbsearch/example/jupyter_notebook_config.py >> $CONDA_DIR/etc/jupyter/jupyter_notebook_config.py

USER $NB_UID

RUN mkdir -p /home/$NB_USER/.nbsearch && \
@@ -34,8 +61,14 @@ RUN mkdir /home/$NB_USER/.nbsearch/conf.d && \
cp /tmp/nbsearch/example/supervisor.conf /home/$NB_USER/.nbsearch/supervisor.conf && \
cp /tmp/nbsearch/example/update-index.lua /home/$NB_USER/.nbsearch/update-index.lua

# Create Solr schema
RUN precreate-core jupyter-notebook /opt/nbsearch/solr/jupyter-notebook/ && \
precreate-core jupyter-cell /opt/nbsearch/solr/jupyter-cell/

RUN jupyter nbextensions_configurator enable --user && \
jupyter nbextension install --py --user nbsearch && \
jupyter serverextension enable --py --user nbsearch && \
jupyter nbextension enable --py --user nbsearch && \
jupyter nbextension enable --py --user lc_notebook_diff

VOLUME /var/solr /var/minio
65 changes: 43 additions & 22 deletions README.md
@@ -32,49 +32,70 @@ then restart Jupyter notebook.

## Settings

To use nbsearch, [MongoDB](https://www.mongodb.com/) is required.
You must prepare a MongoDB that can be connected from your Jupyter Notebook,
and describe the following configuration in your jupyter_notebook_config.
To use nbsearch, [Solr](https://solr.apache.org/) and S3-compatible storage are required.
Solr serves as the search index, and the S3-compatible storage holds the Notebook data.

You must prepare a Solr server and S3-compatible storage that your Jupyter Notebook can connect to,
and describe the configuration in your jupyter_notebook_config.

### Setting up Solr

You need to install Solr and configure two cores with the following schemas.

1. [jupyter-notebook core](./solr/jupyter-notebook/)
1. [jupyter-cell core](./solr/jupyter-cell/)

### Prepare S3 compatible storage

You can use AWS S3 or [MinIO](https://min.io/) as your S3 compatible storage. Install if needed.

### Configuring Jupyter Notebook

You need to describe the following settings in `jupyter_notebook_config`.

```
c.NBSearchDB.hostname = 'localhost'
c.NBSearchDB.port = 'localhost'
c.NBSearchDB.database = 'test_db'
c.NBSearchDB.collection = 'test_notebooks'
c.NBSearchDB.history = 'test_history'
c.NBSearchDB.username = ''
c.NBSearchDB.password = ''
c.NBSearchDB.solr_base_url = 'http://localhost:8983'
c.NBSearchDB.s3_endpoint_url = 'http://localhost:9000'
c.NBSearchDB.solr_basic_auth_username = 'USERNAME_FOR_SOLR'
c.NBSearchDB.solr_basic_auth_password = 'PASSWORD_FOR_SOLR'
c.NBSearchDB.s3_access_key = 'ACCESS_KEY_FOR_S3'
c.NBSearchDB.s3_secret_key = 'SECRET_KEY_FOR_S3'
c.LocalSource.base_dir = '/home/jovyan'
c.LocalSource.server = 'http://localhost:8888/'
```

* `c.NBSearchDB.hostname`, `c.NBSearchDB.port` - Hostname and port of the MongoDB(default: localhost:27017)
* `c.NBSearchDB.username`, `c.NBSearchDB.password` - Username and password of the MongoDB(if needed)
* `c.NBSearchDB.database` - Database name in the MongoDB(default: nbsearch)
* `c.NBSearchDB.collection` - Collection name which notebooks are stored in the Database(default: notebooks)
* `c.NBSearchDB.history` - Collection name which search history are stored in the Database(default: history)
* `c.NBSearchDB.solr_base_url` - The base URL of Solr(default: `http://localhost:8983`)
* `c.NBSearchDB.solr_basic_auth_username`, `c.NBSearchDB.solr_basic_auth_password` - The username and password for Solr(if needed)
* `c.NBSearchDB.s3_endpoint_url` - The URL of S3(default: http://localhost:9000)
* `c.NBSearchDB.s3_access_key`, `c.NBSearchDB.s3_secret_key` - The access key and secret key for S3(required)
* `c.NBSearchDB.s3_region_name` - The region name of S3(if needed)
* `c.NBSearchDB.s3_bucket_name` - The bucket on S3(required)
* `c.NBSearchDB.solr_notebook` - The core for notebooks on Solr(default: `jupyter-notebook`)
* `c.NBSearchDB.solr_cell` - The core for cells on Solr(default: `jupyter-cell`)
* `c.LocalSource.base_dir` - Notebook directory to be searchable
* `c.LocalSource.server` - URL of my server, used to identify the notebooks on this server(default: http://localhost:8888/)
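
As a rough illustration of how the settings above fit together, the following sketch merges user-supplied values into the documented defaults and checks the options the list marks as required. The `validate_settings` helper and the `DEFAULTS` and `REQUIRED` names are hypothetical and not part of nbsearch; only the option names and default values are taken from the list above.

```python
# Hypothetical helper, not part of nbsearch. Option names and defaults are
# copied from the README list above; everything else is an assumption.
DEFAULTS = {
    "solr_base_url": "http://localhost:8983",
    "s3_endpoint_url": "http://localhost:9000",
    "solr_notebook": "jupyter-notebook",
    "solr_cell": "jupyter-cell",
}

# Options the README marks as "(required)".
REQUIRED = ("s3_access_key", "s3_secret_key", "s3_bucket_name")


def validate_settings(overrides):
    """Merge overrides into the defaults and fail on missing required keys."""
    settings = {**DEFAULTS, **overrides}
    missing = [key for key in REQUIRED if not settings.get(key)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return settings
```

Calling `validate_settings({})` raises a `ValueError` naming all three S3 options, which mirrors the fact that the access key, secret key, and bucket name have no defaults.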

## Usage

### Add indexes of notebooks to MongoDB
### Add indexes of notebooks to Solr

To make all your current notebooks searchable, run the following command. When you run this command, a collection for retrieval is prepared on the MongoDB.
To make all your current notebooks searchable, run the following command. Running it prepares the search index on Solr.

```
$ jupyter nbsearch update-index $CONDA_DIR/etc/jupyter/jupyter_notebook_config.py --debug local
```

### Search for Notebooks

To search the Notebook, you can use the NBSearch tab. *TBD*
To search notebooks, use the NBSearch tab. Clicking a search result shows the contents of that Notebook.

![NBSearch tab](./images/tab.png)
![NBSearch tab](./images/search-notebook.gif)

### Search using chrome extension
### Search for Cells

To search using the browser's context menu, use [nbsearch-helper](https://github.com/NII-cloud-operation/nbsearch-helper).
To search cells, use the NBSearch search button.
The NBSearch pane lets you search cells, find the cells preceding and following a match using MEME, and add them to the current Notebook.

*TBD*
![NBSearch pane](./images/search-cell.gif)
1 change: 0 additions & 1 deletion devRequirements.txt
@@ -1,4 +1,3 @@
pytest
pytest-asyncio
nose
mock
14 changes: 10 additions & 4 deletions example/99-run-supervisor.sh
@@ -2,14 +2,20 @@

set -xe

# For MongoDB
# For Solr
supervisord -c /home/jovyan/.nbsearch/supervisor.conf

if [[ ! -f /home/$NB_USER/.nbsearch/config_local.py ]] ; then
while ! nc -z localhost 27017; do
sleep 0.1 # wait for 1/10 of the second before check again
while ! nc -z localhost 8983; do
sleep 0.5
done
jupyter nbsearch update-index /home/$NB_USER/.jupyter/jupyter_notebook_config.py --debug local
while ! nc -z localhost 9000; do
sleep 0.5
done
while ! curl http://localhost:8983/solr/jupyter-cell/admin/ping | grep '"status":"OK"'; do
sleep 0.5
done
jupyter nbsearch update-index --debug /home/$NB_USER/.jupyter/jupyter_notebook_config.py local
fi

export SUPERVISOR_INITIALIZED=1
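
The startup script above blocks until Solr (port 8983) and MinIO (port 9000) accept connections before it runs the indexer. The same wait-until-ready pattern can be sketched in Python. `wait_for_port` is a hypothetical helper, not part of nbsearch, and the timeout is an assumption: the shell loops in the script wait indefinitely.

```python
import socket
import time


def wait_for_port(host, port, interval=0.5, timeout=60.0):
    """Poll a TCP port until it accepts connections or the timeout expires.

    Hypothetical equivalent of `while ! nc -z host port; do sleep 0.5; done`,
    with an upper bound added. Returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A caller could run `wait_for_port("localhost", 8983)` and then `wait_for_port("localhost", 9000)` before starting the index update. An open port only shows that the process is listening, which is presumably why the script also polls the Solr ping endpoint of the `jupyter-cell` core.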
12 changes: 5 additions & 7 deletions example/config_base.py
@@ -1,10 +1,8 @@
c.NBSearchApp.port = 9999
import os

c.NBSearchDB.hostname = 'localhost'
c.NBSearchDB.database = 'test_db'
c.NBSearchDB.collection = 'test_notebooks'
c.NBSearchDB.history = 'test_history'
c.NBSearchDB.username = ''
c.NBSearchDB.password = ''
c.NBSearchDB.solr_base_url = 'http://localhost:8983'
c.NBSearchDB.s3_endpoint_url = 'http://localhost:9000'
c.NBSearchDB.s3_access_key = os.environ['MINIO_ACCESS_KEY']
c.NBSearchDB.s3_secret_key = os.environ['MINIO_SECRET_KEY']

c.LocalSource.base_dir = '/home/jovyan'
2 changes: 1 addition & 1 deletion example/jupyter-notebook
@@ -32,4 +32,4 @@ if [[ -z "${SUPERVISOR_INITIALIZED}" ]]; then
run-hooks /usr/local/bin/before-notebook.d
fi

/opt/conda/bin/_jupyter-notebook $@
/opt/nbsearch/original/bin/jupyter-notebook $@
27 changes: 27 additions & 0 deletions example/jupyter_notebook_config.py
@@ -0,0 +1,27 @@
# Solr
import tempfile

def _run_solr_proxy(port):
    conf = tempfile.NamedTemporaryFile(mode='w', delete=False)
    conf.write('''
LogLevel Warning
PidFile "/run/tinyproxy/tinyproxy.pid"
Logfile "/tmp/tinyproxy-solr.log"
MaxClients 5
MinSpareServers 5
MaxSpareServers 20
StartServers 10
Port {port}
ReverseOnly Yes
Upstream http localhost:8983
'''.format(port=port))
    conf.close()
    return ['tinyproxy', '-d', '-c', conf.name]

c.ServerProxy.servers = {
    'solr': {
        'command': _run_solr_proxy,
        'absolute_url': True,
        'timeout': 30,
    }
}
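
The configuration above registers a callable as the proxy `command`. Assuming jupyter-server-proxy calls it with the port it allocated, the function writes that port into a tinyproxy configuration and returns the command line to launch. The sketch below isolates the templating step so that the rendered text can be inspected without starting tinyproxy. `TINYPROXY_TEMPLATE` and `render_tinyproxy_conf` are hypothetical names, and the template is a shortened subset of the directives in the commit.

```python
# Hypothetical variant for inspection only; not part of the commit.
# The directives are a subset of those written by _run_solr_proxy.
TINYPROXY_TEMPLATE = """\
LogLevel Warning
Port {port}
ReverseOnly Yes
Upstream http localhost:8983
"""


def render_tinyproxy_conf(port):
    """Return the tinyproxy configuration text for the given listen port."""
    return TINYPROXY_TEMPLATE.format(port=port)


print(render_tinyproxy_conf(12345))
```

Keeping the rendering separate from the temporary-file handling makes the port substitution easy to check in isolation.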
2 changes: 1 addition & 1 deletion example/jupyterhub-singleuser
@@ -32,4 +32,4 @@ if [[ -z "${SUPERVISOR_INITIALIZED}" ]]; then
run-hooks /usr/local/bin/before-notebook.d
fi

/opt/conda/bin/_jupyterhub-singleuser $@
/opt/nbsearch/original/bin/jupyterhub-singleuser $@
12 changes: 10 additions & 2 deletions example/supervisor.conf
@@ -11,8 +11,16 @@ port = 127.0.0.1:9001
[include]
files = /home/jovyan/conf.d/*.conf

[program:mongod]
command=/usr/bin/mongod --config /etc/mongod.conf
[program:solr]
command=/opt/docker-solr/scripts/solr-foreground
stdout_logfile=/tmp/supervisor-%(program_name)s.log
stderr_logfile=//tmp/supervisor-%(program_name)s.log
autorestart=true
user=jovyan
priority=10

[program:minio]
command=/opt/minio/bin/minio server /var/minio
stdout_logfile=/tmp/supervisor-%(program_name)s.log
stderr_logfile=//tmp/supervisor-%(program_name)s.log
autorestart=true
2 changes: 1 addition & 1 deletion example/update-index
@@ -14,5 +14,5 @@ if [[ -f /home/$NB_USER/.nbsearch/config_local.py ]] ; then
exit 0
fi

jupyter nbsearch update-index /home/$NB_USER/.jupyter/jupyter_notebook_config.py --debug local $file
jupyter nbsearch update-index --debug /home/$NB_USER/.jupyter/jupyter_notebook_config.py local $file
exit 0
Binary file added images/search-cell.gif
Binary file added images/search-notebook.gif
Binary file removed images/tab.png
Binary file not shown.
17 changes: 9 additions & 8 deletions nbsearch/__init__.py
@@ -12,19 +12,20 @@

def load_jupyter_server_extension(nb_server_app):
nb_server_app.log.info('nbsearch extension started')
if os.path.isdir(nb_server_app.notebook_dir):
tmpdir = os.path.join(nb_server_app.notebook_dir, 'nbsearch-tmp')
if not os.path.isdir(tmpdir):
os.mkdir(tmpdir)
server.register_routes(nb_server_app, nb_server_app.web_app)


# nbextension
def _jupyter_nbextension_paths():
return [dict(section='tree',
src='nbextension',
dest='nbsearch',
require='nbsearch/main')]
notebook_ext = dict(section='notebook',
src='nbextension',
dest='nbsearch',
require='nbsearch/notebook')
tree_ext = dict(section='tree',
src='nbextension',
dest='nbsearch',
require='nbsearch/tree')
return [notebook_ext, tree_ext]


# server extension
54 changes: 0 additions & 54 deletions nbsearch/app.py

This file was deleted.
