feat: fix Binding DB assay html encoding, polars_canonical_smiles_wo_salt, ci and mkdocs (#2)
kiyoon authored Aug 1, 2024
1 parent 87117c6 commit 6d9f4ad
Showing 40 changed files with 1,209 additions and 26 deletions.
14 changes: 14 additions & 0 deletions .github/workflows/deploy-mkdocs-on-latest.yml
@@ -0,0 +1,14 @@
name: Deploy MkDocs on latest commit

on:
  push:
    branches:
      - main
      - master

jobs:
  deploy-mkdocs:
    uses: deargen/workflows/.github/workflows/deploy-mkdocs.yml@master
    with:
      deploy-type: latest
      requirements-file: deps/lock/x86_64-manylinux_2_28/requirements_docs.txt
54 changes: 53 additions & 1 deletion README.md
@@ -1,5 +1,6 @@
# bio-data-to-db: make Uniprot PostgreSQL database


[![image](https://img.shields.io/pypi/v/bio-data-to-db.svg)](https://pypi.python.org/pypi/bio-data-to-db)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/bio-data-to-db)](https://pypi.python.org/pypi/bio-data-to-db)
[![image](https://img.shields.io/pypi/l/bio-data-to-db.svg)](https://pypi.python.org/pypi/bio-data-to-db)
@@ -19,6 +20,8 @@ Written in Rust, thus equipped with extremely fast parsers. Packaged for python,

So far, there is only one function implemented: **convert uniprot data to postgresql**. This package focuses more on parsing the data and inserting it into the database, rather than curating the data.

[📚 Documentation](https://deargen.github.io/bio-data-to-db/)

## 🛠️ Installation

```bash
@@ -29,6 +32,8 @@ pip install bio-data-to-db

You can use the command line interface or the python API.

### Uniprot

```bash
# It will create a db 'uniprot' and a table named 'public.uniprot_info' in the database.
# If you want another name, you can optionally pass it as the last argument.
@@ -61,6 +66,49 @@ create_accession_to_pk_id_table("postgresql://user:password@localhost:5432/unipr
keywords_tsv_to_postgresql("~/Downloads/keywords_all_2024_06_26.tsv", "postgresql://user:password@localhost:5432/uniprot")
```
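
Both the CLI and the Python API take the database location as a single connection URI. As a rough illustration of how such a URI breaks down, the sketch below uses only the Python standard library. `describe_uri` is a hypothetical helper written for this example and is not part of bio-data-to-db.

```python
from urllib.parse import urlsplit


def describe_uri(uri: str) -> dict:
    """Split a database connection URI into its named parts.

    Hypothetical helper for illustration only; not part of bio-data-to-db.
    """
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,      # e.g. "postgresql" or "mysql"
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,          # None when the URI omits the port
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    print(describe_uri("postgresql://user:password@localhost:5432/uniprot"))
```

The database name is the path component of the URI, so changing the last segment (`uniprot` above) points the same call at a different database.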

### BindingDB

```bash
# Decode HTML entities and strip the strings in the `assay` table (columns: `description` and `assay_name`).
# Currently, only the `assay` table is supported.
bio-data-to-db bindingdb fix-table assay 'mysql://username:password@localhost/bind'
```

```python
from bio_data_to_db.bindingdb.fix_tables import fix_assay_table

fix_assay_table("mysql://username:password@localhost/bind")
```
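
To make the effect of the fix concrete, here is a minimal sketch of what "decode HTML entities and strip" could mean for a single text value. It assumes standard-library `html.unescape` semantics and whitespace stripping. `clean_text` is a hypothetical name, and this is not the package's actual implementation.

```python
import html
from typing import Optional


def clean_text(value: Optional[str]) -> Optional[str]:
    """Decode HTML entities and strip surrounding whitespace.

    Hypothetical helper for illustration; NULL values (None) pass through unchanged.
    """
    if value is None:
        return None
    return html.unescape(value).strip()


if __name__ == "__main__":
    print(clean_text("  Inhibition &amp; binding &lt;assay&gt;  "))
```

Applied to every row of the `description` and `assay_name` columns, a function like this would turn stored text such as `&amp;` back into `&`.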

### PostgreSQL Helpers, SMILES, Polars utils and more

Some useful functions to work with PostgreSQL, SMILES strings, and Polars.

```python
from bio_data_to_db.utils.postgresql import (
    create_db_if_not_exists,
    create_schema_if_not_exists,
    set_column_as_primary_key,
    make_columns_unique,
    make_large_columns_unique,
    split_column_str_to_list,
    polars_write_database,
)

from bio_data_to_db.utils.smiles import (
    canonical_smiles_wo_salt,
    polars_canonical_smiles_wo_salt,
)

from bio_data_to_db.utils.polars import (
    w_pbar,
)
```

You can find the usage in the [📚 documentation](https://deargen.github.io/bio-data-to-db/).
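
For intuition about the "without salt" part of `canonical_smiles_wo_salt`: in SMILES, a `.` separates disconnected components, so counter-ions appear as extra dot-separated fragments. The sketch below is a crude, standard-library-only stand-in that keeps the longest fragment by string length. It is an assumption made for illustration: it does not canonicalize anything, it is not how the package (which depends on RDKit) does it, and `largest_fragment` is a hypothetical name.

```python
def largest_fragment(smiles: str) -> str:
    """Return the longest dot-separated component of a SMILES string.

    Crude heuristic for illustration only: string length stands in for
    molecule size, and no canonicalization is performed.
    """
    fragments = smiles.split(".")
    # max() returns the first of several equally long fragments.
    return max(fragments, key=len)


if __name__ == "__main__":
    # Sodium acetate: the acetate ion is kept, the sodium counter-ion is dropped.
    print(largest_fragment("CC(=O)[O-].[Na+]"))
```

One caveat: ring-closure digits can connect atoms across a `.`, so plain string splitting is not safe for every valid SMILES. A cheminformatics toolkit handles those cases properly.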


## 👨‍💻️ Maintenance Notes

### Install from source
@@ -72,10 +120,14 @@ bash scripts/install.sh
uv pip install -r deps/requirements_dev.in
```

-### Compile requirements (generate lockfiles)
+### Generate lockfiles

Use GitHub Actions: `apply-pip-compile.yml`. Manually launch the workflow and it will make a commit with the updated lockfiles.

### Publish a new version to PyPI

Use GitHub Actions: `deploy.yml`. Manually launch the workflow and it will compile on all architectures and publish the new version to PyPI.

### About sqlx

Sqlx offline mode should be configured so you can compile the code without a database present.
2 changes: 1 addition & 1 deletion deps/lock/aarch64-apple-darwin/.requirements.in.sha256
@@ -1 +1 @@
-816025c3ff73af3261b082ee7e0c71954aa6b20922e17344cfb2f29636733488 requirements.in
+2f65dd8deb2842edfead23a6aafb4f4f0b9e9e98982e39216069787d16327901 requirements.in
1 change: 1 addition & 0 deletions deps/lock/aarch64-apple-darwin/.requirements_docs.in.sha256
@@ -0,0 +1 @@
f0f530946f38443ec95d76ac402dc3e3045fe8f7c26220e46b575aa56649503d requirements_docs.in
13 changes: 12 additions & 1 deletion deps/lock/aarch64-apple-darwin/requirements.txt
@@ -2,16 +2,23 @@
# uv pip compile requirements.in -o /home/runner/work/bio-data-to-db/bio-data-to-db/deps/lock/aarch64-apple-darwin/requirements.txt --python-platform aarch64-apple-darwin --python-version 3.10
click==8.1.7
# via typer
connectorx==0.3.3
# via -r requirements.in
markdown-it-py==3.0.0
# via rich
mdurl==0.1.2
# via markdown-it-py
-numpy==2.0.0
+mysqlclient==2.2.4
+# via -r requirements.in
+numpy==1.26.4
# via
# pandas
# pyarrow
# rdkit
pandas==2.2.2
# via -r requirements.in
pillow==10.4.0
# via rdkit
polars==1.2.0
# via -r requirements.in
psycopg==3.2.1
@@ -26,6 +33,8 @@ python-dateutil==2.9.0.post0
# via pandas
pytz==2024.1
# via pandas
rdkit==2024.3.3
# via -r requirements.in
rich==13.7.1
# via typer
shellingham==1.5.4
@@ -34,6 +43,8 @@ six==1.16.0
# via python-dateutil
sqlalchemy==2.0.31
# via -r requirements.in
tqdm==4.66.4
# via -r requirements.in
typer==0.12.3
# via -r requirements.in
typing-extensions==4.12.2
15 changes: 13 additions & 2 deletions deps/lock/aarch64-apple-darwin/requirements_dev.txt
@@ -6,6 +6,8 @@ charset-normalizer==3.3.2
# via requests
click==8.1.7
# via typer
connectorx==0.3.3
# via -r requirements.in
exceptiongroup==1.2.2
# via pytest
filelock==3.15.4
@@ -24,13 +26,16 @@ maturin==1.7.0
# via -r requirements_dev.in
mdurl==0.1.2
# via markdown-it-py
mysqlclient==2.2.4
# via -r requirements.in
networkx==3.3
# via -r requirements_dev.in
-numpy==2.0.0
+numpy==1.26.4
# via
# -r requirements_dev.in
# pandas
# pyarrow
# rdkit
# scipy
# trimesh
packaging==24.1
@@ -39,6 +44,8 @@ packaging==24.1
# pytest
pandas==2.2.2
# via -r requirements.in
pillow==10.4.0
# via rdkit
pluggy==1.5.0
# via pytest
polars==1.2.0
@@ -59,6 +66,8 @@ pytz==2024.1
# via pandas
pyyaml==6.0.1
# via huggingface-hub
rdkit==2024.3.3
# via -r requirements.in
requests==2.32.3
# via huggingface-hub
rich==13.7.1
@@ -82,7 +91,9 @@ tomli==2.0.1
# maturin
# pytest
tqdm==4.66.4
-# via huggingface-hub
+# via
+# -r requirements.in
+# huggingface-hub
trimesh==4.4.3
# via -r requirements_dev.in
typer==0.12.3
156 changes: 156 additions & 0 deletions deps/lock/aarch64-apple-darwin/requirements_docs.txt
@@ -0,0 +1,156 @@
# This file was autogenerated by uv via the following command:
# uv pip compile requirements_docs.in -o /home/runner/work/bio-data-to-db/bio-data-to-db/deps/lock/aarch64-apple-darwin/requirements_docs.txt --python-platform aarch64-apple-darwin --python-version 3.10
babel==2.15.0
# via mkdocs-material
backports-strenum==1.3.1
# via griffe
cairocffi==1.7.1
# via cairosvg
cairosvg==2.7.1
# via mkdocs-material
certifi==2024.7.4
# via requests
cffi==1.16.0
# via cairocffi
charset-normalizer==3.3.2
# via requests
click==8.1.7
# via
# mkdocs
# mkdocstrings
colorama==0.4.6
# via
# griffe
# mkdocs-material
cssselect2==0.7.0
# via cairosvg
defusedxml==0.7.1
# via cairosvg
ghp-import==2.1.0
# via mkdocs
griffe==0.48.0
# via mkdocstrings-python
idna==3.7
# via requests
importlib-metadata==8.2.0
# via mike
importlib-resources==6.4.0
# via mike
jinja2==3.1.4
# via
# mike
# mkdocs
# mkdocs-material
# mkdocstrings
markdown==3.6
# via
# mkdocs
# mkdocs-autorefs
# mkdocs-material
# mkdocstrings
# pymdown-extensions
markupsafe==2.1.5
# via
# jinja2
# mkdocs
# mkdocs-autorefs
# mkdocstrings
mergedeep==1.3.4
# via
# mkdocs
# mkdocs-get-deps
mike==2.1.2
# via -r requirements_docs.in
mkdocs==1.6.0
# via
# -r requirements_docs.in
# mike
# mkdocs-autorefs
# mkdocs-coverage
# mkdocs-gen-files
# mkdocs-literate-nav
# mkdocs-material
# mkdocstrings
mkdocs-autorefs==1.0.1
# via
# -r requirements_docs.in
# mkdocstrings
mkdocs-coverage==1.1.0
# via -r requirements_docs.in
mkdocs-gen-files==0.5.0
# via -r requirements_docs.in
mkdocs-get-deps==0.2.0
# via mkdocs
mkdocs-literate-nav==0.6.1
# via -r requirements_docs.in
mkdocs-material==9.5.30
# via -r requirements_docs.in
mkdocs-material-extensions==1.3.1
# via
# -r requirements_docs.in
# mkdocs-material
mkdocstrings==0.25.2
# via
# -r requirements_docs.in
# mkdocstrings-python
mkdocstrings-python==1.10.7
# via -r requirements_docs.in
packaging==24.1
# via mkdocs
paginate==0.5.6
# via mkdocs-material
pathspec==0.12.1
# via mkdocs
pillow==10.4.0
# via
# cairosvg
# mkdocs-material
platformdirs==4.2.2
# via
# mkdocs-get-deps
# mkdocstrings
pycparser==2.22
# via cffi
pygments==2.18.0
# via mkdocs-material
pymdown-extensions==10.9
# via
# mkdocs-material
# mkdocstrings
pyparsing==3.1.2
# via mike
python-dateutil==2.9.0.post0
# via ghp-import
pyyaml==6.0.1
# via
# mike
# mkdocs
# mkdocs-get-deps
# pymdown-extensions
# pyyaml-env-tag
pyyaml-env-tag==0.1
# via
# mike
# mkdocs
regex==2024.7.24
# via mkdocs-material
requests==2.32.3
# via mkdocs-material
six==1.16.0
# via python-dateutil
tinycss2==1.3.0
# via
# cairosvg
# cssselect2
urllib3==2.2.2
# via requests
verspec==0.1.0
# via mike
watchdog==4.0.1
# via mkdocs
webencodings==0.5.1
# via
# cssselect2
# tinycss2
zipp==3.19.2
# via importlib-metadata
2 changes: 1 addition & 1 deletion deps/lock/x86_64-apple-darwin/.requirements.in.sha256
@@ -1 +1 @@
-816025c3ff73af3261b082ee7e0c71954aa6b20922e17344cfb2f29636733488 requirements.in
+2f65dd8deb2842edfead23a6aafb4f4f0b9e9e98982e39216069787d16327901 requirements.in
1 change: 1 addition & 0 deletions deps/lock/x86_64-apple-darwin/.requirements_docs.in.sha256
@@ -0,0 +1 @@
f0f530946f38443ec95d76ac402dc3e3045fe8f7c26220e46b575aa56649503d requirements_docs.in