feat: fix Binding DB assay html encoding, polars_canonical_smiles_wo_salt, ci and mkdocs #2

Merged
merged 12 commits into from
Aug 1, 2024
14 changes: 14 additions & 0 deletions .github/workflows/deploy-mkdocs-on-latest.yml
@@ -0,0 +1,14 @@
name: Deploy MkDocs on latest commit

on:
  push:
    branches:
      - main
      - master

jobs:
  deploy-mkdocs:
    uses: deargen/workflows/.github/workflows/deploy-mkdocs.yml@master
    with:
      deploy-type: latest
      requirements-file: deps/lock/x86_64-manylinux_2_28/requirements_docs.txt
54 changes: 53 additions & 1 deletion README.md
@@ -1,5 +1,6 @@
# bio-data-to-db: make Uniprot PostgreSQL database


[![image](https://img.shields.io/pypi/v/bio-data-to-db.svg)](https://pypi.python.org/pypi/bio-data-to-db)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/bio-data-to-db)](https://pypi.python.org/pypi/bio-data-to-db)
[![image](https://img.shields.io/pypi/l/bio-data-to-db.svg)](https://pypi.python.org/pypi/bio-data-to-db)
@@ -19,6 +20,8 @@ Written in Rust, thus equipped with extremely fast parsers. Packaged for python,

So far, there is only one function implemented: **convert uniprot data to postgresql**. This package focuses more on parsing the data and inserting it into the database, rather than curating the data.

[📚 Documentation](https://deargen.github.io/bio-data-to-db/)

## 🛠️ Installation

```bash
@@ -29,6 +32,8 @@ pip install bio-data-to-db

You can use the command line interface or the python API.

### Uniprot

```bash
# It will create a db 'uniprot' and a table named 'public.uniprot_info' in the database.
# If you want another name, you can optionally pass it as the last argument.
@@ -61,6 +66,49 @@ create_accession_to_pk_id_table("postgresql://user:password@localhost:5432/unipr
keywords_tsv_to_postgresql("~/Downloads/keywords_all_2024_06_26.tsv", "postgresql://user:password@localhost:5432/uniprot")
```

### BindingDB

```bash
# Decode HTML entities and strip whitespace from the strings in the `assay` table (columns: `description` and `assay_name`).
# Currently, only the `assay` table is supported.
bio-data-to-db bindingdb fix-table assay 'mysql://username:password@localhost/bind'
```

```python
from bio_data_to_db.bindingdb.fix_tables import fix_assay_table

fix_assay_table("mysql://username:password@localhost/bind")
```
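
As a rough illustration of what "decode HTML entities and strip" means for a single value, here is a minimal standard-library sketch. It is not the package's actual implementation: `clean_text` and `clean_assay_row` are hypothetical helper names, and the dictionary row is only a stand-in for a record from the `assay` table.

```python
import html
from typing import Optional


def clean_text(value: Optional[str]) -> Optional[str]:
    """Decode HTML entities, then trim leading and trailing whitespace."""
    if value is None:
        return None  # leave SQL NULLs untouched
    return html.unescape(value).strip()


def clean_assay_row(row: dict) -> dict:
    """Return a copy of the row with both text columns cleaned.

    Hypothetical helper: the column names come from the README comment above.
    """
    cleaned = dict(row)
    for column in ("description", "assay_name"):
        cleaned[column] = clean_text(row.get(column))
    return cleaned


if __name__ == "__main__":
    example = {"description": "  Ki &lt; 5 nM &amp; pH 7.4 ", "assay_name": "Binding assay "}
    print(clean_assay_row(example))
```

Decoding before stripping matters when an entity itself decodes to whitespace at the edge of a string.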

### PostgreSQL Helpers, SMILES, Polars utils and more

Some useful functions to work with PostgreSQL, SMILES strings, and Polars.

```python
from bio_data_to_db.utils.postgresql import (
    create_db_if_not_exists,
    create_schema_if_not_exists,
    set_column_as_primary_key,
    make_columns_unique,
    make_large_columns_unique,
    split_column_str_to_list,
    polars_write_database,
)

from bio_data_to_db.utils.smiles import (
    canonical_smiles_wo_salt,
    polars_canonical_smiles_wo_salt,
)

from bio_data_to_db.utils.polars import (
    w_pbar,
)
```

You can find the usage in the [📚 documentation](https://deargen.github.io/bio-data-to-db/).


## 👨‍💻️ Maintenance Notes

### Install from source
@@ -72,10 +120,14 @@ bash scripts/install.sh
uv pip install -r deps/requirements_dev.in
```

-### Compile requirements (generate lockfiles)
+### Generate lockfiles

Use GitHub Actions: `apply-pip-compile.yml`. Manually launch the workflow and it will make a commit with the updated lockfiles.
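
For orientation, the header comment of each generated lockfile records the `uv pip compile` command that produced it. The sketch below only assembles equivalent command lines for each platform directory that appears in this diff. It is an assumption about how the workflow iterates, not a copy of `apply-pip-compile.yml`: `build_compile_command` and `PLATFORMS` are hypothetical names, and the sketch assumes each lock directory name doubles as the `--python-platform` value, as it does in the recorded `aarch64-apple-darwin` command.

```python
from pathlib import Path

# Platform names taken from the lock directories visible in this diff.
PLATFORMS = ["aarch64-apple-darwin", "x86_64-apple-darwin", "x86_64-manylinux_2_28"]


def build_compile_command(
    requirements_in: str,
    platform: str,
    lock_root: str = "deps/lock",
    python_version: str = "3.10",
) -> list:
    """Build (but do not run) a `uv pip compile` command for one platform."""
    # requirements_docs.in -> deps/lock/<platform>/requirements_docs.txt
    output = Path(lock_root) / platform / Path(requirements_in).with_suffix(".txt").name
    return [
        "uv", "pip", "compile", requirements_in,
        "-o", output.as_posix(),
        "--python-platform", platform,
        "--python-version", python_version,
    ]


if __name__ == "__main__":
    for platform in PLATFORMS:
        print(" ".join(build_compile_command("requirements_docs.in", platform)))
```

Each returned list can be passed to `subprocess.run` if you want to reproduce a lockfile locally instead of launching the workflow.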

### Publish a new version to PyPI

Use GitHub Actions: `deploy.yml`. Manually launch the workflow and it will compile on all architectures and publish the new version to PyPI.

### About sqlx

Sqlx offline mode should be configured so you can compile the code without a database present.
2 changes: 1 addition & 1 deletion deps/lock/aarch64-apple-darwin/.requirements.in.sha256
@@ -1 +1 @@
-816025c3ff73af3261b082ee7e0c71954aa6b20922e17344cfb2f29636733488 requirements.in
+2f65dd8deb2842edfead23a6aafb4f4f0b9e9e98982e39216069787d16327901 requirements.in
1 change: 1 addition & 0 deletions deps/lock/aarch64-apple-darwin/.requirements_docs.in.sha256
@@ -0,0 +1 @@
f0f530946f38443ec95d76ac402dc3e3045fe8f7c26220e46b575aa56649503d requirements_docs.in
13 changes: 12 additions & 1 deletion deps/lock/aarch64-apple-darwin/requirements.txt
@@ -2,16 +2,23 @@
# uv pip compile requirements.in -o /home/runner/work/bio-data-to-db/bio-data-to-db/deps/lock/aarch64-apple-darwin/requirements.txt --python-platform aarch64-apple-darwin --python-version 3.10
click==8.1.7
# via typer
connectorx==0.3.3
# via -r requirements.in
markdown-it-py==3.0.0
# via rich
mdurl==0.1.2
# via markdown-it-py
-numpy==2.0.0
+mysqlclient==2.2.4
+# via -r requirements.in
+numpy==1.26.4
# via
# pandas
# pyarrow
# rdkit
pandas==2.2.2
# via -r requirements.in
pillow==10.4.0
# via rdkit
polars==1.2.0
# via -r requirements.in
psycopg==3.2.1
@@ -26,6 +33,8 @@ python-dateutil==2.9.0.post0
# via pandas
pytz==2024.1
# via pandas
rdkit==2024.3.3
# via -r requirements.in
rich==13.7.1
# via typer
shellingham==1.5.4
@@ -34,6 +43,8 @@ six==1.16.0
# via python-dateutil
sqlalchemy==2.0.31
# via -r requirements.in
tqdm==4.66.4
# via -r requirements.in
typer==0.12.3
# via -r requirements.in
typing-extensions==4.12.2
15 changes: 13 additions & 2 deletions deps/lock/aarch64-apple-darwin/requirements_dev.txt
@@ -6,6 +6,8 @@ charset-normalizer==3.3.2
# via requests
click==8.1.7
# via typer
connectorx==0.3.3
# via -r requirements.in
exceptiongroup==1.2.2
# via pytest
filelock==3.15.4
@@ -24,13 +26,16 @@ maturin==1.7.0
# via -r requirements_dev.in
mdurl==0.1.2
# via markdown-it-py
mysqlclient==2.2.4
# via -r requirements.in
networkx==3.3
# via -r requirements_dev.in
-numpy==2.0.0
+numpy==1.26.4
# via
# -r requirements_dev.in
# pandas
# pyarrow
# rdkit
# scipy
# trimesh
packaging==24.1
@@ -39,6 +44,8 @@ packaging==24.1
# pytest
pandas==2.2.2
# via -r requirements.in
pillow==10.4.0
# via rdkit
pluggy==1.5.0
# via pytest
polars==1.2.0
@@ -59,6 +66,8 @@ pytz==2024.1
# via pandas
pyyaml==6.0.1
# via huggingface-hub
rdkit==2024.3.3
# via -r requirements.in
requests==2.32.3
# via huggingface-hub
rich==13.7.1
@@ -82,7 +91,9 @@ tomli==2.0.1
# maturin
# pytest
tqdm==4.66.4
-# via huggingface-hub
+# via
+# -r requirements.in
+# huggingface-hub
trimesh==4.4.3
# via -r requirements_dev.in
typer==0.12.3
156 changes: 156 additions & 0 deletions deps/lock/aarch64-apple-darwin/requirements_docs.txt
@@ -0,0 +1,156 @@
# This file was autogenerated by uv via the following command:
# uv pip compile requirements_docs.in -o /home/runner/work/bio-data-to-db/bio-data-to-db/deps/lock/aarch64-apple-darwin/requirements_docs.txt --python-platform aarch64-apple-darwin --python-version 3.10
babel==2.15.0
# via mkdocs-material
backports-strenum==1.3.1
# via griffe
cairocffi==1.7.1
# via cairosvg
cairosvg==2.7.1
# via mkdocs-material
certifi==2024.7.4
# via requests
cffi==1.16.0
# via cairocffi
charset-normalizer==3.3.2
# via requests
click==8.1.7
# via
# mkdocs
# mkdocstrings
colorama==0.4.6
# via
# griffe
# mkdocs-material
cssselect2==0.7.0
# via cairosvg
defusedxml==0.7.1
# via cairosvg
ghp-import==2.1.0
# via mkdocs
griffe==0.48.0
# via mkdocstrings-python
idna==3.7
# via requests
importlib-metadata==8.2.0
# via mike
importlib-resources==6.4.0
# via mike
jinja2==3.1.4
# via
# mike
# mkdocs
# mkdocs-material
# mkdocstrings
markdown==3.6
# via
# mkdocs
# mkdocs-autorefs
# mkdocs-material
# mkdocstrings
# pymdown-extensions
markupsafe==2.1.5
# via
# jinja2
# mkdocs
# mkdocs-autorefs
# mkdocstrings
mergedeep==1.3.4
# via
# mkdocs
# mkdocs-get-deps
mike==2.1.2
# via -r requirements_docs.in
mkdocs==1.6.0
# via
# -r requirements_docs.in
# mike
# mkdocs-autorefs
# mkdocs-coverage
# mkdocs-gen-files
# mkdocs-literate-nav
# mkdocs-material
# mkdocstrings
mkdocs-autorefs==1.0.1
# via
# -r requirements_docs.in
# mkdocstrings
mkdocs-coverage==1.1.0
# via -r requirements_docs.in
mkdocs-gen-files==0.5.0
# via -r requirements_docs.in
mkdocs-get-deps==0.2.0
# via mkdocs
mkdocs-literate-nav==0.6.1
# via -r requirements_docs.in
mkdocs-material==9.5.30
# via -r requirements_docs.in
mkdocs-material-extensions==1.3.1
# via
# -r requirements_docs.in
# mkdocs-material
mkdocstrings==0.25.2
# via
# -r requirements_docs.in
# mkdocstrings-python
mkdocstrings-python==1.10.7
# via -r requirements_docs.in
packaging==24.1
# via mkdocs
paginate==0.5.6
# via mkdocs-material
pathspec==0.12.1
# via mkdocs
pillow==10.4.0
# via
# cairosvg
# mkdocs-material
platformdirs==4.2.2
# via
# mkdocs-get-deps
# mkdocstrings
pycparser==2.22
# via cffi
pygments==2.18.0
# via mkdocs-material
pymdown-extensions==10.9
# via
# mkdocs-material
# mkdocstrings
pyparsing==3.1.2
# via mike
python-dateutil==2.9.0.post0
# via ghp-import
pyyaml==6.0.1
# via
# mike
# mkdocs
# mkdocs-get-deps
# pymdown-extensions
# pyyaml-env-tag
pyyaml-env-tag==0.1
# via
# mike
# mkdocs
regex==2024.7.24
# via mkdocs-material
requests==2.32.3
# via mkdocs-material
six==1.16.0
# via python-dateutil
tinycss2==1.3.0
# via
# cairosvg
# cssselect2
urllib3==2.2.2
# via requests
verspec==0.1.0
# via mike
watchdog==4.0.1
# via mkdocs
webencodings==0.5.1
# via
# cssselect2
# tinycss2
zipp==3.19.2
# via importlib-metadata
2 changes: 1 addition & 1 deletion deps/lock/x86_64-apple-darwin/.requirements.in.sha256
@@ -1 +1 @@
-816025c3ff73af3261b082ee7e0c71954aa6b20922e17344cfb2f29636733488 requirements.in
+2f65dd8deb2842edfead23a6aafb4f4f0b9e9e98982e39216069787d16327901 requirements.in
1 change: 1 addition & 0 deletions deps/lock/x86_64-apple-darwin/.requirements_docs.in.sha256
@@ -0,0 +1 @@
f0f530946f38443ec95d76ac402dc3e3045fe8f7c26220e46b575aa56649503d requirements_docs.in