feat: Add temporary-file-path argument for multi-processing #19

Merged May 10, 2024 (3 commits)
6 changes: 5 additions & 1 deletion README.md
@@ -38,10 +38,11 @@ smiles_str = "CC"
allow_bad_stereo = False,
wildcard_radicals = False,
jar_fpath = "/path/to/opsin.jar",
tmp_fpath = "py2opsin_temp_input.txt",
)
```

The result is returned as a Python string, or False if an unexpected error occurs when calling OPSIN. If a list of IUPAC names is provided, a list is returned. It is __highly__ reccomended to use `py2opsin` in this manner if you need to resolve any more than a couple names -- the performance cost of running `OPSIN` from Python one name at a time is significant (~5 seconds/molecule individually, milliseconds otherwise).
The result is returned as a Python string, or False if an unexpected error occurs when calling OPSIN. If a list of IUPAC names is provided, a list is returned. It is __highly__ recommended to use `py2opsin` in this manner if you need to resolve any more than a couple names -- the performance cost of running `OPSIN` from Python one name at a time is significant (~5 seconds/molecule individually, milliseconds otherwise).

Arguments:
- chemical_name (str): IUPAC name of chemical as a Python string, or a list of strings.
@@ -51,7 +52,10 @@ Arguments:
- allow_bad_stereo (bool, optional): Allow OPSIN to ignore uninterpretable stereochem. Defaults to False.
- wildcard_radicals (bool, optional): Output radicals as wildcards. Defaults to False.
- jar_fpath (str, optional): Filepath to OPSIN jar file. Defaults to "opsin-cli.jar" which is distributed with py2opsin.
- tmp_fpath (str, optional): Name for temporary file used for calling OPSIN. Defaults to "py2opsin_temp_input.txt". When multiprocessing, set this to a unique name for each process.

> [!TIP]
> `OPSIN` will already parallelize itself by creating multiple threads! Be wary when using `py2opsin` with multiprocessing to avoid spawning too many processes.
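
One way to follow the `tmp_fpath` advice above is to derive the file name from the process ID plus a random suffix. The sketch below is illustrative only: `unique_tmp_fpath` and `resolve` are hypothetical helpers, not part of `py2opsin`, and the translator is passed in as an argument so the snippet runs without OPSIN or Java installed.

```python
import os
import uuid


def unique_tmp_fpath(prefix: str = "py2opsin_tmp") -> str:
    """Hypothetical helper: build a temp-file name unique to this call."""
    # The PID separates worker processes; the UUID separates repeated
    # calls made from within the same process.
    return f"{prefix}_{os.getpid()}_{uuid.uuid4().hex}.txt"


def resolve(name, translate):
    """Call `translate` (e.g. py2opsin) with its own temporary input file."""
    return translate(name, tmp_fpath=unique_tmp_fpath())


# Intended use inside a worker (needs py2opsin and Java, so not run here):
#     from py2opsin import py2opsin
#     smiles = resolve("ethanol", py2opsin)
```

Because `OPSIN` already creates its own threads, a small pool is probably enough when mapping `resolve` over many names.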

## Massive speedup from `pubchempy` for batch translations
`py2opsin` runs locally and is smaller in scope in what it provides, which makes it __dramatically__ faster at resolving identifiers. In the code block below, the call to `py2opsin` will execute faster than an equivalent call to `pubchempy`:
2 changes: 1 addition & 1 deletion py2opsin/__init__.py
@@ -1,3 +1,3 @@
from .py2opsin import py2opsin

__version__ = "1.0.6"
__version__ = "1.1.0"
10 changes: 6 additions & 4 deletions py2opsin/py2opsin.py
@@ -53,6 +53,7 @@ def py2opsin(
allow_bad_stereo: bool = False,
wildcard_radicals: bool = False,
jar_fpath: str = "default",
tmp_fpath: str = "py2opsin_temp_input.txt",
) -> str:
"""Simple passthrough to opsin, returning results as Python strings.

@@ -65,6 +66,8 @@ def py2opsin(
allow_bad_stereo (bool, optional): Allow OPSIN to ignore uninterpretable stereochem. Defaults to False.
wildcard_radicals (bool, optional): Output radicals as wildcards. Defaults to False.
jar_fpath (str, optional): Filepath to OPSIN jar file. Defaults to "default", which causes py2opsin to use its included jar.
tmp_fpath (str, optional): Name for temporary file used for calling OPSIN. Defaults to "py2opsin_temp_input.txt".
When multiprocessing, set this to a unique name for each process.

Returns:
str: Species in requested format, or False if not found or an error occurred. List of strings if input is list.
@@ -112,15 +115,14 @@ def py2opsin(
)

# write the input to a text file
temp_f = "py2opsin_temp_input.txt"
with open(temp_f, "w") as file:
with open(tmp_fpath, "w") as file:
if type(chemical_name) is str:
file.write(chemical_name)
else:
file.writelines("\n".join(chemical_name) + "\n")

# add the temporary file to the args
arg_list.append(temp_f)
arg_list.append(tmp_fpath)

# grab the optional boolean flags
if allow_acid:
@@ -168,4 +170,4 @@ def py2opsin(
warnings.warn("Unexpected error occurred! " + str(e))
return False
finally:
os.remove(temp_f)
os.remove(tmp_fpath)
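
The hunks above write the input names to `tmp_fpath`, pass that path to OPSIN, and delete the file in a `finally` block. A minimal standalone sketch of that write, run, clean-up pattern is shown here. The names `write_names` and `with_temp_input` are hypothetical, and `run` stands in for the OPSIN subprocess call, which is not reproduced.

```python
import os


def write_names(names, tmp_fpath):
    """Write one name, or a list of names with one per line."""
    with open(tmp_fpath, "w") as file:
        if isinstance(names, str):
            file.write(names)
        else:
            file.write("\n".join(names) + "\n")


def with_temp_input(names, tmp_fpath, run):
    """Write `names`, call `run(tmp_fpath)`, and always delete the file."""
    write_names(names, tmp_fpath)
    try:
        return run(tmp_fpath)
    finally:
        # Runs even if `run` raises, so no stale input file is left behind.
        os.remove(tmp_fpath)
```

Two processes sharing one `tmp_fpath` could overwrite or delete each other's input in this pattern, which is why the PR makes the path configurable.
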
12 changes: 12 additions & 0 deletions test/test_py2opsin.py
@@ -1,10 +1,16 @@
import os
import multiprocessing
import sys
import unittest

from py2opsin import py2opsin


# multiprocessing test function
def _f(b):
return py2opsin(b[0], tmp_fpath=f"tmp_{b[1]}.txt")


class Test_py2opsin(unittest.TestCase):
"""
Test the various functionalities of py2opsin.
@@ -97,6 +103,12 @@ def test_invalid_output_helpful_error(self):
"Output format SMOLES is invalid. Did you mean 'SMILES'?",
)

def test_multiprocessing(self):
"""py2opsin should safely work when run with multiprocessing"""
with multiprocessing.Pool(2) as pool:
res = pool.map(_f, [("methanol", 0), ("ethanol", 1)])
self.assertEqual(res, ["CO", "C(C)O"])

def test_name_to_smiles(self):
"""
Tests converting IUPAC names to SMILES strings