pyamiimage is a tool to extract semantic information from diagrams such as pathway diagrams, plots and charts.
We had an idea a few months back: we should try to build an automated tool that could extract semantic data from an image. An example of this is given below:
Image: (not reproduced here)
Extracted data:
.reaction
{
Reactants: {Diene, Dienophile};
Products: {endo product};
Temperature: {423};
Time: {15hrs};
Pressure: {?};
Catalysts: {o-xylene};
Reversible: {False};
Exothermic: {?};
}
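The record above is informal. As a minimal sketch of how it could be read into a Python data structure, the hypothetical helper `parse_reaction_record` below turns each `Field: {a, b};` line into a list of values and maps the unknown marker `?` to `None`. Neither the helper nor this exact text format is assumed to be part of pyamiimage.

```python
import re
from typing import Dict, List, Optional

# Matches lines such as "Reactants: {Diene, Dienophile};" (the trailing ';' is optional).
FIELD_RE = re.compile(r"^\s*(\w+)\s*:\s*\{(.*)\}\s*;?\s*$")


def parse_reaction_record(text: str) -> Dict[str, Optional[List[str]]]:
    """Parse a '.reaction { ... }' block into a dict of field -> list of values.

    Hypothetical helper for illustration only; not part of pyamiimage.
    The unknown marker '?' becomes None.
    """
    record: Dict[str, Optional[List[str]]] = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match is None:
            continue  # skip the '.reaction' header, braces and blank lines
        key, body = match.group(1), match.group(2).strip()
        if body == "?":
            record[key] = None
        else:
            # Comma-separated values, e.g. "Diene, Dienophile" -> ["Diene", "Dienophile"]
            record[key] = [item.strip() for item in body.split(",")]
    return record
```

For the record above, `Reactants` comes back as `["Diene", "Dienophile"]` and `Pressure` as `None`.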
We started out with a specific use case: extracting biosynthetic pathways, which are series of reactions that form different products. We are looking at terpene synthase pathways, the biosynthetic pathways in plants that synthesize terpenes. Terpenes are the fragrant compounds found in aromatic plants such as tea, citrus and grapes; they give these plants their characteristic aromas.
Terpenes are synthesized in plant leaves, flowers and fruits through various terpene synthase pathways. There are two known pathways, the mevalonate (MVA) pathway and the methylerythritol phosphate (MEP) pathway, and both lead to the same pair of isomeric products: isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP). These compounds are the building blocks of all terpenes. So we start with a diagram of these pathways (image not reproduced here) and extract all the relevant pathway information from it. From that we can create a smart SVG image with links to Wikidata, KEGG or other databases. In effect, we can annotate the diagram AUTOMATICALLY.
Automatic annotation of this kind could be used to quickly mine new pathway information from the scientific literature and store it in a public database such as WikiPathways.
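To illustrate what a "smart" annotated SVG might look like, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The hypothetical `annotate_labels` helper takes recognized labels with positions (assumed here to be `(text, x, y)` tuples) and wraps each one in an SVG `<a>` element pointing at a Wikidata search page. This is not pyamiimage's actual SVG output; a real annotator would resolve each label to a specific Wikidata or KEGG identifier instead of a search URL, and that lookup step is not shown.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple
from urllib.parse import quote

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"


def annotate_labels(labels: Iterable[Tuple[str, float, float]]) -> str:
    """Return SVG text in which each (text, x, y) label links to a Wikidata search.

    Hypothetical helper for illustration; pyamiimage's real SVG output may differ.
    """
    # Serialize SVG as the default namespace and xlink with its usual prefix.
    ET.register_namespace("", SVG_NS)
    ET.register_namespace("xlink", XLINK_NS)
    svg = ET.Element(f"{{{SVG_NS}}}svg", {"width": "400", "height": "200"})
    for text, x, y in labels:
        # Wrap each label in <a> so clicking it opens a Wikidata search for that name.
        url = "https://www.wikidata.org/w/index.php?search=" + quote(text)
        link = ET.SubElement(svg, f"{{{SVG_NS}}}a", {f"{{{XLINK_NS}}}href": url})
        label = ET.SubElement(link, f"{{{SVG_NS}}}text", {"x": str(x), "y": str(y)})
        label.text = text
    return ET.tostring(svg, encoding="unicode")
```

For example, `annotate_labels([("IPP", 10, 20), ("DMAPP", 10, 50)])` produces an `<svg>` with two clickable text labels.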
pyamiimage can be installed via pip:
pip install pyamiimage
pyamiimage requires Tesseract to run. Make sure Tesseract is installed and that it runs from the terminal with:
tesseract -h
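If you drive pyamiimage from a script, the same check can be done from Python before any OCR runs. This sketch uses only the standard library (`shutil.which` and `subprocess.run`). The helper name `tesseract_help` is hypothetical, and it deliberately ignores the exit status of `tesseract -h`, since that status is not documented here.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_help() -> Optional[str]:
    """Run `tesseract -h` and return its output, or None if tesseract is not on PATH.

    Hypothetical preflight helper; not part of pyamiimage.
    """
    exe = shutil.which("tesseract")
    if exe is None:
        return None  # Tesseract is not installed or not on PATH
    # check=False: we only need the binary to launch, not a particular exit code.
    result = subprocess.run([exe, "-h"], capture_output=True, text=True, check=False)
    # Help text may go to stdout or stderr depending on the Tesseract version.
    return result.stdout + result.stderr
```

A script could call `tesseract_help()` at startup and stop with a clear message when it returns `None`.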
pyamiimage can also be used as a Python library:
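from pyamiimage.ami_image import AmiImage
from pyamiimage.ami_graph import AmiGraph
from pyamiimage.ami_ocr import AmiOCR

# Placeholder path: replace with the path to your own diagram.
IMAGE_PATH = "/path/to/image/file"

# Load the image as grayscale, then binarize it for OCR.
image = AmiImage.create_gray_from_path(IMAGE_PATH)
bin_img = AmiImage.create_white_binary_from_image(image)
# Run OCR on the binary image and collect the recognized words.
ocr = AmiOCR(bin_img)
words = ocr.get_words()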
pyamiimage is still under development. At present its command-line interface only supports text extraction.
You can run pyamiimage from the terminal with:
pyamiimage --text /path/to/image/file /path/to/output/file
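To call the command-line tool from a Python script instead of the terminal, one option is a thin `subprocess` wrapper around exactly the invocation shown above. The helper names `build_text_command` and `extract_text` are hypothetical, and reading the result back assumes the output file is UTF-8 text, which this page does not state.

```python
import subprocess
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def build_text_command(image_path: PathLike, output_path: PathLike) -> List[str]:
    """Assemble the documented `pyamiimage --text <image> <output>` command."""
    return ["pyamiimage", "--text", str(image_path), str(output_path)]


def extract_text(image_path: PathLike, output_path: PathLike) -> str:
    """Run the CLI, then return the contents of the output file.

    Assumption: the output file is UTF-8 text.
    """
    # check=True raises CalledProcessError if pyamiimage exits with an error.
    subprocess.run(build_text_command(image_path, output_path), check=True)
    return Path(output_path).read_text(encoding="utf-8")
```

`extract_text("/path/to/image/file", "/path/to/output/file")` runs the same command as the terminal example and returns what was written.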
Please open an issue on GitHub whenever you run into a problem with pyamiimage. It will greatly help us make our software better.
If you would like to contribute, please fork the repository and submit pull requests. Please also open an issue whenever you plan major changes.