I tried to read "Test.BioC.XML" in two ways:

1:

    with open(fpath, 'r') as fp:
        collection = biocxml.load(fp)
    docs = collection.documents

2:

    with biocxml.iterparse(fpath) as reader:
        collection_info = reader.get_collection_info()
        for doc in reader:
            ...  # process each document here

Strangely, all annotations are missing, but the relations are correctly parsed.
Any idea why this happens?
Above is the difference between the correct PubTator file and the one that I converted with bioc.
I believe this is a bug somewhere in the XML parsing.
To help other people who run into this, I will explain the problem and post the solution here:
The problem is that each document object has an empty annotation list (`doc.annotations`),
while its relation list is fine.
The annotations are actually stored inside each passage node, not at the document level.
They can be found with the following code:

    from bioc import biocxml

    fpath = 'Test.BioC.XML'
    with open(fpath, 'r') as fp:
        collection = biocxml.load(fp)
    docs = collection.documents

    for doc in docs:
        for passage in doc.passages:
            for annotation in passage.annotations:
                print(annotation)

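Building on the loop above, here is a rough sketch of how the mention lines of a PubTator file could be produced once the passage-level annotations are collected. It is only an illustration under some assumptions: the helper names `iter_mention_rows` and `to_pubtator_mentions` are made up for this example, the infon keys `type` and `identifier` are assumed to be the ones used in the BioRED files, and the tab-separated column order (id, start, end, text, type, identifier) is the usual PubTator mention layout. The functions only rely on the attributes `id`, `passages`, `annotations`, `locations`, `offset`, `length`, `text` and `infons`, so documents returned by `biocxml.load` should fit; the demo uses small stand-in objects so it runs without any input file.

```python
from types import SimpleNamespace


def iter_mention_rows(doc):
    """Yield (doc_id, start, end, text, type, identifier) for every mention."""
    # Look at both levels: the document-level list (empty in the case
    # described above) and the annotations nested inside each passage.
    containers = [doc] + list(doc.passages)
    for container in containers:
        for ann in container.annotations:
            # An annotation may have several locations; this sketch emits
            # one row per location and repeats the annotation text.
            for loc in ann.locations:
                start = int(loc.offset)
                end = start + int(loc.length)
                yield (
                    doc.id,
                    start,
                    end,
                    ann.text,
                    ann.infons.get("type", ""),        # assumed infon key
                    ann.infons.get("identifier", ""),  # assumed infon key
                )


def to_pubtator_mentions(doc):
    """Return tab-separated mention lines, sorted by start and end offset."""
    rows = sorted(iter_mention_rows(doc), key=lambda r: (r[1], r[2]))
    return ["\t".join(str(v) for v in row) for row in rows]


# Stand-in objects with made-up values, mimicking the attribute layout.
loc = SimpleNamespace(offset=0, length=7)
ann = SimpleNamespace(
    text="Aspirin",
    infons={"type": "ChemicalEntity", "identifier": "D001241"},
    locations=[loc],
)
passage = SimpleNamespace(annotations=[ann])
demo_doc = SimpleNamespace(id="12345", annotations=[], passages=[passage])

lines = to_pubtator_mentions(demo_doc)
print(lines[0])
```

With real data, calling `to_pubtator_mentions(doc)` for each `doc` in `collection.documents` would give the mention lines; the title and abstract header lines and the relation lines would still have to be written separately.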
I'm writing a Python script to convert a BioC XML file into a PubTator file.
I did not find an existing script for this, so I am writing one on my own.
The BioC files are downloaded from:
https://ftp.ncbi.nlm.nih.gov/pub/lu/BioRED/BioRED.zip