atomese->json and json->atomese #179

Open
linas opened this issue Mar 19, 2020 · 4 comments

Comments

@linas
Contributor

linas commented Mar 19, 2020

@tanksha and @Habush I have a generic idea I want to explore. How many of the biome datasets are available in json format?

I'm thinking that it might be worth creating a generic json->atomese importer, and a generic exporter. However, this would work only if

  1. the input json dataset formats are not badly/insanely designed (i.e. the resulting imported atomese would have to look reasonable enough to be useable)

  2. There are not too many conversions needed to obtain something that can be easily reasoned on or pattern-mined. So, for example, the import process might be (a) generic json->atomese (b) run a BindLink to convert this generic-atomese into something friendlier for mining/pln/moses (c) actually run mining/pln/moses. (so maybe @ngeiswei you'd have an idea about this?)
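Step (a) above could be sketched roughly as follows. This is only an illustration of what a "generic" import might produce, not an existing importer: the function name `to_atomese` and the mapping itself (objects to a SetLink of EvaluationLinks, arrays to ListLinks, strings to ConceptNodes, numbers to NumberNodes) are assumptions made for the example. It emits Atomese as s-expression text and does not touch an AtomSpace.

```python
import json

def quote(text):
    # json.dumps gives a double-quoted, escaped string; assumed to be
    # acceptable to a Scheme reader for simple names.
    return json.dumps(text)

def to_atomese(value):
    """Render a parsed JSON value as an Atomese-style s-expression string.

    Hypothetical mapping, chosen only for illustration.
    """
    if isinstance(value, dict):
        # One EvaluationLink per key; the key becomes a PredicateNode.
        pairs = [
            f"(EvaluationLink (PredicateNode {quote(k)}) (ListLink {to_atomese(v)}))"
            for k, v in value.items()
        ]
        return "(" + " ".join(["SetLink"] + pairs) + ")"
    if isinstance(value, list):
        return "(" + " ".join(["ListLink"] + [to_atomese(v) for v in value]) + ")"
    if isinstance(value, bool) or value is None:
        # bool must be tested before int, since bool is a subclass of int.
        return f"(ConceptNode {quote(json.dumps(value))})"
    if isinstance(value, (int, float)):
        return f"(NumberNode {value})"
    return f"(ConceptNode {quote(str(value))})"

if __name__ == "__main__":
    doc = json.loads('{"name": "example", "tags": ["a", "b"]}')
    print(to_atomese(doc))
```

The output of such a blind mapping is deeply nested and carries no domain types, which is exactly why a step (b) rewrite would be needed before mining or reasoning.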

What is not clear is whether step (b) above is easier than just writing a custom json importer. If it's not easier, then this generic-import idea seems like a not-very-good idea. But I can't really tell...

(follow up to issue #164)

@Habush
Contributor

Habush commented Mar 19, 2020

> How many of the biome datasets are available in json format?

We don't store any of the datasets in json format. The atomese to json conversion happens on-the-fly after running the annotation.

We are currently converting the atomese into a specific JSON format which we use for a specific purpose (visualization). But if we are to design a generic importer/exporter, we should ignore the current format and start from scratch to specify a schema/format that conveys as much of the information in the atomese as possible. After that we can replace the current parser code with the generic one and write an adapter for the visualizer.
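As a rough idea of what a generic export schema could look like, here is a minimal sketch that turns an Atomese s-expression into nested JSON. The `"type"`/`"name"`/`"outgoing"` keys, the `parse` function, and the rule that any type ending in `Node` is a node are all assumptions for illustration; this is not the format the annotation service currently emits, and it ignores truth values and other values attached to atoms.

```python
import json
import re

# Tokens: parentheses, double-quoted strings (with escapes), or bare symbols.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')

def parse(text):
    """Parse one Atomese s-expression into a JSON-ready dict (hypothetical schema)."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            # A node name: unquote strings, keep bare tokens (e.g. numbers) as text.
            return json.loads(tok) if tok.startswith('"') else tok
        atom_type = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(read())
        pos += 1  # consume ")"
        if atom_type.endswith("Node"):
            return {"type": atom_type, "name": children[0]}
        return {"type": atom_type, "outgoing": children}

    return read()

if __name__ == "__main__":
    atom = '(EvaluationLink (PredicateNode "p") (ListLink (ConceptNode "a") (ConceptNode "b")))'
    print(json.dumps(parse(atom), indent=2))
```

A schema this literal keeps the type and structure of every atom, so an adapter for the visualizer could be written against it.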

@linas
Contributor Author

linas commented Mar 19, 2020

> We are currently converting the atomese into a specific JSON format

You should continue to do that, and I am NOT recommending that this be replaced or changed in any way.

I am asking a different question. There are datasets -- gene ontologies, proteome datasets, the GGI and PPI datasets, etc. for which @tanksha has written importers. Again, I am NOT recommending that those importers be thrown away or re-written. They work, so that's good enough.

What I'm asking about is OTHER datasets, e.g. SBML (Systems Biology Markup Language). Is it available as json? If we imported the json, would the "natural", "generic" import be good enough, or would it require a custom importer?

@linas
Contributor Author

linas commented Mar 19, 2020

@rekado please let me know if you have any opinion about this.

@mjsduncan
Contributor

mjsduncan commented Mar 21, 2020

@linas, SBML isn't available in json but it is convertible to OWL (via BioPAX), and all the bio-ontologies we want to import are also available in OWL. a wizard/toolkit/pipeline for translating OWL format to atomese would be hugely useful.
imo translating the bulk databases could be done via json because they all have APIs, but in general the semantic translation is pretty simple and it seems more efficient to do it via bulk downloads.


3 participants