Common function to map from input fields to common fields #6

Open
jorainer opened this issue Apr 23, 2019 · 7 comments

Comments

@jorainer
Collaborator

Please correct me if I got this wrong: the idea is to map data from different input sources to a commonly agreed set of fields and to have an object that can hold this data. So, the workflow would be:

  1. read input file.
  2. map names of the input file to commonly accepted names.
  3. put that into a result object.

So, 1) would be an input-type-specific function whose result should be a named list of the file's elements. 2) uses the schema for the mapping, so this could be a single function for all parsers, right? 3) would, as I see it, also be a single function.
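To make the three steps concrete, here is a minimal Python sketch. It assumes a simple MSP-like layout ("Key: value" lines followed by "mz intensity" peak lines); the schema contents and the names SCHEMA, read_input, map_fields, SpectrumRecord and to_result are hypothetical illustrations, not the project's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical schema: common field name -> input-file field names, in priority order.
SCHEMA: Dict[str, List[str]] = {
    "name": ["Name", "NAME"],
    "precursor_mz": ["PrecursorMZ", "PRECURSOR_MZ"],
    "instrument": ["Instrument"],
}


def read_input(text: str) -> dict:
    """Step 1 (format-specific): 'Key: value' lines become named entries,
    lines without a colon are treated as 'mz intensity' peak pairs."""
    record: dict = {"peaks": []}
    for line in text.strip().splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            record[key.strip()] = value.strip()
        elif line.strip():
            mz, intensity = line.split()[:2]
            record["peaks"].append((float(mz), float(intensity)))
    return record


def map_fields(record: dict, schema: Dict[str, List[str]]) -> dict:
    """Step 2 (shared by all parsers): rename input fields to common names."""
    mapped = {"peaks": record.get("peaks", [])}
    for common, aliases in schema.items():
        for alias in aliases:
            if alias in record:
                mapped[common] = record[alias]
                break  # the first alias found wins
    return mapped


@dataclass
class SpectrumRecord:
    """Step 3: a hypothetical result object holding the common fields."""
    name: Optional[str] = None
    precursor_mz: Optional[float] = None
    instrument: Optional[str] = None
    peaks: List[Tuple[float, float]] = field(default_factory=list)


def to_result(mapped: dict) -> SpectrumRecord:
    """Convert typed fields and build the result object."""
    mapped = dict(mapped)
    if "precursor_mz" in mapped:
        mapped["precursor_mz"] = float(mapped["precursor_mz"])
    return SpectrumRecord(**mapped)
```

Only read_input would differ per input format; map_fields and to_result stay the same for all parsers, which is the point of the proposed split. Fields not listed in the schema (e.g. "Num Peaks") are simply dropped in this sketch.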

@meowcat
Owner

meowcat commented Apr 23, 2019

Input and output, importantly. Otherwise I think you are correct.

I actually envisioned it slightly differently: 1) read the input file, 2) map it into a result (Spectrum/Spectra/...) object using the corresponding format's nomenclature/system/hierarchy, 3) map the Spectrum/Spectra/result object with custom names to a Spectrum/Spectra/result object with common names. Your workflow is more consistent, because mine requires processing the actual peaks separately from (and before) all other information. It removes an intermediate step that I find useful, but maybe I can figure out how to work without it.
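For comparison, step 3 of this variant could look roughly like the following Python sketch: an intermediate object already holds the peaks plus metadata under the format's own names, and a rename pass swaps the metadata keys to common names. The Spectrum class and rename_fields function here are hypothetical stand-ins, not the R Spectrum/Spectra classes.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Tuple


@dataclass
class Spectrum:
    """Hypothetical intermediate: peaks plus metadata keyed by whatever
    names are currently in use (format-specific first, common later)."""
    metadata: Dict[str, str]
    peaks: List[Tuple[float, float]] = field(default_factory=list)


def rename_fields(spectrum: Spectrum, mapping: Dict[str, str]) -> Spectrum:
    """Return a copy whose metadata keys use common names.
    Keys without a mapping are kept as-is; peaks are not touched."""
    renamed = {mapping.get(key, key): value for key, value in spectrum.metadata.items()}
    return replace(spectrum, metadata=renamed)
```

The intermediate keeps the original names around until the last step, which is handy for debugging a parser; the cost, as noted above, is that peaks must already be parsed into the object before any renaming happens.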

@jorainer
Collaborator Author

jorainer commented Apr 23, 2019

Do you already have a function that converts the names provided by the input file to the common names using the schema?

I think that function will be a key one that we need; it should also be fast, if possible.
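One way to keep the name conversion fast is to invert the schema once into a dictionary from input-file name to common name, so that each record costs one hash lookup per field. A minimal Python sketch, assuming the schema lists aliases in priority order; build_lookup and to_common_names are hypothetical names.

```python
from typing import Dict, List, Tuple

# Input-file name -> (common name, priority of that alias)
Lookup = Dict[str, Tuple[str, int]]


def build_lookup(schema: Dict[str, List[str]]) -> Lookup:
    """Invert the schema once; fail early if an alias is ambiguous."""
    lookup: Lookup = {}
    for common, aliases in schema.items():
        for rank, alias in enumerate(aliases):
            if alias in lookup:
                raise ValueError(
                    f"{alias!r} is listed for both {lookup[alias][0]!r} and {common!r}"
                )
            lookup[alias] = (common, rank)
    return lookup


def to_common_names(record: Dict[str, str], lookup: Lookup) -> Dict[str, str]:
    """One dict lookup per input field. If several aliases of the same
    common field are present, the one listed first in the schema wins;
    fields not covered by the schema are dropped."""
    out: Dict[str, str] = {}
    best: Dict[str, int] = {}
    for key, value in record.items():
        hit = lookup.get(key)
        if hit is None:
            continue
        common, rank = hit
        if common not in best or rank < best[common]:
            best[common] = rank
            out[common] = value
    return out
```

The lookup table is built once per schema and reused for every record, so the per-record cost grows with the number of fields in the record, not with the size of the schema.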

@meowcat
Owner

meowcat commented Apr 23, 2019

We are not progressing very quickly here, unfortunately, since I have to fit this work in alongside my regular work. Also, my first implementation will certainly not be a fast one.

@jorainer
Collaborator Author

No prob. Was not sure if I just overlooked that one.

@Treutler
Collaborator

Treutler commented Apr 23, 2019

Please keep in mind that there are multiple field names for the same value in the case of (at least) NIST *.msp and Bruker *.library files. E.g., the instrument in the NIST .msp format can be given as

  • Instrument
  • Synon: $:07
  • Comments: instrument

I encoded this in the table as Instrument / Synon: $:07 / Comments: instrument.
Accordingly, we have to (i) support these different flavors for the import and (ii) decide which flavor to export.
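For (i), the table encoding could be parsed into a list of (field, sub-key) aliases and tried in order on import. A minimal Python sketch, assuming the record has already been pre-parsed so that bundled fields such as Synon and Comments are dicts of sub-key to value; how those sub-keys are split out of a real .msp file is left open here, and parse_aliases and resolve are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple, Union

# (top-level field, optional sub-key), e.g. ("Synon", "$:07")
Alias = Tuple[str, Optional[str]]
# Hypothetical pre-parsed record: plain fields map to strings, fields that
# bundle several values (Synon, Comments) map to dicts of sub-key -> value.
Record = Dict[str, Union[str, Dict[str, str]]]


def parse_aliases(cell: str) -> List[Alias]:
    """Parse a table cell such as 'Instrument / Synon: $:07 / Comments: instrument'.
    Assumes ' / ' never occurs inside a single alias."""
    aliases: List[Alias] = []
    for part in cell.split(" / "):
        name, sep, sub = part.partition(":")  # split at the first colon only
        aliases.append((name.strip(), sub.strip() if sep else None))
    return aliases


def resolve(record: Record, aliases: List[Alias]) -> Optional[str]:
    """Import: return the value of the first alias present in the record."""
    for name, sub in aliases:
        value = record.get(name)
        if sub is None and isinstance(value, str):
            return value
        if sub is not None and isinstance(value, dict) and sub in value:
            return value[sub]
    return None
```

Trying the aliases in table order means a file that contains several flavors yields the value of whichever flavor is listed first.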

@meowcat
Owner

meowcat commented Apr 23, 2019

> Accordingly, we have to (i) support these different flavors for the import and (ii) decide which flavor to export.

(i) could be feasible by doing something like this:

```yaml
- field: Synon
  node:
    - field: $:07
      map_read: instrument
```

or map: instrument with type: readonly. There will also be cases of nested mapping, where a sub-entry in one record format is a top-level entry in general (e.g., possibly INCHIKEY, depending on how we define it).
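The nested node/map_read idea could be read with a small recursive walk. In this Python sketch the YAML snippet is written as the equivalent lists and dicts a YAML loader would produce, using the $:07 code from the instrument example; the extra Name entry, the plain "map" key and the function name read_nested are hypothetical additions for illustration.

```python
from typing import Any, Dict, List

# Mirrors the YAML snippet as Python data; "Name"/"map" are hypothetical extras.
SCHEMA: List[Dict[str, Any]] = [
    {"field": "Synon", "node": [{"field": "$:07", "map_read": "instrument"}]},
    {"field": "Name", "map": "name"},
]


def read_nested(record: Dict[str, Any], nodes: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Walk schema nodes alongside a (possibly nested) record and collect
    values under their common names. On import, both 'map' (read/write)
    and 'map_read' (read-only) targets are used."""
    out: Dict[str, Any] = {}
    for node in nodes:
        value = record.get(node["field"])
        if value is None:
            continue
        if "node" in node:
            if isinstance(value, dict):
                out.update(read_nested(value, node["node"]))  # descend one level
        else:
            target = node.get("map", node.get("map_read"))
            if target is not None:
                out[target] = value
    return out
```

Because the walk recurses on "node", a sub-entry in the source format (Synon → $:07) ends up as a top-level common field (instrument), which is the nested case described above. An exporter would skip map_read entries so that read-only flavors are never written.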

(ii) I guess every schema needs to choose a canonical export format.

@Treutler
Collaborator

Treutler commented Apr 24, 2019

> (ii) I guess every schema needs to choose a canonical export format.

Agreed. I adjusted the fields in the table so that the first field listed is the canonical export format.
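With that convention, export reduces to writing each common field under the first name listed for it. A minimal Python sketch, assuming the schema keeps the table's order and that the canonical name is a plain top-level field; a canonical sub-field such as "Synon: $:07" would need format-specific writing. SCHEMA and export_fields are hypothetical names.

```python
from typing import Dict, List

# Hypothetical schema: the first name listed for each common field is the
# canonical export name; the others are accepted on import only.
SCHEMA: Dict[str, List[str]] = {
    "name": ["Name", "NAME"],
    "instrument": ["Instrument", "Synon: $:07", "Comments: instrument"],
}


def export_fields(common: Dict[str, str], schema: Dict[str, List[str]]) -> List[str]:
    """Write each available common field as 'Canonical: value', in schema order.
    Assumes the canonical (first) alias is a plain top-level field name."""
    lines: List[str] = []
    for name, aliases in schema.items():
        if name in common and aliases:
            lines.append(f"{aliases[0]}: {common[name]}")
    return lines
```

Import stays tolerant (any listed flavor is accepted) while export is deterministic, so a read/write round trip normalizes a file to the canonical flavor.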


3 participants