MappingSetDataFrame issues when no converter passed #513
Comments
Thank you for the issue, responses will be a bit piecemeal:
Is this after msdf.clean()? If so, is there at least 1 entity in the data frame that has that prefix?
Are you sure the remaining 6 are not the built-in SSSOM prefixes?
Wow, long time!
i. Missing prefix
This happens whether or not I call […].
ii. Built-in SSSOM prefixes
Yep, they are. My preference is not to have them there, but I can understand why we'd want to include them; maybe it is better. How about this: can we consider adding a […]
i. Bug. We need a minimal test for this. Can you prepare one (input file(s) with one row, command used to invoke)?
ii. Can you make a documentation request in https://github.com/mapping-commons/sssom/issues (model repo), asking for clarification on whether built-in prefixes are required to appear in the metadata or just recommended? If it ends up as recommended, I agree with you: we will add your flag.
(i) Will do! FYI, though, I'm not running a command; the problem occurs when using the Python API. Perhaps it can also arise in the CLI. (ii) Will do! (iii) Instantiating […]
Look at the default values:
https://github.com/mapping-commons/sssom-py/blob/master/src/sssom/util.py#L96
By default, it is constructed with a converter! At least, that's how I understand the dataclass syntax.
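For reference, a dataclass field declared with `field(default_factory=...)` is filled by calling the factory at instantiation time, and only when the caller omits that argument. The minimal sketch below uses hypothetical names (`build_default_converter`, `MappingSetWithDefault`) and a plain dict in place of a real converter; it is not the sssom-py code. If the default there is built this way and the factory is expensive, that would be consistent with the cost showing up only when no converter is passed, as in the timings under 1. Performance.

```python
from dataclasses import dataclass, field

# Counts how often the (hypothetical) default factory runs.
CALLS = {"count": 0}


def build_default_converter() -> dict:
    """Hypothetical stand-in for an expensive default (e.g. a large prefix map)."""
    CALLS["count"] += 1
    return {"ex": "https://example.org/"}


@dataclass
class MappingSetWithDefault:
    """Simplified stand-in, not sssom-py's MappingSetDataFrame."""

    metadata: dict
    # The factory is called once per instance, and only if `converter`
    # is not passed explicitly.
    converter: dict = field(default_factory=build_default_converter)


omitted = MappingSetWithDefault(metadata={})  # factory runs
passed = MappingSetWithDefault(metadata={}, converter={"x": "https://x.org/"})  # factory skipped
print(CALLS["count"])  # 1
```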
I don't know enough about dataclasses to be a huge help here, but we know that it is not instantiating and/or using the instantiated […]
Lines 27 to 30 in commit 01965f4
I'd suggest possibly:
Add param: curie_map: Optional[Dict[str, str]] = None
Update:

    return curies.Converter.from_prefix_map(curie_map) if curie_map \
        else curies.chain([_get_built_in_prefix_map(), _get_default_converter()])

Delete: […]
Then it looks like this needs to be updated to pass […]
Unfortunately, we haven't had Harshad for a while now, so we will have to sort this out ourselves. I am not enthusiastic about any of your proposed solutions, and the problem space is too large for me to reason about in the short time I have per issue. Do you agree that by far the most critical issue you mention here is: […]? Can you add a test to https://github.com/mapping-commons/sssom-py/blob/master/tests/test_utils.py that demonstrates the problem?
I understand this isn't something you can tackle deeply right now. We could also do:

    convs = [_get_built_in_prefix_map(), _get_default_converter()]
    if curie_map:
        convs = [curies.Converter.from_prefix_map(curie_map)] + convs
    return curies.chain(convs)

Yep, adding a test is in the OP tasks / on my schedule.
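The proposal above relies on priority order: the user's curie_map first, then the built-ins, then the defaults. The pure-Python sketch below models that ordering with plain dicts. `chain_prefix_maps` and `get_prefix_map` are hypothetical helpers, and the first-wins rule is a simplified assumption; the real `curies.chain` also handles prefix synonyms and may resolve conflicts differently in detail.

```python
from typing import Dict, List, Optional, Set

PrefixMap = Dict[str, str]


def chain_prefix_maps(maps: List[PrefixMap]) -> PrefixMap:
    """Merge prefix maps; earlier maps win on conflicting prefixes or URI prefixes.

    Simplified model for illustration only, not curies.chain itself.
    """
    merged: PrefixMap = {}
    seen_uri_prefixes: Set[str] = set()
    for prefix_map in maps:
        for prefix, uri_prefix in prefix_map.items():
            if prefix in merged or uri_prefix in seen_uri_prefixes:
                continue  # an earlier map already claimed this prefix or URI prefix
            merged[prefix] = uri_prefix
            seen_uri_prefixes.add(uri_prefix)
    return merged


def get_prefix_map(
    curie_map: Optional[PrefixMap],
    built_ins: PrefixMap,
    defaults: PrefixMap,
) -> PrefixMap:
    """Hypothetical helper: user curie_map first, then built-ins, then defaults."""
    maps = [built_ins, defaults]
    if curie_map:
        maps = [curie_map] + maps
    return chain_prefix_maps(maps)


# Illustrative values only (example.org URIs are placeholders).
user_map = {"icd11.foundation": "https://example.org/icd11/"}
built_ins = {"owl": "http://www.w3.org/2002/07/owl#"}
defaults = {"owl": "https://example.org/other-owl#", "GO": "https://example.org/go/"}
# The built-in "owl" entry wins over the default one.
print(get_prefix_map(user_map, built_ins, defaults))
```

With `curie_map=None`, the result falls back to the built-ins chained with the defaults, which mirrors the current default behaviour described in the thread.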
Done!
Built-in SSSOM prefixes
@matentzn Given that these built-ins are not required, do you mind if I make an issue for this suggestion? […]
Edit: Created the issue: […]
Sure, feel free to make an issue. If the PR is super small, we can release it fairly quickly.
Overview
I just noticed several issues that arise when using MappingSetDataFrame and not passing any Converter along with it.
Sub-issues
curie_map
TestWrite.test_missing_entries() #534
Converter
Sub-issue details
1. Performance
msdf = MappingSetDataFrame(converter=converter, df=df, metadata=metadata): takes 0:00:00.000008 seconds
msdf = MappingSetDataFrame(df=df, metadata=metadata): takes 33.206355 seconds
2. Incorrect curie_map
FYI: metadata: icd11.sssom-metadata.yml.zip
2.1. Missing entries
In the curie_map of my metadata dictionary, I have some prefixes (e.g. icd11.foundation) that do not show up in the output file.
2.1.1. Add test
Prepare a test with input file(s) containing one row and the Python code used to invoke it. Possibly add another test that uses a CLI command instead of the Python code, if that is possible.
2.2. Too many entries
Compare the following results.
- […] Converter and passing that to MappingSetDataFrame. I also mentioned this in "UX: Extra entries in curie_map of written TSV" #514. Can be addressed by .clean_prefix_map(): remove_unused_builtins param #537. See example: ordo-icd11.sssom - with converter.tsv.zip
- MappingSetDataFrame, passing in my metadata but no Converter. See example: ordo-icd11.sssom - no converter.tsv.zip
2.2.1. Add test
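A test for these extra entries could check that the written curie_map keeps only the prefixes the mappings actually use. The pure-Python sketch below models that pruning, including the optional dropping of unused built-ins that the remove_unused_builtins param (#537) is meant to address. `prefixes_in_use` and `prune_prefix_map` are hypothetical names, the built-in set is passed in rather than taken from the SSSOM specification, and this is not the sssom-py implementation of `.clean_prefix_map()`.

```python
from typing import Dict, Iterable, Set


def prefixes_in_use(curie_values: Iterable[str]) -> Set[str]:
    """Return the prefix (text before the first ':') of each CURIE."""
    return {value.split(":", 1)[0] for value in curie_values if ":" in value}


def prune_prefix_map(
    prefix_map: Dict[str, str],
    curie_values: Iterable[str],
    built_ins: Set[str],
    remove_unused_builtins: bool = False,
) -> Dict[str, str]:
    """Hypothetical helper: keep used prefixes; keep unused built-ins unless told otherwise."""
    used = prefixes_in_use(curie_values)
    return {
        prefix: uri_prefix
        for prefix, uri_prefix in prefix_map.items()
        if prefix in used or (prefix in built_ins and not remove_unused_builtins)
    }


# Illustrative data: CURIEs as they might appear in subject/predicate/object columns.
prefix_map = {
    "ORDO": "https://example.org/ordo/",
    "icd11.foundation": "https://example.org/icd11/",
    "GO": "https://example.org/go/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
values = ["ORDO:1", "icd11.foundation:2", "skos:exactMatch"]
built_ins = {"owl", "skos"}

kept = prune_prefix_map(prefix_map, values, built_ins)
strict = prune_prefix_map(prefix_map, values, built_ins, remove_unused_builtins=True)

# "GO" is unused and not built-in, so it is always dropped;
# "owl" is an unused built-in, so only the strict variant drops it.
assert "GO" not in kept and "owl" in kept
assert "GO" not in strict and "owl" not in strict
```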
3. UX: Should automatically instantiate Converter
I haven't fully thought through the possible downsides of this, but I recommend that when MappingSetDataFrame is instantiated using only a df and metadata, but no converter, it should instantiate the converter via curies.Converter.from_prefix_map(metadata['curie_map']).
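One way to sketch the recommendation is a dataclass `__post_init__` hook that builds the converter from the metadata's curie_map only when none was passed. Everything below is hypothetical: `PrefixConverter` is a minimal stand-in for `curies.Converter`, a list stands in for the pandas DataFrame, and in the actual proposal the converter would come from `curies.Converter.from_prefix_map(metadata['curie_map'])`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PrefixConverter:
    """Hypothetical minimal stand-in for curies.Converter."""

    prefix_map: Dict[str, str]

    def expand(self, curie: str) -> Optional[str]:
        """Expand 'prefix:local_id' to a URI, or return None if the prefix is unknown."""
        prefix, sep, local_id = curie.partition(":")
        if not sep or prefix not in self.prefix_map:
            return None
        return self.prefix_map[prefix] + local_id


@dataclass
class MappingSetSketch:
    """Hypothetical simplification of the proposed MappingSetDataFrame behaviour."""

    df: List[dict]  # stand-in for a pandas DataFrame
    metadata: Dict
    converter: Optional[PrefixConverter] = None

    def __post_init__(self) -> None:
        # Build a converter from metadata['curie_map'] only when none was passed,
        # instead of always constructing a large default one.
        if self.converter is None:
            self.converter = PrefixConverter(dict(self.metadata.get("curie_map", {})))


msdf = MappingSetSketch(
    df=[], metadata={"curie_map": {"icd11.foundation": "https://example.org/icd11/"}}
)
print(msdf.converter.expand("icd11.foundation:123"))  # https://example.org/icd11/123
```

Using `None` as the default rather than a `default_factory` lets the metadata drive construction; the trade-off is that `converter` is typed `Optional` even though `__post_init__` always fills it.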