KeyError in .reader.xml.v21 with reference to non-existent Concept (INSEE) #205
Hi! Thanks for the report. I can't reproduce it with the code you've given. I see the following locally:
import sdmx
insee = sdmx.Client("INSEE")
flow_msg = insee.dataflow()
insee.dataflow("CNA-PIB-2014", params={"references": "all"})
I also get a 404 / SDMX ErrorMessage if I open this URL in my web browser. Can you please mention the version of Python you're using, and anything else particular to your environment that may cause this difference? Aside from that, if you have the actual query URL or a local or cached copy of the XML response, please share it. That way I can try to reproduce the parsing error, even if the example query doesn't work.
Sorry, first message by hand, it is |
Thanks! I can now reproduce. So the direct URL is https://www.bdm.insee.fr/series/sdmx/dataflow/ALL/CNA-2014-PIB/latest?references=all
Skimming this message, I don't see the referenced concept. For example, I see:
<str:ConceptScheme id="CONCEPTS_INSEE" urn="urn:sdmx:org.sdmx.infomodel.conceptscheme.ConceptScheme=FR1:CONCEPTS_INSEE(1.0)" agencyID="FR1" version="1.0">
<com:Name xml:lang="fr">Concepts Insee</com:Name>
<com:Name xml:lang="en">Insee concepts</com:Name>
<str:Concept id="FREQ" urn="urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=FR1:CONCEPTS_INSEE(1.0).FREQ">
<com:Name xml:lang="fr">Périodicité</com:Name>
<com:Name xml:lang="en">Frequency</com:Name>
</str:Concept>
…
<str:Concept id="SATISFACTION_VIE" urn="urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=FR1:CONCEPTS_INSEE(1.0).SATISFACTION_VIE">
<com:Name xml:lang="fr">Satisfaction dans la vie</com:Name>
<com:Name xml:lang="en">Life satisfaction</com:Name>
</str:Concept>
</str:ConceptScheme>
This is indeed the same "maintainable parent" referred to by:
<Ref id="TIME_PERIOD" maintainableParentID="CONCEPTS_INSEE" maintainableParentVersion="1.0" agencyID="FR1" package="conceptscheme" class="Concept"/>
…but I don't think it contains the referenced item. Am I missing something?
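One way to confirm a dangling reference like this, independently of the sdmx package, is to collect the concept IDs defined in each scheme and compare them with what the Ref elements point to. The sketch below uses only the Python standard library. The fragment is a trimmed-down illustration modelled on the excerpts above, not the actual INSEE response, and the helper name `dangling_refs` is hypothetical.

```python
import xml.etree.ElementTree as ET

# SDMX-ML 2.1 "structure" namespace
STR = "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure"

# Illustrative fragment only: two defined concepts, two references
FRAGMENT = f"""\
<root xmlns:str="{STR}">
  <str:ConceptScheme id="CONCEPTS_INSEE" agencyID="FR1" version="1.0">
    <str:Concept id="FREQ"/>
    <str:Concept id="SATISFACTION_VIE"/>
  </str:ConceptScheme>
  <str:ConceptIdentity>
    <Ref id="FREQ" maintainableParentID="CONCEPTS_INSEE" class="Concept"/>
  </str:ConceptIdentity>
  <str:ConceptIdentity>
    <Ref id="TIME_PERIOD" maintainableParentID="CONCEPTS_INSEE" class="Concept"/>
  </str:ConceptIdentity>
</root>
"""


def dangling_refs(xml_text):
    """Return (scheme ID, concept ID) pairs that are referenced but not defined."""
    root = ET.fromstring(xml_text)

    # Every (scheme, concept) pair actually defined in the message
    known = {
        (scheme.get("id"), concept.get("id"))
        for scheme in root.iter(f"{{{STR}}}ConceptScheme")
        for concept in scheme.findall(f"{{{STR}}}Concept")
    }

    # <Ref> elements carry no namespace here, so match the bare tag name
    missing = []
    for ref in root.iter("Ref"):
        if ref.get("class") != "Concept":
            continue
        key = (ref.get("maintainableParentID"), ref.get("id"))
        if key not in known:
            missing.append(key)
    return missing


print(dangling_refs(FRAGMENT))  # [('CONCEPTS_INSEE', 'TIME_PERIOD')]
```

The fragment parses without complaint; only the explicit cross-check reveals that TIME_PERIOD has no definition in CONCEPTS_INSEE.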
OK, IIUC you're saying that upstream INSEE is missing this referenced item, and that's why it isn't parsed correctly, is that right? Should we patch this ourselves in handle_requests then? And (or just) report upstream so that they fix it? If the concept is missing, add it, then pass to
Yes.
I would suggest doing that, yes. It's probably not intentional.
I'm not 100% sure what I would prefer as a fix. I would say definitely not code that specifically looks for a certain URN from a certain data provider and makes a very targeted correction for a particular, possibly temporary upstream error. I think that is out of scope for a package like this; we can't possibly track every single such case, so it's better not to pretend to try. “If the concept is missing…”—I think something like this is possible.
Does that sound reasonable?
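For illustration, the generic "if the concept is missing…" idea could take roughly this shape: when a reference cannot be resolved, warn loudly and substitute a placeholder rather than raising KeyError, with no provider-specific special cases. This is a minimal sketch in plain Python and not the package's reader code; `Concept`, `resolve_concept` and the dict-based scheme are hypothetical names used only here.

```python
import warnings
from dataclasses import dataclass


@dataclass
class Concept:
    """Hypothetical stand-in for a concept object; not the sdmx model class."""

    id: str
    placeholder: bool = False


def resolve_concept(scheme, scheme_id, concept_id):
    """Look up a concept; on a dangling reference, warn and create a placeholder."""
    try:
        return scheme[concept_id]
    except KeyError:
        warnings.warn(
            f"Concept {concept_id!r} is referenced but not defined in "
            f"{scheme_id!r}; using a placeholder",
            UserWarning,
            stacklevel=2,
        )
        concept = Concept(concept_id, placeholder=True)
        scheme[concept_id] = concept  # later references resolve to the same object
        return concept


# Usage: the scheme defines FREQ only, but TIME_PERIOD is referenced
scheme = {"FREQ": Concept("FREQ")}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    time_period = resolve_concept(scheme, "CONCEPTS_INSEE", "TIME_PERIOD")

print(time_period.placeholder, len(caught))  # True 1
```

Marking the substitute with a `placeholder` flag keeps the message usable while letting callers tell invented concepts apart from ones the provider actually defined.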
I'm going to report the error upstream. Hopefully they will fix it quickly. I'm still curious why the XML data is valid if that's an error (valid in the sense of https://sdmx1.readthedocs.io/en/latest/howto/validate.html). And yes, I like a big warning much better than unusable XML.
I've also opened an issue through the website, will keep you informed.
Essentially, XSD schemas say things like:
In other words, it's largely structural; it isn't really meant to express the logical validity of the contents or meaning of those tags and attributes. So the message we're looking at is structurally fine, but it happens that its contents do not make sense as SDMX-ML. By analogy, imagine an SDMX data set about the average height of some humans: the structure is fine, it is valid SDMX-ML, but one of the observations says "1.56 cm". That would be a clear error or invalidity, but not a structural one that could be identified with XSD.
Over in #207 I discovered that the concept with ID |
Nice to know, I'll add that to the report as soon as I have an answer.
FYI, INSEE confirmed receipt of the report and forwarded it to the right team.
@khaeru This is the answer of the INSEE team:
They don't seem to have understood the issue… maybe they didn't come here and read our discussion? The portion of documentation that they point to says that the DSD has a dimension with id=
The actual problem is that those components (the dimension and the primary measure) explicitly reference, as concept identities, these artefacts:
and these artefacts do not exist. The references are broken. It would be weird and non-standard, but maybe they intend something like the following:
Because either of those would not be in line with the SDMX standards, they should say so precisely in their documentation. If they don't make such declarations it's impossible to guess what they mean.
Using the latest SDMX version:
leads to the following error:
even though I've verified that the schema is valid for this data. I've tried to debug this, but I'm not familiar enough with the codebase to do that properly. From what I tried:
the culprit lines in the XML are
For some reason, during the parsing in Reference, when elem.tag is suffixed with ConceptIdentity, there seems to be no Ref info. On the previous element, TimeDimension is parsed and has a Ref. That doesn't seem to make much sense.