Not a bug: Default namespace doesn't seem to be supported. #137
Hi 👋 Over on Slack, @kescobo was helping me and came up with a good MWE:
julia> test = parsexml("""
<?xml version="1.0" encoding="utf-8"?>
<xbrl
xmlns="http://www.xbrl.org/2003/instance"
xmlns:country="http://xbrl.sec.gov/country/2017-01-31"
xmlns:dei="http://xbrl.sec.gov/dei/2019-01-31"
xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
xmlns:link="http://www.xbrl.org/2003/linkbase"
xmlns:mlic="http://www.metlife.com/20191231"
xmlns:srt="http://fasb.org/srt/2019-01-31"
xmlns:us-gaap="http://fasb.org/us-gaap/2019-01-31"
xmlns:xbrldi="http://xbrl.org/2006/xbrldi"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
</xbrl>""")
EzXML.Document(EzXML.Node(<DOCUMENT_NODE@0x0000000040b48a20>))
julia> findall("/xbrl", test)
0-element Array{EzXML.Node,1}
julia> test2 = parsexml("""
<?xml version="1.0" encoding="utf-8"?>
<xbrl></xbrl>""")
EzXML.Document(EzXML.Node(<DOCUMENT_NODE@0x0000000040b48360>))
julia> findall("/xbrl", test2)
1-element Array{EzXML.Node,1}:
EzXML.Node(<ELEMENT_NODE[xbrl]@0x0000000042f60090>)
As far as I can tell, the former should be a valid XML document. Any idea what is going on? |
function Base.findall(xpath::AbstractString, doc::Document)
return findall(xpath, doc.node)
end
function Base.findall(xpath::AbstractString, doc::Document)
return findall(xpath, doc.root) # i.e. doc.root instead of doc.node
end
Edit: Nevermind. |
A more minimal MWE:
|
Thanks @kescobo 🙌 I think this might be part of the problem:
julia> test = parsexml("""
<?xml version="1.0" encoding="utf-8"?>
<xbrl xmlns="http://www.xbrl.org/2003/instance">
</xbrl>""")
EzXML.Document(EzXML.Node(<DOCUMENT_NODE@0x0000000008298810>))
julia> namespaces(root(test))
1-element Array{Pair{String,String},1}:
"" => "http://www.xbrl.org/2003/instance"
It seems EzXML does not like the empty key 🤔 |
Just for fun:
|
I tried modifying
function Base.findall(xpath::AbstractString, doc::Document, ns=namespaces(doc.node))
    return findall(xpath, doc.node, ns)
end
and then tried
julia> findall("/xbrl", test, namespaces(root(test)))
┌ Warning: ignored the empty prefix for 'http://www.xbrl.org/2003/instance'; expected to be non-empty
└ @ EzXML C:\Users\ericf\.julia\dev\EzXML\src\xpath.jl:85
0-element Array{EzXML.Node,1}
Because the prefix was empty, it gets ignored. That seems to be why we get zero elements from |
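Since the warning says the prefix is expected to be non-empty, one thing that should work is binding the default namespace URI to a made-up prefix and using that prefix in the XPath. A minimal sketch, assuming the single-namespace MWE from above; the prefix `x` is arbitrary and chosen here only for illustration:

```julia
using EzXML

doc = parsexml("""
<?xml version="1.0" encoding="utf-8"?>
<xbrl xmlns="http://www.xbrl.org/2003/instance">
</xbrl>""")

# Namespace URI of the root element (here, the default namespace).
ns = namespace(root(doc))

# Bind that URI to an arbitrary non-empty prefix ("x" is hypothetical)
# and use the prefix in the XPath expression.
nodes = findall("/x:xbrl", root(doc), ["x" => ns])
```

The prefix only has to match between the XPath string and the pair list; it does not need to appear anywhere in the XML itself.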
I think there's something about
And also, it seems to break parsing, every time I do that, all subsequent calls to
|
I think |
Oh, I see... |
Your MWE is namespaced and
julia> namespaces(test.node)
0-element Array{Pair{String,String},1}
so no namespaces are being registered. I think that is why |
From Wikipedia: https://en.wikipedia.org/wiki/XML_namespace
|
It seems like an issue dealing with default namespaces 🤔 |
EzXML apparently uses libxml2, and according to http://xmlsoft.org/namespaces.html default namespaces should be supported. I am probably confused 🤔 |
It works if I remove the default namespace:
julia> test = parsexml("""
<?xml version="1.0" encoding="utf-8"?>
<xbrl
xmlns:country="http://xbrl.sec.gov/country/2017-01-31"
xmlns:dei="http://xbrl.sec.gov/dei/2019-01-31"
xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
xmlns:link="http://www.xbrl.org/2003/linkbase"
xmlns:mlic="http://www.metlife.com/20191231"
xmlns:srt="http://fasb.org/srt/2019-01-31"
xmlns:us-gaap="http://fasb.org/us-gaap/2019-01-31"
xmlns:xbrldi="http://xbrl.org/2006/xbrldi"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
</xbrl>""")
EzXML.Document(EzXML.Node(<DOCUMENT_NODE@0x0000000008349bf0>))
julia> findall("/xbrl", test)
1-element Array{EzXML.Node,1}:
EzXML.Node(<ELEMENT_NODE[xbrl]@0x000000003192c7e0>) |
Ok. I am slowly learning about namespaces. We need a way to register default namespaces. This package is currently ignoring them 🤔 |
This is C#, but the discussion looks relevant: https://docs.microsoft.com/en-us/dotnet/standard/data/xml/xpath-queries-and-namespaces#the-default-namespace
|
RTFM. Sorry for the noise 😔 |
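For anyone landing here later: besides binding a prefix to the default namespace, a namespace-agnostic XPath using `local-name()` is another option when the namespace URI is not known in advance. This is only a sketch against the small MWE, not something taken from the EzXML manual, and the empty-prefix warning may still be printed:

```julia
using EzXML

doc = parsexml("""
<?xml version="1.0" encoding="utf-8"?>
<xbrl xmlns="http://www.xbrl.org/2003/instance">
</xbrl>""")

# Match the top-level element by its local name, ignoring whatever
# namespace it belongs to (standard XPath 1.0 local-name() function).
nodes = findall("/*[local-name()='xbrl']", root(doc))
```

The trade-off is that this matches an `xbrl` element in any namespace, so it is less strict than a prefixed query.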
For discoverability. Related: JuliaIO#137 (which is the only google result for `Warning: ignored the empty prefix for 'http://www.w3.org/2000/svg'; expected to be non-empty`)
Note: I'm on Windows 10, Julia v1.4.0, EzXML v1.1.0
Hi 👋
Thank you for this package 🙏
I am working on a fairly large 45K-line XML file (some of the lines are VERY long) and can't seem to get a basic findall to work. I get my
It seems fine. Then I grab its root:
but then
I am expecting this to give me the root node.
Any idea what I'm doing wrong?
Edit:
This seems to work:
Edit^2: If it helps, here is the XML file:
https://www.sec.gov/Archives/edgar/data/937834/000093783420000005/mlic-12312019x10kdocum_htm.xml