-
Notifications
You must be signed in to change notification settings - Fork 225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extension handling and refactor #105
Conversation
removed get_children, get_node_data, get_node_attribute from abstraction layer and put code directly in gpxfield parser
Removed extra checking from get_first_child. Updated travis to only run tests once (just with coveralls)
Completely remove XMLParser and consolidate some code in GPXParser
Std Lib doesn't read in comments and doesn't support the remove_comments parser option. Std Lib also doesn't support QName methods the same as lxml.
Fixed a circular import on older versions of python, expected to fail unittests with 2.x and LXML
This function seems to do nothing of consequence. Also adding htmlcov (the coverage html folder) to .gitignore, and a few pycodestyle fixes in parser.py.
Added a new __library function to provide convenient access to whether LXML is loaded or not. The LXML status should only be needed when running the test suite.
Best guess for Python 4+ is that strings will remain unicode
removed deprecated getchildren() call by using node.find().
Numerous changes here: changed extensions to list of ETrees added serialization of extensions changed namespace handling to properly prefix and search removed __hash__ functions on mutable objects (makes no sense and broken with extensions) removed tests using hash()
OK now... time for a successful build :(
ElementTree in 2.6 does not support register_namespaces, which is a key part of namespace handing for prefixes. No one should be writing new code on 2.6 anyway...
This reverts commit f607dd7.
Added attribute serialization for ETree nodes
Updated several serialization routines and changed handling for empty containter tags like <link></link>. Laid out some of the framework for prettyprint
Finished adding pretty print handling for serialization.
Added unittests for each extension spot, initialized all extensions as lists instead of None. Code formatting cleanups throughout.
changed to set individual attribs as key, value pairs
Changed default output version to 1.1, added a unittest to verify no extensions are written in 1.0, found a small bug in the Email handling.
Accidentally left work in progress
""" | ||
Parse the XML and return a GPX object. | ||
|
||
Args: | ||
version: str or None indicating the GPX Schema to use. | ||
Options are '1.0', '1.1' and None. When version is None | ||
the version is read from the file or falls back on 1.0. | ||
|
||
the version is read from the file or falls back on 1.0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the change it now falls back on 1.1.
I would like to support this pull request and I suggest pulling it in before #102. The refactorings here allow for cleaner re-write of #102:
Furthermore I think that the removal of abstraction layers that became obsolete when removing minidom will make it easier for newcomers to contribute. Well done @nawagers! |
@tkrajina Any estimate when you'll have a chance to look at this? Would it help if it was broken into many smaller PRs? Do you want another person to help maintain the package by reviewing and approving PRs? |
I've been working with this branch and I didn't find any problem (a little bit faster with lxml but not impressive) |
Glad to hear it's working for you. |
@nawagers assuming the email in your commits is correct, I sent you an email now. |
Working for me. Just a note that the test to confirm the extensions children contents is not fleshed out. |
Merged in dev, will be included in the next release. |
Here is a rather big pull request. It will fix #73 #81 #98 in addition to a big step towards pr #102.
@tkrajina and @gluap, have a look
Changes in dev:
and (c)ElementTree from the Standard Library. The only good reason to keep
minidom is to have StdLib solution, but StdLib's ElementTree package API is
nearly the same as lxml. The default parser is now lxml and falls back to
cElementTree and ElementTree.
option no longer makes sense and was removed. This may break old code.
wrapped to switch between minidom and eTree. The functions have all be
written back to their native eTree calls and in some cases, the logic was
cleaned up to be more Pythonic. Functionality from XMLParser, LXMLParser and
GPXParser have all been condensed down to GPXParser.
support minidom, parsing speed improved a fair amount for lxml. More
interesting is that cElementTree runs through all the unittests 2-3x faster
than lxml. lxml is generally faster than the StdLib for serialization, but
gpxpy has it's own serialization routines since it's not in an ElementTree
structure. Moving forward, consider dropping lxml completely. Someone should
do so benchmarking on vastly different platforms (webservers, desktops).
with tags as keys. This made the assumption that there was only one
extension, and that it was only 1 level deep, and no interesting information
was in xml attributes. The schema supports nearly arbitrary xml and multiple
extensions. The easiest way to represent this is a list of Element (and
their subElements). It now parses and serializes in this way.
read/wrote invalid GPX 1.1 when handling extensions. Extensions must live in
their own namespace. GPXParser now uses Regular Expressions to read in all
the namespace declarations and save them to the nsmap dictionary. In nsmap,
'defaultns' is the GPX namespace. This is read in on parsing, but
overwritten on serialization. During serialization, the xsi namespace is
also added. On serialization of extensions, the Clark notation is converted
back to a prefix automatically. Anyone building an extension instead of
parsing one just needs to add the prefix/URI to the dictionary.
StdLib ElementTree is poor and was even worse in 2.6, making it
incompatible. gpxpy on Python 2.6 still works with lxml, but the CI testing
was dropped. No one should be writing new code on 2.6 anyway.
parser, so the tests need 2 environments: with and without lxml. The new
environment variable XMLPARSER is set and installs lxml appropriately. For
python 3.2, lxml is pinned to version 3.6, the last supported version. A
function was added to GPXParser to return which implementation is used. A
unittest was added to check that lxml loads properly when running in the
LXML environment. Unittests were being run twice, once inside and once
outside of the coverage container. The extra run was removed, also pep8 was
changed to pycodestyle (new name).
even if there was no data in subelements. These fields were defined by the
*POINT_FIELDS and were just stored as the text of the tag, like 'copyright'.
The tags are now aware of the subelements. If none of the dependents are
there, all fields are suppressed until the close of that tag. For example,
'copyright' depends on copyright_author, copyright_year, and
copyright_license. The new string is
'copyright:copyright_author:copyright_year:copyright_license'. Specifically
required subelements should be prefaced by an '@', like 'link' requires a
subelement named 'link' so the new string is
link:@link
. Currently '@' isstripped, but enforcement isn't implemented yet.
__hash__
dunder should not be replaced for mutableobjects. The default behavior if
__hash__
is written such that hash(A) ==hash(B) only when A is B and not necessarily when A == B. Implementing
__hash__
on mutable objects will break the usage on a number of StdLibcollections. All
__hash__
functions have been removed. Unittests for thehash functions have also been removed.
properly namespaced extensions (required by schema).
extensions feature multiple extensions, depth, namespaced attributes, text
and tail text.
extension syntax. A new test was added to read a complex extension from xml
and verify it reads. Another test builds a minimal gpx from scratch, puts a
new extension in each field, serializes the gpx, parses it back, and
verifies each field. An additional test adds several extensions, then
serializes with version 1.0 to verify that no extensions are written.
the .format() function instead of the % operator. This was changed in many
places. Long string concatenation should be done by appending to a list and
then using ''.join(). Strings are immutable, so a new string is created for
every += operation.
to_xml functions have a nsmap argument that defaults to None. Also added was
prettyprint and indent. These default to True and '' respectively and
implement prettyprinting by increasing the indent for every nested element.
1.1. If the gpx is written in 1.0, all extension data will be lost. Since
it's better to preserve data, changing default version to 1.1. Version 1.1
probably more widely supported also.