format-xml: Add a script for pretty-printing our XML source #432

Closed
wants to merge 1 commit into from

Conversation

wking
Contributor

@wking wking commented Aug 3, 2017

Using Python, because I'm familiar with it, and I've floated a Python script for spdx-spec in spdx/spdx-spec#10. I'm not sure which tools/languages other license-list maintainers are familiar with.

Also using LXML, since it's popular for XML editing within Python.

If you run this script on our current master, you get a lot of changes:

$ ./bin/format-xml.py src
$ git --no-pager diff --shortstat
 270 files changed, 16593 insertions(+), 16945 deletions(-)

I've left those changes off this PR for now while we discuss the feasibility of this approach, but I can push them up publicly somewhere else if folks want to browse them without running the script locally.

This is laying the ground-work for automatically manipulating our XML, e.g. converting to <bullet> (#414).

@zvr
Member

zvr commented Aug 3, 2017

:-)
I've written a similar one myself.

But:

  • let's separate the "inline" from the "block" tags
  • you should get rid of all whitespace formatting on input
  • maximum line length is nice
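One way to read the first point is sketched below, as an illustration rather than either author's actual code. It swaps in the standard-library `xml.etree.ElementTree` for LXML so it runs without extra dependencies. The `BLOCK_TAGS` and `INLINE_TAGS` sets and the `indent_blocks` helper are hypothetical; a real mapping would have to come from the schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag sets for illustration only; not taken from the real schema.
BLOCK_TAGS = {"license", "text", "p", "list", "item"}
INLINE_TAGS = {"bullet", "optional", "alt"}


def is_block(element):
    """Return True if the element should start on its own line."""
    return element.tag in BLOCK_TAGS


def indent_blocks(element, level=0, step="  "):
    """Indent block children; leave elements with inline content untouched."""
    children = list(element)
    if not children or not all(is_block(child) for child in children):
        return  # leaf element or mixed inline content: keep the text flow
    if (element.text or "").strip():
        return  # real text before the first child: do not overwrite it
    if any((child.tail or "").strip() for child in children):
        return  # real text between children: do not overwrite it
    element.text = "\n" + step * (level + 1)
    for child in children:
        indent_blocks(child, level + 1, step)
        child.tail = "\n" + step * (level + 1)
    # The last child is followed by the parent's closing tag, one level out.
    children[-1].tail = "\n" + step * level


root = ET.fromstring(
    "<license><p>Some <bullet>1.</bullet> text</p><p>More</p></license>"
)
indent_blocks(root)
formatted = ET.tostring(root, encoding="unicode")
print(formatted)
```

Only whitespace-only gaps between block elements are rewritten, so text inside a paragraph that contains inline tags is not reflowed by this step.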

@wking
Contributor Author

wking commented Aug 3, 2017 via email

@wking wking mentioned this pull request Aug 3, 2017
@zvr
Member

zvr commented Aug 3, 2017

very quickly, because I have to get ready for a flight in a few hours.

  • yes, manual mapping of tags to the inline/block set; the built-in pretty_print was not enough, so I re-wrote it
  • you actually want something like re.sub(r'\s+', ' ', txt)
  • once the indent is calculated, the text is formatted to maxlength and lines filled, like fmt(1)
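The last two points can be sketched with the standard library alone. This is a minimal illustration, not the code from either script: `normalize_whitespace` and `fill_text` are hypothetical names, the 80-column default is an assumption, and the extra `.strip()` goes beyond the `re.sub` call quoted above.

```python
import re
import textwrap


def normalize_whitespace(text):
    """Collapse every run of whitespace (including newlines) to one space."""
    # The .strip() is an addition to the re.sub call quoted above.
    return re.sub(r"\s+", " ", text).strip()


def fill_text(text, indent_level, step="  ", max_length=80):
    """Re-fill text to max_length columns at a given indent, like fmt(1)."""
    indent = step * indent_level
    return textwrap.fill(
        normalize_whitespace(text),
        width=max_length,  # the width includes the indent
        initial_indent=indent,
        subsequent_indent=indent,
    )


sample = "one   two\n\tthree    four"
wrapped = fill_text(sample, indent_level=1, max_length=10)
print(wrapped)
```

Because the input whitespace is discarded before filling, the output depends only on the words, the indent, and the maximum length, which is what makes the formatting reproducible across editors.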

of course, as you say, none of this should be a blocking point, and I'd be happy to do another iteration later, when I get back

@wking
Contributor Author

wking commented Aug 3, 2017 via email

@jlovejoy
Member

we'll address this in a later release, leaving issue open for discussion then.

@wking wking mentioned this pull request Oct 18, 2017
@wking
Contributor Author

wking commented Dec 14, 2017

@zvr, have you had time to look over this or push forward with your alternative? The current trailing whitespace issues (#410) are leading to conflicts between in-flight fixups like #496 (which removed trailing whitespace here) and #514 (which makes a semantic change to that line). Basically, some contributors seem to have editors that clean up whitespace issues, and unless they go to the extra trouble of preserving those existing issues, there are likely to be more of these unnecessary merge conflicts.

@zvr
Member

zvr commented Dec 15, 2017

Yes, the tool is ok and I've been using it for the licenses I am adding.
For typical output, take a look at #472 or #470 ...

@wking
Contributor Author

wking commented Dec 15, 2017

Yes, the tool is ok and I've been using it...

Can you PR it so we can fix all the other licenses? I'd rather get that churn out of the way, and automated formatting would mean one less thing to manually review.

@goneall
Member

goneall commented Dec 15, 2017

Can you PR it so we can fix all the other licenses? I'd rather get that churn out of the way, and automated formatting would mean one less thing to manually review.

Are we going to require all contributors to use the same pretty printing tool? Seems like there is going to be a lot of churn if we use this for all licenses. Not sure this is something we should be doing right before the release.

@wking
Contributor Author

wking commented Dec 15, 2017 via email

@goneall
Member

goneall commented Dec 15, 2017

And churning before the release means that diffing between this release and the next will be much easier.

Good point. If we're going to pretty print all the XML, this would be the time to do it. I suggest doing it after we merge most of the other PRs, to avoid rebasing.

With all of the recent PRs, I'm expecting some changes once we test the XML files using the license generator tool. I'll want to coordinate my fixes either before or after pretty printing.

@wking
Contributor Author

wking commented Dec 15, 2017 via email

This was referenced Dec 17, 2017
@wking
Contributor Author

wking commented Dec 29, 2017 via email

@wking
Contributor Author

wking commented Mar 26, 2018

@zvr, still no PR for your pretty-printer? In the absence of that PR, can we land this one so we can get mostly auto-formatted XML? There are still very few PRs in flight that touch existing licenses, and if we are really concerned about merge conflicts I'm fine rolling out the auto-formatting in stages to work around any existing PRs.

@zvr
Member

zvr commented Mar 26, 2018

I have published the pretty printer in https://github.com/zvr/xmlindent -- last touched when the schema changed 5 months ago.
I admit I don't remember the current state, but I used it on the massive update we did back then.

@goneall
Member

goneall commented Dec 13, 2018

Discussed on the legal call on 13 Dec 2018. The online tools will pretty print edited XML files before the pull request is opened, so this should not be needed in the normal workflow. For other workflows, we would encourage submitters to pretty print before submitting. Pretty printing after submission may make it harder to identify change history in git. Decision made to close / reject the PR.

@goneall goneall closed this Dec 13, 2018