-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Makefile: Add a Makefile and build script #22
Conversation
Could you document or propose changes to the license-list-XML travis files to use these when there is a PR or commit? For example, do we clone this repo into the license-list-XML working directory? |
On Tue, Jan 02, 2018 at 03:26:16PM -0800, goneall wrote:
Could you document or propose changes to the license-list-XML travis
files to use these when there is a PR or commit? For example, do we
clone this repo into the license-list-XML working directory?
To minimize the number of balls in the air, I've punted on
auto-publish for now. I can fill that in (with follow-up PRs?) if you
like. I'd like the eventual license-list-XML .travis.yml to look
something like:
matrix:
include:
- language: java
before_install:
- git clone --depth 1 https://github.com/goneall/license-test-files
- git clone --depth 1 https://github.com/spdx/license-list-data
script:
- 'if [[ master = "${TRAVIS_BRANCH}" ]]; then make -C license-list-data LICENSE_XML=../ TEST_DATA=../license-test-files publish; else make TEST_DATA=license-test-files validate-canonical-match; fi'
- language: node_js
node_js:
- "node"
|
Based on the script in-flight with [1], but using an external test-data directory. This script does not cover fetching those directories, but it does cover the build process once they're fetched. I didn't build the JAR myself, so I'm just hashing the JAR that Bintray is currently returning and assuming that is uncompromised. Docs on $@, $<, and $* in [2]. The key in goneall.gpg is 0xEA760AF7CBD0E4F9013F06FB8CCBE63DE91BFF1C, imported with: $ gpg --search 0x8CCBE63DE91BFF1C gpg: data source: http://5.45.108.219:11371 (1) Gary O'Neall (Key for signing jar files) <gary@sourceauditor.com> 4096 bit RSA key 8CCBE63DE91BFF1C, created: 2017-12-13 Keys 1-1 of 1 for "0x8CCBE63DE91BFF1C". Enter number(s), N)ext, or Q)uit > 1 gpg: key 8CCBE63DE91BFF1C: public key "Gary O'Neall (Key for signing jar files) <gary@sourceauditor.com>" imported gpg: Total number processed: 1 gpg: imported: 1 The source keyserver is from hkp://pool.sks-keyservers.net. I then exported the key with: $ gpg --export 0x8CCBE63DE91BFF1C >goneall.gpg When exported without ASCII armor, such keys can be used as a keyring. The --no-default-keyring option disables the user's default keyring (if any. There probably won't be one on Travis). And the --keyring ./goneall.gpg slots in our local file. We need the ./ (at least with GnuPG 2.1.20), because gpg2(1) has: If the filename does not contain a slash, it is assumed to be in the GnuPG home directory ("~/.gnupg" if --homedir or $GNUPGHOME is not used). The output may still include: gpg: WARNING: This key is not certified with a trusted signature! gpg: There is no indication that the signature belongs to the owner. but that's ok, because gpg still exits zero. And because of our keyring manipulation, we know the key is trusted. Signatures from any other key will instead show: gpg: Can't check signature: No public key and exit 2. [1]: spdx/license-list-XML#592 [2]: https://www.gnu.org/software/make/manual/html_node/Automatic-Variables.html
I like the makefile approach. I would like to move this into the license-list-XML repo and reference the local test data. Since the script version already has this working and most of your recommendations have been implemented, I would like to roll out the scripts first then follow-up with an improved makefile implementation. I would also like to implement an incremental build functionality. It takes a long time to process all of the license list. We could detect which files were modified since the last build and enhance the SPDX tools to take a list of license-listXML files to be "compiled". Using Make to orchestrate this would be a lot easier than implementing in a script. |
I'd rather not, for the reasons given here and in spdx/license-test-files#6. Managing separate checkouts (as I do in spdx/license-list-XML#593) isn't hard, and could be automated with scripts/submodules. I think that's worth it to stay dry.
This PR is also implemented ;).
The scripts still auto-tag releases, and I think that's a mistake. Also, I don't see a hurry to auto-publish. spdx/license-list-XML#593 has enough to auto-validate without pulling in output-format-specific code. We can circle back and auto-publish in follow-up work. |
I commented out the tagging for the releases. Once we auto-publish and test, I no longer need to manually run the scripts myself. This saves us time and allows me to go on vacations ;) In terms of the repositories for the tests, I don't think you and are are going to agree. I do think putting the tests in a repository separate from the source XML files places a burden on contributors and maintainers of the license-list-XML repo with little or no practical benefit. As an example, the recent PR to change the test for the Bittorrent-1.0 license would cause a failure for the next commit to the license list XML. The commit would likely be unrelated to Bittorrent. It would be confusing for the committer. Since the submittal is in a different repository, it won't be very obvious what caused the error. To fix the problem, the license XML would have to be updated. Since they are in separate repositories, you would have to coordinate the submittals to avoid test failures. If you submit the license XML first, it may fail since it doesn't match the test. If you submit the test first you have the problem describe above. As an alternative, if we maintained the tests and the license XML in the same repository you could make one commit that included both the test update and license XML update. This would logically group the changes together. Make the changes atomic. Make it easier to revert and keep consistent. |
On Wed, Jan 03, 2018 at 05:05:38PM +0000, goneall wrote:
Once we auto-publish and test, I no longer need to manually run the
scripts myself. This saves us time and allows me to go on vacations
;)
I think everyone's on board with auto-testing as soon as possible.
But I don't see a problem with publishing lagging a month or so if you
happen to be on vacation. If license-list-XML expects to tag a
release, I expect we can either negotiate the timing to avoid your
vacations or grant other folks write access and have them push the
release tag instead.
In terms of the repositories for the tests, I don't think you and
are are going to agree.
Is there some middle ground where we can agree? Would you accept a
submodule entry in license-list-XML to include the test data?
I do think putting the tests in a repository separate from the
source XML files places a burden on contributors and maintainers of
the license-list-XML repo with little or no practical benefit.
And I think the burden is the other way around ;). But I guess we can
wait and see how your way plays out.
As an example, the recent PR to change the test for the
Bittorrent-1.0 license would cause a failure for the next commit to
the license list XML.
No it wouldn't, because license-list-XML is pinning a specific
test-data commit [1].
Since they are in separate repositories, you would have to
coordinate the submittals to avoid test failures.
With separate repositories, you'd still need to coordinate the *order*
of license-list-XML landings (at least, if you wanted to stay with
commits that were in the test-data master branch). For example, if
the test-data repo landed a BitTorrent-1.0 fix and then a GPL-2.0 fix
(for example), license-list-XML would have to merge their
BitTorrent-1.0 PR before they merged thir GPL-2.0 fix. I don't think
that's a large cost.
Make the changes atomic.
They'd still be atomic, since a single license-list-XML commit would
bump the pinned test-data commit hash and update the XML.
[1]: https://github.com/spdx/license-list-XML/pull/593/files#diff-354f30a63fb0907d4ad57269548329e3R6
|
This would solve some of my concerns by making the changes and history more visible. Would this still require submitting the test file and license file in separate repos? If I were to make a change that required a new or updated test along with the XML file, I would like to be able to make that change in a single commit. |
@wking separate question and topic: If I were to change the interface to the RdfaLicenseGenerator tool to take in specific file patterns in place of a directory name, would you be able to update the makefile to only compile the files which have changed since the last time the license-list-data files were generated? I am thinking the incremental builds would greatly improve the performance of the license checks and could also be used to generate the output files if we take the additional step of deleting output files on deletion of the associated XML file. |
On Wed, Jan 03, 2018 at 12:02:27PM -0800, goneall wrote:
Would this still require submitting the test file and license file
in separate repos? If I were to make a change that required a new
or updated test along with the XML file, I would like to be able to
make that change in a single commit.
You'd still need two commits:
* One like [1] to license-test-files to update the test data.
* Another like [2] to update the XML and bump the submodule.
However, with the spdx/license-list-XML#492 approach, you *already*
have those same two commits. Unless you drop [1], in which case folks
need to pull in all of license-list-XML if they want access to this
test data. The submodule approach (or the fairly similar
commit-pinning from [3]) is giving you the ability to recycle [1] when
you're working up [2] instead of requiring you to repeat [1] like you
did in [4].
[1]: spdx/license-test-files@7771d37
[2]: wking/license-list-XML@86c2af5
[3]: spdx/license-list-XML@992d62e
[4]: spdx/license-list-XML@2c1c392
|
On Wed, Jan 03, 2018 at 08:16:32PM +0000, goneall wrote:
If I were to change the interface to the RdfaLicenseGenerator tool
to take in specific file patterns in place of a directory name,
would you be able to update the makefile to only compile the files
which have changed since the last time the license-list-data files
were generated?
The cleanest rule for something like this would be to use static
pattern rules [1], wildcard [2], substitution references [3], and
shell [4]:
LICENSE_PATHS = $(wildcard $(LICENSE_XML)/src/*.xml)
LICENSES = $(LICENSE_PATHS:$(LICENSE_XML)/src/%.xml=%)
VERSION = $(shell git -C "${LICENSE_XML}" describe --always || echo 'UNKNOWN')
DATE = $(shell date -I)
text/%.txt: $(LICENSE_XML)/src/%.xml spdx-tools-$(TOOL_VERSION).jar-valid resources/licenses-full.json
java -jar spdx-tools-$(TOOL_VERSION).jar LicenseTextGenerator $< $@ $(VERSION) $(DATE)
… additional rules for the other output types …
I am thinking the incremental builds would greatly improve the
performance of the license checks and could also be used to generate
the output files…
How often do you build? While efficient build tooling is nice, the
current build only takes ~5min [5].
… if we take the additional step of deleting output files on
deletion of the associated XML file.
The make rule I floated above is using mtimes to determine file
updates. Using Git to determine file removals would require also
tracking the last-built commit. That's possible (especially if it's
always “the current commit's first parent”), but I'm not sure it's
worth the trouble to trim a 5min build.
[1]: https://www.gnu.org/software/make/manual/html_node/Static-Usage.html
[2]: https://www.gnu.org/software/make/manual/html_node/Wildcard-Function.html
[3]: https://www.gnu.org/software/make/manual/html_node/Substitution-Refs.html
[4]: https://www.gnu.org/software/make/manual/html_node/Shell-Function.html
[5]: https://travis-ci.org/spdx/license-list-XML/builds/324695123
|
TOOLS=${TOOLS:-spdx-tools-*.jar} | ||
LICENSE_XML=${LICENSE_XML:-../license-list-XML} | ||
TEST_DATA=${TEST_DATA:-../license-test-files/simpleForGenerator} | ||
EXPECTED_WARNINGS="\"Duplicates licenses: AGPL-3.0-or-later, AGPL-3.0-only\",\"Duplicates licenses: GFDL-1.1-or-later, GFDL-1.1-only\",\"Duplicates licenses: GFDL-1.2-or-later, GFDL-1.2-only\",\"Duplicates licenses: GFDL-1.3-or-later, GFDL-1.3-only\",\"Duplicates licenses: GPL-1.0-or-later, GPL-1.0-only\",\"Duplicates licenses: GPL-2.0-or-later, GPL-2.0-only\",\"Duplicates licenses: GPL-3.0-or-later, GPL-3.0-only\",\"Duplicates licenses: LGPL-2.0-or-later, LGPL-2.0-only\",\"Duplicates licenses: LGPL-2.1-or-later, LGPL-2.1-only\",\"Duplicates licenses: LGPL-3.0-or-later, LGPL-3.0-only\",\"Duplicates licenses: MPL-2.0, MPL-2.0-no-copyleft-exception\"" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll be able to drop this line if spdx/license-list-XML#593 lands with its expected-warnings
file.
OK - Let's leave it as is. |
I'm still of the opinion to store the test files in the license-list-XML repository. I'm OK re-opening the discussion once we decide on the format of the test repository. Jack is still working out the proposal. As far as keeping the 2 repositories in sync, we could exclude the canonical text tests from the license list test data repo, or, we could automatically update the test data repo when the license-list-XML canonical text is updated using the same Travis-CI scripts which test the license XML files. |
On Fri, Jan 05, 2018 at 12:42:53AM +0000, goneall wrote:
As far as keeping the 2 repositories in sync, we could exclude the
canonical text tests from the license list test data repo…
That gets around the issue somewhat for canonical/original text. But
either the alt/optional sections will remain unexercised in
license-list-XML or *all* of the test data (including other variants)
will have to go into license-list-XML.
… or, we could automatically update the test data repo when the
license-list-XML canonical text is updated using the same Travis-CI
scripts which test the license XML files.
If we end up with all the test data (including variants) in
license-list-XML, I don't see a point to maintaining this repository.
Folks who need the test data can just pull license-list-XML. I think
we're missing an opportunity for a useful separation of concerns, but
I can't think of any reasons beyond those I've already mentioned.
|
I do agree with this point - let's revisit this once we have the test repository fully up and running. We can always switch over to a different approach. In the tools, I have a much more robust testing framework which uses positive and negative tests. The only thing missing is a consensus on the file structure for the tests. |
MIT License Copyright (c) [year] [fullname] Permission is hereby granted, free of charge, to any person obtaining a copy The above copyright notice and this permission notice shall be included in all THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
Update README examples
Since we have the testing up and running in the license-list-XML repo, closing this PR |
Based on the script in-flight with spdx/license-list-XML#592, but using an external test-data directory. This script does not cover fetching those directories, but it does cover the build process once they're fetched. I didn't build the JAR myself, so I'm just hashing the JAR that Bintray is currently returning and assuming that is uncompromised.
Docs on
$@
,$<
, and$*
are here.This pull request is an alternative to #21, we'll want one or the other.