-
-
Notifications
You must be signed in to change notification settings - Fork 566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get valid yaml output #3220
Get valid yaml output #3220
Conversation
We need to find a better way in saneyaml IMHO |
License texts were producing invalid YAML files when there were different indentations in the text along with empty newlines, and to make these texts YAML safe we add a single space to empty newlines to license texts with different indentation levels. Reference: #3219 Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
3003abb
to
a21dfd9
Compare
@pombredanne License texts were producing invalid YAML files when there were So we update only those which were producing invalid YAML, so we are updating only a small number of files, when compared to updating all files with empty newlines. |
a21dfd9
to
d241168
Compare
* Modify License.dump() to dump yaml safe text * Add tests to verify yaml output validity * Add checks to License.validate() for yaml validity of text Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
d241168
to
a970b20
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AyanSinhaMahapatra Looks good! I think there should be a new entry in the changelog for this fix.
On another note, I will also have to be careful not to remove those spaces on the newlines when I open the license files you edited, as I have a vscode option enabled that removes spaces from newlines on save.
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! see some comments inline for your consideration.
You also have a failiing test: "FAILED tests/formattedcode/test_output_yaml.py::test_scan_produces_valid_yaml" |
f19a27d
to
607497d
Compare
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
607497d
to
4b2b723
Compare
@pombredanne all green, ready for your review again! |
src/licensedcode/cache.py
Outdated
@@ -130,32 +130,32 @@ def load_or_build( | |||
|
|||
lock_file = os.path.join(scancode_cache_dir, LICENSE_LOCKFILE_NAME) | |||
|
|||
additional_directories = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed this more and I think we have these states:
- index loading of a prebuilt index: we should not do any validation on rules and licenses. This is what happens on https://github.com/nexB/scancode-toolkit/pull/3220/files#diff-51085e58e72e83ba456876b9fd68b26c1e343417de07bb82a29c3e49ce4c7d45R112 alright
2.1. index building with validation: there we MUST do the validation inside a lock. Otherwise we would have a possible scan that starts on 5 processes and there is no index. Each of the 5 processes would do the validation separately. There you need to increase the LICENSE_INDEX_LOCK_TIMEOUT value in https://github.com/nexB/scancode-toolkit/pull/3220/files#diff-51085e58e72e83ba456876b9fd68b26c1e343417de07bb82a29c3e49ce4c7d45R32 this is 4 minutes for now. We should increase to 6 minutes
2.2 index building and validation separated: in this alternative approach we would separate validation in a new CLI command, and index building WOULD not require validation and may fail, and we would have to assume that validation is done on request beforehand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there we MUST do the validation inside a lock. Otherwise we would have a possible scan that starts on 5 processes and there is no index. Each of the 5 processes would do the validation separately.
Ah! I understand now, thanks for the explanation @pombredanne , I've reverted back to validating inside the lock, we still don't validate the YAML load/dump here anyway so this should be fine.
index building and validation separated:
Seems more complicated, confusing and would require more changes. I like the process we have now much more. Unless this has some good benifits it seems we should continue using our current approach?
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
876d45d
to
0b3338a
Compare
@@ -805,6 +837,7 @@ def get_rules( | |||
licenses_data_dir=licenses_data_dir, | |||
rules_data_dir=rules_data_dir, | |||
validate=False, | |||
validate_thorough=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any case where it makes sense NOT to validate thorough now that we have an increased timeout? Otherwise explain when we do thorough vs. non thorough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not make sense to do this expensive validation process every time we reindex/create index as this increases the time. We are instead choosing to do this only while testing and this should be sufficient to prevent having license texts that would produce invalid YAML.
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
@pombredanne addressed comments, ready for review again! |
@@ -100,3 +100,11 @@ reindex the licenses without these licenses from the additional plugins. | |||
|
|||
Rebuild the license index including texts all languages (and not only | |||
English) and exit. This is an EXPERIMENTAL option. | |||
|
|||
|
|||
``--load-dump`` Option |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 👍 Thanks
Reference: aboutcode-org/saneyaml#9
Fixes #3219
Tasks
Run tests locally to check for errors.