Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sec38 div issues #8000

Merged
merged 37 commits into from
Aug 16, 2021
Merged

Sec38 div issues #8000

merged 37 commits into from
Aug 16, 2021

Conversation

poikilotherm
Copy link
Contributor

@poikilotherm poikilotherm commented Jul 14, 2021

What this PR does / why we need it:

This PR fixes several security reportings. Please see linked issue for details.

Which issue(s) this PR closes:

Closes IQSS/dataverse-security#38

Special notes for your reviewer:
There are a lot of moving parts in this one. This is due to the fact that it's moving a lot of Java XML tech in the background by updating old deps.
I tried to add as much context information to commit messages for the individual pieces as possible.

Starting with the XML stuff, it was a bit like Domino: stones started to fall once the first had been pushed.

  • Changed XML printing in Java made tests fail due to whitespace and hard String compare (avoided by using XMLUnit). Required lots of tests using XML to be adapted.
  • While doing this, stumbled over MANY of these tests silently relying on BrandingUtil unit test been run before, failing when executed on their own. Refactored LOADS of tests (not just the ones I absolutely had too) to be more atomic. Not perfect but better.
  • All of this was a good chance to move the tests to JUnit5 and create loads of parameterized tests.
  • These refactorings to literally unblock me cause a lot of noise in this PR. I'm sorry - but usually fixed tests are a good thing, aren't they? BTW removed lots of noise from test outputs...

Suggestions on how to test this:
Should be all set with our usual unit and integration tests.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Nope.

Is there a release notes update needed for this change?:
Nope.

Additional documentation:
None yet.

The update to Passay 1.6.0 required adapting to the change Java API.

IQSS/dataverse-security#38
The password validator test spit out an error. To identify the
failing example, the test was ported to JUnit5 parameterized test.

The test example has been changed, as the library detects another error.
The error message has been validated and found correct.

IQSS/dataverse-security#38
@poikilotherm poikilotherm self-assigned this Jul 14, 2021
@coveralls
Copy link

coveralls commented Jul 14, 2021

Coverage Status

Coverage increased (+0.002%) to 19.135% when pulling ca21f73 on poikilotherm:sec38-div-issues into 8c4d651 on IQSS:develop.

This fixes an potential security issue due to an outdated dependency
(commons-beanutils).

The test needed to be fixed and revealed that formerly email addresses
with trailing whitespace were accepted. The update solved this error,
too.

IQSS/dataverse-security#38
@qqmyers
Copy link
Member

qqmyers commented Jul 14, 2021

FWIW - I was doing some light testing yesterday with the 1.27 tika and haven't found any problems.

poikilotherm and others added 21 commits July 16, 2021 16:48
The old version 3.1.1 had issues with updated deps using
classes in version 55 (Java 11). See https://issues.apache.org/jira/browse/MDEP-613
(Although going for 3.1.2 would be sufficient, better go
for recent version)
…rse-security#38

The XOM library had been changed to com.iom7.xom before, trying to
achieve updated dependencies. It had been declared as a direct
dependency, which it is not - the library is nowhere used in the
Dataverse codebase. Instead, it is used by sword2-server and others.

As the original authors of XOM added never versions, so this commit
removes the direct dependency and adds a newer version to
<dependencyManagement>.

This has a sideeffect: now the xercesImpl library version is defined by
Apache Tika (depending on it) instead of Xom (which is a sub-dependency,
making the relying version there loose against Tika).

This update of xercesImpl is necessary as the old version (the code used
<=2.8.0) contains security issues fixed in 2.12.1, which Tika depends
on.

It does not seem necessary to add xercesImpl to <dependencyManagement>
yet. That might change in the future to avoid drifting versions...

See IQSS/dataverse-security#38
…e Commons HttpClient

The future update of sword2-server will drop the dependency to commons-httpclient.
We rely on that as a huge technical debt (see comment in pom.xml) and been using it
ever since via transitive dependencies. Re-adding as a direct dependency to not break
our codebase when updating sword2-server.
[pull] develop from IQSS:develop
Some other tests might rely on the BrandingUtil to actually do things.
This refactoring ensures proper mocking and cleaning up after the tests
ran, trying to ensure better atomicity.

It enables easy setup and teardown of mocks to be used in other tests.
It might be necessary to expose the underlying mocked objects for deeper
testing. Left for later exercise if ever necessary.

This refactoring also
- makes a few tests parameterized to be easier to read,
- enables much easier identification of failing test examples and
- removes static ordering now unnecessary.
- This refactoring moves these tests from JUnit 4 to 5.
- This tests setups mocking for BrandingUtil via BrandingUtilTest to be atomically usable.
- Removed star import.
This will be used to write better unit tests for generated XML to not rely on
perfect String comparison and fail because of negligible whitespace changes.

Also move org.skyscreamer.jsonassert to bottom of deps, inline with other test deps.
- Use JUnit5 parameterized tests for examples to simplify test code
- Create DOMs via JSoup and compare HTML with XMLUnit diff
- Remove fixed ordering via proper atomization
- Update to latest JSoup version
- This refactoring moves these tests from JUnit 4 to 5.
- The code has been much simplified by moving common parts
  functions, static fields etc. Let people concentrate on writing
  tests and produce less noise.
- Tried to use Jakarta JSON-B. Our DTO classes are not compatible as-is,
  thus sticking with Google Gson for now.
- Test have seen a speedup of ~700ms. Might become even faster by caching
  the mapped JSON objects and handing out clones via commons-lang serialization.
  Left as exercise for future devs.
- Also fixes some future XML serialization issues seen with the SWORD2 lib update.
@poikilotherm poikilotherm marked this pull request as ready for review July 30, 2021 17:45
@poikilotherm
Copy link
Contributor Author

poikilotherm commented Jul 30, 2021

Folks I'm on a 2 weeks vacation now.

Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a few comments but I don't see anything problematic w.r.t. the security updates, so I'd suggest moving ahead with this rather than waiting for any changes (since @poikilotherm is away) and making any response to the comments a new PR(s). I did suggest updating the duracloud version in the pom.xml but I can do that in a separate PR.


init();
final PasswordData passwordData = PasswordData.newInstance(password, String.valueOf(passwordModificationTime.getTime()), null);
// final PasswordData passwordData = PasswordData.newInstance(password, "username", null);
final PasswordData passwordData = new PasswordData(password);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passwordModificationTime is no longer used - did the change drop a time check that should be restored? Or should the param get dropped from the methods?

Copy link
Contributor Author

@poikilotherm poikilotherm Aug 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This time check never had any real effect of checking a time - the time has been passed as a String value for the username property of PasswordData.newInstance(). Please see https://repo1.maven.org/maven2/org/passay/passay/1.1.0/passay-1.1.0-javadoc.jar (couldn't find a published browser readable version...)


// then
String xml = XmlPrinter.prettyPrintXml(byteArrayOutputStream.toString(StandardCharsets.UTF_8));
XmlAssert.assertThat(xml).isInvalid();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're testing that DDIExportUtil is still broken? Or am I misunderstanding something?

It looks like the earlier version of the test was disabled with an if(false) statement, but replacing that with something that verifies a bug seems very odd. I don't know the history here, but if this is still a valid use case, I'd suggest making this test work for correct behavior and then skipping it for now with a new issue to get someone to look at fixing DDIExporter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test wasn't a test. It simply did nothing, as you pointed out.

I linked to #3648 from the commit message, too - this is a more complex problem.

As in good testing principles, a test should test for a certain behaviour. As the current behaviour is producing invalid XML, at least we will notice when this changes. If you folks would rather see the test disabled, that's just an annotation away. Whatever you prefer, I'm with you 😄

XmlAssert.assertThat(xml).isInvalid();
logger.severe("DDIExporterTest.testExportDataset() creates XML but it's invalid. Fixing in DDIExportUtil required.");
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the testCitation test dropped? Is that a mistake or is that test no longer valid/useful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the full commit message of 67a2f50

Shortly after, more and extensive testing had been added by @pkiraly in a distinct place.

<commons.logging.version>1.2</commons.logging.version>
<commons.lang3.version>3.12.0</commons.lang3.version>
<httpcomponents.client.version>4.5.13</httpcomponents.client.version>
<apache.httpcomponents.client.version>4.5.13</apache.httpcomponents.client.version>
<apache.httpcomponents.core.version>4.4.14</apache.httpcomponents.core.version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor point - apache.httpcomponents.[http]core.version to match the real package name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever you folks see as a better fit. Feel free to change.

@@ -584,7 +617,7 @@
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-compress</artifactId>
<version>1.19</version>
<!-- no version here as managed by <dependencyManagement> above! -->
</dependency>
<dependency>
<groupId>org.duracloud</groupId>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duracloud needs to go to 7.0.0 (common and storeclient - both at 4.4.6 now) per IQSS/dataverse-security#35. Can that be included here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it will need to be 7.1.0+ which is not yet released. See https://github.com/IQSS/dataverse-security/issues/35#issuecomment-894187477

qqmyers added a commit to QualitativeDataRepository/dataverse that referenced this pull request Aug 5, 2021
// 5 or more numbers in a row
Arguments.of(2, "ma02138", goodStrength20, maxLength, 6, dictionary, numberOfCharactersDefault, characterRulesDefault, numConsecutiveDigitsAllowed),
// 5 or more numbers in a row, but multiple times
Arguments.of(3, "ma8312002138", goodStrength20, maxLength, 6, dictionary, numberOfCharactersDefault, characterRulesDefault, numConsecutiveDigitsAllowed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it understood why this has 3 errors instead of 2 with the new version of passay?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. There was a bug in the older Passay version, which wasn't identifying multiple occurences of more than x numbers in a row. (It's 10 digits here, which is 2x error of 5 numbers) This made a +1 for this test necessary.

@poikilotherm
Copy link
Contributor Author

Made a few quick comments - still on vacation but wanted to unblock people.

@kcondon
Copy link
Contributor

kcondon commented Aug 6, 2021

@poikilotherm I know that above in testing you'd said should be ok with existing unit and integration tests but I do like to better understand the scope of changes and to do some selected exploratory testing regardless of the automated tests. So, if I understand correctly, any functionality currently using XML, eg. exports, api endpoint inputs and outputs, harvesting, should be checked, whether through automated tests and/or manually? Additionally, running the automated tests should pass seeing they have been extensively updated, is this correct or can you provide a better overview of impacted functionality needing to be retested?

@djbrooke
Copy link
Contributor

Hey @poikilotherm, I hope you enjoy your vacation! Thanks for the PR! I'll move this back to Community Dev for now, and we can revisit it when you're back.

@poikilotherm
Copy link
Contributor Author

Hi @kcondon - I'm back. I already updated the branch with latest develop to solve some merge conflicts.

Regarding testing:

  • SWORD should be completely covered by the API tests.
  • Password rules are covered by tests, too.
  • XML exports like Schema.org, OpenAIRE and DDI are the primary use cases of XMLPrinter:
    image
    Looks like they are covered by tests.
  • Other places with XML usage might be found by grepping the codebase for XML. It's quite a list and might not be worth it.
    Maybe can be filtered with things covered by tests.

Hope this helps...(?)

src/main/java/edu/harvard/iq/dataverse/DataCitation.java
src/main/java/edu/harvard/iq/dataverse/search/SearchFields.java
src/main/java/edu/harvard/iq/dataverse/metadataimport/ForeignMetadataImportServiceBean.java
src/main/java/edu/harvard/iq/dataverse/util/BundleUtil.java
src/main/java/edu/harvard/iq/dataverse/util/SystemConfig.java
src/main/java/edu/harvard/iq/dataverse/util/ShapefileHandler.java
src/main/java/edu/harvard/iq/dataverse/util/JhoveFileType.java
src/main/java/edu/harvard/iq/dataverse/util/FileUtil.java
src/main/java/edu/harvard/iq/dataverse/util/DataSourceProducer.java
src/main/java/edu/harvard/iq/dataverse/util/xml/XmlPrinter.java
src/main/java/edu/harvard/iq/dataverse/util/xml/XmlValidator.java
src/main/java/edu/harvard/iq/dataverse/util/bagit/BagGenerator.java
src/main/java/edu/harvard/iq/dataverse/search/IndexServiceBean.java
src/main/java/edu/harvard/iq/dataverse/util/xml/html/HtmlFormatUtil.java
src/main/java/edu/harvard/iq/dataverse/util/xml/html/HtmlPrinter.java
src/main/java/edu/harvard/iq/dataverse/DataCiteRESTfullClient.java
src/main/java/edu/harvard/iq/dataverse/api/Meta.java
src/main/java/edu/harvard/iq/dataverse/api/BundleDownloadInstanceWriter.java
src/main/java/edu/harvard/iq/dataverse/api/Search.java
src/main/java/edu/harvard/iq/dataverse/api/Datasets.java
src/main/java/edu/harvard/iq/dataverse/api/Access.java
src/main/java/edu/harvard/iq/dataverse/api/imports/ImportServiceBean.java
src/main/java/edu/harvard/iq/dataverse/api/imports/ImportGenericServiceBean.java
src/main/java/edu/harvard/iq/dataverse/api/Index.java
src/main/java/edu/harvard/iq/dataverse/api/EditDDI.java
src/main/java/edu/harvard/iq/dataverse/api/Dataverses.java
src/main/java/edu/harvard/iq/dataverse/api/imports/ImportDDIServiceBean.java
src/main/java/edu/harvard/iq/dataverse/api/datadeposit/CollectionListManagerImpl.java
src/main/java/edu/harvard/iq/dataverse/api/datadeposit/ReceiptGenerator.java
src/main/java/edu/harvard/iq/dataverse/api/datadeposit/SwordServiceBean.java
src/main/java/edu/harvard/iq/dataverse/api/datadeposit/SwordConfigurationImpl.java
src/main/java/edu/harvard/iq/dataverse/api/batchjob/BatchJobResource.java
src/main/java/edu/harvard/iq/dataverse/api/BatchImport.java
src/main/java/edu/harvard/iq/dataverse/DOIDataCiteRegisterService.java
src/main/java/edu/harvard/iq/dataverse/batch/jobs/importer/filesystem/FileRecordJobListener.java
src/main/java/edu/harvard/iq/dataverse/batch/jobs/importer/filesystem/FileRecordReader.java
src/main/java/edu/harvard/iq/dataverse/sitemap/SiteMapUtil.java
src/main/java/edu/harvard/iq/dataverse/DataFileServiceBean.java
src/main/java/edu/harvard/iq/dataverse/engine/command/impl/GoogleCloudSubmitToArchiveCommand.java
src/main/java/edu/harvard/iq/dataverse/datavariable/VariableMetadataDDIParser.java
src/main/java/edu/harvard/iq/dataverse/datavariable/DataTableImportDDI.java
src/main/java/edu/harvard/iq/dataverse/engine/command/impl/DuraCloudSubmitToArchiveCommand.java
src/main/java/edu/harvard/iq/dataverse/engine/command/impl/LocalSubmitToArchiveCommand.java
src/main/java/edu/harvard/iq/dataverse/harvest/server/xoai/XgetRecordHandler.java
src/main/java/edu/harvard/iq/dataverse/harvest/server/xoai/XlistRecordsHandler.java
src/main/java/edu/harvard/iq/dataverse/harvest/server/xoai/XlistRecords.java
src/main/java/edu/harvard/iq/dataverse/harvest/server/xoai/Xrecord.java
src/main/java/edu/harvard/iq/dataverse/harvest/server/xoai/Xmetadata.java
src/main/java/edu/harvard/iq/dataverse/harvest/server/web/servlet/OAIServlet.java
src/main/java/edu/harvard/iq/dataverse/harvest/client/oai/OaiHandler.java
src/main/java/edu/harvard/iq/dataverse/harvest/client/HarvesterServiceBean.java
src/main/java/edu/harvard/iq/dataverse/harvest/client/FastGetRecord.java
src/main/java/edu/harvard/iq/dataverse/AbstractGlobalIdServiceBean.java
src/main/java/edu/harvard/iq/dataverse/FileDownloadServiceBean.java
src/main/java/edu/harvard/iq/dataverse/dataaccess/StoredOriginalFile.java
src/main/java/edu/harvard/iq/dataverse/DOIDataCiteRegisterCache.java
src/main/java/edu/harvard/iq/dataverse/export/DDIExportServiceBean.java
src/main/java/edu/harvard/iq/dataverse/export/DCTermsExporter.java
src/main/java/edu/harvard/iq/dataverse/export/ddi/DdiExportUtil.java
src/main/java/edu/harvard/iq/dataverse/export/HtmlCodeBookExporter.java
src/main/java/edu/harvard/iq/dataverse/export/DublinCoreExporter.java
src/main/java/edu/harvard/iq/dataverse/export/DDIExporter.java
src/main/java/edu/harvard/iq/dataverse/authorization/providers/shib/ShibUtil.java
src/main/java/edu/harvard/iq/dataverse/export/dublincore/DublinCoreExportUtil.java
src/main/java/edu/harvard/iq/dataverse/export/OpenAireExporter.java
src/main/java/edu/harvard/iq/dataverse/export/OAI_DDIExporter.java
src/main/java/edu/harvard/iq/dataverse/export/openaire/OpenAireExportUtil.java
src/main/java/edu/harvard/iq/dataverse/export/DataCiteExporter.java
src/main/java/edu/harvard/iq/dataverse/authorization/providers/oauth2/impl/OrcidOAuth2AP.java
src/main/java/edu/harvard/iq/dataverse/ingest/tabulardata/impl/plugins/xlsx/XLSXFileReaderSpi.java
src/main/java/edu/harvard/iq/dataverse/ingest/tabulardata/impl/plugins/xlsx/XLSXFileReader.java

@kcondon kcondon merged commit d8ce358 into IQSS:develop Aug 16, 2021
@kcondon kcondon assigned kcondon and unassigned poikilotherm Aug 16, 2021
@djbrooke djbrooke added this to the 5.7 milestone Aug 17, 2021
@poikilotherm poikilotherm deleted the sec38-div-issues branch August 23, 2021 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants