Add getReadableIds and use in filterReadAccess #277 #278

patrick-austin · 2022-02-21T16:24:01Z

Changes

Added getReadableIds function. The logic here is the same as in getReadable, however we accept and return lists of IDs rather than entities. When dealing with the free text search results, we only have IDs. Before we were finding the bean for each ID in turn in order to run authorisation on it, then proceeding to return the ID. But as we already have the klass of the entitity and the ID from Lucene we can simply operate on these instead.

Instead of hardcoding the number of results to get from Lucene at 1000, instead use the maxIdsInQuery setting. This is the limit we would send in one batch to the DB in getReadable/getReadableIds anyway, and the default value of 500 and max for Oracle of 1000 is the same order of magnitude as what we were currently using.

Impact

I ran a number of searches against my local development components, using the lils.yml as my dummy data. This contains:

Investigations	Datasets	Datafiles
100	300	29967

I searched for * (returning all entities before authorising) as both root and the non-root icatuser, representing best and worst case scenarios (all data visible, no data visible and needing to check DB for rules) for the old and new setup with maxIdsInQuery set to 500 and 1000. All times are in ms, and is measured between the first EntityBeanManager - Got X results from Lucene and EntityBeanManager - Returning Y results logs made by icat.server.

root

Setup	Investigations	Datasets	Datafiles
Old	91	205	212
New (500)	15	11	12
New (1000)	13	12	11

Old method is comparable for DS and DF since we only loop until we reach 300 accepted results, then return.
New only submits one batch of entities to check, and returns in consistently ~12ms.

non-root

Setup	Investigations	Datasets	Datafiles
Old	215	337	29261
New (500)	21	14	1502
New (1000)	15	14	690

Old method takes around 1 to 2 ms per entity checked as we need to check every ID individually before being sure none are authorised.
New method takes roughly the same amount of time for Investigations and DS as these both require a single batch of IDs (less than the limit in settings. Datafiles takes around half the time when the limit is doubled to 1000, as we only need to send half as many batches (averaging 24ms per batch).

Also searched for a specific single result using id:... in both old and new setup as root:

Setup	Investigations	Datasets	Datafiles
Old	17	14	16
New	11	15	14

Times are comparable with old and new approaches. For the new method, the time taken to authorise 1 result and hundreds (as in the * example) was also comparable.

It's worth noting that in these tests I didn't actually have any rules in the DB to evaluate.

patrick-austin · 2022-03-11T14:01:34Z

Following discussion with @kevinphippsstfc, have:

Added explicit config option lucene.searchBlockSize instead of piggy backing maxIdsInQuery
Refactored duplicated code in GateKeeper
Added HasEntityId interface so that the getReadableIds function can run on either EntityBaseBean or ScoredEntityBaseBean without needing to loop to get their id
Changed logic in filterReadAccess so we add all newResults if we have no restrictions, and only iterate and check against the id set if restrictions were present. Should also now throw if we're over maxEntities.

This shouldn't change the performance of existing usage of getReadable too much, as we do not introduce any additional loops, only refactoring some code and adding the if (restrictions == null) check to enable early returns of all beans. Further improvements or changes for clarity are welcome.

kevinphippsstfc

Overall, this looks great, thanks! I've just requested a few additions to the comments to hopefully help future developers.

I'm wondering how best to test this given that I don't think the part of the code using "restrictions" will get tested much (if at all?) by the unit and integration tests. We could do with testing it on an ICAT containing some of the usual InvestigationUser and InstrumentScientist rules/restrictions.

src/main/java/org/icatproject/core/manager/EntityBeanManager.java

src/main/java/org/icatproject/core/manager/GateKeeper.java

…/icatproject/icat.server into 277_filterReadAccess_improvement

patrick-austin · 2022-03-17T18:14:27Z

Overall, this looks great, thanks! I've just requested a few additions to the comments to hopefully help future developers.

I'm wondering how best to test this given that I don't think the part of the code using "restrictions" will get tested much (if at all?) by the unit and integration tests. We could do with testing it on an ICAT containing some of the usual InvestigationUser and InstrumentScientist rules/restrictions.

Have added/expanded the documentation of the new/modified functions, if anything still needs clarifying just let me know.

patrick-austin · 2022-10-06T15:08:21Z

Have expanded TestRS to ensure it covers these changes. These changes will impact:

All Lucene searches
Get and Search requests
MD exports

Note that the last two are only affected if an INCLUDE query for a collection is used. For example, Facility INCLUDE InvestigationType will run the changed code as there are multiple types at a facility but Investigation INCLUDE InvestigationType will not, as there is only one per investigation.

Some implicit testing was taking place, as if the authorization were overly restrictive (i.e. if getReadableIds is hacked to return no ids) then the following tests would already fail:

TestRS.testGet
TestRS.testLuceneDatafiles
TestRS.testLuceneDatasets
TestRS.testLuceneInvestigations
TestRS.testSearch

along with some tests in TestWS (as the functionality in question is in Gatekeeper, I only changed TestRS as part of this PR).

I have expanded these tests to also assert that the tests are not too permissive, by performing searches as the piOne user who cannot see anything unless new rules are set.

The existing exportMetaDataQuery is marked with @Ignore, so I created a new test which performs both "permissive" and "restrictive" checks on the inclusion of related entities in the export dump.

Also refactored code to set up root and piOne sessions, and some automatic formatting of the TestRS file. Note that this branch does not have fixes for the unlrelated test failures: #275 #268

src/test/java/org/icatproject/integration/TestRS.java

Co-authored-by: Viktor Bozhinov <45173816+VKTB@users.noreply.github.com>

Add getReadableIds and use in filterReadAccess #277

3770bbf

agbeltran requested a review from kevinphippsstfc March 3, 2022 15:20

patrick-austin and others added 2 commits March 9, 2022 23:43

Add lucene.searchBlockSize and refactor getReadable #277

7bf55e9

Fix change to lucenePopulateBlockSize #277

d972057

kevinphippsstfc requested changes Mar 17, 2022

View reviewed changes

src/main/java/org/icatproject/core/manager/EntityBeanManager.java Show resolved Hide resolved

src/main/java/org/icatproject/core/manager/GateKeeper.java Show resolved Hide resolved

src/main/java/org/icatproject/core/manager/GateKeeper.java Outdated Show resolved Hide resolved

patrick-austin added 2 commits March 17, 2022 18:10

Add documentation to altered methods #277

939e421

Merge branch '277_filterReadAccess_improvement' of https://github.com…

eed5d21

…/icatproject/icat.server into 277_filterReadAccess_improvement

Expand TestRS to cover getReadableIds change #277

6c9477a

patrick-austin requested a review from kevinphippsstfc October 6, 2022 15:08

patrick-austin requested a review from VKTB October 21, 2022 08:51

VKTB reviewed Oct 24, 2022

View reviewed changes

src/test/java/org/icatproject/integration/TestRS.java Outdated Show resolved Hide resolved

src/test/java/org/icatproject/integration/TestRS.java Outdated Show resolved Hide resolved

Apply suggestions from code review

03213aa

Co-authored-by: Viktor Bozhinov <45173816+VKTB@users.noreply.github.com>

VKTB approved these changes Oct 24, 2022

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add getReadableIds and use in filterReadAccess #277 #278

Add getReadableIds and use in filterReadAccess #277 #278

patrick-austin commented Feb 21, 2022

patrick-austin commented Mar 11, 2022

kevinphippsstfc left a comment

patrick-austin commented Mar 17, 2022

patrick-austin commented Oct 6, 2022

Add getReadableIds and use in filterReadAccess #277 #278

Are you sure you want to change the base?

Add getReadableIds and use in filterReadAccess #277 #278

Conversation

patrick-austin commented Feb 21, 2022

Changes

Impact

root

non-root

patrick-austin commented Mar 11, 2022

kevinphippsstfc left a comment

Choose a reason for hiding this comment

patrick-austin commented Mar 17, 2022

patrick-austin commented Oct 6, 2022