[The] discrepancy between the number of permission documents and the number of indexed dvobjects is still there, even on an index created from scratch. So it cannot be explained by permission documents being left behind when objects are deleted. Counting the permission documents:
curl "http://172.31.88.120:8983/solr/collection1/query?debug=query&q=definitionPointDocId:*&rows=0"
The sum of q=dvObjectType:dataverses, ...:datasets and ...:files is 3715147.
curl "http://172.31.88.120:8983/solr/collection1/query?debug=query&q=*:*&rows=0" returns the number that is the sum of the 2 numbers above.
The number of extra permission docs keeps growing the longer the index is in place.
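The counts above can be gathered and compared with a short script. This is only a sketch: it assumes the same Solr host and collection as the curl commands, relies on the standard `response.numFound` field of Solr's JSON response, and the helper names (`count_url`, `num_found`, `fetch_count`, `extra_permission_docs`) are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: same Solr endpoint as in the curl commands above.
SOLR_QUERY_URL = "http://172.31.88.120:8983/solr/collection1/query"


def count_url(query, base_url=SOLR_QUERY_URL):
    """Build a URL that asks Solr only for the hit count (rows=0)."""
    params = urllib.parse.urlencode({"q": query, "rows": 0})
    return f"{base_url}?{params}"


def num_found(solr_response):
    """Extract the total hit count from a parsed Solr JSON response."""
    return solr_response["response"]["numFound"]


def fetch_count(query, base_url=SOLR_QUERY_URL):
    """Run the count query against Solr and return numFound."""
    with urllib.request.urlopen(count_url(query, base_url)) as resp:
        return num_found(json.load(resp))


def extra_permission_docs(permission_docs, dvobjects):
    """Permission docs in excess of one per indexed dvobject."""
    return permission_docs - dvobjects
```

With a reachable Solr instance, `fetch_count("definitionPointDocId:*")` would give the permission document count, and summing `fetch_count` over `dvObjectType:dataverses`, `dvObjectType:datasets` and `dvObjectType:files` would give the dvobject count to compare it against.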
On a closer look:
None of [these] are literal duplicates; i.e., there are no datasets with multiple definitionPointDocId:dataset_NNNNN permission docs.
However, there are multiple cases of, for example, both definitionPointDocId:dataset_NNNNN and definitionPointDocId:dataset_NNNNN_draft permission docs existing when the only indexed dvobject document for that dataset is id:dataset_NNNNN.
We are seeing such cases for both datasets and files. We are also seeing the reverse (i.e., both permission documents exist when only an indexed draft document is present).
This likely means that in some cases we fail to remove the permission doc for the draft when we publish, and in others we create permission docs for published documents while they are still in draft.
For example, we may be creating definitionPointDocId:file_MMMMM permission documents for unpublished files when the parent dataset has published versions.
This does not appear to corrupt the index in a way that would affect the accuracy of search results. But the redundant permission docs are likely slowing down lookups.
We should also experiment with no longer creating permission documents for public indexed dvobjects at all, in combination with the new "avoid expensive solr join" mechanism (#10555). Still debating whether that should be handled as a separate issue.
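To size the two failure modes described above, the extra permission docs could be sorted into buckets once the `id` values of the indexed dvobject documents and the `definitionPointDocId` values of the permission docs have been exported from Solr. The sketch below assumes that a permission doc is legitimate exactly when its `definitionPointDocId` matches the `id` of an indexed document, and that drafts are marked only by the `_draft` suffix; the function and bucket names are hypothetical.

```python
DRAFT_SUFFIX = "_draft"


def classify_extra_permission_docs(indexed_ids, definition_point_ids):
    """Group permission docs that have no matching indexed document.

    indexed_ids: ids of the indexed dvobject documents
    definition_point_ids: definitionPointDocId values of the permission docs
    """
    indexed = set(indexed_ids)
    extra = {
        "draft_left_behind": [],    # *_draft perm doc, only the published doc is indexed
        "published_too_early": [],  # published perm doc, only the draft doc is indexed
        "no_indexed_doc": [],       # neither variant of the object is indexed
    }
    for dp in sorted(set(definition_point_ids)):
        if dp in indexed:
            continue  # matches an indexed document, so not redundant
        if dp.endswith(DRAFT_SUFFIX):
            published_id = dp[: -len(DRAFT_SUFFIX)]
            if published_id in indexed:
                extra["draft_left_behind"].append(dp)
            else:
                extra["no_indexed_doc"].append(dp)
        elif dp + DRAFT_SUFFIX in indexed:
            extra["published_too_early"].append(dp)
        else:
            extra["no_indexed_doc"].append(dp)
    return extra
```

Comparing the sizes of the first two buckets would show whether the growth comes mostly from draft permission docs that survive publishing or from published permission docs created while the object is still a draft.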