ACL access allow_ignore_embargo doesn't work with a user #715

Closed
krakan opened this issue May 10, 2022 · 1 comment

krakan commented May 10, 2022

Describe the bug

When an embargo is specified for a collection and allow_ignore_embargo is then granted to a specific user in the .aclj file, the embargoed URLs don't show up in the search results for that user. If one bypasses the search results and goes directly to the page view, the page does show up.

Steps to reproduce the bug

config.yaml:

collections:
  exempel:
    archive_paths: /usr/local/sample_archive/warcs/
    index: /usr/local/sample_archive/cdx
    embargo:
      newer:
        years: 10
    acl_paths: allows.aclj

allows.aclj:

com,example)/ - {"access": "allow_ignore_embargo", "user": "admin"}

Actual behavior

curl -iHX-Pywb-ACL-User:admin 'http://localhost:8090/exempel/cdx?url=example.com&output=json'
HTTP/1.1 200 OK
Content-Type: text/x-ndjson


Expected behavior

curl -iHX-Pywb-ACL-User:admin 'http://localhost:8090/exempel/cdx?url=example.com&output=json'
HTTP/1.1 200 OK
Content-Type: text/x-ndjson

{"urlkey": "com,example)/", "timestamp": "20130729195151", "url": "http://test@example.com/", "mime": "warc/revisit", "status": "-", "digest": "B2LTWWPUOYAH7UIPQ7ZUPQ4VMBSVC36A", "redirect": "-", "robotflags": "-", "length": "591", "offset": "355", "filename": "example-url-agnostic-revisit.warc.gz", "source": "exempel:url-agnost-example.cdx", "source-coll": "exempel", "access": "block"}
{"urlkey": "com,example)/", "timestamp": "20140127171200", "url": "http://example.com", "mime": "text/html", "status": "200", "digest": "B2LTWWPUOYAH7UIPQ7ZUPQ4VMBSVC36A", "redirect": "-", "robotflags": "-", "length": "1046", "offset": "334", "filename": "dupes.warc.gz", "source": "exempel:dupes.cdx", "source-coll": "exempel", "access": "block"}
{"urlkey": "com,example)/", "timestamp": "20140127171251", "url": "http://example.com", "mime": "warc/revisit", "status": "-", "digest": "B2LTWWPUOYAH7UIPQ7ZUPQ4VMBSVC36A", "redirect": "-", "robotflags": "-", "length": "553", "offset": "11875", "filename": "dupes.warc.gz", "source": "exempel:dupes.cdx", "source-coll": "exempel", "access": "block"}

Screenshots

[Screenshot: 2022-05-10-173408_732x388_scrot]

Environment

  • OS: RHEL 7
  • Browser: curl or Firefox
  • Version: pywb 2.6.7

Additional context

The docs specifically mention this use case.

krakan commented Feb 10, 2023

Circling back to this issue, here's a failing test case to prove the point:

diff --git a/tests/test_embargo.py b/tests/test_embargo.py
index 4c1ab21e..ff959682 100644
--- a/tests/test_embargo.py
+++ b/tests/test_embargo.py
@@ -46,8 +46,13 @@ class TestEmbargoApp(BaseConfigTest):
     def test_embargo_ignore_acl_with_header_only(self):
         # ignore embargo with custom header only
         headers = {"X-Pywb-ACL-User": "staff2"}
+
+        resp = self.testapp.get('/pywb-embargo-acl/cdx?url=http://example.com/?example=1', headers=headers)
+        assert len(resp.text.splitlines()) > 0
         resp = self.testapp.get('/pywb-embargo-acl/20140126201054mp_/http://example.com/?example=1', status=200, headers=headers)

+        resp = self.testapp.get('/pywb-embargo-acl/cdx?url=http://example.com/?example=1')
+        assert len(resp.text.splitlines()) == 0
         resp = self.testapp.get('/pywb-embargo-acl/20140126201054mp_/http://example.com/?example=1', status=404)

The first assert is the one that fails: the CDX query returns no lines, although fetching the actual data works.

krakan added a commit to krakan/pywb that referenced this issue Feb 13, 2023
In particular the X-Pywb-ACL-User header must be forwarded in order
for it to be able to control CDX-queries
tw4l pushed a commit that referenced this issue Feb 15, 2023
In particular the X-Pywb-ACL-User header must be forwarded in order
for it to be able to control CDX-queries
tw4l closed this as completed Feb 15, 2023
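
The commit message says the X-Pywb-ACL-User header has to be forwarded before it can influence CDX queries. The sketch below only illustrates that general idea for a WSGI application that issues an internal index request; it is not the code from the referenced commit, and `FORWARDED_HEADERS`, `environ_key`, and `headers_to_forward` are hypothetical names.

```python
# Hypothetical allow-list; only X-Pywb-ACL-User is named in the commit message.
FORWARDED_HEADERS = ("X-Pywb-ACL-User",)


def environ_key(header_name):
    """Map an HTTP header name to its WSGI environ key (PEP 3333)."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def headers_to_forward(environ, names=FORWARDED_HEADERS):
    """Collect the listed headers from the incoming request, if present."""
    forwarded = {}
    for name in names:
        value = environ.get(environ_key(name))
        if value is not None:
            forwarded[name] = value
    return forwarded


if __name__ == "__main__":
    incoming = {"HTTP_X_PYWB_ACL_USER": "admin", "HTTP_HOST": "localhost:8090"}
    # These headers would be attached to the internal CDX request.
    print(headers_to_forward(incoming))
```

If the internal request is built without such a step, the index lookup sees no user and applies the embargo, which would match the empty CDX response described in this issue.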