This repository has been archived by the owner on Jan 5, 2023. It is now read-only.

https://catalog.data.gov is redirecting to https://catalog-next.data.gov/dataset #561

Open
adborden opened this issue Feb 4, 2021 · 10 comments
Labels
bug Something isn't working

Comments

@adborden
Contributor

adborden commented Feb 4, 2021

After switching the origin servers to catalog-next, everything seemed to be working, with one exception:

Browsing to https://catalog.data.gov results in a redirect to https://catalog-next.data.gov/dataset

How to reproduce

  1. Browse to https://catalog.data.gov

Expected behavior

302 redirect to https://catalog.data.gov/dataset

Actual behavior

302 redirect to https://catalog-next.data.gov/dataset

@adborden adborden added the bug Something isn't working label Feb 4, 2021
@adborden
Contributor Author

adborden commented Feb 4, 2021

This is complicated because there are essentially three reverse proxies: CloudFront, FCS Netscaler, and Apache.

With our current configuration, we expect two headers to be passed through all three proxies, Host: catalog-next.data.gov and X-Forwarded-Host: catalog.data.gov, and we expect CKAN to respect X-Forwarded-Host (or just use ckan.site_url).

I'm seeing three areas for a potential fix:

  1. Fix CKAN to use ckan.site_url in all redirects or respect X-Forwarded-Host
  2. Tweak the Apache config to force a specific Host header to pass to CKAN.
  3. Update FCS to change the catalog.data.gov route to point to the catalog-next hosts.

Right now I'm leaning toward the FCS change.
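For reference, here is a minimal sketch of what the "use ckan.site_url in all redirects" half of option (1) could look like. It is a toy WSGI app, not CKAN code: `SITE_URL`, `absolute_redirect`, and `redirecting_app` are hypothetical names, and the `https://catalog.data.gov` value is only an assumed setting for ckan.site_url.

```python
from urllib.parse import urljoin

# Assumed value of ckan.site_url; hypothetical, for illustration only.
SITE_URL = "https://catalog.data.gov"


def absolute_redirect(site_url, path):
    """Build an absolute redirect target from the configured site URL.

    The request's Host and X-Forwarded-Host headers are ignored on purpose,
    so the result is the same no matter which proxy forwarded the request.
    """
    base = site_url.rstrip("/") + "/"
    return urljoin(base, path.lstrip("/"))


def redirecting_app(environ, start_response):
    """Toy stand-in for the app: redirect every request to /dataset."""
    location = absolute_redirect(SITE_URL, "/dataset")
    start_response("302 Found", [("Location", location)])
    return [b""]


if __name__ == "__main__":
    print(absolute_redirect(SITE_URL, "/dataset"))
    # prints: https://catalog.data.gov/dataset
```

The trade-off is that the public hostname lives in configuration rather than being derived from whatever the proxies send, so a wrong or missing header in any of the three layers cannot leak the origin hostname into a redirect.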

@adborden
Contributor Author

adborden commented Feb 4, 2021

Discussed this with @avdata99 and @hkdctol. Andres will spend ~1 hour investigating (1) above to see if there is a fix or change that should happen in CKAN. The redirect is handled by pylons, so this behavior may change anyway in CKAN 2.9.

I will open a ticket with FCS for (3) and hopefully we can schedule something for Friday or Monday.

@adborden
Contributor Author

adborden commented Feb 4, 2021

I opened RITM0810420 for the Netscaler change.

@avdata99
Contributor

avdata99 commented Feb 4, 2021

CKAN overrides a function in the mapper but lets the routes Mapper (old v1.13) handle the redirects:
https://github.com/ckan/ckan/blob/2.8/ckan/config/routing.py#L49-L54

The Pylons app in CKAN uses that routes Mapper:
https://github.com/ckan/ckan/blob/2.8/ckan/config/middleware/pylons_app.py#L63

This Mapper uses a URLGenerator that allows defining an environ:
https://github.com/bbangert/routes/blob/v1.13/routes/util.py#L273

There are some tests that can be useful to see how this environ works
https://github.com/bbangert/routes/blob/v1.13/tests/test_functional/test_explicit_use.py#L27-L28

The flask_app uses ckan.site_url:
https://github.com/ckan/ckan/blob/2.8/ckan/config/middleware/__init__.py#L195-L198

It is still not clear to me whether we can override the headers at some point in CKAN.

@adborden
Contributor Author

adborden commented Feb 4, 2021

Thanks @avdata99, that tells me that CKAN should respect the X-Forwarded-Host header, but something is still not working. Maybe gunicorn is not passing this header through.

In local development, with debugging enabled, can you dump out the environment? The test would be:

$ curl -v -H 'X-Forwarded-Host: catalog.data.gov' http://localhost:5000

You should see Location: http://catalog.data.gov/dataset in the response.

@adborden
Contributor Author

adborden commented Feb 4, 2021

... and you should see the HTTP_X_FORWARDED_HOST=catalog.data.gov in the environment.
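If it helps, dumping the environ could be done with a small WSGI middleware along these lines. This is only a sketch, assuming the app can be wrapped somewhere in the middleware stack: `http_headers` and `EnvironDumpMiddleware` are hypothetical names, not part of CKAN or Pylons, and the code is written for Python 3.

```python
import logging
import pprint

log = logging.getLogger("environ_dump")


def http_headers(environ):
    """Return only the request-header entries (HTTP_* keys) of a WSGI environ."""
    return {
        key: value
        for key, value in sorted(environ.items())
        if key.startswith("HTTP_")
    }


class EnvironDumpMiddleware:
    """Log the incoming request headers, then pass the request through unchanged."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        log.info("Environ headers:\n%s", pprint.pformat(http_headers(environ)))
        return self.app(environ, start_response)
```

Wrapping the app as `EnvironDumpMiddleware(app)` and repeating the curl request should then show whether HTTP_X_FORWARDED_HOST reaches the application at all.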

@avdata99
Contributor

avdata99 commented Feb 5, 2021

@adborden this is what I see locally:

curl -v -H 'X-Forwarded-Host: catalog.data.gov' http://localhost:5000
* Rebuilt URL to: http://localhost:5000/
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 5000 (#0)
> GET / HTTP/1.1
> Host: localhost:5000
> User-Agent: curl/7.58.0
> Accept: */*
> X-Forwarded-Host: catalog.data.gov
> 
* HTTP 1.0, assume close after body
< HTTP/1.0 302 Found
< Server: PasteWSGIServer/0.5 Python/2.7.18
< Date: Fri, 05 Feb 2021 13:37:27 GMT
< Content-Type: text/plain; charset=utf8
< Location: http://localhost:5000/dataset
< Connection: close
< 
* Closing connection 0

@adborden
Contributor Author

adborden commented Feb 5, 2021

Okay, and in the CKAN logs, can you dump out the environ and see if HTTP_X_FORWARDED_HOST is included?

Your server is Server: PasteWSGIServer/0.5 Python/2.7.18, so we've ruled out gunicorn getting in the way.

So, from the pylons/routes code, it looks like this should be supported but something is not working. We should be seeing Location: http://catalog.data.gov/dataset. I would ask the CKAN folks or open an issue, but I don't think we need to pursue this further.

@avdata99
Contributor

avdata99 commented Feb 5, 2021

@adborden if I dump the environ here I see this:

ckan_1   | 2021-02-05 15:23:54,217 INFO  [ckan.config.middleware] Serving request via pylons_app app
ckan_1   | 2021-02-05 15:23:54,218 INFO  [ckan.config.middleware] Environ 
{
'SCRIPT_NAME': '', 
'REQUEST_METHOD': 'GET', 
'ckan.app': 'pylons_app', 
'PATH_INFO': '/', 
'SERVER_PROTOCOL': 'HTTP/1.1', 
'QUERY_STRING': '', 
'CONTENT_LENGTH': '0', 
'HTTP_USER_AGENT': 'curl/7.58.0', 
'SERVER_NAME': '0.0.0.0', 
'REMOTE_ADDR': '172.28.0.1', 
'wsgi.url_scheme': 'http', 
'SERVER_PORT': '5000', 
'CKAN_CURRENT_URL': '/', 
'CKAN_LANG': 'en', 
'wsgi.input': <socket._fileobject object at 0x7f65e12978d0 length=0>, 
'HTTP_HOST': 'localhost:5000', 
'wsgi.multithread': True, 
'HTTP_ACCEPT': '*/*', 
'CKAN_LANG_IS_DEFAULT': True, 
'wsgi.version': (1, 0), 
'wsgi.run_once': False, 
'wsgi.errors': <open file '<stderr>', mode 'w' at 0x7f65e68b7270>, 
'wsgi.multiprocess': False, 
'HTTP_X_FORWARDED_HOST': 'catalog.data.gov', 
'CONTENT_TYPE': '', 
'paste.httpserver.thread_pool': <paste.httpserver.ThreadPool object at 0x7f65e5b9d290>
}
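To make the gap concrete, the sketch below feeds the relevant keys from the dump above into a small helper and compares the Host-based redirect (what the server returned) with the X-Forwarded-Host-based one (what was expected). `redirect_location` is a hypothetical helper built on the standard library's `wsgiref.util.application_uri`; it is not how routes or Pylons compute the URL, only an illustration of the two outcomes.

```python
from wsgiref.util import application_uri


def redirect_location(environ, path, trust_forwarded_host=False):
    """Build an absolute redirect URL from a WSGI environ.

    With trust_forwarded_host=True, the first X-Forwarded-Host value replaces
    the Host header. That is only safe behind proxies you control.
    """
    env = dict(environ)  # work on a copy, leave the caller's environ alone
    forwarded = env.get("HTTP_X_FORWARDED_HOST")
    if trust_forwarded_host and forwarded:
        env["HTTP_HOST"] = forwarded.split(",")[0].strip()
    return application_uri(env).rstrip("/") + "/" + path.lstrip("/")


# Relevant keys copied from the environ dump in this comment.
environ = {
    "wsgi.url_scheme": "http",
    "SCRIPT_NAME": "",
    "SERVER_NAME": "0.0.0.0",
    "SERVER_PORT": "5000",
    "HTTP_HOST": "localhost:5000",
    "HTTP_X_FORWARDED_HOST": "catalog.data.gov",
}

observed = redirect_location(environ, "/dataset")
expected = redirect_location(environ, "/dataset", trust_forwarded_host=True)
print(observed)  # http://localhost:5000/dataset
print(expected)  # http://catalog.data.gov/dataset
```

So the header does reach the app, and the observed Location is consistent with the redirect being built from Host alone.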

@adborden
Contributor Author

adborden commented Feb 5, 2021

👍 looks like a bug to me.
