fixes #66017 #55

Merged 1 commit on Sep 8, 2016

jpmschuler (Contributor)

https://forge.typo3.org/issues/66017

Implemented a fix that uses https://tools.ietf.org/html/rfc3986#section-4.2 compliant URIs like "//www.google.com", which omit the protocol instead of trying to detect it.
So wherever http was already used, it is still used, and wherever https was used, the link sticks to https. No PHP code is necessary; it is all done on the client side.
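As a rough illustration of why the scheme is preserved, the sketch below resolves such a network-path reference against an http and an https page URL using Python's standard `urllib.parse.urljoin`, which applies the RFC 3986 resolution rules. The host `www.example.org` and the paths are placeholders, not taken from CoolURI.

```python
from urllib.parse import urljoin

# A network-path reference (RFC 3986, section 4.2): no scheme, starts with "//".
LOCATION = "//www.example.org/foo/bar"


def resolve(base_url: str, reference: str) -> str:
    """Resolve a relative reference against the URL of the current page."""
    return urljoin(base_url, reference)


# The client reuses the scheme of the page it is currently on.
for base in ("http://www.example.org/?id=1", "https://www.example.org/?id=1"):
    print(base, "->", resolve(base, LOCATION))
```

The server never has to know which scheme the visitor used; the resolving client fills it in.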


h4de5 commented May 8, 2017

It seems some clients cannot yet handle those RFC-compliant URIs. For example, Solr indexing is not able to follow redirects from CoolURI correctly and interprets them as relative URLs:

Solr indexer requests: http://domain.com/?id=1234 => CoolURI sends: Location: //domain.com/foo/bar => Solr indexer subsequently calls: http://domain.com/domain.com/foo/bar => CoolURI delivers a 404 page => Solr indexes the 404 page.
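The difference between the observed behavior and RFC 3986 resolution can be sketched as follows. `naive_resolve` is a HYPOTHETICAL client that treats anything without `scheme://` as a path on the current host; it is not PHP's actual implementation and only reproduces the URL seen in the log above. `rfc3986_resolve` uses Python's standard `urljoin`.

```python
from urllib.parse import urljoin, urlsplit


def naive_resolve(request_url: str, location: str) -> str:
    """HYPOTHETICAL: treat any Location without "scheme://" as a host-relative path."""
    if "://" in location:
        return location
    parts = urlsplit(request_url)
    # "//domain.com/foo/bar" loses its leading slashes and becomes a path segment.
    return f"{parts.scheme}://{parts.netloc}/{location.lstrip('/')}"


def rfc3986_resolve(request_url: str, location: str) -> str:
    """Resolve the Location value as an RFC 3986 relative reference."""
    return urljoin(request_url, location)


request_url = "http://domain.com/?id=1234"
location = "//domain.com/foo/bar"
print(naive_resolve(request_url, location))    # http://domain.com/domain.com/foo/bar
print(rfc3986_resolve(request_url, location))  # http://domain.com/foo/bar
```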

see: https://forum.typo3.org/index.php?t=msg&th=215837&#msg_749437

jpmschuler (Contributor, Author)

URIs of the form "//whatever", without protocol and colon, were already part of RFC 1808 from 1995, so I don't think "not yet" applies to any relevant client.

If your Solr isn't able to follow the links correctly, it's probably a Solr configuration error rather than a missing feature.


h4de5 commented May 8, 2017

Thanks for the hint. So far this is all I got from going through the Apache access logs and the Solr log files. At the moment I cannot rule out a misconfiguration, but when I disable CoolURI, indexing works fine (though it shows "uncool" URLs in the search results).

jpmschuler (Contributor, Author)

Of course, that makes sense; I don't doubt it. Your header analysis is correct IMHO, and this architectural change in CoolURI triggers exactly this redirect behavior, which TYPO3 core doesn't need.

I just doubt that Solr isn't capable of this (and that there are other clients that aren't).


h4de5 commented May 9, 2017

Just a brief follow-up: I am talking about the Solr TYPO3 extension (https://github.com/TYPO3-Solr/ext-solr).
This extension uses file_get_contents with HTTP headers to load pages, and at least in my tests this function does not seem to handle these protocol-relative URIs correctly.

jpmschuler (Contributor, Author) commented May 9, 2017

Hm, file_get_contents indeed has problems (cURL doesn't, by the way: one could use cURL to resolve the URL or even to fetch the content).

Nevertheless, I investigated a bit and found the following:
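To show what a client that resolves the Location value properly does with such a redirect, here is a self-contained sketch in Python rather than PHP. It assumes CPython's `urllib.request`, whose redirect handler resolves the Location value against the request URL with `urljoin`. A throwaway local server stands in for CoolURI; the handler name, the `/foo/bar` path and the page body are made up for the demo.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


class FakeCoolUriHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for CoolURI: redirects /?id=... to a network-path reference."""

    def do_GET(self):
        host = self.headers["Host"]  # e.g. "127.0.0.1:54321"
        if self.path.startswith("/?id="):
            self.send_response(301)
            # No scheme: the client has to reuse the scheme of the original request.
            self.send_header("Location", f"//{host}/foo/bar")
            self.end_headers()
        elif self.path == "/foo/bar":
            body = b"cool page"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_following_redirects(path: str = "/?id=1234"):
    """Start the fake server, fetch `path` and let urllib follow the redirect."""
    server = HTTPServer(("127.0.0.1", 0), FakeCoolUriHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        with urlopen(f"http://127.0.0.1:{port}{path}", timeout=5) as resp:
            return port, resp.geturl(), resp.read()
    finally:
        server.shutdown()
        server.server_close()


port, final_url, body = fetch_following_redirects()
print(final_url, body)
```

The final URL keeps the original `http` scheme and lands on `/foo/bar` rather than `/127.0.0.1:.../foo/bar`.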

  • RFC 2616 requires the Location header to contain an "absolute URI", which it defines as including a scheme
  • RFC 7231 (June 2014) allows relative URI references
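The distinction between the two RFCs comes down to whether the Location value carries a scheme. A minimal classifier, assuming Python's `urllib.parse.urlsplit` as the parser and using category names made up for this sketch:

```python
from urllib.parse import urlsplit


def location_kind(location: str) -> str:
    """Classify a Location header value (labels are illustrative only)."""
    if urlsplit(location).scheme:
        return "absolute"      # has a scheme: acceptable under RFC 2616 and RFC 7231
    if location.startswith("//"):
        return "network-path"  # relative reference: acceptable only under RFC 7231
    return "relative"          # e.g. "/foo/bar": also only under RFC 7231


for value in ("http://domain.com/foo/bar", "//domain.com/foo/bar", "/foo/bar"):
    print(value, "->", location_kind(value))
```

CoolURI's "//domain.com/foo/bar" therefore falls into the second category, which older clients written against RFC 2616 were not required to accept.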

So in fact you were right that it is a "quite new standard", although it concerns not the URI syntax itself but the HTTP Location header. Sorry for the misunderstanding; I didn't know beforehand that the necessary change in RFC 7231 was so recent.

You could perhaps open issues asking Solr to use cURL, PHP to fix file_get_contents, and CoolURI to fix this issue differently.

The function CoolURI would need has to take the target page's pages.url_scheme into account and fall back to the current request's $_SERVER['HTTPS'] if url_scheme is 0. Reintroducing "http://" would be a major problem, as stated in the original issue. I don't currently have the time for a pull request fixing this.
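A minimal sketch of that scheme selection, written in Python rather than PHP. It assumes TYPO3's pages.url_scheme convention of 0 = default, 1 = http, 2 = https, and treats the value of $_SERVER['HTTPS'] as meaning HTTPS when it is non-empty and not "off". `choose_scheme` and `build_location` are hypothetical names, not CoolURI's API.

```python
from typing import Optional

# Assumed TYPO3 pages.url_scheme values (verify against your TYPO3 version).
URL_SCHEME_DEFAULT = 0
URL_SCHEME_HTTP = 1
URL_SCHEME_HTTPS = 2


def choose_scheme(url_scheme: int, server_https: Optional[str]) -> str:
    """Pick a scheme from the target page setting, else from the current request."""
    if url_scheme == URL_SCHEME_HTTP:
        return "http"
    if url_scheme == URL_SCHEME_HTTPS:
        return "https"
    # Default: mirror $_SERVER['HTTPS'], which is usually empty/unset for plain
    # HTTP and "off" on some servers.
    if server_https and server_https.lower() != "off":
        return "https"
    return "http"


def build_location(host: str, path: str, url_scheme: int, server_https: Optional[str]) -> str:
    """Build an absolute URI (with scheme) for the Location header."""
    scheme = choose_scheme(url_scheme, server_https)
    return f"{scheme}://{host}/{path.lstrip('/')}"


print(build_location("domain.com", "/foo/bar", URL_SCHEME_DEFAULT, "on"))
```

Note the limitation described later in this thread: behind a reverse proxy that terminates HTTPS, $_SERVER['HTTPS'] is empty, so the url_scheme = 0 fallback in this sketch would pick http.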


h4de5 commented May 9, 2017

Seems like that was almost three years ago, so the word "new" only really fits in an RFC context. ;-)

I have patched/XCLASSed our Solr extension to use a cURL request, which works fine so far. Also, I am not expecting you to change anything in CoolURI. I am quite happy with this fix, as it solved a problem we had behind a reverse proxy/load balancer that terminated all HTTPS requests, leaving $_SERVER['HTTPS'] empty.

I am just trying to help the next person who wants to figure out why Solr is not indexing correctly.

Thanks for your further investigations!

3 participants