fixes #66017 #55
It seems some clients cannot yet handle those RFC-compliant URIs. For example, Solr indexing is not able to follow redirects from cooluri correctly and interprets them as relative URLs. Solr Indexer requests: http://domain.com/?id=1234 => cooluri sends: Location //domain.com/foo/bar => Solr Indexer subsequently calls: http://domain.com/domain.com/foo/bar => cooluri delivers 404 page => Solr indexes 404 page. See: https://forum.typo3.org/index.php?t=msg&th=215837&#msg_749437
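To make the failure mode above concrete, here is a minimal sketch in Python (chosen only for illustration; the code discussed in this thread is PHP). `resolve_rfc3986` uses the standard library's RFC 3986 reference resolution. `resolve_naive` is a hypothetical model of a client that treats the Location value as a path on the same host; it is an assumption about how the wrong URL could arise, not the actual Solr indexer code.

```python
from urllib.parse import urljoin, urlsplit


def resolve_rfc3986(request_url: str, location: str) -> str:
    """Resolve a Location value against the request URL per RFC 3986."""
    return urljoin(request_url, location)


def resolve_naive(request_url: str, location: str) -> str:
    """Hypothetical buggy client: treats the value as a path on the same host."""
    base = urlsplit(request_url)
    return f"{base.scheme}://{base.netloc}/{location.lstrip('/')}"


request_url = "http://domain.com/?id=1234"
location = "//domain.com/foo/bar"

print(resolve_rfc3986(request_url, location))  # http://domain.com/foo/bar
print(resolve_naive(request_url, location))    # http://domain.com/domain.com/foo/bar
```

The second result matches the URL that the indexer reportedly requested, which then ended in the 404 page.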
URIs of the type "//whatever", without protocol and colon, were already part of RFC 1808 from 1995, so I don't think "not yet" covers any relevant client. If your Solr isn't able to follow the links accordingly, it's probably merely a Solr configuration error and not a missing feature.
Thanks for the hint. So far this is all I got from going through the Apache access logs and the Solr log files. At the moment I cannot rule out a misconfiguration, but when I disable cooluri, indexing works fine (though it will show "uncool" URLs in the search result).
Of course, makes sense. I don't doubt that. Your header analysis is correct IMHO. And this architecture change in CoolURI will trigger exactly this relocation behavior, which TYPO3 core doesn't need. I just doubt that Solr isn't capable of this. (And that there are other clients which are not capable.)
Just a brief follow-up: I am talking about the Solr TYPO3 extension (https://github.com/TYPO3-Solr/ext-solr).
Hm, file_get_contents indeed has problems (cURL doesn't, by the way; one could use cURL to resolve the URL or even to get the content). Nevertheless, I investigated a bit and found the following:
So in fact you were right that it is a "quite new standard", although it's not the URI scheme itself but the HTTP Location header. Sorry for the misunderstanding. I didn't know beforehand that the necessary change in RFC 7231 was so recent. You should perhaps open an issue for Solr to use curl, for PHP to fix file_get_contents, and for CoolURI to fix this issue differently. The function CoolURI needs would have to take the target page's pages.url_scheme into account, and fall back to the current URL's $_SERVER['HTTPS'] if url_scheme is 0. Reintroducing "http://" would be a major problem, as stated in the original issue. I currently don't have the time for a pull request fixing this.
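The function described in the comment above could look roughly like the following sketch. It is written in Python purely for illustration and is not CoolURI code. The names `pick_scheme` and `absolute_redirect_target` are hypothetical, the `server` mapping stands in for PHP's `$_SERVER`, and the assumption that `pages.url_scheme` uses 1 for http and 2 for https (with 0 meaning "no preference") is not confirmed anywhere in this thread.

```python
from typing import Mapping

# Assumed meaning of pages.url_scheme values (not confirmed in this thread).
URL_SCHEME_DEFAULT = 0
URL_SCHEME_HTTP = 1
URL_SCHEME_HTTPS = 2


def request_is_https(server: Mapping[str, str]) -> bool:
    """Mimic the usual check on $_SERVER['HTTPS']: set, non-empty, not 'off'."""
    value = server.get("HTTPS", "")
    return value != "" and value.lower() != "off"


def pick_scheme(url_scheme: int, server: Mapping[str, str]) -> str:
    """Prefer the page's url_scheme; fall back to the current request if it is 0."""
    if url_scheme == URL_SCHEME_HTTPS:
        return "https"
    if url_scheme == URL_SCHEME_HTTP:
        return "http"
    return "https" if request_is_https(server) else "http"


def absolute_redirect_target(host: str, path: str, url_scheme: int,
                             server: Mapping[str, str]) -> str:
    """Build an absolute Location value instead of a '//host/path' reference."""
    return f"{pick_scheme(url_scheme, server)}://{host}{path}"


print(absolute_redirect_target("domain.com", "/foo/bar", URL_SCHEME_DEFAULT, {"HTTPS": "on"}))
print(absolute_redirect_target("domain.com", "/foo/bar", URL_SCHEME_DEFAULT, {}))
```

Note that the fallback branch returns "http" whenever the HTTPS entry is empty, so any setup in which `$_SERVER['HTTPS']` is not populated for an https request would get http redirects from this sketch.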
That seems like something from almost 3 years ago, and the term "new" can only mix well in an RFC context.. ;-) I have patched/xclassed our Solr extension to use a curl request, which works fine so far. Also, I am not expecting you to change anything in cooluri. I am quite happy with this fix, as it solved a problem we had behind a reverse proxy/load balancer that was terminating all https requests, leaving $_SERVER['HTTPS'] empty. I am just trying to help the next guy who wants to figure out why Solr is not indexing correctly. Thanks for your further investigations!
https://forge.typo3.org/issues/66017
Implemented fix that uses https://tools.ietf.org/html/rfc3986#section-4.2 compliant URIs like "//www.google.com", thus omitting protocol instead of trying to detect it.
In all cases where http was already used, it is still used; in all scenarios where https was used, this sticks to https. No PHP code is necessary; everything is resolved on the client side.
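As an illustration of the idea in this description, the sketch below (Python, for illustration only; the actual fix lives in the CoolURI PHP code and is not reproduced here) drops the scheme from an absolute URL and then shows how a standards-following client resolves the result against an http or https page. The helper name `to_network_path_reference` and the example base URLs are hypothetical.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def to_network_path_reference(url: str) -> str:
    """Drop the scheme, keeping '//host/path?query#fragment' (RFC 3986, section 4.2)."""
    parts = urlsplit(url)
    return urlunsplit(("", parts.netloc, parts.path, parts.query, parts.fragment))


reference = to_network_path_reference("http://www.google.com/search?q=typo3")
print(reference)  # //www.google.com/search?q=typo3

# The client supplies the scheme of the page it is currently on.
print(urljoin("http://example.org/page", reference))   # http://www.google.com/search?q=typo3
print(urljoin("https://example.org/page", reference))  # https://www.google.com/search?q=typo3
```

The server never has to decide between http and https here, which is why no scheme detection is needed on the PHP side.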