
EVERY router_http config thinks it resulted in 500 Internal Error, but in reality, delivers correct headers/body to client #232

Closed
anthonyrisinger opened this issue Apr 16, 2013 · 8 comments

Comments

@anthonyrisinger
Contributor

EDIT...

It's getting super late and I'm making dumb mistakes and rambling, I think... let me just say what I EXPECT and/or WANT to happen, based on my understanding and experiments.

I want to use the cachestore plugin to save the response BODY actually sent to the client -- REGARDLESS of which plugin generated/found/sourced/retrieved/proxied the ACTUAL content. This will allow me to cache binary blobs on the fly, e.g. awesome-package.tar.xz, as they are delivered to the first client.

...this is how I understand the plugin to work. The only other things I'd like are:

  • a way to store items DIRECTLY (maybe add a value=[...]?) so I can store stats/relations
  • a way to trigger a command from a route... slow, but a universal last resort
  • should http:HOSTNAME be working? yes, yes it should :p
  • ICING: a cache route var: cache[name][key] :D :D :D

...sorry for the length, I'll clean this up as soon as I can, but hopefully the problem is evident below.

BRIEF

Aside: I am implementing a self-correcting, distributed+intelligent proxy-cache for serving Arch Linux packages to my network... it will auto-cache and auto-follow packages I download naturally, making them available to other systems for some specified time... then automatically sync new versions until expiry.

In short, the new 1.9 routing features are 100% badass, and I'm 99% sure I'm going to be able to build out the whole "application" using PURELY uWSGI's own configuration as the logic.

...anyways, every router_http setup I try reports [i.e. logs] an HTTP 500, but still proxies the request properly, with no erroneous headers or data sent to the client! And combined with the cachestore routing instruction, it doesn't seem to want to save the correct data... sometimes it makes a SECOND request with an empty GET path and an empty X-Forwarded-For, and stores that instead! (possibly only when http:IPADDR is used with --route-run... needs more info)

REPRO

  • build HEAD with routing support, then:
uwsgi --socket @local --http-socket 127.0.0.1:8000 \
    --cache2 name=one,items=2 \
    --route-run cachestore:name=one,key=one \
    --route-run http:173.194.64.99:80,www.google.com

(--route-run vs --route* doesn't seem to make much difference here)
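
For completeness, a minimal sketch of the same repro in --route form, using a catch-all regexp (assumed equivalent here, since both actions are meant to run on every request):

uwsgi --socket @local --http-socket 127.0.0.1:8000 \
    --cache2 name=one,items=2 \
    --route '.* cachestore:name=one,key=one' \
    --route '.* http:173.194.64.99:80,www.google.com'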

  • try curl -D - http://127.0.0.1:8000/a/ ... you will get a 200 OK (edited):
HTTP/1.1 200 OK
[...]
Connection: close

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
[...]
</html>

...but the server will log self-conflicting info:

[pid: 22290|app: -1|req: -1/1] 127.0.0.1 () {24 vars in 256 bytes} [Tue Apr 16 03:28:06 2013] GET /a/ => generated 743 bytes in 71 msecs via route() (HTTP/1.1 500) 0 headers in 0 bytes (0 switches on core 0)

...because it's also trying to run the default handler, python, which doesn't exist... ALSO, AFAICT, the response is NOT stored in the cache as cache.key: one.one (which works with other plugins, or the gridfs request plugin at the least).

@prymitive
Contributor

"as they are delivered to the first client"

IMHO not necessarily (it depends on how it is implemented in uWSGI); if a request takes some time, there is a window of opportunity for a race condition:

  • first client starts request
  • second client starts the same request
  • second client finishes and response is cached
  • first client finishes and response is cached

If both responses are identical then there is no issue, just overhead; but if the responses might differ and you depend on the content, keep in mind that those request->response->cache operations are not atomic.
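
For instance, a minimal sketch of that window from a shell, assuming the proxy from this thread is listening on 127.0.0.1:8000: two clients race on the same uncached URI, and whichever response finishes LAST is what ends up in the cache.

# two concurrent clients request the same uncached URI;
# the slower response overwrites the faster one's cache entry
curl -s http://127.0.0.1:8000/a/ > /dev/null &
curl -s http://127.0.0.1:8000/a/ > /dev/null &
wait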

@prymitive
Contributor

Also, the HTTP status for responses defaults to 500, so it just might not be updated with the cached value.

@anthonyrisinger
Contributor Author

Ah yeah, that's a good point about racing on the initial seed... maybe I should just have a "learn" period, then populate the cache with a uwsgi-cron, something like the sketch below.
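
A minimal hypothetical warm-up (assuming curl on the host; the package name and the repro's 127.0.0.1:8000 address are illustrative):

[uwsgi]
; uWSGI cron syntax: minute hour day month weekday command (-1 = any)
; re-fetch the package nightly at 03:00 so cachestore refreshes the entry
cron = 0 3 -1 -1 -1 curl -s -o /dev/null http://127.0.0.1:8000/awesome-package.tar.xz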

I had the cache population stuff working well for a while, then I got tired and started breaking everything... but I know there is some kind of problem here :)

@anthonyrisinger
Contributor Author

The way I understood cachestore was that it's transparent... it just hooks the output stream and silently records it, not affecting the actual outcome at all (but maybe I read the code wrong; it seemed to work that way, and I didn't have much time to play).

@unbit
Owner

unbit commented Apr 16, 2013

Both the http and uwsgi routers are high-performance solutions; zero parsing is involved (the output is written directly to the client socket). For your objective a "proxy" plugin would be a much better solution. It would basically work like the gridfs one (which translates requests to mongodb packets), in both request plugin and routing instruction mode. I will commit something later (no more than a dozen lines, as we already have all the needed functions available).

@unbit
Owner

unbit commented Apr 16, 2013

Never mind, hacking on the http and uwsgi routers has been easier.

Basically, they will now honour headers (so you can use cachestore and cache) when offloading is not in place. If you want to ignore offloading and force the headers to be parsed, just use proxyhttp and proxyuwsgi as actions.

Remember, both are async-friendly, so a good setup for a caching proxy would be something like this:

[uwsgi]
http-socket = :9090
; coroutine based async mode
async = 1000
ugreen = true
; 100 * 128k items
cache2 = name=mypages,items=100,blocksize=128000

; we want mime types set automatically
mime-file = /etc/mime.types
; check if the client is gzip-enabled (the line has been split, sorry)
route-if = contains:${HTTP_ACCEPT_ENCODING};gzip cache:name=mypages,key=GZIPPED${REQUEST_URI},content_encoding=gzip,mime=1
; fallback to uncompressed body
route-run = cache:name=mypages,key=${REQUEST_URI},mime=1
; if we are here no cache item is found
route-run = cachestore:name=mypages,key=${REQUEST_URI},gzip=GZIPPED${REQUEST_URI}
route-run = proxyhttp:81.174.68.52:80,unbit.it
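
To sanity-check the flow (a sketch; the request URI is illustrative): the first request should be proxied and stored, the second served straight from the cache, and a gzip-capable client should get the GZIPPED variant once it has been stored.

# first hit: no cache item yet, so proxyhttp runs and cachestore saves the body
curl -D - -o /dev/null http://127.0.0.1:9090/
# second hit: served directly from the "mypages" cache
curl -D - -o /dev/null http://127.0.0.1:9090/
# gzip-capable client: matches the route-if and gets the GZIPPED variant
curl -D - -o /dev/null -H 'Accept-Encoding: gzip' http://127.0.0.1:9090/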

@anthonyrisinger
Contributor Author

Perfect! That looks awesome, thanks @unbit.

...looking forward to trying this out when I get home. That's a good point about the async mode; I would never have thought of that, but I suppose the builtin routers/loops/etc. need the user to configure async mode explicitly, even if the user never plans on handling actual requests with an embedded/dynamic language.

Thanks again! I'll close this out tonight once I have a chance to try it out.

@anthonyrisinger
Contributor Author

Finally got a chance to try this out, and so far it's working just as it should :)

Now I just need to endow it with knowledge of the server layout, along with some persistence + additional node capabilities... I'll drop a note on-list if I manage to wrangle this into something useful, which seems likely, given that it's pretty much doing what I want in less than 10 lines... :D weeee! Thanks!
