
WIP: Factorisation of implementation into modules and layers #135

Merged (188 commits) on Jan 16, 2018

Conversation

@samoconnor (Contributor) commented Dec 13, 2017

Update: the architecture documentation is now more complete (see HTTP.jl Internal Architecture), and the new HTTP.open function provides a more general interface for request and/or response body streaming:

HTTP.open("GET", "https://tinyurl.com/bach-cello-suite-1-ogg") do http
    open(`vlc -q --play-and-exit --intf dummy -`, "w") do vlc
        write(vlc, http)
    end
end

Update: the server side is working (but not yet integrated with the existing API). See the example of using HTTP.listen to create a minimal WebSockets server, and see also the example here.

This branch started as PR #124 (fixes for parser state machine bugs #125 and #126). It evolved from there in an attempt to address some of the other issues I've found (e.g. #108, #115, #120, #122, #130, #131).

The result is an attempt to:

  • split functionality into distinct modules and layers,
  • reduce the size of the interfaces between modules and layers,
  • reduce the degree of type and data interdependency between modules and layers.

For example, compare the parse! function signature before:

function parse!(r::Union{Request, Response}, parser, bytes, len=length(bytes);
        lenient::Bool=true, host::String="", method::Method=GET,
        maxuri::Int64=DEFAULT_MAX_URI, maxheader::Int64=DEFAULT_MAX_HEADER,
        maxbody::Int64=DEFAULT_MAX_BODY, maintask::Task=current_task(),
        canonicalizeheaders::Bool=true)::Tuple{ParsingErrorCode, Bool, Bool, String}

...and after:

parseheaders(::Parser, bytes) do h::Pair{String,String} ... -> excess
parsebody(::Parser, bytes) -> data, excess
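
For a feel of how the new interface is driven, here is a minimal sketch (not code from the PR; the `Parser()` construction, `rawbytes` and the printing callback are illustrative assumptions):

```julia
# Sketch only: feed raw message bytes through the new two-function parser
# interface. `rawbytes` stands for bytes read from a connection.
parser = Parser()
excess = parseheaders(parser, rawbytes) do h
    println("header: ", h.first, " => ", h.second)   # h is a Pair{String,String}
end
body, excess = parsebody(parser, excess)
```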

Another example is the single consolidated retry loop using Base.retry:

abstract type RetryLayer{Next <: Layer} <: Layer end

isrecoverable(e::Base.UVError) = true
isrecoverable(e::Base.DNSError) = true
isrecoverable(e::Base.EOFError) = true
isrecoverable(e::HTTP.StatusError) = e.status < 200 || e.status >= 500
isrecoverable(e::Exception) = false

function request(::Type{RetryLayer{Next}}, a...; maxretries=2, kw...) where Next
    retry(request,
          delays=ExponentialBackOff(n = maxretries),
          check=(s,ex)->(s,isrecoverable(ex)))(Next, a...; kw...)
end
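
For reference, this is the standard Base.retry pattern; a standalone illustration (not PR code, with a made-up flaky function):

```julia
# Retry a flaky function up to 2 times with exponential backoff, retrying
# only when the thrown exception is considered recoverable.
unreliable() = rand() < 0.7 ? error("flaky") : "ok"
f = retry(unreliable;
          delays=ExponentialBackOff(n=2),
          check=(s, e) -> (s, e isa ErrorException))
f()
```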

And another example is the separate layer for StatusError:

abstract type ExceptionLayer{Next <: Layer} <: Layer end

struct StatusError <: Exception
    status::Int16
    response::Messages.Response
end

function request(::Type{ExceptionLayer{Next}}, a...; kw...) where Next
    res = request(Next, a...; kw...)
    if iserror(res)
        throw(StatusError(res.status, res))
    end
    return res
end
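
With that layer in the stack, a non-2xx response surfaces to the caller as an exception carrying the full response. A usage sketch (assuming the ExceptionLayer is part of the default request stack):

```julia
# Sketch: catch the StatusError thrown for a 404 response.
try
    HTTP.request("GET", "http://httpbin.org/status/404")
catch e
    e isa HTTP.StatusError || rethrow()
    println("request failed with status ", e.status)
end
```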

I have started writing some architecture notes. This is probably the best place to start to get a feel for the changes in this PR.

One of the outcomes is that cold requests are a bit faster. The command below
was taking about 8 seconds, and now takes about 5.

time julia -e 'using HTTP; @show HTTP.request("GET", "http://192.168.0.239")'

I don't know exactly why it is faster, but I suspect that separating the code
into smaller methods with fewer arguments and simpler return types makes things
easier for the compiler.

Add mark/reset support to Form <: IO
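
For context, mark/reset is the standard Base IO repositioning protocol; a quick illustration on an IOBuffer (not PR code):

```julia
# mark() records the current position; reset() rewinds the stream to it.
io = IOBuffer("hello world")
mark(io)
read(io, 5)          # consume "hello"
reset(io)            # rewind to the marked position
String(read(io))     # reads "hello world" again
```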

Don't call eof() in body(::IO, ...) if content length is known (eof() can block)

Work around JuliaLang/julia#24465

Use non-blocking readavailable() to read serverlog in test/server.jl

Hack test/fifobuffer.jl to work with BufferStream/IOBuffer implementation.

    The BufferStream API does not expose the size of the buffer, so tests
    that passed a fixed size to FIFOBuffer(n) have been tweaked to pass with
    BufferStream's automatic buffer size management behaviour.

Add note re @test_broken and https://github.com/kennethreitz/httpbin/issues/340#issuecomment-330176449
Move header canonicalization out of parser.
move "close(response.body)" from parser to client

remove @nread -- redundant now that header size is unrestricted

remove unused handlers: onmessagecomplete, onstatus, onmessagebegin,
onheaderscomplete
…uliaWeb#126.

Replace local `KEY` in `parse!()` with `parser.previous_field`.

Add onurlbytes(). Accumulates URL in `parser.valuebuffer`.

Use `Ref{String}` to return `extra`/`upgrade` so that the caller can tell
the difference between an upgrade with an empty string and no upgrade.

Add `s_req_fragment_start` to the list of states where the url_mark should
be reset to 1.
Store Vector{Pair} headers in Parser struct.

Move cookie processing from parser to client.

Move special treatment of multi-line and repeated headers out of the
main parser function.
…Parser.

append!() is replaced by write(). Faster, less allocation.

    unsafe_string(pointer(x.buf), length(x.buf))

is replaced by

    String(take!(x.buf))

```

mutable struct s1
    buf::Vector{UInt8}
end

mutable struct s2
    buf::IOBuffer
end

a(x::s1, s) = append!(x.buf, s)

a(x::s2, s) = write(x.buf, s)

r(x::s1) = unsafe_string(pointer(x.buf), length(x.buf))

r(x::s2) = String(take!(x.buf))

function go(x)

    inbuf = Vector{UInt8}([0])

    for i = 1:100000
        inbuf[1] = UInt8(i % 255)
        a(x, inbuf)
    end

    outbuf = String[]

    for i = 1:1000
        push!(outbuf, r(x))
    end
    println(length(outbuf))
end

t1() = go(s1(Vector{UInt8}()))
t2() = go(s2(IOBuffer()))

t1()
t2()

println("t1, Vector...")
@time t1()

println("")
println("t2, IOBuffer...")
@time t2()
```

```
t1, Vector...
1000
  0.101289 seconds (1.12 k allocations: 95.733 MiB, 65.69% gc time)

t2, IOBuffer...
1000
  0.002676 seconds (2.04 k allocations: 241.281 KiB)
```
Explicitly handle data within main parse loop instead of using *_mark
to clean up loose ends.

The number of places where callbacks are called is reduced.

Add `while p <= len` inner loop for URL parsing
(follows pattern used for header field and value parsing)

Remove on*bytes functions; just `write()` to valuebuffer or fieldbuffer
as needed.
Purpose is not clear.
Wasn't working anyway in some cases on the origin master branch,
e.g. in the "Foo: F\01ailure" test case nread was buffer length + 1 with
no error.

Remove redundant write-back of parser.state.
Fix tests for invalid content length.
… calls to parse!()

Ensure that the catch-all return at the end of the main loop is only used for incomplete messages.

Replace multiple return points in main parse!() loop with single return
at the bottom.

Make s_message_done a loop termination condition.
(was using string(r) for both).

Work around for JuliaLang/MbedTLS.jl#113

Try to rationalise Base.write, Base.string, Base.show and opts.logbody (WIP)
@EricForgy (Contributor) commented Jan 14, 2018

When I ran test/clients.jl as part of the full test suite, it failed, but when I ran it in isolation, it passed. Is there some non-deterministic test in there? Trying again... 😅

Update: Client test failed again when run all together...

Update^2: Failed again when run in isolation 🤔

Update^3: Looks like a bug in client. Headers are not being populated correctly. Or it could be in the parameters. Parser?

@samoconnor (Contributor, Author) commented Jan 14, 2018

> Is tmpout a path in Linux or something?

function testget(url)
    mktempdir() do d
        cd(d) do
            cmd = `"curl -v -s $url > tmpout 2>&1"`
            cmd = `bash -c $cmd`
            run(cmd)
            return String(read(joinpath(d, "tmpout")))
        end
    end
end

The command curl -v -s $url > tmpout 2>&1 says send curl's STDOUT to a file called tmpout, merging stream 2 (STDERR) into stream 1 (STDOUT).

> We can just use Requests because all you seem to want is an independent test of get.

That's fine for Windows. The advantages of curl are that it is widely field-validated; it supports fetching multiple URLs over the same connection; and it runs in a separate process.

> I'll submit a PR that uses Requests instead of curl. I hope that is alright 🙏

👍

@samoconnor (Contributor, Author):

> Update^2: Failed again when run in isolation 🤔

> Update^3: Looks like a bug in client. Headers are not being populated correctly. Or it could be in the parameters. Parser?

Please post the full logs whenever a test fails (create a separate issue on HTTP.jl if you like). Records of intermittent failure are very valuable.

@EricForgy (Contributor):

> Please post the full logs whenever a test fails (create a separate issue on HTTP.jl if you like). Records of intermittent failure are very valuable.

Yes, sorry. Basically, the header test with parameter Dict("hey" => "dude") was failing with `cannot find key "hey"`. Then I tried again, putting the parameters directly in the URL, and it was still failing.

URLs are the subset of URIs that specify a means of accessing a resource
in addition to specifying the resource name, e.g. http://foo.com/bar?a=b is a URL
but bar?a=b is not.

Changes:
 - Variables of type URI that must contain a complete URL renamed uri -> url.
 - Remove the "URL" constructor for URIs. It seems to have been constructing
   URIs that were not URLs for some input, which seems wrong. The other "URI"
   constructors should suffice.
 - Simplify the main kwargs URI constructor by putting the sanity checks in
   preconditions.

Changes:
 - Renames HTTP.Request.uri -> HTTP.Request.target.
 - Added "target=" kw arg to request(MessageLayer, ...) to allow setting
   the target to something that cannot be derived from the request URL.
 - Remove the old behaviour (I assume untested) of handling "CONNECT" as
   a special case and using the Request URL host:port as the target.
   It seems that this would have created a request for the server to proxy
   to itself.
 - Remove the now redundant option isconnect from the url parser.
   A client making a CONNECT Request should use target="$host:$port".
   A server implementing a CONNECT proxy should use http_parse_host(target).
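
As an illustration of the new keyword (a sketch only; the proxy and destination hosts are made up, and it assumes keyword arguments are passed down the layer stack to the MessageLayer):

```julia
# Hypothetical CONNECT request: the request URL names the proxy, while the
# `target=` keyword carries the destination host:port for the CONNECT line.
HTTP.request("CONNECT", "http://proxy.example.com:8080";
             target="destination.example.com:443")
```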
@samoconnor (Contributor, Author):

Hi @EricForgy, re the Dict("hey" => "dude") test, I suspect that was down to a dispatch ambiguity between the HTTP.get defined in the old client.jl file and the new HTTP.get defined in HTTP.jl.

@samoconnor (Contributor, Author) commented Jan 15, 2018

v0.6 win64: 436548 passed, 0 failed, 1 errored, 4 broken.
-- The error is "rmdir: Permission denied" for tmp files.

v0.6 Linux: all passing.

@EricForgy (Contributor):

Yay! 🎉

This is great progress 🙌

…that the result can be losslessly transformed back into the input.

Don't allow '*' at the start of URI path.
JuliaWeb@1226c4b#r26878192

Add sentinel const blank_userinfo to handle test cases with empty '@'
userinfo.

Remove UF_XXX and UF_XXX_MASK constants. It's simpler to use the state
machine constants to keep track of state, and use local variables to
keep track of values.
@EricForgy (Contributor):

I'm trying my best to get the tests to pass on Windows v0.6.2.

@samoconnor I submitted a PR to your PR; it is still a work in progress, but please have a look at the new testget. I think this captures what you wanted and I can get the tests that use it to pass now. There are still 3 tests failing, but I'm running out of steam tonight 😅

@quinnj (Member) commented Jan 16, 2018

I went ahead and merged. Sorry if that was premature, but the Travis queues are downright awful today and I want to do some more testing locally on some related things. I also added @samoconnor as an official collaborator to the repo, so you should be able to make branches off the repo directly/merge/revert etc. as needed. Let's get things cleaned up and in a good state with respect to tests and we can do a release.

@EricForgy (Contributor):

Yay! 🎉 🎉 🎉

@samoconnor (Contributor, Author):

Thx @quinnj, I've unsquashed the commit to avoid losing git blame granularity.

@quinnj (Member) commented Jan 17, 2018

Oh, good call. Sorry about squashing that; it's the default so I was just clicking through things.
