"Status code" for non-HTTP errors #66

Closed
robingustafsson opened this issue Jan 5, 2017 · 21 comments

@robingustafsson
Member

I haven't seen how k6 exposes non-HTTP errors in a way that makes it possible to keep a test going in the face of such errors and to use them for checks, thresholds or custom metrics.

Load Impact uses 1000 + the error code of libcurl (https://curl.haxx.se/libcurl/c/libcurl-errors.html) in the request.status field of response objects when DNS/TCP etc. fails, see http://support.loadimpact.com/knowledgebase/articles/174137-what-status-codes-are-there.
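
For illustration, here's a hypothetical sketch of what that convention could look like in a k6 script. The 1000 + curl-code statuses are assumed, not existing k6 behavior; in libcurl, 6 is CURLE_COULDNT_RESOLVE_HOST and 28 is CURLE_OPERATION_TIMEDOUT.

import http from "k6/http";

export default function() {
    let res = http.get("http://robins-fake-domain.com/");
    if (res.status === 1006) {
        // hypothetical: 1000 + CURLE_COULDNT_RESOLVE_HOST, DNS lookup failed
    } else if (res.status === 1028) {
        // hypothetical: 1000 + CURLE_OPERATION_TIMEDOUT, request timed out
    }
}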

Right now when I execute the following simple script:

import { sleep } from "k6";
import http from "k6/http";

export let options = {
    vus: 1
};

export default function() {
    http.get("http://robins-fake-domain.com/");
    sleep(1);
};

This seems to execute the http.get, which then fails with an exception, causing the VU to start over and bypassing my sleep(1). This is bad IMO. I think we should instead treat it like a failed request and give it a made-up status code, similar to how we handle it in the current generation of Load Impact, and then let JS execution continue.

@liclac
Contributor

liclac commented Jan 13, 2017

I feel like throwing an exception is better, because it means any dependent work isn't run unless you catch the error explicitly. Imagine you were writing a test for a web shop: if you can't load the shopping cart, you can't grab a CSRF token to submit it, and the failure cascades.

@ragnarlonn

ragnarlonn commented Jan 16, 2017

To me, that sounds like a 500 response should also result in an exception? I think one operation should only have one way of generating errors. It will be messy to have to check for two completely different types of errors in your code: you would have to check both that the response code is OK and that you get something useful back from your HTTP request, and also have code to catch any exceptions generated by e.g. a failure to create a TCP connection.
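
To make the burden concrete, a sketch of what handling both error channels would look like, assuming requests throw on transport errors:

import { check } from "k6";
import http from "k6/http";

export default function() {
    let res;
    try {
        res = http.get("http://example.com/");
    } catch (e) {
        // transport-level failure: DNS, TCP, TLS, timeout...
        return;
    }
    // protocol-level failure: the server answered, but maybe not as expected
    check(res, {
        "status is 200": (r) => r.status == 200,
    });
}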

@micsjo
Contributor

micsjo commented Jan 16, 2017

To me, no response from a request should throw an exception at all. From an execution point of view they are all valid responses; what we do with them is another matter.

The question is whether we can generate response codes (like curl does) for events lower in the stack (such as timeouts, failure to connect, SSL negotiation failure and so on) so that they in turn do not generate exceptions.

@ragnarlonn

@micsjo Read the thread again :) The initial question was what you wrote above: @robingustafsson asked if we can do what libcurl and rload do and generate unique response codes for lower-level errors. @uppfinnarn thought exceptions were better for lower-level errors. I countered that it will be painful to have to deal with two different types of errors (i.e. non-200 response codes, and exceptions) when you issue HTTP requests: you have to both wrap your http.get() call in a try/catch statement, in order to handle e.g. connection timeouts, and also check HTTP response codes to figure out if your HTTP transaction did what it was supposed to do. @uppfinnarn is right that these are two separate types of errors (on different levels), but as long as we don't let the user do TCP connection handling themselves, I think we should go with error codes rather than exceptions.

@liclac
Contributor

liclac commented Jan 16, 2017

I don't really like that approach. What are you going to do with that error code?

By treating hard failures to perform an action (e.g. connection failures) as errors, VU code can assume all actions succeed. If it asks for an HTTP response, it will get an HTTP response. If not, an error is thrown and logged, and the VU is restarted.

It completely frees the user from the burden of error handling and lets them focus on describing their target system's desired behavior, not ours. If they want to handle errors themselves (e.g. for cleanup), they can use try/catch, but this way it's not a default expectation for the user to do that.

@ragnarlonn

I'm not sure about this one. Let me try to explain how I'm thinking here; sorry for writing a novel. It would be interesting to hear what @micsjo thinks about all this.

I think this may in one sense be a disconnect between those of us thinking mainly about load testing and @uppfinnarn thinking mainly about functional testing. In a load test I often expect failed TCP connections, and it is convenient not to have to add a lot of try/catch statements to handle that. I see failed connections less as hard errors and more as data points to be plotted in a graph.

For functional testing I agree that a failed connection is more of a "hard error" that should perhaps stop code execution. But if you have e.g. a complex user scenario that spans many minutes and hundreds or thousands of HTTP requests, is halted execution really what you want when a TCP connection fails?

Maybe it's no big deal? If we assume TCP connections fail randomly throughout script execution, it may not matter [for a load test] if script execution is occasionally stopped and restarted in the midst of things. People will still see all their script code exercised, although code at the end of a script will be executed less often than code at the beginning, which may or may not be an issue. Anyway, we want to optimize for the common use case. Let's say the common use case is people who don't bother with try/catch clauses but just issue an HTTP request and check the status code. How likely are they to get weird test executions (and results) because of failed connections that are not handled properly?

Perhaps a bigger issue is code complexity. How will our script examples look, for instance? If the "right" way to do things involves substantially more lines of code, with try/catch clauses, should we promote that in our examples, at the expense of readability, or should we promote the "simple" way of doing things (without try/catch), at the expense of setting people up for potential trouble? (Perhaps the risk of trouble is slim enough?)

@martinfijal
Contributor

Agree with Ragnar that this is very much about a load testing vs. functional testing point of view. No matter which type of test we primarily optimize for, my view is that our defaults should help avoid situations that can cause damage due to knowledge gaps.

Running the following script in a large test against my own failing servers will result in a DoS attack on the first URL in the script with our current implementation. That is not what I expect when I run this.

import { group, sleep } from "k6";
import http from "k6/http";

export default function() {
    group("group", function() {
        http.get("http://latest.cool.3rd.party.widget/");
        http.get("http://my-own-broken-server/");

        sleep(1);
    });
}

Note: removing the sleep will have the same effect, and is another behavior of our current load generator I don't like. 99.9% of the people omitting the sleep aren't trying to test how many requests the load generator can squeeze out; they just don't understand how tests are run.


On the topic of how to present the errors: instead of introducing magical HTTP status codes, what about setting an error object on the response? I could then do something like resp.has_error() and take whatever action I'd like. We could also provide an API (as part of checks?) that aborts the test if any error is set on the response, for a more functional-testing style.

Disclaimer: I still know very little about our current API design.
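
A minimal sketch of what that error-object shape could look like; res.has_error() and res.error are the names floated above, not an existing k6 API:

import http from "k6/http";

export default function() {
    let res = http.get("http://my-own-broken-server/");
    if (res.has_error()) {
        // assumed API: inspect the error and decide what to do
        console.log("request failed: " + res.error);
    }
}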

@liclac
Contributor

liclac commented Jan 30, 2017

Okay, so, I think this test pretty well illustrates this entire problem:

import { check, group } from "k6";
import http from "k6/http";

export default function() {
    check(http.get(""), {
        "status is 200": (res) => res.status == 200,
    });

    group("login", function() {
        let res = http.post("http://example.com/login", ...);
        check(res, {
            "status is 200": (res) => res.status == 200,
            "contains a token": (res) => res.json().token,
        });

        let token = res.json().token;

        group("logout", function() {
            check(http.post("http://example.com/logout", { "token": token }), {
                "status is 200": (res) => res.status == 200,
            });
        })
    })

    group("articles", function() {
        // other, unrelated stuff
    })
}

Right now, there's no need for error handling boilerplate to check if e.g. the login endpoint has a connection error before running the dependent logout group, because it will throw an exception and terminate the script.

But this also means that the completely unrelated "articles" group is never run, nor is anything else further down. In effect, the closer to the top something is, the more likely it is to actually be run, so load is put disproportionately onto earlier endpoints.

One way we could handle this without introducing more boilerplate for the user would be to make each group() call act as an error boundary that catches failures inside it. This way, if there's a connection failure in the "login" group, the dependent (nested) "logout" group won't be run, but the unrelated "articles" group will. Groups would, like checks, be reported with an error rate in the end-of-test summary.

To handle errors manually, you could pass in a function as the second argument, eg.

group("my group", function() {
    throw new Error("test error");
}, (e) => throw e);

@micsjo
Contributor

micsjo commented Jan 31, 2017

Allowing tests to continue after a failure does not make sense for functional tests unless you actively choose/code to do so. I have a hard time seeing how the suggested solution actually solves our problem.

PROPOSAL:

All normal HTTP responses (regardless of 100, 200, 300, 400, 500 and so on) are always considered a successful request. As it should be: the request completed, even if the response might not be what you wanted.

Home-brew response codes (à la curl) for timeouts, failed SSL negotiation, dropped connections, inability to connect etc. would be returned as a special HTTP response code.

To control execution behavior, the default is to fail the request, just as you would expect in a functional test. It would return the error property and also the special HTTP response code.

When a "performance test context" flag is set, either on the command line or as an option in the script, the behavior is to not fail the request.

It is fully understood that this adds more code to request handling, but the opposite (always failing all incomplete HTTP requests) will require large amounts of code in every performance test run to handle errors that should not be considered errors in that context.

The only other option I see is to never fail HTTP requests for any transport-related issue whatsoever and let the tester add code to validate/verify as needed.
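
For concreteness, a sketch of how the "performance test context" flag could be declared as a script option; the option name is invented for illustration:

export let options = {
    // hypothetical flag: when true, transport errors are reported on the
    // response instead of failing the request
    performanceTestContext: true,
};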

@liclac
Contributor

liclac commented Jan 31, 2017

I'm ideologically opposed to flags or configuration changing script behavior. A function called with the same parameters in two different places should do the same thing, always; otherwise you sacrifice your ability to reason about the code. It may start innocuously enough, with little convenience heuristics or environmental tweaks, but then someone will say "you already made this function do two different things, why not add a flag that changes this function too?", and it snowballs from there.

In the end, you end up where PHP is now: an unholy mess of unpredictable global state mutations and all-powerful configuration files, where the same script run on two different machines produces completely different, likely incorrect, results.

@micsjo
Contributor

micsjo commented Jan 31, 2017

That leaves us with another option: one HTTP request function for functional tests and another for performance tests.

Having all requests that don't complete as expected fail is useless in a performance testing context.

@ragnarlonn

ragnarlonn commented Jan 31, 2017

I agree that changing script behaviour depending on config doesn't feel right. I also don't like the idea of forcing people to write one type of code for functional testing and another for load testing. That would effectively kill any chance of reusing code between the two types of tests - something that is a very strong selling point if we can pull it off for even a portion of the market.

I think Martin's ideas are worth considering further. Can we return an HTTP response code under "normal" circumstances, and then if the TCP connection fails return a 'nil' (or whatever) response code and set an error object that the user can query if they want to know exactly what happened?
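
A sketch of that shape, using null for the 'nil' status and an assumed res.error field:

import http from "k6/http";

export default function() {
    let res = http.get("http://robins-fake-domain.com/");
    if (res.status === null) {
        // transport failed; the assumed error object says what happened
        console.log(res.error);
    } else if (res.status != 200) {
        // a normal HTTP response, just not the one we wanted
    }
}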

@ghost

ghost commented Jan 31, 2017

Hi @uppfinnarn,
Why don't we have independent scenarios that can optionally contain groups but must contain one or more requests? If there is a TCP connection error etc., only that scenario throws an exception and restarts, while other scenarios continue to run. I think each scenario would then be a separate VU.

@micsjo
Contributor

micsjo commented Feb 1, 2017

Simple example of what happens. Make a k6 script with a request for

https://static.hotjar.com/c/hotjar-235820.js?sv=5

It fails with "stream error: REFUSED STREAM".

A corresponding request for

https://www.google-analytics.com/plugins/ua/ecommerce.js

fails with "stream error: PROTOCOL_ERROR".

There is actually nothing wrong with the above requests: add the User-Agent header of a proper browser and they will return a well-formatted response.
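
For reference, sending an explicit User-Agent looks like this in a k6 script (the header value is only an example):

import http from "k6/http";

export default function() {
    http.get("https://static.hotjar.com/c/hotjar-235820.js?sv=5", {
        headers: { "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)" },
    });
}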

But I strongly oppose treating them as failures. To me they didn't fail; they just didn't complete the way I expected.

@liclac
Contributor

liclac commented Feb 2, 2017

They most certainly fail. They are rejected by the server and produce a protocol error, just like a timeout or a broken pipe would.

With this proposal implemented, that failure would be contained within a group() and would thus fail only the rest of the group - and anything that comes after an HTTP request in a group can reasonably be assumed to depend on said request.

@micsjo
Contributor

micsjo commented Feb 6, 2017

You are assuming they are in a group at all, which isn't necessarily true.

Also, failing one group and then continuing with the next is plain bad in functional testing. It doesn't make sense in performance testing either, since you have dependencies between requests.

Sample of effects [pseudo code]:

group ("g1") {
request_1("get static content");
request_2("get page");
request_3("do login");
}

group ("g2") {
request_4("do user action");
request_5("get more static content");
}

The above code will produce a hornet's nest of debugging issues if request_1 fails and thus fails the entire group: when you get to the next group you will be unauthenticated and unauthorized, and subsequent requests will fail.

This is but one example. It's simply a bad idea.

Then it is better to fail hard and abort execution. The group-failing behavior will just lead to people manually implementing workarounds like this [pseudo code]:

// if in a performance test run
#IFDEF PERF_TEST_RUN
try {
#ENDIF

    // my entire script

#IFDEF PERF_TEST_RUN
} catch (e) {
    // ignore crap
}
#ENDIF

@liclac
Contributor

liclac commented Feb 6, 2017

You are assuming they are in a group at all, which isn't necessarily true.

This is what conventions are for. We don't have to assume anything; we can tell people what the correct way of doing things is, and make that way as convenient as possible.

If they for whatever reason don't want to use groups, then their script will blow up when there's an error. Which may be exactly what they want.

group ("g1") {
request_1("get static content");
request_2("get page");
request_3("do login");
}

group ("g2") {
request_4("do user action");
request_5("get more static content");
}

If you really want to structure it like that, you can make the script explode regardless:

group("g1", function() {
    // ...
}, (e) => throw e);

Or you can do some good ol' try/catch-ing:

group("g1", function() {
    try {
        // ...
    } catch(e) {
        // ...
    }
});

The idiomatic way would be to design your script so that groups are hierarchical, with one or at most a few related requests per group. If you don't wanna do that, you can use the error callback to set a "loggedIn" flag to false, propagate the error upwards, or do any number of other things, but this establishes a sane default.
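
For example, a sketch that uses the proposed error callback (not an existing API) to gate a dependent group on a "loggedIn" flag; URLs and credentials are placeholders:

import { group } from "k6";
import http from "k6/http";

export default function() {
    let loggedIn = false;

    group("login", function() {
        http.post("http://example.com/login", { user: "u", pass: "p" });
        loggedIn = true; // only reached if the request didn't throw
    }, (e) => { loggedIn = false; }); // proposed error callback

    if (loggedIn) {
        group("user action", function() {
            http.get("http://example.com/action");
        });
    }
}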

@micsjo
Contributor

micsjo commented Feb 6, 2017

To summarize the two main use cases:

MAKES SENSE FOR FUNCTIONAL TESTS

  • Fail a request and fail the test (unless you introduce error handling)

MAKES SENSE FOR NON-FUNCTIONAL TESTS

  • Do not fail requests whatsoever (unless you purposefully catch and fail manually)

To go the middle road as suggested and fail groups is insane; it makes sense for neither functional nor non-functional tests.

If we can't find a better solution or implementation, I strongly suggest we select ONE of them, not the in-between, which solves nothing for either use case but makes both harder to implement.

@liclac
Contributor

liclac commented Apr 20, 2017

Okay, so, what do we want to do here? Add Error objects to responses?

check(http.get("http://example.com/"), {
    "no error": (r) => !r.error,
    "status is 200": (r) => r.status == 200,
});

@robingustafsson
Member Author

@liclac I think the error object approach sounds like the best way forward.

When would it be set though: only on non-HTTP errors, or also for >399 HTTP status codes? Also, how would these errors be reported to collectors?

@liclac
Contributor

liclac commented May 2, 2017

I think that'd be redundant; if you're testing an endpoint that's supposed to return a certain error for whatever reason, there's no reason to flag it as an error.

Collectors would still get errors as the "error" tag on samples, same as always.
