-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add integration tests which test happy path #5968
Conversation
21478b7
to
9ec907b
Compare
274463e
to
13e249c
Compare
13e249c
to
e4430f2
Compare
This PR adds simple go based integration tests, which can run Loki in two modes: - single-binary - micro-services with index-gateway After standing up it performs the most basic duties like: - ingest logs - run a range query - query label names and label values This is meant to catch more significant problem as early as possible in the PR. We should aim to keep this as simple as possible.
e4430f2
to
12e4d7d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for getting us started on this @simonswine!
Added a couple comments
require.NoError(t, cliDistributor.PushLogLineWithTimestamp("lineA", now.Add(-45*time.Minute), map[string]string{"job": "fake"})) | ||
require.NoError(t, cliDistributor.PushLogLineWithTimestamp("lineB", now.Add(-45*time.Minute), map[string]string{"job": "fake"})) | ||
|
||
// TODO: Flushing is currently causing a panic, as the boltdb shipper is shared using a global variable in: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes removing this is not trivial, I would like to get the tests in the current form and address this later in a targeted PR.
The PR is complex enough as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm torn on this test. I definitely see it's merits, but in the interest of keeping the scope of these tests very small, I'm leaning towards only testing things that go through the "front-door", meaning only having a distributor and query frontend client. In fact, I'd like to go one step further and only have a gateway client, and only testing things you can do through the gateway.
Could we test this in another way, or in a faster test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you share the panic, I might be able to help getting this fixed.
Makes configuration much more readable and avoids easy mistakes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll approve this, but I think we need some consensus before merging.
Integration tests has been a hotly-contested topic, and AFAICS these tests cover the same surface-area as the Loki Canary. The scope here will naturally diverge/expand over time, and this has added benefits over the Canary (can be automated), so I really want this personally.
Co-authored-by: Michel Hollands <42814411+MichelHollands@users.noreply.github.com>
773e61b
to
3919361
Compare
Did we agree to add integration tests? I know @slim-bean was skeptical. I'd be afraid that this opens the door for more evolved tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for getting this conversation started. I like the idea of adding integration tests, and I think this is the right approach. I really like there is no docker or building of binaries involved here, as this also makes this a really useful debug tool. I have 2 thoughts.
- How would you feel about a gateway and a single client, limiting us to only being able to test "front-door" behavior in this suite?
- How would you feel about an SSD test? I think this could either be a 3rd test or it could replace single binary?
"sharedDataPath": c.cluster.sharedPath, | ||
"grpcPort": c.grpcPort, | ||
"httpPort": c.httpPort, | ||
"schedulerPort": c.grpcPort, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this become a problem in microservices mode, ie. won't the scheduler have a different grpc port than the component for which you might be currently writing the config (ie. query-frontend)?
tQueryFrontend = clu.AddComponent( | ||
"query-frontend", | ||
"-target=query-frontend", | ||
"-frontend.scheduler-address="+tQueryScheduler.GRPCURL().Host, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh, so this is how you overcome the different grpc port for the scheduler. since you will only have 1 instance for this test, I feel like we could pass this to the cluster setup so we don't have to remember this here. WDYT?
now := time.Now() | ||
cliDistributor := client.New(tenantID, "", tDistributor.HTTPURL().String()) | ||
cliDistributor.Now = now | ||
cliIngester := client.New(tenantID, "", tIngester.HTTPURL().String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we're going to cover flush in a later PR, can we get rid of the cliIngester
for now as well?
require.NoError(t, cliDistributor.PushLogLineWithTimestamp("lineA", now.Add(-45*time.Minute), map[string]string{"job": "fake"})) | ||
require.NoError(t, cliDistributor.PushLogLineWithTimestamp("lineB", now.Add(-45*time.Minute), map[string]string{"job": "fake"})) | ||
|
||
// TODO: Flushing is currently causing a panic, as the boltdb shipper is shared using a global variable in: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm torn on this test. I definitely see it's merits, but in the interest of keeping the scope of these tests very small, I'm leaning towards only testing things that go through the "front-door", meaning only having a distributor and query frontend client. In fact, I'd like to go one step further and only have a gateway client, and only testing things you can do through the gateway.
Could we test this in another way, or in a faster test?
There's no doubt for me with regards to canary this is debuggable and gives faster feedback, I love both and I want this too. |
} | ||
|
||
// RunRangeQuery runs a query and returns an error if anything went wrong | ||
func (c *Client) RunRangeQuery(query string) (*Response, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would love that we do a follow up PR and change this to use either logcli client or https://github.com/grafana/loki-client-go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's a great idea 👍
return component | ||
} | ||
|
||
type Component struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not using docker is a gift.
Tests are taking 5s.
Really want this to be a thing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There's definitively improvement we can do but I feel like this is such a great starting point and already has helped debug issue.
I support this, my concerns about integration tests largely stem from two things: Runtime and scope creep.
I agree with @simonswine on that first point, let's scrutinize additions carefully and see if we can maintain that amazing 5s runtime! This LGTM, nice work! Fun fact, I wrote promtail_test.go which is also really an integration test disguised as a unit test. I think it's largely been a net positive but we've also chased a lot of failures/flakes in it over the years. I think it's tricky to get this kind of testing right, but I definitely think we should merge this and give this a go. |
I appreciate the discourse in this thread and appreciate y'all valuing my opinions here. I feel like I've come down a bit too hard on integration tests but I have some strong opinions here. I really like this approach as others have mentioned, no docker or binaries and you can run the debugger. This is awesome, nice work @simonswine I want to merge this to get this rolling, but I also think we should add the SSD mode 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I just have two comments.
|
||
go func() { | ||
for { | ||
time.Sleep(time.Millisecond * 200) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm late to the game but why sleep? What do you think about using eventually
on a condition instead? @simonswine
func getFreePort() (port int, err error) { | ||
var a *net.TCPAddr | ||
if a, err = net.ResolveTCPAddr("tcp", "localhost:0"); err == nil { | ||
var l *net.TCPListener | ||
if l, err = net.ListenTCP("tcp", a); err == nil { | ||
defer l.Close() | ||
return l.Addr().(*net.TCPAddr).Port, nil | ||
} | ||
} | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one. I was about to hint at this.
This PR adds simple go based integration tests, which can run Loki
in two modes:
After standing up it performs the most basic duties like:
This is meant to catch more significant problem as early as possible in
the PR.
We should aim to keep this as simple as possible.
Note:
integration/client/client.go
is copied from the closed source Granana Enterprise Logs code based