Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add integration tests which test happy path #5968

Merged
merged 3 commits into from
May 3, 2022

Conversation

simonswine
Copy link
Contributor

@simonswine simonswine commented Apr 20, 2022

This PR adds simple go based integration tests, which can run Loki
in two modes:

  • single-binary
  • micro-services with index-gateway

After standing up it performs the most basic duties like:

  • ingest logs
  • run a range query
  • query label names and label values

This is meant to catch more significant problem as early as possible in
the PR.

We should aim to keep this as simple as possible.

Note: integration/client/client.go is copied from the closed source Granana Enterprise Logs code based

@simonswine simonswine force-pushed the 20220420_integration-happy-path-tests branch from 21478b7 to 9ec907b Compare April 20, 2022 12:50
@simonswine
Copy link
Contributor Author

@chaudum this is also reproducing the bug mentioned in #5965

@simonswine simonswine force-pushed the 20220420_integration-happy-path-tests branch 2 times, most recently from 274463e to 13e249c Compare April 20, 2022 17:13
@simonswine simonswine force-pushed the 20220420_integration-happy-path-tests branch from 13e249c to e4430f2 Compare April 21, 2022 07:41
This PR adds simple go based integration tests, which can run Loki
in two modes:

- single-binary
- micro-services with index-gateway

After standing up it performs the most basic duties like:
- ingest logs
- run a range query
- query label names and label values

This is meant to catch more significant problem as early as possible in
the PR.

We should aim to keep this as simple as possible.
@simonswine simonswine force-pushed the 20220420_integration-happy-path-tests branch from e4430f2 to 12e4d7d Compare April 21, 2022 07:49
@simonswine simonswine marked this pull request as ready for review April 21, 2022 08:03
@simonswine simonswine requested a review from a team as a code owner April 21, 2022 08:03
Copy link
Contributor

@dannykopping dannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting us started on this @simonswine!
Added a couple comments

integration/cluster/cluster.go Outdated Show resolved Hide resolved
require.NoError(t, cliDistributor.PushLogLineWithTimestamp("lineA", now.Add(-45*time.Minute), map[string]string{"job": "fake"}))
require.NoError(t, cliDistributor.PushLogLineWithTimestamp("lineB", now.Add(-45*time.Minute), map[string]string{"job": "fake"}))

// TODO: Flushing is currently causing a panic, as the boltdb shipper is shared using a global variable in:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes removing this is not trivial, I would like to get the tests in the current form and address this later in a targeted PR.

The PR is complex enough as it is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm torn on this test. I definitely see it's merits, but in the interest of keeping the scope of these tests very small, I'm leaning towards only testing things that go through the "front-door", meaning only having a distributor and query frontend client. In fact, I'd like to go one step further and only have a gateway client, and only testing things you can do through the gateway.

Could we test this in another way, or in a faster test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you share the panic, I might be able to help getting this fixed.

integration/cluster/cluster.go Outdated Show resolved Hide resolved
Makes configuration much more readable and avoids easy mistakes
integration/cluster/cluster.go Outdated Show resolved Hide resolved
integration/cluster/cluster.go Outdated Show resolved Hide resolved
integration/cluster/cluster.go Outdated Show resolved Hide resolved
integration/cluster/cluster.go Outdated Show resolved Hide resolved
integration/cluster/cluster.go Outdated Show resolved Hide resolved
integration/cluster/cluster.go Outdated Show resolved Hide resolved
integration/loki_micro_services_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@dannykopping dannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll approve this, but I think we need some consensus before merging.

Integration tests has been a hotly-contested topic, and AFAICS these tests cover the same surface-area as the Loki Canary. The scope here will naturally diverge/expand over time, and this has added benefits over the Canary (can be automated), so I really want this personally.

Co-authored-by: Michel Hollands <42814411+MichelHollands@users.noreply.github.com>
@simonswine simonswine force-pushed the 20220420_integration-happy-path-tests branch from 773e61b to 3919361 Compare April 21, 2022 09:52
@vlad-diachenko vlad-diachenko self-requested a review April 21, 2022 14:50
@jeschkies
Copy link
Contributor

Did we agree to add integration tests? I know @slim-bean was skeptical. I'd be afraid that this opens the door for more evolved tests.

Copy link
Collaborator

@trevorwhitney trevorwhitney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting this conversation started. I like the idea of adding integration tests, and I think this is the right approach. I really like there is no docker or building of binaries involved here, as this also makes this a really useful debug tool. I have 2 thoughts.

  1. How would you feel about a gateway and a single client, limiting us to only being able to test "front-door" behavior in this suite?
  2. How would you feel about an SSD test? I think this could either be a 3rd test or it could replace single binary?

"sharedDataPath": c.cluster.sharedPath,
"grpcPort": c.grpcPort,
"httpPort": c.httpPort,
"schedulerPort": c.grpcPort,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this become a problem in microservices mode, ie. won't the scheduler have a different grpc port than the component for which you might be currently writing the config (ie. query-frontend)?

tQueryFrontend = clu.AddComponent(
"query-frontend",
"-target=query-frontend",
"-frontend.scheduler-address="+tQueryScheduler.GRPCURL().Host,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, so this is how you overcome the different grpc port for the scheduler. since you will only have 1 instance for this test, I feel like we could pass this to the cluster setup so we don't have to remember this here. WDYT?

now := time.Now()
cliDistributor := client.New(tenantID, "", tDistributor.HTTPURL().String())
cliDistributor.Now = now
cliIngester := client.New(tenantID, "", tIngester.HTTPURL().String())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we're going to cover flush in a later PR, can we get rid of the cliIngester for now as well?

require.NoError(t, cliDistributor.PushLogLineWithTimestamp("lineA", now.Add(-45*time.Minute), map[string]string{"job": "fake"}))
require.NoError(t, cliDistributor.PushLogLineWithTimestamp("lineB", now.Add(-45*time.Minute), map[string]string{"job": "fake"}))

// TODO: Flushing is currently causing a panic, as the boltdb shipper is shared using a global variable in:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm torn on this test. I definitely see it's merits, but in the interest of keeping the scope of these tests very small, I'm leaning towards only testing things that go through the "front-door", meaning only having a distributor and query frontend client. In fact, I'd like to go one step further and only have a gateway client, and only testing things you can do through the gateway.

Could we test this in another way, or in a faster test?

@cyriltovena
Copy link
Contributor

I'll approve this, but I think we need some consensus before merging.

Integration tests has been a hotly-contested topic, and AFAICS these tests cover the same surface-area as the Loki Canary. The scope here will naturally diverge/expand over time, and this has added benefits over the Canary (can be automated), so I really want this personally.

There's no doubt for me with regards to canary this is debuggable and gives faster feedback, I love both and I want this too.

}

// RunRangeQuery runs a query and returns an error if anything went wrong
func (c *Client) RunRangeQuery(query string) (*Response, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love that we do a follow up PR and change this to use either logcli client or https://github.com/grafana/loki-client-go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's a great idea 👍

return component
}

type Component struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not using docker is a gift.

@cyriltovena
Copy link
Contributor

Tests are taking 5s.

ok github.com/grafana/loki/integration 5.367s coverage: [no statements]

Really want this to be a thing.

Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

There's definitively improvement we can do but I feel like this is such a great starting point and already has helped debug issue.

@slim-bean
Copy link
Collaborator

I support this, my concerns about integration tests largely stem from two things:

Runtime and scope creep.

We should aim to keep this as simple as possible.

Tests are taking 5s.

I agree with @simonswine on that first point, let's scrutinize additions carefully and see if we can maintain that amazing 5s runtime!

This LGTM, nice work!

Fun fact, I wrote promtail_test.go which is also really an integration test disguised as a unit test. I think it's largely been a net positive but we've also chased a lot of failures/flakes in it over the years.

I think it's tricky to get this kind of testing right, but I definitely think we should merge this and give this a go.

@slim-bean
Copy link
Collaborator

I appreciate the discourse in this thread and appreciate y'all valuing my opinions here. I feel like I've come down a bit too hard on integration tests but I have some strong opinions here.

I really like this approach as others have mentioned, no docker or binaries and you can run the debugger. This is awesome, nice work @simonswine

I want to merge this to get this rolling, but I also think we should add the SSD mode 👍

@slim-bean slim-bean merged commit 213bb86 into main May 3, 2022
@slim-bean slim-bean deleted the 20220420_integration-happy-path-tests branch May 3, 2022 08:06
Copy link
Contributor

@jeschkies jeschkies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I just have two comments.


go func() {
for {
time.Sleep(time.Millisecond * 200)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm late to the game but why sleep? What do you think about using eventually on a condition instead? @simonswine

Comment on lines +332 to +342
func getFreePort() (port int, err error) {
var a *net.TCPAddr
if a, err = net.ResolveTCPAddr("tcp", "localhost:0"); err == nil {
var l *net.TCPListener
if l, err = net.ListenTCP("tcp", a); err == nil {
defer l.Close()
return l.Addr().(*net.TCPAddr).Port, nil
}
}
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one. I was about to hint at this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants