-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tag based search in Jaeger UI is not finding spans when Badger storage is used. #1882
Comments
@aachinfiev Thanks for reporting the issue - I've moved it to the main jaeger repo as it is more related to the backend than the UI. What version are you using? How are you deploying the backend? @burmanm Any thoughts on what might be happening? |
Version of jaeger? What kind of tags are not found? "but it doesn't find anything that was done earlier in the day." How long back did you search and how far is the TTL? If the tags finding works for short period of time, then probably either one of those is in error. |
Thank you guys for a quick response. I am running jaeger all-in-one binary version 1.13 on windows server 2016. Badger is configured using SPAN_STORAGE_TYPE: "badger"
BADGER_EPHEMERAL: "false"
BADGER_DIRECTORY_VALUE: "/badger/data"
BADGER_DIRECTORY_KEY: "/badger/key" Tag types: I tried using custom tags (e.g. build.info) as well as tags using Process area (e.g. hostname). Hostname doesn't seem to filter by at all. Not sure if that's supported. I reran my search and saw that anything longer than about 2.5 hours it stops finding by a tag even though I am seeing the span when no filter is provided. I select the same service & operation in the search window which the tag is applied on in the span view. If I use tag from the earlier span it locates it, but if I use tag from older span it doesn't. Same tag just different value. I did not alter any TTL from default. Not sure even how that can be configured. Is there some sort of a default in a plugin that could tell it over which time period things are being indexed? |
Would it be possible to update to 1.14 and see if that still happens? |
I just updated to 1.14, removed the whole previous storage I had and started new service. Now I am seeing index problems even for something that completed 4min ago. I did a trace and then tried to find it by tag or process. I had 4 traces within 5min interval. Some spans get located with hostname filter, others do not. No spans were located by tag value. I tired again after another 10min and still it's not finding by tags at all. When I look at the storage folder structure with data/key folders, I see pre-allocated data folder of 2gb with lock file, but for index folder I don't see any data files beside LOCK/MANIFEST. It seems like it's not creating any indices on disk. When I ran the all-in-one in console once it stated that Archive index was not initialized even though it created the storage folders. Do I need to configure something for badger or jaeger to make it index? |
Here is output from startup phase:
|
After a few more traces I see *.sst file in the key folder. Seems like index is being built but not consistently. Some spans are found using tags and other are not. |
.sst files are generated when memtable flush or compaction happens. The data is first written to memory and the WAL log, so this sounds normal. I haven't been able to replicate your disappearing issue yet though. Any chance you could provide examples (query + spans) ? |
I created a sample test: Recorded couple spans: Viewed top span to make sure it has a tag Then tried to search spans using the tag while all operation was selected Then tried to select specific operation The span JSON piece that has the tag looks like this
In this case it was using int64 for the parameter. But I had same issues with strings. URL used during the search is:
|
Here is a sample program I ran in LinqPad. I left minimal details in it: async Task Main()
{
using (var loggerFactory = new LoggerFactory().AddConsole())
using (var tracer = InitTracer("test-handler", loggerFactory))
{
var ecd = new TestHandler(tracer, loggerFactory);
await ecd.DoMagic();
}
}
private static Tracer InitTracer(string serviceName, ILoggerFactory loggerFactory)
{
var samplerConfiguration = new Jaeger.Configuration.SamplerConfiguration(loggerFactory)
.WithType(ConstSampler.Type)
.WithParam(1);
var senderConfiguration = new Jaeger.Configuration.SenderConfiguration(loggerFactory)
.WithAgentHost("XX.XX.XX.XX");
var reporterConfiguration = new Jaeger.Configuration.ReporterConfiguration(loggerFactory)
.WithSender(senderConfiguration)
.WithLogSpans(true);
return (Tracer)new Jaeger.Configuration(serviceName, loggerFactory)
.WithSampler(samplerConfiguration)
.WithReporter(reporterConfiguration)
.GetTracer();
}
class TestHandler
{
ITracer _tracer;
ILogger<TestHandler> _logger;
public TestHandler(ITracer tracer, ILoggerFactory loggerFactory)
{
_tracer = tracer;
_logger = loggerFactory.CreateLogger<TestHandler>();
}
public async Task DoMagic()
{
var span =_tracer.BuildSpan("do-magic").WithTag("build.version", 2).WithTag("build.name", "test-build");
using (var scope = span.StartActive(true))
{
var data = await GetData();
}
}
private async Task<(int, int)> GetData()
{
using (var scope = _tracer.BuildSpan("get-data").StartActive(true))
{
await Task.Delay(100);
return (3, 5);
}
}
} I tested build.version and build.name tags (int, string) and it wasn't finding those either. |
I can confirm I'm also seeing this issue with v1.14.
It's strangely inconsistent: sometimes searching by tag works, sometimes it doesn't. Trying to get a consistent test case to reproduce, and I'll report back. P.S. Jaeger is using Badger 1.5.3. Perhaps it's worth upgrading to 1.5.5? I realize upgrading to 1.6.0 is a no-go because of #1884 . |
UPDATE: Steps to reproduce:
Based on this scenario, the issue might be the way that tags are marshaled and then queried.
Nothing sticks out to me, but I'm still learning Go. Maybe someone else has a better eye? @burmanm |
I wonder if this is some sort of Windows issue. I tried this with a unit test (as described in your comment): func TestIssue1882(t *testing.T) {
runFactoryTest(t, func(tb testing.TB, sw spanstore.Writer, sr spanstore.Reader) {
s1 := model.Span{
TraceID: model.TraceID{
Low: rand.Uint64(),
High: 0,
},
SpanID: model.SpanID(rand.Uint64()),
Process: &model.Process{
ServiceName: "issue",
},
Tags: model.KeyValues{
model.KeyValue{
Key: "alice",
VInt64: 1,
VType: model.Int64Type,
},
},
StartTime: time.Now(),
Duration: 1 * time.Second,
}
err := sw.WriteSpan(&s1)
assert.NoError(t, err)
s2 := model.Span{
TraceID: model.TraceID{
Low: rand.Uint64(),
High: 0,
},
SpanID: model.SpanID(rand.Uint64()),
Process: &model.Process{
ServiceName: "issue",
},
Tags: model.KeyValues{
model.KeyValue{
Key: "alice",
VInt64: 11,
VType: model.Int64Type,
},
},
StartTime: time.Now(),
Duration: 1 * time.Second,
}
err = sw.WriteSpan(&s2)
assert.NoError(t, err)
params := &spanstore.TraceQueryParameters{
StartTimeMin: time.Now().Add(-1 * time.Minute),
StartTimeMax: time.Now(),
ServiceName: "issue",
Tags: map[string]string{
"alice": "11",
},
}
traces, err := sr.FindTraces(context.Background(), params)
assert.NoError(t, err)
assert.Equal(t, 1, len(traces))
})
} But it finds it correctly? |
Alternative (instead of Windows issue, it's a C# client difference somehow?) |
@burmanm : Very interesting... I'm on a Mac, It might be worth noting that the spans are there in Badger, even if the search-via-tag-value comes up empty. i.e., I verify my C++ client is working properly it by manually finding the span via the UI. Would a screencast be helpful? |
Ah, alright. This is odd then, I guess it would be interesting to see the raw-data of those keys in the badger - is something wrong stored there or is the service name somehow differently encoded. Screens might hide these if they do some UTF-8 <-> UTF-16 conversion for example. |
Are there tools to be able to inspect what is stored in badger? At least to figure out if it's a storage or query issue. |
Hmm... I can ZIP up my Badger directory after creating the spans, if that would help. (I haven't had a lot of luck using the badger CLI tools to inspect the keys/values because they're all in binary.) |
Yes, I've been using a fork of https://github.com/nak3/badger-cli that works against Badger 1.5.3 (the version that Jaeger uses). I can push up that fork or just the binary if that helps. If someone can point me to some nice functions that can unmarshall the keys and values, I can add that in. |
Those index queries don't use values, so keys (and their bytes) are quite enough. |
@burmanm :
NOTE: I'm excluding the keys pertaining to jaeger-query. |
I have another dump to show. Here, I have three spans with different tag values:
The Jaeger UI finds
Maybe it's a URL-encoding thing?
|
All those keys look correct, they have one additional byte
And I hope despite the output the times are actually internally in the UTC for every case. |
Windows Firefox + Linux all-in-one combo works at least fine. So it's not a browser thing either it appears. Byte dumps look correct also:
(the part until 00 05 which is timestamp) and index seek looks to be hit the correct one also:
The fact that the last tag in this order on your env can't be found is troubling. The first two seeks find always min 2 keys, (alice=2, alice=22, rejects alice=22, stops), (alice=22, alice=222, rejects alice=222, stops) but the last one only one (alice=222, stops). Did the unit test work in MacOS? I'm just trying to minimize the searching space for problems. I don't have access to MacOS, but I'll test in full-Windows later. |
@burmanm : I'm on a Mac (10.14.6 (18G1012)), but I've been running jaeger through Docker for Mac (2.1.0.4 (39773)). Unfortunately, I don't have a dev environment set up to debug this (any easy tips?), but it's very easy to reproduce:
|
I'll create a docker image for you tomorrow that logs all the relevant info when processing.. Lets see if we can find out the reason. |
@burmanm : Fascinating! I think we nailed this one! I tried i.e., This works:
This will result in this issue:
I'm guessing something changed in the past two months. P.S. It might be worth noting that although |
Good to hear, thanks for helping. I don't think there's 1.15 tag yet, that's why there's no such version. |
Great to hear that this was already fixed. Is there a way to get windows binaries? I tried building on windows but got stuck with some tools missing. Thank you both for troubleshooting. |
You can compile the Windows binaries in Linux. |
I tried building windows binaries on linux (ubuntu distro) on both Windows Subsystem and docker image and in both cases I couldn't get past step
Gopkg.toml has
I followed install steps from Jager CONTRIBUTING.md. Is there something else that's missing or changed to get beyond this step? Thanks. |
could you try with GO111MODULE=off ? |
For me it is now working with the latest all-in-one docker image, but was not working with 1.14... |
I was able to get past make install-tools but when trying to do
Any idea what could be wrong? Do I have to downgrade/upgrade something? I am using |
@aachinfiev this was a new issue that our CI did not catch. I just fixed it in #1907. Make sure to rerun |
I got the latest and tried to build:
and getting
|
that's expected, you need to |
@yurishkuro I saw that you updated it to build binaries as part of CI and released 1.15.1. Thank you very much for that update. I was able to get that binary and confirm that this issue is fixed and search is working properly with badger running on windows. Thanks everyone for helping out. |
With 1.15.1 the tag search now works, but I only see traces having an ID starting with |
@flixr :
I think you should create a new issue for this, and provide a ZIP of your badger directory and Docker command. I'm not using the Zipkin format, but everything seems good w/ 1.15.1 (I see all traces, not just ones starting with |
See #1913 |
Requirement - what kind of business use case are you trying to solve?
Tag based search in Jaeger UI is not finding spans when Badger storage is used.
Problem - what in Jaeger blocks you from solving the requirement?
Tag search in Jaeger UI not finding spans based on tags when badger plugin is used for storage across given time range. I have a series of traces recorded for different hosts and specifying hostname or custom tag that was done within last couple hours locates the span, but it doesn't find anything that was done earlier in the day.
Looking through the code for the plugin it seems like it should be creating an index and storage for all the tags in the plugin and therefore searchable for a given service and operation. However selecting item from UI with correct settings doesn't find the right traces.
Proposal - what do you suggest to solve the problem or improve the existing situation?
I suspect the bug is in the plugin itself (either writer or reader) because when you use memory storage search works properly. I think plugin should allow searching across the full built index (scoped to time range) and find all the traces even if they are stored on disk and not just memory.
Any open questions to address
Just wanted to figure out if there is some undocumented behavior such as only doing search across traces in memory rather than storage or this is something that's not working and should be.
The text was updated successfully, but these errors were encountered: