Profiler highlights runtime.convTslice #166

timbray · 2022-12-30T00:48:41Z

timbray
Dec 30, 2022
Maintainer

I'm building a big PR in support of #153 and one of the things I decided to bring over from Event Ruler is the CL2 benchmark, see https://github.com/aws/event-ruler/blob/main/src/test/software/amazon/event/ruler/Benchmarks.java#L469.

I just pulled in the first test (look for EXACT_RULES and EXACT_MATCHES) and noticed that Ruler was quite a bit faster than Quamina: 200K matches/sec as opposed to 150K or so. So I profiled it. The majority of the time is in CoreMatcher.matchesForFields(), as one would expect, but more or less 100% of the elapsed time in that func seems to be in runtime.convTslice(). I looked at the source for that (https://go.dev/src/runtime/iface.go) and found it fairly puzzling, so I guess the next step is to look at the generated code and see where it's being called?

If the profiler is right and we could eliminate the calls to this routine (it looks like housekeeping), Quamina could become much faster. At the back of my mind I'm wondering if it's related to the X type being any, and whether that forces silly reprocessing of slices into some more specific type.
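For context on where that routine comes from: as far as I know, runtime.convTslice is the helper the Go compiler emits when a slice value is converted to an interface value (for example, passing a slice where the parameter is declared any) and the result escapes to the heap. Here is a minimal sketch that counts the allocations caused by such a conversion. It is not taken from Quamina, and every name in it (sink, sliceSink, allocsForSliceToAny, allocsForSliceAssign) is hypothetical.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level so that the interface value stored in it escapes
// to the heap, which forces the runtime to box the slice header.
var sink any

// sliceSink is the control: same assignment, but no interface involved.
var sliceSink []int

// allocsForSliceToAny reports the average number of heap allocations per
// slice-to-interface conversion.
func allocsForSliceToAny(s []int) float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = s // []int -> any conversion
	})
}

// allocsForSliceAssign reports the average number of heap allocations for a
// plain slice-header copy.
func allocsForSliceAssign(s []int) float64 {
	return testing.AllocsPerRun(1000, func() {
		sliceSink = s // plain copy of the slice header
	})
}

func main() {
	s := []int{3, 1, 2}
	fmt.Println("allocs per slice -> any conversion:", allocsForSliceToAny(s))
	fmt.Println("allocs per slice -> []int assignment:", allocsForSliceAssign(s))
}
```

If the conversion shows up as an allocation here, the same thing on a hot path (once per event, say) would plausibly show up in a CPU profile, mostly as allocator time attributed to convTslice.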

Anyhow, just posting this to capture state of mind as I knock off for the day.

yosiat · 2023-01-07T10:18:05Z

yosiat
Jan 7, 2023
Collaborator

@timbray It looks like runtime.convTslice() is coming from the sort.Slice call we have on the fields.

I tried replacing sort.Slice, which operates on any, with sort.Sort. It looks something like:

// FieldsSort implements sort.Interface, ordering fields by Path.
type FieldsSort []Field

func (a FieldsSort) Len() int {
	return len(a)
}
func (a FieldsSort) Less(i, j int) bool {
	return bytes.Compare(a[i].Path, a[j].Path) < 0
}
func (a FieldsSort) Swap(i, j int) {
	a[i], a[j] = a[j], a[i]
}

// At the call site, replacing the sort.Slice call:
sort.Sort(FieldsSort(fields))

And now there are no calls to runtime.convTslice (from what I see in the CPU profile of TestCl2).
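The two approaches can be put side by side in a self-contained sketch. The Field type below is a hypothetical stand-in with only a Path, and byPath, sortWithSlice, sortWithSort and paths are illustrative names, not Quamina's. One caveat: converting a slice-typed value such as byPath to sort.Interface is itself a slice-to-interface conversion, so whether convTslice disappears entirely probably depends on what escape analysis decides. What sort.Sort does avoid is the reflection-based swapper and the extra closure that sort.Slice sets up.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// Field is a hypothetical stand-in for the real field type; only Path matters here.
type Field struct {
	Path []byte
}

// byPath implements sort.Interface on the concrete slice type.
type byPath []Field

func (a byPath) Len() int           { return len(a) }
func (a byPath) Less(i, j int) bool { return bytes.Compare(a[i].Path, a[j].Path) < 0 }
func (a byPath) Swap(i, j int)      { a[i], a[j] = a[j], a[i] }

// sortWithSlice uses sort.Slice, which receives the slice as an `any` argument.
func sortWithSlice(fields []Field) {
	sort.Slice(fields, func(i, j int) bool {
		return bytes.Compare(fields[i].Path, fields[j].Path) < 0
	})
}

// sortWithSort uses sort.Sort with the hand-written sort.Interface methods.
func sortWithSort(fields []Field) {
	sort.Sort(byPath(fields))
}

// paths returns the Path of each field as a string, for printing and comparison.
func paths(fields []Field) []string {
	out := make([]string, len(fields))
	for i, f := range fields {
		out[i] = string(f.Path)
	}
	return out
}

func main() {
	a := []Field{{Path: []byte("zone")}, {Path: []byte("detail")}, {Path: []byte("id")}}
	b := append([]Field(nil), a...) // independent copy so each sort starts from the same input

	sortWithSlice(a)
	sortWithSort(b)

	fmt.Println("sort.Slice:", paths(a))
	fmt.Println("sort.Sort: ", paths(b))
}
```

Both functions should leave the fields in the same order, so the swap is behavior-preserving. Any speed difference has to be confirmed with a profile or benchmark.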

I ran cl2 tests and city-lots to compare if it actually improves the situation:

events/sec   EXACT        ANYTHING-BUT
Baseline     2,630,469.1  710,226.7
sort.Sort    2,697,063.3  739,819.4
Sort check   2,879,297.3  750,239.4

Baseline: #167 (33d2255)
sort.Sort: Replacing sort.Slice with sort.Sort
Sort check: sort only if the slice has more than one item (on top of sort.Sort)

name \ time/op    baseline     sort.Sort    sort-check
CityLots-10       4.30µs ± 3%  4.35µs ± 3%  4.14µs ± 1%

name \ alloc/op   baseline     sort-Sort    sort-check
CityLots-10         828B ± 1%    697B ± 1%    694B ± 1%

name \ allocs/op  baseline     sort-Sort    sort-check
CityLots-10         31.0 ± 0%    28.4 ± 2%    28.0 ± 0%
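The "sort check" variant measured above amounts to a length guard in front of the sort. Here is a minimal sketch under the same assumptions as before: Field, byPath and sortFields are hypothetical names, not the actual Quamina code.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// Field is a hypothetical stand-in for the real field type.
type Field struct {
	Path []byte
}

// byPath implements sort.Interface, ordering fields by Path.
type byPath []Field

func (a byPath) Len() int           { return len(a) }
func (a byPath) Less(i, j int) bool { return bytes.Compare(a[i].Path, a[j].Path) < 0 }
func (a byPath) Swap(i, j int)      { a[i], a[j] = a[j], a[i] }

// sortFields sorts fields in place by Path. A slice with zero or one element
// is already sorted, so the guard skips the sort call and the interface
// conversion that comes with it.
func sortFields(fields []Field) {
	if len(fields) < 2 {
		return
	}
	sort.Sort(byPath(fields))
}

func main() {
	single := []Field{{Path: []byte("only")}}
	sortFields(single) // takes the early return

	many := []Field{{Path: []byte("b")}, {Path: []byte("c")}, {Path: []byte("a")}}
	sortFields(many)
	for _, f := range many {
		fmt.Println(string(f.Path))
	}
}
```

The guard only helps when events often flatten to zero or one field of interest, which the CL2 numbers suggest happens often enough to matter.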

0 replies

timbray · 2023-01-10T01:06:10Z

timbray
Jan 10, 2023
Maintainer Author

OK, I replaced sort.Slice with sort.Sort as in your code and cl2_test increased from 153K/second to 159K/second. Not huge but every little bit helps. Now to look at the profiler again…

2 replies

timbray Jan 10, 2023
Maintainer Author

Haha, but profiler says convTslice is still >50% of the runtime of matchesForFields()

yosiat Jan 10, 2023
Collaborator

@timbray Isn't it weird that you are getting 153K events per second, while I am getting 2.6 million / 750K?

 ❯ go test -v -run "^TestCl2"
=== RUN   TestCl2
lines: 213068
Field matchers: 6 (avg size 1.000, max 1), Value matchers: 1, SmallTables 44 (avg size 3.128, max 7), singletons 0
Field matchers: 6 (avg size 1.000, max 1), Value matchers: 1, SmallTables 44 (avg size 3.128, max 7), singletons 0
EXACT events/sec: 2630469.1
Field matchers: 6 (avg size 4.000, max 4), Value matchers: 4, SmallTables 72 (avg size 3.056, max 5), singletons 0
ANYTHING-BUT events/sec: 712602.0
--- PASS: TestCl2 (0.95s)
PASS
ok      github.com/timbray/quamina      1.205s

(I am on an M1 Mac laptop, with Go 1.19.)

timbray · 2023-01-10T17:24:00Z

timbray
Jan 10, 2023
Maintainer Author

!!!!!!! OK, that does it, I must get an M1. I have a 2019 Intel Mac. BTW, which configuration is yours?

1 reply

yosiat Jan 11, 2023
Collaborator

I was testing on a MacBook Pro (16-inch, 2021) with an Apple M1 Max and 64GB of RAM.

Today I tested on a friend's Intel-based MacBook (2.3 GHz 8-core Intel Core i9) and we got:

=== RUN   TestCl2
lines: 213068
Field matchers: 6 (avg size 1.000, max 1), Value matchers: 1, SmallTables 44 (avg size 3.128, max 7), singletons 0
Field matchers: 6 (avg size 1.000, max 1), Value matchers: 1, SmallTables 44 (avg size 3.128, max 7), singletons 0
EXACT events/sec: 1521914.3
Field matchers: 6 (avg size 4.000, max 4), Value matchers: 4, SmallTables 72 (avg size 3.056, max 5), singletons 0
ANYTHING-BUT events/sec: 457227.5

Which is very weird: for EXACT we are doing between 1.5 and 2.6 million events per second, and you are getting 150K.

timbray · 2023-01-10T17:25:22Z

timbray
Jan 10, 2023
Maintainer Author

Also, it's really weird that you're getting much faster times on EXACT than on ANYTHING-BUT. Is this on the upstream origin or on my PR branch?

1 reply

yosiat Jan 11, 2023
Collaborator

Did the tests on your branch (v1.0prep)

timbray · 2023-01-11T16:58:31Z

timbray
Jan 11, 2023
Maintainer Author

OK, pardon, I was being stupid. I used to run TestCityLots all the time and was used to seeing numbers like 150K-180K there, so I misread 15xxxxx as 15xxxx. So we are in the same region, although the M1 is considerably faster.

TestCityLots is a lot slower because it is matching the numeric fields so it has to read all those big floating-point arrays. So the very nice performance of TestCL2 is probably due mostly to your work on the flattener so it can skip most of the data.

0 replies
