Update hashbrown #445

thaliaarchi · 2024-10-18T09:27:07Z

hashbrown 0.15 was released this month, which notably removed the RawTable API in favor of HashTable and changed its hasher from ahash to the faster foldhash.

- Migrate to HashTable
- Update indexmap to bump its hashbrown dependency
- Refactor a manual implementation of an index map with cloning in TermDag to indexmap::IndexMap
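
To illustrate the last item, here is a minimal std-only sketch of the kind of hand-rolled index map with cloning that the refactor replaces. It is not the actual egglog code: `Term`, `ManualDag`, and the field names are hypothetical stand-ins chosen for the example.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for egglog's `Term`; the real type differs.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Term {
    Lit(i64),
    App(String, Vec<usize>),
}

// Hand-rolled index map: each term is stored twice, once per direction.
#[derive(Default)]
struct ManualDag {
    nodes: Vec<Term>,               // index -> term
    hashcons: HashMap<Term, usize>, // term -> index
}

impl ManualDag {
    // Returns the existing index for `term`, or assigns the next free one.
    fn add(&mut self, term: Term) -> usize {
        if let Some(&id) = self.hashcons.get(&term) {
            return id;
        }
        let id = self.nodes.len();
        self.nodes.push(term.clone()); // the duplicate that an index set avoids
        self.hashcons.insert(term, id);
        id
    }
}
```

With `indexmap::IndexSet<Term>`, a single container covers both directions: `insert_full` returns the index together with a flag saying whether the value was newly inserted, and `get_index` looks a term up by index, so no second copy of each `Term` is needed.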

Besides this, Max's symbol_table still requires an older version of hashbrown in the lockfile. I have a draft to update that and could rebase to include it, but didn't figure out how to run its Criterion benchmarks.

saulshanabrook · 2024-10-18T13:46:11Z

src/termdag.rs

+    /// A bidirectional map between deduplicated `Term`s and indices.
+    nodes: IndexSet<Term>,


Could we change this to pub? I expose the TermDag fields in the Python bindings, because it's used when getting the extracted node(s).

Would it be better to create getters for what you need? Then it matters less what it's stored as. Since there were no other uses of the field, it looked safe to make private; a method would also make that more clear. pub is fine too, though 🤷‍♀️

Yeah, that sounds good to me.

Looking through the usages in Python, it seems like the only things I do with a termdag are calling term_to_expr and then looking up a term based on its id (termdag.nodes[term_id]).

Looking up a term by TermId is already exposed via TermDag::get. So it sounds like what you need is already public?

What about the conversion in convert_struct! in your earlier comment? Would you need a fn TermDag::iter(&self) -> impl Iterator<Item = &Term>?

Naw that's ok, I can just not expose the TermDag struct like that, and instead just expose a few methods on it.
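
For reference, the getter approach discussed in this thread could look roughly like the sketch below. It is an assumption-laden illustration rather than the egglog API: `TermStore`, `push`, and this simplified `Term` are hypothetical, and only the `iter` signature mirrors the one proposed above.

```rust
// Hypothetical stand-in for egglog's `Term`.
#[derive(Debug, PartialEq)]
enum Term {
    Lit(i64),
    Var(String),
}

#[derive(Default)]
struct TermStore {
    nodes: Vec<Term>, // not `pub`: callers go through the methods below
}

impl TermStore {
    // Appends a term and returns its index.
    fn push(&mut self, term: Term) -> usize {
        self.nodes.push(term);
        self.nodes.len() - 1
    }

    // Read-only traversal that does not reveal the backing container.
    fn iter(&self) -> impl Iterator<Item = &Term> {
        self.nodes.iter()
    }
}
```

Because callers only see `impl Iterator`, the backing storage can later change (for example from a `Vec` to an index set) without breaking them.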

codspeed-hq · 2024-10-18T13:53:09Z

CodSpeed Performance Report

Merging #445 will not alter performance

Comparing thaliaarchi:update-hashbrown (90e6e69) with main (b0db068)

Summary

✅ 6 untouched benchmarks

saulshanabrook

Thanks for this PR!

Overall this looks good, I like the simplification of the TermDag from this change.

In terms of the benchmarks, we just added them and are still refining their use. I don't think the slowdowns listed are significant. I have an open PR (#444) to turn off these shorter benchmarks, which have high variability due to indeterminism in the memory allocator.

If you click on the details for one of the "slowdowns" you can see that it's due to allocation during parsing.

The longer running benchmarks, like eggcc-extraction and math-microbenchmark seem to have no change from this PR.

Alex-Fischman · 2024-10-19T00:46:22Z

+1 to not letting "performance regressions" in the parser block this PR

thaliaarchi · 2024-10-19T05:15:23Z

I've pushed a commit, which replaces symbol_table with my fork that bumps their version of hashbrown. It's a hack to run benchmarks in CI. I'll then submit a PR to symbol_table, depending on the results, and rebase here.

It looks like running benchmarks isn't automatic; could you trigger another run?

saulshanabrook · 2024-10-22T16:49:32Z

@thaliaarchi I believe the benchmarks have run on your most recent push! The benchmark comment gets updated whenever a new run is processed in this branch.

It seems like the only regression is in cykjson. I wouldn't personally let that block merging in this PR, but I am not very familiar with what that example is for.

yihozhang · 2024-10-24T01:48:30Z

cykjson is a small, cool egglog example that runs the CYK parsing algorithm on JSON-like strings. It is a more Datalog-like workload (dynamic programming) with some e-class manipulations.

Need to update with expose termdag api for Python

hashbrown 0.15 removed the RawTable API in favor of HashTable; migrate to that. It also switched to foldhash, a faster hasher than ahash. Update indexmap too, which depends on hashbrown.

This removes the need to duplicate `Term`s for hash-consing.

thaliaarchi · 2024-10-24T02:55:23Z

I dropped the benchmarking commit that updated symbol_table. If there's a response there later, it can be updated here separately.

I also changed fn TermDag::get(&self, id: TermId) -> Term to return &Term instead. All internal usages use the result by reference, so cloning is unnecessary. Does this affect anything externally?

Alex-Fischman · 2024-10-24T04:14:20Z

> I also changed fn TermDag::get(&self, id: TermId) -> Term to return &Term instead. All internal usages use the result by reference, so cloning is unnecessary. Does this affect anything externally?

This is good, egglog is unstable and users can clone if they need it.
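
A minimal sketch of what that looks like for a caller, assuming a simplified stand-in for the real types (`Dag`, `owned_copy`, and this `Term` are hypothetical, not egglog's definitions):

```rust
// Hypothetical stand-in for egglog's `Term`.
#[derive(Clone, Debug, PartialEq)]
enum Term {
    Lit(i64),
    Var(String),
}

struct Dag {
    nodes: Vec<Term>,
}

impl Dag {
    // Borrowing accessor: a lookup no longer copies the term.
    // Panics if `id` is out of bounds.
    fn get(&self, id: usize) -> &Term {
        &self.nodes[id]
    }
}

// A caller that needs ownership clones explicitly at the call site.
fn owned_copy(dag: &Dag, id: usize) -> Term {
    dag.get(id).clone()
}
```

The cost of the copy moves to the few call sites that actually need an owned `Term`, instead of being paid on every lookup.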

saulshanabrook · 2024-10-24T20:09:54Z

Thanks @thaliaarchi for working on this and responding to all the feedback! If you have anything to add, we are also discussing the tradeoffs with hash performance and determinism in this post: #439 (comment)

EDIT: It looks like these changes also caused a 7% speedup in the biggest benchmark (added to main after this PR was started, so wasn't included in the comparison here), which is pretty nice! https://codspeed.io/egraphs-good/egglog/runs/671a868380493f6bc05c7bfc

thaliaarchi · 2024-10-25T06:01:04Z

@saulshanabrook Thanks! I'm glad to see such speedups!

thaliaarchi requested a review from a team as a code owner October 18, 2024 09:27

thaliaarchi requested review from ajpal and removed request for a team October 18, 2024 09:27

saulshanabrook reviewed Oct 18, 2024

saulshanabrook previously approved these changes Oct 18, 2024

thaliaarchi mentioned this pull request Oct 19, 2024

Update hashbrown and switch to foldhash mwillsey/symbol_table#8

Merged

saulshanabrook self-requested a review October 24, 2024 01:54

saulshanabrook removed request for ajpal and saulshanabrook October 24, 2024 01:55

thaliaarchi added 2 commits October 23, 2024 19:39

Update hashbrown

1796a74

hashbrown 0.15 removed the RawTable API in favor of HashTable; migrate to that. It also switched to foldhash, a faster hasher than ahash. Update indexmap too, which depends on hashbrown.

Replace ad hoc index set in TermDag with IndexSet

90e6e69

This removes the need to duplicate `Term`s for hash-consing.

thaliaarchi force-pushed the update-hashbrown branch from a92c5b7 to 90e6e69 Compare October 24, 2024 02:51

saulshanabrook approved these changes Oct 24, 2024

saulshanabrook merged commit af49ae2 into egraphs-good:main Oct 24, 2024
5 checks passed

This was referenced Oct 25, 2024

Fix sources of nondeterminism in egglog #439

Merged

Update symbol_table #456

Merged
