
improve: review of everything using set #1705

Merged · 27 commits merged into main from hashset_to_vectorset · Nov 4, 2023
Conversation

@mehah (Contributor) commented on Oct 17, 2023

This PR reviews every use of a set in the codebase and swaps each one for the most suitable set implementation, or for another, more viable collection.

When should each one be used?

  • std::set
    • Avoid it. Use it only when sorted iteration order is actually required.
  • std::unordered_set
    • The best option when lookups dominate and insertions/removals are infrequent.
  • phmap::flat_hash_set
    • On average the best option, as long as you do not need to iterate over the whole container (linear search).
  • vector_set
    • It looks like the best option, but it isn't in general. It is great when you insert elements in batches, because each insertion only marks the container as dirty; the sort + unique pass runs the next time you search or erase.
    • In short, it is perfect for keeping a set of unique values for a short period of time, for example within a single method, as in the example and sketch below.
stdext::vector_set<int> unique_list;

std::vector<int> list1 { 1, 3, 4, 5 };
std::vector<int> list2 { 4, 2, 87, 2 };
std::vector<int> list3 { 3, 1, 99, 5 };
unique_list.insertAll(list1);
unique_list.insertAll(list2);
unique_list.insertAll(list3);

// iterating triggers the deferred sort + unique
for (auto v : unique_list) {
	std::cout << v << " "; // prints 1 2 3 4 5 87 99
}
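
For context, here is a minimal sketch of the deferred sort + unique idea described above. It is only an illustration: the class name lazy_vector_set and its members are invented here and are not the project's actual stdext::vector_set.

// Illustrative sketch only: a vector-backed set that defers sort + unique
// until the next read, search, or erase. Not the real stdext::vector_set.
#include <algorithm>
#include <initializer_list>
#include <vector>

template <typename T>
class lazy_vector_set {
public:
	template <typename Container>
	void insertAll(const Container& values) {
		data_.insert(data_.end(), values.begin(), values.end());
		dirty_ = true; // just append and mark dirty; no sorting yet
	}

	void insertAll(std::initializer_list<T> values) {
		data_.insert(data_.end(), values.begin(), values.end());
		dirty_ = true;
	}

	bool contains(const T& value) {
		normalize(); // searching forces the sort + unique pass
		return std::binary_search(data_.begin(), data_.end(), value);
	}

	size_t erase(const T& value) {
		normalize(); // erasing also forces the pass
		const auto it = std::lower_bound(data_.begin(), data_.end(), value);
		if (it == data_.end() || *it != value) {
			return 0;
		}
		data_.erase(it);
		return 1;
	}

	auto begin() { normalize(); return data_.begin(); }
	auto end() { normalize(); return data_.end(); }

private:
	void normalize() {
		if (!dirty_) {
			return;
		}
		std::sort(data_.begin(), data_.end());
		data_.erase(std::unique(data_.begin(), data_.end()), data_.end());
		dirty_ = false;
	}

	std::vector<T> data_;
	bool dirty_ = false;
};

Batch insertions stay cheap because the O(n log n) work is paid once per read phase instead of once per insert.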

Benchmark
(benchmark results screenshot omitted)

const auto &benchmark = [](std::string_view imple, auto set) {
	constexpr auto stress = 99999999;

	Benchmark bm;
	for (size_t d = 0; d < stress; d++) {
		set.emplace((rand() % stress) + 1);
	}
	set.begin(); // force the deferred sort + unique (only relevant for vector_set)
	g_logger().info("{} - emplace {}ms", imple, bm.duration());
	bm = {};
	for (size_t d = 0; d < 999999; d++) {
		for (const auto &v : set) { }
	}
	g_logger().info("{} - linear search {}ms", imple, bm.duration());
	bm = {};
	for (size_t d = 0; d < stress; d++) {
		set.contains((rand() % stress) + 1);
	}
	g_logger().info("{} - random search {}ms", imple, bm.duration());
	bm = {};
	for (size_t d = 0; d < stress; d++) {
		set.erase((rand() % stress) + 1);
	}
	g_logger().info("{} - random remove {}ms", imple, bm.duration());
};

benchmark("std::vector_set", stdext::vector_set<uint32_t>());
benchmark("std::set", std::set<uint32_t>());
benchmark("std::unordered_set", std::unordered_set<uint32_t>());
benchmark("phmap::flat_hash_set", phmap::flat_hash_set<uint32_t>());

@mehah mehah marked this pull request as draft October 17, 2023 20:42
@mehah mehah marked this pull request as ready for review October 17, 2023 22:01
@dudantas dudantas marked this pull request as draft October 21, 2023 02:02
@mehah mehah requested review from luan and dudantas October 23, 2023 20:54
@mehah mehah marked this pull request as ready for review October 23, 2023 20:54

sonarqubecloud bot commented Nov 4, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
1 Code Smell

0.0% Coverage
0.0% Duplication

@dudantas dudantas merged commit 1ad50f0 into main Nov 4, 2023
@dudantas dudantas deleted the hashset_to_vectorset branch November 4, 2023 02:03
marcusvcj pushed a commit to marcusvcj/canary that referenced this pull request Nov 4, 2023
Refactor set implementation across the codebase for optimal performance.

Detailed Description:
This encompasses a comprehensive review and overhaul of `std::set` usage within the project, replacing it with more efficient collections as applicable. The refactoring targets instances where set operations are utilized, with a critical analysis of performance considerations.

Key Changes:
• Avoid `std::set` due to ordering overhead when order is not a requirement.
• Employ `std::unordered_set` for scenarios with infrequent insertions and deletions but frequent searches.
• Integrate `phmap::flat_hash_set` as a balanced option for various operations without linear searches.
• Introduce `vector_set` for batch insertions and short-term unique data handling within methods.

Performance Insights:
The `vector_set` is particularly notable for its batch insertion efficiency and automatic sorting and deduplication upon search or deletion operations, making it suitable for maintaining a unique set of elements temporarily.

Example Usage:
stdext::vector_set<int> unique_list;
unique_list.insertAll({1, 3, 4, 5});
unique_list.insertAll({4, 2, 87, 2});
unique_list.insertAll({3, 1, 99, 5});

// unique_list now contains 1, 2, 3, 4, 5, 87, 99 without duplicates
marcusvcj pushed a commit to marcusvcj/canary that referenced this pull request Nov 20, 2023