-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[UR] Ease the use of similar concept (binary tree vs red black tree, rope vs string vs buffer,...) #11722
Comments
I am not clear on what is being suggested here. I would imagine that one can write code parameterized over "concepts" with our existing trait system. (E.g. you make a Tree trait, and the code is parameterized over <T:Tree>, and then callers feed in the particular tree implementation. Likewise for a character sequence.) |
Your proposal looks like c++ template, but you don't answer the question : "how the end user will know which tree to use ?" To take an Ocaml like example, you will use a AbstractString, the compiler will use String, Buffer or Rope. Rope if the string is big, Buffer if there is a lot of concatenation, String otherwise. A beginner will not need to know every concept of the standard library to have fast code. The standard library could offer many different contener without flooding the user. You could also see this as a user guided way to make optimisation, that replace high level code, by an other one (imagine a serie of string concatenation replace by a buffer creation). But it looks more easy to do, in large scale. |
Branching based on the current representation and then switching the representation based on usage implies a very high bookkeeping overhead. For an unordered map, a hash table is almost always the right answer. For an ordered map, a tuned B-tree implementation will beat a red-black or AVL tree. Red-black trees are only useful because iterators and node pointers are never invalidated by insertions or removals of elements other than the one that's directed pointed. Rust's unique strings/vectors are owned, mutable data structures so copy-on-write is not feasible. It would require many cases of locking to handle sending data between tasks. The obvious representation is as a dynamic array, and it's possible to write a type using small string optimization to trade off overall performance and code size for decreased memory usage and better memory locality with small strings. See #4991 for the existing issue about this. A rope has a different time complexity and performance characteristics than a dynamic array. Switching dynamically would be very risky for a language focused on reliable performance/latency. Additionally, keeping track of usage via metadata to choose the representation would be too expensive. As for integration with the compiler itself to choose statically, I do not think it is realistic. It would require writing profiles for the application and doing high-level Rust-specific profile-guided optimization. Rust inherits the philosophy of C++ in avoiding hard-wired language features possible to implement in a library, with the standard types as unprivileged as possible so third party libraries able to reimplement data structures with the same power available. If there are specific instances where this can be shown to make sense, then lets open issues for those. I don't think we need a general metabug about it. |
This is a compiler choice, nothing is dynamique. The heuristic could be very simple : use of implemented method, presence or not of loops, number of method calls, use of parameter to other function (to avoid recopy if possible). Red-black trees is use everywhere in linux, so it has some advantges. "a hash table is almost always the right answer.", sure but with 2 or 3 more contener, the compiler could choose always the best solution. "Rust's unique strings/vectors are owned, mutable data structures so copy-on-write is not feasible. It would require many cases of locking to handle sending data between tasks." If the compiler detect that a string, can't be send between task, it could choose to use copy-on-write. "The obvious representation [of string] is as a dynamic array, " not for text widget. Insertion is very costly. Contener of string to handle a string, is much more effiscient. "Switching dynamically " I never propose to do it dynamically, only at compile time. A future version could use profile guided compiling, but only in the future. I don't understand your point on library, contener/String are example, because they have known interface. But you could extend it to any algorithme (search, sort, strategy pattern...) |
The compiler will be unable to determine this information in anything but a toy example. Retrieving the necessary metadata would require dynamic instrumentation via profile-guided optimization. Even if this was a realistic optimization, Rust is diametrically opposed to an implicit performance model like this. The performance of an application should not be vulnerable to small inconsequential changes to the code destroying performance across the codebase by subverting a fragile whole program optimization.
Red-black trees are a well-known data structure so they're widely used. Data structures like this with one element per-node are viable intrusive data structures, which is why the Linux kernel itself uses them. This is not something Rust's ordered map can expose. Rust is also unable to take advantage of the iterator invalidation guarantees which are required by the C++ standard and result in the implementations being forced into using a red-black tree. As soon as a high quality implementation of a B-tree is finished, Rust can replace the current
It will be unable to determine this in most cases. Rust doesn't have an effects system, and this probably wouldn't be possible to model in a reasonably simple implementation of one. Copy-on-write is not worth doing for anything but large strings, so that's more unrealistic analysis to perform. The compiler just isn't going to have static knowledge about the string sizes in real world applications.
I don't really see how you're going to speed up sorting or searches by having the compiler select different algorithms. If allocating is okay, then a well-written merge sort is hard to beat. Otherwise, a heap sort works well when allocation is unacceptable without losing Rust could always use a radix sort for integers as the cases when it can be used are static knowledge. A library is fully capable of doing this without hard-wiring special cases into the compiler. |
There is many sort of contener and choosing the best, can't be easy to do. For string, it's easy to always use String type (as in ocaml or java),but concatenation will be slow, very slow. It could be great if the compiler use Buffer instead. Or rope (a contener of string), if the string is udge.
To do that, i propose to autorise a virtual class to be declared, the compiler will choose a daughter class. The class invariant should help to avoid mistakes.
Compiler could choose with heuristics as loops presence, and frequency of use of methods.
The text was updated successfully, but these errors were encountered: