node: 0: error: ...ross-inline.h:106: Maximum zero-offset tie chain reached (100), increase #define in ross-types.h #231
Comments
So this is one downside of the ROSS unbiased tiebreaker. The unbiased tiebreaker feature of ROSS fairly and consistently chooses an ordering for events that tie temporally with other events. Things get complicated, however, when zero-offset events are also present. To clarify: zero-offset events are events created with zero tw_stime delay from the event that created them. Since zero-offset events naturally tie, temporally, with their causal event (and with any events that tie with it), consistently ordering those events in a fair way requires an array of tie-breaking values (automatically generated by ROSS) whose length equals the number of zero-offset "generations". For example, if an event A creates another event A' with zero offset, A' creates another zero-offset event A'', and so on up to A''''', you'd need a tie-breaking value array of size 6 to break ties fairly without violating causality. That size is the max tie chain length, and because the array is encoded into messages transmitted across PEs, it has to be statically allocated in each event. Thus the longer the chain needs to be, the heavier the impact on memory. Setting that value to 20,000 means that each event carries an array of 20,000 64-bit floats. That's a very heavy structure. Solutions:
If you want some more context on this tie breaking feature, here's a paper I wrote on it: https://nmcglo.com/public-files/papers/2021_wsc_tiebreaker.pdf
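To make the tie chain idea concrete, here is a minimal sketch of how an array of tie-breaking values could be compared so that a zero-offset child always sorts after its parent. It is not the actual ROSS code: `event_sig`, `sig_compare`, `make_child`, and `MAX_CHAIN` are hypothetical names chosen for illustration, and the comparison rule (timestamp first, then the chain entry by entry, shorter chain first on a shared prefix) is an assumption about one reasonable way to do it.

```c
#include <stddef.h>

#define MAX_CHAIN 6  /* hypothetical stand-in for the max tie chain length */

/* Hypothetical event signature: a timestamp plus one tie-breaking value
 * per zero-offset generation. This is NOT the actual ROSS struct. */
typedef struct {
    double ts;              /* receive timestamp */
    size_t chain_len;       /* number of valid entries in tie[] */
    double tie[MAX_CHAIN];  /* tie-breaking values, oldest ancestor first */
} event_sig;

/* Returns <0 if a is ordered before b, >0 if after, 0 if identical. */
int sig_compare(const event_sig *a, const event_sig *b)
{
    if (a->ts < b->ts) return -1;
    if (a->ts > b->ts) return 1;

    /* Timestamps tie: compare the chains entry by entry. */
    size_t n = a->chain_len < b->chain_len ? a->chain_len : b->chain_len;
    for (size_t i = 0; i < n; i++) {
        if (a->tie[i] < b->tie[i]) return -1;
        if (a->tie[i] > b->tie[i]) return 1;
    }

    /* Shared prefix: the shorter chain belongs to the ancestor,
     * so it is ordered first and causality is preserved. */
    if (a->chain_len < b->chain_len) return -1;
    if (a->chain_len > b->chain_len) return 1;
    return 0;
}

/* Build a zero-offset child: copy the parent's signature and append one
 * new tie-breaking value. Returns -1 when the chain limit is reached,
 * which is the situation the error message in this issue reports. */
int make_child(const event_sig *parent, double new_tie, event_sig *child)
{
    if (parent->chain_len >= MAX_CHAIN) return -1;
    *child = *parent;
    child->tie[child->chain_len++] = new_tie;
    return 0;
}
```

Because the array has a fixed size inside every signature, raising the limit grows every event, whether or not that event is part of a long zero-offset chain.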
Thanks for your reply. Actually, I don't know the principle behind CODES and ROSS.
Apologies for the delay in response; I've been starting a new job and traveling a lot in November. The quick solution is to set -DUSE_RAND_TIEBREAKER=off when configuring ROSS (then rebuild ROSS and CODES). This disables the deterministic tiebreaker feature of ROSS and reverts the way ROSS handles event processing order to what it was a year or so ago. For the most part, that is "good enough". The tiebreaker's purpose is to guarantee a deterministic ordering of event processing when there are simultaneous events in the simulation. Without the tiebreaker there is a small chance of non-deterministic output, and the tiebreaking of simultaneous events is not 'unbiased': some ruleset will break ties in a way that doesn't assign equal probability to every ordering of the simultaneous events. It should not make a significant difference semantically unless you're trying to do very formal and strict statistical analysis on the output of many runs.
Hello,
I am testing running multiple jobs with contiguous allocation, as in Exercise 3 of https://github.com/codes-org/codes/wiki/quick-start-interconnects. However, this error occurs: node: 0: error: /home/codes-dev/build-ross/include/ross-inline.h:106: Maximum zero-offset tie chain reached (100), increase #define in ross-types.h. I tried increasing the value of MAX_TIE_CHAIN in ross-types.h, but as the value increases, the simulation consumes much more memory and runs extremely slowly.
In this case, I increased MAX_TIE_CHAIN from 100 to 20000, and the error disappeared. However, the simulation then required more than 300G of memory, which caused the program to crash.
How can I fix this problem? Thanks a lot.
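As a rough back-of-the-envelope check on the memory growth described above, the sketch below computes the size of the tie-breaking array alone, assuming one 64-bit (8-byte) float per chain slot. The helper names `tie_array_bytes` and `pool_bytes` are hypothetical, and the event count used in the note that follows is an illustrative assumption, not a figure measured from this simulation.

```c
#include <stdint.h>

/* Bytes used by the tie-breaking array alone in one event,
 * assuming one 64-bit (8-byte) float per chain slot. */
uint64_t tie_array_bytes(uint64_t max_tie_chain)
{
    return max_tie_chain * (uint64_t)sizeof(double);
}

/* Total bytes for the tie-breaking arrays across a pool of events.
 * num_events is a hypothetical input; the real count depends on the
 * model and on how many events ROSS pre-allocates. */
uint64_t pool_bytes(uint64_t max_tie_chain, uint64_t num_events)
{
    return tie_array_bytes(max_tie_chain) * num_events;
}
```

With a limit of 100 the array costs 800 bytes per event; with 20000 it costs 160,000 bytes per event. At that size, a hypothetical pool of two million events would need about 320 GB for the arrays alone, which is the same order of magnitude as the usage reported here.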