

Neil McGlohon edited this page Oct 8, 2019 · 1 revision

Tutorial: PDES and ROSS - Starting from Scratch

This page assumes that you are entirely unfamiliar with CODES and PDES. You're here because you want to use the CODES network interconnect simulator for some purpose, but not everyone sitting where you're currently sitting will want to use CODES for the same purpose. That makes writing introductions to the simulator somewhat difficult, as information that is critical for some may be useless for others. So assuming that you know nothing and are here to learn at least a little about everything is actually easier than trying to write a page for every experience level.

CODES is a multi-purpose High Performance Computing network interconnect simulation framework. It is made up of a few core modules. First and foremost, CODES uses Rensselaer's Optimistic Simulation System (ROSS), a massively parallel discrete event simulation (PDES) framework, as its engine. All interactions between entities in a CODES simulation are modeled as events in the ROSS PDES engine. Thus, to really understand what is happening in CODES under the hood, one must first understand ROSS and PDES.

ROSS and PDES

ROSS (Rensselaer's Optimistic Simulation System) is a parallel discrete event simulation (PDES) framework. So if one is going to work with ROSS, they should probably also become familiar with PDES itself. PDES is an expansion on DES, or discrete event simulation.

The essence of DES is that all entities or actors in a simulation are called Logical Processes (LPs). These LPs each have unique state and a set of functions that dictate their behavior throughout the simulation. Any interaction between two LPs is called an event in the simulation. We frequently use the term event message colloquially, both as the carrier of an event and interchangeably with the term event itself.

When an LP receives a message from another, its behavior is determined by an Event Handler function. This function is written by the developer to essentially update the receiving LP's state based on the information encoded in the event message and, if necessary, create a new event in the simulation.
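To make the LP/event-handler idea concrete, here is a minimal sketch of a sequential DES loop in plain Python. This is not ROSS code; the names (`LP`, `run`, `schedule`, `count_ticks`) are made up for illustration. The core idea is just a priority queue of timestamped events, each dispatched to its destination LP's handler.

```python
# A minimal discrete event simulation loop, written in plain Python
# (not ROSS itself) to illustrate the LP / event-handler idea.
import heapq

class LP:
    def __init__(self, name, handler):
        self.name = name          # unique identity of this logical process
        self.state = {}           # per-LP state, mutated only by the handler
        self.handler = handler    # event handler: (lp, event, schedule) -> None

def run(lps, initial_events, end_time):
    """Pop events in timestamp order and dispatch each to its LP's handler."""
    queue = list(initial_events)  # entries: (timestamp, seq, dest_lp_name, payload)
    heapq.heapify(queue)
    seq = len(queue)              # tie-breaker so heap entries always compare

    def schedule(timestamp, dest, payload):
        nonlocal seq
        heapq.heappush(queue, (timestamp, seq, dest, payload))
        seq += 1

    while queue:
        timestamp, _, dest, payload = heapq.heappop(queue)
        if timestamp > end_time:
            break
        lps[dest].handler(lps[dest], (timestamp, payload), schedule)

# Example: a single LP that counts the "ticks" it receives.
def count_ticks(lp, event, schedule):
    lp.state["ticks"] = lp.state.get("ticks", 0) + 1

lps = {"counter": LP("counter", count_ticks)}
run(lps, [(t, i, "counter", "tick") for i, t in enumerate([10, 20, 30])], end_time=100)
print(lps["counter"].state["ticks"])  # → 3
```

A real engine does much more (and ROSS in particular distributes this loop across processors, as discussed below), but every handler in the mail model that follows has exactly this shape: receive an event, mutate local state, optionally schedule new events.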

ROSS Toy Model - Mail

There are lots of discrete event simulation examples I could use, but many have been described before, and I want to offer a different perspective to potentially grant you some orthogonal information to better supplement your knowledge of DES.

An example toy DES simulation could be "Simulating a large town's postal system". In this simulation there could be several types of LPs: Post Offices, Mail Carriers, and Households.

- Post Offices process mail, forwarding it either to other post offices or handing it off to mail carriers who deliver mail to households in the post office's district.

- Mail carriers simply pick up mail from their assigned post office and deliver it to the household specified on the letter. They also receive new letters from households on their route to introduce to the system.

- Households create and receive letters. This is a good example of how one doesn't necessarily have to be super granular when creating LPs. This LP is an abstraction of a mailbox located at a specific address as well as all of the people residing at the home that might send a letter.

Let's say that each household has a 10% probability of sending a letter every 100 timesteps in the simulation. This would be modeled by the household scheduling a "heartbeat" (self-directed and periodic) event every 100 timesteps and using a random number generator to determine whether it puts anything in its mailbox. Let's call this type of event H_HEARTBEAT. Now we know how letters end up in the mailboxes, ready to be picked up by mail carriers. When the mail carrier visits the household and requests letters from the mailbox, the household would send an event H_LETTERSEND for each letter in its mailbox, popping one letter each time.
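The heartbeat pattern above can be sketched in plain Python (again, not ROSS code; `household_heartbeat`, `HEARTBEAT_PERIOD`, and the driver loop are hypothetical names for illustration). The key point is that the handler always reschedules its own next heartbeat, which is what keeps the periodic behavior alive:

```python
# Sketch of the H_HEARTBEAT logic described above: every 100 timesteps the
# household may, with 10% probability, drop a new letter into its mailbox.
import random

HEARTBEAT_PERIOD = 100
SEND_PROBABILITY = 0.10

def household_heartbeat(state, now, rng, schedule):
    """Handle one H_HEARTBEAT event and schedule the next one."""
    if rng.random() < SEND_PROBABILITY:
        state["mailbox"].append({"src": state["address"], "dst": "somewhere"})
    schedule(now + HEARTBEAT_PERIOD, "H_HEARTBEAT")   # keep the heartbeat alive

# Tiny driver: replay 1000 timesteps of heartbeats for one household.
state = {"address": 42, "mailbox": []}
rng = random.Random(7)           # per-LP seeded RNG, as in ROSS
pending = [0]                    # timestamps of scheduled heartbeats
while pending:
    now = pending.pop(0)
    if now >= 1000:
        break
    household_heartbeat(state, now, rng, lambda t, kind: pending.append(t))
print(len(state["mailbox"]) <= 10)  # at most one letter per heartbeat → True
```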

Mail carriers visit each household once every 100 timesteps. We'd utilize a similar heartbeat message to "remind" the mail carrier that it needs to start its rounds, visiting each household to check its mailbox. We'll call this type of event M_HEARTBEAT. Each time the mail carrier checks a mailbox, that's another event in the system. The mail carrier interacts with the household LP by sending an event to that household LP that requests all mail that the household has in its mailbox (M_REQUEST). If the mail carrier has any letters destined for the visited household, it'd send an event to that LP (M_LETTERDELIVER) for each of them. Once the mail carrier has visited all of its assigned households, it goes to the post office to drop off its load of new letters (M_LETTERDELIVER for each) and then requests (M_REQUEST) a new load of letters to deliver to households.

Post offices process all letters in their incoming mailbox once every 100 timesteps. We'd again utilize a similar heartbeat message to ensure this happens (P_HEARTBEAT). Each time a post office wakes up with its heartbeat message, it processes all letters in its incoming mailbox: if a letter is destined for a household in its own district, it puts it in its local outgoing mail; if the letter is destined for a household in a different district, it sends the letter to the post office in that district. This is an interaction between two post offices and again is represented by an event in the simulation: P_TRANSFER. We could have utilized another LP type here, like a Transfer Carrier that simply routes mail to and from post offices, but I'd rather not add much more complication to this toy model. When the post office receives an M_REQUEST event from a mail carrier, it needs to send a P_LETTERSEND event to the respective mail carrier for each letter in its local mail outbox that is destined for a household that that mail carrier visits.

This likely seems like a mess to implement, but let's organize it a little bit more. Let's enumerate and describe, for each LP type, the events it will receive and thus the event handlers we have to implement.

Household

  1. H_HEARTBEAT: This self-event causes the household to decide whether or not to create a letter to send to another household. We schedule a new one each time we receive one to ensure that this heartbeat continues ad infinitum (or until some end time).
  2. M_REQUEST: This is an event sent by a mail carrier. When a household receives it, it should send H_LETTERSEND events to the mail carrier, one for each letter in its mailbox.
  3. M_LETTERDELIVER: This is an event sent by a mail carrier. When a household receives it, it should increment its counter of the number of letters received. Nothing else is performed. (Except the virtual people in the household would read their letter, but that just happens in our imagination!)
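The three Household handlers above could be organized as a single dispatching function. This is a plain-Python sketch, not ROSS code; `household_handler`, `send`, and the event dictionaries are hypothetical names for illustration:

```python
# Sketch of the Household LP's event handler, dispatching on the three
# event types enumerated above.

def household_handler(state, event, send):
    kind = event["type"]
    if kind == "H_HEARTBEAT":
        # (letter-creation logic elided; see the heartbeat discussion above)
        pass
    elif kind == "M_REQUEST":
        # Hand every letter in the mailbox to the requesting mail carrier.
        while state["mailbox"]:
            letter = state["mailbox"].pop()
            send(event["carrier"], {"type": "H_LETTERSEND", "letter": letter})
    elif kind == "M_LETTERDELIVER":
        # Just count the delivery; the reading happens in our imagination.
        state["received"] += 1

# Usage: a carrier requests two queued letters, then delivers one.
state = {"mailbox": ["a", "b"], "received": 0}
outbox = []
send = lambda dest, ev: outbox.append((dest, ev))
household_handler(state, {"type": "M_REQUEST", "carrier": "carrier-1"}, send)
household_handler(state, {"type": "M_LETTERDELIVER", "letter": "c"}, send)
print(len(outbox), state["received"])  # → 2 1
```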

Mail Carrier

  1. M_HEARTBEAT: This self-event causes the mail carrier to wake up and start its route. We schedule a new one each time to ensure this heartbeat continues ad infinitum.
  2. H_LETTERSEND: This is an event sent by a household in response to an M_REQUEST event sent by the specific mail carrier. It represents a letter being given from the household's mailbox to the mail carrier's outgoing load. The mail carrier LP would add this letter to its data structure representing the letters to deliver to its assigned post office. The mail carrier, after having visited each household in its route, would send an M_LETTERDELIVER message to its post office for each letter in its outgoing mail load.
  3. P_LETTERSEND: This is an event sent by a post office in response to an M_REQUEST event sent by the specific mail carrier. It represents a letter from the post office that is destined for a household in the mail carrier's route. These letters will sit with the mail carrier until it starts its next route.

Post Office

  1. P_HEARTBEAT: This self-event causes the post office to wake up and process all letters in its inbox. If a letter in its inbox is destined for a household in its own district, then it puts it in its local outbox (for a mail carrier to pick up). If a letter in its inbox is destined for a household in a different district, it sends a P_TRANSFER message to the respective post office with the letter in question.
  2. P_TRANSFER: This is an event sent by a different post office; it contains a letter that is destined for a household in the receiving post office's district. For some added realism, let's have the post office put the letter into its inbox so it will be processed the next time the post office executes its heartbeat. We could have just as easily put the letter into its own local outbox, but since when are post offices perfectly efficient?
  3. M_LETTERDELIVER: This is an event sent by a mail carrier. When a post office receives it, it should put the given letter into its inbox for processing at its next heartbeat.
  4. M_REQUEST: This is an event sent by a mail carrier. When a post office receives it, it should send a P_LETTERSEND to the mail carrier for each letter that is destined for a household in that specific mail carrier's route.

We now have a total of 10 unique (Event Type, Recipient Type) pairs. Each LP type has to have an event handler for each type of event that it could possibly receive. We don't need to implement an event handler for the event type P_TRANSFER on the Household LP because the Household LP type will never receive that type of event. This seems like a lot of work, but this is a somewhat complicated toy model designed to have some interesting dynamics. (For example, depending on how you schedule things, a mail carrier could arrive at a post office very shortly after the post office processes its heartbeat, so that mail will have an extra delay before it is delivered - hey, accidents happen!)
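One way to keep those ten pairs organized is a dispatch table keyed on (recipient type, event type); pairs that never occur, like P_TRANSFER on a Household, simply have no entry. This is a sketch under assumed names (`HANDLERS`, `dispatch`, `not_yet` are all made up), not how ROSS itself dispatches:

```python
# A dispatch table for the ten (event type, recipient type) pairs.
def not_yet(state, event):
    pass   # stand-in for the real handlers described in the lists above

HANDLERS = {
    ("Household",   "H_HEARTBEAT"):     not_yet,
    ("Household",   "M_REQUEST"):       not_yet,
    ("Household",   "M_LETTERDELIVER"): not_yet,
    ("MailCarrier", "M_HEARTBEAT"):     not_yet,
    ("MailCarrier", "H_LETTERSEND"):    not_yet,
    ("MailCarrier", "P_LETTERSEND"):    not_yet,
    ("PostOffice",  "P_HEARTBEAT"):     not_yet,
    ("PostOffice",  "P_TRANSFER"):      not_yet,
    ("PostOffice",  "M_LETTERDELIVER"): not_yet,
    ("PostOffice",  "M_REQUEST"):       not_yet,
}

def dispatch(lp_type, state, event):
    handler = HANDLERS.get((lp_type, event["type"]))
    if handler is None:
        raise ValueError(f"{lp_type} never receives {event['type']}")
    handler(state, event)

print(len(HANDLERS))  # → 10
```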

Granularity of LPs

How granular to make LPs is a tough choice, and there's no good rule of thumb. If you make them too coarse, you might lose out on simulating certain dynamics of interaction. For instance, it would be technically possible for us to make a single LP representing ALL post offices, but then we have to consider how to represent how mail is transferred from one post office to another if they're all within the same LP. Would that be represented as an event message to itself? How should LP state be written to keep track of all mail at all post offices? This particular simplification might not be very consequential to the final results of the simulation - but it's still something to consider thoroughly, as it may not always be so inconsequential with more complex models. On the other end, we could make things even more granular by breaking up a household into multiple LPs: one for the mailbox, and one for each member of the household who might send or receive a letter. This gives us the benefit of possibly taking into account the order in which each member of a household might send or receive a letter.

So it seems like if we go too coarse with our granularity, we risk losing out on some behavior that we would have gotten if we hadn't simplified so much. So why not just make it as fine grained as possible: make an LP for every single tiny potential entity in the simulation that could have an effect on how mail is delivered? We could have an LP for each mail truck; different trucks might have different engines or problems that affect how quickly a mail carrier can deliver mail. We could break up Post Office LPs into multiple LPs: one for its inbox, one for its to-household outbox, one for its to-post-office outbox, and multiple for all of the workers that process mail from inbox to the correct outbox. The problem with too much granularity is that it greatly increases the number of events in the simulation, and thus the total number of events to be processed by whatever computer is running this simulation. More granularity comes at the cost of greater runtime.

Parallelism to the Rescue

Here's where the P in PDES comes in. Parallel Discrete Event Simulation is an expansion on regular discrete event simulation. Logical Processes (LPs) are spread across multiple Processing Elements (PEs). Each PE represents a physical process on a computer in charge of processing all events received by the LPs that it houses. In ROSS, each PE represents a single MPI rank. So if you run a ROSS simulation with mpirun -n 4, then you'll be running the specified simulation with 4 PEs. Similarly, if all you do is change that to mpirun -n 16, then you'll be running the same simulation but with 16 PEs. The same LPs are now just spread across the PEs with lower overall density.
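As a sketch of what "the same LPs spread across more PEs" means, here is the simplest possible mapping of a fixed set of LP global ids onto however many PEs the run was launched with. This is a toy modular assignment for illustration; ROSS's actual LP-to-PE mapping is configurable and generally more involved:

```python
# Toy mapping of LP global ids onto PEs: modular (round-robin) assignment.
def lp_to_pe(lp_gid, num_pes):
    return lp_gid % num_pes

# The same 8 LPs land on different PEs depending only on the launch size.
print([lp_to_pe(g, 4) for g in range(8)])  # → [0, 1, 2, 3, 0, 1, 2, 3]
```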

So now we suddenly have the power to parallelize the processing of events in a simulation; now we can make it very fine grained and we won't miss out on any potential interesting dynamics! Right? Well, not quite yet. Simply spreading LPs across different cores or processors would technically multiply the total processing power dedicated to processing events, and it would then be faster, but there's no free lunch. Because we're using different processors to process events, the LPs on each processor (or PE) will be using a different clock for the scheduling of events. This means that two LPs on different PEs might have a different idea of what has happened by simulated time 100. Additionally, one needs to consider the distributed computing problem of consensus: ensuring that all LPs across all PEs are in agreement about what events have happened in the past. The ROSS simulation framework abstracts these problems away from the end user. It has essentially two modes for parallel execution of a discrete event simulation - but it also allows the same simulation to be run all on one PE in Sequential Execution, essentially removing the P from PDES.

The first method by which ROSS can make a DES sim into a PDES sim is through Conservative Parallel Execution. In this method, LPs are spread across PEs, but there is additional overhead that prevents LPs from scheduling events in a way that would contradict a previously agreed upon causality. This synchronization overhead inflicts a great deal of slow-down on the simulation compared to just letting LPs process events as fast as they can with no concern for causality agreement. But that unsynchronized simulation wouldn't really be an accurate representation; the results would likely be drastically different from the sequential version - and that's not what we want.

This is where Optimistic Parallel Execution comes into play. That slow-down in conservative mode, a result of the protective, causality-maintaining synchronization overhead, takes away from some of our potential gains of running in parallel! We want our simulation to run quickly! Well, if we want to make sure everything is as accurate as it would be in sequential mode, then there's going to have to be SOME level of synchronization; there's no getting around that. But what if we let every PE process events as fast as it'd like and we just handle the conflicts when they come up? If conflicts happen infrequently, then we still gain some advantage by running in parallel. ROSS makes use of the Time Warp algorithm for rolling the simulation back to a previous, valid state. ROSS keeps track of all messages in case it needs to create "anti-messages" to undo the effects of any conflicting messages and get the simulation back to a valid state (this is instead of periodically creating full simulation "snapshots" to revert to). There are occasional moments of synchronization where a minimum agreed-upon time is negotiated. This is the latest time such that all PEs agree on the causality of all events prior to it. No events can be scheduled before this time by any LP. This time is known as Global Virtual Time (GVT). GVT is periodically updated as the simulation progresses, and any stored events (that were kept in case we needed to undo them!) prior to that time can be garbage collected.

This method of execution is more complicated than sequential (or conservative, for that matter), but the consensus and synchronization management is handled by ROSS. What is left to the developer of a ROSS model is the creation of a Reverse Event Handler function for LPs. Like the (forward) Event Handler function introduced earlier, which describes how an LP's state changes when it receives an event message, the Reverse Event Handler describes how an LP's state changes when it receives an anti-message. It "undoes" the changes of the forward event handler. This process is known as Reverse Computation, and it is the main technique that ROSS uses to maintain high parallel performance without sacrificing overall simulation validity.
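A forward/reverse handler pair can be sketched in a few lines (plain Python, not ROSS; the function names are made up). M_LETTERDELIVER on a Household is the simplest case in our toy model: the forward handler increments a counter, so the reverse handler decrements it.

```python
# Sketch of reverse computation: each forward handler has a mirror that
# exactly undoes its state changes, so an optimistic engine can roll an
# LP back when a conflicting (premature) event is discovered.

def deliver_forward(state, event):
    state["received"] += 1

def deliver_reverse(state, event):
    state["received"] -= 1     # exactly undoes the forward handler

state = {"received": 0}
event = {"type": "M_LETTERDELIVER"}
deliver_forward(state, event)   # optimistically processed...
deliver_forward(state, event)
deliver_reverse(state, event)   # ...then one turns out to be premature
print(state["received"])  # → 1
```

Not every handler is this easy to invert: if the forward handler destroys information (say, overwriting a value), that information must be stashed in the event message so the reverse handler can restore it.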

Remember the 10 forward event handlers one would have to write for the hypothetical mail delivery ROSS model? Now the developer of that model would have to write 10 reverse event handlers to allow for optimistic execution!

Determinism

This might have already occurred to you, but what happens when we have randomness in our model? If we roll back events to a previous time and replay forward, even if every LP started with its own RNG seed, how does the final result always line up with what we expect from sequential execution? ROSS has a trick up its sleeve: each LP has a rollback-able RNG (multiple, actually). During reverse computation, one needs only call the RNG rollback function the number of times that the RNG was used during forward computation. This can be done easily by encoding a "number of times the RNG was used while processing this event" count into the received event, which is saved until garbage collection or until it is processed by the reverse handler, which then just calls the RNG rollback function that number of times!
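The idea can be sketched as follows (plain Python, not the actual ROSS RNG; `ReversibleRNG` and its methods are hypothetical). Here the rewind works by replaying draws from the seed, which is inefficient but easy to see; ROSS's generators can step backwards directly, which is much cheaper:

```python
# Sketch of a rollback-able RNG: the forward handler records how many
# draws it made inside the event, and the reverse handler rewinds exactly
# that many draws, so a replay reproduces the same random sequence.
import random

class ReversibleRNG:
    def __init__(self, seed):
        self.seed = seed
        self.draws = 0                     # total draws made so far

    def random(self):
        self.draws += 1
        return self._value_at(self.draws)

    def rollback(self, n):
        self.draws -= n                    # forget the last n draws

    def _value_at(self, k):
        r = random.Random(self.seed)       # deterministically replay draw k
        return [r.random() for _ in range(k)][-1]

rng = ReversibleRNG(seed=1)
a, b = rng.random(), rng.random()          # forward handler used 2 draws
rng.rollback(2)                            # reverse handler rewinds both
a2, b2 = rng.random(), rng.random()        # replay: same values come back
print(a == a2 and b == b2)  # → True
```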

With this, if the reverse computation handlers are written correctly by the developer, the simulation should ALWAYS have the exact same output regardless of whether it was run with 1, 4, 16, or 65,536 PEs.

Next Up:
