question about utilization log #6

Open
GitHubDiom opened this issue Apr 6, 2021 · 3 comments
@GitHubDiom

I ran ext/raith21/main.py and checked the output DataFrame.

I found something wrong with the values in the log file.

Some values for CPU and memory exceed 100 and even reach 500.

I used pandas to analyze the data; the results are below. I don't know whether my own modifications to the program caused the recording error.

                cpu           mem
count  73803.000000  73803.000000
mean      88.563560     30.511850
std      105.502109     42.766048
min        0.000000      0.000000
25%        4.750000      0.967088
50%       50.833333     11.770266
75%      144.333333     44.669338
max      529.000000    214.148662
@WSeubring

WSeubring commented Apr 25, 2021

I have a question along the same lines, as my utilization also goes over 1. I am using the default components with a custom functionsim based on the invocation used for the skippy experiment. To my (limited) knowledge, the functionsim in Raith21 seems to allow multiple parallel invocations on the same replica, causing a higher utilization.

I tried using a lock, with a SimPy resource of capacity 1 for each replica, to limit concurrent execution (based on the skippy experiment). See the code below. This still results in a utilization over 1.

My question is: how can the concurrency of invocations on the replicas be enforced using faas-sim?

Initialization of SimPy resources for each replica (functionsim.py, init):

        def resource_factory():
            # each function replica can serve one request at a time (concurrency limit = 1)
            return simpy.Resource(env, capacity=1)
        
        self.resources: Dict[FunctionReplica, simpy.Resource] = defaultdict(resource_factory)

Locking based on resource capacity (functionsim.py, invoke):

        with resource.request() as lock:
            yield lock
            # execution
            # the with block releases the lock on exit, so an explicit
            # resource.release(lock) is not needed here

@phip123
Contributor

phip123 commented Apr 30, 2021

Hi, sorry for the late response.

The seemingly too-high utilization is caused by the point in time at which utilization is logged.
The function I'm referring to is here:

faas-sim/sim/faas/system.py

Lines 396 to 405 in 1dbe97d

    def simulate_function_invocation(env: Environment, replica: FunctionReplica, request: FunctionRequest):
        node = replica.node
        node.current_requests.add(request)
        env.metrics.log_start_exec(request, replica)
        yield from replica.simulator.invoke(env, replica, request)
        env.metrics.log_stop_exec(request, replica)
        node.current_requests.remove(request)

This logs the start of a function's execution before the corresponding FunctionSimulator actually executes it.
For example, the HTTP simulators in raith21 use locking to guarantee a maximum number of parallel requests, but the logging happens before this limit is enforced.
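
To illustrate the effect, here is a minimal stdlib-only sketch (not faas-sim code). It assumes a single replica that serves one request at a time in FIFO order, and compares the peak number of "executing" requests when the start is logged on arrival versus when the slot is actually acquired. The names `serve_fifo` and `peak_concurrency` are hypothetical helpers made up for this example.

```python
def serve_fifo(arrivals, duration):
    """Serve requests one at a time; return both logging variants as (start, stop) pairs."""
    free_at = 0
    logged_on_arrival = []   # start logged before the lock is acquired
    logged_on_acquire = []   # start logged once the request really runs
    for arrival in sorted(arrivals):
        start = max(arrival, free_at)  # wait until the replica is free
        stop = start + duration
        logged_on_arrival.append((arrival, stop))
        logged_on_acquire.append((start, stop))
        free_at = stop
    return logged_on_arrival, logged_on_acquire


def peak_concurrency(intervals):
    """Highest number of overlapping (start, stop) intervals."""
    events = []
    for start, stop in intervals:
        events.append((start, 1))
        events.append((stop, -1))
    events.sort()  # at equal timestamps, stops (-1) sort before starts (+1)
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


early, late = serve_fifo([0, 0, 0, 1, 1], duration=2)
print(peak_concurrency(early))  # requests counted while still waiting for the lock
print(peak_concurrency(late))   # requests counted only while they hold the lock
```

With five queued requests, the arrival-based log counts all five as running at once, even though the replica only ever executes one, while the acquire-based log never exceeds one.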

We are currently restructuring the resource utilization model (this will entail a separation of logging and utilization, and puts the responsibility for claiming resources on the FunctionSimulator implementation).

For now, I suggest removing the lines that log and modify the current requests list from simulate_function_invocation and calling them manually at the right moment. When that is depends heavily on your FunctionSimulator approach (process-per-request, an internal HTTP server, etc.).

Further, it may be normal for the utilization to go higher than expected.
In real life, the CPU would simply run at 100% and schedule between tasks.
To simulate this performance interference, caused by competing resource access, we added the possibility to predict the performance degradation. We will release our models, but the implementation is able to take any kind of scikit regression model.
Look at:

def create_degradation_model_input(node_state: NodeState, start_ts: int, end_ts: int) -> np.ndarray:

@phip123 phip123 self-assigned this Apr 30, 2021
@WSeubring

Thanks for the response!

Manually calling the logging worked.
