Service hangs when returning full year of data for many points #732
I was certainly able to reproduce this issue with the attached script. Should be able to get to the bottom of it soon.

```python
import requests

def run_boptest_simulation():
    # Define the base URL for the BOPTEST server
    base_url = "http://localhost"

    # Select the test case
    testcase = "bestest_air"
    testid = requests.post(f"{base_url}/testcases/{testcase}/select").json()["testid"]

    # Initialize the test case
    init_params = {
        "start_time": 0,
        "warmup_period": 0,
        "end_time": 365 * 24 * 3600,  # One year in seconds
    }
    response = requests.put(f"{base_url}/initialize/{testid}", json=init_params)
    if response.status_code != 200:
        raise Exception(f"Failed to initialize test case: {response.text}")

    # Set the control step explicitly so each advance covers one hour
    step_size = 3600  # One hour in seconds
    requests.put(f"{base_url}/step/{testid}", json={"step": step_size})

    # Run the simulation
    current_time = 0
    end_time = 365 * 24 * 3600
    while current_time < end_time:
        response = requests.post(f"{base_url}/advance/{testid}")
        if response.status_code != 200:
            raise Exception(f"Failed to advance simulation: {response.text}")
        current_time += step_size
        print(f"Advanced to {current_time / 3600} hours")

    # Retrieve results for all measurement and input points at once
    measurements = requests.get(f"{base_url}/measurements/{testid}").json()["payload"]
    inputs = requests.get(f"{base_url}/inputs/{testid}").json()["payload"]
    points_list = list(inputs.keys()) + list(measurements.keys())
    results = requests.put(f"{base_url}/results/{testid}",
                           json={"point_names": points_list,
                                 "start_time": 0.0,
                                 "final_time": current_time}).json()["payload"]

    # Stop the test case
    response = requests.put(f"{base_url}/stop/{testid}")
    if response.status_code == 200:
        print("Done shutting down test case.")
    else:
        print("Error shutting down test case.")

if __name__ == "__main__":
    run_boptest_simulation()
```
Hey guys,

So here's the deal. The results payload for the example that I pasted above is 360 MB. Over 1/3 of a GB! Redis has a soft limit of 8 MB for messages, which could perhaps be increased to 32 MB, but that is not advisable. I'm not actually sure what redis is doing when it receives this massive payload, as I don't see it passing through when I watch the redis stream. I see the request for results come in and then no record of the response. The worker thinks it fired the response down the pipe, but I think redis just outright rejects it, although I'm not sure where the log of that might be. (I can elaborate on how you can monitor the traffic through redis at some point.) I did finally get a timeout from the client after 20 minutes, which is the timeout time when you run BOPTEST locally.

I think, though, that the issue is more than just how much we can cram through redis. I think sending this much data over an HTTP GET response is also ill advised without some care about what headers we are sending back and how the client deals with it. If the client just does a plain GET, it has to buffer the whole response in memory. On the server side, it might be something like this....
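A minimal, framework-agnostic sketch of that server-side piece, reading the results file in chunks and honoring a simple `bytes=N-` Range header for resumability (the function name and chunk size are my own; a real handler would wrap this in the web framework's streaming response with a 206 Partial Content status when resuming):

```python
import os

CHUNK_SIZE = 8192

def iter_file_range(path, range_header=None):
    """Stream a results file in chunks, honoring an optional 'bytes=N-' Range header.

    Returns (start_offset, total_size, chunk_iterator) so the caller can set
    Content-Range / Content-Length headers appropriately.
    """
    total_size = os.path.getsize(path)
    start = 0
    if range_header and range_header.startswith("bytes="):
        # Parse only the simple resume form "bytes=<start>-"
        start = int(range_header[len("bytes="):].split("-")[0] or 0)

    def chunks():
        with open(path, "rb") as f:
            f.seek(start)
            while True:
                chunk = f.read(CHUNK_SIZE)
                if not chunk:
                    break
                yield chunk

    return start, total_size, chunks()
```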
Then the client would need to stream the results to a file, by doing something like this...

```python
import os
import requests

url = "https://boptest.net/<testid>/results"
file_path = "results.tar"

# Check how many bytes we have already downloaded
resume_header = {}
if os.path.exists(file_path):
    file_size = os.path.getsize(file_path)
    resume_header = {"Range": f"bytes={file_size}-"}

# Make the request with a Range header and stream the body to disk
with requests.get(url, headers=resume_header, stream=True) as response:
    response.raise_for_status()
    with open(file_path, "ab") as f:  # "ab" to append if resuming
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
```

I don't know if this is a burden we want to put on clients though.

Putting aside, for the moment, how we serve the large responses back to clients over HTTP, and back to the redis issue: I think this is easily enough solved, as we have object storage readily available in the form of minio / s3. The worker can simply put the results payload in object storage, send a message to the server (over redis) that the results are in, and the web server can serve it up however we decide we want to handle that.
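The store-and-notify flow described above could be sketched roughly like this. It assumes a boto3-style `put_object` client for minio/s3 and a redis-py-style `publish`; the function, bucket, and key names are hypothetical, not the actual BOPTEST worker code:

```python
import json

def store_results_and_notify(object_store, redis_conn, bucket, testid, payload):
    """Upload a large results payload to object storage and send only a small
    reference message through redis (hypothetical sketch, not BOPTEST code)."""
    key = f"results/{testid}.json"  # hypothetical key naming scheme
    body = json.dumps(payload).encode("utf-8")
    # boto3-style call (assumed interface)
    object_store.put_object(Bucket=bucket, Key=key, Body=body)
    # The message over redis stays tiny regardless of payload size
    message = json.dumps({"status": 200, "result_key": key, "size_bytes": len(body)})
    redis_conn.publish(testid, message)
    return key
```

The web server would then either proxy the object back to the client or hand out a presigned URL, whichever we decide.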
Hey guys,

My 2 cents

Best,
I don't think I'm 100% following what @icupeiro is saying about the response to the /results request.

To your question @dhblum, the messages that we send over redis don't stick around. In the pub/sub scenario like we are using in BOPTEST, they just flow through from publisher to subscriber and then they are gone.
What @dhblum said is correct. Sorry for not clarifying!
OK, I see. That puts @dhblum's comment about how the messages are stored into context. In general, I think my previous comments remain valid in that these responses are too large to send as a message. I will create a test for this, log the message sizes, and see exactly what the details are.
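One simple way to log those message sizes (a sketch; the actual test harness may differ) is to measure the serialized payload before it goes out over redis:

```python
import json

def serialized_size_mb(payload):
    """Return the size in MB of a payload as it would be JSON-serialized
    for the redis message (illustrative helper, not BOPTEST code)."""
    raw = json.dumps(payload).encode("utf-8")
    return len(raw) / 1e6

# For example, a year of hourly timestamps for a single point:
sample = {"time": list(range(0, 365 * 24 * 3600, 3600))}
print(f"single point, hourly, one year: {serialized_size_mb(sample):.2f} MB")
```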
Thanks @kbenne. I agree we'll have to do something about this, and your idea of using minio for large data requests (or all data requests?) seems like it could work. But it would still be good to debug further what's going on, even in the for-loop case, to understand how redis is operating right now.
This issue is observed when running a simulation for a full year and then requesting a full year's worth of data for many points at a time using the /results API. The process hangs with no return to the client's request. To reproduce, select a test case, run a full year simulation, then request results for all points at once.

For the test case bestest_air, the following Service log is observed, the end of which is where the Service seems to hang. It seems to hang on the reply from the worker back to web.

Note that requesting only one or two data point names at a time works OK for me. Thus, I wonder if this is a throughput issue on the response from worker back to web. Need to look into this further.
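The request in question looks like the one below, adapted from the reproduction script earlier in the thread (`base_url` and `testid` are whatever your deployment and selected test case provide; `requests` is imported lazily so the helper can be defined without the package installed):

```python
def request_full_year_results(base_url, testid, point_names):
    """PUT /results asking for a full year of data for many points at once.

    With a long point list this is the call that hangs; requesting only
    one or two point names at a time returns normally.
    """
    import requests  # imported here so defining the helper needs no packages
    payload = {
        "point_names": point_names,
        "start_time": 0.0,
        "final_time": 365 * 24 * 3600,  # one year in seconds
    }
    return requests.put(f"{base_url}/results/{testid}", json=payload)
```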
I would appreciate insights from @kbenne on this.
Also FYI @icupeiro and @EttoreZ.