Descriptors do not parse Nones to h5 #7
Hey @GemmaTuron I reviewed this and indeed there is nothing useful in the logs, unfortunately. It could be a memory issue. Since we've both tried the models through the CLI, I'll try to use the Python API for them, but in general, how would you suggest I try reproducing this? I can let this run on a Linux machine with the data you are using and see if the same behavior is repeated. |
Hmm, is there a way to get more verbosity from the Python API? Then we could maybe see better what is going on. |
Hi @DhanshreeA and @Abellegese I have little time next week to look at this in detail. If either of you has some spare time and can think about it or elucidate the cause for failure that would be super, but no pressure. To give a few more pointers:
Okay I hope this is helpful |
Okay @GemmaTuron |
Hello @DhanshreeA From the above there is one model that certainly does not work,
The log error is this one: |
LOL, I have found that I was not adding the ".csv" extension to the input file, and Ersilia was not telling me but was still running the model and producing this weird output. This needs to be more informative. But it brings me back to the starting point of why the models are failing: if I run them manually they are able to produce outputs for all the inputs I pass to them, so it does not seem to be a memory error. Do you have any suggestions on how to debug this? For the moment I will set the two first models (DILI and AMES) to run in verbose mode, to see if exactly the same thing happens and if the verbose mode gives some more info. |
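As an illustration of the kind of check that would make this more informative (a hypothetical guard sketched here, not Ersilia's current behaviour), the CLI could refuse to run when the input argument does not point to an existing .csv file:

import os
import sys

def validate_input_path(path):
    # Hypothetical guard: fail loudly instead of silently running with a bad input reference.
    if not path.endswith(".csv"):
        sys.exit(f"Expected a .csv input file, got: {path!r}")
    if not os.path.isfile(path):
        sys.exit(f"Input file not found: {path!r}")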
Hi @DhanshreeA and @Abellegese, the session needs to be closed once the other model is already initialized:
The session seems to be closing properly:
|
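For reference, a minimal sketch of running models one after another through the Python API, closing each session before the next model is served (this assumes the ErsiliaModel serve/run/close interface; the file arguments are placeholders):

from ersilia import ErsiliaModel

# Sketch: run two models sequentially, closing each session before serving the next.
for model_id in ["eos3ae6", "eos4u6p"]:
    mdl = ErsiliaModel(model_id)
    mdl.serve()
    try:
        mdl.run(input="input.csv", output=f"{model_id}_output.csv")
    finally:
        mdl.close()  # make sure the session is closed even if the run fails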
Update:
|
I have identified what the problem is. I will explain with an example. When a molecule gives "None" as a result, instead of being properly parsed into the .csv format (where under each column we would have a None), we get the first column with a list of Nones and the rest of the columns empty:
What we need is:
I am 99% sure this is due to the FastAPI packaging, because this did not happen before and it is happening across all models. @DhanshreeA, can you take care of fixing this? For completeness, I have found the same issue in other models. To facilitate the screening, three molecules that fail in eos3ae6 are:
[Cl-].[NH4+]
NLXLAEXVIDQMFP-UHFFFAOYSA-N
A molecule that fails in eos4u6p:
Two molecules that fail in eos8a4x: |
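A minimal sketch of the expected behaviour described above (an illustration only, not the packaging code itself): when a molecule yields None, each output column of that row should get its own empty cell instead of the whole list of Nones landing in the first column.

import csv

def write_rows(path, columns, results):
    # results holds one entry per molecule; a None entry means the model returned nothing.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        for res in results:
            if res is None:
                writer.writerow([None] * len(columns))  # one empty cell per column
            else:
                writer.writerow(res)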
Okay this is interesting @GemmaTuron, I think I now have a clear picture of what's going on. This won't be a problem for our model output content checker because of the current implementation. These are the problems I need to fix in the test command:
But I will try to fix this in the easiest way possible. |
Also, more information on this from @Abellegese: this only happens in Docker models, not when you fetch from GitHub |
Okay @Abellegese Some more information on this error. I have found two types of behaviour:
Example: model eos3ae6
I cannot try with eos4u6p as it does not fetch from GitHub.
Example: eos8a4x
|
Yes exactly, this is what I found as well. Thanks @GemmaTuron.
I will work on that. More info is also appreciated. |
I am sorry I cannot provide more info, as I have not found more molecules that are correct but fail to be described at the individual model level. For eos4u6p I can re-try if we solve the GitHub fetch :) |
Hi @GemmaTuron okay, the first reason why the h5 fails is illustrated, for instance, by this run:
bash /root/bundles/eos3ae6/20250116-5251bd9f-927d-454f-acdc-5d4bb17a2c0f/model/framework/run.sh /root/bundles/eos3ae6/20250116-5251bd9f-927d-454f-acdc-5d4bb17a2c0f/model/framework /tmp/ersilia-4c1em5lh/input-2ee8617a-bc66-4e88-8e54-725203863869.csv /tmp/ersilia-4c1em5lh/output-2ee8617a-bc66-4e88-8e54-725203863869.csv
INFO: 172.17.0.1:38364 - "POST /run HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 409, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/usr/local/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.10/site-packages/starlette/applications.py", line 113, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 187, in __call__
raise exc
File "/usr/local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 165, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 715, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 735, in app
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 76, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 73, in app
response = await f(request)
File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 301, in app
raw_response = await run_endpoint_function(
File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
return await dependant.call(**values)
File "/root/bundles/eos3ae6/20250116-5251bd9f-927d-454f-acdc-5d4bb17a2c0f/app/main.py", line 219, in run
response = orient_to_json(R, header, data, orient, output_type)
File "/root/bundles/eos3ae6/20250116-5251bd9f-927d-454f-acdc-5d4bb17a2c0f/app/utils.py", line 50, in orient_to_json
record[columns[j]] = values_serializer([values[i][j]])[0]
File "/root/bundles/eos3ae6/20250116-5251bd9f-927d-454f-acdc-5d4bb17a2c0f/app/utils.py", line 33, in values_serializer
return [float(x) for x in values]
File "/root/bundles/eos3ae6/20250116-5251bd9f-927d-454f-acdc-5d4bb17a2c0f/app/utils.py", line 33, in <listcomp>
return [float(x) for x in values]
ValueError: could not convert string to float: ''
No computed charges. |
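For context, the crash comes from values_serializer calling float('') on an empty output cell. A defensive variant (a sketch of one possible fix based on the utils.py shown in the traceback, not the actual patch) would map empty strings and None to None so that missing values survive serialization:

def values_serializer(values):
    # Sketch: convert each value to float, but let empty strings and None
    # pass through as None instead of raising ValueError.
    serialized = []
    for x in values:
        if x is None or (isinstance(x, str) and x.strip() == ""):
            serialized.append(None)
        else:
            serialized.append(float(x))
    return serialized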
This results in the 500 status code shown above.
(ersilia-py3.12) ersilia bug-fixes-h5 docker history ersiliaos/eos3ae6
IMAGE CREATED CREATED BY SIZE COMMENT
cf446ace057e 3 weeks ago RUN |1 MODEL=eos3ae6 /bin/sh -c apt-get upda… 260MB buildkit.dockerfile.v0
<missing> 3 weeks ago RUN |1 MODEL=eos3ae6 /bin/sh -c apt-get upda… 11.4MB buildkit.dockerfile.v0
<missing> 3 weeks ago COPY ./eos3ae6 /root/eos3ae6 # buildkit 216kB buildkit.dockerfile.v0
<missing> 3 weeks ago WORKDIR /root 0B buildkit.dockerfile.v0
<missing> 3 weeks ago ENV MODEL=eos3ae6 0B buildkit.dockerfile.v0
<missing> 3 weeks ago ARG MODEL=eos3ae6 0B buildkit.dockerfile.v0
<missing> 6 weeks ago ENTRYPOINT ["sh" "/root/docker-entrypoint.sh… 0B buildkit.dockerfile.v0
<missing> 6 weeks ago EXPOSE map[80/tcp:{}] 0B buildkit.dockerfile.v0
<missing> 6 weeks ago RUN /bin/sh -c apt-get clean && apt-get auto… 52MB buildkit.dockerfile.v0
<missing> 6 weeks ago COPY . /ersilia-pack # buildkit 79.2kB buildkit.dockerfile.v0
<missing> 6 weeks ago WORKDIR /root 0B buildkit.dockerfile.v0
<missing> 2 months ago CMD ["python3"] 0B buildkit.dockerfile.v0
<missing> 2 months ago RUN /bin/sh -c set -eux; for src in idle3 p… 36B buildkit.dockerfile.v0
<missing> 2 months ago RUN /bin/sh -c set -eux; savedAptMark="$(a… 40.9MB buildkit.dockerfile.v0
<missing> 2 months ago ENV PYTHON_SHA256=bfb249609990220491a1b92850… 0B buildkit.dockerfile.v0
<missing> 2 months ago ENV PYTHON_VERSION=3.10.16 0B buildkit.dockerfile.v0
<missing> 2 months ago ENV GPG_KEY=A035C8C19219BA821ECEA86B64E628F8… 0B buildkit.dockerfile.v0
<missing> 2 months ago RUN /bin/sh -c set -eux; apt-get update; a… 2.33MB buildkit.dockerfile.v0
<missing> 2 months ago ENV LANG=C.UTF-8 0B buildkit.dockerfile.v0
<missing> 2 months ago ENV PATH=/usr/local/bin:/usr/local/sbin:/usr… 0B buildkit.dockerfile.v0
<missing> 2 months ago # debian.sh --arch 'amd64' out/ 'bullseye' '… 80.7MB debuerreotype 0.15
|
It's straightforward to solve the h5 issues, but I just want to figure out the root cause of the issue. I will post the source code changes that have been made between the GitHub repository and the Docker image. |
Hi @GemmaTuron As I previously suspected, the api_schema.json differs between the GitHub repository and the Docker image:
{
"run": {
"input": {
"key": {
"type": "string"
},
"input": {
"type": "string"
},
"text": {
"type": "string"
}
},
"output": {
"outcome": {
"type": "numeric_array",
"shape": [
33
],
"meta": [
"R_0",
"R_1",
"R_2",
"R_3",
"R_4",
"R_5",
"R_6",
"R_7",
"R_8",
"R_9",
"R_10",
"I_0",
"I_1",
"I_2",
"I_3",
"I_4",
"I_5",
"I_6",
"I_7",
"I_8",
"I_9",
"I_10",
"IR_0",
"IR_1",
"IR_2",
"IR_3",
"IR_4",
"IR_5",
"IR_6",
"IR_7",
"IR_8",
"IR_9",
"IR_10"
]
}
}
}
}
{
"run": {
"input": {
"key": {
"type": "string"
},
"input": {
"type": "string"
},
"text": {
"type": "string"
}
},
"output": {
"whales": {
"type": "numeric_array",
"shape": [
33
],
"meta": [
"R_0",
"R_1",
"R_2",
"R_3",
"R_4",
"R_5",
"R_6",
"R_7",
"R_8",
"R_9",
"R_10",
"I_0",
"I_1",
"I_2",
"I_3",
"I_4",
"I_5",
"I_6",
"I_7",
"I_8",
"I_9",
"I_10",
"IR_0",
"IR_1",
"IR_2",
"IR_3",
"IR_4",
"IR_5",
"IR_6",
"IR_7",
"IR_8",
"IR_9",
"IR_10"
]
}
}
}
} |
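To make the difference explicit, a small helper (hypothetical; the file names are placeholders) can diff the output keys of the two schemas, which is exactly where the "outcome" vs "whales" rename shows up:

import json

def output_keys(path):
    with open(path) as f:
        schema = json.load(f)
    return set(schema["run"]["output"].keys())

github_keys = output_keys("api_schema_github.json")  # schema from the GitHub repository
docker_keys = output_keys("api_schema_docker.json")  # schema inside the Docker image
print("only in GitHub schema:", github_keys - docker_keys)  # e.g. {'outcome'}
print("only in Docker schema:", docker_keys - github_keys)  # e.g. {'whales'}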
Okay that is helpful @Abellegese. I see the change in the output section from "outcome" to "whales", but I do not see why this would affect the parsing of the None results, or am I missing something here? This model was probably updated by @DhanshreeA locally but the changes were never pushed to the repository.
The issue detected
Root Cause
Solution: Schema-Driven Data Standardization
# api_schema.json
{
"run": {
"output": {
"outcome": {
"meta": ["R_0", "R_1", ..., "IR_10"] # <-- Defines expected keys
}
}
}
}
Implementation Flow
Few Advantages
Closing Remarks
So the mismatch detector is designed to have O(1) time complexity and is very fast when we have large data. |
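A minimal sketch of what such schema-driven standardization could look like (hypothetical, based only on the api_schema.json structure shown above, not the actual implementation): the expected column names come from the schema's meta list, each lookup is a dictionary access (hence O(1) per key), and any missing or empty value is filled with None.

import json

def load_expected_columns(schema_path):
    # Read the expected output column names from api_schema.json ("meta").
    with open(schema_path) as f:
        schema = json.load(f)
    output = schema["run"]["output"]
    key = next(iter(output))  # e.g. "outcome" or "whales"
    return output[key]["meta"]

def standardize_row(row, expected_columns):
    # Sketch: one dictionary lookup per expected column; missing or empty
    # values become None so every row ends up with the same shape.
    return {col: (row[col] if row.get(col) not in ("", None) else None)
            for col in expected_columns}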
Hi @Abellegese Many thanks this is super helpful. I will try the branch and let you know if it fixes the issue by Monday. |
I am running validations of the Zaira pipeline with the purpose of comparing several descriptors. I have a fairly large list of model descriptors set up in vars.py, and those get correctly picked up by ZairaChem in parameters.json, plus the reference descriptor eos7w6n: 13 descriptors in total. I have set up an automated run across TDC benchmarks with 5-fold validations. On the first dataset tested (DILI), the descriptor eos3ae6 was not calculated (done_eos.json):
On the second dataset (AMES, currently running) we have lost more descriptors: eos4u6p, eos3ae6, eos8a4x. See the done_eos.json:
The problem is that they do appear as calculated without problems in the logs. I have attached the different logs below as examples:
It seems to me the models are running into errors that are not reported in the logs. @DhanshreeA, do you pick up what could be happening from the information I am sharing? Could it also potentially be a memory issue?
I tried the models individually through the CLI (not the Python API) before I set the pipelines to run, and they did work.
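As a quick way to spot descriptors that were silently dropped (a hypothetical helper; it assumes done_eos.json is a JSON list of completed model IDs, which may differ from its real structure), one could compare the completed list against the descriptors configured in vars.py:

import json

# Hypothetical check: assumes done_eos.json is a JSON list of completed model IDs.
expected = ["eos3ae6", "eos4u6p", "eos8a4x", "eos7w6n"]  # subset for illustration

with open("done_eos.json") as f:
    done = set(json.load(f))

missing = [model_id for model_id in expected if model_id not in done]
print("Descriptors missing from done_eos.json:", missing)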