Thrift exception thrown at specific simulation time #36

Open
ljamt opened this issue Sep 1, 2021 · 12 comments

@ljamt
Member

ljamt commented Sep 1, 2021

Running simulations with proxyfmu seems to crash after some hours (simulation time). For a given OspSystemStructure.xml the exception is thrown at the same simulation time every time.

Here is an example using the Damper.fmu

<?xml version="1.0" encoding="UTF-8"?>
<OspSystemStructure xmlns="http://opensimulationplatform.com/MSMI/OSPSystemStructure"
                    version="0.1">
    <BaseStepSize>0.01</BaseStepSize>
    <Simulators>
        <Simulator name="damper" source="proxyfmu://localhost?file=Damper.fmu" />
    </Simulators>   
</OspSystemStructure>

The case can easily be replicated by using cosim-cli. You can download the cosim executable that includes proxyfmu here.

Run the following command:

.\cosim.exe run C:\path\to\configuration\above\ -d 30000 --mr-progress 1000 --log-level trace

The following will be printed to console after 21840 seconds:

@progress 728 21840.000000 21840.000000
Thrift: Tue Aug 31 08:48:48 2021 TSocket::read() recv() <Host: localhost Port: 60788>: An established connection was aborted by the software in your host machine.

Observations:

  • When simulating without logging results to CSV (--output-config none), the simulation runs for approximately 43200 seconds before crashing. This crash also occurs at the same simulation time on every run.

  • The proxyfmu process is still alive after cosim-cli (libcosim) has crashed, but its state is unknown.

  • The problem occurs for both FMI 1.0 and 2.0 FMUs

  • Because the proxyfmu process produces no output, it is not yet confirmed whether the problem occurs on the proxyfmu side or the libcosim side.

Simulation time for crash with single fmu configurations:
DPController.fmu runs for ~41940 seconds
Damper.fmu runs for ~21840 seconds
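
To compare these crash times with counts of remote calls, it can help to convert simulation time into a number of base steps. The snippet below is a minimal sketch that assumes every run used the BaseStepSize of 0.01 from the configuration above and that the "without CSV logging" figure refers to the Damper example. The labels and the `steps_until` helper are illustrative only.

```python
# Base step size in seconds, taken from <BaseStepSize> in the configuration above.
BASE_STEP_SIZE = 0.01

# Approximate simulation times (seconds) at which the crash was reported.
# The labels are illustrative; the numbers are the ones quoted in this issue.
crash_times = {
    "Damper.fmu, CSV logging": 21840,
    "Damper.fmu, --output-config none": 43200,
    "DPController.fmu": 41940,
}


def steps_until(sim_time_s, step_size_s=BASE_STEP_SIZE):
    """Number of base steps taken before sim_time_s (rounded to hide float noise)."""
    return round(sim_time_s / step_size_s)


for label, sim_time in crash_times.items():
    print(f"{label}: ~{steps_until(sim_time):,} base steps")
```

The step counts land in the low millions. How many Thrift calls each step triggers depends on the configuration and on whether results are logged, so these are step counts, not call counts.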

@markaren
Collaborator

markaren commented Sep 1, 2021

I would think this is due to proxyfmu throwing an exception. That would manifest as a thrift exception on the caller side.
Why it throws is another question. It is likely related to the values being read or written. However, that does not make sense w.r.t. the change seen with --output-config none. Could it be a memory issue?

@markaren
Collaborator

markaren commented Sep 2, 2021

I'm able to reproduce using only the proxyfmu API. The FMU does not matter, it threw an exception after some 2.800.000 calls to get and step functions even for identity.fmu.
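
A reproduction of this kind boils down to calling step and get in a loop and counting calls until the transport fails. The sketch below shows only that counting loop. It does not use the proxyfmu API: `FlakySlave`, `get_real`, and `count_calls_until_failure` are hypothetical stand-ins, and the failure is simulated after a fixed number of calls.

```python
class FlakySlave:
    """Hypothetical stand-in for a remote slave. This is NOT the proxyfmu API."""

    def __init__(self, fail_after):
        self.calls = 0
        self.fail_after = fail_after

    def _rpc(self):
        # Simulate a transport that aborts after a fixed number of calls.
        self.calls += 1
        if self.calls > self.fail_after:
            raise ConnectionAbortedError("simulated transport failure")

    def step(self, current_time, step_size):
        self._rpc()

    def get_real(self, value_refs):
        self._rpc()
        return [0.0 for _ in value_refs]


def count_calls_until_failure(slave, step_size=0.01, max_steps=10_000_000):
    """Alternate step/get calls; return (successful_calls, exception or None)."""
    calls = 0
    for i in range(max_steps):
        try:
            slave.step(i * step_size, step_size)
            calls += 1
            slave.get_real([0])
            calls += 1
        except Exception as exc:  # broad on purpose: we only want the count
            return calls, exc
    return calls, None


calls, error = count_calls_until_failure(FlakySlave(fail_after=1000))
print(calls, type(error).__name__)
```

If the call count at failure stays the same across different FMUs, that points away from the model and toward the transport layer, which matches the observation above.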

@ljamt
Member Author

ljamt commented Sep 2, 2021

> I'm able to reproduce using only the proxyfmu API. The FMU does not matter, it threw an exception after some 2.800.000 calls to get and step functions even for identity.fmu.

Great news that it is reproducible. I originally posted the issue on libcosim until we figured out the root cause. Do you suspect the issue is related to the thrift wrapper? In that case we should transfer this issue to proxy-fmu.

@ljamt ljamt transferred this issue from open-simulation-platform/libcosim Sep 2, 2021
@ljamt
Member Author

ljamt commented Sep 2, 2021

Closing as duplicate of #34

@ljamt ljamt closed this as completed Sep 2, 2021
@davidhjp01
Contributor

davidhjp01 commented Feb 6, 2023

We observe this issue recurring with the latest libcosim/0.10.1@osp/stable, which uses thrift/0.17.0.
[image]

Used the tutorial demo (mass spring damper) on Windows.

@davidhjp01 davidhjp01 reopened this Feb 6, 2023
@markaren
Collaborator

markaren commented Feb 6, 2023

There was a memory leak in those kinds of FMUs that was fixed at some point. Are we sure it's not the FMUs?

@davidhjp01
Contributor

We observe this with quarter-truck and FMUs generated from other Simulink models. They run fine with thrift 0.13.0.

@markaren
Collaborator

markaren commented Feb 6, 2023

It would be interesting to see if this is a boost dependency issue. libcosim forces dependencies to build with 1.71.0 rather than the 1.81.0 requested by thrift.

Anyhow, would a conan override be sufficient for end-users?
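
For reference, an end-user override could look roughly like the consumer recipe below. This is a sketch, not a tested fix: the class name and settings are hypothetical, the package references are the ones quoted in this thread, and it assumes a Conan version that provides `from conan import ConanFile`.

```python
from conan import ConanFile


class ConsumerConan(ConanFile):
    # Hypothetical consumer recipe; name and settings are illustrative only.
    settings = "os", "compiler", "build_type", "arch"

    def requirements(self):
        # Package reference as quoted in this thread.
        self.requires("libcosim/0.10.1@osp/stable")
        # Ask Conan to resolve boost to this version across the dependency
        # graph instead of the version pinned upstream.
        self.requires("boost/1.81.0", override=True)
```

An override only changes which version is resolved. Packages that embed boost in their binaries would also have to be rebuilt against it, so whether this is enough for end-users remains an open question.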

@davidhjp01
Contributor

davidhjp01 commented Feb 6, 2023

Actually, it seems the issue was due to the old version of boost. I tested some time ago with the upgraded boost (1.81.0) and the simulation ran without crashing, but cmake generated a lot of warnings during the build. I think this is because boost/1.81.0 is too new for the FindBoost module in my current cmake (3.25.1):
[image]

Also proxyfmu is statically linked with boost. So I guess we need to upgrade boost manually in proxyfmu? Not sure if we also need to upgrade libcosim and cosim-cli etc.

Edit: the issue was not due to boost; see below.

@markaren
Collaborator

markaren commented Feb 6, 2023

Ah, right, it's statically linked! I think the only reason proxyfmu uses 1.71.0 is that I wanted the two projects to specify the same version, to avoid downloading boost twice.

@davidhjp01
Contributor

I just did some testing locally, and I still observe the error with the new build.
It is a bit strange. The only difference in the run that did not crash is that I explicitly specified boost/1.81.0 in proxyfmu's conanfile.py.

@davidhjp01 davidhjp01 linked a pull request Feb 7, 2023 that will close this issue
@davidhjp01
Contributor

davidhjp01 commented Feb 7, 2023

@markaren I suggest reverting the thrift version to 0.13.0 until someone can debug thrift more thoroughly.

@davidhjp01 davidhjp01 removed a link to a pull request Feb 7, 2023
davidhjp01 added a commit to open-simulation-platform/libcosim that referenced this issue Feb 8, 2023
A temporary workaround for an issue found in proxyfmu: open-simulation-platform/proxy-fmu#36