retrieving a signal using signal_name acquired by listing signals in a group does not always work #1123

Open
MaximeLecuona opened this issue Jan 6, 2025 · 5 comments

@MaximeLecuona

Python version

python=3.10.14 (main, Apr 6 2024, 18:45:05) [GCC 9.4.0]
os=Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.31
numpy=1.26.4
asammdf=7.4.5

Code

MDF version: 4.00

The code is split into two parts. It normally runs in a Beam pipeline on Google Dataflow (so actual Linux, not WSL, but the error also reproduces in WSL).
The first part of the code creates a list of signal names contained in a given group this way:
signal_names = list(mdf_file.get_group(gindex).columns) #gindex being the group index

The second part works with each signal, and starts by actually retrieving the signal of interest like this:
s = mdf_file.get(signal_name, gindex, raw=True)

In the vast majority of cases, this works just fine, but there are some very isolated cases (specific signals in specific files, which I will try to share with you if I am allowed) where step 2 generates an error:

Traceback (most recent call last):
  File "/home/mlecuona/ep67_data_engineering_utilities/debugging.py", line 25, in <module>
    for item in _convert_mdf(spec, default_extract_msg_and_ntwrk):
  File "/home/mlecuona/venv310/lib/python3.10/site-packages/ep67_data_engineering_utilities/beam/timeseries/file_conversion.py", line 196, in _convert_mdf
    s = mdf_file.get(spec["SIGNAL_NAME"], spec["index"], raw=True)
  File "/home/mlecuona/venv310/lib/python3.10/site-packages/asammdf/blocks/mdf_v4.py", line 6617, in get
    gp_nr, ch_nr = self._validate_channel_selection(name, group, index)
  File "/home/mlecuona/venv310/lib/python3.10/site-packages/asammdf/blocks/mdf_common.py", line 82, in _validate_channel_selection
    raise MdfException(f'Channel "{name}" not found')
asammdf.blocks.utils.MdfException: Channel "OvrrnLimChrgPwLimWhl_Tq_Zz.OvrrnLimChrgPwLim_PwkW_Yy" not found

But if, at this point, I generate the signal name this way instead, it works:
signal_name = mdf_file.groups[gindex]['channels'][signal_index].name #signal index being the index of the signal in the group

I have implemented this workaround with error handling, so this is not exactly critical, but it did strike me as a bug that might not be 100% my fault.
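
For readers who want to see the shape of that workaround, here is a minimal sketch. It is not the code from the pipeline: `get_with_fallback` is a hypothetical helper, and it assumes `signal_index` is known alongside the name. The exception type is a parameter only so that the block runs without asammdf installed; in real use one would presumably pass `asammdf.blocks.utils.MdfException`, the type shown in the traceback.

```python
def get_with_fallback(mdf_file, signal_name, gindex, signal_index,
                      not_found_error=Exception):
    """Fetch a signal by name, retrying with the name stored in the group.

    Hypothetical helper: `not_found_error` should be the exception type
    raised for a missing channel (MdfException in the traceback above).
    """
    try:
        return mdf_file.get(signal_name, gindex, raw=True)
    except not_found_error:
        # Fall back to the name stored on the channel object itself.
        stored_name = mdf_file.groups[gindex]["channels"][signal_index].name
        return mdf_file.get(stored_name, gindex, raw=True)
```

The first lookup is kept as the default path, so files without the mismatch behave exactly as before.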

@danielhrisca
Owner

danielhrisca commented Jan 10, 2025

> The first part of the code creates a list of signal names contained in a given group this way: signal_names = list(mdf_file.get_group(gindex).columns) #gindex being the group index
>
> The second part works with each signal, and starts by actually retrieving the signal of interest like this: s = mdf_file.get(signal_name, gindex, raw=True)

This is inefficient: you first extract all the signals as a pandas DataFrame using get_group, and then you extract each signal again individually using get.

> In the vast majority of cases, this works just fine, but there are some very isolated cases (specific signals in specific files, which I will try to share with you if I am allowed) where step 2 generates an error

What value does mdf_file.groups[gindex]['channels'][signal_index].name have?

@MaximeLecuona
Author

So, for the first observation, I agree. I don't love this myself, and when I am dealing with all of this in one single process, I do try to make it more efficient. But here, the first part of the code is meant to map out which signals exist and which group they are in, and then "farm out" their extraction to a bunch of other processes, sometimes running on entirely different machines. If there is a way to get that list of signal names without generating a pandas DataFrame, I'm very interested.

For the second, the people generating the data got back to me, and it would appear that there was an error in how they were creating the MDF files. But I'm still curious as to how this particular mismatch is possible (not to the point of wasting your time if you have better things to do, though). 'OvrrnLimChrgPwLimWhl_Tq_Zz/isy' is the value that mdf_file.groups[gindex]['channels'][signal_index].name returns.
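
One way to see which names are affected is to compare the two listings directly. This is a minimal sketch: `unmatched_names` is a hypothetical helper, and its inputs would be `list(mdf_file.get_group(gindex).columns)` and the `.name` of each entry in `mdf_file.groups[gindex]['channels']`.

```python
def unmatched_names(column_names, channel_names):
    """Return the DataFrame column names that no channel in the group carries.

    Hypothetical helper: every name it returns is one that a lookup by
    name would be expected to reject with "Channel ... not found".
    """
    known = set(channel_names)
    # Preserve the column order so the result lines up with the DataFrame.
    return [name for name in column_names if name not in known]
```

An empty result for a group would mean that every column name can be used as is for that group.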

@danielhrisca
Owner

> If there is a way to get that list of signal names without generating a pandas DataFrame, I'm very interested.

names = [ch.name for ch in mdf.groups[index].channels]
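
Building on the one-liner above, the mapping step could also record each channel's position, so that the worker processes fetch by position rather than by name. This is a minimal sketch under stated assumptions: `map_channels` and `fetch` are hypothetical helpers, and the sketch assumes `get` accepts `group` and `index` as keyword arguments without a name, which the `_validate_channel_selection(name, group, index)` call in the traceback suggests.

```python
def map_channels(mdf):
    """Return (group_index, channel_index, name) for every channel."""
    return [
        (gp_nr, ch_nr, ch.name)
        for gp_nr, group in enumerate(mdf.groups)
        for ch_nr, ch in enumerate(group.channels)
    ]


def fetch(mdf, gp_nr, ch_nr):
    """Fetch one channel by position instead of by name (assumed call form)."""
    return mdf.get(group=gp_nr, index=ch_nr, raw=True)
```

The list covers every channel stored in the group, which may include the time (master) channel, so it is not guaranteed to match `get_group(...).columns` one to one.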

@danielhrisca
Owner

> For the second, the people generating the data got back to me, and it would appear that there was an error in how they were creating the MDF files. But I'm still curious as to how this particular mismatch is possible (not to the point of wasting your time if you have better things to do, though). 'OvrrnLimChrgPwLimWhl_Tq_Zz/isy' is the value that mdf_file.groups[gindex]['channels'][signal_index].name returns

I could not reproduce the issue. Maybe you can send a demo file?

@MaximeLecuona
Author

I'll try to see if I can get some sample files sent to me without sensitive data. Otherwise, it's no longer a pressing concern as the flaw in the data should be removed going forward.
