-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: optionally include first record in data payload #18
Comments
Sample datasets: https://nasfish.fish.washington.edu/echotools/datasets/dbdreader/dbdreader_20230430.zip I believe the glider in question is a Slocum G3. Will continue to investigate as well. This is a useful tool for climbing around the binary data looking at the data records: https://hexed.it/ Results from python supplied script:
|
Did some tracing into the C code, I am wondering if there is an initial mismatch:
The first three items are correct, detect and skip over the first two bytes as the start of the cycle head That is as far as I have traced at the moment. Using the binary snooper I can clearly see the first record of data in the |
Hi @jr3cermak, Thank you for doing such a thorough investigation and providing the test data and scripts. Yes, dbdreader skips the first line of data. On purpose. What I noticed when starting coding dbdreader a long time ago, is that all parameters are set to as "UPDATED" in the first state bytes section of each file. I also noticed that usually the second entry was some considerable time later. My assessment is that it is very unlikely that all parameters are measured at the time of file creation. Nevertheless, they get all published in the dbd file, and I suspect they will take whatever value is in memory. Either these values are nonsense, or were measured some (long) time before. In either case these data points have no scientific value. So the first data line is skipped over. As for the number of bytes to skip being 12, 13 or 14, 13 is the correct number. In earlier versions of dbdreader, I would skip 17 bytes, which I found out by trial and error, as only then the decoding of the state bytes would make sense. Also these bytes would always be the same. The Glider manual which described the data format, did not mention to skip these bytes, though. Later, with the arrival of G3 gliders, that use little endian byte order as opposed to the big endian of persistor based gliders, @erinaceous used these bytes to test the byte order, since these 17 bytes are composed of 12 34 as two bytes, and 4 bytes representing a float 123.456 and 8 bytes representing a double 123456789.12345. The next byte is always 'd' or 64 in hex. This makes the 17 bytes that I needed to skip initially. So in summary, I don't think that this is an issue, unless it is your opinion that you require the first data line as well, even though it does not contain any useful information. |
Thank you @smerckel for the confirmation that the first record is intentionally skipped at the moment. Let me do some checking with my upstream processing to see what is typically done with the first record. There might be a desire to have two additional options: (1) allow passing all values to reproduce the original slocum binary behavior; (2) at least pass along the timestamp with the remaining requested columns filled with nan or missing values. |
@smerckel I disagree that the first line never has value and am of the opinion that it should be a optional choice to keep or drop the first line, but default to dropping. The initialization line is useful for diagnostic purposes more so than science purposes. I think it makes sense to almost always drop the first line of the science files, as pretty much all initialization values for the instruments are zero (although an exception may be something like the card data space used). However the flight files have some use of reading the initialization values, especially if someone ever needs to review data from only 1 segment (therefore only 1 dbd/sbd file) for diagnostics. Many sensors/variables only have a value in that first initialization line (e.g. m_science_on, m_why_started, u_alpha_system_clock_lags_gps, etc.) and not again for the rest of the file. Therefore it may be useful to see what values certain variables initialize with before starting the segment. As for the first 17 bytes, according to Dave Pingal's (of TWR) original binary reading python package (he called
and yes, I am aware that he has a similar line twice that overwrites byte2. |
@s-pearce , thanks for the explanation. I can see the point of having the ability to extract the first line too. I have coded that now, and you can set the behaviour by the class variable SKIP_INITIAL_LINE for the DBD class. In this way, you can set the behaviour of DBD and MultiDBD by setting one variable once, and affects all future calls to the get*() methods. The change is applied to the master branch for now. I made some other changes to bring uniformity in the return values of the get* methods. After updating the manual, I think the code can then be released as version 0.5.0. Thank you @jr3cermak, @s-pearce for the feedback. |
Ooh, could I recommend you make that an instance variable rather than a class-level variable? example: Using dbdreader in multi-threaded code which is handling reading slocum data as it comes off of multiple different gliders, so getting .tbd and .sbd out of order and/or instantiating MultiDBDs for different gliders concurrently. I could see wanting to enable getting the initial line for the extra engineering info on one glider but not another, in parallel. |
I think I would agree with @erinaceous. I just tried out the new
but if I assign the variables before changing
Also thanks for adding in these feature. It is much appreciated. |
@s-pearce : the behaviour you describe was actually intentional. Setting DBD.SKIP_INITIAL_LINE sets how the DBD instance will treat the initial lines of the binary files to be read (until its value is changed again). I considered this more as a policy: set once, preferably at the top, and then process the dbd files. I can see the point of @erinaceous too, although it is a bit hypothetical. Making SKIP_INITIAL_LINE an instance variable means that there is more fine grained control on how the dbd files are read, at the expense of more coding on the user's side. I could add a keyword skip_initial_line to both the DBD and MultiDBD class constructors, which then sets the behaviour for all get*() methods invoked for this instance. You could then also directly set the attribute during the life time of the instance. I would prefer this over making skip_initial_line a keyword to all get*() methods. The example above would then be something like:
Perhaps one of you guys may have a better alternative for the keyword name?
Because the individual DBD instances are created in the constructor of MultiDBD, changing the behaviour of each DBD requires a new method to MultiDBD, something like set_skip_initial_line(boolean value). |
The commit 9c0c949 is working perfectly for our needs. |
I am looking into the possibility that dbdreader might be skipping the first data record. I am constructing a test package which I will share soon of 9 pairs of science and glider data segments. I will also share code which I am using to demonstrate the issue at hand.
In a nutshell, using the slocum linux binaries in converting a sdb and tbd file and then performing a merge reveals that the first record might be missing from the dbdreader decoded messages.
Here are the first couple of records as decoded using the slocum linux binaries:
Here are the first couple of records from dbdreader:
All as is should be, starting with the 2nd record of the slocum binary decoder (1st record of the dbdreader).
I will post a link soon to the dataset, cache files and code that I am working with. The general pattern seems to persist with any pair of science and glider files, so you should be able to reproduce with existing sample data.
The ultimate goal is to allow us replace the slocum binary tools with this library in its entirety. I just noticed this issue at this point and want to see if it was an issue or there might be a reason the first record might be skipped?
The text was updated successfully, but these errors were encountered: