Initial draft of harp file standard #69
Open
bruno-f-cruz wants to merge 4 commits into main from feat-logging-spec
+61 −0
@@ -0,0 +1,61 @@
<img src="./assets/HarpLogo.svg" width="200">

# Standardized Harp file format

## Introduction

This document defines a standardized file format for logging data from Harp devices. The format is based on the [Harp Binary Protocol](./BinaryProtocol-8bit.md) and is designed for efficient data logging and parsing.

One of the main advantages of a standardized binary communication protocol is that logging data from Harp devices can be largely generalized. Because all Harp messages share a common structure, all the binary data emitted by a device can, in principle, be written directly into a single binary file. However, this is not always the most convenient way to log data. For instance, ingesting only a subset of messages (e.g. only the messages from a particular sensor connected to the Harp device) requires a post-processing step to filter out the messages of interest. Furthermore, as per the Harp protocol specification, each register address can have a different payload type (e.g. U8 vs U16), or even a different length if array registers are involved. This makes a mixed binary file more complex to parse and analyze offline, since the header of every message in the file must be examined to determine how to extract its contents.

This processing step can be eliminated entirely if all messages in a single binary file are guaranteed to have the same format. Fortunately, for any given Harp device, the payload stored at a specific register address is guaranteed to have a fixed format. A de-multiplexing strategy can leverage this guarantee to save the messages from each register into a separate fixed-format file.

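
The de-multiplexing idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the function name `demux_by_address` is hypothetical, and the framing it relies on (byte 0 is the message type, byte 1 is a length field counting the bytes that follow it, byte 2 is the register address) should be checked against the Harp Binary Protocol document. The sketch keeps whole messages and does not validate checksums.

```python
from collections import defaultdict


def demux_by_address(stream: bytes) -> dict[int, bytes]:
    """Group complete messages from a raw byte stream by register address.

    Assumed framing (verify against the Harp Binary Protocol):
      byte 0 = message type
      byte 1 = length, i.e. the number of bytes that follow this field
      byte 2 = register address
    """
    groups: dict[int, bytearray] = defaultdict(bytearray)
    offset = 0
    while offset + 3 <= len(stream):
        length = stream[offset + 1]
        if length < 1:
            raise ValueError(f"invalid length field at offset {offset}")
        size = length + 2  # message type byte + length byte + remaining bytes
        if offset + size > len(stream):
            break  # truncated trailing message: stop instead of guessing
        address = stream[offset + 2]
        groups[address] += stream[offset:offset + size]
        offset += size
    return {address: bytes(data) for address, data in groups.items()}
```

Each value in the returned dictionary could then be appended to the file that corresponds to its register, so that every file only ever holds messages of one fixed size.
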
## Harp file format

For each device, we define a "container" file format: a folder that stores data from a single device, in which the payload of the messages from each register is saved sequentially to a separate binary file:

```plaintext
📦<Device>.harp
 ┣ 📜<DeviceName>_0_<suffix>.bin
 ┣ 📜<DeviceName>_1_<suffix>.bin
 ┣ ...
 ┗ 📜<DeviceName>_<Reg>_<suffix>.bin
```
---

The components of this convention are detailed below.

- The character `_` is reserved as a separator between fields.
- `<DeviceName>` should match the device name declared in the `device.yml` metadata file that fully defines the device and can be found in the repository of each device ([example](https://raw.githubusercontent.com/harp-tech/device.behavior/main/device.yml)). This file can be seen as the "ground-truth" specification of the device. It is used to automatically generate documentation, interfaces and data ingestion tools. Matching the name is not a strict requirement, but it is highly recommended.
- `<Device>` is an arbitrary name that identifies the device being used.
- `<Reg>` is the number of the register that is logged in the binary file.
- `<suffix>` is an optional suffix that the user can co-opt to add any additional information to the file name (e.g. a timestamp, a sequence number, etc.). If there is no `<suffix>`, the final `_` should be omitted.
- `.harp` is the extension of the container folder.

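
As a rough illustration of the naming rules listed above, the following Python sketch formats and parses register file names. The names `RegisterFile`, `format_register_filename` and `parse_register_filename` are hypothetical and not part of any existing Harp tool. The sketch also takes a strict reading of the separator rule and rejects `_` inside any field, including the suffix, which is an assumption rather than something the convention states explicitly.

```python
import re
from typing import NamedTuple, Optional


class RegisterFile(NamedTuple):
    device_name: str
    register: int
    suffix: Optional[str]


# <DeviceName>_<Reg>[_<suffix>].bin, with "_" allowed only as a separator.
_PATTERN = re.compile(r"(?P<device>[^_]+)_(?P<reg>\d+)(?:_(?P<suffix>[^_]+))?\.bin")


def format_register_filename(device_name: str, register: int,
                             suffix: Optional[str] = None) -> str:
    """Build a file name; the trailing "_" is omitted when there is no suffix."""
    fields = [device_name, str(register)]
    if suffix:
        fields.append(suffix)
    if any("_" in field for field in fields):
        raise ValueError("'_' is reserved as the field separator")
    return "_".join(fields) + ".bin"


def parse_register_filename(filename: str) -> Optional[RegisterFile]:
    """Return the fields of a file name, or None if it does not fit the pattern."""
    match = _PATTERN.fullmatch(filename)
    if match is None:
        return None
    return RegisterFile(match["device"], int(match["reg"]), match["suffix"])
```

For example, `format_register_filename("Behavior", 44)` yields `Behavior_44.bin`, and parsing that name returns register `44` with no suffix.
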
### The optional `device.yml` file

We recommend including the `device.yml` file that corresponds to the interface used to log the device's data. To do this, place the `device.yml` file at the root of the container folder. The folder structure thus becomes:

```plaintext
📦<Device>.harp
 ┣ 📜<DeviceName>_0_<suffix>.bin
 ┣ 📜<DeviceName>_1_<suffix>.bin
 ┣ ...
 ┣ 📜<DeviceName>_<Reg>_<suffix>.bin
 ┗ 📜device.yml (Optional)
```
---

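
A consumer of this layout might index a container folder along the following lines. This is a minimal sketch, assuming the file names follow the `<DeviceName>_<Reg>_<suffix>.bin` pattern; the function name `index_container` is hypothetical, and files that do not fit the pattern are simply skipped.

```python
from pathlib import Path
from typing import Optional


def index_container(container: Path) -> tuple[dict[int, list[Path]], Optional[Path]]:
    """Map register numbers to their .bin files and locate device.yml, if any."""
    registers: dict[int, list[Path]] = {}
    for path in sorted(container.glob("*.bin")):
        fields = path.stem.split("_")
        # The second field is expected to be the register number.
        if len(fields) < 2 or not fields[1].isdecimal():
            continue
        registers.setdefault(int(fields[1]), []).append(path)
    metadata = container / "device.yml"
    return registers, (metadata if metadata.is_file() else None)
```

A register can map to several files here because different suffixes (e.g. sequence numbers) may split one register's data across files.
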
## Best practices and application notes

### Logging the device's initial configuration

Most registers in a Harp device do not emit periodic events, so their state cannot be known unless they are explicitly queried. For configuration registers we do want to know this state, since it defines the behavior of the device at runtime. We also want to include metadata registers such as the device name and versions. Fortunately, the [Device specification](./Device.md) defines a feature for dumping the values of all registers during acquisition. By sending a single message to the `R_OPERATION_CTRL` register with `Bit3` set to 1, we can make the device send a rapid sequence of `READ` type messages with the contents of all registers.

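
The request described above could be assembled roughly as follows. Treat this as a sketch only: apart from `Bit3`, every constant below (the `WRITE` message code, the U8 payload code, the default port value, the address of `R_OPERATION_CTRL`, and the checksum rule) is an assumption that must be verified against the Harp Binary Protocol and Device specification, and the helper names are hypothetical. Because a write replaces the whole register value, the sketch ORs `Bit3` into a previously read value instead of writing `0x08` on its own; whether that matters depends on what the other bits of the register control.

```python
# Assumed constants: verify each one against the Harp specifications.
MESSAGE_TYPE_WRITE = 2   # assumed code for WRITE messages
PAYLOAD_TYPE_U8 = 1      # assumed code for an unsigned 8-bit payload
DEFAULT_PORT = 255       # assumed port value for the device itself
R_OPERATION_CTRL = 10    # assumed address of the R_OPERATION_CTRL register
DUMP_BIT = 1 << 3        # Bit3, as described in the text above


def with_dump_bit(current_value: int) -> int:
    """Return the register value with Bit3 set and all other bits preserved."""
    return (current_value | DUMP_BIT) & 0xFF


def build_write_u8(address: int, value: int) -> bytes:
    """Assemble a WRITE message with a single U8 payload (assumed framing)."""
    body = bytes([address, DEFAULT_PORT, PAYLOAD_TYPE_U8, value])
    length = len(body) + 1  # bytes after the length field, checksum included
    message = bytes([MESSAGE_TYPE_WRITE, length]) + body
    checksum = sum(message) % 256  # assumed: sum of all preceding bytes, mod 256
    return message + bytes([checksum])
```

The resulting bytes, e.g. `build_write_u8(R_OPERATION_CTRL, with_dump_bit(current))`, would be written to the device only once the logger is already recording, so that the burst of `READ` replies ends up in the per-register files.
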
> [!IMPORTANT]
> In your experiments, always validate that your logging routine has fully initialized before requesting a read dump from the device. Failure to do so may result in missing data.

## Release notes

- v0.1
  * First draft.

Do we need to require that `<DeviceName>` matches the device type, or is it enough to match the device folder? In our current standards we tend to name individual files inside container folders after the full hierarchy leading up to the file. This is useful in cases where the experimental device folder may hold other types of files which are not Harp binary files themselves, but are data stored and synchronized together with this device (e.g. video frame data).
In this case, if we change the convention only for Harp files, the naming of non-Harp files will be inconsistent with it. Alternatively, if we do change the convention to name files after the Harp device type, the naming convention may be misleading, e.g. consider the following:
If we do want to include the device type in the name, I guess we could add the container name as a prefix and keep the Harp device type in the file:
There are definitely other possible alternatives such as nested folders, etc., but as above I feel we should avoid being overly prescriptive. It is true that `harp-python` currently adheres to a specific structure, but perhaps we could make it flexible enough to support a variety of options.

I am not sure the standard should encompass multi-file-type containers. In my mind, if you consider the Harp container as "indivisible", your use case could instead be modelled as:
What do we gain by doing this? More predictability of the file structure. At the end of the day, the reasons for standardizing the structure are, in my mind:
a. If `device.yml` is present, we are done.
b. If no `device.yml` is present, it would be nice to at least have a sane way to find where the `WhoAmI` register is, so we can read it and attempt to fetch the `device.yml`.
2. To have a patterned way to open all files inside the container in an easily automated way.
If we want to allow for multiple file names inside the container, I am worried about running into the following situation:
If I now want to ingest this data stream using the `harp-python` library, I would need to solve the general case of Behavior vs Wheel. I am sure this can be handled by a custom regular expression, but I would rather consider that as outside the scope of this standard. Otherwise, I am not really sure we gain much by standardizing anything other than "all harp files are expected to have data from a single register, wherein all messages have the same length and data type".