File Formats
Belki imports datasets from tab-separated values (TSV) files. These are text files representing tabular data; they can be written by spreadsheet applications or by data processing software, e.g., Python Pandas.
In general, Belki operates on multidimensional data, i.e., a vector of numerical values for each protein, with the same length for all proteins. The dimensions can be named freely.
Practical example: you perform several MS measurements and would like to visualize their combined output in Belki. Each measurement contributes one dimension of a protein's data vector. If a protein is missing from one of the measurements, insert a zero for that dimension.
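As a minimal sketch of this step, the Python function below merges per-measurement results into one equal-length vector per protein, filling in zero where a protein was not detected. The function name `combine_measurements` and the input shape (a dict mapping each measurement name to a dict of protein values) are assumptions for illustration, not part of Belki.

```python
from typing import Dict, List


def combine_measurements(
    measurements: Dict[str, Dict[str, float]],
) -> Dict[str, List[float]]:
    """Merge per-measurement results into one vector per protein.

    Hypothetical helper: each measurement becomes one dimension, in the
    insertion order of `measurements`. Proteins missing from a measurement
    get 0.0 in that dimension, so all vectors have the same length.
    """
    dimensions = list(measurements)
    proteins = sorted({p for values in measurements.values() for p in values})
    return {
        protein: [measurements[dim].get(protein, 0.0) for dim in dimensions]
        for protein in proteins
    }


runs = {
    "Measurment 1": {"PLEC_HUMAN": 2.1, "MYH9_HUMAN": 0.456},
    "Meas. B": {"PLEC_HUMAN": 0.5},
}
print(combine_measurements(runs))
# {'MYH9_HUMAN': [0.456, 0.0], 'PLEC_HUMAN': [2.1, 0.5]}
```

Proteins are sorted by name here only to make the output deterministic; nothing in the text above says Belki requires a particular row order.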
You may provide data either normalized to the range [0;1] or raw. If auto-normalization is off and your data has a high dynamic range, Belki displays it on a log scale.
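If you want to normalize the data yourself before importing it, one common choice is per-dimension min-max scaling into [0;1], sketched below. This is an assumption: the text above does not say how Belki's own auto-normalization works, so its results may differ. Note that zeros inserted for missing proteins count toward each dimension's minimum. `min_max_normalize` is a hypothetical name.

```python
from typing import Dict, List


def min_max_normalize(vectors: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Scale every dimension independently into [0, 1].

    Assumption: per-dimension min-max scaling; Belki's own
    auto-normalization may use a different method.
    """
    columns = list(zip(*vectors.values()))  # one tuple per dimension
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return {
        protein: [
            # A constant dimension has no range; map it to 0.0.
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(values, lows, highs)
        ]
        for protein, values in vectors.items()
    }
```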
You may also provide a confidence score for each data point. Belki visualizes these scores alongside their respective data points. Scores can be any number; there is no predefined range. Currently, lower scores are considered better (this will become configurable).
This is the simplest format understood by Belki. It has a one-line header, followed by one row per protein. Illustrative example:
| | Measurment 1 | Meas. B | Another run |
---|---|---|---|
PLEC_HUMAN | 2.1 | 0.5 | 0 |
AHNK_HUMAN | 3.145 | 0.6 | 0 |
MYH9_HUMAN | 0.456 | 0.7 | 12 |
MYOF_HUMAN | 0.123 | 0.8 | 0 |
DYHC1_HUMAN | 0.123 | 0.9 | 6 |
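A minimal sketch for producing this format with Python's standard `csv` module follows. It assumes that the protein-name column has an empty header cell, as in the table above; `to_simple_tsv` is a hypothetical helper name.

```python
import csv
import io
from typing import Dict, List


def to_simple_tsv(dimensions: List[str], vectors: Dict[str, List[float]]) -> str:
    """Render data in the simple format: a header row, then one row per protein."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Assumption: the protein-name column has an empty header cell.
    writer.writerow([""] + dimensions)
    for protein, values in vectors.items():
        if len(values) != len(dimensions):
            raise ValueError(
                f"{protein}: expected {len(dimensions)} values, got {len(values)}"
            )
        writer.writerow([protein] + list(values))
    return buffer.getvalue()


text = to_simple_tsv(
    ["Measurment 1", "Meas. B", "Another run"],
    {"PLEC_HUMAN": [2.1, 0.5, 0], "MYH9_HUMAN": [0.456, 0.7, 12]},
)
print(text, end="")
```

Save the returned text to a file opened with `newline=""`. If your data is already in a pandas DataFrame indexed by protein name, `df.to_csv(path, sep="\t")` produces a similar layout.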
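This format is more complex but can carry additional information, such as confidence scores. Here, each protein spans multiple lines, one per dimension: "Pair" names the dimension, "Dist" holds the value, and "Score" holds the confidence score. Again, an illustrative example, covering the first two proteins of the example above: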
Protein | Pair | Dist | Score |
---|---|---|---|
PLEC_HUMAN | Measurment 1 | 2.1 | 1 |
PLEC_HUMAN | Meas. B | .5 | 1 |
PLEC_HUMAN | Another run | .5 | 1 |
AHNK_HUMAN | Measurment 1 | 3.145 | 0.6 |
AHNK_HUMAN | Meas. B | 0.6 | 1 |
AHNK_HUMAN | Another run | 0 | 1 |
ℹ️ To disable auto-normalization, you currently need to rename the feature column from "Dist" to "AbundanceLeft" and import the dataset via "Load Abundance Values" in the File menu.
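The long format can be generated the same way. The sketch below writes one row per protein and dimension. It assumes every data point has a score (the text does not say whether the Score column is optional) and exposes the value column name as a parameter, so that "AbundanceLeft" can be used instead of "Dist" as described in the note above. `to_long_tsv` is a hypothetical name.

```python
import csv
import io
from typing import Dict, List


def to_long_tsv(
    dimensions: List[str],
    vectors: Dict[str, List[float]],
    scores: Dict[str, List[float]],
    value_column: str = "Dist",
) -> str:
    """Render data in the long format: one row per (protein, dimension) pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Protein", "Pair", value_column, "Score"])
    for protein, values in vectors.items():
        protein_scores = scores[protein]
        if not len(values) == len(protein_scores) == len(dimensions):
            raise ValueError(f"{protein}: values, scores and dimensions differ in length")
        for dimension, value, score in zip(dimensions, values, protein_scores):
            writer.writerow([protein, dimension, value, score])
    return buffer.getvalue()


# Per the note above: "AbundanceLeft" plus "Load Abundance Values" skips auto-normalization.
text = to_long_tsv(
    ["Measurment 1", "Meas. B"],
    {"AHNK_HUMAN": [3.145, 0.6]},
    {"AHNK_HUMAN": [0.6, 1]},
    value_column="AbundanceLeft",
)
print(text, end="")
```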
Belki projects are stored in files with suffix .belki. These files use the CBOR format to encode the data.
CBOR is a binary data serialization format loosely based on JSON, similar to MessagePack. It can easily be read and written by other applications. The file contents are self-explanatory, but a schema is currently not available.
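Since no schema is published, the most direct way to explore a `.belki` file is to decode it generically. In practice, the third-party Python package `cbor2` (`cbor2.load`) does this. The dependency-free sketch below instead shows how CBOR items are laid out (RFC 8949): each item starts with a byte whose top three bits give the major type and whose low five bits encode a length or small value. It handles only definite-length items, ignores the meaning of tags, and assumes that a `.belki` file holds a single top-level item; `decode` and `load_belki` are hypothetical names.

```python
import struct
from typing import Any, Tuple


def _read_argument(data: bytes, pos: int, info: int) -> Tuple[int, int]:
    """Return the integer argument encoded by the low 5 bits (and following bytes)."""
    if info < 24:
        return info, pos  # small values live directly in the initial byte
    size = {24: 1, 25: 2, 26: 4, 27: 8}.get(info)
    if size is None:
        raise ValueError(f"unsupported additional info {info} (e.g. indefinite length)")
    return int.from_bytes(data[pos:pos + size], "big"), pos + size


def decode(data: bytes, pos: int = 0) -> Tuple[Any, int]:
    """Decode one CBOR item at `pos`; return (value, position after the item)."""
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if major == 7:  # simple values and floats
        simple = {20: False, 21: True, 22: None}
        if info in simple:
            return simple[info], pos
        formats = {25: (">e", 2), 26: (">f", 4), 27: (">d", 8)}
        if info not in formats:
            raise ValueError(f"unsupported simple value {info}")
        fmt, size = formats[info]
        return struct.unpack(fmt, data[pos:pos + size])[0], pos + size
    arg, pos = _read_argument(data, pos, info)
    if major == 0:  # unsigned integer
        return arg, pos
    if major == 1:  # negative integer, encoded as -1 - arg
        return -1 - arg, pos
    if major == 2:  # byte string of length arg
        return bytes(data[pos:pos + arg]), pos + arg
    if major == 3:  # UTF-8 text string of length arg
        return bytes(data[pos:pos + arg]).decode("utf-8"), pos + arg
    if major == 4:  # array with arg items
        items = []
        for _ in range(arg):
            item, pos = decode(data, pos)
            items.append(item)
        return items, pos
    if major == 5:  # map with arg key/value pairs
        result = {}
        for _ in range(arg):
            key, pos = decode(data, pos)
            value, pos = decode(data, pos)
            result[key] = value
        return result, pos
    # major == 6: semantic tag number arg; this sketch returns the tagged item as-is
    return decode(data, pos)


def load_belki(path: str) -> Any:
    """Hypothetical helper: decode a .belki file assumed to hold one top-level item."""
    with open(path, "rb") as f:
        value, _ = decode(f.read())
    return value
```

With `cbor2` installed, calling `cbor2.load` on the file opened in binary mode yields similar nested dicts and lists, and also covers the cases this sketch skips.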