Robotics-Mechatronics-UMA/dataset_extract_sequence

Extract sequence

Overview

Script to extract a sub-sequence from a dataset composed of one folder per device, with the data of each device stored either in a single data.txt file or in multiple files.

Installation

Just clone the repository and check the dependencies.

Dependencies

  • Python 3.5
  • Python packages (all from the standard library):
    • os
    • argparse
    • shutil (for copying files)

Contents

  • *_options: scripts that store the options for argparse.
  • extract_sequence: class to extract data from a single sequence (i.e. a folder).
  • extract_single: script to extract data from a single sequence. It receives several options (see Single extraction options or extract_single_options.py).
  • extract_file: script to automate extracting data from multiple sequences. It takes a single parameter, a .txt file in a specific format that describes the different sequences and can be shared and edited (see Extraction format or extraction_format.txt).

Usage

The recommended method is extract_file.

In the case of extract_single, you have to specify all the required arguments. Several examples are given in tests_single.txt:

python extract_single.py -o output/ -f input/ -d imu -t single_file --start 1559735717.606276 --end 1559735717.720282

In the case of extract_file, you have to create a .txt file that describes the extraction in the expected format (see Extraction format or extraction_format.txt). There is an example in the file test_sequence.txt:

python extract_file.py -f test_file.txt

Both scripts have a sequencing option, which creates a correspondence file between the extracted timestamps and a continuous sequence number so that consecutive data can be indexed.
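The README does not describe the layout of the correspondence file or the flag that enables it. As a rough illustration only, the sketch below assumes one "timestamp index" pair per line with numbering starting at 0; the helpers `build_sequence_map` and `format_sequence_map` are hypothetical and not part of the repository.

```python
def build_sequence_map(timestamps):
    """Pair each timestamp with a consecutive sequence number (hypothetical helper)."""
    # Sort numerically so that the numbering follows temporal order.
    ordered = sorted(timestamps, key=float)
    return [(ts, index) for index, ts in enumerate(ordered)]


def format_sequence_map(pairs):
    """Render pairs as text, one 'timestamp index' pair per line (assumed layout)."""
    return "".join("{} {}\n".format(ts, index) for ts, index in pairs)
```

For example, `format_sequence_map(build_sequence_map(["1559735717.720282", "1559735717.606276"]))` returns `"1559735717.606276 0\n1559735717.720282 1\n"`.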

If the output folder is a subdirectory, the intermediate subdirectories must be created by the user.
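One way to prepare those intermediate subdirectories before running the scripts is the standard library's `os.makedirs` with `exist_ok=True` (available since Python 3.2). This is a minimal sketch; `ensure_parent_dirs` is a hypothetical helper, and it only creates the parents, leaving the output folder itself to the scripts.

```python
import os


def ensure_parent_dirs(output_path):
    """Create the intermediate directories of output_path if they are missing."""
    # normpath drops a trailing slash so dirname returns the real parent.
    parent = os.path.dirname(os.path.normpath(output_path))
    if parent:
        # exist_ok=True makes the call a no-op when the directories already exist.
        os.makedirs(parent, exist_ok=True)
    return parent
```

For instance, calling `ensure_parent_dirs("results/run1/output/")` creates `results/run1/`, after which `results/run1/output/` can be passed as the output.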

Single extraction options

  • -o/--output: Output sequence name (i.e. relative path from the execution directory).

  • -f/--folder: Target folder to extract from (i.e. relative path from the execution directory).

  • -d/--device_list: Space-separated list of the target devices.

  • -t/--data_formats: Format of the data to extract {single_file / multiple_files}.

  • --start: Start time for the sequence.

  • --end: End time for the sequence.

  • -v/--verbose: Optional parameter to print additional info for debug purposes.
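The options above map naturally onto argparse. The following is a hypothetical reconstruction for illustration, not the contents of extract_single_options.py; in particular, `nargs="+"` for the device and format lists, `type=float` for the times and `required=True` are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical sketch of the single-extraction options described above."""
    parser = argparse.ArgumentParser(
        description="Extract a sub-sequence from a single sequence folder.")
    parser.add_argument("-o", "--output", required=True,
                        help="Output sequence name (relative to the execution directory).")
    parser.add_argument("-f", "--folder", required=True,
                        help="Folder to extract from (relative to the execution directory).")
    parser.add_argument("-d", "--device_list", nargs="+", required=True,
                        help="Space-separated list of devices.")
    # Each value given to -t is checked against the allowed formats.
    parser.add_argument("-t", "--data_formats", nargs="+", required=True,
                        choices=["single_file", "multiple_files"],
                        help="Format of the data to extract.")
    parser.add_argument("--start", type=float, required=True,
                        help="Start time of the sequence.")
    parser.add_argument("--end", type=float, required=True,
                        help="End time of the sequence.")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Print additional debug information.")
    return parser
```

With this sketch, the command-line example from Usage parses to `device_list == ["imu"]` and `data_formats == ["single_file"]`.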

Extraction format

This is the format that any .txt file used as input for the extract_file.py script must follow. Each line describes one sequence, and several lines can share the same output name. Fields are separated by |, list items by spaces and interval bounds by :.

Each line is:

    output | folder | device1 device2 ... | format1 format2 ... | interval1 interval2 ...
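As a sketch of how one line in this format could be parsed, assuming fields separated by |, lists separated by whitespace and intervals written as start:end, see below. The function `parse_extraction_line` and the dictionary it returns are hypothetical and not part of the repository; the device/format count check follows the rule stated under Devices.

```python
def parse_extraction_line(line):
    """Parse 'output | folder | devices | formats | intervals' into a dict."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != 5:
        raise ValueError("Expected 5 '|'-separated fields, got {}".format(len(fields)))
    output, folder, devices, formats, intervals = fields
    devices = devices.split()
    formats = formats.split()
    # One format is required per device.
    if len(devices) != len(formats):
        raise ValueError("The number of devices and formats must be the same")
    # Each interval is 'start:end'; bounds are converted to floats for comparison.
    parsed_intervals = []
    for interval in intervals.split():
        start, end = interval.split(":")
        parsed_intervals.append((float(start), float(end)))
    return {
        "output": output,
        "folder": folder,
        "devices": devices,
        "formats": formats,
        "intervals": parsed_intervals,
    }
```

For example, the line `out/seq1 | input/ | imu camera | single_file multiple_files | 10.5:20 -1:-1` yields two devices, two formats and the intervals `[(10.5, 20.0), (-1.0, -1.0)]`.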

Output

The relative path and name (from the script execution directory) of the output sequence (i.e. a folder). If it already exists, the extracted data is appended to the existing data.txt files, and the files of devices that already exist are merged with the new ones (files with the same name are overwritten).
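The append-and-merge behaviour can be sketched with the standard library as below. This is an illustration under stated assumptions, not the repository's code: `append_lines` and `merge_device_files` are hypothetical names, and the sketch assumes an existing data.txt already ends with a newline.

```python
import os
import shutil


def append_lines(dst_file, lines):
    """Append extracted lines to dst_file, creating it if it does not exist."""
    with open(dst_file, "a") as dst:
        for line in lines:
            # Normalise endings so every record ends with exactly one newline.
            dst.write(line.rstrip("\n") + "\n")


def merge_device_files(src_dir, dst_dir):
    """Copy every file of src_dir into dst_dir, overwriting same-name files."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        src_path = os.path.join(src_dir, name)
        if os.path.isfile(src_path):
            # shutil.copy2 replaces an existing destination file of the same name.
            shutil.copy2(src_path, os.path.join(dst_dir, name))
```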

Folder

The relative path and name (from the script execution directory) of the input sequence to be split. It is expected to have the following structure:

sequence/
    device1/
        data.txt                (if single file)
    device2/
        device-tmstmp1.ext     (if multiple files)
        device-tmstmp2.ext
        .
        .
        .
    device3/
    .
    .
    .

Devices

Each device corresponds to a directory under the folder parameter and is treated as a separate data source.

The number of devices and formats must be the same.

Formats

The format can be:

  • single_file: the data is stored in a single data.txt file. Each line of the file is expected to have the structure:

      timestamp data1 data2 data3 ...
    
  • multiple_files: the data is stored in multiple files. Each filename is expected to be:

      device-timestamp.extension
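
Both formats carry a timestamp that has to be read before it can be compared with an interval. A minimal sketch, assuming the timestamp is the first space-separated field of each data.txt line and that every multiple_files filename has an extension; the helper names are hypothetical.

```python
import os


def timestamp_from_line(line):
    """Return the timestamp (first space-separated field) of a data.txt line."""
    return float(line.split()[0])


def timestamp_from_filename(filename):
    """Return the timestamp of a 'device-timestamp.extension' filename."""
    # Drop only the final extension, so '1559735717.606276' keeps its decimal point.
    stem, _extension = os.path.splitext(filename)
    # Split on the last '-' in case the device name itself contains dashes.
    _device, timestamp = stem.rsplit("-", 1)
    return float(timestamp)
```

For example, `timestamp_from_filename("camera-1559735717.606276.png")` returns `1559735717.606276`.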
    

Interval

Each interval is of type:

start:end

The start and end values must use the same units as the timestamps in the data, since they are compared directly.

Using -1 as start or end means that there is no lower or upper bound, respectively.
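The interval rule, including -1 as an open bound, can be sketched as follows. The README does not say whether the bounds are inclusive; this sketch assumes they are, and `in_interval` and `select_in_interval` are hypothetical helpers.

```python
def in_interval(timestamp, start, end):
    """Check whether timestamp lies in [start, end]; -1 disables a bound."""
    # Assumption: both bounds are inclusive.
    lower_ok = start == -1 or timestamp >= start
    upper_ok = end == -1 or timestamp <= end
    return lower_ok and upper_ok


def select_in_interval(timestamps, interval):
    """Return the timestamps that fall inside a (start, end) interval."""
    start, end = interval
    return [ts for ts in timestamps if in_interval(ts, start, end)]
```

For example, `select_in_interval([1.0, 2.0, 3.0], (-1, 2.0))` returns `[1.0, 2.0]`, while `(2.0, -1)` keeps `[2.0, 3.0]`.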
