Skip to content
/ yactfr Public

yet another CTF reader: a CTF reading library offering a C++14 API

License

Notifications You must be signed in to change notification settings

eepp/yactfr

Repository files navigation

yactfr

yactfr (yac·tif·er) is yet another CTF reader.

The yactfr library is written in C++14 and offers a (🥁…​) C++14 API.

While the CTF reading libraries that I know about focus on decoding and providing you with complete, ordered event record objects, the yactfr API offers a lower level of CTF processing: you iterate individual element sequences to obtain elements: beginning/end of packet, beginning/end of event record, beggining/end of structure, individual data stream scalar values like fixed-length integers, fixed-length floating point numbers, and null-terminated strings, specific clock value update, known data stream ID, and the rest. This means the sizes of event records and packets don’t influence the performance or memory usage of yactfr.

Some use cases where this library can prove useful:

  • Trace inspection programs, for example when you need to debug a custom CTF producer.

  • Foundation of an API with a higher level of abstraction.

  • Targeted offline analyses with low memory usage.

The name yactfr takes its inspiration from yajl, a JSON parser with a similar approach.

Warning
yactfr is not mature yet! Its API is still experimental: the interface could change at any time.

Notable features

  • Full CTF 1.8.3 and CTF 2 support.

  • Only depends on Boost at build time, and nothing at run time (except for your C++ runtime library):

    $ ldd libyactfr.so
            linux-vdso.so.1 (0x00007ffe98fa5000)
            libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007fb8337d2000)
            libm.so.6 => /usr/lib/libm.so.6 (0x00007fb83343d000)
            libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007fb833225000)
            libc.so.6 => /usr/lib/libc.so.6 (0x00007fb832e69000)
            /usr/lib64/ld-linux-x86-64.so.2 (0x00007fb833e51000)
  • Simple, documented C++14 API with a focus on immutability.

  • Efficient data stream decoding: zero memory allocation, zero smart pointer reference count change, zero virtual method call, and zero copy during an element sequence iteration.

    • yactfr decodes a data stream fixed-length bit array in place: the result is available as an element member.

    • A data stream string/BLOB is available as one or more raw data parts in consecutive elements. The actual raw data is a pointer to the current data block of the data source (for example, from a memory mapped region), which means zero string/BLOB copy.

      This is possible because, as per CTF 1.8.3 and CTF2-SPEC-2.0, data stream strings and BLOBs have an 8-bit alignment requirement.

  • Decodes CTF packets of any size and event records of any size with steady memory usage and performance.

  • Offers a memory mapped file view data source, but you can also implement your own data source.

  • It’s safe to use two different iterators on the same element sequence in two different threads.

  • Full CTF metadata text inspection and validation: a parsing error exception contains a list of locations (offset in bytes, line number, and column number) with associated context messages.

  • Full data stream packet decoding information with the input iterator interface, for example:

    • The decoded packet magic number.

    • The decoded metadata stream UUID.

    • The expected total length of the current packet.

    • The expected content length of the current packet.

    • The default clock is updated.

    • The next data stream type to use.

    • The next event record type to use.

    • The data stream (instance) ID.

  • “Seek packet” iterator method to seek the beginning of an element known to be located at a specific offset, in bytes, within the same element sequence.

    You can use this method with non-CTF packet index information, for example LTTng's packet index files (not directly supported by yactfr).

  • Decoding error reporting with precise message, offset in element sequence (bits) where this message applies within the element sequence, and other relevant properties.

  • Performance (nop iteration loop) is similar to Babeltrace 1.5.3’s (-o dummy) with a release build.

    I don’t publish numbers: experiment by yourself to confirm or deny this claim.

Limitations

Note that after a few years of working with all sorts of real life CTF traces, I never caught one which yactfr would refuse to decode, but I’m still honest:

  • Only builds on a Linux/Unix platform.

    The non-portable part is the memory mapped file source which uses system functions such as open(), close(), fstat(), mmap(), and madvise().

  • Decodes up to, and including, 64-bit signed/unsigned CTF fixed-length bit arrays and integers (no “big integers”).

  • Decodes up to, and including, 63-bit (effective, which means 72 total) signed/unsigned CTF variable-length integers.

  • Only decodes 32-bit and 64-bit CTF fixed-length floating point numbers.

  • Packet lengths must be multiples of 8 bits (I’m still not sure, but this could be enforced by the specification anyway), and be at least 8 bits.

  • TSDL (CTF 1.8): Doesn’t handle single-line comment continuation when the line ends with \:

    event {
        name = "hello"; // this is a comment \
        id = 23; we're still in the comment started above here
        id = 42;
        ...
    };
  • TSDL (CTF 1.8): Doesn’t support relative dynamic-length array type lengths and variant type selectors in data type aliases (or named structure/variant types) which target structure member types outside this data type alias.

    For example, this is not supported (TSDL):

    fields := struct {
        int len;
    
        typealias struct {
            int sequence[len];
        } := my_struct;
    
        struct {
            int len;
            my_struct a_struct;
        } field;
    };

    This is also not supported (TSDL):

    fields := struct {
        enum {
            ...
        } tag;
    
        variant my_variant <tag> {
            ...
        } a_variant;
    
        my_variant the_variant;
    };

    The example above would work, however, if the selector location of the variant type would be absolute:

    fields := struct {
        enum {
            ...
        } tag;
    
        variant my_variant <event.fields.tag> {
            ...
        } a_variant;
    
        my_variant the_variant;
    };
  • API and ABI backward compatibility is not guaranteed at this point.

    Please rebuild your project if you change the yactfr version.

Build and install yactfr

Make sure you have the build time requirements:

  • Linux/Unix platform

  • CMake ≥ 3.10.0

  • C++14 compiler

  • Boost ≥ 1.58

  • If you build the API documentation: Doxygen

Build and install yactfr from source
$ git clone https://github.com/eepp/yactfr
$ cd yactfr
$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=release ..
$ make
# make install

You can specify your favorite C and C++ compilers with the usual CC and CXX environment variables when you run cmake, and additional options with CFLAGS and CXXFLAGS.

Specify -DOPT_BUILD_DOC=YES to cmake to enable the HTML API documentation build (requires Doxygen). The documentation is available in BUILD/doc/api/output/html, where BUILD is your build directory.

Specify -DCMAKE_INSTALL_PREFIX=PREFIX to cmake to install yactfr to the PREFIX directory instead of the default /usr/local directory.

For example, this is how I run cmake for development:

$ CC=clang CXX=clang++ CXXFLAGS='-Wextra -Wall -pedantic' \
  cmake .. -DCMAKE_BUILD_TYPE=debug -DOPT_BUILD_DOC=ON

For production, you should make a release build:

$ CC=clang CXX=clang++ \
  cmake .. -DCMAKE_BUILD_TYPE=release -DOPT_BUILD_DOC=ON

Run the tests

Once you have built the project in the build directory, you can run the tests. You need Python 3 and pytest.

Run the yactfr tests from the build directory.
$ make check

If you’re in a hurry and you have the pytest-xdist package, you can parallelize the testing process. You need to set the YACTFR_BINARY_DIR environment variable to the build directory (absolute path), for example:

Run the yactfr tests in parallel from the build directory.
$ make tests
$ YACTFR_BINARY_DIR=$(pwd) pytest -n logical

Usage examples

In the examples below, the program accepts two arguments:

  1. The path to the metadata stream file of the trace (required).

  2. The path to a data stream file of the same trace (required by some example).

Build the API documentation for a thorough reference.

Note
The examples are not necessarily optimal: their purpose is to show what the yactfr API looks like.
Example 1. Print all the data stream’s event record names.
#include <cassert>
#include <fstream>
#include <iostream>
#include <yactfr/yactfr.hpp>

int main(const int argc, const char * const argv[])
{
    assert(argc == 3);

    // open metadata stream file
    std::ifstream metadataFile {argv[1], std::ios::binary};

    // create metadata stream object
    const auto metadataStream = yactfr::createMetadataStream(metadataFile);

    // we have the metadata text at this point: safe to close the file
    metadataFile.close();

    // get a trace type from the metadata text
    auto traceTypeMsUuidPair = yactfr::fromMetadataText(metadataStream->text());

    // create a memory mapped file view factory to read the data stream file
    yactfr::MemoryMappedFileViewFactory factory {argv[2]};

    // create an element sequence from the trace type and data source factory
    yactfr::ElementSequence seq {*traceTypeMsUuidPair.first, factory};

    // print all the event record names
    for (auto& elem : seq) {
        if (elem.isEventRecordInfoElement()) {
            auto& erInfo = elem.asEventRecordInfoElement();

            // the name of an event record type is optional
            if (erInfo.type()->name()) {
                std::cout << *erInfo.type()->name() << std::endl;
            }
        }
    }
}
Example 2. Print all the fixed-length signed integers of the sched_switch event records and their offset.
#include <cassert>
#include <fstream>
#include <iostream>
#include <yactfr/yactfr.hpp>

int main(const int argc, const char * const argv[])
{
    assert(argc == 3);

    // open metadata stream file
    std::ifstream metadataFile {argv[1], std::ios::binary};

    // create metadata stream object
    const auto metadataStream = yactfr::createMetadataStream(metadataFile);

    // we have the metadata text at this point: safe to close the file
    metadataFile.close();

    // get a trace type from the metadata text
    auto traceTypeMsUuidPair = yactfr::fromMetadataText(metadataStream->text());

    // create a memory mapped file view factory to read the data stream file
    yactfr::MemoryMappedFileViewFactory factory {argv[2]};

    // create an element sequence from the trace type and data source factory
    yactfr::ElementSequence seq {*traceTypeMsUuidPair.first, factory};

    // print all the fixed-length signed integers of the `sched_switch` ERs
    const auto endIt = seq.end();
    bool inSchedSwitchEventRecord = false;

    for (auto it = seq.begin(); it != endIt; ++it) {
        if (it->isEventRecordInfoElement()) {
            auto& ertElem = it->asEventRecordInfoElement();

            // the name of an event record type is optional
            if (ertElem.type()->name() && *ertElem.type()->name() == "sched_switch") {
                std::cout << "---" << std::endl;
                inSchedSwitchEventRecord = true;
            } else {
                inSchedSwitchEventRecord = false;
            }

            continue;
        }

        if (inSchedSwitchEventRecord && it->isFixedLengthSignedIntegerElement()) {
            std::cout << it.offset() << ": ";

            auto& intElem = it->asFixedLengthSignedIntegerElement();

            if (intElem.structureMemberType()) {
                std::cout << intElem.structureMemberType()->displayName() << ": ";
            }

            std::cout << intElem.value() << std::endl;
        }
    }
}
Example 3. Print all the packet offsets and lengths (both in bits): slow version.

In this example, we iterate all the elements of the data stream. The next example shows how to do the same faster.

#include <cassert>
#include <fstream>
#include <iostream>
#include <iomanip>
#include <yactfr/yactfr.hpp>

int main(const int argc, const char * const argv[])
{
    assert(argc == 3);

    // open metadata stream file
    std::ifstream metadataFile {argv[1], std::ios::binary};

    // create metadata stream object
    const auto metadataStream = yactfr::createMetadataStream(metadataFile);

    // we have the metadata text at this point: safe to close the file
    metadataFile.close();

    // get a trace type from the metadata text
    auto traceTypeMsUuidPair = yactfr::fromMetadataText(metadataStream->text());

    // create a memory mapped file view factory to read the data stream file
    yactfr::MemoryMappedFileViewFactory factory {argv[2]};

    // create an element sequence from the trace type and data source factory
    yactfr::ElementSequence seq {*traceTypeMsUuidPair.first, factory};

    // print all the packet offsets and lengths (both in bits)
    const auto endIt = seq.end();
    yactfr::Index curPktOffset = 0;
    unsigned long curPktNumber = 0;

    for (auto it = seq.begin(); it != endIt; ++it) {
        if (it->isPacketBeginningElement()) {
            // save packet beginning offset
            curPktOffset = it.offset();
        } else if (it->isPacketEndElement()) {
            // back to first level: end of packet
            const auto pktLen = it.offset() - curPktOffset;

            std::cout << "Packet #" << curPktNumber << ":    " <<
                         "Offset: " << std::setw(10) << curPktOffset << "    " <<
                         "Size: " << std::setw(10) << pktLen <<
                         std::endl;
            ++curPktNumber;
        }
    }
}
Example 4. Print all the packet offsets and lengths (both in bits): fast version.

This is a faster version of the previous example.

Instead of decoding the whole packet to find its length, we use the expected packet total length property of the “packet info” element. This element is available after the decoder reads the packet context. Then, we make the iterator seek the next packet directly.

Note that this example doesn’t work if the packet context type doesn’t contain an expected packet total length fixed-length unsigned integer, in which case the data stream must contain a single packet. This could be detected by inspecting the metadata (trace type) and using the size of the whole data stream file as the unique packet total length.

#include <cassert>
#include <fstream>
#include <iostream>
#include <iomanip>
#include <yactfr/yactfr.hpp>

int main(const int argc, const char * const argv[])
{
    assert(argc == 3);

    // open metadata stream file
    std::ifstream metadataFile {argv[1], std::ios::binary};

    // create metadata stream object
    const auto metadataStream = yactfr::createMetadataStream(metadataFile);

    // we have the metadata text at this point: safe to close the file
    metadataFile.close();

    // get a trace type from the metadata text
    auto traceTypeMsUuidPair = yactfr::fromMetadataText(metadataStream->text());

    // create a memory mapped file view factory to read the data stream file
    yactfr::MemoryMappedFileViewFactory factory {argv[2]};

    // create an element sequence from the trace type and data source factory
    yactfr::ElementSequence seq {*traceTypeMsUuidPair.first, factory};

    // print all the packet offsets and lengths (both in bits)
    const auto endIt = seq.end();
    auto it = seq.begin();
    yactfr::Index curPktOffset = 0;
    unsigned long curPktNumber = 0;

    while (it != endIt) {
        if (it->isPacketBeginningElement()) {
            // save packet beginning offset
            curPktOffset = it.offset();
        } else if (it->isPacketInfoElement()) {
            // this element contains the expected total length of the current packet
            auto& elem = it->asPacketInfoElement();

            assert(elem.expectedTotalLength());
            std::cout << "Packet #" << curPktNumber << ":    " <<
                         "Offset: " << std::setw(10) << curPktOffset << "    " <<
                         "Size: " << std::setw(10) << *elem.expectedTotalLength() <<
                         std::endl;
            ++curPktNumber;

            /*
             * Seek the next packet without iterating the intermediate
             * elements. The expected offset is in bytes, so we need to
             * divide what we have by 8.
             */
            it.seekPacket((curPktOffset + *elem.expectedTotalLength()) / 8);
            continue;
        }

        ++it;
    }
}

Contribute and report bugs

Please contribute with GitHub pull requests and report bugs as GitHub issues.

Community

See eepp.ca.

I’m eepp on Libera.Chat and OFTC.

About

yet another CTF reader: a CTF reading library offering a C++14 API

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published