Skip to content

Event parsers and serializers should have roundtrip property #42

@osa1

Description

@osa1

By "roundtrip property" I mean for any event if I serialize it with putEvent,
getEvent on the resulting ByteString should give me the original event. #41
shows that we currently don't have this property.

Motivation

I'm trying to improve large eventlog handling of ThreadScope [1]. For this I
need some kind of filesystem-backed data structure to be able to serialize
unused events [2] to a file and read the ones that are needed. As shown in #41
we currently can't do this.

Problem

The reason why we currently don't have this property is GHC expects that event
parsers should be able to parse an event using only a prefix of the
serialization of the event, and ghc-events is implemented accordingly. This
means that GHC can add more fields to an event as long as (1) it updates the
event size in the .eventlog file header (2) it puts the new field at the end of
the event. [3] is an example where this happens. Event parsers can then check
size of an event using the .eventlog file header, and skip any space between
the assumed size of the event and what the header says the event's size is [4].

The problem occurs because the serializer (putEvent and friends) do not take
these extra fields into account and serializes events based on their assumed
size by ghc-events [5]. So if we try to parse the serialized event using the
original parser used to read the .eventlog file the parse fails.

Proposed fix

I propose fixing this by adding a new "extras" field to events:

data Event = Event
  { evTime   :: {-# UNPACK #-}!Timestamp
  , evSpec   :: EventInfo
  , evCap    :: Maybe Int
  , evExtras :: ByteString -- NEW FIELD
  } deriving Show

Parsers can then put the skipped data in this field and serializers can
directly serialize that field after serializing the event.

Because this is a breaking change a major version bump will be needed. A major
user is ThreadScope, which already has an upper bound <0.9.0 so I don't think
this will cause major breakage, and I'll also update ThreadScope so that the
next version will be compatible with new ghc-events.


[1]: haskell/ThreadScope#29
[2]: e.g. with the help of an LRU cache
[3]: https://phabricator.haskell.org/D3658
[4]: See padParser and its call site for the implementation of this.
[5]: In putEventSpec


How does that sound? @maoe I think these days you're maintaining the library,
is there anyone else I should ping about this? The implementation should be
simple but I just wanted to get some opinions before investing more into it.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions