diff --git a/doc/advanced_topics.md b/doc/advanced_topics.md
index 1ebdcad28..c2235f565 100644
--- a/doc/advanced_topics.md
+++ b/doc/advanced_topics.md
@@ -110,3 +110,126 @@ To implement your own transient event store, the only requirement is to set the
 - Run pre-commit manually
   `$ pre-commit run --all-files`
+
+## Retrieving the EDM definition from a data file
+It is possible to get the EDM definition(s) that were used to generate the
+datatypes that are stored in a data file. This makes it possible to re-generate
+the necessary code and build all libraries again in case they are not easily
+available otherwise. To see which EDM definitions are available in a data file
+use the `podio-dump` utility
+
+```bash
+podio-dump
+```
+which will give output like the following example
+```
+input file:
+
+EDM model definitions stored in this file: edm4hep
+
+[...]
+```
+
+To actually dump the model definition to stdout use the `--dump-edm` option
+and the name of the datamodel you want to dump:
+
+```bash
+podio-dump --dump-edm edm4hep  > dumped_edm4hep.yaml
+```
+
+Here we directly redirected the output to a yaml file that can then again be
+used by the `podio_class_generator.py` to generate the corresponding C++ code
+(or be passed to the cmake macros).
+
+**Note that the dumped EDM definition is equivalent but not necessarily exactly
+the same as the original EDM definition.** E.g. all the datatypes will have all
+their fields (`Members`, `OneToOneRelations`, `OneToManyRelations`,
+`VectorMembers`) defined, and defaulted to empty lists in case they were not
+present in the original EDM definition. The reason for this is that the embedded
+EDM definition is the pre-processed and validated one [as described
+below](#technical-details-on-edm-definition-embedding).
+
+### Accessing the EDM definition programmatically
+The EDM definition can also be accessed programmatically via the
+`[ROOT|SIO]FrameReader::getDatamodelDefinition` method. It takes an EDM name as its
+single argument and returns the EDM definition as a JSON string. Most likely
+this has to be decoded into an actual JSON structure in order to be usable (e.g.
+via `json.loads` in python to get a `dict`).
+
+### Technical details on EDM definition embedding
+The EDM definition is embedded into the core EDM library as a raw string literal
+in JSON format. This string is generated into the `DatamodelDefinition.h` file as
+
+```cpp
+namespace ::meta {
+static constexpr auto __JSONDefinition = R"DATAMODELDEF()DATAMODELDEF";
+}
+```
+
+where `` is the name of the EDM as passed to the
+`podio_class_generator.py` (or the cmake macro). The ``
+is obtained from the pre-processed EDM definition that is read from the yaml
+file. During this pre-processing the EDM definition is validated, and optional
+fields are filled with empty defaults. Additionally, the `includeSubfolder`
+option will be populated with the actual include subfolder, in case it has been
+set to `True` in the yaml file. Since the json encoded definition is generated
+right before the pre-processed model is passed to the class generator, this
+definition is equivalent, but not necessarily equal to the original definition.
+
+#### The `DatamodelRegistry`
+To make access to information about currently loaded and available datamodels a
+bit easier the `DatamodelRegistry` (singleton) keeps a map of all loaded
+datamodels and provides access to this information. In this context we
+refer to an *EDM* as the shared library (and the corresponding public headers)
+that have been compiled from code that has been generated from a *datamodel
+definition* in the original YAML file. In general whenever we refer to a
+*datamodel* in this context we mean the entity as a whole, i.e. its definition
+in a YAML file, the concrete implementation as an EDM, as well as any other
+information related to it.
+
+Currently the `DatamodelRegistry` mainly provides access to the original
+definition of available datamodels via two methods:
+```cpp
+const std::string_view getDatamodelDefinition(std::string_view name) const;
+
+const std::string_view getDatamodelDefinition(size_t index) const;
+```
+
+where `index` can be obtained from each collection via
+`getDatamodelRegistryIndex`. That in turn simply calls
+`::meta::DatamodelRegistryIndex::value()`, another singleton-like object
+that takes care of registering an EDM definition with the `DatamodelRegistry`
+during its static initialization. It is also defined in the
+`DatamodelDefinition.h` header.
+
+Since the datamodel definition is embedded as a raw string literal into the core
+EDM shared library, it is in principle also relatively straightforward to
+retrieve it from this library by inspecting the binary, e.g. via
+```bash
+readelf -p .rodata libedm4hep.so | grep options
+```
+
+which will result in something like
+
+```
+  [ 300] {"options": {"getSyntax": true, "exposePODMembers": false, "includeSubfolder": "edm4hep/"}, "components": {<...>}, "datatypes": {<...>}}
+```
+
+#### I/O helpers for EDM definition storing
+The `podio/utilities/DatamodelRegistryIOHelpers.h` header defines two utility
+classes that help with instrumenting readers and writers with functionality to
+read and write all the necessary EDM definitions.
+
+- The `DatamodelDefinitionCollector` is intended for usage in writers. It
+  essentially collects the datamodel definitions of all the collections it encounters.
+  The `registerDatamodelDefinition` method it provides should be called with every collection
+  that is written. The `getDatamodelDefinitionsToWrite` method returns a vector of all
+  datamodel names and their definitions that were encountered during writing. **It is
+  then the writer's responsibility to actually store this information into the
+  file**.
+- The `DatamodelDefinitionHolder` is intended to be used by readers. It
+  provides the `getDatamodelDefinition` and `getAvailableDatamodels` methods.
+  **It is again the reader's responsibility to correctly populate it with the data it
+  has read from file.** Currently the `SIOFrameReader` and the `ROOTFrameReader`
+  use it and expose the same functionality through public methods that
+  delegate to it.
diff --git a/include/podio/CollectionBase.h b/include/podio/CollectionBase.h
index 670291ba3..fcb81401a 100644
--- a/include/podio/CollectionBase.h
+++ b/include/podio/CollectionBase.h
@@ -76,6 +76,9 @@ class CollectionBase {
   /// print this collection to the passed stream
   virtual void print(std::ostream& os = std::cout, bool flush = true) const = 0;
+
+  /// Get the index in the DatamodelRegistry of the EDM this collection belongs to
+  virtual size_t getDatamodelRegistryIndex() const = 0;
 };
 } // namespace podio
diff --git a/include/podio/DatamodelRegistry.h b/include/podio/DatamodelRegistry.h
new file mode 100644
index 000000000..a32aa8218
--- /dev/null
+++ b/include/podio/DatamodelRegistry.h
@@ -0,0 +1,99 @@
+#ifndef PODIO_DATAMODELREGISTRY_H
+#define PODIO_DATAMODELREGISTRY_H
+
+#include
+#include
+#include
+#include
+
+namespace podio {
+
+/**
+ * Global registry holding information about datamodels and datatypes defined
+ * therein that are currently known by podio (i.e. which have been dynamically
+ * loaded).
+ *
+ * This is a singleton which is (statically) populated during dynamic loading of
+ * generated EDMs. In this context an **EDM refers to the shared library** that
+ * is compiled from the generated code from a datamodel definition in YAML
+ * format. When we refer to a **datamodel** in this context we talk about the
+ * entity as a whole, i.e. its definition in a YAML file, but also the concrete
+ * implementation as an EDM, as well as all other information that is related to
+ * it. In the API of this registry this will be used, unless we want to
+ * highlight that we are referring to a specific part of a datamodel.
+ */
+class DatamodelRegistry {
+public:
+  /// Get the registry
+  static const DatamodelRegistry& instance();
+
+  // Mutable instance only used for the initial registration!
+  static DatamodelRegistry& mutInstance();
+
+  ~DatamodelRegistry() = default;
+  DatamodelRegistry(const DatamodelRegistry&) = delete;
+  DatamodelRegistry& operator=(const DatamodelRegistry&) = delete;
+  DatamodelRegistry(DatamodelRegistry&&) = delete;
+  DatamodelRegistry& operator=(DatamodelRegistry&&) = delete;
+
+  /// Dedicated index value for collections that don't have a datamodel
+  /// definition (e.g. UserDataCollection)
+  static constexpr size_t NoDefinitionNecessary = -1;
+  /// Dedicated index value for error checking, used to default init the generated RegistryIndex
+  static constexpr size_t NoDefinitionAvailable = -2;
+
+  /**
+   * Get the definition (in JSON format) of the datamodel with the given
+   * name.
+   *
+   * If no datamodel with the given name can be found, an empty datamodel
+   * definition, i.e. an empty JSON object ("{}"), is returned.
+   *
+   * @param name The name of the datamodel
+   */
+  const std::string_view getDatamodelDefinition(std::string_view name) const;
+
+  /**
+   * Get the definition (in JSON format) of the datamodel with the given index.
+   *
+   * If no datamodel is found under the given index, an empty datamodel
+   * definition, i.e. an empty JSON object ("{}"), is returned.
+   *
+   * @param index The datamodel definition index that can be obtained from each
+   * collection
+   */
+  const std::string_view getDatamodelDefinition(size_t index) const;
+
+  /**
+   * Get the name of the datamodel that is stored under the given index.
+   *
+   * If no datamodel is found under the given index, an empty string is returned.
+   *
+   * @param index The datamodel definition index that can be obtained from each
+   * collection
+   */
+  const std::string& getDatamodelName(size_t index) const;
+
+  /**
+   * Register a datamodel and return its index in the registry.
+   *
+   * This is the hook that is called during dynamic loading of an EDM to
+   * register information for this EDM. If an EDM has already been registered
+   * under this name, then the index to the existing EDM in the registry will be
+   * returned.
+   *
+   * @param name The name of the EDM that should be registered
+   * @param definition The datamodel definition from which this EDM has been
+   * generated in JSON format
+   *
+   */
+  size_t registerDatamodel(std::string name, std::string_view definition);
+
+private:
+  DatamodelRegistry() = default;
+  /// The stored definitions
+  std::vector> m_definitions{};
+};
+} // namespace podio
+
+#endif // PODIO_DATAMODELREGISTRY_H
diff --git a/include/podio/ROOTFrameReader.h b/include/podio/ROOTFrameReader.h
index 1850a0b02..1a2f48a4d 100644
--- a/include/podio/ROOTFrameReader.h
+++ b/include/podio/ROOTFrameReader.h
@@ -4,6 +4,7 @@
 #include "podio/CollectionBranches.h"
 #include "podio/ROOTFrameData.h"
 #include "podio/podioVersion.h"
+#include "podio/utilities/DatamodelRegistryIOHelpers.h"
 #include "TChain.h"
@@ -79,6 +80,16 @@ class ROOTFrameReader {
   /// Get the names of all the availalable Frame categories in the current file(s)
   std::vector getAvailableCategories() const;
+  /// Get the datamodel definition for the given name
+  const std::string_view getDatamodelDefinition(const std::string& name) const {
+    return m_datamodelHolder.getDatamodelDefinition(name);
+  }
+
+  /// Get all names of the datamodels that are available from this reader
+  std::vector getAvailableDatamodels() const {
+    return m_datamodelHolder.getAvailableDatamodels();
+  }
+
 private:
   /**
    * Helper struct to group together all the necessary state to read / process a
@@ -132,6 +143,7 @@ class ROOTFrameReader {
   std::vector m_availCategories{}; ///< All available categories from this file
   podio::version::Version m_fileVersion{0, 0, 0};
+  DatamodelDefinitionHolder m_datamodelHolder{};
 };
 } // namespace podio
diff --git a/include/podio/ROOTFrameWriter.h b/include/podio/ROOTFrameWriter.h
index 9428ed929..2546613d8 100644
--- a/include/podio/ROOTFrameWriter.h
+++ b/include/podio/ROOTFrameWriter.h
@@ -3,6 +3,7 @@
 #include "podio/CollectionBranches.h"
 #include "podio/CollectionIDTable.h"
+#include "podio/utilities/DatamodelRegistryIOHelpers.h"
 #include "TFile.h"
@@ -80,6 +81,8 @@ class ROOTFrameWriter {
   std::unique_ptr m_file{nullptr}; ///< The storage file
   std::unordered_map m_categories{}; ///< All categories
+
+  DatamodelDefinitionCollector m_datamodelCollector{};
 };
 } // namespace podio
diff --git a/include/podio/SIOBlock.h b/include/podio/SIOBlock.h
index 95e3de27f..3e02561b8 100644
--- a/include/podio/SIOBlock.h
+++ b/include/podio/SIOBlock.h
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -16,6 +17,7 @@
 #include
 #include
 #include
+#include
 namespace podio {
@@ -26,6 +28,34 @@ void handlePODDataSIO(devT& device, PODData* data, size_t size) {
   device.data(dataPtr, count);
 }
+/// Write anything that iterates like a std::map
+template
+void writeMapLike(sio::write_device& device, const MapLikeT& map) {
+  device.data((int)map.size());
+  for (const auto& [key, value] : map) {
+    device.data(key);
+    device.data(value);
+  }
+}
+
+/// Read anything that iterates like a std::map
+template
+void readMapLike(sio::read_device& device, MapLikeT& map) {
+  int size;
+  device.data(size);
+  while (size--) {
+    detail::GetKeyType key;
+    device.data(key);
+    detail::GetMappedType value;
+    device.data(value);
+    if constexpr (podio::detail::isVector) {
+      map.emplace_back(std::move(key), std::move(value));
+    } else {
+      map.emplace(std::move(key), std::move(value));
+    }
+  }
+}
+
 /// Base class for sio::block handlers used with PODIO
 class SIOBlock : public sio::block {
@@ -141,6 +171,32 @@ class SIOEventMetaDataBlock : public sio::block {
   podio::GenericParameters* metadata{nullptr};
 };
+/**
+ * A block to serialize anything that can be iterated like a
+ * map, e.g. vector>, which is what is used
+ * internally to represent the data to be written.
+ */
+template
+struct SIOMapBlock : public sio::block {
+  SIOMapBlock() : sio::block("SIOMapBlock", sio::version::encode_version(0, 1)) {
+  }
+  SIOMapBlock(std::vector>&& data) :
+      sio::block("SIOMapBlock", sio::version::encode_version(0, 1)), mapData(std::move(data)) {
+  }
+
+  SIOMapBlock(const SIOMapBlock&) = delete;
+  SIOMapBlock& operator=(const SIOMapBlock&) = delete;
+
+  void read(sio::read_device& device, sio::version_type) override {
+    readMapLike(device, mapData);
+  }
+  void write(sio::write_device& device) override {
+    writeMapLike(device, mapData);
+  }
+
+  std::vector> mapData{};
+};
+
 /**
  * A block for handling the run and collection meta data
  */
@@ -219,6 +275,9 @@ namespace sio_helpers {
   /// The name of the TOCRecord
   static constexpr const char* SIOTocRecordName = "podio_SIO_TOC_Record";
+  /// The name of the record containing the EDM definitions in json format
+  static constexpr const char* SIOEDMDefinitionName = "podio_SIO_EDMDefinitions";
+
   // should hopefully be enough for all practical purposes
   using position_type = uint32_t;
 } // namespace sio_helpers
diff --git a/include/podio/SIOFrameReader.h b/include/podio/SIOFrameReader.h
index d7a2c5e8c..5fefdab75 100644
--- a/include/podio/SIOFrameReader.h
+++ b/include/podio/SIOFrameReader.h
@@ -4,6 +4,7 @@
 #include "podio/SIOBlock.h"
 #include "podio/SIOFrameData.h"
 #include "podio/podioVersion.h"
+#include "podio/utilities/DatamodelRegistryIOHelpers.h"
 #include
@@ -53,12 +54,24 @@ class SIOFrameReader {
   /// Get the names of all the availalable Frame categories in the current file(s)
   std::vector getAvailableCategories() const;
+  /// Get the datamodel definition for the given name
+  const std::string_view getDatamodelDefinition(const std::string& name) const {
+    return m_datamodelHolder.getDatamodelDefinition(name);
+  }
+
+  /// Get all names of the datamodels that are available from this reader
+  std::vector getAvailableDatamodels() const {
+    return m_datamodelHolder.getAvailableDatamodels();
+  }
+
 private:
   void readPodioHeader();
   /// read the TOC record
   bool readFileTOCRecord();
+  void readEDMDefinitions();
+
   sio::ifstream m_stream{}; ///< The stream from which we read
   /// Count how many times each an entry of this name has been read already
@@ -68,6 +81,8 @@ class SIOFrameReader {
   SIOFileTOCRecord m_tocRecord{};
   /// The podio version that has been used to write the file
   podio::version::Version m_fileVersion{0};
+
+  DatamodelDefinitionHolder m_datamodelHolder{};
 };
 } // namespace podio
diff --git a/include/podio/SIOFrameWriter.h b/include/podio/SIOFrameWriter.h
index 1ccc7a2e8..a8a7d084f 100644
--- a/include/podio/SIOFrameWriter.h
+++ b/include/podio/SIOFrameWriter.h
@@ -2,6 +2,7 @@
 #define PODIO_SIOFRAMEWRITER_H
 #include "podio/SIOBlock.h"
+#include "podio/utilities/DatamodelRegistryIOHelpers.h"
 #include
@@ -35,6 +36,7 @@ class SIOFrameWriter {
 private:
   sio::ofstream m_stream{}; ///< The output file stream
   SIOFileTOCRecord m_tocRecord{}; ///< The "table of contents" of the written file
+  DatamodelDefinitionCollector m_datamodelCollector{};
 };
 } // namespace podio
diff --git a/include/podio/UserDataCollection.h b/include/podio/UserDataCollection.h
index 2365c5094..7d28e2c99 100644
--- a/include/podio/UserDataCollection.h
+++ b/include/podio/UserDataCollection.h
@@ -3,6 +3,7 @@
 #include "podio/CollectionBase.h"
 #include "podio/CollectionBuffers.h"
+#include "podio/DatamodelRegistry.h"
 #include "podio/utilities/TypeHelpers.h"
 #include
@@ -172,6 +173,10 @@ class UserDataCollection : public CollectionBase {
     }
   }
+  size_t getDatamodelRegistryIndex() const override {
+    return DatamodelRegistry::NoDefinitionNecessary;
+  }
+
   // ----- some wrapers for std::vector and access to the complete std::vector (if really needed)
   typename std::vector::iterator begin() {
diff --git a/include/podio/utilities/DatamodelRegistryIOHelpers.h b/include/podio/utilities/DatamodelRegistryIOHelpers.h
new file mode 100644
index 000000000..4ca996ae6
--- /dev/null
+++ b/include/podio/utilities/DatamodelRegistryIOHelpers.h
@@ -0,0 +1,76 @@
+#ifndef PODIO_UTILITIES_DATAMODELREGISTRYIOHELPERS_H
+#define PODIO_UTILITIES_DATAMODELREGISTRYIOHELPERS_H
+
+#include "podio/CollectionBase.h"
+#include "podio/DatamodelRegistry.h"
+
+#include
+#include
+#include
+#include
+
+namespace podio {
+
+/**
+ * Helper class to collect the datamodel (JSON) definitions that should be
+ * written.
+ */
+class DatamodelDefinitionCollector {
+public:
+  /**
+   * Register the datamodel definition of the EDM this collection is from to be
+   * written.
+   *
+   * @param coll A collection of an EDM
+   * @param name The name under which this collection is stored on file
+   */
+  void registerDatamodelDefinition(const podio::CollectionBase* coll, const std::string& name);
+
+  /// Get all the names and JSON definitions that need to be written
+  std::vector> getDatamodelDefinitionsToWrite() const;
+
+private:
+  std::set m_edmDefRegistryIdcs{}; ///< The indices in the EDM definition registry that need to be written
+};
+
+/**
+ * Helper class to hold and provide the datamodel (JSON) definitions for reader
+ * classes.
+ */
+class DatamodelDefinitionHolder {
+public:
+  /// The "map" type that is used internally
+  using MapType = std::vector>;
+  /// Constructor from an existing collection of names and datamodel definitions
+  DatamodelDefinitionHolder(MapType&& definitions) : m_availEDMDefs(std::move(definitions)) {
+  }
+
+  DatamodelDefinitionHolder() = default;
+  ~DatamodelDefinitionHolder() = default;
+  DatamodelDefinitionHolder(const DatamodelDefinitionHolder&) = delete;
+  DatamodelDefinitionHolder& operator=(const DatamodelDefinitionHolder&) = delete;
+  DatamodelDefinitionHolder(DatamodelDefinitionHolder&&) = default;
+  DatamodelDefinitionHolder& operator=(DatamodelDefinitionHolder&&) = default;
+
+  /**
+   * Get the datamodel definition for the given datamodel name.
+   *
+   * Returns an empty model definition if no model is stored under the given
+   * name.
+   *
+   * @param name The name of the datamodel
+   */
+  const std::string_view getDatamodelDefinition(const std::string& name) const;
+
+  /**
+   * Get all names of the datamodels that have been read from file
+   */
+  std::vector getAvailableDatamodels() const;
+
+protected:
+  MapType m_availEDMDefs{};
+};
+
+} // namespace podio
+
+#endif // PODIO_UTILITIES_DATAMODELREGISTRYIOHELPERS_H
diff --git a/include/podio/utilities/TypeHelpers.h b/include/podio/utilities/TypeHelpers.h
index b351e2118..74d1a4d28 100644
--- a/include/podio/utilities/TypeHelpers.h
+++ b/include/podio/utilities/TypeHelpers.h
@@ -1,9 +1,11 @@
 #ifndef PODIO_UTILITIES_TYPEHELPERS_H
 #define PODIO_UTILITIES_TYPEHELPERS_H
+#include
 #include
 #include
 #include
+#include
 #include
 namespace podio {
@@ -100,6 +102,62 @@ namespace detail {
   template
   static constexpr bool isVector = IsVectorHelper::value;
+  /**
+   * Helper struct to detect whether a type is a std::map or std::unordered_map
+   */
+  template
+  struct IsMapHelper : std::false_type {};
+
+  template
+  struct IsMapHelper> : std::true_type {};
+
+  template
+  struct IsMapHelper> : std::true_type {};
+
+  /**
+   * Alias template for deciding whether the passed type T is a map or
+   * unordered_map
+   */
+  template
+  static constexpr bool isMap = IsMapHelper::value;
+
+  /**
+   * Helper struct to homogenize the (type) access for things that behave like
+   * maps, e.g. vectors of pairs (and obviously maps).
+   *
+   * NOTE: This is not SFINAE friendly.
+   */
+  template >,
+            typename IsVector = std::bool_constant && (std::tuple_size() == 2)>>
+  struct MapLikeTypeHelper {};
+
+  /**
+   * Specialization for actual maps
+   */
+  template
+  struct MapLikeTypeHelper, std::bool_constant> {
+    using key_type = typename T::key_type;
+    using mapped_type = typename T::mapped_type;
+  };
+
+  /**
+   * Specialization for vector of pairs / tuples (of size 2)
+   */
+  template
+  struct MapLikeTypeHelper, std::bool_constant> {
+    using key_type = typename std::tuple_element<0, typename T::value_type>::type;
+    using mapped_type = typename std::tuple_element<1, typename T::value_type>::type;
+  };
+
+  /**
+   * Type aliases for easier usage in actual code
+   */
+  template
+  using GetKeyType = typename MapLikeTypeHelper::key_type;
+
+  template
+  using GetMappedType = typename MapLikeTypeHelper::mapped_type;
+
 } // namespace detail
 // forward declaration to be able to use it below
diff --git a/python/podio/base_reader.py b/python/podio/base_reader.py
index b45cfa3f1..88d3acc3e 100644
--- a/python/podio/base_reader.py
+++ b/python/podio/base_reader.py
@@ -55,3 +55,28 @@ def is_legacy(self):
       bool: True if this is a legacy file reader
     """
     return self._is_legacy
+
+  @property
+  def datamodel_definitions(self):
+    """Get the available datamodel definitions from this reader.
+
+    Returns:
+      tuple(str): The names of the available datamodel definitions
+    """
+    if self._is_legacy:
+      return ()
+    return tuple(n.c_str() for n in self._reader.getAvailableDatamodels())
+
+  def get_datamodel_definition(self, edm_name):
+    """Get the datamodel definition as JSON string.
+
+    Args:
+      edm_name (str): The name of the datamodel
+
+    Returns:
+      str: The complete model definition in JSON format. Use, e.g. json.loads
+        to convert it into a python dictionary.
+    """
+    if self._is_legacy:
+      return ""
+    return self._reader.getDatamodelDefinition(edm_name).data()
diff --git a/python/podio/generator_utils.py b/python/podio/generator_utils.py
index 9711dcdf1..e50b139e7 100644
--- a/python/podio/generator_utils.py
+++ b/python/podio/generator_utils.py
@@ -4,6 +4,7 @@
 """
 import re
+import json
 def _get_namespace_class(full_type):
@@ -183,12 +184,19 @@ def setter_name(self, get_syntax, is_relation=False):
       return self.name
     return _prefix_name(self.name, 'set')
+  def _to_json(self):
+    """Return a string representation that can be parsed again."""
+    # The __str__ method is geared towards c++ too much, so we have to build
+    # things again here from available information
+    def_val = f'{{{self.default_val}}}' if self.default_val else ''
+    description = f' // {self.description}' if self.description else ''
+    return f'{self.full_type} {self.name}{def_val}{description}'
+
 class DataModel:  # pylint: disable=too-few-public-methods
   """A class for holding a complete datamodel read from a configuration file"""
+
   def __init__(self, datatypes=None, components=None, options=None):
-    self.datatypes = datatypes or {}
-    self.components = components or {}
     self.options = options or {
         # should getters / setters be prefixed with get / set?
         "getSyntax": False,
@@ -197,3 +205,22 @@ def __init__(self, datatypes=None, components=None, options=None):
         # use subfolder when including package header files
         "includeSubfolder": False,
         }
+    self.components = components or {}
+    self.datatypes = datatypes or {}
+
+  def _to_json(self):
+    """Return the dictionary, so that we can easily hook this into Python's
+    JSON ecosystem"""
+    return self.__dict__
+
+
+class DataModelJSONEncoder(json.JSONEncoder):
+  """A JSON encoder for DataModels, or anything that has a _to_json method."""
+
+  def default(self, o):
+    """The override for the default, first trying to call _to_json, otherwise
+    handing off to the default JSONEncoder"""
+    try:
+      return o._to_json()  # pylint: disable=protected-access
+    except AttributeError:
+      return super().default(o)
diff --git a/python/podio/podio_config_reader.py b/python/podio/podio_config_reader.py
index a72bcce06..3992a3aa9 100644
--- a/python/podio/podio_config_reader.py
+++ b/python/podio/podio_config_reader.py
@@ -407,24 +407,21 @@ def _read_datatype(cls, value):
     return datatype
   @classmethod
-  def read(cls, yamlfile, package_name, upstream_edm=None):
-    """Read the datamodel definition from the yamlfile."""
-    with open(yamlfile, "r", encoding='utf-8') as stream:
-      content = yaml.load(stream, yaml.SafeLoader)
-
+  def parse_model(cls, model_dict, package_name, upstream_edm=None):
+    """Parse a model from the dictionary, e.g. read from a yaml file."""
     components = {}
-    if "components" in content:
-      for klassname, value in content["components"].items():
+    if "components" in model_dict:
+      for klassname, value in model_dict["components"].items():
         components[klassname] = cls._read_component(value)
     datatypes = {}
-    if "datatypes" in content:
-      for klassname, value in content["datatypes"].items():
+    if "datatypes" in model_dict:
+      for klassname, value in model_dict["datatypes"].items():
         datatypes[klassname] = cls._read_datatype(value)
     options = copy.deepcopy(cls.options)
-    if "options" in content:
-      for option, value in content["options"].items():
+    if "options" in model_dict:
+      for option, value in model_dict["options"].items():
         options[option] = value
     # Normalize the includeSubfoler internally already here
@@ -438,3 +435,11 @@ def read(cls, yamlfile, package_name, upstream_edm=None):
     datamodel = DataModel(datatypes, components, options)
     validator.validate(datamodel, upstream_edm)
     return datamodel
+
+  @classmethod
+  def read(cls, yamlfile, package_name, upstream_edm=None):
+    """Read the datamodel definition from the yamlfile."""
+    with open(yamlfile, "r", encoding='utf-8') as stream:
+      content = yaml.load(stream, yaml.SafeLoader)
+
+    return cls.parse_model(content, package_name, upstream_edm)
diff --git a/python/podio/test_DataModelJSONEncoder.py b/python/podio/test_DataModelJSONEncoder.py
new file mode 100644
index 000000000..b63ff22f0
--- /dev/null
+++ b/python/podio/test_DataModelJSONEncoder.py
@@ -0,0 +1,71 @@
+#!/usr/bin/env python3
+"""Unit tests for the JSON encoding of data models"""
+
+import unittest
+
+from podio.generator_utils import DataModelJSONEncoder
+from podio.podio_config_reader import MemberParser
+
+
+def get_member_var_json(string):
+  """Get a MemberVariable encoded as JSON from the passed string.
+
+  Passes through the whole chain of parsing and JSON encoding, as it is done
+  during data model encoding.
+
+  Args:
+    string (str): The member variable definition as a string. NOTE: here it is
+      assumed that this is a valid string that can be parsed.
+
+  Returns:
+    str: The json encoded member variable
+  """
+  parser = MemberParser()
+  member_var = parser.parse(string, False)  # be lenient with missing descriptions
+  return DataModelJSONEncoder().encode(member_var).strip('"')  # strip quotes from JSON
+
+
+class DataModelJSONEncoderTest(unittest.TestCase):
+  """Unit tests for the DataModelJSONEncoder and the utility functionality in MemberVariable"""
+
+  def test_encode_only_types(self):
+    """Test that encoding works for type declarations only"""
+    for mdef in (r"float someFloat",
+                 r"ArbitraryType name",
+                 r"std::int16_t fixedWidth",
+                 r"namespace::Type type"):
+      self.assertEqual(get_member_var_json(mdef), mdef)
+
+    # Fixed width types without std are encoded with the std namespace
+    fixed_w = r"int32_t fixedWidth"
+    self.assertEqual(get_member_var_json(fixed_w), f"std::{fixed_w}")
+
+  def test_encode_array_types(self):
+    """Test that encoding array member variable declarations works"""
+    for mdef in (r"std::array anArray",
+                 r"std::array fwArr",
+                 r"std::array typeArr",
+                 r"std::array namespacedTypeArr"):
+      self.assertEqual(get_member_var_json(mdef), mdef)
+
+  def test_encode_default_vals(self):
+    """Test that encoding definitions with default values works"""
+    for mdef in (r"int i{42}",
+                 r"std::uint32_t uint{64}",
+                 r"ArbType a{123}",
+                 r"namespace::Type t{whatever}",
+                 r"std::array fs{3.14f, 6.28f}",
+                 r"std::array typeArr{1, 2, 3}"):
+      self.assertEqual(get_member_var_json(mdef), mdef)
+
+  def test_encode_with_description(self):
+    """Test that encoding definitions that contain a description works"""
+    for mdef in (r"int i // an unitialized int",
+                 r"std::uint32_t ui{42} // an initialized unsigned int",
+                 r"std::array fs // a float array",
+                 r"std::array tA{1, 2, 3} // an initialized array of namespaced types",
+                 r"AType type // a very special type",
+                 r"nsp::Type nspT // a namespaced type",
+                 r"nsp::Type nspT{with init} // an initialized namespaced type",
+                 r"ArbitratyType arbT{42} // an initialized type"):
+      self.assertEqual(get_member_var_json(mdef), mdef)
diff --git a/python/podio_class_generator.py b/python/podio_class_generator.py
index 7c6fc5b40..ec015230c 100755
--- a/python/podio_class_generator.py
+++ b/python/podio_class_generator.py
@@ -17,7 +17,7 @@
 import jinja2
 from podio.podio_config_reader import PodioConfigReader
-from podio.generator_utils import DataType, DefinitionError
+from podio.generator_utils import DataType, DefinitionError, DataModelJSONEncoder
 THIS_DIR = os.path.dirname(os.path.abspath(__file__))
 TEMPLATE_DIR = os.path.join(THIS_DIR, 'templates')
@@ -113,6 +113,8 @@ def process(self):
     for name, datatype in self.datamodel.datatypes.items():
       self._process_datatype(name, datatype)
+    self._write_edm_def_file()
+
     if 'ROOT' in self.io_handlers:
       self._create_selection_xml()
     self.print_report()
@@ -203,6 +205,9 @@ def _fill_templates(self, template_base, data):
   def _process_component(self, name, component):
     """Process one component"""
+    # Make a copy here and add the preprocessing steps to that such that the
+    # original definition can be left untouched
+    component = deepcopy(component)
     includes = set()
     includes.update(*(m.includes for m in component['Members']))
@@ -368,6 +373,18 @@ def _preprocess_datatype(self, name, definition):
     return data
+  def _write_edm_def_file(self):
+    """Write the edm definition to a compile time string"""
+    model_encoder = DataModelJSONEncoder()
+    data = {
+        'package_name': self.package_name,
+        'edm_definition': model_encoder.encode(self.datamodel),
+        'incfolder': self.incfolder,
+        }
+
+    self._write_file('DatamodelDefinition.h',
+                     self._eval_template('DatamodelDefinition.h.jinja2', data))
+
   def _get_member_includes(self, members):
     """Process all members and gather the necessary includes"""
     includes = set()
diff --git a/python/templates/CMakeLists.txt b/python/templates/CMakeLists.txt
index c3c382ad5..be5f4b307 100644
--- a/python/templates/CMakeLists.txt
+++ b/python/templates/CMakeLists.txt
@@ -14,6 +14,7 @@ set(PODIO_TEMPLATES
   ${CMAKE_CURRENT_LIST_DIR}/selection.xml.jinja2
   ${CMAKE_CURRENT_LIST_DIR}/SIOBlock.cc.jinja2
   ${CMAKE_CURRENT_LIST_DIR}/SIOBlock.h.jinja2
+  ${CMAKE_CURRENT_LIST_DIR}/DatamodelDefinition.h.jinja2
   ${CMAKE_CURRENT_LIST_DIR}/macros/collections.jinja2
   ${CMAKE_CURRENT_LIST_DIR}/macros/declarations.jinja2
   ${CMAKE_CURRENT_LIST_DIR}/macros/implementations.jinja2
diff --git a/python/templates/Collection.cc.jinja2 b/python/templates/Collection.cc.jinja2
index c265c29a7..8c121de20 100644
--- a/python/templates/Collection.cc.jinja2
+++ b/python/templates/Collection.cc.jinja2
@@ -4,6 +4,7 @@
 // AUTOMATICALLY GENERATED FILE - DO NOT EDIT
 #include "{{ incfolder }}{{ class.bare_type }}Collection.h"
+#include "{{ incfolder }}DatamodelDefinition.h"
 {% for include in includes_coll_cc %}
 {{ include }}
@@ -178,6 +179,10 @@ podio::CollectionReadBuffers {{ collection_type }}::createBuffers() /*const*/ {
 {{ macros.vectorized_access(class, member) }}
 {% endfor %}
+size_t {{ collection_type }}::getDatamodelRegistryIndex() const {
+  return {{ package_name }}::meta::DatamodelRegistryIndex::value();
+}
+
 #ifdef PODIO_JSON_OUTPUT
 void to_json(nlohmann::json& j, const {{ collection_type }}& collection) {
   j = nlohmann::json::array();
diff --git a/python/templates/Collection.h.jinja2 b/python/templates/Collection.h.jinja2
index 049ecff79..2c1a80e3b 100644
--- a/python/templates/Collection.h.jinja2
+++ b/python/templates/Collection.h.jinja2
@@ -130,6 +130,8 @@ public:
     return m_isValid;
   }
+  size_t getDatamodelRegistryIndex() const final;
+
   // support for the iterator protocol
   iterator begin() {
     return iterator(0, &m_storage.entries);
diff --git a/python/templates/DatamodelDefinition.h.jinja2 b/python/templates/DatamodelDefinition.h.jinja2
new file mode 100644
index 000000000..17a300cb9
--- /dev/null
+++ b/python/templates/DatamodelDefinition.h.jinja2
@@ -0,0 +1,30 @@
+// AUTOMATICALLY GENERATED FILE - DO NOT EDIT
+
+#include "podio/DatamodelRegistry.h"
+
+namespace {{ package_name }}::meta {
+/**
+ * The complete definition of the datamodel at generation time in JSON format.
+ */
+static constexpr auto {{ package_name }}__JSONDefinition = R"DATAMODELDEF({{ edm_definition }})DATAMODELDEF";
+
+/**
+ * The helper class that takes care of registering the datamodel definition with
+ * the DatamodelRegistry and of providing the index in that registry.
+ *
+ * Implemented as a singleton mainly to ensure only a single registration of
+ * each datamodel, which happens when value() is first called
+ */
+class DatamodelRegistryIndex {
+public:
+  static size_t value() {
+    static auto index = DatamodelRegistryIndex(podio::DatamodelRegistry::mutInstance().registerDatamodel("{{ package_name }}", {{ package_name }}__JSONDefinition));
+    return index.m_value;
+  }
+
+private:
+  DatamodelRegistryIndex(size_t v) : m_value(v) {}
+  size_t m_value{podio::DatamodelRegistry::NoDefinitionAvailable};
+};
+
+} // namespace {{ package_name }}::meta
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index d0017cba7..589c073a6 100755
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -48,7 +48,10 @@ SET(core_sources
   CollectionIDTable.cc
   GenericParameters.cc
   ASCIIWriter.cc
-  EventStore.cc)
+  EventStore.cc
+  DatamodelRegistry.cc
+  DatamodelRegistryIOHelpers.cc
+  )
 SET(core_headers
   ${CMAKE_SOURCE_DIR}/include/podio/CollectionBase.h
@@ -59,6 +62,8 @@ SET(core_headers
   ${CMAKE_SOURCE_DIR}/include/podio/ObjectID.h
   ${CMAKE_SOURCE_DIR}/include/podio/UserDataCollection.h
   ${CMAKE_SOURCE_DIR}/include/podio/podioVersion.h
+  ${CMAKE_SOURCE_DIR}/include/podio/DatamodelRegistry.h
+  ${CMAKE_SOURCE_DIR}/include/podio/utilities/DatamodelRegistryIOHelpers.h
   )
 PODIO_ADD_LIB_AND_DICT(podio "${core_headers}" "${core_sources}" selection.xml)
diff --git a/src/DatamodelRegistry.cc b/src/DatamodelRegistry.cc
new file mode 100644
index 000000000..d5a96e364
--- /dev/null
+++ b/src/DatamodelRegistry.cc
@@ -0,0 +1,63 @@
+#include "podio/DatamodelRegistry.h"
+
+#include
+#include
+#include
+#include
+
+namespace podio {
+const DatamodelRegistry& DatamodelRegistry::instance() { + return mutInstance(); +} + +DatamodelRegistry& DatamodelRegistry::mutInstance() { + static DatamodelRegistry registryInstance; + return registryInstance; +} + +size_t DatamodelRegistry::registerDatamodel(std::string name, std::string_view definition) { + const auto it = std::find_if(m_definitions.cbegin(), m_definitions.cend(), + [&name](const auto& kvPair) { return kvPair.first == name; }); + + if (it == m_definitions.cend()) { + const size_t index = m_definitions.size(); + m_definitions.emplace_back(name, definition); + return index; + } + + // TODO: Output? + return std::distance(m_definitions.cbegin(), it); +} + +const std::string_view DatamodelRegistry::getDatamodelDefinition(std::string_view name) const { + const auto it = std::find_if(m_definitions.cbegin(), m_definitions.cend(), + [&name](const auto& kvPair) { return kvPair.first == name; }); + if (it == m_definitions.cend()) { + std::cerr << "PODIO WARNING: Cannot find the definition for the EDM with the name " << name << std::endl; + static constexpr std::string_view emptyDef = "{}"; // valid empty JSON + return emptyDef; + } + + return it->second; +} + +const std::string_view DatamodelRegistry::getDatamodelDefinition(size_t index) const { + if (index >= m_definitions.size()) { + std::cerr << "PODIO WARNING: Cannot find the definition for the EDM with the index " << index << std::endl; + static constexpr std::string_view emptyDef = "{}"; // valid empty JSON + return emptyDef; + } + + return m_definitions[index].second; +} + +const std::string& DatamodelRegistry::getDatamodelName(size_t index) const { + if (index >= m_definitions.size()) { + std::cerr << "PODIO WARNING: Cannot find the name of the EDM with the index " << index << std::endl; + static const std::string emptyName = ""; + return emptyName; + } + return m_definitions[index].first; +} + +} // namespace podio diff --git a/src/DatamodelRegistryIOHelpers.cc b/src/DatamodelRegistryIOHelpers.cc new
file mode 100644 index 000000000..901dbb113 --- /dev/null +++ b/src/DatamodelRegistryIOHelpers.cc @@ -0,0 +1,49 @@ +#include "podio/utilities/DatamodelRegistryIOHelpers.h" +#include + +namespace podio { + +void DatamodelDefinitionCollector::registerDatamodelDefinition(const podio::CollectionBase* coll, + const std::string& name) { + const auto edmIndex = coll->getDatamodelRegistryIndex(); + if (edmIndex == DatamodelRegistry::NoDefinitionAvailable) { + std::cerr << "No EDM definition available for collection " << name << std::endl; + } else { + if (edmIndex != DatamodelRegistry::NoDefinitionNecessary) { + m_edmDefRegistryIdcs.insert(edmIndex); + } + } +} + +std::vector> DatamodelDefinitionCollector::getDatamodelDefinitionsToWrite() const { + std::vector> edmDefinitions; + edmDefinitions.reserve(m_edmDefRegistryIdcs.size()); + for (const auto& index : m_edmDefRegistryIdcs) { + const auto& edmRegistry = podio::DatamodelRegistry::instance(); + edmDefinitions.emplace_back(edmRegistry.getDatamodelName(index), edmRegistry.getDatamodelDefinition(index)); + } + + return edmDefinitions; +} + +const std::string_view DatamodelDefinitionHolder::getDatamodelDefinition(const std::string& name) const { + const auto it = std::find_if(m_availEDMDefs.cbegin(), m_availEDMDefs.cend(), + [&name](const auto& entry) { return std::get<0>(entry) == name; }); + + if (it != m_availEDMDefs.cend()) { + return std::get<1>(*it); + } + + return "{}"; +} + +std::vector DatamodelDefinitionHolder::getAvailableDatamodels() const { + std::vector defs{}; + defs.reserve(m_availEDMDefs.size()); + std::transform(m_availEDMDefs.cbegin(), m_availEDMDefs.cend(), std::back_inserter(defs), + [](const auto& elem) { return std::get<0>(elem); }); + + return defs; +} + +} // namespace podio diff --git a/src/ROOTFrameReader.cc b/src/ROOTFrameReader.cc index e3d6c6aba..f8880133c 100644 --- a/src/ROOTFrameReader.cc +++ b/src/ROOTFrameReader.cc @@ -179,6 +179,7 @@ std::vector getAvailableCategories(TChain* metaChain) { 
auto* branches = metaChain->GetListOfBranches(); std::vector brNames; brNames.reserve(branches->GetEntries()); + for (int i = 0; i < branches->GetEntries(); ++i) { const std::string name = branches->At(i)->GetName(); const auto fUnder = name.find("___"); @@ -189,7 +190,6 @@ std::vector getAvailableCategories(TChain* metaChain) { std::sort(brNames.begin(), brNames.end()); brNames.erase(std::unique(brNames.begin(), brNames.end()), brNames.end()); - return brNames; } @@ -217,6 +217,14 @@ void ROOTFrameReader::openFiles(const std::vector& filenames) { m_fileVersion = versionPtr ? *versionPtr : podio::version::Version{0, 0, 0}; delete versionPtr; + if (auto* edmDefBranch = root_utils::getBranch(m_metaChain.get(), root_utils::edmDefBranchName)) { + auto* datamodelDefs = new DatamodelDefinitionHolder::MapType{}; + edmDefBranch->SetAddress(&datamodelDefs); + edmDefBranch->GetEntry(0); + m_datamodelHolder = DatamodelDefinitionHolder(std::move(*datamodelDefs)); + delete datamodelDefs; + } + // Do some work up front for setting up categories and setup all the chains // and record the available categories. 
The rest of the setup follows on // demand when the category is first read diff --git a/src/ROOTFrameWriter.cc b/src/ROOTFrameWriter.cc index d98d6763a..3f552d69f 100644 --- a/src/ROOTFrameWriter.cc +++ b/src/ROOTFrameWriter.cc @@ -35,6 +35,8 @@ void ROOTFrameWriter::writeFrame(const podio::Frame& frame, const std::string& c for (const auto& name : catInfo.collsToWrite) { auto* coll = frame.getCollectionForWrite(name); collections.emplace_back(name, const_cast(coll)); + + m_datamodelCollector.registerDatamodelDefinition(coll, name); } // We will at least have a parameters branch, even if there are no @@ -129,6 +131,9 @@ void ROOTFrameWriter::finish() { auto podioVersion = podio::version::build_version; metaTree->Branch(root_utils::versionBranchName, &podioVersion); + auto edmDefinitions = m_datamodelCollector.getDatamodelDefinitionsToWrite(); + metaTree->Branch(root_utils::edmDefBranchName, &edmDefinitions); + metaTree->Fill(); m_file->Write(); diff --git a/src/SIOBlock.cc b/src/SIOBlock.cc index 6c7a95ba6..c0a514e6a 100644 --- a/src/SIOBlock.cc +++ b/src/SIOBlock.cc @@ -49,41 +49,19 @@ void SIOCollectionIDTableBlock::write(sio::write_device& device) { device.data(_isSubsetColl); } -template -void writeParamMap(sio::write_device& device, const GenericParameters::MapType& map) { - device.data((int)map.size()); - for (const auto& [key, value] : map) { - device.data(key); - device.data(value); - } -} - -template -void readParamMap(sio::read_device& device, GenericParameters::MapType& map) { - int size; - device.data(size); - while (size--) { - std::string key; - device.data(key); - std::vector values; - device.data(values); - map.emplace(std::move(key), std::move(values)); - } -} - void writeGenericParameters(sio::write_device& device, const GenericParameters& params) { - writeParamMap(device, params.getIntMap()); - writeParamMap(device, params.getFloatMap()); - writeParamMap(device, params.getStringMap()); - writeParamMap(device, params.getDoubleMap()); + 
writeMapLike(device, params.getIntMap()); + writeMapLike(device, params.getFloatMap()); + writeMapLike(device, params.getStringMap()); + writeMapLike(device, params.getDoubleMap()); } void readGenericParameters(sio::read_device& device, GenericParameters& params, sio::version_type version) { - readParamMap(device, params.getIntMap()); - readParamMap(device, params.getFloatMap()); - readParamMap(device, params.getStringMap()); + readMapLike(device, params.getIntMap()); + readMapLike(device, params.getFloatMap()); + readMapLike(device, params.getStringMap()); if (version >= sio::version::encode_version(0, 2)) { - readParamMap(device, params.getDoubleMap()); + readMapLike(device, params.getDoubleMap()); } } @@ -148,7 +126,7 @@ SIOBlockLibraryLoader::SIOBlockLibraryLoader() { const auto status = loadLib(lib); switch (status) { case LoadStatus::Success: - std::cout << "Loaded SIOBlocks library \'" << lib << "\' (from " << dir << ")" << std::endl; + std::cerr << "Loaded SIOBlocks library \'" << lib << "\' (from " << dir << ")" << std::endl; break; case LoadStatus::AlreadyLoaded: std::cerr << "SIOBlocks library \'" << lib << "\' already loaded. Not loading again from " << dir << std::endl; diff --git a/src/SIOFrameReader.cc b/src/SIOFrameReader.cc index 47f5ec082..0997ae8dc 100644 --- a/src/SIOFrameReader.cc +++ b/src/SIOFrameReader.cc @@ -6,6 +6,7 @@ #include #include +#include #include namespace podio { @@ -23,6 +24,7 @@ void SIOFrameReader::openFile(const std::string& filename) { // NOTE: reading TOC record first because that jumps back to the start of the file! 
readFileTOCRecord(); readPodioHeader(); + readEDMDefinitions(); // Potentially could do this lazily } std::unique_ptr SIOFrameReader::readNextEntry(const std::string& name) { @@ -54,7 +56,13 @@ std::unique_ptr SIOFrameReader::readEntry(const std::string& name, } std::vector SIOFrameReader::getAvailableCategories() const { - return m_tocRecord.getRecordNames(); + // Filter the available records from the TOC to remove records that are + // stored, but use reserved record names for podio meta data + auto recordNames = m_tocRecord.getRecordNames(); + recordNames.erase(std::remove_if(recordNames.begin(), recordNames.end(), + [](const auto& elem) { return elem == sio_helpers::SIOEDMDefinitionName; }), + recordNames.end()); + return recordNames; } unsigned SIOFrameReader::getEntries(const std::string& name) const { @@ -101,4 +109,22 @@ void SIOFrameReader::readPodioHeader() { m_fileVersion = static_cast(blocks[0].get())->version; } +void SIOFrameReader::readEDMDefinitions() { + const auto recordPos = m_tocRecord.getPosition(sio_helpers::SIOEDMDefinitionName); + if (recordPos == 0) { + // No EDM definitions found + return; + } + m_stream.seekg(recordPos); + + const auto& [buffer, _] = sio_utils::readRecord(m_stream); + + sio::block_list blocks; + blocks.emplace_back(std::make_shared>()); + sio::api::read_blocks(buffer.span(), blocks); + + auto datamodelDefs = static_cast*>(blocks[0].get()); + m_datamodelHolder = DatamodelDefinitionHolder(std::move(datamodelDefs->mapData)); +} + } // namespace podio diff --git a/src/SIOFrameWriter.cc b/src/SIOFrameWriter.cc index f33bdbccc..360c948d2 100644 --- a/src/SIOFrameWriter.cc +++ b/src/SIOFrameWriter.cc @@ -8,6 +8,7 @@ #include "sioUtils.h" #include +#include namespace podio { @@ -35,6 +36,7 @@ void SIOFrameWriter::writeFrame(const podio::Frame& frame, const std::string& ca collections.reserve(collsToWrite.size()); for (const auto& name : collsToWrite) { collections.emplace_back(name, frame.getCollectionForWrite(name)); +
m_datamodelCollector.registerDatamodelDefinition(collections.back().second, name); } // Write necessary metadata and the actual data into two different records. @@ -49,7 +51,14 @@ void SIOFrameWriter::writeFrame(const podio::Frame& frame, const std::string& ca } void SIOFrameWriter::finish() { + auto edmDefMap = std::make_shared>( + m_datamodelCollector.getDatamodelDefinitionsToWrite()); + sio::block_list blocks; + blocks.push_back(edmDefMap); + m_tocRecord.addRecord(sio_helpers::SIOEDMDefinitionName, sio_utils::writeRecord(blocks, "EDMDefinitions", m_stream)); + + blocks.clear(); blocks.emplace_back(std::make_shared(&m_tocRecord)); auto tocStartPos = sio_utils::writeRecord(blocks, sio_helpers::SIOTocRecordName, m_stream); diff --git a/src/rootUtils.h b/src/rootUtils.h index 5bce3d702..215c7fea6 100644 --- a/src/rootUtils.h +++ b/src/rootUtils.h @@ -7,6 +7,7 @@ #include "podio/CollectionIDTable.h" #include "TBranch.h" +#include "TChain.h" #include "TClass.h" #include "TTree.h" @@ -35,6 +36,12 @@ constexpr static auto paramBranchName = "PARAMETERS"; */ constexpr static auto versionBranchName = "PodioBuildVersion"; +/** + * The name of the branch in which all the EDM names and their definitions are + * stored in the meta data tree. 
+ */ +constexpr static auto edmDefBranchName = "EDMDefinitions"; + /** * Name of the branch for storing the idTable for a given category in the meta * data tree diff --git a/src/selection.xml b/src/selection.xml index 3c0be36e3..d198bfab6 100644 --- a/src/selection.xml +++ b/src/selection.xml @@ -15,6 +15,7 @@ + diff --git a/src/sioUtils.h b/src/sioUtils.h index 6e340d8fa..204867eaf 100644 --- a/src/sioUtils.h +++ b/src/sioUtils.h @@ -9,6 +9,7 @@ #include #include +#include #include namespace podio { diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt index ad24cbf31..2b056bc74 100644 --- a/tests/CMakeLists.txt +++ b/tests/CMakeLists.txt @@ -73,7 +73,7 @@ endforeach() if (NOT DEFINED CACHE{PODIO_TEST_INPUT_DATA_DIR} OR NOT EXISTS ${PODIO_TEST_INPUT_DATA_DIR}/example.root) message("Getting test input files") execute_process( - COMMAND bash ${CMAKE_CURRENT_LIST_DIR}/get_test_inputs.sh + COMMAND bash ${CMAKE_CURRENT_LIST_DIR}/scripts/get_test_inputs.sh OUTPUT_VARIABLE podio_test_input_data_dir RESULT_VARIABLE test_inputs_available ) @@ -245,3 +245,47 @@ else() LD_LIBRARY_PATH=${CMAKE_CURRENT_BINARY_DIR}:${CMAKE_BINARY_DIR}/src:$:$ENV{LD_LIBRARY_PATH} ) endif() + +# Add tests for storing and retrieving the EDM definitions into the produced +# files +add_test(datamodel_def_store_roundtrip_root ${CMAKE_CURRENT_LIST_DIR}/scripts/dumpModelRoundTrip.sh ${CMAKE_CURRENT_BINARY_DIR}/example_frame.root datamodel) +add_test(datamodel_def_store_roundtrip_root_extension ${CMAKE_CURRENT_LIST_DIR}/scripts/dumpModelRoundTrip.sh ${CMAKE_CURRENT_BINARY_DIR}/example_frame.root datamodel extension_datamodel) + + +# Need the input files that are produced by other tests +set_tests_properties( + datamodel_def_store_roundtrip_root + datamodel_def_store_roundtrip_root_extension + PROPERTIES + DEPENDS write_frame_root + ) + +set(sio_roundtrip_tests "") +if (TARGET read_sio) + add_test(datamodel_def_store_roundtrip_sio ${CMAKE_CURRENT_LIST_DIR}/scripts/dumpModelRoundTrip.sh 
${CMAKE_CURRENT_BINARY_DIR}/example_frame.sio datamodel) + add_test(datamodel_def_store_roundtrip_sio_extension ${CMAKE_CURRENT_LIST_DIR}/scripts/dumpModelRoundTrip.sh ${CMAKE_CURRENT_BINARY_DIR}/example_frame.sio datamodel extension_datamodel) + + set(sio_roundtrip_tests + datamodel_def_store_roundtrip_sio + datamodel_def_store_roundtrip_sio_extension + ) + + set_tests_properties( + ${sio_roundtrip_tests} + PROPERTIES + DEPENDS write_frame_sio + ) +endif() + +# We need to convert this into a list of arguments that can be used as environment variable +list(JOIN PODIO_IO_HANDLERS " " IO_HANDLERS) + +set_tests_properties( + datamodel_def_store_roundtrip_root + datamodel_def_store_roundtrip_root_extension + ${sio_roundtrip_tests} + PROPERTIES + WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} + ENVIRONMENT + "PODIO_BASE=${CMAKE_SOURCE_DIR};IO_HANDLERS=${IO_HANDLERS};LD_LIBRARY_PATH=${CMAKE_CURRENT_BINARY_DIR}:${CMAKE_BINARY_DIR}/src:$ENV{LD_LIBRARY_PATH};PYTHONPATH=${CMAKE_SOURCE_DIR}/python:$ENV{PYTHONPATH};ROOT_INCLUDE_PATH=${CMAKE_SOURCE_DIR}/tests/datamodel:${CMAKE_SOURCE_DIR}/include:$ENV{ROOT_INCLUDE_PATH}" + ) diff --git a/tests/CTestCustom.cmake b/tests/CTestCustom.cmake index 6d1e4e165..71812378a 100644 --- a/tests/CTestCustom.cmake +++ b/tests/CTestCustom.cmake @@ -52,6 +52,11 @@ if ((NOT "@FORCE_RUN_ALL_TESTS@" STREQUAL "ON") AND (NOT "@USE_SANITIZER@" STREQ podio-dump-sio podio-dump-detailed-sio podio-dump-detailed-sio-legacy + + datamodel_def_store_roundtrip_root + datamodel_def_store_roundtrip_root_extension + datamodel_def_store_roundtrip_sio + datamodel_def_store_roundtrip_sio_extension ) # ostream_operator is working with Memory sanitizer (at least locally) diff --git a/tests/scripts/dumpModelRoundTrip.sh b/tests/scripts/dumpModelRoundTrip.sh new file mode 100755 index 000000000..9f9bc2148 --- /dev/null +++ b/tests/scripts/dumpModelRoundTrip.sh @@ -0,0 +1,36 @@ +#!/usr/bin/env bash +# Script to check that an EDM definition dumped from a file is 
"equivalent" to +# the original definition. Essentially does not check that the YAML file is the +# same, but rather that the generated code is the same + +set -eu + +INPUT_FILE=${1} # the datafile +EDM_NAME=${2} # the name of the EDM +COMP_BASE_FOLDER="" # where the source to compare against is +if [ -$# -gt 2 ]; then + COMP_BASE_FOLDER=${3} +fi + +# Create a few temporary but unique files and directories to store output +DUMPED_MODEL=${INPUT_FILE}.dumped_${EDM_NAME}.yaml +OUTPUT_FOLDER=${INPUT_FILE}.dumped_${EDM_NAME} +mkdir -p ${OUTPUT_FOLDER} + +# Dump the model to a yaml file +${PODIO_BASE}/tools/podio-dump --dump-edm ${EDM_NAME} ${INPUT_FILE} > ${DUMPED_MODEL} + +# Regenerate the code via the class generator and the freshly dumped modl +${PODIO_BASE}/python/podio_class_generator.py \ + --clangformat \ + ${DUMPED_MODEL} \ + ${OUTPUT_FOLDER} \ + ${EDM_NAME} \ + ${IO_HANDLERS} + +# Compare to the originally generated code, that has been used to write the data +# file. Need to diff subfolders explitly here because $PODIO_BASE/tests contains +# more stuff +diff -ru ${OUTPUT_FOLDER}/${EDM_NAME} ${PODIO_BASE}/tests/${COMP_BASE_FOLDER}/${EDM_NAME} +diff -ru ${OUTPUT_FOLDER}/src ${PODIO_BASE}/tests/${COMP_BASE_FOLDER}/src +diff -u ${OUTPUT_FOLDER}/podio_generated_files.cmake ${PODIO_BASE}/tests/podio_generated_files.cmake diff --git a/tests/get_test_inputs.sh b/tests/scripts/get_test_inputs.sh similarity index 100% rename from tests/get_test_inputs.sh rename to tests/scripts/get_test_inputs.sh diff --git a/tools/podio-dump b/tools/podio-dump index efb5dfc18..8685aa19e 100755 --- a/tools/podio-dump +++ b/tools/podio-dump @@ -2,6 +2,8 @@ """podio-dump tool to dump contents of podio files""" import sys +import json +import yaml from podio.reading import get_reader @@ -15,9 +17,11 @@ def print_general_info(reader, filename): Args: reader (root_io.Reader, sio_io.Reader): An initialized reader """ - print(f'input file: {filename}\n') legacy_text = ' (this is a legacy 
file!)' if reader.is_legacy else '' - print(f'Frame categories in this file{legacy_text}:') + print(f'input file: {filename}{legacy_text}\n') + print(f'EDM model definitions stored in this file: {", ".join(reader.datamodel_definitions)}') + print() + print('Frame categories in this file:') print(f'{"Name":<20} {"Entries":<10}') print('-' * 31) for category in reader.categories: @@ -68,6 +72,18 @@ def print_frame(frame, cat_name, ientry, detailed): print('\n', flush=True) +def dump_model(reader, model_name): + """Dump the model in yaml format""" + if model_name not in reader.datamodel_definitions: + print(f'ERROR: Cannot dump model \'{model_name}\' (not present in file)') + return False + + model_def = json.loads(reader.get_datamodel_definition(model_name)) + print(yaml.dump(model_def, sort_keys=False, default_flow_style=False)) + + return True + + def main(args): """Main""" try: @@ -76,6 +92,12 @@ def main(args): print(f'ERROR: Cannot open file \'{args.inputfile}\': {err}') sys.exit(1) + if args.dump_edm is not None: + if dump_model(reader, args.dump_edm): + sys.exit(0) + else: + sys.exit(1) + print_general_info(reader, args.inputfile) if args.category not in reader.categories: print(f'ERROR: Cannot print category \'{args.category}\' (not present in file)') @@ -120,6 +142,9 @@ if __name__ == '__main__': type=parse_entry_range, default=[0]) parser.add_argument('-d', '--detailed', help='Dump the full contents not just the collection info', action='store_true', default=False) + parser.add_argument('--dump-edm', + help='Dump the specified EDM definition from the file in yaml format', + type=str, default=None) clargs = parser.parse_args() main(clargs)
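
Note on the registry semantics used throughout this change: `DatamodelRegistry::registerDatamodel` hands out one index per datamodel name and returns the existing index when the same name is registered again, while the lookup functions fall back to the empty JSON object `{}` for unknown names. The following is a minimal Python sketch of that behaviour, not the podio implementation. The class `RegistrySketch`, its method names, and the JSON payloads are hypothetical and chosen only for illustration.

```python
import json


class RegistrySketch:
    """Hypothetical stand-in for podio::DatamodelRegistry (illustration only)."""

    def __init__(self):
        # Ordered (name, definition) pairs; the list position doubles as the index
        self._definitions = []

    def register(self, name, definition):
        """Return the index for name, storing the definition only on first use."""
        for index, (known_name, _) in enumerate(self._definitions):
            if known_name == name:
                return index  # already registered, keep the first definition
        self._definitions.append((name, definition))
        return len(self._definitions) - 1

    def definition(self, name):
        """Return the stored JSON string, or an empty JSON object if unknown."""
        for known_name, definition in self._definitions:
            if known_name == name:
                return definition
        return "{}"  # valid empty JSON, mirrors the fallback in the C++ code


registry = RegistrySketch()
first = registry.register("datamodel", '{"components": {}}')  # placeholder payload
second = registry.register("datamodel", '{"ignored": true}')  # same name again
decoded = json.loads(registry.definition("datamodel"))
missing = json.loads(registry.definition("unknown"))
```

Because the fallback is always valid JSON, a consumer such as `dump_model` in `podio-dump` can pass the result straight to `json.loads` without special-casing a missing definition.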