Feature/sina/hdf5 output support #1480

Open: wants to merge 6 commits into develop

doutriaux1

Summary

  • This PR is a feature
  • It does the following:
    • Adds the ability to output Sina files as HDF5, and opens the door to more formats in the future

message << "The '" << RELATIONSHIPS_KEY
<< "' element of a document must be an array";
throw std::invalid_argument(message.str());
conduit::Node relationship_nodes = asNode[RELATIONSHIPS_KEY];
Member

This should be a reference to avoid a copy.

conduit::Node modifiedRelationshipsNode;

removeSlashes(relationshipNode, modifiedRelationshipsNode);
relationshipsNode.append() = modifiedRelationshipsNode;
Member

You could pass a reference to the result of append() into removeSlashes to avoid an extra copy.
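A minimal sketch of that suggestion, assuming a removeSlashes-style helper that writes into an output parameter. To stay self-contained it uses std::vector<std::string> as a hypothetical stand-in for conduit::Node and its append(); removeSlashesInto and appendWithoutSlashes are illustrative names, not the PR's API.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for removeSlashes: writes `in` into `out`,
// replacing every '/' with "__SLASH__".
void removeSlashesInto(std::string const &in, std::string &out)
{
    out.clear();
    for (char c : in)
    {
        if (c == '/')
        {
            out += "__SLASH__";
        }
        else
        {
            out += c;
        }
    }
}

// Writes the transformed value directly into the newly appended element,
// instead of filling a temporary and then copying it into the container.
void appendWithoutSlashes(std::vector<std::string> &container,
                          std::string const &value)
{
    container.emplace_back();  // plays the role of relationshipsNode.append()
    removeSlashesInto(value, container.back());
}
```

With conduit the same idea would be to hand the node returned by append() to removeSlashes as its output argument, so no intermediate modifiedRelationshipsNode is needed.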

Member

Was this submodule addition intended?

Member

bgunnar5 left a comment

Left some suggestions for you. Also, are the Fortran changes tested?

} // namespace
}

void protocol_warn(std::string protocol, std::string const &name) {
Member

The function name should be camelCase (protocolWarn) instead (see here). Also, protocol can probably be a const reference here, like name is.

Comment on lines +43 to +58
size_t pos = name.rfind('.');

if (pos != std::string::npos) {
    std::string found = name.substr(pos+1);

    if (("." + found) != protocol && protocol == ".json") {
        std::cout << ".json extension not found, did you mean to save to this format?";
    } else if (("." + found) != protocol && protocol == ".hdf5") {
        std::cout << ".hdf5 extension not found, did you use one of its other supported types? (h5, hdf, ...)";
    } else {
        return;
    }
} else {
    std::cout << "No file extension found, did you mean to use one of " << protocol << "'s supported types?";
}
}
Member

We can reduce the redundancy in the if/else chain by using a map in this function instead:

// Define a mapping of protocols to their warning messages
std::unordered_map<std::string, std::string> protocolMessages = {
    {".json", ".json extension not found, did you mean to save to this format?"},
    {".hdf5", ".hdf5 extension not found, did you use one of its other supported types? (h5, hdf, ...)"}
};

// Check that the file name has a file extension
size_t pos = name.rfind('.');
if (pos != std::string::npos) {
    std::string found = name.substr(pos);
    
    // Check if the found extension matches the expected protocol
    if (found != protocol) {
        auto messageIt = protocolMessages.find(protocol);
        if (messageIt != protocolMessages.end()) {
            std::cout << messageIt->second << std::endl;
        }
    }
} else {
    std::cout << "No file extension found, did you mean to use one of " 
              << protocol << "'s supported types?" << std::endl;
}

This might make it easier to add support for new file extensions in the future as well.
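A compilable sketch combining this map-based suggestion with the earlier camelCase and const-reference comments. Splitting it into a message builder (protocolWarningMessage, a hypothetical name) plus a thin printing wrapper is an assumption made so the logic can be checked without capturing stdout; the message strings are copied from the PR.

```cpp
#include <iostream>
#include <string>
#include <unordered_map>

// Returns the warning for a file name whose extension does not match the
// expected protocol, or an empty string when no warning is needed.
std::string protocolWarningMessage(std::string const &protocol,
                                   std::string const &name)
{
    // One entry per supported format; a new format only needs a new entry.
    static const std::unordered_map<std::string, std::string> protocolMessages = {
        {".json", ".json extension not found, did you mean to save to this format?"},
        {".hdf5", ".hdf5 extension not found, did you use one of its other supported types? (h5, hdf, ...)"}};

    std::string::size_type pos = name.rfind('.');
    if (pos == std::string::npos)
    {
        return "No file extension found, did you mean to use one of " + protocol +
               "'s supported types?";
    }

    std::string found = name.substr(pos);  // keeps the leading '.'
    if (found == protocol)
    {
        return "";
    }

    auto messageIt = protocolMessages.find(protocol);
    if (messageIt == protocolMessages.end())
    {
        return "";  // no message registered for this protocol
    }
    return messageIt->second;
}

// camelCase name and const-reference parameters, as requested in the review.
void protocolWarn(std::string const &protocol, std::string const &name)
{
    std::string message = protocolWarningMessage(protocol, name);
    if (!message.empty())
    {
        std::cout << message << std::endl;
    }
}
```

Making the map `static const` builds it once rather than on every call.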

// Find and replace all occurrences of "__SLASH__"

while ((pos = restoredKey.find(slashSubstitute, pos)) != std::string::npos) {
restoredKey.replace(pos, toReplace.length(), replacement);
Member

toReplace does not exist within the scope of this function. Should this be slashSubstitute instead?
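A self-contained sketch of the loop with that fix applied, assuming the substitute token is "__SLASH__" (as the code comment says) and the replacement is "/". restoreSlashesInKey is a hypothetical standalone helper, not a function from the PR.

```cpp
#include <string>

// Replaces every occurrence of the slash substitute in `key` with "/".
std::string restoreSlashesInKey(std::string key)
{
    const std::string slashSubstitute = "__SLASH__";
    const std::string replacement = "/";

    std::string::size_type pos = 0;
    // Find and replace all occurrences of "__SLASH__"
    while ((pos = key.find(slashSubstitute, pos)) != std::string::npos)
    {
        // Replace exactly the characters of the token that was found.
        key.replace(pos, slashSubstitute.length(), replacement);
        // Resume the search just past the inserted text.
        pos += replacement.length();
    }
    return key;
}
```

Advancing `pos` past the replacement keeps the loop from rescanning text it has just written.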

Comment on lines +214 to +217
if (relationship_nodes.number_of_children() == 0)
{
relationship_nodes.set(conduit::DataType::list());
}
Member

Should we add this same number-of-children check for records as well? If so, refactoring this entire method might help reduce redundant code:

void Document::createFromNode(conduit::Node const &asNode,
                              RecordLoader const &recordLoader)
{
    auto processChildNodes = [&](const char* key, std::function<void(conduit::Node&)> addFunc) {
        if (asNode.has_child(key)) {
            conduit::Node& childNodes = asNode[key];
            if (childNodes.number_of_children() == 0) {
                childNodes.set(conduit::DataType::list());
            }
            if (!childNodes.dtype().is_list()) {
                std::ostringstream message;
                message << "The '" << key << "' element of a document must be an array";
                throw std::invalid_argument(message.str());
            }

            auto childIter = childNodes.children();
            while (childIter.has_next()) {
                auto child = childIter.next();
                addFunc(child);
            }
        }
    };

    processChildNodes(RECORDS_KEY, [&](conduit::Node& record) {
        add(recordLoader.load(record));
    });

    processChildNodes(RELATIONSHIPS_KEY, [&](conduit::Node& relationship) {
        add(Relationship{relationship});
    });
}

(This is untested, so I'm not sure it would actually compile.)

Comment on lines +297 to +314
if (protocol == Protocol::JSON)
{
    std::string message {"Could not save to '"};
    message += fileName;
    message += "'";
    throw std::ios::failure {message};
    protocol_warn(".json", fileName);
    auto asJson = document.toJson();
    std::ofstream fout {tmpFileName};
    fout.exceptions(std::ostream::failbit | std::ostream::badbit);
    fout << asJson;
    fout.close();
}
else if (protocol == Protocol::HDF5)
{
    protocol_warn(".hdf5", fileName);
    document.toHDF5(tmpFileName);
}
else
{
    throw std::invalid_argument("Invalid format choice. Please enter 'json' or 'hdf5'.");
}
Member

Consider porting this to a switch statement rather than an if/else chain.
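A minimal sketch of the switch form, assuming Protocol is an enum with the JSON and HDF5 enumerators used in this PR (declared here as a hypothetical enum class; the real declaration may differ). To stay self-contained, expectedExtension returns the extension passed to protocol_warn instead of writing a file.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical mirror of the PR's Protocol enum.
enum class Protocol
{
    JSON,
    HDF5
};

// Dispatches on the protocol with a switch instead of an if/else chain.
std::string expectedExtension(Protocol protocol)
{
    switch (protocol)
    {
    case Protocol::JSON:
        return ".json";
    case Protocol::HDF5:
        return ".hdf5";
    }
    // Reached only for values outside the enumerators, e.g. via static_cast.
    throw std::invalid_argument("Invalid format choice. Please enter 'json' or 'hdf5'.");
}
```

Leaving out a `default:` label lets GCC and Clang (-Wswitch, part of -Wall) warn if a future format is added to the enum but not handled here; the throw after the switch still covers out-of-range values. The same shape applies to loadDocument.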

Comment on lines +335 to +347
if (protocol == Protocol::JSON) {
    std::ifstream file_in {path};
    std::ostringstream file_contents;
    file_contents << file_in.rdbuf();
    file_in.close();
    node.parse(file_contents.str(), "json");
    return Document {node, recordLoader};
} else if (protocol == Protocol::HDF5) {
    conduit::Node modifiedNode;
    conduit::relay::io::load(path, "hdf5", node);
    restoreSlashes(node, modifiedNode);
    return Document {modifiedNode, recordLoader};
}
Member

As with saveDocument, we might consider using a switch statement here as well.

Comment on lines +234 to +243
/**
 * \brief Get the current file format version.
 *
 * \return A string representing the file format version.
 */
inline std::string getSinaFileFormatVersion()
{
    return std::to_string(SINA_FILE_FORMAT_VERSION_MAJOR) + "." +
        std::to_string(SINA_FILE_FORMAT_VERSION_MINOR);
}
Member

Remove this, as it duplicates the same function below (lines 245-254).

@@ -53,7 +53,7 @@ Adding Data

Once we have a Record, we can add different types of data to it. Any Datum
object that is added will end up in the "data" section of the record in
-the JSON file.
+the file.
Member

Suggest "the output file" instead, for clarity.

}
//! [end io write]

//! [begin io read]
void load()
{
axom::sina::Document doc = axom::sina::loadDocument("my_output.json");
axom::sina::Document doc1 = axom::sina::loadDocument("my_output.json");
axom::sina::Document doc2 = axom::sina::loadDocument("my_output.json", axom::sina::Protocol::HDF5);
Member

I think the first argument to loadDocument should be "my_output.hdf5" here.

4 participants