Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[idl_parser] Track included files by hash #6434

Merged
merged 4 commits into from
Mar 1, 2021
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion include/flatbuffers/idl.h
Original file line number Diff line number Diff line change
Expand Up @@ -993,7 +993,7 @@ class Parser : public ParserState {
std::string file_identifier_;
std::string file_extension_;

std::map<std::string, std::string> included_files_;
std::map<uint64_t, std::string> included_files_;
std::map<std::string, std::set<std::string>> files_included_per_file_;
std::vector<std::string> native_included_files_;

Expand Down
22 changes: 0 additions & 22 deletions src/idl_gen_ts.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -308,28 +308,6 @@ class TsGenerator : public BaseGenerator {
return "NS" + NumToString(HashFnv1a<uint64_t>(file.c_str()));
}

std::string GenPrefixedImport(const std::string &full_file_name,
const std::string &base_name) {
// Either keep the include path as it was
// or use only the base_name + kGeneratedFileNamePostfix
std::string path;
if (parser_.opts.keep_include_path) {
auto it = parser_.included_files_.find(full_file_name);
FLATBUFFERS_ASSERT(it != parser_.included_files_.end());
path = flatbuffers::StripExtension(it->second) +
parser_.opts.filename_suffix;
} else {
path = base_name + parser_.opts.filename_suffix;
}

// Add the include prefix and make the path always relative
path = flatbuffers::ConCatPathFileName(parser_.opts.include_prefix, path);
path = std::string(".") + kPathSeparator + path;

return "import * as " + GenFileNamespacePrefix(full_file_name) +
" from \"" + path + "\";\n";
}

// Adds a source-dependent prefix, for of import * statements.
std::string GenPrefixedTypeName(const std::string &typeName,
const std::string &file) {
Expand Down
36 changes: 27 additions & 9 deletions src/idl_parser.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3153,13 +3153,29 @@ CheckedError Parser::ParseRoot(const char *source, const char **include_paths,
return NoError();
}

// Generate a unique hash for a file based on its name and contents (if any).
static uint64_t HashFile(const char *source_filename, const char *source) {
auto stripped_source_fillename = StripPath(source_filename);
auto hash = HashFnv1a<uint64_t>(stripped_source_fillename.c_str());
if (source && *source) hash ^= HashFnv1a<uint64_t>(source);

return hash;
}

CheckedError Parser::DoParse(const char *source, const char **include_paths,
const char *source_filename,
const char *include_filename) {
uint64_t source_hash = 0;
if (source_filename) {
if (included_files_.find(source_filename) == included_files_.end()) {
included_files_[source_filename] =
include_filename ? include_filename : "";
// If the file is in-memory, don't include its contents in the hash as we
// won't be able to load them later.
if (FileExists(source_filename))
source_hash = HashFile(source_filename, source);
else
source_hash = HashFile(source_filename, nullptr);

Copy link
Contributor

@vglavnyy vglavnyy Feb 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how it works.
Why parse should continue processing if a schema has include filename directive but the file doesn't exist?
Can you give an example that covers both paths of this if (FileExists(source_filename))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (included_files_.find(source_hash) == included_files_.end()) {
included_files_[source_hash] = include_filename ? include_filename : "";
files_included_per_file_[source_filename] = std::set<std::string>();
} else {
return NoError();
Expand Down Expand Up @@ -3210,12 +3226,14 @@ CheckedError Parser::DoParse(const char *source, const char **include_paths,
return Error("unable to locate include file: " + name);
if (source_filename)
files_included_per_file_[source_filename].insert(filepath);
if (included_files_.find(filepath) == included_files_.end()) {

std::string contents;
bool file_loaded = LoadFile(filepath.c_str(), true, &contents);
if (included_files_.find(HashFile(filepath.c_str(), contents.c_str())) ==
included_files_.end()) {
// We found an include file that we have not parsed yet.
// Load it and parse it.
std::string contents;
if (!LoadFile(filepath.c_str(), true, &contents))
return Error("unable to load include file: " + name);
// Parse it.
if (!file_loaded) return Error("unable to load include file: " + name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same, why file_loaded checked if and only its hash not found?
Why empty contents is used for calculation?

Copy link
Contributor Author

@mmmspatz mmmspatz Feb 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we encounter an include directive, we need to answer two questions:

  1. Has the included file already been parsed?
  2. If not, can we load it for parsing?

It's necessary to defer query 2 until we know the answer to query 1 is "no", becasue Parser::Parse() can be called schemas that do not exist on-disk (example). If one of those in-memory schemas is subsequently included in another schema, we won't be able to load it from disk but we can know that it was previously parsed based on its name alone. At least, that is the assumtion that was made even prior to #6371.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mmmspatz thank you for the explanation.
Your solution looks correct.

ECHECK(DoParse(contents.c_str(), include_paths, filepath.c_str(),
name.c_str()));
// We generally do not want to output code for any included files:
Expand All @@ -3232,7 +3250,7 @@ CheckedError Parser::DoParse(const char *source, const char **include_paths,
// entered into included_files_.
// This is recursive, but only go as deep as the number of include
// statements.
if (source_filename) { included_files_.erase(source_filename); }
included_files_.erase(source_hash);
return DoParse(source, include_paths, source_filename,
include_filename);
}
Expand Down