Improve DAG processor performance when sorting by mtime #60864
ephraimbuddy wants to merge 6 commits into apache:main
    try:
        mtime = os.path.getmtime(file.absolute_path)
        files_with_mtime[file] = mtime
        stat = self._file_stats[file]
Suggested change:
    - stat = self._file_stats[file]
    + stat = self._file_stats.get(file)
It may not be in _file_stats yet if it's a new file, but that's okay. Not sure what that would do to the sorting below, though.
Looks like tests are failing because of that. I will check it properly tomorrow.
OK. Using get caused problems, and I verified that self._file_stats[file] creates a default entry.
Not sure we want to create the entry before we've parsed the file, though?
This defaultdict creates an entry if the key is missing.
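The difference between the two lookups can be sketched as follows. This is a minimal illustration, assuming `_file_stats` is a `collections.defaultdict` as the comment above says; `FileStat` and its `last_mtime` field are hypothetical stand-ins, not Airflow's actual `DagFileStat`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileStat:
    """Hypothetical stand-in for DagFileStat; the field name is an assumption."""

    last_mtime: Optional[float] = None


file_stats = defaultdict(FileStat)

# .get() never calls the default factory: a missing key yields None
# and the mapping is left untouched.
via_get = file_stats.get("new_dag.py")
size_after_get = len(file_stats)

# Indexing calls the factory, stores the new default entry, and returns it.
via_index = file_stats["new_dag.py"]
size_after_index = len(file_stats)
```

With `.get()`, the caller has to handle `None` for files that have not been seen yet; with indexing, an entry exists for the file as soon as it is looked up, which is the trade-off being debated in this thread.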
When file_parsing_sort_mode is set to "modified_time", the DAG processor previously re-sorted the entire file queue on every bundle refresh, even when no file modification times had changed. This change caches the last seen modification time for each file in DagFileStat and skips the sort entirely when no mtimes have changed since the last check.
Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
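The idea in the commit message can be sketched roughly as below. This is not the PR's implementation: `resort_if_changed` and the plain-dict cache are hypothetical names used for illustration, whereas the PR stores the last seen mtime on `DagFileStat`.

```python
import os
import tempfile
from operator import itemgetter


def resort_if_changed(queue, cached_mtimes):
    """Return (queue, was_sorted); re-sort only when some mtime differs from the cache.

    `cached_mtimes` maps path -> last seen mtime and is updated in place.
    """
    current = {}
    changed = False
    for path in queue:
        try:
            mtime = os.path.getmtime(path)
        except OSError:
            # File disappeared: forget it and force a rebuild without it.
            cached_mtimes.pop(path, None)
            changed = True
            continue
        current[path] = mtime
        if cached_mtimes.get(path) != mtime:
            cached_mtimes[path] = mtime
            changed = True
    if not changed:
        return queue, False  # No changes, skip sorting
    # Sort by mtime descending and rebuild the queue.
    ordered = [p for p, _ in sorted(current.items(), key=itemgetter(1), reverse=True)]
    return ordered, True


with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "old_dag.py")
    new = os.path.join(tmp, "new_dag.py")
    for path, mtime in ((old, 1_000), (new, 2_000)):
        open(path, "w").close()
        os.utime(path, (mtime, mtime))  # pin the mtime so the example is deterministic

    cache = {}
    first_queue, first_sorted = resort_if_changed([old, new], cache)
    second_queue, second_sorted = resort_if_changed(first_queue, cache)
    first_names = [os.path.basename(p) for p in first_queue]
```

The first call populates the cache and sorts; the second call sees identical mtimes and returns the queue untouched.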
56edf98 to a96229f
    return  # No changes, skip sorting

    # Sort by mtime descending and rebuild queue
    sorted_files = [f for f, _ in sorted(files_with_mtime.items(), key=itemgetter(1), reverse=True)]
Wouldn't this put new files at the end of the list?
No. This replicates what _resort_by_mtime does, but avoids unnecessary re-sorting.
New files have the most recent (highest) mtimes, so with descending order they are processed first. Older files are processed last.
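A small check of that ordering, using the same sort expression as the diff. The file names and mtime values are made up for illustration.

```python
from operator import itemgetter

# Hypothetical mtimes (seconds since the epoch); a larger value means a more recent change.
files_with_mtime = {
    "old_dag.py": 1_700_000_000.0,
    "edited_dag.py": 1_700_000_500.0,
    "brand_new_dag.py": 1_700_000_900.0,
}

# Sort (file, mtime) pairs by mtime, newest first, and keep only the files.
sorted_files = [f for f, _ in sorted(files_with_mtime.items(), key=itemgetter(1), reverse=True)]
print(sorted_files)  # ['brand_new_dag.py', 'edited_dag.py', 'old_dag.py']
```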
A follow-up to #60003.