Optimisation of directory type checking in file list #1381
This is an attempt to improve the performance of generating a file list for directories with lots of subdirectories. The main performance bottlenecks in the current code are:
There are three options for displaying file lists: filtering by content, filtering by file extension, and no filtering. Filtering by content additionally slows down file list generation because it reads the signature of each ordinary file to determine its type. Guessing the type from the file extension speeds up the processing of ordinary files, but not of subdirectories, which are always checked with casacore's `ImageOpener`. Turning off filtering entirely has no additional performance impact on the image file list (because detection of directory-based images is still required), but speeds up the region file list (because if no filtering is required, all directories can be shown without additional checks).

This PR is an attempt to add a less expensive heuristic for directory-based images, to be used when the option to filter by content is not selected. It performs almost the same checks as `ImageOpener`, but only for directories, by looking for files inside the directory with `fs::exists`, and without distinguishing between different CASA image subtypes.

The existing code assumes that there are directory image formats which we do not support, and handles them differently. However, it's clear from the `ImageOpener` code that the GIPSY format is a pair of files, not a directory (so `ImageOpener` would never return that type for a directory), and the `CAIPS` and `NEWSTAR` types are obsolete and never returned by `ImageOpener` at all. So I have removed this option from the code, and not implemented it in the alternative code.

The result: the alternative code appears to be slightly faster, but I don't know if it's enough faster to justify adding it as an alternative to the casacore code. If this implementation is sufficient for our purposes (e.g. we don't need to read the `table.info` file because we don't need to distinguish between CASA sub-types here), then perhaps we should replace the casacore check with this (for a modest speed improvement in all cases).

Other optimizations we discussed:
I think we're planning to use the last option as our long-term solution, and I would suggest applying that strategy to all files: instead of loading file information up-front when generating the file list, we could initially return just a bare list of files and directories, and then return information for lists of files and directories as the frontend requests them.
Checklist