Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve format identification #719

Merged
merged 21 commits into from
Aug 17, 2015
Merged

Conversation

astrofrog
Copy link
Member

At the moment, glue uses a default factory based on filename in order to open files. This can lead to issues in cases where readers define more specialized versions of common formats. For instance, the astrodendro package stores dendrograms in FITS and HDF5 files, but there is already a general FITS and HDF5 reader (which is also the default factory for any file whose name ends in hdf5 or fits).

This PR makes the following changes:

  • The dendrogram identifier function now checks more than the filename, and checks that the file structure corresponds to what would be expected from a dendrogram. Note that for the FITS format, the HDUs were missing names, so instead for old files I rely on the layout of the file which is unlikely to mimic a dendrogram file just by chance. I've also opened a PR to add HDU names for the astrodendro package, see Give FITS HDUs names dendrograms/astrodendro#144 - and have preemptively also added these names to the logic.
  • The format identification to no longer have a 'default', but instead to check with all identifier functions which ones could read the files.
  • The data factories now have the concept of 'priority'. For example, the dendrogram FITS/HDF5 loader has a higher priority than the general one, if the file truly is a dendrogram file, because then we should use the more specialized reader.

With these changes, one can do for example glue dendro.fits and it will correctly open up as a dendrogram dataset.

There are failing tests, so this is still a WIP, but @ChrisBeaumont let me know if you have any thoughts in the mean time about this approach.

except AttributeError:
return 0

return [self.item(f, f.label, f.identifier, get_priority(f)) for f in __factories__]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when is priority not defined in a namedtuple?

@astrofrog astrofrog changed the title WIP: improve format identification Improve format identification Aug 16, 2015
@astrofrog
Copy link
Member Author

@ChrisBeaumont - I have implemented your suggestions. In a separate pull request, I can change all the internal data loaders to use the @data_factory decorator too (I'll do it separately because in some cases it requires defining the identifier before the data factory, but I don't want to clutter the diffs here).

Does this seem ok now?

@ChrisBeaumont
Copy link
Member

+1

astrofrog added a commit that referenced this pull request Aug 17, 2015
@astrofrog astrofrog merged commit 1d092de into glue-viz:master Aug 17, 2015
@astrofrog
Copy link
Member Author

@ChrisBeaumont - thanks for reviewing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants