Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Imageset paths() manipulates paths according to OS, breaking filename checking in DIALS image grouping code #771

Open
dagewa opened this issue Nov 22, 2024 · 11 comments · Fixed by #772 · May be fixed by #777
Open

Imageset paths() manipulates paths according to OS, breaking filename checking in DIALS image grouping code #771

dagewa opened this issue Nov 22, 2024 · 11 comments · Fixed by #772 · May be fixed by #777

Comments

@dagewa
Copy link
Member

dagewa commented Nov 22, 2024

This code in DIALS gets called by xia2.ssx_reduce:

    def _get_expt_file_to_groupsdata(self, data_file_pairs: List[FilePair]):
        expt_file_to_groupsdata: Dict[Path, GroupsForExpt] = {}

        for fp in data_file_pairs:
            expts = load.experiment_list(fp.expt, check_format=False)
            # need to match the images to the imagesets.
            images = set()
            image_to_group_info = {}
            for iset in expts.imagesets():
                images.update(iset.paths())
            for iset in images:
                image = None
                for ifile in self._files_to_groups_dict.keys():
                    if iset == ifile.name:
                        image = ifile
                        break
                if image is None:
                    raise ValueError(f"Imageset {iset} not found in metadata")
                image_to_group_info[iset] = self._files_to_groups_dict[image]
...

The call to iset.paths() on Windows returns something like this

In [12]: iset.paths()
Out[12]: ['c:\\dls\\mx\\data\\nt30330\\nt30330-15\\VMXi-AB1698\\well_42\\images\\image_58766.nxs']

The original path (a location on the DLS file system) has been manipulated by adding a drive letter and altering the path separator. This causes a failure later on when these paths are checked against cached file names in a dictionary (if iset == ifile.name), because the manipulated path does not match the expected path and we end up with

Error: Imageset c:\dls\mx\data\nt30330\nt30330-15\VMXi-AB1698\well_42\images\image_58766.nxs not found in metadata
@dagewa
Copy link
Member Author

dagewa commented Nov 22, 2024

This is the immediate cause of this test failure on Windows
xia2/xia2#824 (comment)

@dagewa
Copy link
Member Author

dagewa commented Nov 22, 2024

In this case, paths() is

In [11]: iset.paths
Out[11]: <bound method _.paths of <dxtbx.imageset.ImageSetLazy object at 0x000001CCFBEB6A70>>

@ndevenish
Copy link
Collaborator

ImageSetLazy

Should we be expecting to see this anywhere outside of stills_process? I thought this was only for handling inside there?

@dagewa
Copy link
Member Author

dagewa commented Nov 22, 2024

No idea about ImageSetLazy...

@dagewa
Copy link
Member Author

dagewa commented Nov 22, 2024

Short reproducer of the issue

>>> from dxtbx.model.experiment_list import ExperimentList
>>> el=ExperimentList.from_file("C:/Users/fcx32934/.cache/dials_data/dtpb_serial_processed/well39_batch12_integrated.expt", check_format=False)
>>> el[0].imageset.paths()
['c:\\dls\\mx\\data\\nt30330\\nt30330-15\\VMXi-AB1698\\well_39\\images\\image_58763.nxs']

@ndevenish
Copy link
Collaborator

I think here is where it goes wrong:

filenames = [
resolve_path(p, directory=self._directory) if not get_url_scheme(p) else p
for p in imageset["images"]
]

Calling this:

def resolve_path(path, directory=None):
"""Resolve a file path.
First expand any environment and user variables. Then create the absolute
path by applying the relative path to the provided directory, if necessary.
Args:
path (str): The path to resolve
directory (Optional[str]): The local path to resolve relative links
Returns:
str: The absolute path to the file to read
"""
if not path:
return ""
path = os.path.expanduser(os.path.expandvars(path))
if directory and not os.path.isabs(path):
path = os.path.join(directory, path)
return os.path.abspath(path)

@ndevenish
Copy link
Collaborator

.. I think that this handling is possibly necessary because we retain the ability to have relative path names, which AFAIK we only use in tests? If this was always required to be absolute, we could probably skip this?

Or alternatively, only do this replacement if it ends up pointing to a file that actually exists, otherwise leave it as the original filename?

@dagewa
Copy link
Member Author

dagewa commented Nov 22, 2024

Damn, your mention of resolve_path triggered some stirring of the grey matter and I went back and found this #613. I have been in this dungeon before.

@ndevenish
Copy link
Collaborator

I wonder if this is actually the best way to resolve:

only do this replacement if it ends up pointing to a file that actually exists, otherwise leave it as the original filename?

@dagewa
Copy link
Member Author

dagewa commented Nov 22, 2024

That makes a lot of sense

@dagewa
Copy link
Member Author

dagewa commented Dec 10, 2024

Nope, #772 did not work at Diamond

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants