Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

indexing error on artifact removal #10

Open
tbenst opened this issue Apr 1, 2021 · 5 comments
Open

indexing error on artifact removal #10

tbenst opened this issue Apr 1, 2021 · 5 comments

Comments

@tbenst
Copy link
Collaborator

tbenst commented Apr 1, 2021

I'm not positive, but I think this error may relate to "Handle dual-channel where one channel is functional and the other is still measured." In this recording, located at /oak/stanford/groups/deissero/users/tyler/b115/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2/TSeries_lrhab_raphe_40trial-044 (error trace below using my local path-- can replace /data/dlab with /oak/stanford/groups/deissero/users/tyler for sherlock), I record a structural channel in red (Ch2) and a functional channel in green (Ch3). I primarily care about removing stim artefact from Ch3, although don't mind removing from both!

I get the following error on preprocessing:

> python ~/code/two-photon/two-photon/process.py --input_dir /data/dlab/b115/ --output_dir /data/dlab/b115/process-output/ --recording 2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2:TSeries_lrhab_raphe_40trial-044 --preprocess
2021-03-31 18:55:50.618 metadata:22 INFO Extracting metadata from xml files:
/data/dlab/b115/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2/TSeries_lrhab_raphe_40trial-044/TSeries_lrhab_raphe_40trial-044.xml
/data/dlab/b115/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2/TSeries_lrhab_raphe_40trial-044/TSeries_lrhab_raphe_40trial-044_Cycle00001_VoltageRecording_001.xml
2021-03-31 18:55:53.797 metadata:102 INFO The following metadata is written to: /data/dlab/b115/process-output/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2/TSeries_lrhab_raphe_40trial-044/output/metadata.json
{'channels': {0: {'enabled': True, 'name': 'frame starts'},
              1: {'enabled': True, 'name': 'secondary'},
              2: {'enabled': True, 'name': 'winfluo'},
              3: {'enabled': True, 'name': 'Blue'},
              4: {'enabled': True, 'name': 'VR timestamps'},
              5: {'enabled': True, 'name': 'green'},
              6: {'enabled': True, 'name': 'LED'},
              7: {'enabled': True, 'name': 'respir'}},
 'laser': {'power': None, 'wavelength': None},
 'layout': {'frames_per_sequence': 5, 'sequences': 7272},
 'optical_zoom': 1.0,
 'period': 0.0655215,
 'size': {'channels': 2,
          'frames': 7272,
          'x_px': 1024,
          'y_px': 1024,
          'z_planes': 5}}
2021-03-31 18:55:53.960 process:92 INFO Found stim channel "respir", enabled=True
TYLER DEBUG: /data/dlab/b115/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2/TSeries_lrhab_raphe_40trial-044/TSeries_lrhab_raphe_40trial-044_Cycle00001_VoltageRecording_001.csv
2021-03-31 18:56:07.721 process:161 INFO Read voltage recordings from: /data/dlab/b115/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2/TSeries_lrhab_raphe_40trial-044/TSeries_lrhab_raphe_40trial-044_Cycle00001_VoltageRecording_001.csv, preview:
   Time(ms)  frame starts  secondary  ...     green       LED    respir
0       0.0      0.086975   0.013428  ...  0.020142  0.000916 -0.028687
1       0.2      4.942322   0.013733  ...  0.020447  0.001526 -0.030518
2       0.4      4.935303   0.014038  ...  0.020752  0.000916 -0.028381
3       0.6      4.938049   0.013428  ...  0.020752  0.000916 -0.028992
4       0.8      4.939880   0.014343  ...  0.020447  0.001221 -0.027771

[5 rows x 9 columns]
2021-03-31 18:56:11.504 utils:129 INFO Note: NumExpr detected 36 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2021-03-31 18:56:11.504 utils:141 INFO NumExpr defaulting to 8 threads.
2021-03-31 18:56:11.764 artefacts:15 INFO Stored calculated frame starts in /data/dlab/b115/process-output/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2/TSeries_lrhab_raphe_40trial-044/output/frame_start.h5, preview:
Int64Index([1, 334, 667, 1000, 1333], dtype='int64')
2021-03-31 18:56:11.769 artefacts:21 INFO Calculating artefact regions
2021-03-31 18:56:15.409 artefacts:41 INFO Stored calculated artefact positions in /data/dlab/b115/process-output/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2/TSeries_lrhab_raphe_40trial-044/output/artefact.h5, preview:
       z_plane  y_min  y_max
frame                       
60           3    479    512
61           1    455    490
61           4    433    469
62           2    414    449
63           0    390    426
2021-03-31 18:56:20.405 tiffdata:43 INFO Found data with shape(frames, z_planes, y_pixels, x_pixels): (7272, 5, 1024, 1024)
2021-03-31 18:56:20.722 transform:41 INFO Writing uncorrected data to /data/dlab/b115/process-output/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2/TSeries_lrhab_raphe_40trial-044/hdf5/uncorrected/uncorrected.h5
[########################################] | 100% Completed |  6min 45.8s
2021-03-31 19:03:10.541 transform:46 INFO Writing corrected data to /data/dlab/b115/process-output/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2/TSeries_lrhab_raphe_40trial-044/hdf5/data/data.h5
[                                        ] | 1% Completed | 53.8s
Traceback (most recent call last):
  File "/home/tyler/code/two-photon/two-photon/process.py", line 354, in <module>
    main()
  File "/home/tyler/code/two-photon/two-photon/process.py", line 114, in main
    preprocess(basename_input, dirname_output, fname_csv, fname_uncorrected_hdf5, fname_hdf5, mdata,
  File "/home/tyler/code/two-photon/two-photon/process.py", line 173, in preprocess
    transform.convert(data, fname_data, df_artefacts, fname_uncorrected)
  File "/home/tyler/code/two-photon/two-photon/transform.py", line 59, in convert
    data_corrected.to_hdf5(fname_data, HDF5_KEY)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/array/core.py", line 1475, in to_hdf5
    return to_hdf5(filename, datapath, self, **kwargs)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/array/core.py", line 4577, in to_hdf5
    store(list(data.values()), dsets)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/array/core.py", line 981, in store
    result.compute(**kwargs)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/base.py", line 167, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/base.py", line 452, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/threaded.py", line 76, in get
    results = get_async(
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/local.py", line 486, in get_async
    raise_exception(exc, tb)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/local.py", line 316, in reraise
    raise exc
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/local.py", line 222, in execute_task
    result = _execute_task(task, data)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/optimization.py", line 961, in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/core.py", line 151, in get
    result = _execute_task(task, cache)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/core.py", line 121, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/utils.py", line 29, in apply
    return func(*args, **kwargs)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/array/core.py", line 436, in _pass_extra_kwargs
    return func(*args[len(keys) :], **kwargs)
  File "/home/tyler/code/two-photon/two-photon/transform.py", line 87, in remove_artefacts
    before = chunk[index - 1, row.z_plane, y_slice]
IndexError: index 2 is out of bounds for axis 1 with size 2
@tbenst tbenst changed the title indexing error on artifact removal error for multi-channel recording? indexing error on artifact removal Apr 1, 2021
@tbenst
Copy link
Collaborator Author

tbenst commented Apr 1, 2021

This issue doesn't appear to be related to multiple channels, as I replicated the error on another file that only recorded from one channel. On Oak, the data is at /oak/stanford/groups/deissero/users/tyler/b115/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1/TSeries-lrhab_raphe_stim-40trial-038.

Sherlock command: python process.py --input_dir /oak/stanford/groups/deissero/users/tyler/b115/ --output_dir /oak/stanford/groups/deissero/users/tyler/b115/process-output/ --recording 2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1:TSeries-lrhab_raphe_stim-40trial-038 --preprocess

Output (run locally):

❯ python ~/code/two-photon/two-photon/process.py --input_dir /data/dlab/b115/ --output_dir /data/dlab/b115/process-output/ --recording 2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1:TSeries-lrhab_raphe_stim-40trial-038 --preprocess
2021-03-31 23:43:31.162 metadata:22 INFO Extracting metadata from xml files:
/data/dlab/b115/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1/TSeries-lrhab_raphe_stim-40trial-038/TSeries-lrhab_raphe_stim-40trial-038.xml
/data/dlab/b115/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1/TSeries-lrhab_raphe_stim-40trial-038/TSeries-lrhab_raphe_stim-40trial-038_Cycle00001_VoltageRecording_001.xml
2021-03-31 23:43:34.014 metadata:102 INFO The following metadata is written to: /data/dlab/b115/process-output/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1/TSeries-lrhab_raphe_stim-40trial-038/output/metadata.json
{'channels': {0: {'enabled': True, 'name': 'frame starts'},
              1: {'enabled': True, 'name': 'secondary'},
              2: {'enabled': True, 'name': 'winfluo'},
              3: {'enabled': True, 'name': 'Blue'},
              4: {'enabled': True, 'name': 'VR timestamps'},
              5: {'enabled': True, 'name': 'green'},
              6: {'enabled': True, 'name': 'LED'},
              7: {'enabled': True, 'name': 'respir'}},
 'laser': {'power': None, 'wavelength': None},
 'layout': {'frames_per_sequence': 5, 'sequences': 7272},
 'optical_zoom': 1.0,
 'period': 0.065518959,
 'size': {'channels': 1,
          'frames': 7272,
          'x_px': 1024,
          'y_px': 1024,
          'z_planes': 5}}
2021-03-31 23:43:34.135 process:92 INFO Found stim channel "respir", enabled=True
TYLER DEBUG: /data/dlab/b115/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1/TSeries-lrhab_raphe_stim-40trial-038/TSeries-lrhab_raphe_stim-40trial-038_Cycle00001_VoltageRecording_001.csv
2021-03-31 23:43:49.189 process:161 INFO Read voltage recordings from: /data/dlab/b115/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1/TSeries-lrhab_raphe_stim-40trial-038/TSeries-lrhab_raphe_stim-40trial-038_Cycle00001_VoltageRecording_001.csv, preview:
   Time(ms)  frame starts  secondary  ...     green       LED    respir
0       0.0      0.085144   0.013733  ...  0.019836  0.000916 -0.030212
1       0.2      4.934998   0.013733  ...  0.021057  0.001221 -0.029907
2       0.4      4.934387   0.013123  ...  0.020752  0.001221 -0.029602
3       0.6      4.935303   0.013123  ...  0.020752  0.001221 -0.029297
4       0.8      4.936218   0.013428  ...  0.020752  0.000610 -0.030823

[5 rows x 9 columns]
2021-03-31 23:43:52.034 utils:129 INFO Note: NumExpr detected 36 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2021-03-31 23:43:52.034 utils:141 INFO NumExpr defaulting to 8 threads.
2021-03-31 23:43:52.239 artefacts:15 INFO Stored calculated frame starts in /data/dlab/b115/process-output/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1/TSeries-lrhab_raphe_stim-40trial-038/output/frame_start.h5, preview:
Int64Index([1, 334, 668, 1000, 1333], dtype='int64')
2021-03-31 23:43:52.240 artefacts:21 INFO Calculating artefact regions
2021-03-31 23:43:55.193 artefacts:41 INFO Stored calculated artefact positions in /data/dlab/b115/process-output/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1/TSeries-lrhab_raphe_stim-40trial-038/output/artefact.h5, preview:
       z_plane  y_min  y_max
frame                       
60           3    479    512
61           1    455    490
61           4    432    467
62           2    412    447
63           0    389    425
2021-03-31 23:46:51.074 tiffdata:43 INFO Found data with shape(frames, z_planes, y_pixels, x_pixels): (7272, 5, 1024, 1024)
2021-03-31 23:46:51.374 transform:41 INFO Writing uncorrected data to /data/dlab/b115/process-output/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1/TSeries-lrhab_raphe_stim-40trial-038/hdf5/uncorrected/uncorrected.h5
[                                        ] | 0% Completed |  2.1s/home/tyler/opt/anaconda3/lib/python3.8/site-packages/tifffile/tifffile.py:12019: RuntimeWarning: invalid value encountered in true_divide
  'ZDistance': values[:, 0] / values[:, 1],
[########################################] | 100% Completed |  4min 10.3s
2021-03-31 23:51:05.250 transform:46 INFO Writing corrected data to /data/dlab/b115/process-output/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1/TSeries-lrhab_raphe_stim-40trial-038/hdf5/data/data.h5
[                                        ] | 1% Completed |  9.9s
Traceback (most recent call last):
  File "/home/tyler/code/two-photon/two-photon/process.py", line 354, in <module>
    main()
  File "/home/tyler/code/two-photon/two-photon/process.py", line 114, in main
    preprocess(basename_input, dirname_output, fname_csv, fname_uncorrected_hdf5, fname_hdf5, mdata,
  File "/home/tyler/code/two-photon/two-photon/process.py", line 173, in preprocess
    transform.convert(data, fname_data, df_artefacts, fname_uncorrected)
  File "/home/tyler/code/two-photon/two-photon/transform.py", line 59, in convert
    data_corrected.to_hdf5(fname_data, HDF5_KEY)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/array/core.py", line 1475, in to_hdf5
    return to_hdf5(filename, datapath, self, **kwargs)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/array/core.py", line 4577, in to_hdf5
    store(list(data.values()), dsets)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/array/core.py", line 981, in store
    result.compute(**kwargs)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/base.py", line 167, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/base.py", line 452, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/threaded.py", line 76, in get
    results = get_async(
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/local.py", line 486, in get_async
    raise_exception(exc, tb)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/local.py", line 316, in reraise
    raise exc
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/local.py", line 222, in execute_task
    result = _execute_task(task, data)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/optimization.py", line 961, in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/core.py", line 151, in get
    result = _execute_task(task, cache)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/core.py", line 121, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/utils.py", line 29, in apply
    return func(*args, **kwargs)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/array/core.py", line 436, in _pass_extra_kwargs
    return func(*args[len(keys) :], **kwargs)
  File "/home/tyler/code/two-photon/two-photon/transform.py", line 87, in remove_artefacts
    before = chunk[index - 1, row.z_plane, y_slice]
IndexError: index 4 is out of bounds for axis 1 with size 3

@tbenst
Copy link
Collaborator Author

tbenst commented Apr 1, 2021

@chrisroat I think the issue is the use of row.z_plane during a call to map_overlap.

In this case, the uncorrected hdf5 file has a dataset with shape (7272, 5, 1024, 1024), and dask chooses a chunk size of (34, 3, 512, 512). z_plane can't be used to index a chunk

@chrisroat
Copy link
Contributor

Doe the chunks parameter in da.from_array solve this issue?

I've learned a bit more about ome.tiff and how to handle z_planes and timepoints. I am likely going to revisit a lot of logic here that kinda of got piled as I learned on-the-job.

One thing I've considered is to process each z-plane independently. I think it would be more efficient (data could be stored in z/time/y/x ordering), but may not be worth it as it could be surprising/confusing if all the rest of the code is time/z/y/x. I value the "principle of least surprise".

@tbenst
Copy link
Collaborator Author

tbenst commented Apr 9, 2021

Doe the chunks parameter in da.from_array solve this issue?

yes, chunks=('auto', -1, -1, -1) fixed this by ensuring z indexing is valid.

One thing I've considered is to process each z-plane independently. I think it would be more efficient (data could be stored in z/time/y/x ordering), but may not be worth it as it could be surprising/confusing if all the rest of the code is time/z/y/x. I value the "principle of least surprise".

I prefer TZYX since my analysis usually runs on all z-planes concurrently, but others like NWB store by plane. Does TZYX preclude / add difficulty parallelization for some reason?

@chrisroat
Copy link
Contributor

Not really. Operating on t-slices is perfectly fine. I'd actually prefer to keep TZYX, as it's more typical and natural (and less surprising, I think).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants