Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix repeated slicing causing missing uns, AttributeError #133

Merged
merged 3 commits into from
Apr 30, 2019

Conversation

ryan-williams
Copy link
Contributor

Fixes #77

Minimal repro of AttributeError (before this fix)

import anndata as ad
import numpy as np
from anndata import AnnData
from copy import deepcopy

a = AnnData(
    np.array([[1, 2], [3, 4], [5, 6]]),
    uns = { 'foo': np.ones((3, 3)) }
)

a1 = a[:, :]
a2 = a1[:, :]

a.uns   # {'foo': array([[1., 1., 1.],[1., 1., 1.],[1., 1., 1.]])}
a1.uns  # {'foo': array([[1., 1., 1.],[1., 1., 1.],[1., 1., 1.]])}
a2.uns  # {} 😱

a2[:, :]  # 😱💥 AttributeError
Stack trace

---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

<ipython-input-23-b5a867d66b61> in <module>
----> 1 a2[:, :]


~/c/anndata/anndata/base.py in __getitem__(self, index)
   1300     def __getitem__(self, index: Index) -> 'AnnData':
   1301         """Returns a sliced view of the object."""
-> 1302         return self._getitem_view(index)
   1303 
   1304     def _getitem_view(self, index: Index) -> 'AnnData':


~/c/anndata/anndata/base.py in _getitem_view(self, index)
   1304     def _getitem_view(self, index: Index) -> 'AnnData':
   1305         oidx, vidx = self._normalize_indices(index)
-> 1306         return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
   1307 
   1308     def _remove_unused_categories(self, df_full, df_sub, uns):


~/c/anndata/anndata/base.py in __init__(self, X, obs, var, uns, obsm, varm, layers, raw, dtype, shape, filename, filemode, asview, oidx, vidx)
    663             if not isinstance(X, AnnData):
    664                 raise ValueError('`X` has to be an AnnData object.')
--> 665             self._init_as_view(X, oidx, vidx)
    666         else:
    667             self._init_as_actual(


~/c/anndata/anndata/base.py in _init_as_view(self, adata_ref, oidx, vidx)
    692         self._varm = ArrayView(adata_ref.varm[vidx_normalized], view_args=(self, 'varm'))
    693         # hackish solution here, no copy should be necessary
--> 694         uns_new = deepcopy(self._adata_ref._uns)
    695         # need to do the slicing before setting the updated self._n_obs, self._n_vars
    696         self._n_obs = self._adata_ref.n_obs  # use the original n_obs here


~/.pyenv/versions/3.7.0/lib/python3.7/copy.py in deepcopy(x, memo, _nil)
    178                     y = x
    179                 else:
--> 180                     y = _reconstruct(x, memo, *rv)
    181 
    182     # If is its own copy, don't memoize.


~/.pyenv/versions/3.7.0/lib/python3.7/copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    278     if state is not None:
    279         if deep:
--> 280             state = deepcopy(state, memo)
    281         if hasattr(y, '__setstate__'):
    282             y.__setstate__(state)


~/.pyenv/versions/3.7.0/lib/python3.7/copy.py in deepcopy(x, memo, _nil)
    148     copier = _deepcopy_dispatch.get(cls)
    149     if copier:
--> 150         y = copier(x, memo)
    151     else:
    152         try:


~/.pyenv/versions/3.7.0/lib/python3.7/copy.py in _deepcopy_dict(x, memo, deepcopy)
    238     memo[id(x)] = y
    239     for key, value in x.items():
--> 240         y[deepcopy(key, memo)] = deepcopy(value, memo)
    241     return y
    242 d[dict] = _deepcopy_dict


~/.pyenv/versions/3.7.0/lib/python3.7/copy.py in deepcopy(x, memo, _nil)
    148     copier = _deepcopy_dispatch.get(cls)
    149     if copier:
--> 150         y = copier(x, memo)
    151     else:
    152         try:


~/.pyenv/versions/3.7.0/lib/python3.7/copy.py in _deepcopy_tuple(x, memo, deepcopy)
    218 
    219 def _deepcopy_tuple(x, memo, deepcopy=deepcopy):
--> 220     y = [deepcopy(a, memo) for a in x]
    221     # We're not going to put the tuple in the memo, but it's still important we
    222     # check for it, in case the tuple contains recursive mutable structures.


~/.pyenv/versions/3.7.0/lib/python3.7/copy.py in <listcomp>(.0)
    218 
    219 def _deepcopy_tuple(x, memo, deepcopy=deepcopy):
--> 220     y = [deepcopy(a, memo) for a in x]
    221     # We're not going to put the tuple in the memo, but it's still important we
    222     # check for it, in case the tuple contains recursive mutable structures.


~/.pyenv/versions/3.7.0/lib/python3.7/copy.py in deepcopy(x, memo, _nil)
    178                     y = x
    179                 else:
--> 180                     y = _reconstruct(x, memo, *rv)
    181 
    182     # If is its own copy, don't memoize.


~/.pyenv/versions/3.7.0/lib/python3.7/copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    278     if state is not None:
    279         if deep:
--> 280             state = deepcopy(state, memo)
    281         if hasattr(y, '__setstate__'):
    282             y.__setstate__(state)


~/.pyenv/versions/3.7.0/lib/python3.7/copy.py in deepcopy(x, memo, _nil)
    148     copier = _deepcopy_dispatch.get(cls)
    149     if copier:
--> 150         y = copier(x, memo)
    151     else:
    152         try:


~/.pyenv/versions/3.7.0/lib/python3.7/copy.py in _deepcopy_dict(x, memo, deepcopy)
    238     memo[id(x)] = y
    239     for key, value in x.items():
--> 240         y[deepcopy(key, memo)] = deepcopy(value, memo)
    241     return y
    242 d[dict] = _deepcopy_dict


~/.pyenv/versions/3.7.0/lib/python3.7/copy.py in deepcopy(x, memo, _nil)
    178                     y = x
    179                 else:
--> 180                     y = _reconstruct(x, memo, *rv)
    181 
    182     # If is its own copy, don't memoize.


~/.pyenv/versions/3.7.0/lib/python3.7/copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    278     if state is not None:
    279         if deep:
--> 280             state = deepcopy(state, memo)
    281         if hasattr(y, '__setstate__'):
    282             y.__setstate__(state)


~/.pyenv/versions/3.7.0/lib/python3.7/copy.py in deepcopy(x, memo, _nil)
    148     copier = _deepcopy_dispatch.get(cls)
    149     if copier:
--> 150         y = copier(x, memo)
    151     else:
    152         try:


~/.pyenv/versions/3.7.0/lib/python3.7/copy.py in _deepcopy_dict(x, memo, deepcopy)
    238     memo[id(x)] = y
    239     for key, value in x.items():
--> 240         y[deepcopy(key, memo)] = deepcopy(value, memo)
    241     return y
    242 d[dict] = _deepcopy_dict


~/.pyenv/versions/3.7.0/lib/python3.7/copy.py in deepcopy(x, memo, _nil)
    178                     y = x
    179                 else:
--> 180                     y = _reconstruct(x, memo, *rv)
    181 
    182     # If is its own copy, don't memoize.


~/.pyenv/versions/3.7.0/lib/python3.7/copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    305                 key = deepcopy(key, memo)
    306                 value = deepcopy(value, memo)
--> 307                 y[key] = value
    308         else:
    309             for key, value in dictiter:


~/c/anndata/anndata/base.py in __setitem__(self, idx, value)
    382         else:
    383             adata_view, attr_name = self._view_args
--> 384             _init_actual_AnnData(adata_view)
    385             getattr(adata_view, attr_name)[idx] = value
    386 


~/c/anndata/anndata/base.py in _init_actual_AnnData(adata_view)
    368 
    369 def _init_actual_AnnData(adata_view):
--> 370     if adata_view.isbacked:
    371         raise ValueError(
    372             'You cannot modify elements of an AnnData view, '


~/c/anndata/anndata/base.py in isbacked(self)
   1194     def isbacked(self) -> bool:
   1195         """``True`` if object is backed on disk, ``False`` otherwise."""
-> 1196         return self.filename is not None
   1197 
   1198     @property


~/c/anndata/anndata/base.py in filename(self)
   1210           want to copy the previous file, use ``copy(filename='new_filename')``.
   1211         """
-> 1212         return self.file.filename
   1213 
   1214     @filename.setter


AttributeError: 'AnnData' object has no attribute 'file'

Reasons for the AttributeError

Three things combined to cause a slice of a slice of a slice to crash:

Views can't be deepcopy'd:

  • __setitem__ calls _init_actual_AnnData calls isbackedfilenamefile
  • file isn't set while deepcopy is iterating through KVs calling __setitem__.

I considered just checking whether file was set yet when accessing filename, but this is not a very robust solution. This deepcopy problem was causing other issues (for example, uns on a slice of a slice was erroneously empty), and was liable to affect lots of fields.

_uns gets deepcopy'd when making a view

code:

# hackish solution here, no copy should be necessary
uns_new = deepcopy(self._adata_ref._uns)

On views, _uns holds a reference to its enclosing AnnData

code:

self._uns = DictView(uns_new, view_args=(self, 'uns'))

Fix

I resolved this problem by overriding _ViewMixin.__deepcopy__ to make a deepcopy of [the view's parent AnnData]'s copy of the field in question.

I think this should make things strictly more robust (as opposed to this being a hack to just make things work), but if it may break other assumptions lmk.

@codecov
Copy link

codecov bot commented Apr 10, 2019

Codecov Report

Merging #133 into master will increase coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #133      +/-   ##
==========================================
+ Coverage   73.57%   73.61%   +0.03%     
==========================================
  Files          11       11              
  Lines        2006     2009       +3     
==========================================
+ Hits         1476     1479       +3     
  Misses        530      530
Impacted Files Coverage Δ
anndata/base.py 75.92% <100%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 34f4eb6...a1ae0a4. Read the comment docs.

@falexwolf
Copy link
Member

Interesting and thank you! It should definitely make the current way of dealing with .uns in a view much better. At some point, we should have a way of dealing with uns without doing a deep copy, but then this change will also be of no harm...

Thanks again!

@falexwolf
Copy link
Member

PS: We could drop the deepcopy if the elements of uns that are sliced got there own slot. Similar to loom. Something like .obss for .obs_sparse or similar.

@ryan-williams
Copy link
Contributor Author

Is this ready to go?

I don't totally follow the suggestion about dropping the deepcopy, but I thought it was meant as a possible follow-on idea.

@falexwolf falexwolf merged commit fc7f3cb into scverse:master Apr 30, 2019
@falexwolf
Copy link
Member

Oh sorry, I wanted to merge this ages ago... Thank you! 😄

@ryan-williams ryan-williams deleted the uns branch May 1, 2019 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AttributeError: 'AnnData' object has no attribute 'file' on repeated subsetting.
2 participants