Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

concat errors with one dimensional arrays in obsm #652

Open
ivirshup opened this issue Nov 29, 2021 · 3 comments
Open

concat errors with one dimensional arrays in obsm #652

ivirshup opened this issue Nov 29, 2021 · 3 comments

Comments

@ivirshup
Copy link
Member

Seen in comment by @Hrovatin in #501 (comment)_

import anndata as ad, numpy as np

foo = ad.AnnData(np.ones((5, 20), obsm={"1d-array": np.ones(5)})

ad.concat([foo, foo])
Traceback
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/var/folders/bd/43q20k0n6z15tdfzxvd22r7c0000gn/T/ipykernel_14520/2434769132.py in <module>
----> 1 ad.concat([a, a])

~/github/anndata/anndata/_core/merge.py in concat(adatas, axis, join, merge, uns_merge, label, keys, index_unique, fill_value, pairwise)
    848             [a.layers for a in adatas], axis=axis, reindexers=reindexers
    849         )
--> 850         concat_mapping = inner_concat_aligned_mapping(
    851             [getattr(a, f"{dim}m") for a in adatas], index=concat_indices
    852         )

~/github/anndata/anndata/_core/merge.py in inner_concat_aligned_mapping(mappings, reindexers, index, axis)
    455         els = [m[k] for m in mappings]
    456         if reindexers is None:
--> 457             cur_reindexers = gen_inner_reindexers(els, new_index=index, axis=axis)
    458         else:
    459             cur_reindexers = reindexers

~/github/anndata/anndata/_core/merge.py in gen_inner_reindexers(els, new_index, axis)
    476         reindexers = [Reindexer(df_indices(el), common_ind) for el in els]
    477     else:
--> 478         min_ind = min(el.shape[alt_axis] for el in els)
    479         reindexers = [
    480             gen_reindexer(pd.RangeIndex(min_ind), pd.RangeIndex(el.shape[alt_axis]))

~/github/anndata/anndata/_core/merge.py in <genexpr>(.0)
    476         reindexers = [Reindexer(df_indices(el), common_ind) for el in els]
    477     else:
--> 478         min_ind = min(el.shape[alt_axis] for el in els)
    479         reindexers = [
    480             gen_reindexer(pd.RangeIndex(min_ind), pd.RangeIndex(el.shape[alt_axis]))

IndexError: tuple index out of range

Off the top of my head, I can't think of a reason this working would be inconsistent with the anndata model. Generally I would expect people to put 1d arrays in obs though.

Might have implications for the concatenation of awkward arrays (#647)

@Hrovatin
Copy link

Yes, it was mistake from my side to put one-d array (forgot to reshape it). But I got no warning when I made adata and then just by chance figured out what was the reason.

ivirshup added a commit that referenced this issue Feb 2, 2023
This fixed a number of tests because we had a 1d awkward array being generated, and we currently don't support 1d arrays in obsm well. Tracked in #652.
@ivirshup ivirshup added this to the 0.10.0 milestone Feb 2, 2023
ivirshup added a commit that referenced this issue Feb 7, 2023
* first attempt to support awkward arrays

* remove comments

* better comment

* add type to gen_adata

* first attempt at concat

* remove comment

* add outer concat

* add awkward to test dep

* add awk arr to data gen

* fix test base

* init test for concat

* fix concatenate tests

* create mock class for awkward array

* remove space

* import ak when needed

* relative import of awk array

* fix optional dep import

* resolve conflicts

* draft IO for akward arrays

* add awkward to docs and save form to attrs

* Update dependencies

* Update dim_len

* ignore vscode directory

* Validate that awkward arrays align to axes

* Fix reindexing during merge

* fix lint

* remove duplicate import

* Test different types of awkward arrays in different slots

* Better function to generate awkward arrays

* Better dim_len for awkward arrays

* Working out how to best check the dim_len

* Only accept awkward arrays that are "regular" in the aligned dimension

The conversion is left to the user. Explicit is better than implicit.

* Switch to v2 API

* WIP rewrite awkward array generation

* Improve awkward array generation and dim_len check

* Switch to new awkward array generation in all tests

* Fix test_transpose

* Fix/workaround more tests

* Add test for setting anndata slots to awkward arrays

* enable tests for 3d ragged array in layers

* Cleanup

* Fix that X could not be set when creating AnnData object from scratch.

Apparently the checks are quite different than when adding a Layer.

* Remove code to make awkward array regular after merge.

This is now done by the awkward array library.

* Do not explicitly copy awkward arrays

* Implement transposing awkward arrays

* Add docs stub and update type hints

* Fix: dtype not available during merge if both X are awkward

* Fix IO

* Request pre-release version of awkward

* Exclude awkward layer in loom tests

* Pull in only changes relevant to obsm/varm

* Update tests

* Fix type hints

* Update error message in algined mapping

* Use compat module to support both awkward v1.9rc and 2.x

* restructure tests

* Add tests for copies and view

* Remove unused imoport

* Fix how actual shape is computed in aligned mapping

* Attempt to support views with ak.behavior

* Use shallow copy

* Add dim_len_awkward function including tests

* Test that assigning an awkward v1 arrays fails

* Add stub for element-wise IO tests

* Restructur dim_len_awkward

* Add more test cases for awkward IO

* WIP add tests for concatenating AwkArrays with missing values

* Fix AwkwardArrayView

* Simplify awkward array view code

* Use None to remove name from awkward array

* Mark test_no_awkward_v1 as xfail for uns

* Add test for categorical arrays

* Update docs/fileformat-prose.rst

Co-authored-by: Isaac Virshup <ivirshup@gmail.com>

* Update anndata/_core/aligned_mapping.py

Co-authored-by: Isaac Virshup <ivirshup@gmail.com>

* Update anndata/tests/helpers.py

Co-authored-by: Isaac Virshup <ivirshup@gmail.com>

* Update awkward tests to use assert_equal with exact=True

* Bump required version

* Update categorical syntax, add new categorical test

* Start concat tests for awkward

* Add release notes

* Add testcases for dim_len with awkward arrays of strings

* Fix dim_len for arrays of strings

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Awkward v2 fixes

Several functions changed until the stable awkward v2 version was released.

* Exclude awkward arrays from fill_value concat test

* fix flake8

* Add IO testcase for AIRR data

* Fix link

* Get inner join working for concatenation

* Bump some concatenation cases to a later PR

* Generate empty arrays for outer join

* Raise NotImplementedError when creating a view of an awkward array with custom behavior

* Add warning when setting awkward array in aligned mapping

* Get much more of concatenation 'working'

* Use warning instead of logging

* extend todo comment about views

* Fix IO, and to_memory for views of awkward arrays

* Removed a number of test cases that we're not targeting

This fixed a number of tests because we had a 1d awkward array being generated, and we currently don't support 1d arrays in obsm well. Tracked in #652.

* Implement outer indexing on axis 0 of an awkward array

* Fix gen_awkward when one of the dimensions has size 0

* Fix equality function for awkward arrays. Was throwing an error when the arrays weren't broadcastable.

* Modify outer concatenation test to accept current behaviour of awkward array

* Add tests for mixed type concatenation with awkward arrays

* Add warning about outer joins

* Call ak._util.arrays_approx_equal instead of rolling our own

* update awkward to 2.0.7 (unfortunately: errors)

* remove unnecessary checks from AwkwardArrayView

* Workaround scikit-hep/awkward#2209

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Removed extra layer of nesting from on-disk format for awkward arrays

---------

Co-authored-by: Gregor Sturm <mail@gregor-sturm.de>
Co-authored-by: Isaac Virshup <ivirshup@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@flying-sheep flying-sheep added Bug 🐛 and removed bug labels Jun 12, 2023
@github-actions
Copy link

This issue has been automatically marked as stale because it has not had recent activity.
Please add a comment if you want to keep the issue open. Thank you for your contributions!

@github-actions github-actions bot added the stale label Aug 23, 2023
@ivirshup ivirshup modified the milestones: 0.10.0, 0.11.0 Sep 5, 2023
@github-actions github-actions bot removed the stale label Sep 6, 2023
@ilan-gold ilan-gold self-assigned this Jun 3, 2024
@ilan-gold
Copy link
Contributor

@ivirshup @flying-sheep Should this work actually? Or should we start enforcing a multi-dimensional-ness for obsm? https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.obsm.html#anndata.AnnData.obsm makes it pretty clear that "Stores for each key a two or higher-dimensional" - does this mean that even something with shape [10, 1] should be disallowed?

@ivirshup ivirshup modified the milestones: 0.11.0, 0.12.0 Aug 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants