Documentation about multi_index and query #347

michael-imbeault · 2020-06-23T16:21:49Z

I can't find any mention of multi_index or the query() method in the official docs. I've been using multi_index, but it outputs a lot more information than I need (positions in the array, then the values themselves). Is there a parameter to output just a list of results containing only the values, following the order of the slices? And what is the purpose of .query: is there any more to it than just another way to read results instead of using A[:]?

Is there any efficiency benefit to using multi_index vs looping through intervals using query or the normal [] read mode?

ihnorton · 2020-06-23T17:47:53Z

Hi @michael-imbeault, I will be taking a pass through the API docs this week to add some missing items, as well as fix a rendering issue preventing some docstrings from displaying. We also have documentation of multi_index specifically at https://docs.tiledb.com/main/api-usage/reading-arrays/multi-range-subarrays

Here is a summary for multi_index and query:

multi_index:

supports multiple sub-range queries per dimension and returns the cross-product of the specified ranges. Here is an example from the doc link above:

# slice subarrays [1,2]x[1,4] and [4,4]x[1,4]
A.multi_index[ [slice(1,2), 4], 1:4 ]

To expand on this: multi_index accepts a range (start:end), a slice(start,end), or a list of slice objects and/or scalar indexes. For example:

A.multi_index[ [slice(1,2), 4], [slice(3,4), slice(5,6), 8] ]

multi_index operates over the full, inclusive domain of the array:
- results are endpoint-inclusive, like TileDB core and unlike standard Python slicing (TileDB arrays may be defined with dimensions that have arbitrary float or int start/end points, and multi_index allows querying such intervals)
- this also means that there is no wrap-around for negative indexes to access the "last" element in the array

multi_index returns result coordinates for all dimensions, as separate named arrays (corresponding to the Dimension name)
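To make the inclusive, cross-product selection described above concrete, here is a small pure-Python model of it. This is only a sketch of the semantics for integer dimensions, not TileDB code: the helper names `expand_dim` and `selected_cells` are hypothetical and do not exist in tiledb-py.

```python
from itertools import product


def expand_dim(selection):
    """Expand one dimension's selection into a list of integer coordinates.

    Accepts a scalar, a slice, or a list of scalars/slices. Slice stops are
    treated as INCLUSIVE, mirroring the multi_index description (and unlike
    ordinary Python slicing).
    """
    items = selection if isinstance(selection, list) else [selection]
    coords = []
    for item in items:
        if isinstance(item, slice):
            coords.extend(range(item.start, item.stop + 1))  # inclusive stop
        else:
            coords.append(item)
    return coords


def selected_cells(*per_dim):
    """Return the cross-product of the expanded per-dimension selections."""
    return list(product(*(expand_dim(sel) for sel in per_dim)))


# Model of A.multi_index[ [slice(1,2), 4], 1:4 ]
cells = selected_cells([slice(1, 2), 4], slice(1, 4))
print(len(cells))   # rows {1, 2, 4} x columns {1, 2, 3, 4}
print(cells[:4])
```

Because the stop is inclusive, `slice(1, 2)` covers two coordinates (1 and 2), so the selection above touches rows 1, 2 and 4 across columns 1 through 4.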

.query:

The main purpose is to allow sub-selection of attributes: pass a list of attribute names and only those attributes are queried. For example, this query will only return values for a and b, excluding any other attributes:

A.query(attrs=['a','b']).multi_index[...]

Is there any efficiency benefit to using multi_index vs looping through intervals using query or the normal [] read mode?

For large multi-ranged queries, there can be a significant benefit to using multi_index, because TileDB is designed to efficiently fulfill such a query even for a very large number of ranges (parallelizing operations across multiple threads; storing range bounding boxes for tiles to optimize retrieval; selectively decompressing tiles; and other optimizations).

There can be an efficiency benefit to using .query if you know that some attribute results will not be needed, because core TileDB will not retrieve data for those attributes at all, reducing i/o and memory usage, etc.

michael-imbeault · 2020-06-23T21:56:10Z

OK, that's helpful - I did find https://docs.tiledb.com/main/api-usage/reading-arrays/multi-range-subarrays but it's a little barebones at the moment, and there is no mention of either multi_index or query in https://tiledb-inc-tiledb-py.readthedocs-hosted.com/en/stable/python-api.html.

I'll be using multi_index - my initial expectation was that it would return a list of numpy arrays corresponding to the slices, not a dict with a single array encompassing all the slices, which I have to parse using the coordinate arrays. Are there plans to include a simple, already parsed output? The current behavior makes sense for sparse arrays but seems suboptimal for dense arrays - creating those (potentially very large) coord arrays and keeping them in memory seems wasteful for some use cases.
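For readers who want the per-slice output described above, one possible workaround is to split the combined result by the lengths of the requested ranges. The sketch below is a hypothetical helper (`split_by_ranges` is not part of tiledb-py), and it assumes that the result along the first dimension is concatenated in the same order the ranges were given; that ordering should be verified for your array before relying on it.

```python
def split_by_ranges(rows, ranges):
    """Split `rows` into one chunk per inclusive (start, end) range.

    ASSUMPTION: `rows` holds the results for each range back to back, in the
    order the ranges are listed. Works on any sequence that supports slicing
    (a list here; a NumPy array would be split along its first axis).
    """
    chunks = []
    offset = 0
    for start, end in ranges:
        length = end - start + 1  # endpoints are inclusive
        chunks.append(rows[offset:offset + length])
        offset += length
    if offset != len(rows):
        raise ValueError("ranges do not account for every row in the result")
    return chunks


# Stand-in for a result covering rows 1-2 and row 4 of an array
rows = ["row1", "row2", "row4"]
chunks = split_by_ranges(rows, [(1, 2), (4, 4)])
print(chunks)
```

The length check at the end is there to fail loudly if the ranges and the result do not line up, rather than silently returning misaligned chunks.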

michael-imbeault closed this as completed Jun 23, 2020

michael-imbeault reopened this Jun 23, 2020

ihnorton self-assigned this Jun 25, 2020

ihnorton added a commit that referenced this issue Jun 26, 2020

Improve multi_index query/coords behavior

a9014bb

- Don't return coords for dense multi_index by default (#347) - Fix and test coords exclusion for sparse array queries

ihnorton added a commit that referenced this issue Jun 26, 2020

Improve multi_index query/coords behavior

ccc6bca

- Don't return coords for dense multi_index by default (#347) - Fix and test coords exclusion for sparse array queries

ihnorton mentioned this issue Jun 26, 2020

Improve multi_index query/coords behavior #353

Merged

ihnorton added a commit that referenced this issue Jun 26, 2020

Improve multi_index query/coords behavior

58c96a9

- Don't return coords for dense multi_index by default (#347) - Fix and test coords exclusion for sparse array queries

antalakas pushed a commit that referenced this issue Jul 6, 2020

Improve multi_index query/coords behavior

2017ad9

- Don't return coords for dense multi_index by default (#347) - Fix and test coords exclusion for sparse array queries

ihnorton added this to the 0.6.6 milestone Jul 21, 2020

ihnorton mentioned this issue Jul 21, 2020

Docs update: clarify, add missing functions, misc fixes #368

Merged
