Add consolidated structure family #668

Draft · wants to merge 24 commits into base: main
Conversation

@danielballan (Member) commented Feb 26, 2024

This builds on commits from #661 and should be merged after it. [Update: #661 is in, and this has been rebased.]

Problem statement

This PR is designed to solve the same problem that the pandas BlockManager [1][2] solves: presenting data in a flat namespace to the user, but enabling groups of items in that namespace to be transparently backed by shared data structures, for better performance.
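As a point of reference, here is a minimal illustration of the pandas behavior in question, peeking at the private _mgr attribute (version-dependent and shown only to ground the analogy):

import pandas

df = pandas.DataFrame(
    {
        "motor_readback": [1.0, 2.0, 3.0],   # float64
        "motor_setpoint": [1.1, 2.1, 3.1],   # float64
        "num_events": [10, 20, 30],          # int64
    }
)

# The user sees a flat namespace of three columns, but internally the two
# float64 columns are consolidated into one shared 2-D block.
for block in df._mgr.blocks:
    print(block.dtype, block.shape)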

For example, data coming from Bluesky includes data stored directly in the Event documents and large data written externally by detectors as arrays. The data in the Event documents is a great fit for tabular storage and transfer formats (e.g. Feather, Parquet, even CSV...). The externally-written data is not; it is better stored and transferred in large N-dimensional array formats like Zarr, HDF5, or a simple C-ordered buffer.

Users focused on science would like to smooth over these details. That is, we want to store and (often) move the data like this:

data
├── table
│   ├── motor_readback
│   ├── motor_setpoint
├── image_array

But we also want to offer a way to present it to the user in a flat namespace:

data
├── motor_readback
├── motor_setpoint
├── image_array

When writing (especially appending) the client will want to use the former view, so both views need to be available.

Solution

This PR adds a new structure family, union. The name is inspired by AwkwardArray UnionForm. It holds a heterogeneous mixture of structures (e.g. tables and arrays). It enables the columns of the table and the arrays to be explored from a flat namespace. Name collisions are forbidden. But it also describes the underlying structures individually, enabling them to be read or written separately.

To the user, this behaves much like a Container structure would:

In [2]: c['x']
Out[2]: <UnionClient {'A', 'B', 'C'}>

I can, for example, access fields by key and download data:

In [3]: c['x']['A']
Out[3]: <ArrayClient shape=(3,) chunks=((3,),) dtype=int64>

In [4]: c['x']['A'][:]
Out[4]: array([1, 2, 3])

In [5]: c['x']['C']
Out[5]: <ArrayClient shape=(5, 5) chunks=((5,), (5,)) dtype=float64>

In [6]: c['x']['C'][:]
Out[6]: 
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

Digging a little deeper, we can see a difference from Containers. The union shows that A and B are backed by a table (coincidentally named "table"; it could be named anything) while C is a standalone array.

In [8]: c['x'].contents  # The name `contents` is up for discussion...
Out[8]: <UnionContents {'table', 'C'}>

In [9]: c['x'].contents['table']
Out[9]: <DataFrameClient ['A', 'B']>

In [10]: c['x'].contents['C']
Out[10]: <ArrayClient shape=(5, 5) chunks=((5,), (5,)) dtype=float64>

In [11]: c['x'].contents['table'].read()
Out[11]: 
   A  B
0  1  4
1  2  5
2  3  6

In [12]: c['x'].contents['C'].read()
Out[12]: 
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

The structure of the union node reveals more detail; expand to view:

In [16]: from dataclasses import asdict

In [17]: asdict(c['x'].structure())
Out[17]: 
{'contents': [{'data_source_id': 1,
   'structure_family': 'table',
   'structure': {'arrow_schema': 'data:application/vnd.apache.arrow.file;base64,/////+gCAAAQAAAAAAAKAA4ABgAFAAgACgAAAAABBAAQAAAAAAAKAAwAAAAEAAgACgAAAEACAAAEAAAAAQAAAAwAAAAIAAwABAAIAAgAAAAIAAAAEAAAAAYAAABwYW5kYXMAAAkCAAB7ImluZGV4X2NvbHVtbnMiOiBbeyJraW5kIjogInJhbmdlIiwgIm5hbWUiOiBudWxsLCAic3RhcnQiOiAwLCAic3RvcCI6IDMsICJzdGVwIjogMX1dLCAiY29sdW1uX2luZGV4ZXMiOiBbeyJuYW1lIjogbnVsbCwgImZpZWxkX25hbWUiOiBudWxsLCAicGFuZGFzX3R5cGUiOiAidW5pY29kZSIsICJudW1weV90eXBlIjogIm9iamVjdCIsICJtZXRhZGF0YSI6IHsiZW5jb2RpbmciOiAiVVRGLTgifX1dLCAiY29sdW1ucyI6IFt7Im5hbWUiOiAiQSIsICJmaWVsZF9uYW1lIjogIkEiLCAicGFuZGFzX3R5cGUiOiAiaW50NjQiLCAibnVtcHlfdHlwZSI6ICJpbnQ2NCIsICJtZXRhZGF0YSI6IG51bGx9LCB7Im5hbWUiOiAiQiIsICJmaWVsZF9uYW1lIjogIkIiLCAicGFuZGFzX3R5cGUiOiAiaW50NjQiLCAibnVtcHlfdHlwZSI6ICJpbnQ2NCIsICJtZXRhZGF0YSI6IG51bGx9XSwgImNyZWF0b3IiOiB7ImxpYnJhcnkiOiAicHlhcnJvdyIsICJ2ZXJzaW9uIjogIjE0LjAuMiJ9LCAicGFuZGFzX3ZlcnNpb24iOiAiMi4wLjMifQAAAAIAAABEAAAABAAAANT///8AAAECEAAAABQAAAAEAAAAAAAAAAEAAABCAAAAxP///wAAAAFAAAAAEAAUAAgABgAHAAwAAAAQABAAAAAAAAECEAAAABwAAAAEAAAAAAAAAAEAAABBAAAACAAMAAgABwAIAAAAAAAAAUAAAAA=',
    'npartitions': 1,
    'columns': ['A', 'B'],
    'resizable': False},
   'name': 'table'},
  {'data_source_id': 2,
   'structure_family': 'array',
   'structure': {'data_type': {'endianness': 'little',
     'kind': 'f',
     'itemsize': 8},
    'chunks': [[5], [5]],
    'shape': [5, 5],
    'dims': None,
    'resizable': False},
   'name': 'C'}],
 'all_keys': ['A', 'B', 'C']}

Unlike a container, the union structure always describes its full contents inline. It does not support paginating through its contents. It is not designed to scale beyond ~1000 fields.

This script shows how the union was constructed. Code like this will rarely be user-facing; envision it wrapped in a utility that consumes Bluesky documents and writes and registers the relevant data into Tiled.

import numpy
import pandas

from tiled.client import from_profile
from tiled.structures.array import ArrayStructure
from tiled.structures.data_source import DataSource
from tiled.structures.table import TableStructure

c = from_profile("local", api_key="secret")

df = pandas.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
arr = numpy.ones((5, 5))

s1 = TableStructure.from_pandas(df)
s2 = ArrayStructure.from_array(arr)
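# Create one union node backed by two data sources: a table and an array.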
x = c.create_union(
    [
        DataSource(structure_family="table", structure=s1, name="table"),
        DataSource(structure_family="array", structure=s2, name="C"),
    ],
    key="x",
)
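# Write data into each component of the union node.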
x.contents["table"].write(df)
x.contents["C"].write(arr)

The requests look like:

INFO:     127.0.0.1:59404 - "POST /api/v1/metadata/ HTTP/1.1" 200 OK
INFO:     127.0.0.1:59404 - "PUT /api/v1/table/full/x?data_source=table HTTP/1.1" 200 OK
INFO:     127.0.0.1:59404 - "PUT /api/v1/array/full/x?data_source=C HTTP/1.1" 200 OK

The query parameter ?data_source={name} is used to address a specific component backing the node.
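For instance, reading just the tabular component over raw HTTP might look like the sketch below. The localhost URL, API key header, and CSV format parameter are illustrative assumptions, not something specified by this PR; the Python client normally handles this for you.

import httpx

# Fetch only the "table" component backing the union node "x".
response = httpx.get(
    "http://localhost:8000/api/v1/table/full/x",
    params={"data_source": "table", "format": "text/csv"},
    headers={"Authorization": "Apikey secret"},
)
response.raise_for_status()
print(response.text)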

Review of abstraction levels

  1. Everything is just a node, and we blissfully ignore anything about data sources.
     c['x']['A']
  2. We look at data source names denoting how fields are grouped in the underlying storage, but we still ignore everything about storage formats and other storage details.
     c['x'].contents
     c['x'].structure()
  3. We look at low-level storage details, encoded in DataSource and Asset.
     c['x'].data_sources()

To Do

  • Implement read() on UnionClient (i.e. c['x']) itself, which could pull each data source in turn and return an xarray.Dataset. (A rough sketch follows after this list.)
  • Implement GET /union/full/{path} to enable bulk download. This will work similarly to container export.
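
A rough, non-authoritative sketch of the first to-do, written as a free function. The iteration over contents, the structure_family attribute, and the per-part read() behavior are assumptions about the client API in this PR, and the dimension names are invented:

import xarray

def read_union(union_client):
    """Pull each part of a union node and assemble an xarray.Dataset (sketch)."""
    data_vars = {}
    for name in union_client.contents:
        part = union_client.contents[name]
        if part.structure_family == "table":
            df = part.read()
            for column in df.columns:
                data_vars[column] = ("row", df[column].values)
        else:  # treat any non-table part as an N-dimensional array
            arr = part.read()
            dims = tuple(f"{name}_dim{i}" for i in range(arr.ndim))
            data_vars[name] = (dims, arr)
    return xarray.Dataset(data_vars)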

Footnotes

  1. https://uwekorn.com/2020/05/24/the-one-pandas-internal.html

  2. https://wesmckinney.com/blog/a-roadmap-for-rich-scientific-data-structures-in-python/

@padraic-shafer (Contributor)

I like the features of this PR.

I wonder whether we can bring some naming consistency to the behaviors of:

  • contents: c['x'].contents returns the names of the Union's data_sources
  • data_sources(): c['x'].data_sources() returns file-level details of the data_sources backing the Union's members
  • data_source: URL parameter containing the name of a Union member data_source, as in PUT /api/v1/table/full/x?data_source=table

On the other hand, this might already be as simple as it gets, and I just need a minute to get comfortable with the usage. :)

@padraic-shafer (Contributor)

  • contents: c['x'].contents returns the names of the Union's data_sources

This is perhaps a bit too reductive a statement, as this really contains structures and named sources and can be iterated through.

@padraic-shafer (Contributor)

Even though the contents map 1:1 to a data_source, it might be cleaner to not use them interchangeably. I will refer to the contents here as parts for brevity and to avoid possible confusion with Container. I'm not attached to the name parts.

Then the above could be used something like this...maybe?

client['x'].parts   # <UnionParts {'table', 'C'}>
client['x'].data_sources()   # low-level storage details, encoded in DataSource and Asset
client['x']['C'].data_sources()   # one-element list, or would data_source() be better?

PUT /api/v1/array/full/x?part=C ...BODY  # Refer to the union member, rather than its data source

@danielballan (Member, Author) commented Feb 26, 2024

I like that suggestion very well, and I like the name part.

In the future (months away) I hazily foresee enabling Tiled to track multiple versions of data:

  • replicas stored "on prem" or in the cloud
  • copies with different file formats and/or chunking to make a range of use cases fast

This is why data_sources() is a one-element list, in anticipation of there being more than one someday, and wanting to leave room for that.

But, this also underlines why separate part and data_source could be important: they happen to be 1:1 today but may not always be.

@danielballan (Member, Author)

I also like that giving a distinct name to this concept helps clarify which abstraction level you are operating at. Referring to a part moves you from (1) to (2) but until you mention a data_source you have not crossed into (3).

@danielballan force-pushed the union branch 2 times, most recently from 7910421 to a407021 on February 27, 2024 18:24
@danielballan (Member, Author) commented Feb 27, 2024

Rebased on main after merging #661. The renaming of data_source to part in the places discussed has been done.

The to-dos...

Implement read() on UnionClient (i.e. c['x']) itself, which could pull each data source in turn and return an xarray.Dataset.
Implement GET /union/full/{path} to enable bulk download. This will work similarly to container export.

...are, I believe, strictly additive and could be done in separate PRs or in this PR.

I would at least like to validate this branch by connecting it to a Bluesky document stream before merging.

@padraic-shafer (Contributor) left a comment

This is going to be very useful.

I've noted a few comments/questions. Additionally, should this 404 in PUT /awkward/full be handled by passing StructureFamily.awkward to SecureEntry, as you've updated for the other routes?

tiled/server/router.py (lines 1328 to 1338 in a407021):

@router.put("/awkward/full/{path:path}")
async def put_awkward_full(
    request: Request,
    entry=SecureEntry(scopes=["write:data"]),
    deserialization_registry=Depends(get_deserialization_registry),
):
    body = await request.body()
    if entry.structure_family != StructureFamily.awkward:
        raise HTTPException(
            status_code=404, detail="This route is not applicable to this node."
        )
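
If SecureEntry grew a structure-family check, the route could shrink to something like the sketch below (reusing the imports and helpers already present in router.py). The keyword name structure_families is an assumption, not necessarily what this PR implements:

@router.put("/awkward/full/{path:path}")
async def put_awkward_full(
    request: Request,
    # Sketch: let the dependency reject non-awkward nodes up front, so the
    # route body no longer needs its own 404 check.
    entry=SecureEntry(
        scopes=["write:data"], structure_families={StructureFamily.awkward}
    ),
    deserialization_registry=Depends(get_deserialization_registry),
):
    body = await request.body()
    ...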

@dylanmcreynolds (Contributor) left a comment

Just a random thought on naming. What you're building sounds very close to a view in SQL terminology...something that acts like a flat table but is backed by querying a subset of fields from one or more joined tables.

@dylanmcreynolds (Contributor) left a comment

And to expand on that thought...would it be useful to have a view rather than a union? If I know ahead of time that there will be cases during data analysis that will require 100 of the 1000 fields available, maybe I could define the view with just those fields, and when pulling them out of tiled avoid marshalling the 900 unused fields.

@danielballan (Member, Author)

Rethinking this in terms of a "view" is very interesting. Off the top of my head, I like that:

  • A "view" is a widely-recognized concept and may require less explanation.
  • This seems like it might address a separate issue noticed by @genematx: in the current data model, data and timestamps are allowed to be uneven in length. (And, in fact, in the middle of an update, they always will be.) If there were one table, with data and timestamps as "views" into subsets of it, that would fix the problem.

@danielballan (Member, Author)

For example, maybe this is what an event stream could look like. Notice that:

  • The layout and URLs are backward compatible with what we have been doing. It simply adds a new key, __name_me__ (needs a good name...).
  • It ensures that data and timestamps are the same length because they are views on the same table.
  • It exposes the "real" structures to the client, but gives a flattened view of them too.
primary
├── data  # view
│   ├── time
│   ├── motor_readback
│   ├── motor_setpoint
│   └── image_array
├── timestamps  # view
│   ├── time
│   ├── motor_readback
│   ├── motor_setpoint
│   └── image_array
├── __name_me__
│   ├── event_table  # values stream inline in Event documents
│   │   ├── time
│   │   ├── data_motor_readback
│   │   ├── data_motor_setpoint
│   │   ├── timestamps_motor_readback
│   │   └──  timestamps_motor_setpoint
│   └── image_array  # externally-written array data

@danielballan (Member, Author)

Either path we take, union or view, would be speculative. We have to add this and really try it to understand how it flies. There is some risk either way.

View, in addition to being a more widely recognized concept, could solve a broader category of problems for us. BlueskyRun is very nested, and this can mean a lot of clicks in the UI. I can imagine a view (or views) of the good parts, flat. Views could be combined with containers to create a nested-but-not-THAT-nested structure. (Dare I call it a "projection"?)

I think we should keep the specification light and focused on current requirements, but I can see a lot of useful scope in this direction. Maybe I’ll start by opening a parallel PR for comparison that builds on this branch but refactors union into view.

@danielballan (Member, Author)

This is why we keep @dylanmcreynolds around. :-)

@dylanmcreynolds (Contributor)

Could views be a better place to put Databroker's projections and projectors? To sum up, projections are a way to add a mapping to the start document. Projectors are Python functions that take a run and its projection and return an xarray with data fields mapped as specified in the projection. The major idea was to have a simple way to create multiple views from the same run. One projection could be customized for a user interface; another could be customized for a particular ontology (like NeXus).

If we took this view idea even further, the definition of the view could also include information about mapping to a desired ontology.
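
To make the idea concrete, here is a purely illustrative sketch; this is not Databroker's actual projection schema or API, and the names, mapping format, and access pattern are invented:

import xarray

def apply_projection(run, projection):
    """Map (stream, field) pairs from a run-like object to new names (sketch)."""
    data_vars = {}
    for out_name, (stream, field) in projection.items():
        arr = run[stream]["data"][field].read()
        dims = tuple(f"dim_{i}" for i in range(arr.ndim))
        data_vars[out_name] = (dims, arr)
    return xarray.Dataset(data_vars)

# One projection tailored to a UI, another to an ontology such as NeXus.
ui_projection = {"temperature": ("primary", "lakeshore_temp")}
nexus_projection = {"/entry/sample/temperature": ("primary", "lakeshore_temp")}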

if (catalog_adapter.structure_family == StructureFamily.union) and len(
    segments[i:]
) == 1:
    # All the segments but the final segment, segments[-1], resolves
@padraic-shafer (Contributor) commented Mar 3, 2024

Suggested change:
- # All the segments but the final segment, segments[-1], resolves
+ # All segments except the final segment, segments[-1],

@padraic-shafer (Contributor)

I think we should keep the specification light and focused on current requirements,

That sounds prudent.

... but I can see a lot of useful scope in this direction.

Do you envision that views might evolve to include keys from other nodes (ancestors, siblings, 2nd-cousin-once-removed), or is that something that should be firmly disallowed? I can imagine complications arising from access policy as well as latency/timeouts from trying to include too many keys.

@danielballan (Member, Author) commented Mar 3, 2024

I think there will be significant pressure to enable views that reach anywhere across the tree:

  • Mix raw data and analyzed
  • Experiment with alternative views outside of the main tree without making it "noisy"
  • Probably more….

The specification of a union node involves listing data sources directly. If the specification of a view involves instead referencing other nodes, I think that access control is manageable, and the scaling can be managed if we enforce reasonable limits on the number of nodes allowed in one view.

@padraic-shafer (Contributor) commented Mar 3, 2024

It ensures that data and timestamps are the same length because they are views on the same table.

More generally, would merged views only work if all parts have the same length (number of rows)?...and if so would it enforce that by:

  • rejecting data sources that don't meet that condition? --OR--
  • returning a table with a number of rows equal to the shortest part (like Python's zip())? --OR--
  • returning a table with a number of rows equal to the longest part (like Python's itertools.zip_longest())? The filler value could be None, numpy.nan, "", or probably a user-supplied value --OR--
  • any of the above, depending on a query parameter passed by the caller?

Or should views require a key to join upon, using merge behavior such as LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, INNER JOIN?
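
For reference, a small pandas sketch (not part of this PR) of what the shortest-part and longest-part options would mean if parts were joined on the row index:

import pandas

table = pandas.DataFrame({"motor_readback": [1.0, 2.0, 3.0]})
images = pandas.DataFrame({"image_id": [101, 102]})  # one row short

# Shortest-part behavior (like zip): INNER JOIN on the row index.
print(table.join(images, how="inner"))

# Longest-part behavior (like itertools.zip_longest): OUTER JOIN, padded with NaN.
print(table.join(images, how="outer"))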

@padraic-shafer (Contributor)

Could views be a better place to put Databroker's projections and projectors?

So rather than injecting that info into the run documents, you're suggesting to instead let tiled handle that when the data gets accessed? That makes a lot of sense.

Run documents would be more "pure"--less coupled to how someone thought they should be viewed when they were recorded. When new views get dreamed up, they could be added to the tiled server config--restarting the server (or maybe registering the new view with the catalog) would allow that view to be applied to all data new and old.

@dylanmcreynolds (Contributor)

So rather than injecting that info into the run documents, you're suggesting to instead let tiled handle that when the data gets accessed? That makes a lot of sense.

Maybe. The projections schema was added to the run start document so that they could be the default projection for a particular run. If a newer version were available, the projector code could use it if asked to.

But I only know of one case where projection/projectors were used since they were developed four years ago. Maybe that's a sign? Perhaps the issue is they weren't needed much, perhaps they weren't advertised well, or perhaps the mechanism was too complicated.

I think there will be significant pressure to enable views that reach anywhere across the tree.

That's an interesting thought. I feel that if it kept the scope in check, I'd be happy to say that a view was limited to objects of the same row/timestamp. We could call it RowView and, if we decided we need something more flexible in the future, come up with a SuperView with extra powers?

@danielballan (Member, Author) commented Mar 4, 2024

To fit our present use case, we would need a view to look a lot like union, mixing tables and arrays in one namespace. There would be no special guarantees about the relation between the items in the namespace, nothing about their length or how to join them. (It goes without saying that each constituent table would internally have the normal guarantee that it is made of whole rows.)

I think the change from union is that the parts in a view would have their own canonical locations in the tree, as normal nodes. A view becomes an additional place to get (mixtures of…) nodes. Each view looks a lot like a union would have, but its parts are pointers to first-class nodes, not to captive data sources. This enables us to separately build multiple views on the same data. And it avoids placing the canonical data in a weird structure that would require explanation (union).


As far as "projections" goes, I like that this presents clients with the result rather than instructions (in a novel descriptive language…) for rearranging a structure on the client side.

Yes, one can imagine constructing views dynamically through special adapters added to the server config—I think this is what @padraic-shafer's last message envisions. For our present requirements I would start, though, by building them as static entities. The client specifies, via an HTTP request, "Add a view node to the database that combines array node X and columns A, B from table node Y into one namespace and present it to clients."
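
For illustration only, such a request body might look roughly like this. Every field name below is invented; nothing here is specified by this PR:

# Hypothetical JSON payload for "create a static view node".
view_document = {
    "structure_family": "view",
    "metadata": {"purpose": "flat namespace over one scan"},
    "parts": [
        # Whole array node X, referenced by its canonical path in the tree.
        {"node": "/raw/scan_42/image_array", "keys": None},
        # Selected columns A and B from table node Y.
        {"node": "/raw/scan_42/events", "keys": ["A", "B"]},
    ],
}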

@padraic-shafer (Contributor)

The client specifies, via an HTTP request, "Add a view node to the database that combines array node X and columns A, B from table node Y into one namespace and present it to clients."

I think this Union PR didn't yet add the capability to read the combined result at once, right? So what do we think should happen when the outer dimension of array X differs from the length of table Y? (or equivalently when event_table has more rows than image_array has images in the earlier example?)

I might be quibbling about edge cases. But I wonder about how 'edge'y these cases are. We could of course enforce this when the view node (meaning the outer node, not the projection node) is created, and then wait to see if it runs into issues during testing.

@danielballan (Member, Author)

That's an interesting thought. I feel that if it kept the scope in check, I'd be happy to say that a view was limited to objects of the same row/timestamp. We could call it RowView and, if we decided we need something more flexible in the future, come up with a SuperView with extra powers?

For our present requirements we can keep this pretty limited, and spin out further discussion on whether and how to expand it once we have something to play with.

danielballan added a commit to danielballan/tiled that referenced this pull request Mar 12, 2024
Sort enum members.

Move structure_family to the end, matching migration result.

Add union structure.

Sort

Refactor get_adapter to accept optional data_source_id.

Creating a union node works.

Return correct union structure.

Forgot to commit modules

Validate data source consistency.

Test mixing tables and arrays.

Writing a table into a union node works.

GET with '?data_source=<name>' works.

Use name in filepath, instead of random hex.

Expose list of all keys in structure.

Only set include_data_sources param if not default (false).

Refactor link-writing into separate module.

Writing and reading tables works

Writing and reading arrays works.

Implement single-key access.

Only specify include_data_sources if not default.

Rename contents -> parts.

Clarify precedence

Co-authored-by: Padraic Shafer <76011594+padraic-shafer@users.noreply.github.com>

Copyedit comment

Co-authored-by: Padraic Shafer <76011594+padraic-shafer@users.noreply.github.com>

Finish consolidating structure family check

Add comment
@danielballan (Member, Author)

TO DO:

  • Rename union to consolidated
  • More tests
  • Update docs, especially structure.md

@@ -653,7 +657,7 @@ def new(
         item["attributes"]["metadata"] = document.pop("metadata")
     # Ditto for structure
     if "structure" in document:
-        item["attributes"]["structure"] = STRUCTURE_TYPES[structure_family](
+        structure = STRUCTURE_TYPES[structure_family].from_json(
             document.pop("structure")
         )

Contributor

Should we also check (structure_family != StructureFamily.consolidated) in line 670 below?

Member Author

Good catch. Actually, I think we should do the opposite: remove the special-case for StructureFamily.container. That was left from when self._structure for containers was sometimes None. This PR adds a ContainerStructure class, so now all client objects have a structure.

@danielballan changed the title from "Add union structure family" to "Add consolidated structure family" on Dec 14, 2024
Rename and Refactor Consolidated Structure
@danielballan (Member, Author)

We are reopening the naming discussion for this. We considered union, but that was determined to be actively confusing. We then considered consolidated, but have started to question whether the collision with Bluesky's "Consolidator" is unhelpful. (They are related but not the same concept.) @genematx recently suggested:

  • fusion
  • amalgam
  • composite

The first two are a bit too jargon-y for my taste, but I think composite is a common-enough word and an accurate description of what this is. Anyone else want to weigh in?

@padraic-shafer (Contributor)

composite is my top choice.

@genematx (Contributor) commented Dec 17, 2024

After further discussion with @danielballan, we realized that registering all tables and arrays as separate Tiled nodes under their parent container (as done currently) has advantages that could be valuable for the new consolidated object as well. Importantly, this would allow us to reuse all existing client infrastructure for writing and manipulating the data and address their individual data sources more easily (instead of assigning all data sources to the "consolidated" parent node). The new consolidated/composite/flattened object will define and use its own structure, but in most cases can be thought of as a special kind of container (either directly subclassed or not -- TBD) and implement most of the methods found in the container API (e.g. for write_array).

The characteristics distinguishing the new structure from the usual container would be:

  • its awareness of the flattened namespace, which includes all columns from all tables plus all array names -- this will be returned by the /search endpoint
  • the ability to address columns and arrays directly in the path, e.g. c['x']['col1'] -- /x/col1 or c['x']['arr1'] -- /x/arr1, assuming c['x'] is the new Consolidated client instance and 'col1' is a column in one of its child tables. This is the expected way for clients to access the data while remaining agnostic of the internal container structure (i.e. which table contains column 'col1').
  • support for accessing entire tables (and arrays) as 'parts' of the structure through query parameters, e.g. /x?part=table1 or /x?part=arr1; this is a more efficient way to write/append data.
  • certain restrictions on the children (to ensure no name conflicts)
  • possibly restrictions on the size of the namespace (number of columns and arrays), and whether or not to allow pagination -- TBD

@genematx will investigate this option.

@danielballan (Member, Author)

Notes from discussion with @genematx and @tacaswell

/x/a
/x/b
/x/my_img

/x?part=my_img
/x?part=my_table

# Add a child to the consolidated structure x
POST /metadata/x?part=my_table

# Update metadata on a child node to the consolidated structure
PUT|PATCH /metadata/x?part=my_table

# Append rows (e.g. Bluesky events) to a table
PUT|PATCH /table/partition/x?part=my_table&partition=0

'https://.../api/v1/metadata/x?part=a'  # array in consolidated

>>> c.create_container('x').write_array([1,2,3], key='a').uri
'https://.../api/v1/metadata/x/a'  # array in container 

>>> c.create_consolidated('y').write_array([1,2,3], key='a').uri
'https://.../api/v1/metadata/y?part=a'  # array in consolidated

>>> c.create_container('x').write_dataframe({'j': [1,2,3]}, key='b').uri
'https://.../api/v1/metadata/x/b'  # table in container 

>>> c.create_consolidated('y').write_dataframe({'j': [1,2,3]}, key='b').uri
'https://.../api/v1/metadata/y?part=b'  # table in consolidated

>>> c['y']['j']
'https://.../api/v1/metadata/y/j'  # array (column) in consolidated

>>> c['y'].parts['b']
'https://.../api/v1/metadata/y?part=b'  # table in consolidated

'https://.../api/v1/metadata/y/b'  # 404!

>>> c['y'].structure()
{
    "flat_keys": ["a", "b", "img"],
    "contents": [ {"id": "...", "metadata": ..., "specs": ..., "structure": ...} ]
}
