Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No clear way to handle sorting with versioned object #180

Open
ScopeyNZ opened this issue Sep 17, 2018 · 17 comments
Open

No clear way to handle sorting with versioned object #180

ScopeyNZ opened this issue Sep 17, 2018 · 17 comments

Comments

@ScopeyNZ
Copy link
Contributor

ScopeyNZ commented Sep 17, 2018

Overview

Sorting with versioned on has_many and many_many through is not predictable and probably hasn't really been properly considered. There are workarounds for common use cases like sorting of blocks in elemental, but no core support.

Acceptance Criteria

  • Works with has_many and many_many through
  • Previous sort order can be previewed for a certain timestamp (which usually correlates with a version of the parent record)
  • Changes to draft sort order can be previewed before they're published
  • Supports rollbacks to older version of parent record if those relationships are owned (should already be done through Rollback should include all owned records #94)
  • Does not show deleted records in archive mode (should already be done through RFC Archive mode should include all versioned records  #110)

Out of Scope

  • Write new parent record versions when child is edited
  • Strong relational integrity (MyRelationVersion in addition to MyRelationID). Previewing of older versions and rollbacks are based on closest version for this timestamp, not following references in a relational database.

Notes

Context

Problem

Currently when you sort a versioned object you may do one of three things to update the Sort column on a versioned DataObject:

1) Always use ->write (Every object gets a new version)

Update the object you are sorting with ->write(), then update all objects in between the new and old positions (exclusive) by looping and also using ->write.

Observations:

  • This is slow - If you have 1000 elements and move position 1000 to position 1 you've got 1000 writes.
  • You get a draft version of the object you updated and a draft version of all objects in between - not great for usability
  • Reverting to a version is ambiguous. When reverting, you'll have to revert some other objects too, but there's no way of determining which ones

2) Use ->write only on the re-ordered object (Only that object gets a version)

Update the object you are sorting with ->write(), update all other objects with an UPDATE query.

Note: Elemental does this (in version 4 of elemental) and CMS already does this with SiteTree (although it still loops the objects and runs individual updates - I can raise an issue for that)

Here's a pretty detailed example of this option:

Consider pages numbered 1 to 5 (1 2 3 4 5). Move page 4 to after 1 (1 4 2 3 5). The sort on the published pages will be (1 3 4 4 5). It might make more sense if I list the live sort order on each of the objects at the end:

Object Sort (Live) Sort (Draft)
1 1 1
2 3 3
3 4 4
4 4 2
5 5 5

All the objects other than 4 (the one that was actually moved) have been updated on Live to make way for the sort order of 4 that will be published, but it's not published yet!

Another example:

Initial list: A B C D E F G H I J
Move I to beginning: I A B C D E F G H J
Move B to end: I A C D E F G H J B
Move H to after A: I A H C D E F G J B
Move I again: A H C D E I F G J B
Final sort values (only moved objects have been updated):

Letter Sort (Live) Comment
A 1
B 10
C 3
D 4
E 5
F 6 F was never moved - still has original sort, but it should be 7th
G 7 Same situation as F, should be 8th
H 3 H was moved to 3, even though it should be at 2 now
I 6 Clashes with F now
J 10 J was never moved, now has same order as B, should be 9th

Observations:

  • Well, it doesn't really work as you can see... But...
  • This is better for performance. You always have 3 queries (->write on the moved object and a multi-row update on the draft and live tables of the object)
  • Only the moved object has a draft version
  • After moving and before publishing, you have some ambiguous sort as detailed above. Any record with the same "sort" will return in an ambiguous order - especially with PostgreSQL.
  • Reverting an object that has been moved doesn't really work, because the order of the other versioned objects isn't corrected. You have a similar problem to the previous observation. This could be an "easy" fix though - you can treat the revert like a move and re-order all the elements between the old and new positions again.

3) Direct update everything (No versioning)

Update all objects with direct queries

  • It works but you get no versioning of sort order.

Solution

I don't know.

Here are some ideas I can think of:

We advocate a solution

From simplest to hardest:

  1. We go with option 3. But saying "oh btw sorting isn't versioned" is pretty hard from a UX POV and from a "features of Versioned" POV.
  2. We go with option 2. We would need to implement some way to keep sorting consistent when rolling back - perhaps we just shuffle everything around the rolled back object similar to what happens when we move an object.

We write some code/extension that handles re-ordering collections of versioned object

Some notes on this:

  • We have an API for ordering ($page->moveAfter($pageId)?)
  • DataObjects that implement this won't have a "sort" column
  • We store the sort against the relationship owner somehow (might have to specify in code who the "owner" is)
  • When re-ordering we can write to just where-ever the sort is held but when re-ordering and editing there will be two writes.
@ScopeyNZ

This comment has been minimized.

@chillu
Copy link
Member

chillu commented Oct 9, 2018

Great write up! A few clarifications:

This is assuming has_may, right? A many_many can be a versioned relationship (many many through), presumably it doesn't have the same issues?

update all other objects with an UPDATE query.

Could you detail what the query looks like? Sorry, too lazy to go hunting for it. Only found some logic in BaseElement->Pos() (https://github.com/dnadesign/silverstripe-elemental/blob/93f736fb9c68e2fe29d9f6514026b02cefaf1ade/src/Models/BaseElement.php#L904)

After moving and before publishing, you have some ambiguous sort. Consider 5 pages labelled 1 to 5. Move page 4 to after 1 (1 4 2 3 5). The sort on the published pages is (1 2 3 4 4) - pages 4 and 5 have the same sort.

I don't understand why changing the sort order on a draft record without publishing anything would change sort orders on any published pages?

We go with option 2, but we don't update the sort of anything until it's published - This would make it very hard to show draft sort order

Isn't this just a matter of SELECT Sort FROM BaseElement_Live vs. SELECT Sort FROM BaseElement?

@ScopeyNZ
Copy link
Contributor Author

ScopeyNZ commented Oct 9, 2018

This is assuming has_may, right? A many_many can be a versioned relationship (many many through), presumably it doesn't have the same issues?

You're right I didn't really consider many_many. If you versioned your sort through many_many_through you would still have the same problem. If you moved one object, you would have to update the sort on each "join" object, and now you have to choose when to publish each one of those. This would be even more confusing UX wise (I imagine) because the actual objects won't have any changes so I'm not sure how the user would know to publish their changed sort orders.

Could you detail what the query looks like? Sorry, too lazy to go hunting for it. Only found some logic in BaseElement->Pos() (dnadesign/silverstripe-elemental:src/Models/BaseElement.php@93f736f#L904)

Sure. You can find all objects between the old and new position of the moved object and either increment or decrement the sort order of them all. Consider items numbers 1 to 5 (1 2 3 4 5). Move 4 to the first position: (4 1 2 3 5). Before you move it, find all objects from 1st to 3rd positions (inclusive) and because we're moving block 4 to a higher position we must increment all of those (1 2 3 all get +1).
Query: UPDATE table SET Sort = Sort + 1 WHERE Sort >= 1 AND Sort <= 3
Then: $object->Sort = 1; $object->write().

Here's a snippet from a PR in elemental for seeing it in code: https://github.com/dnadesign/silverstripe-elemental/blob/034e100826fd4f349bd30cc4c62b8534d7f73b25/src/Services/ReorderElements.php#L63-L84

I don't understand why changing the sort order on a draft record without publishing anything would change sort orders on any published pages?

That example was the idea where we only create a version for the moved object, meaning that other objects that will have their sort orders affected will be updated on the live version. Referring back to the example:

Move page 4 to after 1 (1 4 2 3 5). The sort on the published pages is (1 2 3 4 4) - pages 4 and 5 have the same sort.

It's actually incorrect, the sort on the published pages will be (1 3 4 4 5). It might make more sense if I list the live sort order on each of the objects at the end:

Object Sort
1 1
2 3
3 4
4 4
5 5

All the objects other than 4 (the one that was actually moved) have been updated to make way for the sort order of 4 that will be published, but it's not published yet!

Isn't this just a matter of SELECT Sort FROM BaseElement_Live vs. SELECT Sort FROM BaseElement?

If option 2 was chosen, but we don't update sort orders of live blocks until publish time then we'd have to calculate what the updated order of non-moved blocks would be. Another example:

Initial list: A B C D E F G H I J
Move I to beginning: I A B C D E F G H J
Move B to end: I A C D E F G H J B
Move H to after A: I A H C D E F G J B
Move I again: A H C D E I F G J B
Final sort values (only moved objects have been updated):

Letter Sort Comment
A 1
B 10
C 3
D 4
E 5
F 6 F was never moved - still has original sort, but it should be 7th
G 7 Same situation as F, should be 8th
H 3 H was moved to 3, even though it should be at 2 now
I 6 Clashes with F now
J 10 J was never moved, now has same order as B, should be 9th

Working through this example I'm realising that we can't update sort of everything else on publish, because there's a LOT of ambiguity when you have many sorts one after the other.

Sorry!

@chrispenny

This comment has been minimized.

@chrispenny

This comment has been minimized.

@ScopeyNZ
Copy link
Contributor Author

ScopeyNZ commented Dec 3, 2018

I've unassigned @micmania1 because the work he did was captured in other issues & PRs. This issue really refers to the high-level how is sorting supposed to be managed with versioned objects.

I think the issue got a bit carried away - perhaps the title should be more descriptive.

@ScopeyNZ ScopeyNZ changed the title Sorting isn't versioned correctly No clear way to handle sorting with versioned object Dec 3, 2018
@ScopeyNZ
Copy link
Contributor Author

ScopeyNZ commented Jan 18, 2019

Actually because it got a bit side tracked I'm going to re-raise the issue if that's okay with you @chillu?

I'd really like the goal of this ticket to be a solution to this problem statement:

I want to sort a versioned object that belongs to a parent object, and for it to make sense and work out of the box.

Currently all our instances of this already (SiteTree, BaseElement and File) all do it their own way - and they're all flawed.

This came up again in community Slack related to elemental.

@ScopeyNZ
Copy link
Contributor Author

ScopeyNZ commented Jan 25, 2019

Okay in the end I've just hidden @chrispenny's comments. That was some stellar investigation and we've identified this as a pretty big issue in #195.

I've updated the OP with a few examples and some possible solutions. I'm not sure about the actual impact of this because it's been like this since the dawn of versioning.

I've tagged this as needing feedback from @silverstripe/core-team because hopefully we can at least get consensus on a potential solution...

@sminnee
Copy link
Member

sminnee commented Jan 25, 2019

I'd say that in the first instance we want to close off #195 as best we can without this, acknowledge any limitations, and merge it into 4 so we can at least bank some improvement before getting lost in this one.

Secondly, it occurs to me that what we might need to do is create "the ordering of a set of siblings" as a separate piece of data for the purposes of versioning.

So you have, say, SiteTree_SortVersion as a separate table. When you change the order of some items, the Sort values of all the sibling items get logged in this table rather than as any version changes of those siblings. It ends up being managed in a similar way to how we're managing other related data objects.

Then, when you need to get historic data of SiteTree, you join SiteTree_SortVersion and grab the Sort value form there, rather than using SiteTree_Versions.Sort.

It's just a high-level sketch but this might be more robust?

To allow for sufficient backward compatibility, I would probably retain the existence of the SiteTree_Versions.Sort field as-is, and maybe have a build task or dev/build book to build out the initial version of the SiteTree_SortVersion dataset.

This would probably require some special knowledge added to the system to say "oh this field is for sorting". The improved behaviour would only apply if this feature was activated. We could enable it for SiteTree and blocks, but others would need to add it manually. As long as the system worked with its current bugs if you didn't do that, that seems fine to introduce in a minor release.

@ScopeyNZ
Copy link
Contributor Author

I'd say that in the first instance we want to close off #195 as best we can without this

Yes to be clear I consider this an entirely different issue. #195 is about tracking changes across an relationship graph. This is about sorting individual objects and only involves siblings when talking about how they need to be updated when moving an object around. Mention of "parents" is just as a potential solution. I don't think we should relate this to capturing changes in the "relationship graph" - #195.

So you have, say, SiteTree_SortVersion as a separate table

Having Sort on a different table isn't a bad idea from my PoV but I think it still makes more sense to have the sort against a parent - If you think about moving a block around you don't really expect to look at the history of that block to see it's movement within a list, you'd look at the page (or "elemental area") to see how things were re-ordered.

The problem with this is that there's nothing that's versioned that's a "parent" to SiteTree - SiteTree's only have a "parent" when subsites is installed. You almost need a new dataobject SiteTreeSort with SiteTreeSort_Live.

I would probably retain the existence of the SiteTree_Versions.Sort field as-is

This kind of scares me. I think you're right so far as preventing a fatal on trying to update non-existent fields - I'm sure there's plenty of code in the wild that's UPDATE SiteTree SET sort = ?. It'd be nice to hook into onBeforeWrite and catch updates to Sort in that way though.

I still am thinking of this like an extension that needs to be configured in some way:

SilverStripe\CMS\Model\SiteTree
  extensions:
    ordering: SilverStripe\Versioned\Sortable
  sort:
    [some config here...]

And then you can:

$page = SiteTree::get()->byId(1);
$page->moveAfter(SiteTree::get()->byId(9)) // or just ->moveAfter(9)

@chillu
Copy link
Member

chillu commented Jan 29, 2019

So there's four kinds of sort (assuming versioned objects on both sides of the relationship):

  • Sort Variation 1: many_many_extraFields: Legacy, should ignore
  • Sort Variation 2: has_many sort (e.g. ElementArea has_many Element, SiteTree with parents)
  • Sort Variation 3: Sort without relationship (e.g. toplevel SiteTree)
  • Sort Variation 4: many_many_through: Recommended for versioning many_many, has same issue. Conceptually the same as Sort Variation 2

Only Sort Variation 3 actually requires sorts to be stored on the record itself, because there's no relationship owner on the other side. We've somewhat abused SiteConfig for "toplevel SiteTree" config before (view/edit permissions), and it's already a hard dependency on silverstripe/cms.

I'm a bit worried about table explosion if we introduce <sorted-record>_SortVersion for everything. SiteTree_Sort and Element_Sort are just the tip of the iceberg. We'd also get GalleryItem_Sort, Link_Sort, RelatedDocument_Sort, potentially dozens of tables on a large install. Not only because the schema gets unwieldy, but because that's all stuff we need to join on to generate already expensive rollbacks and history views (#195 won't solve that).

How about making Sort a property on the owner of the sorted relationship. So a new ElementalArea.ElementSorts column, which contains a map of element identifiers to their sort order. That column is written on draft and live stages by whatever modifies the relationship (usually GridField), and updates the full list of siblings. We'd take away the ability to just call $myRecord->update(['Sort' => 2])->write(). This would also address the fact that records can have more than one sort property since they can be part of more than one has_many relationship. This field would need some serialisation, and support for in-memory sorting on the ORM level. That's really not ideal for paging through long sibling lists (e.g. entries under a "blog holder"), or limiting nodes like in our tree view. So ... a non-starter.

Aside: We could make this column a JSON type, but that would require MySQL 5.7 (see discussion). And you'd need MySQL 8.0 for proper JSON table functions to do the required virtual table joins, so it's a non starter as well. We could just support sorting on Postgres? ;)

I'm really tempted to just say "sort isn't versioned". We've spent an enormous amount of time on versioning edge cases in the last years, and it's exponentially increased the complexity both core committers and the average dev has to deal with. We spend weeks across a dozen people just discussing edge cases before we're in a position to implement anything. Would unversioned sorts really be so bad for author experience? /cc @clarkepaul

@sminnee
Copy link
Member

sminnee commented Jan 29, 2019

How about making Sort a property on the owner of the sorted relationship. So a new ElementalArea.ElementSorts column, which contains a map of element identifiers to their sort order.

So you're recommending an array field over a separate table? How confident are we that this will make faster rather than slower queries? I would expect a join to a separate table to be at least as fast, with appropriate indexes, and more consistently supported across database backends.

Other than "separate table vs array field" we're largely recommending the same thing.

The other thing I would say is that you can think of the sorting as being optionally "partitioned by" another field.

So SiteTree is sorted by Sort, partitioned by ParentID.
Element is sorted by Sort, partitioned by ElementAreaID.
Something else might by sorted by Sort and not partitioned.

ParentID = 0 is a legal 'partition' in that regard.

Would unversioned sorts really be so bad for author experience? /cc @clarkepaul

So we're saying:

  • You can't see reliable historical sort orders, either in the CMS or when previewing the archived site
  • You can't roll back to older versions and expect the sort orders to be reliable.

Not ideal, but doesn't seem like a showstopper. I wouldn't close as wontfix but maybe not put it on the must-have list for 4.4?

@chillu
Copy link
Member

chillu commented Jan 29, 2019

I like framing this as optional partitioning, makes a lot of sense.

"separate table vs. array field": My first concern is amount of tables. I've seen projects with 450 tables, and think SilverStripe is already pretty insane in terms of getting your head around the data model of your site as an average developer. It also increases dev/build times (more to inspect and adjust). Speed of array fields depends on database implementation - I'd wager it's just as fast as joins in Postgres, and could be slower in MySQL. it's a non-starter anyway, since we'd need to bump MySQL support to 8.0 for that. It's definitely slower than joins if we do this in-memory. Can we get a way with one or two tables for sort versioning? SortVersion and SortVersion_Live, with cols for ObjectClass etc? Am I the only one that's freaked out by projects with hundreds of database tables? :D

To clarify, an "unversioned sort" implementation really means "synced sort between stages". So when you change sort of an item, it directly writes every draft and every live record in the same partition without creating a new version. This introduces new issues with rollback though: You'd have to reset all items in the same partition to the date point in time (closest version). That might be just as time intensive to implement as actually fixing sort versioning in the first place. Aaaaargh.

@sminnee
Copy link
Member

sminnee commented Jan 30, 2019

Can we get a way with one or two tables for sort versioning? SortVersion and SortVersion_Live, with cols for ObjectClass etc?

If ObjectClass was an enum included in the index that would probably be fine.

Am I the only one that's freaked out by projects with hundreds of database tables?

It's probably worth clarifying that actual downsides of it, beyond heebie-jeebies. I've tended to recognise that heebie-jeebies can be a sign that something is problematic, but equally that something is merely unusual.

@tractorcow
Copy link
Contributor

tractorcow commented Jan 31, 2019

I had a similar problem with fluent localisation; The issue was I needed the ability to whitelist which fields needed localisation (or in this case, versioning) and not localise others.

Similarly, I had separate tables, except that the _Localised table contained only columns requiring localisation. Those missing fields were joined from the base table on every query.

Perhaps what versioning needs is a way to whitelist which fields are versioned, conditionally joining the base table when loading / lazy-loading fields that are not? This solution would suit problems other than simply sorting (e.g. 'ViewCount' may not need versioning).

In other words, Sort is not versioned, and wouldn't even be a column on _Live rows.

We've essentially dropped the concept of on live but not on stage so I'm not worried about missing joins (stage is source of truth).

I prefer joining existing tables (and trimming the complexity of one of them) much more than adding yet another table to the schema.

@chillu
Copy link
Member

chillu commented Mar 3, 2019

Flagging that we've spent literally hundreds of hours on versioning behaviour in the Open Sourcerers team in the last three months, so from my team's perspective I'm going to park this. It's a long standing issue, which can be remedied from an author perspective by simply publishing everything to get it back into a consistent state.

@sminnee
Copy link
Member

sminnee commented Mar 3, 2019

Would it be worth noting the quirk in our user and/or dev docs?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants