Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CM] Improve CRUD and RPC interfaces #153527

Closed
26 tasks done
vadimkibana opened this issue Mar 23, 2023 · 12 comments · Fixed by #154150
Closed
26 tasks done

[CM] Improve CRUD and RPC interfaces #153527

vadimkibana opened this issue Mar 23, 2023 · 12 comments · Fixed by #154150
Assignees
Labels
Feature:Content Management User generated content (saved objects) management Team:SharedUX Team label for AppEx-SharedUX (formerly Global Experience)

Comments

@vadimkibana
Copy link
Contributor

vadimkibana commented Mar 23, 2023

We should improve the interfaces of Content Management CRUD layer and RPC. Currently, the CRUD layer is very permissive: it allows the storage layer to specify as its inputs and outputs for each CRUD method almost any interface, without any consistency between the methods. We should:

  • Provide strict interface for CRUD methods.
    • In ContentStorage interface
    • In ContentCrud class
    • In RPC
  • There should be one "shape" (interface) for content item data, which storage providers register, for a specific type. I.e. get and bulkGet (and other procedures) should return the content items in the same shape.
  • By procedure:
    • get — well defined content item interface that is returned, with ability to add more fields later.
    • bulkGet — should semantically be equivalent to get, with the only difference that it receives a list of IDs and returns a list of content items.
    • create — should return a content item, and return it same as get.
    • update — should be able to return the updated content item, same as get.
    • search
      • Search inputs should be well defined, e.g.:
        • Text to search by
        • Other filters
          • Tags list
        • Pagination options
          • Cursor
          • Limit of the number of content items to fetch
      • Search result should have a well defined output interface, which includes
        • Item list, same as in bulkGet
        • Pagination result interface
  • Remove all any, only leave unknown for the actual content item object.
    • No any types usage in CRUD request and response.
    • unknown used, but only for:
      • for actual content item data
      • optional options, which we pass through

The ContentStorage interface could look something like this:

image

@vadimkibana vadimkibana added Team:SharedUX Team label for AppEx-SharedUX (formerly Global Experience) Feature:Content Management User generated content (saved objects) management labels Mar 23, 2023
@sebelga
Copy link
Contributor

sebelga commented Mar 23, 2023

As mentioned in our previous meeting I would like to understand better the why we would put in place those restriction for the CRUD + search on a content provider which is directly tight to their plugin and UI. No CM UIs is consuming those.
So far it seems that the only reason for doing this is for our caching on the browser side and as I mentioned I think this should be an optional performance boost that team can opt in. Aside, TS is just TS. Anyone can trick it and return other object from the DB. If we really want to guarantee that get and create return the same storage item we should have a runtime validation in place and schemas.

it allows the storage layer to specify as its inputs and outputs for each CRUD method almost any interface

This is not true for the inputs. Those are strict and restrictive. Except for the data, options and query.

Remove all any, only leave unknown for the actual content item object.

As library authors we actually mean that any value can be returned. "any" is thus technically correct. If unknown is used it breaks with the Generic as "xxx generic doest no satisfy the constraint unknown". I've mentioned this to @Dosant in previous PR. If you find a way around this I am happy to hear about it.

These are my concerns:

  • We want to keep on boarding to CM as smooth as possible. For that reason we don't want to ask teams to change the interfaces returned by the storage layer and have to do (maybe large) refactors in their codebase to make it all work.
  • There is not a clear why we need to do it. What are we trying to gain from it? If it is the cache then it would be nice to have a benchmark of a current request (e.g. 50ms) that we want to lower to X ms.
  • If the reason is the cache we need to do more to be sure that we don't cache a wrong object back from an update just because TS said it was the same as the object returned by get (e.g. runtime validation)
  • The consumers need to use the rxjs or hook client to be able to benefit from the cache. Which is not the case currently. So optimising for it too early will not bring any value.
  • The logic to synchronising the cache can get pretty complex on the client side (e.g. after a search result loop over 100 of items and update the correct contentTypeId cache for each of them).
  • It can hut performance as maybe today an update to the DB returns the partial object (e.g. { foo: '123' }) but to accommodate for the full object now the DB has to do an extra fetch to get the full object ({ foo: '123', bar: true})

Let's keep this one in the backlog until all content is migrated and Serverless/BWC ready and we have a clear reason why to enforce a common DB schema on the content CRUD + search methods.

@Dosant
Copy link
Contributor

Dosant commented Mar 23, 2023

@sebelga, I don't think it is only the cache. For example, it seems that with the current approach we also won't be able to build a extractCommonFields like functionality which unlocks shared CM components and CM search layer ?

  • We discussed that extractCommonFields like functionality would be needed to map a user's object to a known to us common schema that we can use inside our code, like in the eventBus, logs, shared client components, etc.
    If we don't have the restrictions, I think we'd have to add extractCommonFields/extractContentItems for each crud method or ask to provide a single function which knows how to handle all outputs from each method. I don't think this is an acceptable API for us or consumers.

  • Without extractCommonFields like functionality we won't be able to smoothly support shared content management components with this API so things like <ContentPreview id="dashboard/123"/> won't be possible. Separate CM search layer is also not possible because we won't be able to extract common schema, correct?

  • Also I think that having this restrictions in place would really help with multi type search implementation . Currently we have to introduce custom toMSearchResult mapper (which is basically a custom extractCommonFields just for mSearch), but with the restrictions in place we could re-use a single common extractCommonFields mapper without introducing custom things just for msearch (that would work if we combine searchIds[] and bulkGet, so this is a larger discussion, but having these restrictions in place is a prereq)

Regarding the cache, I agree it wouldn't be easy on the client side, but I think it also doesn't have to be ideal either:

There is not a clear why we need to do it. What are we trying to gain from it? If it is the cache then it would be nice to have a benchmark of a current request (e.g. 50ms) that we want to lower to X ms.

  • Yes, more frequent cache hits because we can share cache between different crud operations, helping with optimistic updates or automatic updates after mutation would become possible. With the current approach we won't even be able to get a cache hit after navigating from a list of results to a detailed view.

The consumers need to use the rxjs or hook client to be able to benefit from the cache. Which is not the case currently. So optimising for it too early will not bring any value.

Promise would work if we increase staleTime. Currently it is 0. It won't automatically revalidate and update with promise, but we can get a cache hit.

Aside, TS is just TS. Anyone can trick it and return other object from the DB. If we really want to guarantee that get and create return the same storage item we should have a runtime validation in place and schemas.

I am struggling to see the problem here. If we have schemas - they will fail in runtime on schema validation. If we don't have schemas they will fail in runtime inside the business logic. It is a bug anyways that needs to be fixed by a team. If they tricked typescript, it is their fault.
It is a plan that they have schema as part of versioning helper, current? So I guess this would cover an edge-case with a corrupted database?

We want to keep on boarding to CM as smooth as possible. For that reason we don't want to ask teams to change the interfaces returned by the storage layer and have to do (maybe large) refactors in their codebase to make it all work.

I don't know how much of the overhead from onboarding perspective these restrictions create :(
But It also seems to be that having these restrictions will make it easier for backward compatibility? Instead of versioning each API output separately they can version ContentStorageItem once?

@sebelga
Copy link
Contributor

sebelga commented Mar 23, 2023

with the current approach we also won't be able to build a extractCommonFields like functionality which unlocks shared CM components and CM search layer

Why not? There is nothing that stops us to require it in the server (or browser) side registry.

// Teams decide to align all their interface
register({
  id: 'foo',
  msearch: {
    extractCommonFields: [{
      // teams have decided to unify **on their own decision** all their CRUD + search api
      // so they can declare a single handler here. We didn't enforce anything
      handler: mySuperHandlerThatWorksWithAllCrud,
      operations: ['*'], // optional. If not provided the default is ['*']
    }]
  }
});

// Teams prefer to keep their implementation as is
register({
  id: 'foo',
  msearch: {
    // teams prefer to keep their responses as is as it is simpler to expose 3 handler for the common fields.
    // We didn't enforce anything
    extractCommonFields: [{
      handler: getHandlerToCommonField,
      operations: ['get', 'bulkGet'],
    }, {
      handler: createUpdateHandlerToCommonField,
      operations: ['create', 'update'],
    }, {
      handler: searchHandlerToCommonField,
      operations: ['search'],
    }]
  }
});

Also I think that having this restrictions in place would really help with multi type search implementation. Currently we have to introduce custom toMSearchResult mapper

I always said that we need to have the extractCommonField in place. Nothing stops you require it as part of the registry. I think it should be called extractCommonFields and not toMSearchResult to align with our future vision (and maybe declared as above or other similar API that allows granularity). With the "operations" set to "search".

Yes, more frequent cache hits because we can share cache between different crud operations, helping with optimistic updates or automatic updates after mutation would become possible. With the current approach we won't even be able to get a cache hit after navigating from a list of results to a detailed view.

I reiterate my initial comment: we need to have an actual problem to solve before optimising pre-maturely. It is not worth risking the onboarding to CM (and Serverless effort) for this optimisation IMO.

I am struggling to see the problem here. If we have schemas - they will fail in runtime on schema validation.

The important part of your sentence is the "if". We don't enforce schemas to be provided anymore since we added support for BWC. It is the responsibility of each team to provide their schemas and run the validation inside each of their storage instance methods.

Fixing this issue implies that we would actually enforce them to provide a result schema for all their CRUD. That schema must be the same for get, bulkGet, create and update. So if any of their DB access does not comply with this schema we will throw. So to avoid that they will have to do a second call to their DB (get) and make sure the object returned satisfies the schema.
Oh.. and maybe some old SO that haven't been accessed in a while are missing one field. It didn't use to be a problem but now, if we do enforce and do runtime validation on anything that is returned by the DB, we will throw an error. A scenario that used to be OK (maybe the UI does not even need that left over field anymore) is not anymore OK.
All this risk for what benefit? The browser cache? Seems risky to me.

But It also seems to be that having these restrictions will make it easier for backward compatibility?

My point is: each team should be able to decide what makes their life easier. If they indeed return the same object in all their CRUD + search, they will have an easier life for BWC. Only 1 object to up/down transform. If they prefer to do it differently and version multiple objects, why do we want to enforce anything?

We can have recommendations, but I don't see (at this stage) the need to enforce anything.

@vadimkibana
Copy link
Contributor Author

vadimkibana commented Mar 24, 2023

[..] why we would put in place those restriction for the CRUD + search [..]

I don't think it is some heavy unjustified restriction. On contrary, I think it is expected that a CRUD API operates on top of some object type. For example:

  • create and read should return an object. It cannot be that they: potentially don't return an object.
  • And that object should be of the same type O, across the methods:
interface CRUD<O> {
  create: () => { obj: O }
  read: (id: string) => { obj: O }
}

// test:
obj = create()
obj == read(obj.id)
  • Similarly, read and readBulk should semantically work exactly the same, with the difference that the readBulk is an optimized version of read.
interface CRUD<O> {
  read: (id: string): { obj: O }
  readBulk: (ids: string[]): { objs: O[] }
}

// test:
[read(id)] == readBulk([id])

Most of the HTTP APIs you will see (Stripe, Shopify, GitHub, all GraphQL APIs) do exactly that: there is one schema of a content type and it is reused everywhere.

The reason why we need this is to be able to rely on this CRUD API to build any of the content management features. For example, we should be able to execute read method to fetch some content item to be able to render it, regardless of its type. Like in this <ContentIcon> React example:

image

@sebelga
Copy link
Contributor

sebelga commented Mar 27, 2023

we need this is to be able to rely on this CRUD API to build any of the content management features

Ok. I will not try to convince you more. But really, whatever the outcome of this issue is I would not do it before full BWC and Serverless preparation for all content is finished and fully tested.

We don't need it for our Component library as:

  • We will have the search layer
  • This component you mentioned are read only (no create)
  • We can ask then (when we actually need it) that content provider give us a handler to convert (if we don't have the search layer as the source of truth for common schema. Must simpler IMO, no need for all CRUDs of all content to change any of their logic. Also probably more performant (no double requests to the DB (1. post 2. fetch) to satisfy this hard requirement you want to put in place)

@vadimkibana
Copy link
Contributor Author

vadimkibana commented Mar 27, 2023

I don't see how search layer will help here, nor we will have it any time soon, if ever.

In my opinion a "CRUD" API which does not behave as a CRUD API (where all inputs and all outputs of every method can be anything) is not useful. It is tech debt, and onboarding any content on top of it is will just add more tech debt on top of that tech debt.

Whereas it should be trivial to simply return the same object shape per type in all CRUD methods that should return an object.

interface CRUD<Object> {
  get: () => Object;
  bulkGet: () => Object[];
  create: () => Object;
  // etc..
}

instead of:

interface NotReallyCRUD {
  get: () => any;
  bulkGet: () => any;
  create: () => any;
  // etc..
}

The BWC proposal started way before the Serverless project started. As part of it, the removal of the Saved Object client is not strictly necessary to support Zero Downtime Upgrades in Serverless. Instead, the safest and fastest way to support ZDU would be to build into the Saved Object client support of ZDU, for example, by using the object versioning library you have created.

Now, the main benefit we get from removing the Saved Object client is that the teams, who used it on the client side, will now have the chance to decouple their SO object schemas from the HTTP schema. It is exactly the opportunity for the teams to figure out what the object schema over the HTTP of their object should be. Create that one schema and use it in all HTTP responses.

It should not be that the CRUD methods do not provide any internal guarantees, there should be at least some consistency between the methods:

  • create and read should return an object. It cannot be that they: potentially don't return an object.
  • And that object should be of the same type O, across the methods:
interface CRUD<O> {
  create: () => { obj: O }
  read: (id: string) => { obj: O }
}

// test:
obj = create()
obj == read(obj.id)
  • Similarly, read and readBulk should semantically work exactly the same, with the difference that the readBulk is an optimized version of read.
interface CRUD<O> {
  read: (id: string): { obj: O }
  readBulk: (ids: string[]): { objs: O[] }
}

// test:
[read(id)] == readBulk([id])

@sebelga
Copy link
Contributor

sebelga commented Mar 27, 2023

In my opinion a "CRUD" API which does not behave as a CRUD API (where all inputs and all outputs of every method can be anything) is not useful.

It is not "anything". It is very well typed and I think that is what you are missing

Note how MapsUpdateOut returns SavedObjectsUpdateResponse. I don't see anything wrong with that. The only thing that their UI require to function is an id to be returned (https://github.com/elastic/kibana/pull/153304/files#diff-82f86b1752c3df9df3b133330ace1c140ba3e4d6268495a8990348c0be48e797R66).

We, as library author, provide a Generic that can return anything (the teams are then responsible to declare their schemas).

As far as I know we don't have any mandate to tell teams what they should be returning and how consistent the response should be across their CRUD. Specially when we are so close to Serverless and all the unknown that will come with it. We should not come at this stage with new hard requirements on top of the CM migration (and make onboarding more difficult).

It is exactly the opportunity for the teams to figure out what the object schema over the HTTP of their object should be.

Indeed. They have the opportunity to do it and we can give them guidance. But should not be forced to do it just because we think it would be a more (useful, semantically correct, nice...) API.

@vadimkibana
Copy link
Contributor Author

vadimkibana commented Mar 27, 2023

It is very well typed and I think that is what you are missing

I am not missing that. I am not talking about how it looks like from the Maps team perspective. They can cast any to the correct type and it is fine. I am talking about how it looks from our, Content Management, perspective:

  • What CRUD methods return is inconsistent. We have no idea what is in the responses.
  • Even if we knew what is in the responses, the shape of the returned objects should be consistent between CRUD methods.

As far as I know we don't have any mandate to tell teams what they should be returning and how consistent the response should be across their CRUD.

It is not "their CRUD", it is CRUD for the Content Management. It should have some basic consistency guarantees, so we can build Content Management features on top of it.

(Also, if they do create their own CRUD, I would argue they should indeed return objects of the same shape from create, read, update methods. Like we already do in Kibana, for example, the Data Views API.)

@sebelga
Copy link
Contributor

sebelga commented Mar 27, 2023

I am talking about how it looks from our, Content Management, perspective

From CM perspective....

// map
{
  name: string;
  description: string;
}

// dashboard
{
  title: string;
  text: string;
}

Two content type. Two completely different objects. Both are any (unknown) to us. But both have fields that mean the same thing. What we need from a CM perspective is a handler to convert to our CommonFields interface. It look like map in this example already adhere to it. Great for them. Dashboard will have to provide us with a handler to convert their interface.

It should have some basic consistency guarantees, so we can build Content Management features on top of it.

Let's then define the exact requirements to build the CM features on top of it.

  • Do we need a common schema to create, update items ? Do we have common/generic UI components that are going to create any type of content? Seems hard to imagine all SO to be the same shape. so NO
  • Do we need a common schema to delete an item? Probably not. A 200 response should be enough
  • Do we need a common schema for get, bulkGet, search ? YES. How can we solve that? Do we need server side refactors? UI code to be updated? additional DB access? NO. We simply need a handler to convert their "unknown" object (from CM point of view) to a "commonField" object we can rely on. And the best part is: no one needs to do anything today for that.

I will reiterate one more time: even if teams would try their best to have a common TS object returned, we would still need runtime validation to guarantee that the objects are indeed the same (with the risk to break Kibana if old SO don't adhere to the validation schema).
But maybe we don't actually need the full object to be the same, our requirements are only to have "id", "description", "title" ... Do we really need to make everyone's life harder for 3-5 fields we need for our reusable UI component?

@vadimkibana
Copy link
Contributor Author

vadimkibana commented Mar 27, 2023

We still need the handler to covert to CommonFields, the ask is not to return CommonFields. The ask is to return a consistent object shape across the methods within a type. (Obviously, across the content types the schema is different.)

I don't think it makes anyones life harder to return objects of the same shape from create, read, and update methods. Nor we should ever encourage anyone, or even give the possibility to return different shapes between those methods.

Having one shape of a content type is done exactly to simplify everything.

APIs we have in Kibana return the same shape of object from their respective create, read, and update calls. Be it Spaces, Data Views, Short URLs, Rules, Cases, etc..

Public APIs you will see on the net—Stripe, GitHub, Shopify, AWS, GCP, and more—all define a content type's shape once, and return it in the same shape from different API calls.

@sebelga
Copy link
Contributor

sebelga commented Mar 27, 2023

Agree with all that you say. It is a great recommendation for teams as it will make their life easier. We can show them those public APIs that adhere to this recommended way of doing a CRUD api.

For us, it doesn't really matter as we won't use it.

// The map team has decided to return `any` in Get, BulkGet, Search. What is CM going to do about this?
export class MapStorage extends ContentStorage<any> {
}

APIs we have in Kibana return the same shape of object from their respective create, read, and update calls.

I've shown you above that is not the case. update uses the SavedObjectsUpdateResponse . Which, as its name implies, is specific to Update op.

Public APIs you will see on the net—Stripe, GitHub, Shopify, AWS, GCP, and more—all define a content type's shape once, and return it in the same shape from different API calls.

Yes. 100% agree. I am not trying to say that is not a best practice to do so. I am simply saying that we don't have a mandate to enforce it right now without a compelling reason (e.g. a feature we have to build blocked by it).

Anyway, let's leave it here. Let's onboard all the content to CM and we'll retake the conversation then.

@vadimkibana
Copy link
Contributor Author

vadimkibana commented Mar 27, 2023

Agree with all that you say. It is a great recommendation for teams as it will make their life easier. We can show them those public APIs that adhere to this recommended way of doing a CRUD api.

So, if you agree and it will make their life easier, then there is no reason to not do it.

For us, it doesn't really matter as we won't use it.

For us it matter a lot. We don't care what is inside their content item object, for us it is unknown. But there should be some object, and that object should have the same type across CRUD methods.

(And, no, we don't need to runtime validate it. At least I don't see why we should.)

// The map team has decided to return `any` in Get, BulkGet, Search. What is CM going to do about this?
export class MapStorage extends ContentStorage<any> {
}
  1. Map should not put any there, but the type they will actually use.
  2. For CM it does not matter what the actual type is, for us any type is fine, as long as it is the same across CRUD methods.

(pseudocode:)

interface ContentStorage<T> {
  get: () => T,
  getBulk: () => T[],
  create: () => T,
  update: () => T,
  delete: () => { success: boolean },
  search: () => T[],
}

APIs we have in Kibana return the same shape of object from their respective create, read, and update calls.

I've shown you above that is not the case. update uses the SavedObjectsUpdateResponse . Which, as its name implies, is specific to Update op.

You literally showed your "update" method which needs to be fixed. And the reason we are removing the Saved Object client is to exactly to fix it - to decouple database schema from network schema. SavedObjectsUpdateResponse is the database response type, it should not be returned over the HTTP to browser.

Anyway, let's leave it here. Let's onboard all the content to CM and we'll retake the conversation then.

I don't think we can onboard anyone until it is fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Content Management User generated content (saved objects) management Team:SharedUX Team label for AppEx-SharedUX (formerly Global Experience)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants