Custom Field, Custom Object & Remote Object Data Architecture #1142

charlesBochet · 2023-08-09T15:58:03Z

charlesBochet
Aug 9, 2023
Maintainer

/!\ THIS IS NOT UP TO DATE - WE WENT WITH 1SCHEMA / TENANT ARCHITECTURE IN THE END /!\

Custom Field, Custom Object & Remote Object Data Architecture

Owners: @Weiko @charlesBochet

Requirements

(1) Multi-tenant
(2) Allow a workspace to have its own GqlSchema
(3) Allow customFields and customObjects by workspace
(4) Allow remote datasource by workspace
(5) Unified permission system
(6) Performant queries
(7) Enforce data integrity
(8) Scalable

Definitions

Standard object: Object that is available for all Twenty users. Ex: Company / Person
Standard field: Field of a standard object that is available for all Twenty users. Ex: Company.displayName
Custom Object: Object that is available only in a specific workspace, potentially for a subset of workspaceMembers. To simplify here, we will consider that it’s available to all workspaceMembers of the workspace. We can add a permission layer on top of that later.
- Custom objects have fields; there is no point in calling them customFields, as custom objects have no standardFields
Remote Object: Object that is available only in a specific workspace. Its fields are stored in a remote data source that has been added to Twenty.
- Remote objects have fields; same remark as above
- Note: we can imagine adding customFields on a remote object and storing them in Twenty; this would be a mix of Custom and Remote objects

Scenario

Our goal is to provide a unified GraphQL schema where users can query standard objects, custom fields, custom objects and remote objects.

Let’s imagine the following scenario:

person standard object
- id + firstname + lastname standard fields
- friendliness custom field
- Note: plural of person is people
client custom object
- id + name fields
- organization field that is a relation to organization (see below)
organization remote object
- id + displayName field

This corresponds to the following gql unified model:

{
	// standardObject with standardField and customField stored in Twenty DB
	person: { id: string, firstname: string, lastname: string, friendliness: number, __type: 'Person' }
	// customObject with its fields stored in Twenty DB
	client: { id: string, name: string, organization: { displayName: string }, __type: 'Client' }
	// remoteObject with its fields stored in a remote Postgres DB
	organization: { id: string, displayName: string, __type: 'Organization' }
}
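A minimal TypeScript sketch of the unified model above, assuming every record carries the `__type` discriminator shown in the sample. The interface names and the `describeRecord` helper are hypothetical; they only illustrate how a client could narrow on `__type` and are not taken from the Twenty codebase.

```typescript
// Hypothetical typed view of the unified model (names are illustrative).
interface Organization {
  __type: 'Organization';
  id: string;
  displayName: string;
}

interface Client {
  __type: 'Client';
  id: string;
  name: string;
  organization: Organization | null; // relation to a remote object
}

interface Person {
  __type: 'Person';
  id: string;
  firstname: string;
  lastname: string;
  friendliness: number; // custom field, exposed like a standard one
}

type UnifiedRecord = Person | Client | Organization;

// Narrow on the discriminator to handle each kind of object.
function describeRecord(record: UnifiedRecord): string {
  switch (record.__type) {
    case 'Person':
      return `${record.firstname} ${record.lastname} (friendliness ${record.friendliness})`;
    case 'Client':
      return `${record.name} @ ${record.organization?.displayName ?? 'no organization'}`;
    case 'Organization':
      return record.displayName;
  }
}
```

From the client's point of view, nothing distinguishes the custom field `friendliness` from a standard field, which is the point of the unified schema.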

High Level schema

We have a gateway that’s exposing a /graphql endpoint and a /meta endpoint.

The /graphql endpoint is used to query Twenty schema and remote schema through a unified /graphql schema.
The /meta endpoint is used to manage the Twenty schema, which will be served by a cache (ElastiCache, Redis, …)

Multitenancy

All databases except Remote datasources (obviously) are multi-tenant. The Gateway is also multi-tenant. (1) is met.

Note: If we have a big customer who wants to have complete isolation, we can still host a multi tenant system as a single tenant for this customer. This type of customer can also use self-hosting.

Note: ALL data models have a workspaceId. This enables data scoping, and later sharding.
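As a sketch of what scoping by workspaceId could look like, the hypothetical helper below always puts the workspace condition first and only then appends caller-supplied equality filters. The function name, the `"workspaceId"` column name and the `$n` placeholder style (as used by common Postgres clients) are assumptions for illustration.

```typescript
interface ScopedQuery {
  text: string;
  values: unknown[];
}

const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Build a parameterized SELECT that can never omit the workspace condition.
function scopeToWorkspace(
  table: string,
  workspaceId: string,
  where: Record<string, unknown> = {},
): ScopedQuery {
  if (!IDENTIFIER.test(table)) throw new Error(`Invalid table name: ${table}`);
  const values: unknown[] = [workspaceId];
  const conditions = ['"workspaceId" = $1'];
  for (const column of Object.keys(where)) {
    if (!IDENTIFIER.test(column)) throw new Error(`Invalid column name: ${column}`);
    values.push(where[column]);
    conditions.push(`"${column}" = $${values.length}`);
  }
  return {
    text: `SELECT * FROM "${table}" WHERE ${conditions.join(' AND ')}`,
    values,
  };
}
```

Routing every read through one helper like this keeps the scoping rule in a single place, which also makes later sharding on the same key easier to reason about.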

Custom Field, Custom Object & Remote Object

Data structure is stored in Metadata tables (data_sources, object_metas, field_metas).

Values are stored in either standard objects, custom objects or remote datasource.

Custom Field on a standard object

This will be stored in the JSON customFields column.

Data records sample:

person
	id: 'person-1'
	firstname: 'Jean'
	lastname: 'Dujardin'
	customFields: { friendliness: 100}
	workspaceId='workspace-1'

person
	id: 'person-2'
	firstname: 'John'
	lastname: 'FromTheGarden'
	customFields: { friendliness: 90}
	workspaceId='workspace-1'

Metadata

field_meta
	id='field-1'
	objectId='standard-person'
	type='number'
	name='friendliness'
	workspaceId='workspace-1'
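To illustrate how the metadata and the JSON column fit together, here is a small sketch that exposes the custom fields declared in field_meta as top-level properties of a person record. The type names and the `flattenCustomFields` helper are hypothetical; only the record shapes come from the samples above.

```typescript
interface FieldMeta {
  id: string;
  objectId: string;
  type: 'number' | 'text';
  name: string;
  workspaceId: string;
}

interface PersonRow {
  id: string;
  firstname: string;
  lastname: string;
  customFields: Record<string, unknown>;
  workspaceId: string;
}

// Expose declared custom fields next to the standard columns.
// Keys present in the JSON but absent from the metadata are ignored.
function flattenCustomFields(
  row: PersonRow,
  fieldMetas: FieldMeta[],
): Record<string, unknown> {
  const { customFields, ...standard } = row;
  const flattened: Record<string, unknown> = { ...standard };
  for (const meta of fieldMetas) {
    if (meta.workspaceId !== row.workspaceId) continue;
    if (meta.objectId !== 'standard-person') continue;
    flattened[meta.name] = customFields[meta.name] ?? null;
  }
  return flattened;
}
```

With the 'person-1' sample and the 'field-1' metadata, the result carries `friendliness` at the top level and no longer has a `customFields` key.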

Custom Object

This will be stored in a custom_objects table.

Data records sample:

custom_object
	id: 'custom-object-1'
	objectId: 'object-1'
	fields: { name: 'Michel', organization: 'organization-1'}
	workspaceId='workspace-1'

custom_object
	id: 'custom-object-2'
	objectId: 'object-1'
	fields: { name: 'Michel Michel', organization: 'organization-1'}
	workspaceId='workspace-1'

custom_object
	id: 'custom-object-3'
	objectId: 'object-1'
	fields: { name: 'Jacques', organization: null}
	workspaceId='workspace-1'

Metadata:

object_meta
	id: 'object-1'
	dataSource: 'twenty'
	name: 'client'
	workspaceId='workspace-1'

field_meta
	id: 'field-2'
	objectId: 'object-1'
	type: 'text'
	name: 'name'
	workspaceId='workspace-1'

field_meta
	id: 'field-3'
	objectId: 'object-1'
	type: 'relation'
	name: 'organization'
	metadata: {
		target: {
			sourceId: 'data-source-1' # defined in remote object section below
			objectId: 'object-2' # defined in remote object section below
			foreignKey: 'id'
		}
	}
	workspaceId='workspace-1'
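A possible way to read custom objects back, sketched over in-memory arrays: look up the object_meta by name, then keep the custom_object rows that point at it and expose their JSON `fields` as regular properties. `listCustomObjects` and the interface names are hypothetical.

```typescript
interface ObjectMeta {
  id: string;
  dataSource: string;
  name: string;
  workspaceId: string;
}

interface CustomObjectRow {
  id: string;
  objectId: string;
  fields: Record<string, unknown>;
  workspaceId: string;
}

// Resolve e.g. 'client' to its metadata, then return the matching rows
// with their JSON fields exposed as top-level properties.
function listCustomObjects(
  objectName: string,
  workspaceId: string,
  objectMetas: ObjectMeta[],
  rows: CustomObjectRow[],
): Record<string, unknown>[] {
  const meta = objectMetas.filter(
    (candidate) => candidate.name === objectName && candidate.workspaceId === workspaceId,
  )[0];
  if (meta === undefined) throw new Error(`Unknown custom object "${objectName}"`);
  return rows
    .filter((row) => row.objectId === meta.id && row.workspaceId === workspaceId)
    .map((row) => ({ ...row.fields, id: row.id }));
}
```

The relation field (`organization`) comes back as the raw foreign key here; resolving it against the remote source is a separate step.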

Remote Object

Remote objects are read from remote sources.

Data records sample:

organization # this is a row in a remote postgres table
	id: 'organization-1'
	displayName: 'Qonto'
	otherField: 'bla'

organization # this is a row in a remote postgres table
	id: 'organization-2'
	displayName: 'Airbus'
	otherField: 'bla'

Metadata:

object_meta
	id: 'object-2'
	dataSource: 'data-source-1'
	name: 'organization'
	workspaceId='workspace-1'

field_meta
	id: 'field-4'
	objectId: 'object-2'
	type: 'text'
	name: 'displayName'
	workspaceId='workspace-1'

data_source
	id: 'data-source-1'
	location: 'postgres://xxx@xxx:xxx:5432/xxx' # don't store credentials in plain obviously
	type: 'postgres'
	name: 'Prod DB'
	workspaceId='workspace-1'
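The `dataSource` value on object_meta is what tells the gateway where an object lives. A minimal sketch of that lookup, assuming the literal 'twenty' denotes the local database as in the custom object sample (the `resolveStorage` helper and the `StorageTarget` type are hypothetical, and the connection location is deliberately left out):

```typescript
interface DataSource {
  id: string;
  type: 'postgres';
  name: string;
  workspaceId: string;
}

interface ObjectMetaRef {
  id: string;
  dataSource: string;
  name: string;
  workspaceId: string;
}

type StorageTarget =
  | { kind: 'twenty' }
  | { kind: 'remote'; dataSource: DataSource };

// Decide whether an object is read from Twenty's DB or from a remote source.
function resolveStorage(
  objectMeta: ObjectMetaRef,
  dataSources: DataSource[],
): StorageTarget {
  if (objectMeta.dataSource === 'twenty') return { kind: 'twenty' };
  const source = dataSources.filter(
    (candidate) =>
      candidate.id === objectMeta.dataSource &&
      candidate.workspaceId === objectMeta.workspaceId,
  )[0];
  if (source === undefined) {
    throw new Error(`No data source "${objectMeta.dataSource}" in workspace ${objectMeta.workspaceId}`);
  }
  return { kind: 'remote', dataSource: source };
}
```

Matching on workspaceId as well as id prevents one workspace's metadata from ever resolving to another workspace's data source.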

Metadata update and Unified GraphQL schema

To add a remote datasource, a customField or a customObject, the workspaceMember can query the /meta API. No migration is performed on any database.

(2) Metadata is updated accordingly
(3) Then, a GraphQL schema is computed based on the metadata and stored in a GQL cache for later use
(4) As customFields are stored as JSON, we will update all objects accordingly. For example, if we delete a customField from a customObject, we will go through all instances of that customObject and update the customFields JSON accordingly
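Step (4) can be sketched as a pure function over the affected rows: since no migration is run, deleting a custom field means rewriting the JSON of every record that carries it. `removeCustomField` is a hypothetical name, and a real implementation would presumably do this in batches or directly in SQL.

```typescript
interface RowWithCustomFields {
  id: string;
  customFields: Record<string, unknown>;
}

// Return copies of the rows whose customFields JSON no longer contains fieldName.
// The input rows are left untouched.
function removeCustomField<T extends RowWithCustomFields>(
  rows: T[],
  fieldName: string,
): T[] {
  return rows.map((row) => {
    const remaining: Record<string, unknown> = {};
    for (const key of Object.keys(row.customFields)) {
      if (key !== fieldName) remaining[key] = row.customFields[key];
    }
    return { ...row, customFields: remaining };
  });
}
```

The cost is linear in the number of records of the object, which is the price paid for avoiding schema migrations.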

GQL cache would look like:

{
	person: { id: string, firstname: string, lastname: string, friendliness: number, __type: 'Person' }
	client: { id: string, name: string, organization: { displayName: string }, __type: 'Client' }
	organization: { id: string, displayName: string, __type: 'Organization' }
}
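One way to picture the "computed once, then cached" behaviour is the sketch below, which turns object definitions into GraphQL SDL and memoizes the result per workspace. The in-process object stands in for the real cache (ElastiCache, Redis, …); the function names, the text/number to String/Float mapping and the implicit `id: ID!` field are assumptions.

```typescript
type MetaFieldType = 'text' | 'number';

interface ObjectDefinition {
  name: string;
  fields: { name: string; type: MetaFieldType }[];
}

const GRAPHQL_SCALARS: Record<MetaFieldType, string> = {
  text: 'String',
  number: 'Float',
};

// Render one GraphQL type per object described in the metadata.
function buildTypeDefs(objects: ObjectDefinition[]): string {
  return objects
    .map((object) => {
      const typeName = object.name.charAt(0).toUpperCase() + object.name.slice(1);
      const fieldLines = object.fields.map(
        (field) => `  ${field.name}: ${GRAPHQL_SCALARS[field.type]}`,
      );
      return [`type ${typeName} {`, '  id: ID!'].concat(fieldLines, ['}']).join('\n');
    })
    .join('\n\n');
}

// Stand-in for the GQL cache, keyed by workspaceId.
const typeDefsByWorkspace: Record<string, string> = {};

function getTypeDefs(
  workspaceId: string,
  loadObjects: () => ObjectDefinition[],
): string {
  const cached = typeDefsByWorkspace[workspaceId];
  if (cached !== undefined) return cached;
  const typeDefs = buildTypeDefs(loadObjects());
  typeDefsByWorkspace[workspaceId] = typeDefs;
  return typeDefs;
}
```

A call to /meta that changes the metadata would need to drop the cached entry for that workspace so the next query recomputes it.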

Querying the data

Let’s take a complex query example directly:

{
	people(where: { friendliness: { gte: 50 }}) { # people is person plural
		firstname
		lastname
		friendliness
		company {
			id
		}
	}
	clients(where: { organization: { displayName: { equal: "Qonto" }}}) {
		id
		name
		organization { displayName }
	}
}

This query goes through the QueryResolver:
Naive approach:

Check Schema (2)
1. Get workspaceId from schema
2. Load GQL schema from cache
3. Make sure the query matches the schema
Fetch Metadata and keep it in memory (3)
Resolve first field (people)
1. Person is a standard object, no need to query metadata ⇒ SELECT FROM people
2. where condition check:
  1. Friendliness is not a standard field ⇒ query metadata ⇒ number field ⇒ where condition on customFields JSON
3. select
  1. firstname + lastname fields are standard field ⇒ select on columns
  2. friendliness is not a standard field ⇒ metadata number field ⇒ select on customFields JSON
  3. company is a standard object ⇒ call companyResolver
4. Perform query (5)
Resolve second field (clients)
1. Client is not a standard object, need to query metadata ⇒ SELECT FROM custom_objects where object_id = 'object-1'
2. where condition check:
  1. organization is a remote object ⇒ need to do a subquery with SELECT id FROM organization where displayName = 'Qonto'
  2. perform subquery (4)
  3. rewrite where condition to be organization: {IN: [subquery.ids]}
3. SELECT
  1. id, name from fields (json)
    1. perform query (5)
  2. organization (remote object)
    1. perform subquery (4)
  3. merge results
Merge results and return

This nested logic should be handled in a nice way with graphql resolvers.

(This solves (2), (3), (4) requirements)
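The "standard column vs. customFields JSON" branch in the where-condition step could look like the sketch below. It assumes Postgres, a JSON column named `"customFields"`, and `$1`-style placeholders; `buildPersonCondition` and the operator table are hypothetical, and only the `gte` and `equal` operators from the example query are covered.

```typescript
type FilterOperator = 'gte' | 'equal';
type CustomFieldKind = 'number' | 'text';

interface SqlFragment {
  text: string;
  values: (string | number)[];
}

const STANDARD_PERSON_FIELDS = ['id', 'firstname', 'lastname'];
const SQL_OPERATORS: Record<FilterOperator, string> = { gte: '>=', equal: '=' };
const FIELD_NAME = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Translate one GraphQL filter on person into a parameterized SQL condition.
function buildPersonCondition(
  field: string,
  operator: FilterOperator,
  value: string | number,
  customFieldKinds: Record<string, CustomFieldKind>,
): SqlFragment {
  if (!FIELD_NAME.test(field)) throw new Error(`Invalid field name: ${field}`);
  const sqlOperator = SQL_OPERATORS[operator];
  // Standard field: filter directly on the column.
  if (STANDARD_PERSON_FIELDS.indexOf(field) !== -1) {
    return { text: `"${field}" ${sqlOperator} $1`, values: [value] };
  }
  // Custom field: look up its type in metadata, then filter on the JSON value.
  const kind = customFieldKinds[field];
  if (kind === undefined) throw new Error(`Unknown field: ${field}`);
  const extracted = `"customFields"->>'${field}'`;
  const operand = kind === 'number' ? `(${extracted})::numeric` : extracted;
  return { text: `${operand} ${sqlOperator} $1`, values: [value] };
}
```

For the `people` filter of the example, this yields a condition on `("customFields"->>'friendliness')::numeric`, since `->>` returns text and the metadata says the field is a number.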

Data integrity

Access control
1. Twenty will be served as multi-tenant for most of its users, and all tenants will use the same DB. We need to make sure that access to their data is secure and that data cannot be read or altered across tenants (workspaces).
  1. We are storing a workspaceId on all of our objects/tables. It will be used as a safeguard on every request, making sure that the tenant (workspace) members making the request have access to the objects they are querying
2. We are also allowing tenants to connect their remote DB, with READ access only.
  1. We will need to make sure those credentials are not leaked and are properly encrypted
3. To prevent unauthorized access to the database, it is important to add permissions at the Postgres level in addition to the application level. This will ensure that even if someone gains access to the database, they will not be able to bypass application-level security measures.
Validation
1. We can leverage standardFields typing and customFields metadata type. We will have a validator for each type (number, date, etc) contained in the JSON.
Inconsistency detection and correction
1. We will treat all custom objects and custom fields as soft dependencies and modifications of those entities should have a minimum impact.
  [To detail later]
2. As for the schema changes implied by modifications of custom entities, we can version the schema so that clients can still query the old API; a client will then have to force a refetch when it detects a version mismatch.
  [To detail later, seems overcomplicated and might not work]
Backup and recovery
1. AWS backups
2. Versioning?
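The per-type validators mentioned under Validation could be sketched as follows. The set of types (number, text, date), the error message format and the `validateCustomFields` name are assumptions; dates are assumed to be stored as strings that `Date.parse` accepts.

```typescript
type ValidatedType = 'number' | 'text' | 'date';

// One validator per metadata type.
const validators: Record<ValidatedType, (value: unknown) => boolean> = {
  number: (value) => typeof value === 'number' && isFinite(value),
  text: (value) => typeof value === 'string',
  date: (value) => typeof value === 'string' && !isNaN(Date.parse(value)),
};

// Check a customFields JSON payload against the declared field types.
// Returns a list of human-readable errors; an empty list means valid.
function validateCustomFields(
  customFields: Record<string, unknown>,
  fieldTypes: Record<string, ValidatedType>,
): string[] {
  const errors: string[] = [];
  for (const key of Object.keys(customFields)) {
    const type = fieldTypes[key];
    if (type === undefined) {
      errors.push(`${key}: not declared in metadata`);
    } else if (!validators[type](customFields[key])) {
      errors.push(`${key}: expected ${type}`);
    }
  }
  return errors;
}
```

Running this on write keeps the JSON column consistent with field_meta even though the database itself cannot enforce the types.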

Scalable

We can use horizontal sharding on workspace_id. Nothing hard here!
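As an illustration only, routing by workspace_id can be as simple as hashing the id onto a fixed number of shards. The hash below is an arbitrary polynomial string hash chosen for the sketch, and `shardFor` is a hypothetical name; a fixed modulo also means that changing the shard count moves most workspaces, which a real setup would have to handle (for example with a lookup table or consistent hashing).

```typescript
// Map a workspace id to a shard index in [0, shardCount).
function shardFor(workspaceId: string, shardCount: number): number {
  if (shardCount < 1 || Math.floor(shardCount) !== shardCount) {
    throw new Error('shardCount must be a positive integer');
  }
  let hash = 0;
  for (let i = 0; i < workspaceId.length; i++) {
    // Keep the running hash within unsigned 32-bit range.
    hash = (hash * 31 + workspaceId.charCodeAt(i)) >>> 0;
  }
  return hash % shardCount;
}
```

Because every table carries a workspaceId, all rows of a workspace land on the same shard and queries never need to cross shards.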

Performances discussion

We can get optimal performance on Standard objects and standard fields.

Standard fields vs custom fields

Custom fields are stored as JSON. As we don’t know the shape of this JSON in advance, we cannot index it up front. This means that filtering or sorting by a custom field is slower. If the number of entities is reasonable (<100k), this should still be doable with an acceptable response time for the user. Filtering is ~O(n)

Custom Objects

Custom object query performance should be similar to custom field query performance. Joins can be costly and should be avoided at read time.

Remote Objects

As long as the remote object's foreignKey is indexed, we should have acceptable performance for the user, except for joins.

Complex example: filter PipelineProgresses on Clients > Organization > DisplayName

Let’s imagine that our standard object PipelineProgress has a customObjectId targeting a client (custom object). We want to display the PipelineProgresses (PP) whose client has an organization with a displayName containing ‘Qo’

Query o=Organization by displayName: ~O(1). [ex: Org=1000 orgs match ‘Qo’]
Fetch c=Clients that have the corresponding organization ids ~O(c * Org) [ex: Cli=10000 clients match]
Fetch pp=PipelineProgresses on customObjectId target ~O(Cli)

This should be done in ~O(c * Org) overall
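The three steps can be written out over in-memory arrays to make the data flow concrete. This is a sketch with hypothetical type and function names; it uses plain linear scans, so step 2 is the ~O(c * Org) part, while step 3 would only reach ~O(Cli) with an indexed lookup on customObjectId, which the arrays here do not provide.

```typescript
interface OrganizationRow {
  id: string;
  displayName: string;
}

interface ClientRow {
  id: string;
  organization: string | null; // foreign key to the remote organization
}

interface PipelineProgressRow {
  id: string;
  customObjectId: string; // targets a client
}

function filterPipelineProgresses(
  organizations: OrganizationRow[],
  clients: ClientRow[],
  pipelineProgresses: PipelineProgressRow[],
  needle: string,
): PipelineProgressRow[] {
  // Step 1: organizations whose displayName contains the needle (remote query).
  const organizationIds = organizations
    .filter((organization) => organization.displayName.indexOf(needle) !== -1)
    .map((organization) => organization.id);
  // Step 2: clients pointing at one of those organizations (scan of custom_objects).
  const clientIds = clients
    .filter(
      (client) =>
        client.organization !== null &&
        organizationIds.indexOf(client.organization) !== -1,
    )
    .map((client) => client.id);
  // Step 3: pipeline progresses targeting one of those clients.
  return pipelineProgresses.filter(
    (progress) => clientIds.indexOf(progress.customObjectId) !== -1,
  );
}
```

Each step feeds a list of ids into the next one, which mirrors the "rewrite the where condition to an IN list" approach of the query resolver.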

This should cover all use cases for standard pricing (<10.000 entities) without issue. For workspaces with more than that, we might want to create dedicated and optimized indexes for these queries (let the user manually create an index, or detect that we need to create it). We might also want to move these heavy customers to dedicated plans, maybe on single-tenant infra or with ES clusters.
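A dedicated index on one custom field could take the form of a Postgres expression index. The sketch below only generates the statement; it assumes a JSON column named `"customFields"`, a `"workspaceId"` column, and a hypothetical `idx_<table>_<field>` naming scheme, none of which are confirmed by this proposal.

```typescript
const SQL_IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Generate a CREATE INDEX statement covering one custom field of one table.
function buildCustomFieldIndex(
  table: string,
  field: string,
  kind: 'number' | 'text',
): string {
  if (!SQL_IDENTIFIER.test(table)) throw new Error(`Invalid table name: ${table}`);
  if (!SQL_IDENTIFIER.test(field)) throw new Error(`Invalid field name: ${field}`);
  const extracted = `"customFields"->>'${field}'`;
  // Index expressions must be wrapped in their own parentheses.
  const expression =
    kind === 'number' ? `((${extracted})::numeric)` : `(${extracted})`;
  return `CREATE INDEX "idx_${table}_${field}" ON "${table}" ("workspaceId", ${expression})`;
}
```

For the planner to use such an index, the query condition has to use the same expression, cast included.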

Alternatives considered

Full metadata approach: All objects are stored in a flexible data storage (like custom_objects) with metadata describing it.
- As standard entities + custom fields will cover a vast majority of our usecases, it would be damaging to not benefit from SQL optimization on those.
One schema by tenant
- This would remove the need for custom objects.
  - However, we still need to have a gateway to perform queries on remote schemas.
  - We will need schema migrations, which are manageable for non-breaking changes and very delicate for breaking changes. We would need to copy the data from table to table to make sure we are not losing data
  - Also, the existing technologies do not support a multi-tenant approach; we would need to re-code a lot of things, and we don’t see a progressive path to get there
Use Postgres extension to manage postgres query and mapping with custom objects / JSON:
- This doesn’t solve the issue with remote schema and we would need to code the Query resolver twice
Use NoSQL

Short term plan

This is a long term plan. To get there:

Introduce customFields on standard entities, add them to graphQL schema through a customFields resolver
Introduce customObjects, add them to graphQL schema through a customResolver
Start building the metadata manager and query resolver to aggregate data from twenty and remote

Later

Integrate customFields and customObjects in the same way as standard fields in the graphql schema

magrinj · 2023-08-11T08:56:56Z

magrinj
Aug 11, 2023
Collaborator

The described proposal sounds great and very innovative to me!
I just want to add a proposal for the GraphQL schema part by implementing schema composition.

Proposal for Apollo Schema Composition based on Object Types

Given the complexity and multifaceted nature of our GraphQL architecture, I propose that we implement Apollo schema composition to improve our overall system's efficiency, maintainability, and scalability. Here are the primary reasons for this recommendation:

Modular Schemas: In our monolithic application, we will segment our GraphQL schema into distinct modules based on object types:

standardObject Schema: Handles the core entities and the default behavior.
customObject Schema: Manages customized objects tailored for specific business needs.
remoteObject Schema: Addresses entities that fetch or interact with remote data sources.

Unified Access Point: Apollo Gateway can combine these smaller schema segments into a single unified schema. This provides clients with a single endpoint, streamlining client development and reducing confusion. Each of them can also be versioned.

Consolidated Business Logic: Business logic remains inside our single application, ensuring tight integration, especially when there's cross-talk between standard, custom, and remote objects.

Schema Evolution: This approach provides us with the flexibility to modify and expand parts of our schema independently, making future changes less disruptive.

Benefits: Clear separation of concerns, ease of deployment, and a singular endpoint for GraphQL requests.

Considerations: Potential for resource bottlenecks and reduced scalability as demand grows?

Other things to take into consideration

One schema for all tenants: In the proposed solution, it feels like all the tenants are sharing the same GraphQL schema. I think they shouldn't, as customObject and remoteObject definitions can overlap between tenants. One schema per tenant would make it work.
Performance: Merging all the queries into one result can really slow down the response time. I think some queries need to be split into multiple queries instead of just one with n+1. Or we can try to use the new @defer directive: https://www.apollographql.com/docs/react/data/defer/

1 reply

AndryHTC Aug 16, 2023

@magrinj Fantastic contribution! As we prioritize making these features available, we'll concurrently strive to elevate them to the state of the art. Continued efforts in this direction are definitely appreciated.

skamensky · 2023-09-11T12:35:21Z

skamensky
Sep 11, 2023

Instead of cross-posting, see my comment here:

#501 (comment)

The essence is: pass current user data to the data source when performing remote operations

1 reply

charlesBochet Sep 12, 2023
Maintainer Author

Thanks for the comment. Using the Postgres permission layer is something we are considering for standard and custom objects. We have a "driver" approach in the codebase to allow different systems to be in place based on user preferences. We could also apply it to permissions and enable Postgres permissioning or app-level permissioning depending on the use case.
We haven't investigated permissions deeply enough yet to know what our main direction would be.

charlesBochet · 2023-10-17T19:23:41Z

charlesBochet Oct 17, 2023
Maintainer Author

Thanks for the suggestion @lazaridiscom. We are looking into ways to better organize our roadmap and projects, and splitting into multiple projects might be interesting as the project gets more mature. We will discuss this with the core team and come back with feedback.
Thanks for the comment, much appreciated.

--Edit: outdated as the author has removed the comment

cocobeach · 2024-01-01T12:15:23Z

cocobeach
Jan 1, 2024

Hi guys, great presentation of fields in general.
But I am having a small problem.

I created a custom object, Services. It comes with a default field, Name. However, if I want to bind 1 Opportunity to many Services, I need to do so by their names, and that creates a conflict, I believe with the "Name" which is not unique?

Here is an example
I create an opportunity for gardening services:
that opportunity yields:

Service, Name : Lawn mowing
Service, Name : Hedge Trimming
Service, Name: Leaves Sweeping

Any given opportunity can have any of those services structured as follows:

Custom Field Service/Services

- Activities: Custom, Activity Targets
- Account Status: Custom, Select
- Name: Standard, Text
  ==> duplicate key value violates unique constraint "IndexOnNameObjectMetadataIdAndWorkspaceIdUnique"
  (When targeted from a custom field "Service" in my Opp, so that it reflects the MANY services that came with that closed Opp)
  (PS Ideally I would like to be able to open services only when the invoice was paid, how can I do that?)
- Quantity: Custom, Number
- Price: Custom, Number
- Creation date: Standard, Date & Time

So the only field I need as an identifier in my Opp is the Name of the service, but when I try to bind it from my Opportunity (that has no name apparently) it creates a conflict with Name, which I understand is a concatenation of Firstname and Lastname for people?

But won't that be a systematic problem for every relationship if we have the field "name" created by default with any custom object?

Shouldn't we call it "CustomeObjectName" rather than "name" (but I suspected this was a label until I saw the exception)?

I am nevertheless getting the error below and I can't bind them, why?

In my case ServiceName?

This is what happens if I try to target my Service object field called "name" from my custom field in Opportunity "Service".
