Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Representing a complex update/insert/delete CTE #62

Open
lithdew opened this issue Dec 6, 2022 · 6 comments
Open

Representing a complex update/insert/delete CTE #62

lithdew opened this issue Dec 6, 2022 · 6 comments
Labels
question Further information is requested

Comments

@lithdew
Copy link

lithdew commented Dec 6, 2022

Hi,

I wanted to ask if the following complex CTE query which comprises of inserts, deletions, and updates could be modeled with this library.

Suppose we have a post system with attachments. Post attachments are identified by ETag's. ETag's can be associated with many posts, and posts can be associated with several ETag's at once. A many-to-many relations table is used to define this relationship.

CREATE TABLE public.posts (
    id text NOT NULL,
    author_id text NOT NULL,
    content text NOT NULL,
    created_at timestamp(3) without time zone DEFAULT CURRENT_TIMESTAMP NOT NULL,
    updated_at timestamp(3) without time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
);

CREATE TABLE public.etags_on_posts (
    post_id text NOT NULL,
    storage_id public."Storage" NOT NULL,
    etag_id text NOT NULL
);

CREATE TABLE public.etags (
    storage_id public."Storage" NOT NULL,
    id text NOT NULL,
    type text NOT NULL,
    num_posts_attached integer DEFAULT 0 NOT NULL,
    num_avatars_attached integer DEFAULT 0 NOT NULL
);

CREATE TYPE public."Storage" AS ENUM (
    'r2'
);

Suppose we want to update a post's content and attachment given {postId: string, content: string, etags: { storageId: "r2", id: string, type: string }}[].

  1. Old entries to "etags_on_posts" should be removed, and their corresponding entries in "etags" should have "num_posts_attached" decremented by 1.
  2. New entries to "etags_on_posts" should be added, and their corresponding entries in "etags" should have "num_posts_attached" incremented by 1. If no corresponding entries in "etags" exist, they are created with the etag's type inserted as well.
  3. The entry of the post in "posts" should have its content updated.

The following raw porsager/postgres code represents the query:

async updatePostOrComment(sql: Sql, id: string, content: string, etags: { type: string, etag: string }[]) {
    const input = etags.map(e => (["r2", e.etag, e.type]));

    return sql`
        with updated_post as (update posts set ${sql({ content, updatedAt: new Date() })} where id = ${id} returning *),
        new_posts as (select * from posts union select * from updated_post)
        ${etags.length > 0 ?
            sql`
            , input as (select storage_id::"Storage", id::text, type::text from (values ${sql(input)}) as input(storage_id, id, type))
            , removed_etags_on_post as (delete from etags_on_posts where post_id = (select id from updated_post) and (storage_id, etag_id) not in (select storage_id, id as etag_id from input) returning *)
            , etags_on_post_to_add as (select (select id from updated_post) as post_id, storage_id, id as etag_id from input where (storage_id, id) not in (select storage_id, etag_id from removed_etags_on_post))
            , added_etags_on_post as (insert into etags_on_posts select * from etags_on_post_to_add returning *)
            , update_added_etags as (
                insert into etags (storage_id, id, type) select * from input where (storage_id, id) in (select storage_id, etag_id as id from added_etags_on_post)
                on conflict (storage_id, id) do update set num_posts_attached = excluded.num_posts_attached + 1 returning *
            )
            , update_removed_etags as (
                update etags set num_posts_attached = num_posts_attached - 1
                where (storage_id, id) in (select storage_id, etag_id as id from removed_etags_on_post) returning *
            )
            , new_etags_on_posts as (select * from etags_on_posts union select * from added_etags_on_post except select * from removed_etags_on_post)
            , new_etags as (select * from etags union select * from update_added_etags union select * from update_removed_etags)`
            :
            sql``}
        
        ${this.submissions(sql, sql`updated_post`, { posts: sql`new_posts`, etagsOnPosts: etags.length > 0 ? sql`new_etags_on_posts` : sql`etags_on_posts`, etags: etags.length > 0 ? sql`new_etags` : sql`etags` })}`.then(([post]) => post);
},

... where this.submissions joins some additional fields and performs some extra transformations to updated_post.

If perhaps the following query might be too complicated, could it be significantly simplified with this library then?

@juanluispaz
Copy link
Owner

juanluispaz commented Dec 6, 2022

OMG, you put all sql features in a single query. Let me decompose the query a little bit:

with					(#1)

	updated_post as (
		update posts		(#2)
		set SOME_COL = SOME_VAL 
		where id = ${id} 
		returning *
	),
        new_posts as (
		select * 
		from posts 
		union 			(#3)
		select * 
		from updated_post
	),
	input as (
		select storage_id::"Storage", id::text, type::text 
		from (
			values ((VALUE_1, VALUE_2, VALUE_3))) 
			as input(storage_id, id, type)			(#4)
	), 
	removed_etags_on_post as (
		delete from etags_on_posts 
		where post_id = (
			select id from updated_post			(#5)
		) and (storage_id, etag_id) not in (			(#6)
			select storage_id, id as etag_id from input	(#7)
		) 
		returning *						(#8)
	), 
	etags_on_post_to_add as (
		select 
			(select id from updated_post) as post_id, 	(#4)
			storage_id, id as etag_id 
		from input 
		where (storage_id, id) not in (				(#6)
				select storage_id, etag_id 		(#7)
				from removed_etags_on_post
			)
	), 
	added_etags_on_post as (
		insert into etags_on_posts 				(#2)
		select * from etags_on_post_to_add 			(#9)
		returning *						(#10)
	),
	update_added_etags as (
		insert into etags (storage_id, id, type) 
		select * 						(#9)
		from input 
		where (storage_id, id) in (				(#6)
			select storage_id, etag_id as id 		(#7)
			from added_etags_on_post
		)
		on conflict (storage_id, id) 				(#11)
		do update 
		set num_posts_attached = excluded.num_posts_attached + 1 
		returning *						(#10)
	),
	update_removed_etags as (
		update etags 
		set num_posts_attached = num_posts_attached - 1
		where (storage_id, id) in (				(#6)
			select storage_id, etag_id as id		(#7)
			from removed_etags_on_post
		)
		returning *						(#12)
	),
	new_etags_on_posts as (
		select * 
		from etags_on_posts
		union							(#2)
		select * 
		from added_etags_on_post
		except							(#2)
		select * from removed_etags_on_post
	),
	new_etags as (
		select *
		from etags
		union 							(#2)
		select * 
		from update_added_etags
		union							(#2)
		select *
		from update_removed_etags
	)
select SOMETHING
from SOME_TABLE
where SOME_CONDITION

My notes:

  • (#1) With clauses are supported. Documentation
  • (#2) Use insert, update and delete queries in a with clause is a very PostgreSQL specific not supported yet (there are other specific postgres already supported)
  • (#3) Compound queries (union, intersect, except) are supported. Documentation
  • (#4) Inline values as table/view are supported. Documentation
  • (#5) Inline select values are supported. Documentation
  • (#6) Comparison of multiple values (aka (V1, V2) operator SOMETHING) are not supported yet, but you can use a custom sql fragment to do it (with regular values, not selects). Documentation
  • (#7) You can transform this subquery in another with and use it as a regular table in your front, then you don't need point (#6)
  • (#8) Delete returning are supported. Documentation
  • (#9) Insert from a select are supported. Documentation
  • (#10) Insert returning are supported. Documentation
  • (#11) Insert on conflicto to update supported, Included specify the conflict columns. Documentation. List of all methods available on insert
  • (#12) Update returning are supported. Documentation
  • Usually you don't want to use * in your queries in your backend, because it make fragile your queries, specially when the database schema changes and generate a mismatch between code and database; ts-sql-query doesn't support it, but instead you can specify all the columns expected by your code, there is some utilities function to do it. Documentation

Summary: The only missing feature to be able to perform this query in ts-sql-query is the one mention in (#1)

Regarding the readability of the code that generate the query, ts-sql-query can help you a lot on this. In our systems we have very complex queries, but we don't assemble it in a single function, instead of that, we decompose it in several functions that returns the object built by ts-sql-query (just omit the return type of the function declaration). Each query in the with can be created in a function.

Additionally you can take advantage of the select clause order to create a function that create the base select, and then add or complement the specific where clauses. Additionally, you can use optional joins when the join is not required all the time (there is a current limitation, if it refer a query in the with, the with will be included, but not used).

I recommend you to have a look on all convenient features to dial with dynamic queries and to the sql fragments that will allow you to use sql in parts not covered yet by the library (most of the case that is enough, but it doesn't cover point (#1)).

I'm going to see how can I add support to the point one, but it will take me a little bit.

@juanluispaz
Copy link
Owner

I'm going to point in my list to improve ts-sql-query:

  • Insert as with (only for postgres)
  • Update as with (only for postgres)
  • Delete as with (only for postgres)
  • (value1, value2, ...) in select query (all supported databases)
  • (value1, value2, ...) in insert query (only for postgres)
  • (value1, value2, ...) in update query (only for postgres)
  • (value1, value2, ...) in delete query (only for postgres)
  • Avoid add unused withs in dynamic queries

Be aware, ts-sql-query generate a compilation error when you try to use a feature not available in your database.

Good, now I have requirements, my list was empty 😃

@lithdew
Copy link
Author

lithdew commented Dec 6, 2022

Really appreciate the swift response and detailed writeup. For the time being, I'm happy to isolate separate queries and have them independently executed one by one as a transaction rather than as a large CTE :).

@juanluispaz juanluispaz added the question Further information is requested label Dec 6, 2022
@juanluispaz
Copy link
Owner

juanluispaz commented Dec 6, 2022

Oh, you can do it in several queries if you want, but that is not what I tried to tell you.

In our case, we have very complex queries in a special part of the system that manage the accountability of a project, In that case what we did is split the query in several functions that build that part of the system, and the put together in another one. It had a very interesting effect, some subqueries where reused in several places, that made even easer to understand and think in the system.

Let me put an example with the case you put in #63:

function buildFollowersCountSubquery(connection: DBConnection) { // I intentionally omit the return type, allowing TS infer it
    return connection
        .subSelectUsing(accounts)
        .from(follows)
        .where(follows.followingId.equals(a.id)
        .selectOneColumn(connection.countAll())
        .forUseAsInlineQueryValue()  // At this point is a value that you can use in other query
        .valueWhenNull(0);
}

function buildFollowingCountSubquery(connection: DBConnection) { // I intentionally omit the return type, allowing TS infer it
    return connection
        .subSelectUsing(accounts)
        .from(follows)
        .where(follows.followerId.equals(a.id)
        .selectOneColumn(connection.countAll())
        .forUseAsInlineQueryValue()  // At this point is a value that you can use in other query
        .valueWhenNull(0);
}

async function getAccountsInformation(connection: DBConnection) {
    const result = await connection
        .from(accounts)
        .select({
            id: a.id,
            numFollowers: buildFollowersCountSubquery(connection),
            numFollowing: buildFollowingCountSubquery(connection),
            // the others counts you need
        })
        .orderBy('id', 'asc')
        .executeSelectMany();

    return result;
}

I can go eve further:

function buildAccountsInformationQuery(connection: DBConnection) { // I intentionally omit the return type, allowing TS infer it
    return connection
        .from(accounts)
        .select({
            id: a.id,
            numFollowers: buildFollowersCountSubquery(connection),
            numFollowing: buildFollowingCountSubquery(connection),
            // the others counts you need
        })
        .orderBy('id', 'asc')
}

and the use it as:

async function getAccountsInformationForVerifiedUsers(connection: DBConnection) {
    const result = await  buildAccountsInformationQuery()
        .where(accounts.isVerified)
        .executeSelectMany();

    return result;
}

This strategy allowed us to deal with very complex queries in an easy and understandable way without scarify execute everything in a single call to the database.

@lithdew
Copy link
Author

lithdew commented Dec 7, 2022

For points (#2), (#8), and (#9), would you then suggest splitting the query into independent statements and having them execute separately in a transaction rather than as a single CTE statement?

@juanluispaz
Copy link
Owner

For now yes, due I don't support that construction; or, if it is important for you to be in a single query, just keep as is, in raw sql if for you it is important to be in a single query due a very important performance reason (you must be able to measure and confirm you need it).

If you want to access to the sql object used by PostgresQueryRunner you can do this (I didn't include in PostgresQueryRunner documentation, but I mention in PrismaQueryRunner documentation):

Accessing to the porsager/postgres sql object from the connection

If you want to access the underlying porsager/postgres sql object (to the transaction one when it is in one) from your connection object you can define an accesor method in your connection class like:

import type { Sql } from 'postgres'

class DBConnection extends PostgreSqlConnection<'DBConnection'> {
    getSqlClient(): Sql {
        const client = this.queryRunner.getCurrentNativeTransaction() || this.queryRunner.getNativeRunner();
        // ideally I will do something like if (client instanceof Sql) { return client; } else { throw new Error('...'); }
        // but I don't know how to perform this validation in porsager/postgres, then I just cast to any without control allowing the function return the proper type.
        return client as any;
    }
}

In this way you will be able to keep raw sql in some places.

Note: TransactionSql extends Sql interface in porsager/postgres

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants