
Customize connection pool and/or queries for cluster instances #2253

Open
justinsoliz opened this issue Oct 2, 2017 · 58 comments

@justinsoliz

I'm looking to configure the mysql pool to handle reader and writer endpoints available for aws aurora clusters.

Basically read operations would point at one endpoint and write operations would point at another. Is there currently any configuration mechanism available for this use case?

I've looked for previously opened issues that seem related, but it's not clear to me if or how they've been resolved, or whether there's now a mechanism for handling this. Any help is appreciated.

@richraid21
Contributor

richraid21 commented Oct 3, 2017

Currently not supported. I wrote a quick (and dirty) workaround that just juggles Knex instances until Knex can be updated with specific functionality.

@podtrackers

I'm working on this right now too...
I've set up a connection pool and added the WRITER and READER endpoints from the Aurora cluster. Then, each time I execute a query, I run a regex over it to see whether it only contains selects.
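That kind of statement sniffing could look something like the sketch below (a hypothetical helper of our own, not part of knex; it deliberately errs toward the writer for anything that is not clearly a plain read):

```javascript
// Route a SQL string to the reader only when it is clearly read-only.
// Anything containing a writing keyword (including CTEs like
// `WITH x AS (INSERT ...)`) falls back to the writer.
function isReadOnlyQuery(sql) {
  const normalized = sql.trim().toLowerCase();
  if (/\b(insert|update|delete|alter|create|drop|truncate)\b/.test(normalized)) {
    return false;
  }
  return normalized.startsWith('select') || normalized.startsWith('with');
}
```

This is only a heuristic: it cannot see stored-procedure side effects, so queries like `SELECT my_writing_function()` would still be misrouted to the reader.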

@Eli-Goldberg

+10

@rkaw92

rkaw92 commented Nov 20, 2017

@richraid21 You stated in #2339 that having read/write routing inside the pool would be the most robust solution. I disagree:

  • A pool is, by name, a collection of similar resources. Making them dissimilar splits the pool into separate pools conceptually. Thus, it seems more natural to keep this resulting division in code and preserve the guarantee that each pool stores objects that are functionally equivalent.
  • Additional complexity in the pool code may bring about bugs similar to those found in generic-pool.
  • The pool may be used in cases where there are no concepts of "read" and "write", and it should be kept generic, not knex-specific.

I would like to propose having this above the pool, as a router/layer that possibly encapsulates multiple tagged pools and serves acquire() requests using tags:

  • acquire(tag:string) - returns a resource (in our case, a connection) from any pool that has the given tag; tries to round-robin between appropriate pools if acquisition from one fails due to resource starvation (requires short timeouts), or uses resource-availability reporting APIs and acquires immediately in a critical section (without yielding to the main loop)
  • acquire(tags:string[]) - returns a resource from any pool which has all of the given tags, for example "read" and "write" simultaneously; falls back as above
  • acquire() - compatibility function which returns a resource using pre-configured tag or tags, according to settings stored in this router instance; might use a "default" tag and knex.js in its default (non-split) config might register 1 pool with the same tag so that by default Knex behaves as it does today
  • release(resource) - same as currently; does not require user to specify the original tag, as the router should remember the allocations and route the releases back to wherever the resource came from.
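A toy version of that router, with plain arrays standing in for real pools (the API and names are illustrative, not an existing library): acquire(tag) round-robins across pools carrying the tag, and release() routes the resource back to whichever pool it came from, as described above.

```javascript
// Minimal tag router: each "pool" is just a bag of free resources plus tags.
class TagRouter {
  constructor() {
    this.pools = []; // { tags: Set<string>, free: any[] }
    this.allocations = new Map(); // resource -> owning pool
    this.cursor = 0;
  }
  addPool(tags, resources) {
    this.pools.push({ tags: new Set(tags), free: [...resources] });
  }
  acquire(tag) {
    const candidates = this.pools.filter((p) => p.tags.has(tag));
    for (let i = 0; i < candidates.length; i++) {
      // Start from the round-robin cursor, fall through to the next
      // candidate pool if the current one is starved.
      const pool = candidates[(this.cursor + i) % candidates.length];
      if (pool.free.length > 0) {
        this.cursor = (this.cursor + i + 1) % candidates.length;
        const resource = pool.free.pop();
        this.allocations.set(resource, pool);
        return resource;
      }
    }
    throw new Error(`No free resource for tag "${tag}"`);
  }
  release(resource) {
    // The router remembers where each resource came from, so callers
    // don't need to pass the original tag back.
    const pool = this.allocations.get(resource);
    if (!pool) throw new Error('Unknown resource');
    this.allocations.delete(resource);
    pool.free.push(resource);
  }
}
```

A real implementation would acquire asynchronously with timeouts and wrap an actual connection pool (e.g. tarn) rather than an array.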

@ewdave

ewdave commented Nov 21, 2017

I also recently had a need for knex to be able to direct some table queries to a replica DB. It'd be great to see this feature added in the near future.

+20

@richraid21
Contributor

@rkaw92 I think we have the same idea, I just didn't express my thoughts adequately. I agree that this should not be done inside the pool.

To clarify, if I'm understanding correctly, you are imagining a system in which:

  1. A pool is created for each endpoint and is assigned 0..N tags
  2. A pool contains between min and max resources (connections)

When knex requires a connection, it will call acquire(tag:string), which will then locate all pools assigned that tag and retrieve a connection via round robin across those pools?

Thanks

@rkaw92

rkaw92 commented Dec 7, 2017

@richraid21 That's conceptually it, yeah. Additionally, this layer should live between knex and the pools themselves so as to be re-usable outside of knex and not add any bloat.

@podtrackers

Hey all. Not sure this is needed anymore for Aurora... They now offer a master-master serverless DB: https://aws.amazon.com/rds/aurora/serverless/

@elhigu elhigu changed the title Customize connection pool and/or queries for AWS Aurora cluster instances Customize connection pool and/or queries for cluster instances Dec 7, 2017
@elhigu
Member

elhigu commented Dec 7, 2017

@podtrackers Updated the issue header to be more general; some need for this still remains.

@justinsoliz
Author

@podtrackers I think that's a good point. As far as I can tell, Aurora serverless looks to be more useful in terms of decreasing costs for low-load, non-prod environments. We'll utilize it where we can but will probably continue to run dedicated clusters in our prod environments where the traffic is more consistent.

@juliusza

juliusza commented Aug 29, 2018

Having load balancing in the DB driver has many advantages, since it removes all the intermediate services (pgpool/virtual interfaces). As was already mentioned, there should be a separate pool per DB node. But we'd also need health checks so that requests are not sent to servers that are offline (high availability).

This will also have to be implemented per DB driver, because of the need to check replication lag and not send queries to servers that are too far behind the master.

We'd also need to give the API user the power to decide which queries go where, with an option to send all selects to slave servers. This could also be configured per query, e.g. query.fetch({slave: true}).

I'm building a similar solution for PHP, where my code will juggle PDO connections for a Postgres DB. If the PHP code works as intended, I'm willing to do some sort of knex implementation and share it. It would be Postgres-only, however.

And while writing this down, I realized knex is probably not the place for this kind of code :)

+10
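The health-check bookkeeping mentioned above could be sketched like this (pure bookkeeping with hypothetical names; the actual per-driver probe that checks connectivity and replication lag is out of scope here):

```javascript
// Tracks which nodes are usable. A node marked down is skipped until its
// cooldown expires, after which it becomes eligible for a retry. The `now`
// parameter is injectable so the logic can be tested without real time.
class NodeHealth {
  constructor(nodes, cooldownMs = 5000, now = Date.now) {
    this.nodes = [...nodes];
    this.cooldownMs = cooldownMs;
    this.now = now;
    this.downUntil = new Map(); // node -> timestamp when it may be retried
  }
  markDown(node) {
    this.downUntil.set(node, this.now() + this.cooldownMs);
  }
  markUp(node) {
    this.downUntil.delete(node);
  }
  healthy() {
    const t = this.now();
    return this.nodes.filter((n) => (this.downUntil.get(n) || 0) <= t);
  }
}
```

A query router would call markDown() when a connection attempt fails and only hand out connections to nodes returned by healthy().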

@kraftman

kraftman commented Oct 9, 2018

I took a stab at this here: #2847
looking for feedback on the approach taken

@ozziest

ozziest commented Apr 24, 2019

I wonder if there is anything new on this? We are using AWS RDS and we have read replicas. We use AdonisJs as our backend framework, and it uses knex.js as its database provider. Without read-replica support, I guess we have to change our stack.

@avimar

avimar commented Apr 24, 2019

@ozziest I don't know of another stack that will automagically do this for you.

The "simple" way is to have a knex handle for writes and a knex handle for reads.
Alternatively, check earlier comment: #2253 (comment)

@thetutlage
Contributor

I have approached it by separating knex instances for read and write. However, within read and write, one may want to use multiple connection hosts rather than relying on a single server, and creating a new knex instance for each host is a lot of manual work plus a waste of resources.

To counter this, I have created a module that lets you use a callback function to compute the connection config at runtime, which opens up the possibility of round-robining between multiple connection configs: https://github.com/thetutlage/knex-dynamic-connection

Lemme know if anyone has any thoughts :)

@jpike88
Contributor

jpike88 commented Nov 5, 2019

I'm now using Aurora global database, where reads must be done in one region and writes in another. I obviously want to distinguish these, even if it's as simple as having all select statements automatically routed to the read endpoint. That's good enough for me.

@Frolanta

Frolanta commented Nov 8, 2019

Here are my two cents on the issue.
Note that I'm using Aurora PostgreSQL, and maybe this is a very bad suggestion, I don't know ;)

Since Aurora provides two endpoints, reader and writer (master), where the reader one automatically load-balances between all read replicas, we can use just those two.

Below is how I split my queries between read and write and use the correct endpoint.

I also added a way (knex.select('*').queryContext({ useMaster: true })) to force a query to execute on the write or read endpoint.
This is very useful because sometimes you perform a write operation and then a read a few ms after.
But since the replication process takes time (under 100ms), your data is not guaranteed to be up to date on the replica databases.
Let's say I insert a new product into a shopping cart table and right after that run some select queries to get the new shopping cart price: I'm not guaranteed the newly inserted data is present on the replica databases yet.
So we need a way to force select queries to be executed on the write (master) endpoint.

// First you must init two knex objects, one for each endpoint

const knexMaster = require('knex')({
  client: 'pg',
  connection: { ...masterConnectionInfos },
  pool: { ...poolOpts },
});

const knexRead = require('knex')({
  client: 'pg',
  connection: { ...readConnectionInfos },
  pool: { ...poolOpts },
});

// and then we will use another knex object as a wrapper
// this one doesn't need connection or pool settings
const knexWrapper = require('knex')({
  client: 'pg',
});

// We override the runner method of our wrapper client
knexWrapper.client.runner = function (builder) {

  /** Here we redirect the query to the correct knex object.
  * We use this method since it is one of the first executed
  * after the query has been built, and it runs before the acquireConnection step.
  **/

  /** Bypass with knex.select('*').queryContext({ useMaster: true })...
  * to force your query onto the read or write endpoint.
  * Not sure about using queryContext for this, but it seems ok to me.
  **/
  if (builder._queryContext && builder._queryContext.useMaster === true) {
    return knexMaster.client.runner(builder);
  } else if (builder._queryContext && builder._queryContext.useMaster === false) {
    return knexRead.client.runner(builder);
  }

  // .toSQL() returns an object or an array of objects (not sure about that, but this is what I found in /lib/runner.js)
  const sql = builder.toSQL();
  const useMaster = Array.isArray(sql) ? checkUseMasterMultiple(sql) : checkUseMasterSingle(sql);
  return useMaster ? knexMaster.client.runner(builder) : knexRead.client.runner(builder);
};


const checkUseMasterSingle = (object) => {
  // if object.method equals "insert", "del" or "update", use the master endpoint
  return ['insert', 'del', 'update'].includes(object.method);

  /** As a bonus (I will not provide code since it's still very poorly done on my side and kind of weird...)
  * you can also parse the SQL query in object.sql,
  * find a way to retrieve all table names used in the query,
  * and if you find a "critical" one, say "shopping_cart", then force the master endpoint.
  * This can be done if you don't want developers to have to know when to force select queries onto the master endpoint using .queryContext({ useMaster: true }).
  * Note that parsing every select query to check for table names takes some time on every request,
  * so I suggest you save a hash of the current query: const hash = crypto.createHash('md5').update(object.sql).digest('hex');
  * and cache the result (whether it should use the master endpoint or not) in an object, hash map or any other way.
  * I think this is fine because object.sql does not contain variables, e.g. select * from "table_name" where "column" in (?, ?, ?) (Note: this is not true if you use knex.raw.)
  * It depends on your application and how many different queries it has.
  * Maybe you can still process / parse / transform the SQL query to be more abstract and reduce the count (remove everything after the where condition; careful with subqueries).
  **/
};

// true if any statement in the array needs the master
const checkUseMasterMultiple = (array) => array.some(checkUseMasterSingle);


// Don't forget to export the wrapper, not the other two ;)
module.exports = knexWrapper;

@Ali-Dalal

@Frolanta Thanks for the idea. Are you sure this is going to work with Aurora PostgreSQL with a read replica? I have exactly the same stack (Node.js, Aurora Postgres), but I also have bookshelf.js on top of knex.

Overall though, it's worth implementing this as a feature inside the knex API.

@jpike88
Contributor

jpike88 commented Mar 31, 2020

This is a real pain point for me as well.

@jpike88
Contributor

jpike88 commented Mar 31, 2020

Just confirming that @Frolanta's design seems to work. It should be baked into knex.

@briandamaged
Collaborator

@jpike88 : Out of curiosity, what are the main pain points you encounter when using the "2 separate Knex pools" approach? Ex: I'm guessing that this approach becomes problematic when you try to use bookshelf or any other ORM on top of knex.

@jpike88
Contributor

jpike88 commented Mar 31, 2020

It becomes an issue when the codebase gets so large that it's painstaking to manually split all queries between the two pools.

There's one issue with the above alternative though: it doesn't seem to play nice when I try to use the .transaction method; it just says "can't find includes of undefined". I'll keep messing with it.

@kibertoad
Collaborator

I'll finally create a separate repo for knex-related utilities today, so if anyone would volunteer to build a solution similar to the one proposed above, it could be included there. It sounds a little too high-level to be included in knex itself.

@Frolanta

Just to inform you that there is a small problem with the solution I provided a few months ago.

This is more a problem with AWS Aurora than with knex, but maybe it can be useful to you.

You can't rely on the AWS reader endpoint that load-balances between your replicas.
It will not work well if you have high traffic or if your knex connection timeout is too high.

Let's say you have multiple read replicas.
When you create a pool using an AWS Aurora reader endpoint host, you will probably end up using only one of your replicas.

The reader host only routes to one of your replicas' IPs, and it changes every second.

So if, like me, your pool fills up in less than 1 second, then all your pool connections will be on the same replica.

When a client connection times out, you have a chance to land on another replica; but again, if your client connections almost never time out (like mine), you will only use one replica, or maybe 80% one and 20% another.

The only solution I found is to have an array of read pools.
Then use the AWS SDK to check every minute whether a new replica is available or one has been deleted,
and create (or delete) a pool accordingly and add it to (or remove it from) your array.

Since you have multiple read pools, you can select one randomly, or use knex.client.pool.used.length and pick the pool with the fewest connections.
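The "pool with the minimum connections" selection could be sketched like this (assuming each entry wraps a knex instance whose underlying tarn pool exposes a numUsed() counter; stub objects are used below, and the exact pool API may differ between knex versions):

```javascript
// Pick the reader whose underlying pool currently has the fewest
// checked-out connections. numUsed() is tarn's counter of in-use
// resources; adjust if your knex version exposes the pool differently.
function pickLeastUsed(readers) {
  return readers.reduce((best, candidate) =>
    candidate.client.pool.numUsed() < best.client.pool.numUsed()
      ? candidate
      : best
  );
}
```

At query time you would call pickLeastUsed(readPools) and run the read query on whichever instance it returns.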

@elhigu
Member

elhigu commented Mar 31, 2020

One can also easily use the validate callback to implement a counter for how many times a connection can be fetched from the pool before it fails validation, as @devinivy suggested.

A validation failure automatically evicts the connection from the pool and creates a new one if necessary (or just fetches another connection from the pool if connections are already available).

Both should already be implementable by passing a custom validate function in the knex pool configuration. This is a pretty specific use case, so adding a copy-paste configuration recipe for Aurora to the knex cookbook could be enough.
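That validate-based recycling could look like the sketch below (the MAX_USES threshold and the __useCount property are our own inventions; the pool passes this config through to tarn, which calls validate each time a connection is checked out):

```javascript
// Hypothetical reader-pool config: evict a connection after N acquisitions
// so the pool periodically reconnects and can land on a different replica
// behind Aurora's load-balanced reader endpoint.
const MAX_USES = 50; // assumed threshold; tune to your traffic

const poolConfig = {
  min: 0,
  max: 10,
  // Called by the pool on checkout; returning false evicts the connection.
  validate: (connection) => {
    connection.__useCount = (connection.__useCount || 0) + 1;
    return connection.__useCount <= MAX_USES;
  },
};
```

You would pass this as the `pool` option of the knex instance pointed at the reader endpoint.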

@Frolanta

@briandamaged

  1. Provide a fancy way to automatically choose the correct pool depending upon which operations are going to be performed. (ex: use knexMaster whenever a "write" operation will be performed)

This could be nice, but we would still need a way to choose manually.

  2. Provide some way to alter the connection-acquisition logic so that it accommodates upstream behaviors. (Ex: acquire only 1 connection per second to accommodate AWS's IP rotation)

I'm not sure this is a good idea; 1 second is really long.
Let's say a user performs an action against my API and my pool is not full: it might take 1 extra second to answer :/
Also, @devinivy has a point.

  3. Provide a way to "lease" a connection for use across multiple distinct operations. (Ex: write to the database. Afterwards, use the same connection to read from the database to avoid timing issues caused by replication.)

The thing is that 100ms (replication time) is kind of long.
Here is a very small example of the problem.
Let's say you have an API to create a book:

perform /book POST (create a new book and return the id)
At 0ms: the new book is in your master database
At 40ms: the user (browser) receives the new id
At 45ms: a new request to your API is made, /book/:id GET (with the new id)
At 75ms: the request is received by your API
At 85ms: you try a SELECT * FROM book ... on a replica, but the data doesn't exist yet because the replication has not been performed yet

I don't know how you could use a connection lease for this kind of scenario.

For the replica connection, I think the solution should be something a bit like what Sequelize does: https://sequelize.org/v5/manual/read-replication.html (I didn't really dig into it, so I don't know much about it.)

You should be able to provide an array of replica (read) connection params.
Then don't use the AWS load-balanced reader endpoint (because it sucks) but all your replicas' endpoints (these are really not difficult to get with the AWS SDK).

  • You should have a way to add and remove replicas "on the fly" (because auto-scaling is a thing).

  • The number of connections in a pool should be per replica, or it doesn't make sense.

  • When a new connection is required, knex should choose the least-used replica in your array (the one with the fewest active connections).

Also, read queries should sometimes be performed on your master as well (so the master connection should be included when selecting the least-used connection): your application can be 99% read queries, so if you have one master and only one replica, you will not use your master properly if we don't include it.

What do you think?

@kibertoad
Collaborator

@elhigu @lorefnon @briandamaged @maximelkin I've created https://github.com/knex/knex-utils (with placeholder content for now) for all kind of useful things that are outside of knex per se but can be useful for a wider audience. Ideas for possible features and submissions are more than welcome!

@briandamaged
Collaborator

@Frolanta

This (automatic pool selection) could be nice, but we will still need a way to choose manually.

Agreed that end-users should still be able to directly access the underlying pools. (Personally, I'm not sure how much I'd trust logic that tries to choose the appropriate pool automatically. It seems like there would be a lot of corner-cases that could cause it to make the wrong choice)

I'm not sure this is good idea, 1sec is really long.
Let's say a user perform an action to my API and my pool is not full, it will take maybe 1 more sec to answer :/

Agreed. I'm mostly trying to figure out if there are any oddball scenarios that we might need to be able to accommodate in the future.

Like you said: it would be much more efficient just to obtain/update the complete set of IPs periodically. This would eliminate the need for any type of "warm-up" period.

perform /book POST (create a new book and return id) ...

Yeah -- I don't think connection leasing would be applicable in this example since it spans multiple HTTP transactions.

Connection leasing is more applicable when you have multi-step logic on the server side. In that case, it provides a way to guarantee that the same connection will be used throughout the entire multi-step process. (Ex: this is important when you are performing operations inside of a Transaction)

@ghost

ghost commented Apr 30, 2020

It would be useful if connection could be a function, e.g.:

const knex = Knex({
  client: 'pg',
  async connection() {
    /** select and return the appropriate server */
  },
})

but ended up with this not so ugly hack:

const knex = Knex({
  client: 'pg',
  connection: {},
})

const servers = [ /** list of servers */ ]
const { acquireRawConnection } = knex.client

knex.client.acquireRawConnection = async function _acquireRawConnection() {

  knex.client.connectionSettings = null

  for (let server of servers) {

    /** logic to select the server */
    if (server.hasSuperPowers) {
      knex.client.connectionSettings = server
      break
    }

  }

  if (!knex.client.connectionSettings) {
    return Promise.reject(new Error(`No read-write server detected`))
  }

  return acquireRawConnection.call(knex.client)
}

I'm using async because I'm selecting the server based on the result of a show transaction_read_only query.

@llamadeus

Okay, so I've given it some thought, and I think we can realize read/write clustering without too much effort.

  1. We obviously need some changes to Knex.Config. I would propose a new replication key with which we can specify the cluster configuration:
interface ReplicationConfig {
  connection: string | StaticConnectionConfig | ConnectionConfigProvider;
  pool?: PoolConfig;
}

interface Config<SV extends {} = any> {
  // ... all the other options currently available
  replication: {
    write: ReplicationConfig,
    read: ReplicationConfig[],
  },
}
  2. When the client is being initialized and a replication configuration is present, it should not create a single connection pool but instead create one pool for the write connection and one for each read connection. This way we can provide different scheduling strategies, like round robin or even a least-load strategy that selects the server with the fewest current connections. If the write connection should also be used for reading, the connection should be listed in both write and read, but we may consider having them share the same connection pool to lower the actual load on the write server. In that case, the pool configuration for the read connection would not make any sense and would be ignored.

  3. The Runner needs to pass a parameter to the client.acquireConnection() function indicating whether it needs a write or a read connection (depending on builder._method; can we rely on this?). This parameter can be optional, and by default the write connection will be used. The Client instance can then acquire a connection from its connection pools. This would also be compatible with transactions, for which we will default to the write connection (we can assume that transactions include at least one write query).

  4. It should always be possible to explicitly specify that the write connection be used for a given query, using @Frolanta's approach of the query context (which definitely is a good idea IMO).

Yet the problem of so-called sticky connections (using the same connection for one request) remains, but I still think the developer has the option to choose the connection based on the code.
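The acquisition step described above, with round-robin across the read pools, could be sketched like this (plain values stand in for real pools; the names and mode parameter are illustrative, not a knex API):

```javascript
// Given one write pool and N read pools, hand out a pool per request.
// Writes (and transactions) always get the write pool; reads rotate
// round-robin across the replicas. With no replicas configured,
// everything falls back to the write pool.
function createPoolSelector(writePool, readPools) {
  let next = 0;
  return function selectPool(mode = 'write') {
    if (mode === 'write' || readPools.length === 0) return writePool;
    const pool = readPools[next % readPools.length];
    next += 1;
    return pool;
  };
}
```

The default of 'write' mirrors the proposal above: when the runner does not say otherwise, the query goes to the writer, which is always safe.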

@llamadeus

With just a few changes to the original code I was able to extend the default Client class to support read-replication which I would like to PR into the code base.

There are just a couple of questions which I would like to discuss first before creating a PR.

  1. Is one write connection enough?
  2. Which replication methods should be supported? This should also answer question 1. Is read-replication enough (one connection for writing, multiple connections for reading) or do we want something like Galera clustering as well? Then, does it even make sense to support Galera clustering when it is recommended to use it with an actual load balancer like HAProxy?
  3. What to do in the case of a connection problem? Should we temporarily ban/ignore a node when acquiring a new connection failed? And should we automatically try a different node on error?

@jpike88
Contributor

jpike88 commented Jul 25, 2020

Is one write connection enough?

I think that should be configurable, and considering that there are typically more than a few connections at a minimum, it shouldn't default to 1... maybe 3?

Which replication methods should be supported? This should also answer question 1. Is read-replication enough (one connection for writing, multiple connections for reading) or do we want something like Galera clustering as well? Then, does it even make sense to support Galera clustering when it is recommended to use it with an actual load balancer like HAProxy?

Never heard of Galera Cluster; the market share of people using it versus standard configurations is likely so small that it's not worth considering.

What to do in the case of a connection problem? Should we temporarily ban/ignore a node when acquiring a new connection failed? And should we automatically try a different node on error?

Good question... my main concern with your PR is keeping the scope constrained, so that whatever it does, it does well and with room for configuration.

I'm not a maintainer, just weighing in.

@elhigu
Member

elhigu commented Jul 27, 2020

@llamadeus please, before trying to push that solution into knex, first implement it as an external package which uses knex internally. I would say that you are better off having separate getters for read/write connections anyway because of transactions.

Transactions are always needed whenever you are doing anything non-trivial. So you actually have to know what kind of connection you need before making the queries.

I haven't seen any reason why you couldn't implement that clustering functionality outside of knex, and I wouldn't like to see this prototyped in knex core code, since once it's done, the maintenance, handling bug reports etc. will be on the shoulders of the knex core maintainers.

With external package you can:

  • set up as many read/write servers as you want.
  • create a strategy which selects, for example, the server with the fewest used connections for the next transaction (knex.client.pool provides this information)
  • create a knex instance for the query that doesn't actually have a connection set yet, then write heuristics that analyze the created query, decide whether a read or write connection is needed, and request one for the query from your wrapper package for execution (this might require writing a knex plugin; if the knex extension API is not powerful enough, it can be improved to also support the needs of that package)

In any case, the most important feature you will need is to manually request a read or write transaction.

@jacob-israel-turner

@llamadeus Let me know if there's anything I can do to help with your work.

I'm not a maintainer, just a project consumer - we adopted Knex about nine months ago and have slowly been replacing Sequelize. However, we're now at the scale where we need to implement read replicas in our Aurora cluster. I had assumed going in that Knex supported this out of the box, and I'm surprised to find that it does not. We'll either need to help out with this issue and get read/write replica support into Knex, or we'll have to bake our own internal solution.

@Ali-Dalal

@llamadeus same here, I am not a maintainer, but I've been using Knex a lot recently.
I'd be happy to help if I can. 👍

@jpike88
Contributor

jpike88 commented Jul 13, 2021

There's one shortcoming with @Frolanta's code, which has otherwise been useful to me:

If you're using asyncStackTraces and there are undefined bindings in a query, an error gets thrown at toSQL() in the wrapper. This produces a useless error, because the asyncStackTraces code never had time to execute. At first I filed #4531 because I thought it was a knex-internal issue, but I have just realised what the real issue is.

Here is @Frolanta's code with my changes near the bottom:

// We override the runner method of our wrapper client
knexWrapper.client.runner = (builder) => {
	// Here we redirect the query to the correct knex object.
	// We use this method since it is one of the first executed
	// after the query has been built, and it runs before the acquireConnection step.

	// Bypass with knex.select('*').queryContext({ useMaster: true })...
	// to force your query onto the read or write endpoint.
	// Not sure about using queryContext for this, but it seems ok to me.

	if (builder._queryContext && builder._queryContext.useMaster === true) {
		return knexMaster.client.runner(builder);
	} else if (
		builder._queryContext &&
		builder._queryContext.useMaster === false
	) {
		return knexReader.client.runner(builder);
	}

	// .toSQL() returns an object or an array of objects (not sure about that, but this is what I found in /lib/runner.js)
	// If toSQL() fails, just point at the master: the query won't run anyway, but the stack trace will be preserved.
	let useMaster = true;
	try {
		const sql = builder.toSQL();
		useMaster = Array.isArray(sql)
			? checkUseMasterMultiple(sql)
			: checkUseMasterSingle(sql);
	} catch (error) {
		// Swallow this; it will be thrown properly in a moment when knex runs the query internally.
	}

	return useMaster
		? knexMaster.client.runner(builder)
		: knexReader.client.runner(builder);
};

@AlbertoMontalesi fffuuuu


@nicomabs

Anything new on handling clustered DBs with knex? For switching DBs automatically?

Thanks

@mbkkong

mbkkong commented Jan 22, 2023

+10

@enchorb

enchorb commented Jan 25, 2023

+1

@jpike88
Contributor

jpike88 commented Feb 26, 2023

I should add more context around this workaround I did:

#2253 (comment)

In that implementation, the stack trace outside the knex code still gets blown away, even with asyncStackTraces enabled... so that sucks.

@jpike88
Contributor

jpike88 commented Feb 26, 2023

Actually, found the problem: knexWrapper itself needs to have asyncStackTraces enabled for the stack to be preserved. Thank god.

@mbkkong

mbkkong commented May 15, 2023

@kibertoad @OlivierCavadenti any updates about this issue?

@Nedudi

Nedudi commented Jul 26, 2023

+100

@aoscodes

aoscodes commented Nov 6, 2023

For anyone implementing any of the workarounds suggested here:

We put together an implementation based on @jpike88's code in this thread. That code will break transactions if you route them through the knexWrapper as-is. We got around this with the somewhat brute-force method of passing all transactions to our read/write instance,

à la:

  knexWrapper.context.transaction = function (...props) {
    return knexMaster.context.transaction(...props);
  };

@siakc

siakc commented Feb 23, 2024

I was thinking that instead of changing knex, we could manage this at a higher level. Suppose we have servers R1, R2 and W.
We create a knex connection to each of them, and give each connection a flag so we know whether it is read-only, write-only or read/write. We put all these connection objects into an array.
When we create a query, we call a helper function to give us the desired connection. It would look something like:

 connectionHelper('r')('TABLEX').select(...)  // R1 gets selected
 connectionHelper('w')('TABLEX').insert(...)  // W gets selected

This helper function can implement a round-robin or random scheme to select a node (in the selected class) on each call.
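That helper could be sketched as follows (strings stand in for knex instances here; in real code each entry would hold a configured knex handle, and the flag values 'r', 'w', 'rw' follow the scheme above):

```javascript
// Registry of connections flagged 'r' (read-only), 'w' (write-only) or
// 'rw' (both). connectionHelper(kind) returns the next eligible
// connection, round-robin within the eligible set.
function createConnectionHelper(entries) {
  const cursors = { r: 0, w: 0 };
  return function connectionHelper(kind) {
    const eligible = entries.filter(
      (e) => e.flag === kind || e.flag === 'rw'
    );
    if (eligible.length === 0) throw new Error(`No "${kind}" connection`);
    const entry = eligible[cursors[kind] % eligible.length];
    cursors[kind] += 1;
    return entry.conn;
  };
}
```

Usage would mirror the example above: with entries for R1, R2 (flag 'r') and W (flag 'w'), connectionHelper('r')('TABLEX').select(...) alternates between R1 and R2, while connectionHelper('w') always hands back W.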

@seanmangar

+1
