Cache strategy for channels in ChannelService #988

karel-advantitge · 2021-07-14T09:57:09Z

Is your feature request related to a problem? Please describe.
The ChannelService as-is holds an array of all channels in memory, which is updated whenever a channel is created/updated. This makes the vendure instance stateful and can lead to invalid/out-of-sync data if a project uses multiple vendure instances.

Describe the solution you'd like
Querying from the database on every call would solve this, but since the data is used on every call (specifically getChannelFromToken) this would become a bottleneck.
Querying the session data presented a similar bottleneck, so I think we need a similar solution.
Similar to the SessionCacheStrategy config, I'd propose a ChannelCacheStrategy

The text was updated successfully, but these errors were encountered:

michaelbromley · 2021-07-14T15:18:38Z

Yes this needs to be solved. Let's consider the approaches:

Cache nothing, fetch from DB when needed
✔ Data always correct
✔ Simple, no extra config.
❌ Performance impact. It would not be as bad as the session request, since there are only 2 top-level joins. The session query joins 3 levels deep over a much larger data set.
Use a cache with TTL
Use the TtlCache class to cache the channels for a limited period of time, e.g. 30 seconds.
✔ Performance would be similar to current solution
✔ No extra config to add to the API
❌ Still possibility of getting stale data up to seconds after a change.
Configurable cache strategy
✔ Can externalize the cache store and share between instances
✔ Full control over caching
❌ Increases the config API surface

I personally like the simplicity of option 2, but in some circumstances I can imagine even a short amount of inconsistency might not be acceptable.

Investigating further, it can be seen that a similar in-memory caching technique is used in PromotionService.activePromotions, and ZoneService.updateZonesCache() which has the exact same issues as with the ChannelService.

Therefore it might make more sense to define a generalized caching strategy, which would encompass Sessions, Channels, Promotions, Zones and anything else needed in the future. This could be a single config field, which would deprecate and succeed the SessionCacheStrategy.

I'm not exactly sure of what the API of a general cache strategy would look like, but to me that's quite an elegant solution.

Thoughts?

hendrik-advantitge · 2021-08-02T06:37:46Z

Hi Michael, sorry to get back to you on this so late.
Our preference is certainly your last proposal, a single CacheStrategy that works for all current and future entities.
The amount of config is not increased, and it seems unlikely that one would prefer different caching strategies for these entities.

hendrik-advantitge · 2021-09-22T06:38:41Z

Hi Michael. If at all possible I would like to bump the priority on this issue as it is really causing us trouble in our serverless setup. Multiple instances of Vendure exist in parallel, but they need to be restarted every time a Channel is added to or removed from the system.

michaelbromley · 2021-09-22T07:08:57Z

Yep, let's get it in the next minor 👍

michaelbromley · 2021-09-24T18:33:48Z

Also handle this line as part of the solution:

vendure/packages/core/src/service/services/product-variant.service.ts

Lines 273 to 274 in 6f7cc34

    
           // TODO: Use caching (RequestContextCacheService) to reduce DB calls 
        
           const { outOfStockThreshold, trackInventory } = await this.globalSettingsService.getSettings(ctx);

This is making a db query for every product variant where the stock level is selected.

michaelbromley · 2021-09-27T08:33:44Z

I've been doing some exploratory work on this issue and here's a problem with the configurable cache strategy option:

Several of the internal caches are storing deeply-nested class instances. For example, ChannelService.allChannels stores reference to an array of Channel instances, each of which has nested Zone instances.

A generic caching mechanism would need to be able to serialize this in order to store it in eg a database or Redis. The problem comes when we deserialize these data structures, the resulting objects will be POJOs and no longer class instances. Thus any downstream code which either relies on an instanceof check or attempts to access instance methods or getters will fail.

One solution to this would be to build-in some logic which allows us to reconstruct object instances based on the deserialized data. This rehydration process will however add time to the get() process, and may even nullify the gain of caching. It also adds considerable complexity to the solution.

I'm going to do a bit more exploration here and it might even be that with more use of the RequestContextCacheService (which is currently under-utilized), plus perhaps a short TTL cache, we might be able to solve this in a simpler way.

hendrik-advantitge · 2021-09-27T08:47:56Z

Hi Michael, this is a very valid point you are making here. We have been using class-transformer https://github.com/typestack/class-transformer in the past to solve this exact problem, but the delays introduced by doing this have to be considered. Maybe some simple tests might be in order to determine the introduced delay?

A cache with a (configurable) TTL could make sense as well, although this might give very unexpected results in multi-server environment. Multiple parallel requests by a client are not necessarily processed by the same server instance. As a consequence, multiple requests might give different results.

michaelbromley · 2021-09-27T09:08:53Z

Yes I looked at class-transformer, which would solve the serialization issue but the computation time is the unknown, plus I'm not keen on adding another layer of annotations onto the TypeORM entities.

I'm doing some benchmarking as I explore this, and will posts update this thread with results. To summarize, here is a list of all the points where data is currently cached in-memory:

Property	What is cached?	Access pattern
ChannelService.allChannels	All Channels + related tax & shipping Zone	`channelService.getChannelFromToken()` called on every request when building the RequestContext object.
ZoneService.zones	All Zones + nested members (Countries)	ProductVariantService.applyChannelPriceAndTax() - every ProductVariant view. OrderService.applyPriceAdjustments - every modification to Order.
PromotionService.activePromotions	All Promotions where `enabled: true`	Not actually used anywhere! Can be removed

hendrik-advantitge · 2021-09-27T14:42:37Z

You might have overlooked ShippingMethodService.activeShippingMethods

michaelbromley · 2021-09-27T15:24:58Z

@hendrik-advantitge thanks, that's right I missed that one.

So here is the direction I am going in so far:

I have decided to go with the second option mentioned in the above comment, namely keep caching in memory, but add a TTL to expire the cache, mainly due to the issues mentioned above with deserializing class instances.

I have created a function which allows the affected services to still work in the same way as presently, but adds a TTL to the caching. I found that it was important to keep the cache.get operation synchronous. I first of all attempted to make them async, but the infectious nature of async functions quickly started requiring a lot more changes than I wanted in the call stacks depending on these cached values.

So I am working on a design which will allow the cache.get operation to always be sync, while still triggering refreshes of the value async in the background:

/**
 * @description
 * A cache which always returns a value, and automatically refreshes itself if the value is found to be stale.
 */
export interface SelfRefreshingCache<V> {
    /**
     * @description
     * The current value of the cache. If the value is stale, the stale value is still returned,
     * but then will be asynchronously refreshed in the background.
     */
    value: V;

    /**
     * @description
     * Force a refresh of the value, e.g. when it is known that the value has changed such as after
     * an update operation to the source data in the database.
     */
    refresh(): Promise<void>;
}

export interface SelfRefreshingCacheConfig<V> {
    name: string;
    ttl: number;
    refreshFn: () => Promise<V>;
}

/**
 * @description
 * Creates a {@link SelfRefreshingCache} object, which is used to cache a single frequently-accessed value. In this type
 * of cache, the function used to populate the value (`refreshFn`) is defined during the creation of the cache, and
 * it is immediately used to populate the initial value.
 *
 * From there, when the `.value` property is accessed, it will _always_ return a value synchronously. If the
 * value has expired, it will still be returned and the `refreshFn` will be triggered to update the value in the
 * background.
 */
export async function createSelfRefreshingCache<V>(
    config: SelfRefreshingCacheConfig<V>,
): Promise<SelfRefreshingCache<V>> {
    const { ttl, name, refreshFn } = config;
    const initialValue = await refreshFn();
    let value = initialValue;
    let expires = new Date().getTime() + ttl;
    let refreshing = false;
    const refreshValue = () => {
        if (refreshing) {
            return Promise.resolve();
        }
        refreshing = true;
        Logger.debug(`Refreshing the SelfRefreshingCache "${name}"`);
        return refreshFn()
            .then(newValue => {
                value = newValue;
                expires = new Date().getTime() + ttl;
            })
            .catch(err => {
                Logger.error(
                    `Failed to update SelfRefreshingCache "${name}": ${err.message}`,
                    undefined,
                    err.stack,
                );
            })
            .finally(() => (refreshing = false));
    };
    return {
        get value() {
            const now = new Date().getTime();
            if (expires < now) {
                refreshValue();
            }
            return value;
        },
        refresh: refreshValue,
    };
}

In the example of the ChannelService, it would be used like this:

@Injectable()
export class ChannelService {

    private allChannels: SelfRefreshingCache<Channel[]>;

    /**
     * When the app is bootstrapped, ensure a default Channel exists and populate the
     * channel lookup array.
     */
    async initChannels() {
        await this.ensureDefaultChannelExists();
        this.allChannels = await createSelfRefreshingCache({
            name: 'ChannelService.allChannels',
            ttl: 10000,
            refreshFn: () => this.findAll(RequestContext.empty()),
        });
    }

   /**
     * Returns the default Channel.
     */
    getDefaultChannel(): Channel {
        const defaultChannel = this.allChannels.value.find(channel => channel.code === DEFAULT_CHANNEL_CODE);

        if (!defaultChannel) {
            throw new InternalServerError(`error.default-channel-not-found`);
        }
        return defaultChannel;
    }

   async update(
        ctx: RequestContext,
        input: UpdateChannelInput,
    ): Promise<ErrorResultUnion<UpdateChannelResult, Channel>> {
        // omitted the usual update logic

        // here we refresh the cached Channels array
        await this.allChannels.refresh();

        return assertFound(this.findOne(ctx, channel.id));
    }

In that example, the TTL is set to 10 seconds. This kind of timeframe seems reasonable, but we could also add some new config options to VendureConfig to make this configurable. I'd rather not have to do that, and if possible just find a value for each use-case which strikes a practical balance between avoiding staleness and avoiding hundreds of unneccesarry DB calls.

hendrik-advantitge · 2021-09-27T15:59:15Z

Hi Michael
Using an in memory cache with TTL seems indeed to be the better option when we want to avoid creating classes constantly. I reread the code dealing with caching Sessions, and it seems the trick to avoid this problem there is the creation of a new entity CachedSession. This is something we could apply here as well, but resulting in many breaking changes throughout the code (especially for Channel, as this object is literally passed around to every function).

TTL avoids all this mess and keeps things much simpler. The idea of returning a value even when it is stale seems unintuitive at first sight. I totally understand the reasoning though and doubt it will give any issues in practice. But in theory, with a TTL of 10 seconds, you could still retrieve a value that is hours old if the value has not been queried in that time.

It not being an issue only holds if we stick to the entities that are currently cached. If we decide to keep Promotions cached for some reason, it might turn out to be an issue. Depending on the deployment setup, an administrator could disable a promotion while a customer still gets the promotion applied (only once though).

What are your thoughts?

michaelbromley · 2021-09-27T18:14:26Z

But in theory, with a TTL of 10 seconds, you could still retrieve a value that is hours old if the value has not been queried in that time.

Does it happen that with a multi-instance setup you could have once instance that does not get traffic for such a long time? I would have thought that e.g. with serverless the instance might be cleaned up, and with a long-running instance with load balancer in front, each instance would get a similar frequency of requests. But I have not practical experience here so I'm interested to know if you ran into something like this so far.

hendrik-advantitge · 2021-09-27T18:30:33Z

In a serverless environment, unused instances indeed do get cleaned up after some time under normal circumstances. However Google, and possibly other providers as well, have introduced features that can keep a number instances 'warm'. In this scenario, the old cache is used when spinning up the instance. Another scenario, common for us, is that the API instance is kept running by an importer service that is updating products. This is a long running process, that will not trigger updates of the shippingMethod cache for example.

michaelbromley · 2021-09-27T18:39:14Z

OK, that makes sense. I took another go at making it async and I think it is doable. Rather get it correct than hope that edge cases won't appear - they always do!

hendrik-advantitge · 2021-09-27T18:54:33Z

100% agree on this!
Any reason why you would prefer not making the TTL configurable?

michaelbromley · 2021-09-27T19:04:17Z

Any reason why you would prefer not making the TTL configurable?

Yes, I have a bias against adding config if at all possible. Don't want to turn VendureConfig into webpack! I'm not dogmatic about it, it is just my default stance until convinced otherwise.

hendrik-advantitge · 2021-09-27T19:11:04Z

I think there is nothing wrong with a lot of config options, as long as there are good default values. It gives options to the advanced users without getting in the way of new users.

Relates to #988

Relates to #988. This change resolves stale data issues when running multiple Vendure instances in parallel, e.g. as serverless functions.

Relates to #988. Zone members (countries) were being newly translated for _every_ call to ZoneService.findAll(). In practice this was resulting in thousands of calls to `translateDeep`. This optimization cuts the calls down by a factor of 5-10x in various scenarios.

Relates to #988

ashitikov · 2022-01-23T21:00:12Z

Hey!
Looking at current implementation of ChannelService I found SelfRefreshingCache

vendure/packages/core/src/service/services/channel.service.ts

Line 42 in 6bc01aa

private allChannels: SelfRefreshingCache<Channel[], [RequestContext]>;

it not suitable for my setup.
I built a multi-tenant solution for vendure using row level security (RLS) in postgres, so the design is to have on each vendure entity 'tenantId' property, which identifies a tenant. So in my environment ChannelService caches wrong, because on refresh, it executes findAll() which will find all channels from current request's tenant. So if ChannelService will receive multiple requests from a few tenants, cache can return channels from the last request instead on your own.

I understand that vendure recommends to use channels as tenants, but here my thoughts:

If you have a 1000 tenants, you have 1000 channels. each time SelfRefreshingCache refreshes, it will hold-on in memory 1000 channels, and to getChannelFromToken you have complexity O(1000). Thats not so bad, but using this cache on a big environments become a bottleneck.
Configurable TTL gives you a little control of caching behavior, but introduces a consistency problem for multi-instance deployment, which you point on.
I'd like to implement my own cache to have full control on channel caching behavior. So in my setup, I'd like to have SelfRefreshingCache for each tenant, with flexible TTL setting:
tenant1 -> cache1 -> [channel1, channel2, channel3] (TTL = 10 sec)
tenant2 -> cache2 -> [channel4, channel5] (TTL = 5 sec)
So TTL can be controlled by configurable cache strategy.

Currently there is no way to control caching behavior. I think 3rd option "Configurable cache strategy" would be better for custom deployments.

I think I can contribute this feature.

@michaelbromley What do you think?

michaelbromley · 2022-01-25T09:23:22Z

Hi @ashitikov, thank you for your input on this.

The O(n) lookup problem could be solved by using a Map rather than a Array.find() lookup. That would give O(1) regardless of number of Channels. As for memory usage, I'm not sure whether the impact is significant, it would need to be measured.
Yes, that's the trade off with the current design. The reasoning being that Channels are one thing that very rarely changes under normal usage.
I would be happy to explore designs that would allow this in a non-breaking way. Open up a new issue & PR with your ideas!

michaelbromley added @vendure/core design 📐 This issue deals with high-level design of a feature labels Aug 2, 2021

michaelbromley added a commit that referenced this issue Sep 28, 2021

refactor(core): Remove unused promotions cache property

c5ba735

Relates to #988

michaelbromley added a commit that referenced this issue Sep 28, 2021

perf(core): Use per-request caching for hot ProductVariant paths

214b86b

Relates to #988

michaelbromley added a commit that referenced this issue Sep 28, 2021

feat(core): Remove all long-lived in-memory state, use short-TTL caching

d428ffc

Relates to #988. This change resolves stale data issues when running multiple Vendure instances in parallel, e.g. as serverless functions.

michaelbromley added a commit that referenced this issue Sep 28, 2021

feat(core): Make entity cache ttl values configurable

a05e7ab

Relates to #988

michaelbromley closed this as completed Sep 28, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache strategy for channels in ChannelService #988

Cache strategy for channels in ChannelService #988

karel-advantitge commented Jul 14, 2021

michaelbromley commented Jul 14, 2021 •

edited

Loading

hendrik-advantitge commented Aug 2, 2021

hendrik-advantitge commented Sep 22, 2021

michaelbromley commented Sep 22, 2021 •

edited

Loading

michaelbromley commented Sep 24, 2021

michaelbromley commented Sep 27, 2021

hendrik-advantitge commented Sep 27, 2021

michaelbromley commented Sep 27, 2021

hendrik-advantitge commented Sep 27, 2021

michaelbromley commented Sep 27, 2021

hendrik-advantitge commented Sep 27, 2021

michaelbromley commented Sep 27, 2021

hendrik-advantitge commented Sep 27, 2021

michaelbromley commented Sep 27, 2021

hendrik-advantitge commented Sep 27, 2021

michaelbromley commented Sep 27, 2021

hendrik-advantitge commented Sep 27, 2021

ashitikov commented Jan 23, 2022 •

edited

Loading

michaelbromley commented Jan 25, 2022

Cache strategy for channels in ChannelService #988

Cache strategy for channels in ChannelService #988

Comments

karel-advantitge commented Jul 14, 2021

michaelbromley commented Jul 14, 2021 • edited Loading

hendrik-advantitge commented Aug 2, 2021

hendrik-advantitge commented Sep 22, 2021

michaelbromley commented Sep 22, 2021 • edited Loading

michaelbromley commented Sep 24, 2021

michaelbromley commented Sep 27, 2021

hendrik-advantitge commented Sep 27, 2021

michaelbromley commented Sep 27, 2021

hendrik-advantitge commented Sep 27, 2021

michaelbromley commented Sep 27, 2021

hendrik-advantitge commented Sep 27, 2021

michaelbromley commented Sep 27, 2021

hendrik-advantitge commented Sep 27, 2021

michaelbromley commented Sep 27, 2021

hendrik-advantitge commented Sep 27, 2021

michaelbromley commented Sep 27, 2021

hendrik-advantitge commented Sep 27, 2021

ashitikov commented Jan 23, 2022 • edited Loading

michaelbromley commented Jan 25, 2022

michaelbromley commented Jul 14, 2021 •

edited

Loading

michaelbromley commented Sep 22, 2021 •

edited

Loading

ashitikov commented Jan 23, 2022 •

edited

Loading