Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

session is prone to race conditions #1372

Closed
wojpawlik opened this issue Feb 19, 2021 · 12 comments · Fixed by #1373 or #1713
Closed

session is prone to race conditions #1372

wojpawlik opened this issue Feb 19, 2021 · 12 comments · Fixed by #1373 or #1713
Milestone

Comments

@wojpawlik
Copy link
Member

  • Telegraf.js Version: 4.0.3

Minimal Example Code Reproducing the Issue

'use strict'

const { Telegraf, session } = require('telegraf')
const createDebug = require('debug')

const bot = new Telegraf(process.env.BOT_TOKEN)
const debug = createDebug('test')

bot.use(session())
bot.on('message', async ctx => {
  if (ctx.session === undefined) {
    const { update_id } = ctx.update
    ctx.session = { update_id }
    debug('Defaulting session to', ctx.session)
  } else {
    debug('Session already set to', ctx.session)
  }
})

createDebug.enable('telegraf:client test')
bot.launch()

Send a few messages, then start the bot. After it processes the messages, send one more.

Expected Behavior

Each session change is atomic:

  telegraf:client HTTP call getMe {} +0ms
  telegraf:client HTTP call deleteWebhook { drop_pending_updates: undefined } +220ms
  telegraf:client HTTP call getUpdates { timeout: 50, offset: 0, allowed_updates: [] } +164ms
  test Defaulting session to { update_id: 224422703 } +0ms
  test Session already set to { update_id: 224422703 } +1ms
  test Session already set to { update_id: 224422703 } +1ms
  telegraf:client HTTP call getUpdates { timeout: 50, offset: 224422706, allowed_updates: [] } +368ms
  test Session already set to { update_id: 224422703 } +3s
  telegraf:client HTTP call getUpdates { timeout: 50, offset: 224422707, allowed_updates: [] } +3s

Current Behavior

ctx.session is read before processing each update, and written only after processing has finished. Concurrent tasks might not notice each-others' modifications, last task to finish overwrites data:

  telegraf:client HTTP call getMe {} +0ms
  telegraf:client HTTP call deleteWebhook { drop_pending_updates: undefined } +220ms
  telegraf:client HTTP call getUpdates { timeout: 50, offset: 0, allowed_updates: [] } +164ms
  test Defaulting session to { update_id: 224422703 } +0ms
  test Defaulting session to { update_id: 224422704 } +1ms
  test Defaulting session to { update_id: 224422705 } +1ms
  telegraf:client HTTP call getUpdates { timeout: 50, offset: 224422706, allowed_updates: [] } +368ms
  test Session already set to { update_id: 224422705 } +3s
  telegraf:client HTTP call getUpdates { timeout: 50, offset: 224422707, allowed_updates: [] } +3s

I believe every https://www.npmjs.com/search?q=telegraf-session- and https://github.com/nitreojs/puregram/tree/master/packages/session are affected as well.
cc @EdJoPaTo @KnightNiwrem @KnorpelSenf @mdsina @nitreojs @TemaSM

Solution?

Drop async SessionStore support (related: #1342), use Object.defineProperty so that each ctx.session access performs .get, .set, or .delete.

In the meantime, I'll advise session users to always use https://github.com/KnightNiwrem/telegraf-throttler, with default settings.

@nitreojs

This comment has been minimized.

@EdJoPaTo

This comment has been minimized.

@EdJoPaTo
Copy link
Contributor

as I understood (and I might have understood wrong), the problem is in parallel handling messages that were sent to bot while bot was OFF, right?

This should affect all quickly incoming messages handled in an async way. Sending a bunch of messages while the bot is not running is just a simple way to simulate that. At least that is what I currently understand about it.

@EdJoPaTo
Copy link
Contributor

Can this ever be a problem for different users?

If one user / chat is spamming like hell and change the session all the time I personally wouldn't care if the bot isnt doing exactly everything they spammed in the correct order without any strange effects. If this does result in other users having strange results while one chat is spamming like hell, this is a problem but I currently think that isnt the case here?

@wojpawlik
Copy link
Member Author

wojpawlik commented Feb 20, 2021

With default getSessionKey, users cannot affect other users' session. Still, users can unknowingly "spam" the bot while it's offline, or while it's taking a while to process some updates and not fetching more, possibly leading to data corruption.

I tested some other libraries:

telegraf-session-localAFFECTED! cc @EdJoPaTo @TemaSM
const { Telegraf } = require('telegraf')
const LocalSession = require('telegraf-session-local')
const createDebug = require('debug')

const bot = new Telegraf(process.env.BOT_TOKEN)
const debug = createDebug('test')

bot.use((new LocalSession({})).middleware())
bot.on('message', async ctx => {
  if (ctx.session.update_id === undefined) {
    ctx.session.update_id = ctx.update.update_id
    debug('Defaulting session to', ctx.session)
  } else {
    debug('Session already set to', ctx.session)
  }
})

createDebug.enable('telegraf:client test')
bot.launch()
telegraf-session-redisAFFECTED! cc @dotcypress
const { Telegraf } = require('telegraf')
const RedisSession = require('telegraf-session-redis')
const createDebug = require('debug')

const bot = new Telegraf(process.env.BOT_TOKEN)
const debug = createDebug('test')

bot.use((new RedisSession({})).middleware())
bot.on('message', async ctx => {
  if (ctx.session.update_id === undefined) {
    ctx.session.update_id = ctx.update.update_id
    debug('Defaulting session to', ctx.session)
  } else {
    debug('Session already set to', ctx.session)
  }
})

createDebug.enable('telegraf:client test')
bot.launch()
@telegraf/sessionAFFECTED! cc @ejnshtein
import { Telegraf } from 'telegraf'
import session from '@telegraf/session'
import createDebug from 'debug'

const bot = new Telegraf(process.env.BOT_TOKEN)
const debug = createDebug('test')

bot.use(session())
bot.on('message', async ctx => {
  if (ctx.session.update_id === undefined) {
    ctx.session.update_id = ctx.update.update_id
    debug('Defaulting session to', ctx.session)
  } else {
    debug('Session already set to', ctx.session)
  }
})

createDebug.enable('telegraf:client test')
bot.launch()

Perhaps @telegraf/core should be deprecated, now that telegraf@4 natively supports ESM?

@ivaniuk/telegraf-session-redisAFFECTED! cc @igorivaniuk
const { Telegraf } = require('telegraf')
const { TelegrafSessionRedis } = require('@ivaniuk/telegraf-session-redis')
const Redis = require("ioredis")
const createDebug = require('debug')

const bot = new Telegraf(process.env.BOT_TOKEN)
const debug = createDebug('test')
const client = new Redis()

bot.use((new TelegrafSessionRedis({ client })).middleware())
bot.on('message', async ctx => {
  if (ctx.session.update_id === undefined) {
    ctx.session.update_id = ctx.update.update_id
    debug('Defaulting session to', ctx.session)
  } else {
    debug('Session already set to', ctx.session)
  }
})

createDebug.enable('telegraf:client test')
bot.launch()
telegraf@3.38.0AFFECTED! cc @Loskir
const { Telegraf, session } = require('telegraf')
const createDebug = require('debug')

const bot = new Telegraf(process.env.BOT_TOKEN)
const debug = createDebug('test')

bot.use(session())
bot.on('message', async (ctx) => {
  if (ctx.session.update_id === undefined) {
    ctx.session.update_id = ctx.update.update_id
    debug('Defaulting session to', ctx.session)
  } else {
    debug('Session already set to', ctx.session)
  }
})

createDebug.enable('telegraf:client test')
bot.launch()
@puregram/sessionAFFECTED! cc @nitreojs

https://github.com/nitreojs/puregram/tree/99f7e273b87fc4e53d77488cc815392667df87ea/packages/session#example-usage

Send /counter 3 times, start the bot, send /counter once more.

Expected result:

You called the bot 1 times!
You called the bot 2 times!
You called the bot 3 times!
You called the bot 4 times!

Actual result:
obraz

@wojpawlik wojpawlik pinned this issue Feb 20, 2021
@wojpawlik
Copy link
Member Author

telegraf session's main issue are stale reads. This is fixable, but only in a major release. I'd rather deprecate session to discourage users from relying on shared state. Users can switch to either (from best to worst):

  1. https://npm.im/telegraf-stateless-question,
  2. https://npm.im/ioredis,
  3. Fixed synchronous session,
  4. Broken session, with https://npm.im/telegraf-throttler configured to 1 concurrent update per user (this is the default).

Synchronous session middleware (telegraf-session-local, @telegraf/session, @puregram/session) should be fixable.

I believe asynchronous session middleware (telegraf-session-redis, @ivaniuk/telegraf-session-redis) would need to perform asynchronous call synchronously in order to avoid stale reads.

@KnightNiwrem
Copy link
Contributor

Is it only fixable in a major release because of public API changes to session? Would the use of mutex be overkill here?

@wojpawlik
Copy link
Member Author

One worker already cannot run JS in paralell. Thus, mutex is probably overkill, and would probably require changing the interface.

In synchronous sessions, stale reads are the problem: one task overwrites ctx.session, concurrent task sees old value. This can be fixed either by

  1. disallowing ctx.session reassignment anyhow (breaking change),
  2. storing the data on each overwrite (rather than after the task), and reading on each access (rather than once, before the task). Because reading ctx.session is synchronous, this would require restricting Store to synchronous reads and writes (breaking change). Obviously this cannot be applied to asynchronous session middleware.

Note, none of this prevents a task from overwriting a property within ctx.session while another task is referencing it.

@EdJoPaTo
Copy link
Contributor

Personally I tend to move away from session for persistent data. If I have a bunch of lets say movies they are managed in their own way and ctx.session is not involved there while doing something with them. Session is too general for me most of the time anyway.

I tend to use the session for short term "volatile" stuff like "current page in pagination" or object creation. This is either stuff with default values / optional checks anyway (page ?? 1) or stuff which is created in multiple steps and then checked on "finish". Is it correct and is everything needed there? If yes, it gets added to something else than the session to maintain its data.

As the session is empty for new users or with telegraf/session after restarts the bot has to deal with empty sessions anyway. If the user manages to kill (some part of) the session content its not much different than a restart to me. Everything in my sessions is optional (or readonly) so existence has to be checked anyway. One example of one of my sessions in a not that complex bot.

One of the main reasons why I like to use my inline menu and stateless question is their lack of stateful stuff required. Most of the state is embedded in the telegram updates (message text and callback data). They dont require much additional state / much in the session in the first place.
With the hide functions the inline menu can work with stuff like empty sessions after a restart without problems as it only allows a user to be in a certain place if a given condition is met (like something needed in the session is there).

Personally I think enforcing session?: Optional<TSession> somehow might be helpful but other than that… I don't really care / dont see much impact here. At least for my short term usage of session.

micalevisk added a commit to micalevisk/simple-tg-bot that referenced this issue Feb 28, 2021
note that `telegraf` was not upgrated to v4 (major update) for now due to few
breaking changes that this version introduces. More one this here:
telegraf/telegraf#1372 (comment)
@wojpawlik wojpawlik reopened this Apr 16, 2021
@wojpawlik wojpawlik mentioned this issue Apr 17, 2021
4 tasks
@wojpawlik wojpawlik added this to the v5.0 milestone Apr 22, 2021
@wojpawlik
Copy link
Member Author

Either reads must be synchronous (#1436), or updates from one user / chat must be processed sequentially (#1377).

In any case, MemorySessionStore is useless and needs to go. Map will be default storage.

@wojpawlik wojpawlik reopened this Apr 24, 2021
@AlexRMU

This comment has been minimized.

@felinto-dev
Copy link

I'm not sure I understand this issue.

ctx.session is read before processing each update, and written only after processing has finished. Concurrent tasks might not notice each-others' modifications, last task to finish overwrites data:

In the meantime, I'll advise session users to always use https://github.com/KnightNiwrem/telegraf-throttler, with default settings.

I could use this telegrafThrottler configuration to solve this problem ?

      telegrafThrottler({
        in: {
          maxConcurrent: 1,
        },
      }),

MKRhere added a commit that referenced this issue Oct 7, 2022
MKRhere added a commit that referenced this issue Nov 18, 2022
MKRhere added a commit that referenced this issue Nov 22, 2022
MKRhere added a commit that referenced this issue Feb 12, 2023
MKRhere added a commit that referenced this issue Mar 5, 2023
@MKRhere MKRhere unpinned this issue Mar 5, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 30, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
6 participants