Bulk helper semaphore handling results in hanging await for long-running requests (exceeding flushInterval) #1562
Comments
Yes, I can confirm this. In my case it tends to happen only when I push for larger concurrency and/or flushBytes (which the process's performance should allow for). I'm streaming data from large MySQL tables concurrently, and the bulk helper hangs seemingly at random during the 40+ minute indexing process. Node version: v16.13.1
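For context, a minimal sketch of the kind of setup I mean. The driver (mysql2), connection details, table name, and index name are illustrative assumptions, not my actual code:

```ts
// Hedged sketch: streaming rows from a large MySQL table into the bulk
// helper. Names and connection details below are made up for illustration.
import { Client } from '@elastic/elasticsearch'
import { createConnection } from 'mysql2'

const client = new Client({ node: 'http://localhost:9200' })
const connection = createConnection({ host: 'localhost', user: 'root', database: 'mydb' })

async function run (): Promise<void> {
  // Stream rows instead of buffering the whole table in memory
  const rows = connection.query('SELECT * FROM big_table').stream()

  const result = await client.helpers.bulk({
    datasource: rows,
    // Larger values here are the ones that made the hang more likely for me
    concurrency: 8,
    flushBytes: 5_000_000,
    onDocument (_doc) {
      return { index: { _index: 'my-index' } }
    }
  })

  console.log(result)
  connection.end()
}

run().catch(console.error)
```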
A workaround that's working for me is to set the
We seem to be experiencing the same issue.
We understand that this might be important for you, but this issue has been automatically marked as stale because it has not had recent activity, either from our end or yours. Note: in the past months we have built a new client, which has just landed in master. If you want to open an issue or a PR for the legacy client, you should do that in https://github.com/elastic/elasticsearch-js-legacy
(comment to avoid stale)
Heya, I wrote this to reproduce the issue:

```js
// Imports added for completeness; paths follow this repo's test suite conventions
const { test } = require('tap')
const { Readable } = require('stream')
const { Client } = require('../')
const { buildServer } = require('./utils')
const { sleep } = require('./integration/helper')

const dataset = [
  { user: 'jon', age: 23 },
  { user: 'arya', age: 18 },
  { user: 'tyrion', age: 39 }
]

test('issue #1562', async t => {
  async function handler (req, res) {
    console.log(req.url)
    // Respond slowly, so each bulk request outlives the flushInterval
    setTimeout(() => {
      res.writeHead(200, { 'content-type': 'application/json' })
      res.end(JSON.stringify({ errors: false, items: [{}] }))
    }, 1000)
  }

  const [{ port }, server] = await buildServer(handler)
  const client = new Client({ node: `http://localhost:${port}` })

  async function * generator () {
    const data = dataset.slice()
    for (const doc of data) {
      await sleep(1000)
      yield doc
    }
  }

  const result = await client.helpers.bulk({
    datasource: Readable.from(generator()),
    flushBytes: 1,
    flushInterval: 1000,
    concurrency: 1,
    onDocument (doc) {
      return {
        index: { _index: 'test' }
      }
    },
    onDrop (doc) {
      t.fail('This should never be called')
    }
  })

  t.type(result.time, 'number')
  t.type(result.bytes, 'number')
  t.match(result, {
    total: 3,
    successful: 3,
    retry: 0,
    failed: 0,
    aborted: false
  })

  server.stop()
})
```

My observations:
It happens both in v7 and v8.
@brentmjohnson I have a potential fix for this in #2027. Details on how to test it out are in the comments there, if this is still affecting you. Barring any surprises during testing, the fix should be ready to go out in the next patch or minor release.
Writing a note for future-me, in case I need to test this again: the code below was useful for testing bulk flushInterval issues like the one described here. Not quite clean enough to add to any test suite, but still nice to hold on to. 😄

```ts
import * as http from 'http'
import { Readable } from 'stream'
import { Client } from '../'
import { buildServer } from './utils'
import { sleep } from './integration/helper'

const flushInterval = 1000

const dataset = [
  { user: 'jon', age: 23 },
  { user: 'arya', age: 18 },
  { user: 'tyrion', age: 39 }
]

async function handler (req: http.IncomingMessage, res: http.ServerResponse) {
  // Respond after 1400ms, deliberately longer than the 1000ms flushInterval
  setTimeout(() => {
    res.writeHead(200, { 'content-type': 'application/json' })
    res.end(JSON.stringify({ errors: false, items: [{}] }))
  }, 1400)
}

async function main () {
  const [{ port }, server] = await buildServer(handler)
  const client = new Client({ node: `http://localhost:${port}` })

  console.log('one')
  await sleep(10000)
  console.log('two')
  await sleep(10000)
  console.log('three')
  await sleep(10000)

  const result = await client.helpers.bulk({
    datasource: Readable.from(generator()),
    flushBytes: 1,
    flushInterval: flushInterval,
    concurrency: 1,
    onDocument (_) {
      return {
        index: { _index: 'test' }
      }
    },
    onDrop (_) {
      throw new Error('onDrop')
    }
  })

  console.log(result)
  server.stop()
}

let generated = 0
async function * generator () {
  const data = dataset.slice()
  for (const doc of data) {
    // Yield one document per flush interval, keeping the iterator and the
    // flush timer racing each other
    await sleep(flushInterval)
    generated++
    console.log(`generated ${generated}`)
    yield doc
  }
}

main()
  .then(() => console.log('then'))
  .catch((err) => console.error('catch', err))
  .finally(() => console.log('finally'))
```
Merged and released in 8.12.1. Thanks to @pquentin for handling the last steps while I was out on leave. 🙏
🐛 Bug Report
The bulk helper hangs forever when the flushInterval is exceeded while the iterator is already awaiting the semaphore.
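For intuition, here is a minimal, hypothetical sketch — not the helper's actual code — of how a semaphore can lose a wakeup when a slot is released while nobody is queued for it yet:

```ts
// Hypothetical lost-wakeup semaphore bug, NOT the helper's implementation.
// If release() fires when no waiter has queued yet, the slot is never
// returned to the pool, and the next acquire() never resolves.
class NaiveSemaphore {
  private free: number
  private readonly waiters: Array<() => void> = []

  constructor (capacity: number) {
    this.free = capacity
  }

  async acquire (): Promise<void> {
    if (this.free > 0) {
      this.free--
      return
    }
    await new Promise<void>(resolve => this.waiters.push(resolve))
  }

  release (): void {
    const next = this.waiters.shift()
    if (next !== undefined) {
      next() // hand the slot directly to a queued waiter...
    }
    // BUG: ...but when nobody is waiting yet, the slot should be returned
    // with `this.free++`. Omitting that drops the wakeup on the floor.
  }
}

async function demo (): Promise<void> {
  const sem = new NaiveSemaphore(1) // concurrency: 1
  await sem.acquire() // a long-running bulk request takes the only slot
  sem.release()       // it finishes before the next flush has queued up
  await sem.acquire() // never resolves: the slot was silently lost
}

demo().then(() => console.log('done')) // 'done' is never printed
```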
To Reproduce
Steps to reproduce the behavior: see the reproduction script posted in the comments above.
Expected behavior
The bulk helper should await gracefully for queued requests to complete, error out, or time out.
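Continuing the hypothetical semaphore sketch above — and not necessarily how the actual fix in #2027 works — the expected behavior corresponds to a release() that returns the slot to the pool when no waiter is queued:

```ts
// Corrected release() for the NaiveSemaphore sketch earlier in this thread:
// if no waiter is queued, put the slot back so a later acquire() proceeds.
release (): void {
  const next = this.waiters.shift()
  if (next !== undefined) {
    next()
  } else {
    this.free++
  }
}
```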
Your Environment
@elastic/elasticsearch version: >=7.15.0