-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: improve duckdbDataSource performance #278
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
827e771
to
c52cec8
Compare
…queries, and use connection.all to speed up
f3f1048
to
bd598ab
Compare
2590a27
to
0936c5c
Compare
Codecov ReportPatch coverage:
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more. Additional details and impacted files@@ Coverage Diff @@
## develop #278 +/- ##
===========================================
- Coverage 90.21% 90.08% -0.13%
===========================================
Files 350 351 +1
Lines 5967 6044 +77
Branches 803 816 +13
===========================================
+ Hits 5383 5445 +62
- Misses 423 437 +14
- Partials 161 162 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. |
659e429
to
fdcfc27
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beside some suggestion, others LGTM
// write to txt file | ||
public static writePerformanceReport() { | ||
const filePath = path.join('./performanceRecord.txt'); | ||
// print current date, time as humun readable format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: human
@@ -99,49 +99,74 @@ export class DuckDBDataSource extends DataSource<any, DuckDBOptions> { | |||
} | |||
const { db, configurationParameters, ...options } = | |||
this.dbMapping.get(profileName)!; | |||
const builtSQL = buildSQL(sql, operations); | |||
const [builtSQL, streamSQL] = buildSQL(sql, operations); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest commenting on the purpose of why do you separate two SQL
|
||
const result = await statement.stream(...parameters); | ||
const firstChunk = await result.nextChunk(); | ||
const [result, asyncIterable] = await Promise.all([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest you comment on why you separate two SQL and call one by .all
and call another by stream
, do let members know the reason why not call .all
and convert to stream only or use .stream
only
|
||
const result = await statement.stream(...parameters); | ||
const firstChunk = await result.nextChunk(); | ||
const [result, asyncIterable] = await Promise.all([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use an other name to replace asyncIterable
? not straightforward actually.
} | ||
}), | ||
]); | ||
const asyncIterableStream = new Readable({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you create a method to refactor the execution to prevent too long to read, e.g: convertToStream(result, iterable)
// set duckdb thread to number | ||
private async setThread(db: duckdb.Database) { | ||
const thread = process.env['THREADS']; | ||
|
||
if (!thread) return; | ||
await new Promise((resolve, reject) => { | ||
db.run(`SET threads=${Number(thread)}`, (err: any) => { | ||
if (err) reject(err); | ||
this.logger.debug(`Set thread to ${thread}`); | ||
resolve(true); | ||
}); | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussion, if you still would like to keep the method, please comment the method's purpose and add the usage way in the README
@@ -26,15 +26,23 @@ export const isNoOP = ( | |||
return true; | |||
}; | |||
|
|||
const duckdbExecuteChunkSize = | |||
process.env['DUCKDB_EXECUTE_CHUNK_SIZE'] || '2000'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to update to mentioned the env in README, and what influence will be if set the env
to large value or small value.
b4980f9
to
75ab334
Compare
…t variable description in READ.me
75ab334
to
58db508
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍 Thanks for fixing.
Description
Note:
With the same data, the performance of stream.nextChunk is 1.5~2 times of the performance of connection.all