
fix(cli): import data errors with strings, nulls #290

Open · wants to merge 3 commits into base: main
Conversation

dtbuchholz (Contributor) commented Jun 18, 2024

Summary

Fixes the import-data command with respect to strings and empty CSV values.

Edit: maybe the original setup, "requiring" the user to make the CSV SQL-compatible, is the right approach? I initially made these changes to try to debug the issue, which turned out to be due to an outdated Studio CLI release. This PR is basically a more flexible way to send CSV data vs. the current setup.

Details

A user reported that importing data into a table failed when the CSV file contained an empty value. Further digging uncovered that strings weren't being inserted correctly (they were not wrapped in single quotes), and that NULL values weren't being inserted when a row had an empty value for a column.
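The core of the fix can be sketched as a per-value formatter: empty CSV fields map to SQL NULL, integer-typed columns pass through unquoted, and everything else gets wrapped in single quotes. This is a minimal sketch with hypothetical names, not the PR's exact code; it also doubles embedded quotes, which the later discussion notes the PR itself does not do.

```typescript
// Hypothetical helper sketching the fix: format one raw CSV value for SQL
// based on the column's declared type. Names are illustrative, not the PR's.
function formatSqlValue(val: string, colType: string): string {
  // an empty CSV field becomes a SQL NULL
  if (val === "") return "NULL";
  // integer-typed columns are inserted as-is
  if (colType === "int" || colType === "integer") return val;
  // treat everything else as text: double embedded single quotes, then wrap
  return `'${val.replace(/'/g, "''")}'`;
}
```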

How it was tested

Added a new test file for import-data flows.


@@ -61,6 +64,13 @@ export const handler = async (
signer,
aliases,
});
const baseUrl = SdkHelpers.getBaseUrl(31337);
const val = new Validator({ baseUrl });
// get the table schema to help map values to their type
dtbuchholz (author) commented:
since we can't assume the type (e.g., a JS number could be a TEXT column), we have to map the schema col types to the CSV headers (see the update in utils).

rowCount === 1 ? "" : "s"
} into ${definition}
const rec = await result.meta.txn?.wait();
if (rec?.errorEventIdx !== undefined) {
dtbuchholz (author) commented:
Before, we were logging that the tx was successful. In reality, data was being inserted like INSERT INTO my_table(id,val) VALUES(1,my_value), i.e., my_value was erroring out since it wasn't wrapped in single quotes.

table: string,
) {
let rowCount = 0;
const batches = [];
while (rows.length) {
let statement = `INSERT INTO ${table}(${headers.join(",")}) VALUES`;
// map headers to schema columns to add in the header index
const schemaWithIdx = headers.map((header, idx) => {
dtbuchholz (author) commented:
basically, take the schema columns and add the header/col index wrt the CSV file to help with logic below
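The mapping described here might look roughly like this. The Column shape and helper name are assumptions for illustration, not the PR's actual code; the real schema comes from the validator response.

```typescript
// Illustrative column shape; the real schema comes from the validator.
interface Column {
  name: string;
  type: string;
}

// Pair each CSV header with its index and the matching schema column's type,
// so row values can later be formatted by position (sketch, not the PR's code).
function mapHeadersToSchema(headers: string[], columns: Column[]) {
  return headers.map((header, idx) => {
    const col = columns.find((c) => c.name === header);
    return { idx, name: header, type: col?.type ?? "text" };
  });
}
```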

schemaWithIdx: Array<Record<string, any>>,
) {
// wrap values in single quotes if the `type` is not `int` nor `integer`
const rowFormatted = row.map((val, idx) => {
dtbuchholz (author) commented Jun 18, 2024:
there are probs more performant ways to achieve this setup, but this will get the type for the row value based on the header index.

if the CSV row value gets parsed as an empty string, assume a NULL value. else, use the SQL type to dictate single quote wrapping.
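Putting the pieces together, a batched INSERT could be assembled roughly like this. The names are hypothetical, and, like the PR's code at this point, the sketch does not escape quotes inside values.

```typescript
// Sketch: build one multi-row INSERT from raw CSV rows, using a per-column
// type lookup. Illustrative only; mirrors the quoting rules described above.
function buildInsert(
  table: string,
  headers: string[],
  rows: string[][],
  typeOf: (idx: number) => string,
): string {
  const tuples = rows.map(
    (row) =>
      "(" +
      row
        .map((val, idx) => {
          if (val === "") return "NULL"; // empty CSV field -> SQL NULL
          const t = typeOf(idx);
          return t === "int" || t === "integer" ? val : `'${val}'`;
        })
        .join(",") +
      ")",
  );
  return `INSERT INTO ${table}(${headers.join(",")}) VALUES ` + tuples.join(",");
}
```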

dtbuchholz (author) commented:
thinking about this a bit more...the downside here is that it doesn't sanitize the data. so, if you have a CSV file with single quotes in the actual value, that could pose a syntax issue.

i did consider refactoring this to use the bind() method when initially creating the statements, which would handle this issue. instead, i ended up with this approach to keep things more aligned with the setup as-is.
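A minimal fix for the sanitization gap would be to double embedded single quotes before wrapping, which is SQLite's escaping rule for text literals; parameter binding via bind() would avoid string interpolation entirely. The helper name here is hypothetical.

```typescript
// Sketch: make a text value safe to inline in a SQL statement by doubling
// embedded single quotes (SQLite text-literal escaping); hypothetical helper.
function escapeSqlText(val: string): string {
  return `'${val.replace(/'/g, "''")}'`;
}
```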


before(async function () {
const batch = await db.batch([
// use no backticks vs. including them to emulate a non-Studio vs. Studio
dtbuchholz (author) commented:
i noticed a comment about the Studio table columns having backticks around them, and the user might not know this: here. e.g., a col name is `id`, but in the UI, a user defines it as id.

i can't remember—is this still the approach? it didn't look to have an impact here, though, because i used a table that was created outside of the Studio and without the backticked col names.
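If the backtick convention still applies, normalizing identifiers might look like this (a hypothetical helper, assuming Studio wraps column names in backticks as described above):

```typescript
// Sketch: wrap an identifier in backticks unless it's already wrapped
// (hypothetical helper for the Studio backtick convention noted above).
function quoteIdent(name: string): string {
  const alreadyQuoted =
    name.length > 1 && name.startsWith("`") && name.endsWith("`");
  return alreadyQuoted ? name : `\`${name}\``;
}
```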

joewagner (Contributor) commented:

> Edit: maybe the original setup, "requiring" the user to make the CSV SQL-compatible, is the right approach?

@dtbuchholz I investigated this idea a little when I first built this command. At the time, I decided that requiring the user to handle formatting was the most straightforward approach, but if people want the command to handle formatting, it seems reasonable.
I'll give this a proper review this week if I have some free time.

asutula (Contributor) commented Jul 18, 2024:

What's the plan with this PR @dtbuchholz and @joewagner?

joewagner (Contributor) commented:

> What's the plan with this PR @dtbuchholz and @joewagner?

I was looking at this today. If we do want to go this route, we should make sure quotes are being escaped correctly and that row name escape characters work as expected.

test.only("can import with quotes, escape chars, and multi line", async function () {
/* eslint-disable no-useless-escape */
const csvStr = `id,val
1,"I'll work"
dtbuchholz (author) commented:
@joewagner up to you! i've been trying to mess with the csv parser to get things to work with these different test cases. lmk if this is what you were thinking.

(I haven't been able to get the csv-parse to work properly tbh, and I'll probably need to add a SQLite-specific parser on top of it)

3 participants