
Bulk table import #272

Merged
merged 8 commits into main from dtb/bulk-import
May 29, 2024
Conversation

@dtbuchholz (Contributor) commented May 15, 2024

Summary

  • Adds new commands: import table and import bulk
  • Exports a variable from @tableland/validators to help check whether the table definition names in a CSV are valid.
  • Adjusts GH workflows to run only on main and no longer use staging.

Details

  • You can now run studio import table <args> for a single table—which is the same as the old import-table command.
  • A new subcommand, import bulk <file> [--sanitize], lets you pass a CSV file with a table name, description, and new def name per row. The --sanitize flag will clean up the new def names for you, which is useful if you have a lot of tables and are unsure why a def name isn't letting you import tables.
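For reference, a minimal example of the CSV shape the bulk subcommand expects (the table names, descriptions, and def names here are made up for illustration):

```csv
tableName,description,definitionName
users_42161_10,A table of device owners,users
device defs_42161_11,Definitions with a space in the def name,device defs
```

Header names in the first row are placeholders; as discussed below, only the column ordering matters, and the third column is optional.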


vercel bot commented May 15, 2024

The latest updates on your projects:

studio: ✅ Ready (updated May 29, 2024 6:31pm UTC)

@dtbuchholz
Contributor Author

@joewagner @asutula i had to import 143 DIMO tables into the Studio, so i ended up creating a command to help me do it. you can find all of the tables here: https://studio.tableland.xyz/partners/dimo

idk if this command is something we actually want to add, so I've just kept it in draft as a reference. for example, the tRPC / API could probably handle this more efficiently and batch calls.

a few notes:

  • here's the original CSV file and the data mapped to the column ordering that the import-tables command expects:
    dimo-tables.csv
    data-formatted.csv
  • since the table definition names had spaces in them, i created a --sanitize flag to help fix those.
    • idk if this is an actual feature that we want, though. the one argument for keeping it is that it's unclear what i can / can't use as a table def name—i kept hitting tRPC issues that provided no context as to why the table def was invalid until i added this feature. (in my scenario, it was just spaces, but it took me a minute to figure that out.)
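For illustration, the sanitize step could look something like this. This is a hypothetical sketch; the actual validation rules live in the studio-validators package and may differ:

```typescript
// Hypothetical sketch of the --sanitize step: lowercase the name and replace
// any character that isn't a letter, digit, or underscore with an underscore.
// The rules the Studio validator actually enforces may differ from these.
function sanitizeDefName(name: string): string {
  let cleaned = name.toLowerCase().replace(/[^a-z0-9_]/g, "_");
  // SQLite-style identifiers can't start with a digit
  if (/^[0-9]/.test(cleaned)) cleaned = `_${cleaned}`;
  return cleaned;
}
```

Under these assumed rules, a def name like "Device Defs" becomes "device_defs", which is why spaces in the DIMO def names tripped the validator.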

and a couple of issues with the API:

  • the total time spent to import 143 tables: ~36 mins (~15s per table)
    • this seems high, doesn't it? i.e., an arbitrum nova tx + validator materialization should only take a few seconds total. but as noted, I'm sure the API itself could be altered to handle the bulk scenario vs. single calls per table import, which would significantly help with this
  • i hit a 500 twice while running this for no apparent reason. it happened at rows 15 ("Bristol") and 70 ("Liger").
  • also note that "Liger" has duplicative entries in the UI: one is deployed, but the one that hit a 500 is still in a non-deployed status: https://studio.tableland.xyz/partners/dimo/ligier (but you'll have to scroll the UI to find the failed one)

here's the 500 error:

TRPCClientError: Error saving defintion and deployment records.
    at TRPCClientError.from (file:///Users/dtb/tbl/studio/node_modules/@trpc/client/dist/TRPCClientError.mjs:26:20)
    at file:///Users/dtb/tbl/studio/node_modules/@trpc/client/dist/links/internals/createHTTPBatchLink.mjs:59:60
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  meta: {
    response: Response {
      [Symbol(realm)]: null,
      [Symbol(state)]: [Object],
      [Symbol(headers)]: [HeadersList]
    },
    responseJSON: [ [Object] ]
  },
  shape: {
    message: 'Error saving defintion and deployment records.',
    code: -32603,
    data: {
      code: 'INTERNAL_SERVER_ERROR',
      httpStatus: 500,
      path: 'tables.importTable',
      zodError: null
    }
  },
  data: {
    code: 'INTERNAL_SERVER_ERROR',
    httpStatus: 500,
    path: 'tables.importTable',
    zodError: null
  },
  [cause]: undefined
}

package-lock.json (outdated, resolved)
}

export const command = "import-tables <file> [sanitize]";
export const desc = wrapText(
Contributor Author

note: i created this helper because i couldn't get the yargs help command's formatting to print properly
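A helper along these lines (a hypothetical sketch, not the actual implementation) just wraps long description strings at a fixed column before handing them to yargs:

```typescript
// Hypothetical sketch of a wrapText helper: greedily wrap words at a fixed
// column width so multi-line help text prints cleanly in the terminal.
function wrapText(text: string, width: number = 60): string {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const lines: string[] = [];
  let line = "";
  for (const word of words) {
    if (line !== "" && line.length + 1 + word.length > width) {
      // Current word won't fit on this line; start a new one
      lines.push(line);
      line = word;
    } else {
      line = line === "" ? word : `${line} ${word}`;
    }
  }
  if (line !== "") lines.push(line);
  return lines.join("\n");
}
```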

definitionName || helpers.getPrefixFromTableName(tableName);
// Ensure custom definition name is valid
try {
await defNameSchema.parseAsync(defName);
Contributor Author

i needed this method from the studio-validators package to check if the user-defined table def name was usable.

and below this is where i do the "sanitize" step that may or may not be useful in the generalized use case...but it was for me w/ DIMO!

export * from "./validators/projects";
export * from "./validators/auth";
export * from "./validators/tables";
export { defNameSchema } from "./common.js";
Contributor Author

lastly, i had to add the .js extension to make this work with the CLI

@joewagner
Contributor

We already have a command for importing tables. I've been trying to avoid having two commands that do the same thing and only differ by being plural or not.
The original naming with singular vs. plural wasn't meant to signify anything. For example, if you create a single project the command is project, and if you list many projects the command is still project.

Curious if this makes sense to others, but maybe we refactor so that the command can take a file as input or use command parameters? Alternatively, a different name might make sense?

@joewagner joewagner marked this pull request as ready for review May 15, 2024 15:27
@joewagner joewagner marked this pull request as draft May 15, 2024 15:28
argv.description,
"definition description is required",
);
const projectId = helpers.getStringValue(
Contributor Author

note: i dropped this from being a positional since it's already a global flag. thus, this command can use the context, if set, vs explicitly being passed

@dtbuchholz
Contributor Author

@joewagner so i removed the duplicate-ish commands and consolidated them into a single import command that replaces the current import-table command. this has table and bulk subcommands. it felt a bit more natural to me vs. trying to figure out a flow for import-table --bulk or having separate commands that do the ~same thing:

> studio import table -h
cli.js import table <tableName> <description> [definitionName]

Import an existing tableland table into a project with description and,
optionally, with a new definition name

Positionals:
  tableName       The full Tableland table name to import    [string] [required]
  description     A description of the table being imported  [string] [required]
  definitionName  Optional new definition name for the table            [string]

and

> studio import bulk -h                           
cli.js import bulk <file> [sanitize]

Import existing Tableland tables into a project with CSV file that includes:
- Full Tableland table name, a description, & (optional) new definition name
- CSV header order: 'tableName', 'description', 'definitionName'

Optionally, sanitize the custom definition names to ensure they are valid by
replacing any invalid characters with underscores.

Positionals:
  file  The CSV file to import                               [string] [required]

Options:
      --sanitize          Sanitize the custom definition names for validity
                                                      [boolean] [default: false]

fwiw, i thought about also including the import-data command, but it's less about "importing" and more so just writing data. but, it could easily be another subcommand if we wanted it to be...

@dtbuchholz
Contributor Author

dtbuchholz commented May 15, 2024

btw, getting an error for the web build, but everything else works:

TypeError: Cannot read properties of null (reading 'useContext')
    at exports.useContext (/Users/dtb/tbl/studio/node_modules/react/cjs/react.production.min.js:24:118)
    at StyleRegistry (/Users/dtb/tbl/studio/node_modules/styled-jsx/dist/index/index.js:450:30)
    at Wc (/Users/dtb/tbl/studio/packages/web/node_modules/react-dom/cjs/react-dom-server.browser.production.min.js:68:44)
    at Zc (/Users/dtb/tbl/studio/packages/web/node_modules/react-dom/cjs/react-dom-server.browser.production.min.js:70:253)
    at Z (/Users/dtb/tbl/studio/packages/web/node_modules/react-dom/cjs/react-dom-server.browser.production.min.js:76:89)
    at Zc (/Users/dtb/tbl/studio/packages/web/node_modules/react-dom/cjs/react-dom-server.browser.production.min.js:74:209)
    at Z (/Users/dtb/tbl/studio/packages/web/node_modules/react-dom/cjs/react-dom-server.browser.production.min.js:76:89)
    at Zc (/Users/dtb/tbl/studio/packages/web/node_modules/react-dom/cjs/react-dom-server.browser.production.min.js:74:209)
    at Z (/Users/dtb/tbl/studio/packages/web/node_modules/react-dom/cjs/react-dom-server.browser.production.min.js:76:89)
    at Zc (/Users/dtb/tbl/studio/packages/web/node_modules/react-dom/cjs/react-dom-server.browser.production.min.js:74:209)

someone noted issues with the NODE_ENV var, so if i put NEXT_PUBLIC_NODE_ENV="" in the .env file, the error is at least somewhat fixed:

> nrb

> @tableland/studio@0.0.0 build
> next build

 ✓ Creating an optimized production build    
 ✓ Compiled successfully
   Skipping linting
 ✓ Checking validity of types    
   Collecting page data  ...PageNotFoundError: Cannot find module for page: /_not-found
    at getPagePath (/Users/dtb/tbl/studio/packages/web/node_modules/next/dist/server/require.js:94:15)
    at requirePage (/Users/dtb/tbl/studio/packages/web/node_modules/next/dist/server/require.js:99:22)
    at /Users/dtb/tbl/studio/packages/web/node_modules/next/dist/server/load-components.js:59:84
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async loadComponentsImpl (/Users/dtb/tbl/studio/packages/web/node_modules/next/dist/server/load-components.js:59:26)
    at async /Users/dtb/tbl/studio/packages/web/node_modules/next/dist/build/utils.js:1045:32
    at async Span.traceAsyncFn (/Users/dtb/tbl/studio/packages/web/node_modules/next/dist/trace/trace.js:105:20) {
  code: 'ENOENT'
}

> Build error occurred
Error: Failed to collect page data for /_not-found
    at /Users/dtb/tbl/studio/packages/web/node_modules/next/dist/build/utils.js:1171:15
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  type: 'Error'
}

@joewagner
Contributor

btw, getting an error for the web build, but everything else works:

@dtbuchholz I am getting the same TypeError: Cannot read properties of null (reading 'useContext') error; I'll try modifying NODE_ENV.

fwiw, the original issue with node modules seems to originate with the siwe package's peer dependencies, which have an entry of "ethers": "^5.6.8 || ^6.0.8". The || (i.e., an or clause) results in ethers 5 being set as a peer dep in package-lock.
However, when I manually updated the lock file, I ran into the useContext error.

@joewagner
Contributor

Next's prerender error explanation page is barely helpful.
https://nextjs.org/docs/messages/prerender-error

Basically they suggest that you review the entire code base for one of 8 errors that might be the problem.

@dtbuchholz
Contributor Author

dtbuchholz commented May 15, 2024

I remember running into this exact issue when I started the ethers refactor. I was trying out different next, react, and react-dom versions...but I can't quite remember what I ended up doing to fix it.

It might've been a .next caching issue, and iirc, Vercel has a way to reset that (maybe the lockfile nuke caused issues there, idk). But that could be way off base.

@joewagner
Contributor

@dtbuchholz I got the package-lock issues in #266 worked out, but I think you'll want to undo the changes you made to package-lock and then rebase this on top of my most recent commit.

pull_request:
branches: ["main", "staging"]
Contributor Author

@dtbuchholz May 23, 2024

@joewagner wait...was i supposed to remove this line? i can't remember / find the original comment about this GH workflow change.

should it be: branches: ["main"]?

packages/cli/package.json (outdated, resolved)
Contributor Author

added a new test file for table imports

Contributor

@joewagner left a comment

This looks good, and is a really nice feature. The only issue I was having is with the definitionName not being optional.
I'm ok if we want to require the header, but if that's the case we should update the help text.

const rows = dataObject.slice(1);
// Must be CSV file and have headers in the order: `tableName`,
// `description`, and `definitionName` (optional)
if (headers.length < 3) {
Contributor

I tried modifying an import file to do the import without definitionName, but I can't get it to work.
If I remove the header, it doesn't pass this check.
If I remove the values, I get a zod error that says "String must contain at least 1 character(s)"

Contributor Author

@joewagner good catch, fixed and added a couple more tests for this.

a random callout—i'm not enforcing the actual csv headers to be of a specific naming convention but only to have correct ordering. i think that's fine, right? i.e., your headers could be blah,blah,blah, but we just care that those are tableName,description,definitionName in terms of the actual values (where definitionName is actually optional, now).
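In other words, the parsing is purely positional. A rough sketch of the idea (a hypothetical helper for illustration, not the actual CLI code):

```typescript
// Hypothetical sketch: the header row is skipped regardless of what it's
// named, and columns are read by position, with the third (definitionName)
// being optional.
interface ImportRow {
  tableName: string;
  description: string;
  definitionName?: string;
}

function parseImportRows(csv: string): ImportRow[] {
  const lines = csv.trim().split("\n");
  if (lines.length < 2) throw new Error("CSV must have a header row and data");
  return lines.slice(1).map((line, i) => {
    // Note: a real parser should also handle quoted fields containing commas
    const [tableName, description, definitionName] = line
      .split(",")
      .map((s) => s.trim());
    if (!tableName || !description) {
      throw new Error(`row ${i + 1}: tableName and description are required`);
    }
    return { tableName, description, definitionName: definitionName || undefined };
  });
}
```

So a file whose header is blah,blah,blah still parses, and a two-column row simply yields no custom definition name.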

Contributor

@joewagner good catch, fixed and added a couple more tests for this.

Excellent!

a random callout—i'm not enforcing the actual csv headers to be of a specific naming convention but only to have correct ordering. i think that's fine, right? i.e., your headers could be blah,blah,blah, but we just care that those are tableName,description,definitionName in terms of the actual values (where definitionName is actually optional, now).

That seems fine, if anyone has an issue we can re-evaluate.

This isn't necessarily the best design, but it's a starting point. For example, it might make sense to combine this with the singular `import-table` and have a `--bulk` flag or something. Or, maybe keeping them separate is ideal.
Rather than having multiple table import commands, this consolidates them into one. You can either `import table` or `import bulk` to import a single table or a series of them under the same parent.
@dtbuchholz
Contributor Author

(note: tests pass, aside from the known nonce tests issue)

@joewagner
Contributor

(note: tests pass, aside from the known nonce tests issue)

Perfect, I'm re-running now.
I've got a potential fix for the nonce tests in the works.

@joewagner joewagner self-requested a review May 29, 2024 21:07
@dtbuchholz dtbuchholz merged commit 766fde5 into main May 29, 2024
4 checks passed
@dtbuchholz dtbuchholz deleted the dtb/bulk-import branch May 29, 2024 23:36