-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DATA-490] Add cli command for exporting data from viam cloud. #1466
Conversation
…r with how timestamp flags are handled.
cli/data.go
Outdated
return nil | ||
} | ||
|
||
func mimeTypeToFileExt(mime string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should move this into utils/mime just to keep everything in the same place
cli/data.go
Outdated
// Write everything as json for now. | ||
d := datum.GetData() | ||
if d == nil { | ||
// TODO: This should never happen. Should this notify user? Error? Or just skip? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this should never happen, then it would seem like this should error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was/am a bit conflicted here, because this should never happen, but it would actually indicate an error with the backend (that it stored invalid/empty data). Since it's not really an issue with the CLI, and the user has no ability to fix/avoid it (other than filing a bug report with us), I thought halting all the valid/working downloads could be more of an issue than the actual issue. I added the print mostly to help us debug - if we see that, we're doing something wrong on the backend.
cli/data.go
Outdated
|
||
// TODO: We need to store file extension too. In sync we map from ext -> mime type, so this is already available. | ||
// Or maybe we can just get this from file name. | ||
ext, err := mimeTypeToFileExt(md.GetMimeType()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like with the new proto changes, you should be able to just grab the fileExtension from the metadata. Was there a reason we avoided having this to begin with?
) | ||
|
||
// BinaryData writes the requested data to the passed directory. | ||
func (c *AppClient) BinaryData(dst string, filter *datapb.Filter) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[super nit] I think this is short for destination so I would just call it dest. Otherwise, disregard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dst
is aI think the most common abbreviation for destination in Go (and I think other C-based languages though less sure). One example would be io.Copy
Flags: []cli.Flag{ | ||
&cli.StringFlag{ | ||
Name: dataFlagDestination, | ||
Required: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a convention to denote which one are required versus option? So when you type viam data
optionals are in brackets or required are in angled brackets or something similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you run viam data <args>
without any of the require params, it'll yell at you saying "Required flags X, Y, Z not set".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I appreciate that! Suuuuper small, but I meant in the help command, denoting somehow which params are required but nbd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion! Updated to follow standard described here: <> for required, and [] for optional
cli/cmd/main.go
Outdated
// Flags | ||
dataFlagDestination = "destination" | ||
dataFlagType = "type" | ||
dataFlagOrgs = "orgs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would specify here org id versus name. Relatedly, is there an easy way for users to get access to the org IDs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! And I'm not aware of one. Maybe @sbal13 knows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small things, but overall looks really good and is easy to use!
I did have one concern about when I try to download all binary data, I receive an error:
could not determine file extension for mime type application/octet-stream
.
I am not sure if this is a DB thing or a bug here, but it also leads me to think that maybe instead of returning on certain errors, we should just log + continue and try to download the files we can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall really incredible job, so excited to see this coming together! Mostly minor comments, and I know there's still filename work that's blocking functionality here, but super clean, thorough, and well organized.
cli/cmd/main.go
Outdated
&cli.StringFlag{ | ||
Name: dataFlagComponentModel, | ||
Required: false, | ||
Usage: "component model filter", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
supernit: for consistency, component_model
and component_name
below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
cli/cmd/main.go
Outdated
&cli.StringFlag{ | ||
Name: dataFlagStart, | ||
Required: false, | ||
// TODO: Do we store it as UTC? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirming that we do on the BE, and we also do the local->UTC conversion in app when sending the request to the BE - and will do the same when copying a CLI command with this --start --end
into the clipboard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it could be a better UX if this also took local time and converted to UTC on its own. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually jk with Steven's suggested change the user can just specify what time zone they want
cli/cmd/main.go
Outdated
if c.String(dataFlagEnd) != "" { | ||
t, err := time.Parse(timeLayout, c.String(dataFlagEnd)) | ||
if err != nil { | ||
return errors.Wrap(err, "error parsing start flag") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "error parsing end flag"
cli/cmd/main.go
Outdated
|
||
dataType := c.String(dataFlagType) | ||
switch dataType { | ||
case "binary": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extract "binary" and "tabular" into vars since reused
cli/cmd/main.go
Outdated
return err | ||
} | ||
case "tabular": | ||
if err := client.TabularData(c.String("destination"), filter); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can reuse dataFlagDestination
cli/data.go
Outdated
return nil | ||
} | ||
|
||
func mimeTypeToFileExt(mime string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to your current work to include this in the upload/storage/response. Good example of hackier fixes getting exponentially more untenable, as we've repeated this pattern too frequently throughout the codebase.
cli/data.go
Outdated
} | ||
|
||
data := resp.GetData() | ||
// TODO: Use textpb insted of ndjson, and save multiple files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have a CLI flag of output type (json, protobuf, csv, etc) that is consistent between how we're writing out the metadata and data files. If we start with only JSON, we can add that flag with a follow-up PR that potentially supports another type (e.g. .textpb files).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Want me to make a ticket for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'd be great, thanks! The ticket would include making the decision of which formats to support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made this, will update with a link to the CLI when this is merged
cli/data.go
Outdated
IncludeBinary: true, | ||
CountOnly: false, | ||
}) | ||
// TODO: Make sure EOF is properly interpreted. Iirc rpc errors aren't properly parsed by errors.Is. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good todo - checking on testing this before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot this was a unary rpc, so EOF isn't relevant. Right now we just return an empty response when skip > the amount of data (so when we're done iterating through all the data with skip/limit). Posted in the team channel to see if there's an appropriate status code for this case, but handling it just by checking if the response is empty for now
cli/data.go
Outdated
if err != nil { | ||
return err | ||
} | ||
jsonFile, err := os.Create(filepath.Join(dst, "metadata", datum.GetId()+".json")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initally had a comment for this vs md.String() for the human-readable version of the protobuf, but now editing to say that this makes sense if we keep the format consistent between metadata and data and have a flag for the output type (see other comment)
cli/data.go
Outdated
if err != nil { | ||
return err | ||
} | ||
jsonFile, err := os.Create(filepath.Join(dst, "metadata", datum.GetId()+".json")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would ideally keep this consistent between tabular and binary. In binary data this is metadata/{datum_id}.json, whereas in tabular data this is metadata/{metadata_index}.json
This current loop will create a lot of repeated metadata messages, so I wonder if it's worth storing in a set and adding a metadata index to make consistent with tabular data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I actually realized I wasn't storing the datum level metadata for binary data, and unlike for tabular data where I can just include it in the /data
file (by just adding it to the json), that can't be done for binary data. I could have separate shared metadata and datum level metadata files, but that seems like a weird/awkward UX to me. So I'm going to just add the datum level metadata to the overall metadata, and have a metadata file per data file. I figure for images the metadata is a couple order of magnitudes or more smaller than the actual data, so the overhead is worth it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I work on this, I'm actually starting to think maybe we shouldn't be doing the same metadata_index stuff we do with tabular. For tabular data, metadata is similar in size or can even exceed the size of the actual data, so by repeating it we'd be being super inefficient. For binary data, that's not the case, and I'm finding it fairly awkward to merge the shared and datum level metadata, and again think it would be weird to have two separate metadata concepts per image
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would want the user to easily be able to decode what's in /data
and /metadata
, so I assume you mean merging the two metadata messages within the actual API.
Thinking through what you said, I'm leaning towards your suggestion. Tabular is 1 metadata -> 1 group of sensor messages which can be combined into a CSV, whereas binary data is 1 metadata -> 1 group of images which each has additional datum-level information that can't be combined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree to merge the two in JSON format here, but then would think about changing the API to merge within the proto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup I meant in the actual API.
Ok great - I'll just manually merge them here and make a ticket for the API change. I figure it will probably involve changes ranging enough places (api, app both BE and FE) that it's worth a ticket
Required: false, | ||
Usage: "method filter", | ||
}, | ||
&cli.StringSliceFlag{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: how are StringSlices parsed? just wondering how i should format values that are intended to be arrays
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You pass them as comma separated values
e.g.
viam data --orgs 1c614556-2ff9-4234-9a94-d59b0a6d3378 --destination /tmp/cli_debug_test/2 --mime_types image/jpeg,image/png --type binary
cli/cmd/main.go
Outdated
|
||
var start *timestamppb.Timestamp | ||
var end *timestamppb.Timestamp | ||
timeLayout := "2000-01-01T00:00:00.000" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to specify the timezone here? I think it will assume that the provided time is UTC if unspecified in the format but not as familiar with the quirks of the time library
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can also use time.RFC3339
in place of the timeLayout as specified in here - should allow for timezones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's much cleaner, changed. thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, really awesome work! 🚀
cli/cmd/main.go
Outdated
|
||
var usageText = fmt.Sprintf("viam data <%s> <%s> [%s] [%s] [%s] [%s] [%s] [%s] [%s] [%s] [%s] [%s] [%s]", | ||
dataFlagDestination, dataFlagDataType, dataFlagOrgIDs, dataFlagLocation, dataFlagRobotID, dataFlagRobotName, | ||
dataFlagPartID, dataFlagPartName, dataFlagComponentType, dataFlagComponentModel, dataFlagComponentName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: missing dataFlagMethod and dataFlagMimeTypes
} | ||
|
||
// TabularData downloads binary data matching filter to dst. | ||
func (c *AppClient) TabularData(dst string, filter *datapb.Filter) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: This might need to be tweaked in a follow-up PR when https://github.com/viamrobotics/app/pull/793 is merged, depending on results of actually using it
|
Add cli command for exporting data from Viam cloud. There are still some small TODOs, and tabular export will be added in a future PR, but wanted to open this to start getting feedback. The PR with the accompanying api changes included with the local replacement are here.
The cli can be built with
go build -o ~/go/bin/viam cli/cmd/main.go
. The command for data export isdata
. You must runviam auth
to login before exporting data. I considered putting usage examples here, but I think it should be obvious/easy to use to a new user, and this should be a good test of that. Runningviam data
should display a help command describing the arguments. Please let me know if you run into any issues/confusion while trying this out!