[DATA-490] Add cli command for exporting data from viam cloud. #1466

Merged
merged 21 commits into from
Oct 25, 2022

Conversation

Contributor

@AaronCasas AaronCasas commented Oct 10, 2022

Add cli command for exporting data from Viam cloud. There are still some small TODOs, and tabular export will be added in a future PR, but wanted to open this to start getting feedback. The PR with the accompanying api changes included with the local replacement are here.

The cli can be built with go build -o ~/go/bin/viam cli/cmd/main.go. The command for data export is data. You must run viam auth to login before exporting data. I considered putting usage examples here, but I think it should be obvious/easy to use to a new user, and this should be a good test of that. Running viam data should display a help command describing the arguments. Please let me know if you run into any issues/confusion while trying this out!

@AaronCasas AaronCasas changed the title from "[DATA-490] Data 490" to "[DATA-490] Add cli command for exporting data from viam cloud." Oct 14, 2022
@AaronCasas AaronCasas marked this pull request as ready for review October 14, 2022 20:59
cli/data.go Outdated
return nil
}

func mimeTypeToFileExt(mime string) (string, error) {
Member

I think we should move this into utils/mime just to keep everything in the same place

cli/data.go Outdated
// Write everything as json for now.
d := datum.GetData()
if d == nil {
// TODO: This should never happen. Should this notify user? Error? Or just skip?
Member

If this should never happen, then it would seem like this should error.

Contributor Author

Yeah I was/am a bit conflicted here, because this should never happen, but it would actually indicate an error with the backend (that it stored invalid/empty data). Since it's not really an issue with the CLI, and the user has no ability to fix/avoid it (other than filing a bug report with us), I thought halting all the valid/working downloads could be more of an issue than the actual issue. I added the print mostly to help us debug - if we see that, we're doing something wrong on the backend.
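The trade-off described above (report the bad datum, but do not halt the valid downloads) could be sketched roughly as follows. This is only an illustration: the `datum` type, `exportAll`, and the `write` callback are hypothetical stand-ins, not the generated protobuf types or the CLI's actual code.

```go
package main

import (
	"fmt"
	"os"
)

// datum is a hypothetical stand-in for the generated protobuf message.
type datum struct {
	ID   string
	Data []byte // nil models the "should never happen" case
}

// exportAll calls write for every datum that has data. A datum with nil data
// is logged to stderr and skipped, so one bad record does not stop the rest.
func exportAll(data []datum, write func(datum) error) (skipped int, err error) {
	for _, d := range data {
		if d.Data == nil {
			// Indicates a backend problem, not a user error: log and keep going.
			fmt.Fprintf(os.Stderr, "skipping datum %s: received empty data\n", d.ID)
			skipped++
			continue
		}
		if werr := write(d); werr != nil {
			return skipped, fmt.Errorf("writing datum %s: %w", d.ID, werr)
		}
	}
	return skipped, nil
}

func main() {
	data := []datum{{ID: "a", Data: []byte("x")}, {ID: "b"}, {ID: "c", Data: []byte("y")}}
	written := 0
	skipped, err := exportAll(data, func(datum) error { written++; return nil })
	fmt.Println(written, skipped, err)
}
```

Returning the skipped count lets the caller print a summary at the end, which keeps the debugging signal without failing the whole export.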

cli/data.go Outdated

// TODO: We need to store file extension too. In sync we map from ext -> mime type, so this is already available.
// Or maybe we can just get this from file name.
ext, err := mimeTypeToFileExt(md.GetMimeType())
Member

Seems like with the new proto changes, you should be able to just grab the fileExtension from the metadata. Was there a reason we avoided having this to begin with?

)

// BinaryData writes the requested data to the passed directory.
func (c *AppClient) BinaryData(dst string, filter *datapb.Filter) error {
Member

[super nit] I think this is short for destination so I would just call it dest. Otherwise, disregard

Contributor Author

dst is, I think, the most common abbreviation for destination in Go (and, I think, in other C-based languages, though I'm less sure). One example would be io.Copy.

Flags: []cli.Flag{
&cli.StringFlag{
Name: dataFlagDestination,
Required: true,
Member

Is there a convention to denote which ones are required versus optional? So when you type viam data, optionals are in brackets or required are in angle brackets, or something similar?

Contributor Author

If you run viam data <args> without any of the required params, it'll yell at you saying "Required flags X, Y, Z not set".

Member

Yeah, I appreciate that! Suuuuper small, but I meant in the help command, denoting somehow which params are required but nbd

Contributor Author

Great suggestion! Updated to follow standard described here: <> for required, and [] for optional

cli/cmd/main.go Outdated
// Flags
dataFlagDestination = "destination"
dataFlagType = "type"
dataFlagOrgs = "orgs"
Member

@tahiyasalam tahiyasalam Oct 18, 2022

I would specify here org id versus name. Relatedly, is there an easy way for users to get access to the org IDs?

Contributor Author

Done! And I'm not aware of one. Maybe @sbal13 knows?

Member

@tahiyasalam tahiyasalam left a comment

Some small things, but overall looks really good and is easy to use!

I did have one concern about when I try to download all binary data, I receive an error:
could not determine file extension for mime type application/octet-stream.
I am not sure if this is a DB thing or a bug here, but it also leads me to think that maybe instead of returning on certain errors, we should just log + continue and try to download the files we can.
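One way to avoid aborting on an unmapped MIME type such as application/octet-stream is to fall back to a generic extension. The sketch below uses the standard library's mime.ExtensionsByType; the helper name `fileExtForMime` and the ".bin" fallback are assumptions made for illustration, and the extension returned for a known type can vary with the system's MIME tables.

```go
package main

import (
	"fmt"
	"mime"
)

// fileExtForMime returns a file extension (with leading dot) for mimeType.
// If the type is malformed or has no known extension, it returns fallback
// instead of an error, so the caller can keep downloading.
func fileExtForMime(mimeType, fallback string) string {
	exts, err := mime.ExtensionsByType(mimeType)
	if err != nil || len(exts) == 0 {
		return fallback
	}
	return exts[0]
}

func main() {
	for _, mt := range []string{"image/png", "application/x-no-such-type"} {
		fmt.Printf("%s -> %s\n", mt, fileExtForMime(mt, ".bin"))
	}
}
```

If the file extension becomes available directly from the stored metadata, as discussed elsewhere in this review, a lookup like this would only be needed as a last resort.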

Contributor

@agreenb agreenb left a comment

Overall really incredible job, so excited to see this coming together! Mostly minor comments, and I know there's still filename work that's blocking functionality here, but super clean, thorough, and well organized.

cli/cmd/main.go Outdated
&cli.StringFlag{
Name: dataFlagComponentModel,
Required: false,
Usage: "component model filter",
Contributor

supernit: for consistency, component_model and component_name below

Contributor Author

Done

cli/cmd/main.go Outdated
&cli.StringFlag{
Name: dataFlagStart,
Required: false,
// TODO: Do we store it as UTC?
Contributor

Confirming that we do on the BE, and we also do the local->UTC conversion in app when sending the request to the BE - and will do the same when copying a CLI command with this --start --end into the clipboard.

Contributor Author

I think it could be a better UX if this also took local time and converted to UTC on its own. What do you think?

Contributor Author

Actually jk with Steven's suggested change the user can just specify what time zone they want

cli/cmd/main.go Outdated
if c.String(dataFlagEnd) != "" {
t, err := time.Parse(timeLayout, c.String(dataFlagEnd))
if err != nil {
return errors.Wrap(err, "error parsing start flag")
Contributor

nit: "error parsing end flag"

cli/cmd/main.go Outdated

dataType := c.String(dataFlagType)
switch dataType {
case "binary":
Contributor

nit: extract "binary" and "tabular" into vars since reused

cli/cmd/main.go Outdated
return err
}
case "tabular":
if err := client.TabularData(c.String("destination"), filter); err != nil {
Contributor

nit: can reuse dataFlagDestination

cli/data.go Outdated
return nil
}

func mimeTypeToFileExt(mime string) string {
Contributor

+1 to your current work to include this in the upload/storage/response. Good example of hackier fixes getting exponentially more untenable, as we've repeated this pattern too frequently throughout the codebase.

cli/data.go Outdated
}

data := resp.GetData()
// TODO: Use textpb insted of ndjson, and save multiple files.
Contributor

I think we should have a CLI flag of output type (json, protobuf, csv, etc) that is consistent between how we're writing out the metadata and data files. If we start with only JSON, we can add that flag with a follow-up PR that potentially supports another type (e.g. .textpb files).

Contributor Author

Agreed. Want me to make a ticket for that?

Contributor

That'd be great, thanks! The ticket would include making the decision of which formats to support.

Contributor Author

Made this, will update with a link to the CLI when this is merged

cli/data.go Outdated
IncludeBinary: true,
CountOnly: false,
})
// TODO: Make sure EOF is properly interpreted. Iirc rpc errors aren't properly parsed by errors.Is.
Contributor

Good todo - checking on testing this before merging.

Contributor Author

Forgot this was a unary rpc, so EOF isn't relevant. Right now we just return an empty response when skip > the amount of data (so when we're done iterating through all the data with skip/limit). Posted in the team channel to see if there's an appropriate status code for this case, but handling it just by checking if the response is empty for now
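The skip/limit iteration described above, where an empty response signals the end of the data, might look something like this. The `fetchPage` function type is a hypothetical stand-in for the unary RPC call; the real request and response types are not shown in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// fetchPage is a hypothetical stand-in for the unary RPC: it returns up to
// limit items starting at offset skip, and an empty slice past the end.
type fetchPage func(skip, limit int) ([]string, error)

// fetchAll pages through the data with skip/limit and stops on the first
// empty page, since a unary RPC has no EOF to signal completion.
func fetchAll(fetch fetchPage, limit int) ([]string, error) {
	if limit <= 0 {
		return nil, errors.New("limit must be positive")
	}
	var all []string
	for skip := 0; ; skip += limit {
		page, err := fetch(skip, limit)
		if err != nil {
			return nil, err
		}
		if len(page) == 0 {
			return all, nil // empty response: nothing left to download
		}
		all = append(all, page...)
	}
}

func main() {
	items := []string{"a", "b", "c", "d", "e"}
	fetch := func(skip, limit int) ([]string, error) {
		if skip >= len(items) {
			return nil, nil
		}
		end := skip + limit
		if end > len(items) {
			end = len(items)
		}
		return items[skip:end], nil
	}
	all, err := fetchAll(fetch, 2)
	fmt.Println(all, err)
}
```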

cli/data.go Outdated
if err != nil {
return err
}
jsonFile, err := os.Create(filepath.Join(dst, "metadata", datum.GetId()+".json"))
Contributor

I initially had a comment for this vs md.String() for the human-readable version of the protobuf, but now editing to say that this makes sense if we keep the format consistent between metadata and data and have a flag for the output type (see other comment)

cli/data.go Outdated
if err != nil {
return err
}
jsonFile, err := os.Create(filepath.Join(dst, "metadata", datum.GetId()+".json"))
Contributor

Would ideally keep this consistent between tabular and binary. In binary data this is metadata/{datum_id}.json, whereas in tabular data this is metadata/{metadata_index}.json

This current loop will create a lot of repeated metadata messages, so I wonder if it's worth storing in a set and adding a metadata index to make consistent with tabular data.

Contributor Author

So I actually realized I wasn't storing the datum level metadata for binary data, and unlike for tabular data where I can just include it in the /data file (by just adding it to the json), that can't be done for binary data. I could have separate shared metadata and datum level metadata files, but that seems like a weird/awkward UX to me. So I'm going to just add the datum level metadata to the overall metadata, and have a metadata file per data file. I figure for images the metadata is a couple of orders of magnitude or more smaller than the actual data, so the overhead is worth it

Contributor Author

@AaronCasas AaronCasas Oct 19, 2022

As I work on this, I'm actually starting to think maybe we shouldn't be doing the same metadata_index stuff we do with tabular. For tabular data, metadata is similar in size or can even exceed the size of the actual data, so by repeating it we'd be being super inefficient. For binary data, that's not the case, and I'm finding it fairly awkward to merge the shared and datum level metadata, and again think it would be weird to have two separate metadata concepts per image

Contributor

We would want the user to easily be able to decode what's in /data and /metadata, so I assume you mean merging the two metadata messages within the actual API.

Thinking through what you said, I'm leaning towards your suggestion. Tabular is 1 metadata -> 1 group of sensor messages which can be combined into a CSV, whereas binary data is 1 metadata -> 1 group of images which each has additional datum-level information that can't be combined.

Contributor

Agree to merge the two in JSON format here, but then would think about changing the API to merge within the proto.

Contributor Author

Yup I meant in the actual API.

Ok great - I'll just manually merge them here and make a ticket for the API change. I figure it will probably involve changes across enough places (api, app both BE and FE) that it's worth a ticket
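As a rough illustration of what manually merging the shared and datum-level metadata into one JSON file per binary datum could mean, here is a minimal sketch. It works on generic maps rather than the real protobuf messages, and the function name and field names are made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergeMetadata combines shared capture metadata with datum-level metadata
// into a single JSON object. On a key collision the datum-level value wins.
func mergeMetadata(shared, perDatum map[string]interface{}) ([]byte, error) {
	merged := make(map[string]interface{}, len(shared)+len(perDatum))
	for k, v := range shared {
		merged[k] = v
	}
	for k, v := range perDatum {
		merged[k] = v
	}
	return json.Marshal(merged)
}

func main() {
	// Hypothetical field names, for illustration only.
	shared := map[string]interface{}{"component_name": "cam"}
	perDatum := map[string]interface{}{"id": "abc"}
	out, err := mergeMetadata(shared, perDatum)
	fmt.Println(string(out), err)
}
```

Writing one merged file per datum repeats the shared fields, which matches the reasoning above that the overhead is small relative to image data.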

Required: false,
Usage: "method filter",
},
&cli.StringSliceFlag{
Contributor

Question: how are StringSlices parsed? just wondering how i should format values that are intended to be arrays

Contributor Author

You pass them as comma separated values
e.g.
viam data --orgs 1c614556-2ff9-4234-9a94-d59b0a6d3378 --destination /tmp/cli_debug_test/2 --mime_types image/jpeg,image/png --type binary

cli/cmd/main.go Outdated

var start *timestamppb.Timestamp
var end *timestamppb.Timestamp
timeLayout := "2000-01-01T00:00:00.000"
Contributor

Do you need to specify the timezone here? I think it will assume that the provided time is UTC if unspecified in the format but not as familiar with the quirks of the time library

Contributor

I think you can also use time.RFC3339 in place of the timeLayout as specified in here - should allow for timezones

Contributor Author

That's much cleaner, changed. thanks!

@viambot viambot added the safe to test This pull request is marked safe to test from a trusted zone label Oct 25, 2022
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Oct 25, 2022
@AaronCasas AaronCasas requested a review from agreenb October 25, 2022 15:37
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Oct 25, 2022
Contributor

@agreenb agreenb left a comment

This looks great, really awesome work! 🚀

cli/cmd/main.go Outdated

var usageText = fmt.Sprintf("viam data <%s> <%s> [%s] [%s] [%s] [%s] [%s] [%s] [%s] [%s] [%s] [%s] [%s]",
dataFlagDestination, dataFlagDataType, dataFlagOrgIDs, dataFlagLocation, dataFlagRobotID, dataFlagRobotName,
dataFlagPartID, dataFlagPartName, dataFlagComponentType, dataFlagComponentModel, dataFlagComponentName,
Contributor

nit: missing dataFlagMethod and dataFlagMimeTypes
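One way to keep the usage string from drifting out of sync with the flag list is to generate it from slices of flag names instead of a hand-counted format string. This is only a sketch of that idea; `usageText` as a function and the example flag names are assumptions, not the code in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// usageText renders a usage line with required flags in angle brackets and
// optional flags in square brackets, in the order given.
func usageText(cmd string, required, optional []string) string {
	parts := []string{cmd}
	for _, f := range required {
		parts = append(parts, "<"+f+">")
	}
	for _, f := range optional {
		parts = append(parts, "["+f+"]")
	}
	return strings.Join(parts, " ")
}

func main() {
	// Hypothetical flag names, for illustration only.
	required := []string{"destination", "data_type"}
	optional := []string{"org_ids", "method", "mime_types"}
	fmt.Println(usageText("viam data", required, optional))
	// prints "viam data <destination> <data_type> [org_ids] [method] [mime_types]"
}
```

With this shape, adding a flag to the optional slice updates the help text automatically, so there is no placeholder count to keep in step.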

}

// TabularData downloads binary data matching filter to dst.
func (c *AppClient) TabularData(dst string, filter *datapb.Filter) error {
Contributor

note: This might need to be tweaked in a follow-up PR when https://github.com/viamrobotics/app/pull/793 is merged, depending on results of actually using it

@github-actions
Contributor

Code Coverage

Package Line Rate Health
go.viam.com/rdk/components/arm 59%
go.viam.com/rdk/components/arm/universalrobots 12%
go.viam.com/rdk/components/arm/xarm 2%
go.viam.com/rdk/components/arm/yahboom 7%
go.viam.com/rdk/components/audioinput 55%
go.viam.com/rdk/components/base 68%
go.viam.com/rdk/components/base/agilex 62%
go.viam.com/rdk/components/base/boat 41%
go.viam.com/rdk/components/base/wheeled 76%
go.viam.com/rdk/components/board 69%
go.viam.com/rdk/components/board/arduino 10%
go.viam.com/rdk/components/board/commonsysfs 47%
go.viam.com/rdk/components/board/fake 39%
go.viam.com/rdk/components/board/numato 19%
go.viam.com/rdk/components/board/pi 50%
go.viam.com/rdk/components/camera 66%
go.viam.com/rdk/components/camera/fake 67%
go.viam.com/rdk/components/camera/ffmpeg 72%
go.viam.com/rdk/components/camera/transformpipeline 80%
go.viam.com/rdk/components/camera/videosource 56%
go.viam.com/rdk/components/encoder/fake 77%
go.viam.com/rdk/components/gantry 68%
go.viam.com/rdk/components/gantry/multiaxis 84%
go.viam.com/rdk/components/gantry/oneaxis 86%
go.viam.com/rdk/components/generic 84%
go.viam.com/rdk/components/gripper 82%
go.viam.com/rdk/components/input 86%
go.viam.com/rdk/components/input/gpio 87%
go.viam.com/rdk/components/motor 82%
go.viam.com/rdk/components/motor/dmc4000 69%
go.viam.com/rdk/components/motor/fake 60%
go.viam.com/rdk/components/motor/gpio 65%
go.viam.com/rdk/components/motor/gpiostepper 59%
go.viam.com/rdk/components/motor/tmcstepper 66%
go.viam.com/rdk/components/movementsensor 67%
go.viam.com/rdk/components/movementsensor/cameramono 39%
go.viam.com/rdk/components/movementsensor/gpsnmea 37%
go.viam.com/rdk/components/movementsensor/gpsrtk 28%
go.viam.com/rdk/components/posetracker 88%
go.viam.com/rdk/components/sensor 88%
go.viam.com/rdk/components/sensor/ultrasonic 31%
go.viam.com/rdk/components/servo 77%
go.viam.com/rdk/config 77%
go.viam.com/rdk/control 57%
go.viam.com/rdk/data 78%
go.viam.com/rdk/grpc 25%
go.viam.com/rdk/ml 67%
go.viam.com/rdk/ml/inference 70%
go.viam.com/rdk/motionplan 69%
go.viam.com/rdk/operation 84%
go.viam.com/rdk/pointcloud 71%
go.viam.com/rdk/protoutils 62%
go.viam.com/rdk/referenceframe 78%
go.viam.com/rdk/registry 88%
go.viam.com/rdk/resource 85%
go.viam.com/rdk/rimage 78%
go.viam.com/rdk/rimage/depthadapter 94%
go.viam.com/rdk/rimage/transform 73%
go.viam.com/rdk/rimage/transform/cmd/extrinsic_calibration 67%
go.viam.com/rdk/robot 93%
go.viam.com/rdk/robot/client 79%
go.viam.com/rdk/robot/framesystem 68%
go.viam.com/rdk/robot/impl 80%
go.viam.com/rdk/robot/server 58%
go.viam.com/rdk/robot/web 60%
go.viam.com/rdk/robot/web/stream 87%
go.viam.com/rdk/services/armremotecontrol 75%
go.viam.com/rdk/services/armremotecontrol/builtin 25%
go.viam.com/rdk/services/baseremotecontrol 75%
go.viam.com/rdk/services/baseremotecontrol/builtin 71%
go.viam.com/rdk/services/datamanager 62%
go.viam.com/rdk/services/datamanager/builtin 78%
go.viam.com/rdk/services/datamanager/datacapture 34%
go.viam.com/rdk/services/datamanager/datasync 70%
go.viam.com/rdk/services/motion 68%
go.viam.com/rdk/services/motion/builtin 89%
go.viam.com/rdk/services/navigation 54%
go.viam.com/rdk/services/sensors 78%
go.viam.com/rdk/services/sensors/builtin 97%
go.viam.com/rdk/services/shell 15%
go.viam.com/rdk/services/slam 86%
go.viam.com/rdk/services/slam/builtin 73%
go.viam.com/rdk/services/vision 82%
go.viam.com/rdk/services/vision/builtin 74%
go.viam.com/rdk/spatialmath 85%
go.viam.com/rdk/subtype 96%
go.viam.com/rdk/utils 71%
go.viam.com/rdk/vision 26%
go.viam.com/rdk/vision/chess 80%
go.viam.com/rdk/vision/delaunay 87%
go.viam.com/rdk/vision/keypoints 92%
go.viam.com/rdk/vision/objectdetection 82%
go.viam.com/rdk/vision/odometry 60%
go.viam.com/rdk/vision/odometry/cmd 0%
go.viam.com/rdk/vision/segmentation 49%
go.viam.com/rdk/web/server 26%
Summary 66% (19039 / 28773)

@AaronCasas AaronCasas merged commit 94ec593 into viamrobotics:main Oct 25, 2022
@AaronCasas AaronCasas deleted the data-490 branch October 25, 2022 19:33
Labels
safe to test This pull request is marked safe to test from a trusted zone
5 participants