[DATA-611] Stop data capturing when turned off #1510

Merged

alexis-wei
Contributor

Part 1 of the fix, covering data capture. This makes sure that when capturing is disabled for a component, its collector gets closed. It also only initializes a collector if capturing isn't disabled to begin with.

@CLAassistant

CLAassistant commented Oct 19, 2022

CLA assistant check
All committers have signed the CLA.

@@ -502,6 +526,7 @@ func (svc *builtIn) Update(ctx context.Context, cfg *config.Config) error {
if err := svc.initOrUpdateSyncer(ctx, 0, cfg); err != nil {
return err
}
// so if toggled sync off, then I want to STOP existing collectors
Contributor

Not necessarily - these should be independent. You might still want to capture data locally, just stop syncing it. Collectors are only in charge of collecting the data

// if disabled, make sure that it is closed, so it doesn't keep collecting data.
collector, md, err := svc.getCollectorFromConfig(attributes)
if err != nil {
svc.logger.Errorw("collector ", attributes.Name, " was not found", "error", err)
Contributor

Is this necessarily an error? Or would it occur with any component that hasn't had and doesn't have data capture turned on? If the latter (or some other valid case), it should probably be an INFO level log

Contributor Author

that makes sense! changing to info

componentMetadata, err := svc.initializeOrUpdateCollector(
attributes, updateCaptureDir)
if err != nil {
svc.logger.Errorw("failed to initialize or update collector", "error", err)
} else {
newCollectorMetadata[*componentMetadata] = true
}
} else if attributes.Disabled {
Contributor

Do we need this else if, or could it just be else? Would we not want to do this if say !attributes.Disabled && svc.captureDisabled, for example?

Contributor Author

@alexis-wei alexis-wei Oct 19, 2022

I think we do need it to be else if. If svc.captureDisabled is true, then either no collectors existed in the past, or toggledCaptureOff would be true and all collectors would have been closed on line 500. From there, no other collectors should have been initialized, so as long as the first part of the if is true, there wouldn't be any more collectors to worry about

Contributor

Ok that makes sense, this looks good then. Definitely a bit fragile that the logic here depends on these implicit connections over a pretty wide portion of code, which I think contributed to the bug happening. Just making a note that this is probably something we should try to simplify in the near future
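
One way to reduce the fragility described above might be to put the whole decision in a single pure function, so that no branch depends on work done elsewhere in Update. The following is a minimal sketch under assumptions, not the logic from the PR: `collectorAction`, its constants, and `decideCollectorAction` are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// collectorAction is a hypothetical enum; it is not part of the PR.
type collectorAction int

const (
	actionNone         collectorAction = iota // nothing to do
	actionInitOrUpdate                        // capture is wanted: (re)initialize the collector
	actionClose                               // capture is not wanted: close the existing collector
)

// decideCollectorAction (hypothetical) combines the service-level and
// component-level "disabled" flags into one explicit decision.
func decideCollectorAction(serviceDisabled, componentDisabled, collectorExists bool) collectorAction {
	if !serviceDisabled && !componentDisabled {
		return actionInitOrUpdate
	}
	if collectorExists {
		return actionClose
	}
	return actionNone
}

func main() {
	// Component enabled, capture disabled at the service level, and a
	// collector still present: the collector is closed explicitly.
	fmt.Println(decideCollectorAction(true, false, true) == actionClose)
}
```

With this shape, the `!attributes.Disabled && svc.captureDisabled` case raised in the thread is handled by the function itself rather than by an earlier code path.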

@@ -134,6 +134,8 @@ func newTestDataManager(t *testing.T, localArmKey, remoteArmKey string) internal
}
}

// this arm should also have a collectors since it's using the data manager service
Contributor

What's this comment referring to?

Contributor Author

oops deleted, just self referencing

func TestDataCapture(t *testing.T) {
tests := []struct {
name string
initialDisableStatus bool
Contributor

Nit: disabled x2

@@ -59,6 +59,7 @@ type collector struct {
cancelCtx context.Context
cancel context.CancelFunc
capturer Capturer
closed bool
Contributor

Why is this bool needed?

Contributor Author

It's being used in Closed() to prevent it from panicking if already closed. Originally I was thinking that a collector might be closed but not deleted. But then I added the delete, because a new collector and a new file path should be generated, so maybe it's no longer needed. Will review

Contributor

Which line in close was panicking before? Adding this doesn't seem unreasonable to me, and it's probably better to be defensive about panics in this case. I'm more asking whether the var is necessary to prevent that
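
For reference, a common Go pattern for making a close method safe to call twice is `sync.Once`, which is a different technique from the `closed` bool added in this PR. The sketch below assumes the panic would come from closing an already closed channel; the thread does not say which line actually panicked. `sketchCollector`, its `done` channel, and `isClosed` are hypothetical stand-ins, not the real collector.

```go
package main

import (
	"fmt"
	"sync"
)

// sketchCollector is a hypothetical stand-in for the PR's collector type.
type sketchCollector struct {
	done      chan struct{}
	closeOnce sync.Once
}

func newSketchCollector() *sketchCollector {
	return &sketchCollector{done: make(chan struct{})}
}

// Close is safe to call more than once. Closing an already closed channel
// panics, so sync.Once makes sure close(c.done) runs only a single time.
func (c *sketchCollector) Close() {
	c.closeOnce.Do(func() { close(c.done) })
}

// isClosed reports whether Close has been called.
func (c *sketchCollector) isClosed() bool {
	select {
	case <-c.done:
		return true
	default:
		return false
	}
}

func main() {
	c := newSketchCollector()
	c.Close()
	c.Close() // second call is a no-op rather than a panic
	fmt.Println(c.isClosed())
}
```

Compared with a plain bool, `sync.Once` also stays correct if Close is called from several goroutines at once, which an unguarded bool would not.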

@@ -469,6 +492,7 @@ func (svc *builtIn) Update(ctx context.Context, cfg *config.Config) error {
}

toggledCaptureOff := (svc.captureDisabled != svcConfig.CaptureDisabled) && svcConfig.CaptureDisabled
// toggledCaptureOn := (svc.captureDisabled != svcConfig.CaptureDisabled) && !svcConfig.CaptureDisabled
Contributor

Looks like this can be deleted

Contributor Author

deleted
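
As a side note on the remaining `toggledCaptureOff` expression: "the flag changed and is now true" is logically the same as "the flag was false and is now true". The sketch below checks that equivalence over all four input combinations. The function names are hypothetical and only wrap the boolean expression from the diff.

```go
package main

import "fmt"

// toggledOff mirrors the expression in the diff: the flag changed and is now true.
func toggledOff(prevDisabled, newDisabled bool) bool {
	return (prevDisabled != newDisabled) && newDisabled
}

// toggledOffShort is the logically equivalent shorter form.
func toggledOffShort(prevDisabled, newDisabled bool) bool {
	return !prevDisabled && newDisabled
}

func main() {
	// Print both forms side by side for every combination of inputs.
	for _, prev := range []bool{false, true} {
		for _, next := range []bool{false, true} {
			fmt.Println(prev, next, toggledOff(prev, next), toggledOffShort(prev, next))
		}
	}
}
```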


// let run for a second
time.Sleep(captureWaitTime)
midCaptureFiles := getAllFiles(tmpDir)
Contributor

Why's this mid needed? I think only updated should be necessary

Contributor Author

I added mid so it provides results after the disabled variable has been changed and that the collector gets closed and new data gets pushed out. It's a slightly diff value

Contributor

If we just want to test that capture is now (disabled | enabled) after Update is called, what does testing the file size twice get us vs. just once? I think comparing updatedCaptureFiles and initialFileSize should accomplish the same thing

Contributor

@AaronCasas AaronCasas Oct 20, 2022

Actually wait I see you have

initialCaptureFiles := getAllFiles(tmpDir)
initialCaptureFilesSize := getTotalFileSize(initialCaptureFiles)

before you call Update with the new status. I think you should take those measurements after the Update.

This would allow you to get rid of mid while testing the same thing - testing that data was or was not initially being captured, then testing that data was or was not captured after the update

time.Sleep(captureWaitTime)
updatedCaptureFiles := getAllFiles(tmpDir)
updatedCaptureFilesSize := getTotalFileSize(updatedCaptureFiles)
if !tc.newDisableStatus && tc.initialDisableStatus {
Contributor

Can you add comments to each of these cases saying what they are? e.g. If disabled then enabled

The whole disabled instead of enabled thing made it hard for me to figure it out
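
To illustrate the four cases being asked about, here is a small sketch that names each combination of the two flags in "enabled" terms. The field names `initialDisableStatus` and `newDisableStatus` come from the test in this PR; `describeTransition` is a hypothetical helper that does not exist in the codebase.

```go
package main

import "fmt"

// describeTransition (hypothetical) names a test case from its two
// "disabled" flags, phrased in terms of whether capture is enabled.
func describeTransition(initialDisabled, newDisabled bool) string {
	switch {
	case initialDisabled && !newDisabled:
		return "disabled then enabled"
	case !initialDisabled && newDisabled:
		return "enabled then disabled"
	case initialDisabled && newDisabled:
		return "always disabled"
	default:
		return "always enabled"
	}
}

func main() {
	tests := []struct {
		initialDisableStatus bool
		newDisableStatus     bool
	}{
		{true, false},
		{false, true},
		{true, true},
		{false, false},
	}
	for _, tc := range tests {
		fmt.Println(describeTransition(tc.initialDisableStatus, tc.newDisableStatus))
	}
}
```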

return collector, &componentMetadata, nil
}

return nil, nil, errors.New("no collector was found with this config")
Contributor

Nit: with this config -> with config <config string representation>

// capture always enabled
test.That(t, len(updatedCaptureFiles), test.ShouldEqual, len(initialCaptureFiles))
} else {
// capture ends disabled
Contributor

Nit: enabled then disabled to match first comment

@@ -957,3 +957,121 @@ func getTestModelManagerConstructor(t *testing.T, server rpc.Server, zipFileName
return model.NewManager(logger, cfg.Cloud.ID, client, conn, &mockClient{zipFileName: zipFileName})
}
}

func TestDataCapture(t *testing.T) {
Contributor

Can you remove the other capture disabled test?

Contributor

@AaronCasas AaronCasas left a comment

LGTM, thanks for explaining all the test logic to me 🚀

err = dmsvc.Update(context.Background(), testCfg)
test.That(t, err, test.ShouldBeNil)

// let run for a second
Contributor

supernit: technically 25 milliseconds, so something like "Run capture for a moment"

err = dmsvc.Update(context.Background(), testCfg)
test.That(t, err, test.ShouldBeNil)

// let run for a second
Contributor

Same with this comment

Contributor

@agreenb agreenb left a comment

Really like the table-driven tests, much cleaner than a lot of the existing tests.

Supernit comments, but thanks for diving into this! The (initially simple) config-based capture and sync logic has grown exponentially in Update, so definitely worth refactoring in the next few weeks to simplify and make as readable & bug-free as possible.

}
}

func getAllFiles(dir string) []os.FileInfo {
Contributor

Nice helper functions

@github-actions
Contributor

Code Coverage

Package Line Rate Health
go.viam.com/rdk/components/arm 59%
go.viam.com/rdk/components/arm/universalrobots 12%
go.viam.com/rdk/components/arm/xarm 2%
go.viam.com/rdk/components/arm/yahboom 7%
go.viam.com/rdk/components/audioinput 55%
go.viam.com/rdk/components/base 68%
go.viam.com/rdk/components/base/agilex 62%
go.viam.com/rdk/components/base/boat 41%
go.viam.com/rdk/components/base/wheeled 76%
go.viam.com/rdk/components/board 69%
go.viam.com/rdk/components/board/arduino 10%
go.viam.com/rdk/components/board/commonsysfs 47%
go.viam.com/rdk/components/board/fake 39%
go.viam.com/rdk/components/board/numato 19%
go.viam.com/rdk/components/board/pi 50%
go.viam.com/rdk/components/camera 66%
go.viam.com/rdk/components/camera/fake 67%
go.viam.com/rdk/components/camera/ffmpeg 72%
go.viam.com/rdk/components/camera/transformpipeline 80%
go.viam.com/rdk/components/camera/videosource 56%
go.viam.com/rdk/components/encoder/fake 77%
go.viam.com/rdk/components/gantry 68%
go.viam.com/rdk/components/gantry/multiaxis 84%
go.viam.com/rdk/components/gantry/oneaxis 86%
go.viam.com/rdk/components/generic 85%
go.viam.com/rdk/components/gripper 82%
go.viam.com/rdk/components/input 86%
go.viam.com/rdk/components/input/gpio 87%
go.viam.com/rdk/components/motor 82%
go.viam.com/rdk/components/motor/dmc4000 69%
go.viam.com/rdk/components/motor/fake 60%
go.viam.com/rdk/components/motor/gpio 65%
go.viam.com/rdk/components/motor/gpiostepper 58%
go.viam.com/rdk/components/motor/tmcstepper 66%
go.viam.com/rdk/components/movementsensor 67%
go.viam.com/rdk/components/movementsensor/cameramono 39%
go.viam.com/rdk/components/movementsensor/gpsnmea 37%
go.viam.com/rdk/components/movementsensor/gpsrtk 28%
go.viam.com/rdk/components/posetracker 88%
go.viam.com/rdk/components/sensor 88%
go.viam.com/rdk/components/sensor/ultrasonic 31%
go.viam.com/rdk/components/servo 77%
go.viam.com/rdk/config 77%
go.viam.com/rdk/control 57%
go.viam.com/rdk/data 78%
go.viam.com/rdk/grpc 25%
go.viam.com/rdk/ml 67%
go.viam.com/rdk/ml/inference 70%
go.viam.com/rdk/motionplan 71%
go.viam.com/rdk/operation 84%
go.viam.com/rdk/pointcloud 71%
go.viam.com/rdk/protoutils 62%
go.viam.com/rdk/referenceframe 78%
go.viam.com/rdk/registry 88%
go.viam.com/rdk/resource 85%
go.viam.com/rdk/rimage 78%
go.viam.com/rdk/rimage/depthadapter 94%
go.viam.com/rdk/rimage/transform 73%
go.viam.com/rdk/rimage/transform/cmd/extrinsic_calibration 67%
go.viam.com/rdk/robot 93%
go.viam.com/rdk/robot/client 79%
go.viam.com/rdk/robot/framesystem 68%
go.viam.com/rdk/robot/impl 79%
go.viam.com/rdk/robot/server 58%
go.viam.com/rdk/robot/web 60%
go.viam.com/rdk/robot/web/stream 87%
go.viam.com/rdk/services/armremotecontrol 75%
go.viam.com/rdk/services/armremotecontrol/builtin 25%
go.viam.com/rdk/services/baseremotecontrol 75%
go.viam.com/rdk/services/baseremotecontrol/builtin 71%
go.viam.com/rdk/services/datamanager 62%
go.viam.com/rdk/services/datamanager/builtin 78%
go.viam.com/rdk/services/datamanager/datacapture 34%
go.viam.com/rdk/services/datamanager/datasync 70%
go.viam.com/rdk/services/motion 68%
go.viam.com/rdk/services/motion/builtin 89%
go.viam.com/rdk/services/navigation 54%
go.viam.com/rdk/services/sensors 78%
go.viam.com/rdk/services/sensors/builtin 97%
go.viam.com/rdk/services/shell 15%
go.viam.com/rdk/services/slam 86%
go.viam.com/rdk/services/slam/builtin 73%
go.viam.com/rdk/services/vision 82%
go.viam.com/rdk/services/vision/builtin 74%
go.viam.com/rdk/spatialmath 85%
go.viam.com/rdk/subtype 96%
go.viam.com/rdk/utils 71%
go.viam.com/rdk/vision 26%
go.viam.com/rdk/vision/chess 80%
go.viam.com/rdk/vision/delaunay 87%
go.viam.com/rdk/vision/keypoints 92%
go.viam.com/rdk/vision/objectdetection 83%
go.viam.com/rdk/vision/odometry 60%
go.viam.com/rdk/vision/odometry/cmd 0%
go.viam.com/rdk/vision/segmentation 49%
go.viam.com/rdk/web/server 26%
Summary 66% (19070 / 28763)

@alexis-wei alexis-wei merged commit 13490fe into viamrobotics:main Oct 20, 2022
@alexis-wei alexis-wei deleted the stop-data-syncing-when-turned-off branch October 20, 2022 18:37