Performance of only-keypoints augmentations #635

Open

AmitMY opened this issue Mar 9, 2020 · 3 comments

AmitMY commented Mar 9, 2020

Following up: #621

I want to run data augmentation on poses alone (I have an interesting scenario, imo), and I want to do it fast: right now, augmenting a batch takes upwards of 30 seconds, while the entire training loop takes 1 second.

I believe that augmenting keypoints is an easier task than augmenting images, and it might just be a matter of bad data representation.

I'm loading an image (500x500 = 250,000 pixels)
I'm also loading a list of keypoints (10,564 points = 4.2% of the pixels).

Now, for example, if we run different augmentations on these two data sources, we see a huge time disparity: images are much faster, even though they contain more data (times in seconds):

| Augmentation | Images (s) | Keypoints (s) |
| --- | --- | --- |
| Horizontal Flip | 0.0010446 | 0.028449 |
| Affine Transform | 0.0035259 | 0.0351857 |
| Perspective Transform | 0.0047839 | 0.1963689 |

However, if we instead represent keypoints as a NumPy array of shape [N, 2], any operation on it is much faster!

```python
import numpy as np

points_np = np.random.rand(10564, 2)

# Flip: one vectorized subtraction over all points.
# `timeit` here is the custom timing helper from the Colab.
timeit("keypoints", lambda: np.array([1, 1]) - points_np)  # 0.00012 seconds

# Transformation: one matrix multiply over all points at once.
transformation_matrix = np.array([[1, 0], [0, 1]])
timeit("keypoints", lambda: np.dot(points_np, transformation_matrix))  # 0.00011 seconds
```

A change of representation here would be a huge win for runtime; we are talking at least two orders of magnitude for both the flip and the matrix transformation.
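The same holds for the perspective case, which showed the largest gap above. A minimal sketch, assuming a made-up 3x3 homography H applied in homogeneous coordinates (H and its values are purely illustrative):

```python
import numpy as np

points_np = np.random.rand(10564, 2)

# Hypothetical homography: identity plus a small perspective term.
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1e-4, 1e-4, 1.0]])

# Append a homogeneous coordinate, apply H to all points at once,
# then divide by the resulting w component.
ones = np.ones((points_np.shape[0], 1))
projected = np.hstack([points_np, ones]) @ H.T  # [N, 3]
warped = projected[:, :2] / projected[:, 2:3]   # [N, 2]
```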

To reproduce everything, here is a Colab!
https://colab.research.google.com/drive/1R5Kh3ryfHcurLnKVGMcNSP3BNGOgZSVv

The keypoints.txt file is here if you want to run it yourself!
keypoints.txt

aleju (Owner) commented Mar 16, 2020

Hm, are you sure you have 10k points in your input? How do you call your augmentation routines? According to the computed performance numbers (see https://imgaug.readthedocs.io/en/latest/source/performance.html#keypoints-and-bounding-boxes ), the library is able to process around 700k keypoints per second with Fliplr(p=1.0) on my machine -- and that hardware is by now quite outdated. Your numbers seem to be at around 4.6k/sec.

Now the keypoint augmentation is quite a lot slower than it could be, as each keypoint is currently represented as an object instead of using a single numpy array for all of them, but 4.6k/sec still seems quite slow. Unless there is a major error with the way the performance values are computed, I guess there is something wrong in your call or system configuration.
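A minimal sketch of such an isolated measurement (the harness here is an assumption, not the one used for the documented numbers; results will vary by machine):

```python
import timeit
import numpy as np
import imgaug.augmenters as iaa
from imgaug.augmentables.kps import Keypoint, KeypointsOnImage

# 10,564 random keypoints on a 500x500 image, mirroring the setup above.
xy = np.random.rand(10564, 2) * 500
kpsoi = KeypointsOnImage([Keypoint(x=x, y=y) for x, y in xy],
                         shape=(500, 500, 3))

aug = iaa.Fliplr(p=1.0)

# Average over 10 runs; derive keypoints per second from the batch size.
t = timeit.timeit(lambda: aug.augment_keypoints(kpsoi), number=10) / 10
print(f"{len(kpsoi.keypoints) / t:,.0f} keypoints/sec")
```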

AmitMY (Author) commented Mar 16, 2020

Thanks for looking into this!

This sequence of keypoints (10k) has roughly 100 keypoints per frame, for 100 frames. I want to augment an entire video at once, as that is probably the most correct way to do so (the augmentation is applied the same way to every frame).

You can see from my code (in the Google Colab) that augmenting keypoints is far too slow, and that doesn't even account for the creation of the Keypoint objects (I added a speed test to the Colab).

Creating the Keypoint objects (for 10,000 items) takes 0.02498 seconds on average, which means that even without any augmentation, this is limited to about 40 batches per second.
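A minimal sketch of that construction-cost measurement (the timing harness is an assumption):

```python
import timeit
import numpy as np
from imgaug.augmentables.kps import Keypoint

xy = np.random.rand(10000, 2) * 500

# Time only the construction of the 10,000 Keypoint objects,
# with no augmentation applied at all.
t = timeit.timeit(lambda: [Keypoint(x=x, y=y) for x, y in xy],
                  number=10) / 10
print(f"{t:.5f} s per batch -> ~{1 / t:.0f} batches/sec")
```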

I highly recommend going to the Colab, uploading the keypoints.txt file, and running "Runtime -> Run All" to really see how slow it is.

Unless I have something wrong, then, as you said, it is what it is... The Colab replicates all of the arguments made here.

AmitMY (Author) commented Mar 22, 2020

Real-life use case:
Here are two methods: augment2d, which performs rotation, scaling, and shearing, and augment2d_imgaug, which can do anything imgaug supports.
https://github.com/AmitMY/pose-utils/blob/779064c7b9275ee1bd044b8618fc9af97be1d9ef/lib/python/pose_format/pose.py#L112

The main difference is that augment2d_imgaug deconstructs the keypoints into an array of Keypoint objects and then reassembles them, while augment2d works on the original array.
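For reference, augment2d amounts to something like the following vectorized sketch (the signature and matrix composition here are illustrative, not the exact code from the repo):

```python
import numpy as np

def augment2d(points: np.ndarray, angle: float = 0.0,
              scale: float = 1.0, shear: float = 0.0) -> np.ndarray:
    """Rotate, scale, and shear an [N, 2] keypoint array in one matmul."""
    cos, sin = np.cos(angle), np.sin(angle)
    rotation = np.array([[cos, -sin], [sin, cos]])
    scaling = np.array([[scale, 0.0], [0.0, scale]])
    shearing = np.array([[1.0, shear], [0.0, 1.0]])
    matrix = rotation @ scaling @ shearing
    return points @ matrix.T
```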

Here is a performance test:
https://github.com/AmitMY/pose-utils/blob/779064c7b9275ee1bd044b8618fc9af97be1d9ef/sample-data/performance.py#L61

  • Augment Affine - 815.46it/s
  • Augment imgaug empty - 15.91it/s
  • Augment imgaug affine - 11.51it/s
