[Bug]: Importing objects from Netapp storage creates extra objects with 0 byte size #6972

Closed
kesarwam opened this issue Nov 9, 2023 · 1 comment
@kesarwam
Contributor

kesarwam commented Nov 9, 2023

What happened?

When importing objects from NetApp StorageGRID in the cloud, the import process creates extra zero-byte objects (see screenshots below).

I can provide credentials for the NetApp StorageGRID instance for testing.

image

image

Expected behavior

No response

lakeFS version

1.1.0

How lakeFS is installed

local Docker

Affected clients

Python lakefs_client v1.1.0

Relevant log output

No response

Contact details

amit.kesarwani@treeverse.io

@kesarwam kesarwam added bug Something isn't working contributor labels Nov 9, 2023
@itaiad200 itaiad200 self-assigned this Nov 12, 2023
@itaiad200
Contributor

Thanks for reporting, @kesarwam! I experimented with the NetApp storage using the AWS S3 CLI to test ListObjectsV2, which is what we use for imports.

aws s3 ls --endpoint=https://webscalenext.netapp.com:443 s3://sample-data/ --recursive
2023-11-03 18:23:08          0 stanfordogsdataset/
2023-11-03 18:23:23          0 stanfordogsdataset/Annotation/
2023-11-03 18:23:54          0 stanfordogsdataset/Annotation/n02085620-Chihuahua/
2023-11-03 18:24:16        481 stanfordogsdataset/Annotation/n02085620-Chihuahua/n02085620_199
2023-11-03 18:24:17        482 stanfordogsdataset/Annotation/n02085620-Chihuahua/n02085620_242
2023-11-03 18:24:17        482 stanfordogsdataset/Annotation/n02085620-Chihuahua/n02085620_275
2023-11-03 18:24:17        480 stanfordogsdataset/Annotation/n02085620-Chihuahua/n02085620_326
2023-11-03 18:24:16        482 stanfordogsdataset/Annotation/n02085620-Chihuahua/n02085620_368
2023-11-03 18:24:17        482 stanfordogsdataset/Annotation/n02085620-Chihuahua/n02085620_382
2023-11-03 18:24:17        483 stanfordogsdataset/Annotation/n02085620-Chihuahua/n02085620_431
2023-11-03 18:24:17        710 stanfordogsdataset/Annotation/n02085620-Chihuahua/n02085620_473
2023-11-03 18:24:17        481 stanfordogsdataset/Annotation/n02085620-Chihuahua/n02085620_477
2023-11-03 18:24:16        478 stanfordogsdataset/Annotation/n02085620-Chihuahua/n02085620_7
2023-11-03 18:23:30          0 stanfordogsdataset/Images/
2023-11-03 18:24:35          0 stanfordogsdataset/Images/n02085620-Chihuahua/
2023-11-03 18:24:51      20679 stanfordogsdataset/Images/n02085620-Chihuahua/n02085620_199.jpg
2023-11-03 18:24:52      51221 stanfordogsdataset/Images/n02085620-Chihuahua/n02085620_242.jpg
2023-11-03 18:24:52      58704 stanfordogsdataset/Images/n02085620-Chihuahua/n02085620_275.jpg
2023-11-03 18:24:52      25872 stanfordogsdataset/Images/n02085620-Chihuahua/n02085620_326.jpg
2023-11-03 18:24:52      19467 stanfordogsdataset/Images/n02085620-Chihuahua/n02085620_368.jpg
2023-11-03 18:24:52      27276 stanfordogsdataset/Images/n02085620-Chihuahua/n02085620_382.jpg
2023-11-03 18:24:52      39319 stanfordogsdataset/Images/n02085620-Chihuahua/n02085620_431.jpg
2023-11-03 18:24:52      24034 stanfordogsdataset/Images/n02085620-Chihuahua/n02085620_473.jpg
2023-11-03 18:24:53      30339 stanfordogsdataset/Images/n02085620-Chihuahua/n02085620_477.jpg
2023-11-03 18:24:51       8497 stanfordogsdataset/Images/n02085620-Chihuahua/n02085620_7.jpg

As you can see, the NetApp S3 endpoint returns zero-byte objects for the directory-like prefixes (the keys ending in "/"), so lakeFS imports them as objects. The AWS S3 API doesn't return zero-byte entries for prefixes unless an actual object exists with that exact key. Example:

aws s3 ls s3://axolotl-company/production/latest/ --recursive
2022-12-20 19:26:56    1354770 production/latest/Image1.png
2022-12-20 19:26:54    1409302 production/latest/Image2.png
2022-12-20 19:26:54    1378280 production/latest/Image3.png
2022-12-20 19:26:55     741990 production/latest/product-reviews/part-00000-tid-3933465910688419691-f19b4e22-6174-4044-ad89-1ab50f79ada7-790-1-c000.snappy.parquet
2022-12-20 19:26:55     675835 production/latest/product-reviews/part-00001-tid-3933465910688419691-f19b4e22-6174-4044-ad89-1ab50f79ada7-791-1-c000.snappy.parquet

So this seems like a bug with NetApp S3 compatibility.
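
To make the difference between the two listings concrete, here is a minimal Python sketch that separates zero-byte "directory marker" entries from real objects in `aws s3 ls --recursive` output. The names `Entry`, `parse_ls_line`, `is_directory_marker`, and `split_listing` are hypothetical and are not part of lakeFS or the AWS CLI. The rule "size 0 and key ends in /" is an assumed heuristic based on the listing above, not a description of how lakeFS handles imports.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    key: str
    size: int


def parse_ls_line(line: str) -> Optional[Entry]:
    """Parse one `aws s3 ls --recursive` line: date, time, size, key."""
    # Split at most 3 times so that keys containing spaces stay intact.
    parts = line.split(None, 3)
    if len(parts) != 4 or not parts[2].isdigit():
        return None  # not a listing line (blank line, header, error text)
    return Entry(key=parts[3], size=int(parts[2]))


def is_directory_marker(entry: Entry) -> bool:
    """Assumed heuristic: a zero-byte object whose key ends in '/'."""
    return entry.size == 0 and entry.key.endswith("/")


def split_listing(lines: Iterable[str]) -> Tuple[List[Entry], List[Entry]]:
    """Return (real_objects, directory_markers) for the given listing lines."""
    real: List[Entry] = []
    markers: List[Entry] = []
    for line in lines:
        entry = parse_ls_line(line)
        if entry is None:
            continue
        (markers if is_directory_marker(entry) else real).append(entry)
    return real, markers


# A few lines copied from the NetApp listing above.
sample = """\
2023-11-03 18:23:08          0 stanfordogsdataset/
2023-11-03 18:23:23          0 stanfordogsdataset/Annotation/
2023-11-03 18:23:54          0 stanfordogsdataset/Annotation/n02085620-Chihuahua/
2023-11-03 18:24:16        481 stanfordogsdataset/Annotation/n02085620-Chihuahua/n02085620_199
2023-11-03 18:24:51      20679 stanfordogsdataset/Images/n02085620-Chihuahua/n02085620_199.jpg
""".splitlines()

real, markers = split_listing(sample)
print("directory markers:", [e.key for e in markers])
print("real objects:", [e.key for e in real])
```

A genuinely empty file such as `data/empty.txt` is still counted as a real object, because its key does not end in "/". Whether skipping the marker keys is the right behavior depends on the use case, so treat this as a diagnostic aid and not as a fix.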
