557 import tool usability #800

Merged
merged 11 commits into from
Oct 12, 2020

Conversation

johnnyaug
Contributor

@johnnyaug johnnyaug commented Oct 11, 2020

Implements #557

  1. Align inventory column requirements between Parquet and ORC: make all columns except bucket and key optional in Parquet.
  2. Add tests to parquet reader
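
As a rough illustration of the first item, the check below treats only `bucket` and `key` as mandatory and accepts any other set of columns. It is a minimal sketch, not code from this PR: the function name `missingRequired` and the extra column names in `main` are hypothetical.

```go
package main

import "fmt"

// requiredColumns lists the only columns this sketch treats as mandatory,
// following the PR description. Every other column is optional.
var requiredColumns = []string{"bucket", "key"}

// missingRequired returns the required columns that are absent from present.
// The name and signature are hypothetical, chosen for this example.
func missingRequired(present []string) []string {
	seen := make(map[string]bool, len(present))
	for _, c := range present {
		seen[c] = true
	}
	var missing []string
	for _, c := range requiredColumns {
		if !seen[c] {
			missing = append(missing, c)
		}
	}
	return missing
}

func main() {
	// Only the mandatory columns: nothing is missing.
	fmt.Println(missingRequired([]string{"bucket", "key"})) // []
	// "key" is absent, so the inventory would be rejected.
	fmt.Println(missingRequired([]string{"bucket", "size", "e_tag"})) // [key]
}
```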

@codecov-io

codecov-io commented Oct 11, 2020

Codecov Report

Merging #800 into master will increase coverage by 0.17%.
The diff coverage is 80.48%.

@@            Coverage Diff             @@
##           master     #800      +/-   ##
==========================================
+ Coverage   41.78%   41.95%   +0.17%     
==========================================
  Files         133      133              
  Lines       10332    10442     +110     
==========================================
+ Hits         4317     4381      +64     
- Misses       5433     5480      +47     
+ Partials      582      581       -1     

| Impacted Files | Coverage Δ |
|---|---|
| cloud/aws/s3inventory/parquet_reader.go | 76.19% <75.40%> (+76.19%) ⬆️ |
| cloud/aws/s3inventory/orc_reader.go | 83.33% <93.33%> (+5.91%) ⬆️ |
| block/s3/inventory_iterator.go | 79.71% <100.00%> (-0.29%) ⬇️ |
| cloud/aws/s3inventory/reader.go | 57.89% <100.00%> (+19.00%) ⬆️ |
| block/local/adapter.go | 5.71% <0.00%> (-0.74%) ⬇️ |
| block/adapter.go | 0.00% <0.00%> (ø) |
| block/gs/adapter.go | 0.00% <0.00%> (ø) |
| block/s3/adapter.go | 0.00% <0.00%> (ø) |
| api/api_controller.go | 36.13% <0.00%> (+0.07%) ⬆️ |

... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c70ae53...a03a4fd.

@johnnyaug johnnyaug requested a review from arielshaqed October 12, 2020 08:27
Contributor

@arielshaqed arielshaqed left a comment

Thanks!

Very nice, much cleaner. Please go over the comments, but this is good and approved (almost) any way you handle them.

@@ -94,8 +94,7 @@ func (it *InventoryIterator) fillBuffer() bool {
it.logger.Errorf("failed to close manifest file reader. file=%s, err=%w", it.Manifest.Files[it.inventoryFileIndex].Key, err)
}
}()
- it.buffer = make([]s3inventory.InventoryObject, rdr.GetNumRows())
- err = rdr.Read(&it.buffer)
+ it.buffer, err = rdr.Read(int(rdr.GetNumRows()))
Contributor

Maybe Read should take an int64?

Contributor Author

This will cause a worse casting mess.

var checksum string
checksum, err = cast.ToStringE(v)
o.Checksum = swag.String(checksum)
}
Contributor

Please add a default. This is Go, there are no enums and no compile-time switch cover test for fake enums.
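
A minimal sketch of what such a `default` branch could look like, so that an unhandled field name surfaces as an error instead of being silently skipped. The constants, the `object` type, `setField`, and `errUnknownField` are all hypothetical stand-ins, not the reader's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// Field names are stand-ins for the reader's constants; the values are assumed.
const (
	keyFieldName      = "key"
	checksumFieldName = "e_tag"
)

// errUnknownField is a hypothetical sentinel for names the switch does not cover.
var errUnknownField = errors.New("unknown inventory field")

// object is a simplified stand-in for the inventory object being filled.
type object struct {
	Key      string
	Checksum string
}

// setField copies v into the matching field of o. The default branch turns an
// unhandled name into an error, since the compiler cannot check that a switch
// over string constants is exhaustive.
func setField(o *object, fieldName, v string) error {
	switch fieldName {
	case keyFieldName:
		o.Key = v
	case checksumFieldName:
		o.Checksum = v
	default:
		return fmt.Errorf("%w: %q", errUnknownField, fieldName)
	}
	return nil
}

func main() {
	var o object
	fmt.Println(setField(&o, "key", "f00000"))    // <nil>
	fmt.Println(setField(&o, "no_such", "value")) // unknown inventory field: "no_such"
}
```

Wrapping the sentinel with `%w` lets callers test for it with `errors.Is`.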

if err != nil {
errChan <- fmt.Errorf("failed to read parquet column %s: %w", fieldName, err)
return
}
for i, v := range columnRes {
if dls[i] == 0 && fieldName != "key" && fieldName != "bucket" {
Contributor

Not sure I understand. What is dls?
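
One possible reading, offered as an assumption because the snippet does not say so: `dls` holds Parquet definition levels. For a flat (non-nested) column, a required field has a maximum definition level of 0, so level 0 means the value is present, while an optional field has a maximum of 1, so level 0 means the value is null. That would explain why `key` and `bucket` are excluded from the check. The helper `isNull` below is hypothetical and only illustrates that rule.

```go
package main

import "fmt"

// isNull reports whether row i of a flat (non-nested) column is null.
// Assumption: dls holds Parquet definition levels. For a required column
// level 0 means "present"; for an optional column level 0 means "null".
func isNull(dls []int32, i int, required bool) bool {
	return !required && dls[i] == 0
}

func main() {
	values := []interface{}{"abc", nil, "def"}
	dls := []int32{1, 0, 1} // hypothetical levels for an optional column
	for i, v := range values {
		if isNull(dls, i, false) {
			fmt.Printf("row %d: null, skipped\n", i)
			continue
		}
		fmt.Printf("row %d: %v\n", i, v)
	}
}
```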

case lastModifiedDateFieldName:
var lastModifiedMillis int64
lastModifiedMillis, err = cast.ToInt64E(v)
o.LastModifiedMillis = swag.Int64(lastModifiedMillis)
Contributor

As before, would still be happier to have a proper time.Time here.

ObjectNum: 12500,
ExpectedReadObjects: 12500,
ExpectedMinValue: "f00000",
ExpectedMaxValue: "f12499",
Format: "ORC",
},
"orc with 100 objects": {
Contributor

Could we just run each of the test cases twice, once with each Format?
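
A sketch of that idea: keep one table of cases without a `Format` field and cross it with the list of formats. The `testCase` type and `forEachFormat` helper are hypothetical; only the field names and the "ORC" format string come from the snippet above.

```go
package main

import (
	"fmt"
	"sort"
)

// testCase mirrors two fields visible in the test table; the type itself is
// a hypothetical stand-in with the Format field removed.
type testCase struct {
	ObjectNum           int
	ExpectedReadObjects int
}

// forEachFormat calls run once per (case, format) pair, so each case is
// written once and exercised with every format. Case names are sorted to
// keep the order deterministic despite map iteration.
func forEachFormat(cases map[string]testCase, formats []string, run func(name, format string, tc testCase)) {
	names := make([]string, 0, len(cases))
	for name := range cases {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		for _, format := range formats {
			run(name+"/"+format, format, cases[name])
		}
	}
}

func main() {
	cases := map[string]testCase{
		"100 objects":   {ObjectNum: 100, ExpectedReadObjects: 100},
		"12500 objects": {ObjectNum: 12500, ExpectedReadObjects: 12500},
	}
	forEachFormat(cases, []string{"ORC", "Parquet"}, func(name, format string, tc testCase) {
		fmt.Printf("%s: read %d objects as %s\n", name, tc.ObjectNum, format)
	})
}
```

In a `_test.go` file the callback would typically wrap its body in `t.Run(name, ...)` so that each pair is reported as its own subtest.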

@johnnyaug johnnyaug requested a review from arielshaqed October 12, 2020 14:09
Contributor

@arielshaqed arielshaqed left a comment

Cool; thanks! StilLGTM...

var checksum string
checksum, err = cast.ToStringE(v)
o.Checksum = swag.String(checksum)
o.LastModified = swag.Time(time.Unix(lastModifiedMillis/int64(time.Second/time.Millisecond), 0))
Contributor

Do we care about losing the subsecond resolution?

Suggested change
- o.LastModified = swag.Time(time.Unix(lastModifiedMillis/int64(time.Second/time.Millisecond), 0))
+ o.LastModified = swag.Time(time.Unix(lastModifiedMillis/int64(1000), (lastModifiedMillis % 1000) * int64(1_000_000)))

(also you can see that I'm not a fan of pretending that the number of milliseconds per second may change, and I'm trying to sneak that change by you...).

Contributor Author

Our linter would never allow it!

@johnnyaug johnnyaug merged commit b78e5c6 into master Oct 12, 2020
@johnnyaug johnnyaug deleted the 557_import_tool_usability branch October 12, 2020 17:02
3 participants