Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unsafe usage in Decode #1097

Merged
merged 12 commits into from
Nov 11, 2019
Merged

Fix unsafe usage in Decode #1097

merged 12 commits into from
Nov 11, 2019

Conversation

jarifibrahim
Copy link
Contributor

@jarifibrahim jarifibrahim commented Oct 23, 2019

We use unsafe pointers to convert slices to structs. It might be possible that the struct and byte slice are not aligned in the same way. This PR fixes the alignment issues

The command

gotip test -gcflags=all=-d=checkptr

no longer complains about unsafe pointer conversion.

The changed implementation is as fast as the existing one.

Fixes #1096

// Implicit copy
func (h *header) Decode(buf []byte) int {
	*h = *(*header)(unsafe.Pointer(&buf[0]))
	return h.Size()
}

// Explicit copy
func (h *header) DecodeWithCopy(buf []byte) int {
   copy((*[headerSize]byte)(unsafe.Pointer(h))[:], buf[:headerSize])
   return int(headerSize)    
}    
      
var x header
 
func BenchmarkHeader(b *testing.B) {
 h := &header{
     diff:    100,
     overlap: 023,
 }
 
 buf := h.Encode()
 b.ResetTimer()
 b.Run("implicit copy", func(b *testing.B) {
     for i := 0; i < b.N; i++ {
         x.Decode(buf)
     }
 })
 b.Run("explicit copy", func(b *testing.B) {
     for i := 0; i < b.N; i++ {
         x.DecodeWithCopy(buf)
     }
 })
}
gtf -run ^$ -bench=Header -benchmem
goos: linux
goarch: amd64
pkg: github.com/dgraph-io/badger/v2/table
BenchmarkHeader/implicit_copy-16         	1000000000	         0.602 ns/op
BenchmarkHeader/explicit_copy-16         	1000000000	         0.712 ns/op
PASS
ok  	github.com/dgraph-io/badger/v2/table	1.467s

This change is Reviewable

Copy link

@pullrequest pullrequest bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ A review job has been created and sent to the PullRequest network.


@jarifibrahim you can click here to see the review status or cancel the code review job.

@coveralls
Copy link

coveralls commented Oct 23, 2019

Coverage Status

Coverage increased (+0.1%) to 77.476% when pulling 674c31f on ibrahim/header-ptr into d57605e on master.

Copy link

@pullrequest pullrequest bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I don't use arrays in go often, so it's interesting to see how you deal with them here.


Reviewed with ❤️ by PullRequest

table/iterator.go Outdated Show resolved Hide resolved
table/builder.go Outdated Show resolved Hide resolved
@campoy
Copy link
Contributor

campoy commented Oct 24, 2019

I can confirm this fixes the issue for this test, unfortunately now it fails somewhere else.

--- FAIL: TestTruncateVlogWithClose (0.06s)
panic: runtime error: unsafe pointer conversion [recovered]
	panic: runtime error: unsafe pointer conversion

goroutine 53572 [running]:
testing.tRunner.func1(0xc0003a3e00)
	/home/francesc/src/github.com/golang/go/src/testing/testing.go:881 +0x3a3
panic(0xa43860, 0xc088c4b760)
	/home/francesc/src/github.com/golang/go/src/runtime/panic.go:679 +0x1b2
github.com/dgraph-io/badger.(*valuePointer).Decode(...)
	/home/francesc/src/github.com/dgraph-io/badger/structs.go:42
github.com/dgraph-io/badger.Open(0xc087f24f80, 0x19, 0xc087f24f80, 0x19, 0x1, 0x2, 0x2, 0x1, 0x100, 0xb5e4c0, ...)
	/home/francesc/src/github.com/dgraph-io/badger/db.go:350 +0x16c5
github.com/dgraph-io/badger.TestTruncateVlogWithClose(0xc0003a3e00)
	/home/francesc/src/github.com/dgraph-io/badger/db2_test.go:72 +0x416
testing.tRunner(0xc0003a3e00, 0xab3c18)
	/home/francesc/src/github.com/golang/go/src/testing/testing.go:916 +0xc9
created by testing.(*T).Run

These two errors might be unrelated. If that's the case I'd say you should merge the PR and create a new issue for the new failure.

@jarifibrahim
Copy link
Contributor Author

@campoy the errors are related. I've fixed them both in this PR.

Do you think we should add the checkPtr flag to all our builds?

@jarifibrahim jarifibrahim changed the title Fix unsafe usage in header.Decode Fix unsafe usage in Decode Oct 25, 2019
@jarifibrahim jarifibrahim requested a review from campoy October 25, 2019 11:07
Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to check if the benchmarks are still in favor of this unsafe casting, because of the additional copy.

If benchmarks are good, then merge. Otherwise, keep looking for a solution. This is NOT a blocker for v2.0 release.

Reviewable status: 0 of 3 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami, @campoy, @jarifibrahim, and @manishrjain)


structs.go, line 44 at r2 (raw file):

	// tempBuf will be allocated on the stack and so it shouldn't
	// cause significant performance degradation.
	tempBuf := make([]byte, vptrSize)

var tmpBuf [vptrSize]byte

Copy link
Contributor

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have a better solution that avoids the additional copy.

As far as I understand, the problem currently is that header contains two uint16 both of which needs to 2 byte aligned (i.e. the start address of both uint16 should be such that address mod 2 = 0). When we typecast the existing buffer into *header, that may not be always true. And, hence the test fails. Our current solution does the following:

  1. It copies the data into a temporary buffer which will be 8 byte aligned on 64 bit (and hence, on typecast, both uint16 will be 2 byte aligned)
  2. Then it copies the temporary buffer at the header pointer

The direct copy won't work because are typecasting misaligned buffer into header. To avoid the additional copy, we need to just avoid the typecast into *header and goal is simply to copy the memory in buffer at the header pointer which is already aligned.

The solution here is to typecast the header pointer into an [4]byte and then perform a direct copy from buffer to typecast header pointer, something like this:

func (h *header) Decode(buf []byte) {
	copy((*(*[headerSize]byte)(unsafe.Pointer(h)))[:], buf)
}

I have tested this solution locally and the compiler doesn't complain any more.

Reviewable status: 0 of 3 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami, @campoy, and @jarifibrahim)

Copy link
Contributor

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch that works for me:

diff --git structs.go structs.go
index 2d396b5..773d452 100644
--- structs.go
+++ structs.go
@@ -13,6 +13,8 @@ type valuePointer struct {
 	Offset uint32
 }
 
+const vptrSize = unsafe.Sizeof(valuePointer{})
+
 func (p valuePointer) Less(o valuePointer) bool {
 	if p.Fid != o.Fid {
 		return p.Fid < o.Fid
@@ -27,8 +29,6 @@ func (p valuePointer) IsZero() bool {
 	return p.Fid == 0 && p.Offset == 0 && p.Len == 0
 }
 
-const vptrSize = 12
-
 // Encode encodes Pointer into byte buffer.
 func (p valuePointer) Encode() []byte {
 	b := make([]byte, vptrSize)
@@ -39,7 +39,7 @@ func (p valuePointer) Encode() []byte {
 
 // Decode decodes the value pointer into the provided byte buffer.
 func (p *valuePointer) Decode(b []byte) {
-	*p = *(*valuePointer)(unsafe.Pointer(&b[0]))
+	copy((*(*[vptrSize]byte)(unsafe.Pointer(p)))[:], b)
 }
 
 // header is used in value log as a header before Entry.
diff --git table/builder.go table/builder.go
index 134d58f..9b58a6f 100644
--- table/builder.go
+++ table/builder.go
@@ -44,6 +44,8 @@ type header struct {
 	diff    uint16 // Length of the diff.
 }
 
+const headerSize = uint16(unsafe.Sizeof(header{}))
+
 // Encode encodes the header.
 func (h header) Encode() []byte {
 	var b [4]byte
@@ -52,16 +54,10 @@ func (h header) Encode() []byte {
 }
 
 // Decode decodes the header.
-func (h *header) Decode(buf []byte) int {
-	*h = *(*header)(unsafe.Pointer(&buf[0]))
-	return h.Size()
+func (h *header) Decode(buf []byte) {
+	copy((*(*[headerSize]byte)(unsafe.Pointer(h)))[:], buf)
 }
 
-const headerSize = 4
-
-// Size returns size of the header. Currently it's just a constant.
-func (h header) Size() int { return headerSize }
-
 // Builder is used in building a table.
 type Builder struct {
 	// Typically tens or hundreds of meg. This is for one single file.
diff --git table/iterator.go table/iterator.go
index 4bbbc89..118bd66 100644
--- table/iterator.go
+++ table/iterator.go
@@ -87,7 +87,7 @@ func (itr *blockIterator) setIdx(i int) {
 		itr.key = append(itr.key[:itr.prevOverlap], itr.baseKey[itr.prevOverlap:h.overlap]...)
 	}
 	itr.prevOverlap = h.overlap
-	valueOff := headerSize + int(h.diff)
+	valueOff := headerSize + h.diff
 	diffKey := entryData[headerSize:valueOff]
 	itr.key = append(itr.key[:h.overlap], diffKey...)
 	itr.val = entryData[valueOff:]

Reviewable status: 0 of 3 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami, @campoy, and @jarifibrahim)

@campoy
Copy link
Contributor

campoy commented Oct 26, 2019

I agree that @mangalaman93's solution seems more efficient (although quite ugly?)

If that one fixes the warnings let's use it and avoid the performance hit.

@mangalaman93
Copy link
Contributor

@campoy Such conversions are pretty common in badger as far as I know. Let me know which part of the solution you thought could be improved/avoided.

@jarifibrahim
Copy link
Contributor Author

@mangalaman93 Thank you so much. The patch seems to work 👍

Copy link
Contributor

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 3 files reviewed, 4 unresolved discussions (waiting on @ashish-goswami, @campoy, and @jarifibrahim)


structs.go, line 42 at r3 (raw file):

// Decode decodes the value pointer into the provided byte buffer.
func (p *valuePointer) Decode(b []byte) {
	copy((*[vptrSize]byte)(unsafe.Pointer(p))[:], b)

I see one dereference missing (* missing). I guess compiler will take care of it. Also, see if it makes sense to add a test where length of b is less than 4 and everything acceptable follows afterwards.

Copy link
Contributor

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r3.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @ashish-goswami, @campoy, and @jarifibrahim)


structs.go, line 42 at r3 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

I see one dereference missing (* missing). I guess compiler will take care of it. Also, see if it makes sense to add a test where length of b is less than 4 and everything acceptable follows afterwards.

Also, add a comment on why we are doing it this way now.

Copy link

@pullrequest pullrequest bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✖️ This code review was cancelled. See Details

Copy link
Contributor Author

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @manishrjain and @pullrequest[bot] from 3 discussions.
Reviewable status: 1 of 3 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @campoy, and @mangalaman93)


structs.go, line 44 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

var tmpBuf [vptrSize]byte

Done.


structs.go, line 42 at r3 (raw file):

I see one dereference missing (* missing). I guess compiler will take care of it.

@mangalaman93 We've already deferenced the pointer with the first *, do we still need to second one?

Also, add a comment on why we are doing it this way now.

Done.

Copy link
Contributor

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 3 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @campoy, @jarifibrahim, and @mangalaman93)


structs.go, line 42 at r3 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

I see one dereference missing (* missing). I guess compiler will take care of it.

@mangalaman93 We've already deferenced the pointer with the first *, do we still need to second one?

Also, add a comment on why we are doing it this way now.

Done.

You are typecasting the pointer into the type *[4]byte, and then converting the array pointer to slice using [:]. There is no pointer dereference in the code. I'm not sure but because tests pass, that's probably because you could index an array using a pointer to array as well as using the array directly. Still, check that once. And add a bracket around all the typecast code before [:] to make it clear what's going on.

@mangalaman93 mangalaman93 self-requested a review October 30, 2019 08:56
Copy link
Contributor Author

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 3 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @campoy, and @mangalaman93)


structs.go, line 42 at r3 (raw file):
Yeah. Got it. I thought we've deferenced it.

I'm not sure but because tests pass, that's probably because you could index an array using a pointer to array as well as using the array directly. Still, check that once.

Yes, that's correct. I checked it using the following code.

func main() {
	var b, c *[4]byte
	a := []byte("1111")
	b = new([4]byte)
	copy(b[:], a)
	c = new([4]byte)
	copy((*c)[:], a)
	fmt.Println(a, b, c)
}

Copy link
Contributor

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r4.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @campoy, and @mangalaman93)

@mangalaman93 mangalaman93 self-requested a review October 31, 2019 13:52
Copy link
Contributor

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @ashish-goswami and @campoy)

Copy link
Contributor

@ashish-goswami ashish-goswami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @campoy and @jarifibrahim)


structs.go, line 32 at r4 (raw file):

}

const vptrSize = unsafe.Sizeof(valuePointer{})

Adding a comment explaining why we are calculating size this way (and not by simple addition) might make sense.


table/builder.go, line 47 at r4 (raw file):

}

const headerSize = uint16(unsafe.Sizeof(header{}))

same here.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Would be good to do some benchmarks just to prove to ourselves that the performance is the same.

Reviewed 1 of 3 files at r3, 2 of 2 files at r4.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @campoy and @jarifibrahim)


structs.go, line 59 at r4 (raw file):

func (p *valuePointer) Decode(b []byte) {
	// Copy over data from b into p. Using *p=unsafe.pointer(...) leads to
	// pointer alignment issues. See https://github.com/dgraph-io/badger/issues/1096

Ideally, another comment saying the solution is based on some link "...".

@jarifibrahim jarifibrahim merged commit 8a93a41 into master Nov 11, 2019
@jarifibrahim jarifibrahim deleted the ibrahim/header-ptr branch November 11, 2019 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

checkptr issue found
6 participants