Handle mmaping files on Windows
    Handle truncation while mmap-ing (esp. on Windows)

    mmap on Windows works differently from mmap on Linux:

    * We cannot mmap to a size beyond the size of the file. So we need to
    extend the file to that size (via truncate) and then map.

    * We cannot truncate a file while a section of it has been mapped.

    This has implications for how we manage value logs. This change fixes
    mmap-ing on Windows with the following changes:

    * When the badger.NewKV() function is called, which in turn calls the
    valueLog.Open() method, we don’t immediately mmap the writable log
    file (read-only log files are not a problem). We mmap the writable log
    file after replay has happened.

    * Whenever the valueLog.iterate() method is called and determines that
    a file needs to be truncated, we need to either make sure that there
    are no open mmaps, or munmap and then re-map the file after
    truncation.

    * Whenever a writable log file is mmap-ed, its size increases to
    ValueLogFileSize * 2 (because of truncation in the y.Mmap() function).
    We need to ensure that once it has filled up, it is truncated back to
    the actual size before it is closed and reopened as a read-only file.
    We also need to truncate it back if the value log is closed before it
    fills up.

    * Fix a few tests on AppVeyor that were running out of memory by
    reducing the ValueLogFileSize option value.

    Fixes #212.

    * Only truncate if the file isn't mmapped.

    Fix mmap-ing on Windows.

    The previous code we checked in for mmap on Windows was broken. Cleaned
    up the code after looking at the way BoltDB does mmap-ing on Windows.

    The most important thing to note is that on Windows, unlike Linux, we
    cannot map beyond the size of the file. So the workaround is to truncate
    the file to a bigger size before mmap-ing it. Special care needs to be
    taken when calling this function, because read-only file descriptors
    cannot be truncated.
manishrjain committed Sep 21, 2017
1 parent 8de624f commit 273f40c
Showing 4 changed files with 75 additions and 29 deletions.
7 changes: 7 additions & 0 deletions kv.go
@@ -261,8 +261,15 @@ func NewKV(optParam *Options) (out *KV, err error) {
if err = out.vlog.Replay(vptr, fn); err != nil {
return out, err
}

replayCloser.SignalAndWait() // Wait for replay to be applied first.

// Mmap writable log
lf := out.vlog.filesMap[out.vlog.maxFid]
if err = lf.mmap(2 * out.vlog.opt.ValueLogFileSize); err != nil {
return out, errors.Wrapf(err, "Unable to mmap RDWR log file")
}

out.writeCh = make(chan *request, kvWriteChCapacity)
out.closers.writes = y.NewCloser(1)
go out.doWrites(out.closers.writes)
44 changes: 27 additions & 17 deletions value.go
@@ -24,7 +24,6 @@ import (
"hash/crc32"
"io"
"io/ioutil"
"math"
"math/rand"
"os"
"sort"
@@ -145,7 +144,7 @@ func (lf *logFile) read(p valuePointer) (buf []byte, err error) {
return buf, err
}

func (lf *logFile) doneWriting() error {
func (lf *logFile) doneWriting(offset uint32) error {
// Sync before acquiring lock. (We call this from write() and thus know we have shared access
// to the fd.)
if err := lf.fd.Sync(); err != nil {
@@ -163,6 +162,11 @@ func (lf *logFile) doneWriting() error {
if err := y.Munmap(lf.fmap); err != nil {
return errors.Wrapf(err, "Unable to munmap value log: %q", lf.path)
}
// TODO: Confirm if we need to run a file sync after truncation.
// Truncation must run after unmapping, otherwise Windows would crap itself.
if err := lf.fd.Truncate(int64(offset)); err != nil {
return errors.Wrapf(err, "Unable to truncate file: %q", lf.path)
}
if err := lf.fd.Close(); err != nil {
return errors.Wrapf(err, "Unable to close value log: %q", lf.path)
}
@@ -181,7 +185,7 @@ type logEntry func(e Entry, vp valuePointer) error

// iterate iterates over log file. It doesn't allocate new memory for every kv pair.
// Therefore, the kv pair is only valid for the duration of fn call.
func (lf *logFile) iterate(offset uint32, fn logEntry) error {
func (vlog *valueLog) iterate(lf *logFile, offset uint32, fn logEntry) error {
_, err := lf.fd.Seek(int64(offset), io.SeekStart)
if err != nil {
return y.Wrap(err)
@@ -277,7 +281,8 @@ func (lf *logFile) iterate(offset uint32, fn logEntry) error {
}
}

if truncate {
if truncate && len(lf.fmap) == 0 {
// Only truncate if the file isn't mmaped. Otherwise, Windows would puke.
if err := lf.fd.Truncate(int64(recordOffset)); err != nil {
return err
}
@@ -365,7 +370,7 @@ func (vlog *valueLog) rewrite(f *logFile) error {
return nil
}

err := f.iterate(0, func(e Entry, vp valuePointer) error {
err := vlog.iterate(f, 0, func(e Entry, vp valuePointer) error {
return fe(e)
})
if err != nil {
@@ -635,10 +640,6 @@ func (vlog *valueLog) openOrCreateFiles() error {
vlog.opt.SyncWrites); err != nil {
return errors.Wrapf(err, "Unable to open value log file as RDWR")
}

if err := lf.mmap(math.MaxUint32); err != nil {
return errors.Wrapf(err, "Unable to mmap RDWR log file")
}
} else {
if err := lf.openReadOnly(); err != nil {
return err
@@ -671,10 +672,6 @@ func (vlog *valueLog) createVlogFile(fid uint32) (*logFile, error) {
return nil, errors.Wrapf(err, "Unable to sync value log file dir")
}

if err = lf.mmap(math.MaxUint32); err != nil {
return nil, errors.Wrapf(err, "Unable to mmap value log file")
}

vlog.filesLock.Lock()
vlog.filesMap[fid] = lf
vlog.filesLock.Unlock()
@@ -702,12 +699,21 @@ func (vlog *valueLog) Close() error {
defer vlog.elog.Finish()

var err error
for _, f := range vlog.filesMap {
for id, f := range vlog.filesMap {

f.lock.Lock() // We won’t release the lock.
if munmapErr := y.Munmap(f.fmap); munmapErr != nil && err == nil {
err = munmapErr
}

if id == vlog.maxFid {
// truncate writable log file to correct offset.
if truncErr := f.fd.Truncate(
int64(vlog.writableLogOffset)); truncErr != nil && err == nil {
err = truncErr
}
}

if closeErr := f.fd.Close(); closeErr != nil && err == nil {
err = closeErr
}
@@ -752,7 +758,7 @@ func (vlog *valueLog) Replay(ptr valuePointer, fn logEntry) error {
of = 0
}
f := vlog.filesMap[id]
err := f.iterate(of, fn)
err := vlog.iterate(f, of, fn)
if err != nil {
return errors.Wrapf(err, "Unable to replay value log: %q", f.path)
}
@@ -824,7 +830,7 @@ func (vlog *valueLog) write(reqs []*request) error {

if vlog.writableLogOffset > uint32(vlog.opt.ValueLogFileSize) {
var err error
if err = curlf.doneWriting(); err != nil {
if err = curlf.doneWriting(vlog.writableLogOffset); err != nil {
return err
}

@@ -835,6 +841,10 @@ func (vlog *valueLog) write(reqs []*request) error {
return err
}

if err = newlf.mmap(2 * vlog.opt.ValueLogFileSize); err != nil {
return err
}

curlf = newlf
}
return nil
@@ -980,7 +990,7 @@ func (vlog *valueLog) doRunGC(gcThreshold float64) error {

start := time.Now()
y.AssertTrue(vlog.kv != nil)
err := lf.iterate(0, func(e Entry, vp valuePointer) error {
err := vlog.iterate(lf, 0, func(e Entry, vp valuePointer) error {
esz := float64(vp.Len) / (1 << 20) // in MBs. +4 for the CAS stuff.
skipped += esz
if skipped < skipFirstM {
18 changes: 12 additions & 6 deletions value_test.go
@@ -318,7 +318,9 @@ func TestChecksums(t *testing.T) {
defer os.RemoveAll(dir)

// Set up SST with K1=V1
kv, err := NewKV(getTestOptions(dir))
opts := getTestOptions(dir)
opts.ValueLogFileSize = 100 * 1024 * 1024 // 100 MB
kv, err := NewKV(opts)
require.NoError(t, err)

var (
@@ -344,7 +346,7 @@ func TestChecksums(t *testing.T) {
require.NoError(t, ioutil.WriteFile(vlogFilePath(dir, 0), buf, 0777))

// K1 should exist, but K2 shouldn't.
kv, err = NewKV(getTestOptions(dir))
kv, err = NewKV(opts)
require.NoError(t, err)
var item KVItem
require.NoError(t, kv.Get(k1, &item))
@@ -358,7 +360,7 @@ func TestChecksums(t *testing.T) {

// The vlog should contain K1 and K3 (K2 was lost when Badger started up
// last due to checksum failure).
kv, err = NewKV(getTestOptions(dir))
kv, err = NewKV(opts)
require.NoError(t, err)
iter := kv.NewIterator(DefaultIteratorOptions)
iter.Seek(k1)
@@ -381,7 +383,9 @@ func TestPartialAppendToValueLog(t *testing.T) {
defer os.RemoveAll(dir)

// Create skeleton files.
kv, err := NewKV(getTestOptions(dir))
opts := getTestOptions(dir)
opts.ValueLogFileSize = 100 * 1024 * 1024 // 100 MB
kv, err := NewKV(opts)
require.NoError(t, err)
require.NoError(t, kv.Close())

@@ -405,7 +409,7 @@ func TestPartialAppendToValueLog(t *testing.T) {
require.NoError(t, ioutil.WriteFile(vlogFilePath(dir, 0), buf, 0777))

// Badger should now start up, but with only K1.
kv, err = NewKV(getTestOptions(dir))
kv, err = NewKV(opts)
require.NoError(t, err)
var item KVItem
require.NoError(t, kv.Get(k1, &item))
@@ -478,7 +482,9 @@ func createVlog(t *testing.T, entries []*Entry) []byte {
require.NoError(t, err)
defer os.RemoveAll(dir)

kv, err := NewKV(getTestOptions(dir))
opts := getTestOptions(dir)
opts.ValueLogFileSize = 100 * 1024 * 1024 // 100 MB
kv, err := NewKV(opts)
require.NoError(t, err)
require.NoError(t, kv.BatchSet(entries))
require.NoError(t, kv.Close())
35 changes: 29 additions & 6 deletions y/mmap_windows.go
@@ -19,6 +19,8 @@
package y

import (
"fmt"
"math"
"os"
"syscall"
"unsafe"
@@ -32,19 +34,40 @@ func Mmap(fd *os.File, write bool, size int64) ([]byte, error) {
protect = syscall.PAGE_READWRITE
access = syscall.FILE_MAP_WRITE
}
handler, err := syscall.CreateFileMapping(syscall.Handle(fd.Fd()), nil,
uint32(protect), uint32(size>>32), uint32(size), nil)
fi, err := fd.Stat()
if err != nil {
return nil, err
}
defer syscall.CloseHandle(handler)

mapData, err := syscall.MapViewOfFile(handler, uint32(access), 0, 0, 0)
// Truncate the database to the size of the mmap.
if fi.Size() < size {
if err := fd.Truncate(size); err != nil {
return nil, fmt.Errorf("truncate: %s", err)
}
}

// Open a file mapping handle.
sizeHigh := uint32(size >> 32)       // High-order 32 bits of the mapping size.
sizeLow := uint32(size) & 0xffffffff // Low-order 32 bits of the mapping size.

handler, err := syscall.CreateFileMapping(syscall.Handle(fd.Fd()), nil,
uint32(protect), sizeHigh, sizeLow, nil)
if err != nil {
return nil, err
return nil, os.NewSyscallError("CreateFileMapping", err)
}

// Create the memory map.
addr, err := syscall.MapViewOfFile(handler, uint32(access), 0, 0, uintptr(size))
if addr == 0 {
return nil, os.NewSyscallError("MapViewOfFile", err)
}

// Close mapping handle.
if err := syscall.CloseHandle(syscall.Handle(handler)); err != nil {
return nil, os.NewSyscallError("CloseHandle", err)
}

data := (*[1 << 30]byte)(unsafe.Pointer(mapData))[:size]
data := (*[math.MaxUint32]byte)(unsafe.Pointer(addr))[:size]
return data, nil
}
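
As a side note on the size arguments: Go's `syscall.CreateFileMapping` takes the maximum size as two 32-bit halves, high-order word first and low-order word second. The split can be sketched in isolation as below; `splitSize` is a hypothetical helper name and is not part of this commit.

```go
package main

import "fmt"

// splitSize returns the high-order and low-order 32 bits of size, in the
// order CreateFileMapping expects them (high first, then low).
func splitSize(size int64) (high, low uint32) {
	high = uint32(size >> 32)
	low = uint32(size & 0xffffffff)
	return high, low
}

func main() {
	// 5 GiB does not fit in 32 bits, so the high word is non-zero.
	high, low := splitSize(5 << 30)
	fmt.Println(high, low)
}
```

For any size below 4 GiB the high word is zero, which is why swapped names are easy to miss in testing.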


3 comments on commit 273f40c

@GwynethLlewelyn

I have a question regarding the changes made in kv.go. I'm not familiar enough with the logic behind the code, but I'm now getting the Unable to mmap RDWR log file error even when I'm using the option Opt.TableLoadingMode = options.FileIO. I have to use that option, since in one scenario where I'm using Badger, I have a very small memory footprint, and the Go application is being called via FastCGI with some restrictions — one of them being 'no memory mapping allowed'. Well, sort of: it can be used, so long as the actual memory being allocated is infinitesimally small (but the smallest possible size is 1 MByte). This does not seem to be the case, since, as far as I can understand the code, this is some memory allocated for the log and not the database.

I wonder if there could be an extra check to see what options have been set and avoid mmap if FileIO has been selected?

@manishrjain
Contributor Author

@manishrjain manishrjain commented on 273f40c Sep 26, 2017

Oh... Our testing showed that this fixes the issue.

So, the issue you're seeing is related to value log memory map. The option that we have is for LSM tree. So, changing that option wouldn't change this behavior.

Can you please create a GitHub issue with the panic / error that you're getting, the Badger commit you're at, which OS you're on, etc.? The engineer working on this is currently on vacation, but we'll try to get to this as soon as we can.

Update: On re-reading your description, it seems like you're in an environment where memory mapping isn't allowed? So this is not a Windows issue, but an issue specific to the environment you're running in?

@GwynethLlewelyn
@GwynethLlewelyn GwynethLlewelyn commented on 273f40c Sep 26, 2017

You're quite right. I'm sorry, I'm not using Windows at all (but a customised Debian Linux setup over which I have no control). However, before this patch was committed, I had no issues with memory allocation; I wrongly interpreted it to be related to something here. It must be a different issue, and so I've filed it separately as per your suggestion: #246
