lmdbscan: better ability to handle duplicates in a special way #17

bmatsuo · 2015-11-01T03:15:32Z

Scanning over a database and treating the values of duplicate keys specially is fairly cumbersome. It is always possible to iterate strictly using Cursor.Next, but a simple action like collecting all values for each duplicated key and printing them is not as easy as it could be. This is an example of scanning over a database using the NextNoDup and NextDup flags.

    err := env.View(func(txn *lmdb.Txn) (err error) {
        scanner := lmdbscan.New(txn, dbi)
        defer scanner.Close()

        for {
            // move to the first value of the next distinct key.
            scanner.Set(nil, nil, lmdb.NextNoDup)
            if !scanner.Scan() {
                // end of database, or an error.
                return scanner.Err()
            }
            k := scanner.Key()
            vals := [][]byte{scanner.Val()}

            // collect the remaining duplicates for the same key.
            scanner.SetNext(nil, nil, lmdb.NextDup, lmdb.NextDup)
            for scanner.Scan() {
                vals = append(vals, scanner.Val())
            }
            if scanner.Err() != nil {
                return scanner.Err()
            }
            log.Printf("k=%q vals=%q", k, vals)
        }
    })
    if err != nil {
        panic(err)
    }

It should not be so clumsy. In particular, scanner.SetNext(nil, nil, lmdb.NextDup, lmdb.NextDup) is pretty lame.
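For comparison, the Cursor.Next-only approach mentioned at the top of this issue amounts to grouping adjacent items by key. The following is a minimal sketch of that grouping over an in-memory stand-in for a cursor. The names pair, group, groupDups, sliceIter, and format are hypothetical and exist only for illustration; none of them are part of lmdb or lmdbscan.

```go
package main

import (
	"bytes"
	"fmt"
)

// pair is a hypothetical stand-in for one key/value item returned by a cursor.
type pair struct{ key, val []byte }

// group holds one key together with all of its duplicate values.
type group struct {
	key  []byte
	vals [][]byte
}

// groupDups reads items in cursor order, as repeated Cursor.Next-style calls
// would, and starts a new group whenever the key changes.
// next must return ok=false once the items are exhausted.
func groupDups(next func() (pair, bool)) []group {
	var groups []group
	for {
		p, ok := next()
		if !ok {
			return groups
		}
		n := len(groups)
		if n > 0 && bytes.Equal(groups[n-1].key, p.key) {
			// same key as the previous item: record another duplicate.
			groups[n-1].vals = append(groups[n-1].vals, p.val)
			continue
		}
		groups = append(groups, group{key: p.key, vals: [][]byte{p.val}})
	}
}

// sliceIter is a hypothetical iterator over an in-memory, key-sorted slice.
func sliceIter(items []pair) func() (pair, bool) {
	i := 0
	return func() (pair, bool) {
		if i >= len(items) {
			return pair{}, false
		}
		p := items[i]
		i++
		return p, true
	}
}

// format renders groups in the same style as the log line in the issue.
func format(groups []group) string {
	var buf bytes.Buffer
	for _, g := range groups {
		fmt.Fprintf(&buf, "k=%q vals=%q\n", g.key, g.vals)
	}
	return buf.String()
}

func main() {
	items := []pair{
		{[]byte("a"), []byte("1")},
		{[]byte("a"), []byte("2")},
		{[]byte("b"), []byte("3")},
	}
	fmt.Print(format(groupDups(sliceIter(items))))
}
```

This works, but it compares every key against the previous one, which is the bookkeeping that NextNoDup and NextDup are meant to avoid.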


bmatsuo · 2015-11-01T03:20:39Z

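I think it's possible that a straightforward solution to this problem is to have the Scanner.Set and Scanner.SetNext functions actually call Cursor.Get, setting Scanner.Key, Scanner.Val, and Scanner.Err and buffering the result for the next call to Scanner.Scan. Both functions would also return a bool reporting whether the Get succeeded, which the loop conditions in the following example rely on.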

    err := env.View(func(txn *lmdb.Txn) (err error) {
        scanner := lmdbscan.New(txn, dbi)
        defer scanner.Close()

        for scanner.SetNext(nil, nil, lmdb.NextNoDup, lmdb.NextDup) {
            var vals [][]byte
            k := scanner.Key()
            for scanner.Scan() {
                vals = append(vals, scanner.Val())
            }
            log.Printf("k=%q vals=%q", k, vals)

            // this may even be optional
            if scanner.Err() != nil {
                return scanner.Err()
            }
        }
        return scanner.Err()
    })
    if err != nil {
        panic(err)
    }

I believe that this can also work for scanning DupFixed databases.

    err := env.View(func(txn *lmdb.Txn) (err error) {
        scanner := lmdbscan.New(txn, dbi)
        defer scanner.Close()

        for scanner.Set(nil, nil, lmdb.NextNoDup) {
            var vals [][]byte
            k := scanner.Key()
            dup0 := scanner.Val()
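            if !scanner.SetNext(nil, nil, lmdb.GetMultiple, lmdb.NextMultiple) {
                // GetMultiple found nothing, so dup0 is the only value for k.
                // otherwise the pages read in the loop already include dup0.
                vals = [][]byte{dup0}
            }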
            for scanner.Scan() {
                multi := lmdb.WrapMulti(scanner.Val(), len(dup0))
                vals = append(vals, multi.Vals()...)
            }
            log.Printf("k=%q vals=%q", k, vals)

            // this may even be optional
            if scanner.Err() != nil {
                return scanner.Err()
            }
        }
        return scanner.Err()
    })
    if err != nil {
        panic(err)
    }

Described in #17. The Scanner.Set and Scanner.SetNext methods actually call Cursor.Get to prepare for the next call to Scanner.Scan, which becomes a noop. The return value added to the methods should allow more concise representation of complex scanning behaviors. Most existing code should behave the same. Code will not behave correctly if Scanner.Key, Scanner.Val, or Scanner.Err are accessed following a call to Scanner.Set, expecting to retrieve values from a preceding call to Scanner.Scan.
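The buffering behavior described above can be modeled without a database. The sketch below is an in-memory toy, not the lmdbscan implementation: toyScanner, its fields, the opNextNoDup/opNextDup constants, and collect are all hypothetical names, and the key and value arguments of the real methods are omitted for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical cursor operations, loosely modeled on NextNoDup and NextDup.
const (
	opNextNoDup = iota
	opNextDup
)

type item struct{ key, val string }

// toyScanner is NOT lmdbscan.Scanner. It models only the idea that SetNext
// fetches immediately and the following Scan consumes the buffered result.
type toyScanner struct {
	items    []item // sorted by key
	pos      int    // index of the current item, -1 before the first
	op       int    // operation used by Scan once the buffer is consumed
	buffered bool   // true when SetNext has fetched and Scan has not run yet
	ok       bool   // outcome of the buffered fetch
}

func newToyScanner(items []item) *toyScanner {
	return &toyScanner{items: items, pos: -1}
}

// get moves the cursor according to op and reports whether an item was found.
func (s *toyScanner) get(op int) bool {
	next := s.pos + 1
	switch op {
	case opNextNoDup:
		// skip any remaining duplicates of the current key.
		for s.pos >= 0 && next < len(s.items) && s.items[next].key == s.items[s.pos].key {
			next++
		}
		if next >= len(s.items) {
			return false
		}
	case opNextDup:
		// only advance if the next item shares the current key.
		if s.pos < 0 || next >= len(s.items) || s.items[next].key != s.items[s.pos].key {
			return false
		}
	default:
		return false
	}
	s.pos = next
	return true
}

// SetNext fetches with opset right away, buffers the outcome for the next
// Scan, and makes later Scan calls use opnext.
func (s *toyScanner) SetNext(opset, opnext int) bool {
	s.op = opnext
	s.ok = s.get(opset)
	s.buffered = true
	return s.ok
}

// Scan returns the buffered outcome if there is one, otherwise it advances.
func (s *toyScanner) Scan() bool {
	if s.buffered {
		s.buffered = false
		return s.ok
	}
	return s.get(s.op)
}

func (s *toyScanner) Key() string { return s.items[s.pos].key }
func (s *toyScanner) Val() string { return s.items[s.pos].val }

// collect groups values by key using the loop shape proposed in the issue.
func collect(items []item) string {
	var b strings.Builder
	s := newToyScanner(items)
	for s.SetNext(opNextNoDup, opNextDup) {
		k := s.Key() // already reflects the item fetched by SetNext
		var vals []string
		for s.Scan() {
			vals = append(vals, s.Val())
		}
		fmt.Fprintf(&b, "k=%q vals=%q\n", k, vals)
	}
	return b.String()
}

func main() {
	fmt.Print(collect([]item{{"a", "1"}, {"a", "2"}, {"b", "3"}}))
}
```

In this model Key and Val change as soon as SetNext returns, which is the incompatibility noted above for code that reads them after Set expecting the values from an earlier Scan.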

bmatsuo · 2015-11-02T04:33:02Z

I think the proposed solution seems pretty good. The lmdb.GetMultiple example is a little clumsy. But afaict at the moment that is because the operation itself is a little clumsy, forcing a distinction between duplicates and a single value.
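Part of what makes the GetMultiple case awkward is that each scanned value is a page of concatenated fixed-size items rather than a single item. The sketch below shows the splitting step in isolation, assuming the page length is a multiple of the item size. splitFixed is a hypothetical helper written for this illustration and is not part of lmdb; the examples in this issue use lmdb.WrapMulti for the same purpose.

```go
package main

import "fmt"

// splitFixed is a hypothetical helper that splits page into consecutive
// values of stride bytes each. The returned slices share memory with page.
func splitFixed(page []byte, stride int) ([][]byte, error) {
	if stride <= 0 || len(page)%stride != 0 {
		return nil, fmt.Errorf("page length %d is not a multiple of stride %d", len(page), stride)
	}
	vals := make([][]byte, 0, len(page)/stride)
	for i := 0; i < len(page); i += stride {
		// the three-index slice caps capacity so appends cannot spill
		// into the neighboring value.
		vals = append(vals, page[i:i+stride:i+stride])
	}
	return vals, nil
}

func main() {
	vals, err := splitFixed([]byte("aabbcc"), 2)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", vals)
}
```

The stride has to come from somewhere, which is why the DupFixed example reads the first value with NextNoDup before switching to GetMultiple.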

bmatsuo · 2015-11-02T04:34:37Z

I think this is orthogonal to a separate method to set opnext, proposed in #18. If such a method exists it should probably advance the cursor when it is called and return a bool too.

bmatsuo changed the title ~~lmdbscan: better ability handle duplicates in a special way~~ lmdbscan: better ability to handle duplicates in a special way Nov 1, 2015

bmatsuo mentioned this issue Nov 1, 2015

lmdbscan: allow opnext to be set without opset? #18

Closed

bmatsuo mentioned this issue Nov 2, 2015

Bmatsuo/lmdbscan set calls get #23

Merged

bmatsuo closed this as completed in #23 Nov 2, 2015

bmatsuo added this to the v1.3.0 milestone Nov 2, 2015
