lmdbscan: better ability to handle duplicates in a special way #17

Closed
bmatsuo opened this issue Nov 1, 2015 · 3 comments

@bmatsuo
Owner

bmatsuo commented Nov 1, 2015

Scanning over databases and treating the values of duplicate keys specially seems to be fairly cumbersome. It is always possible to iterate strictly using Cursor.Next, but a simple task like collecting and printing all values for each duplicate key is not as easy as it could be. Here is an example that scans a database using the NextNoDup and NextDup flags:

    err := env.View(func(txn *lmdb.Txn) (err error) {
        scanner := lmdbscan.New(txn, dbi)
        defer scanner.Close()

        for {
            // Move to the first value of the next distinct key.
            scanner.Set(nil, nil, lmdb.NextNoDup)
            if !scanner.Scan() {
                return scanner.Err()
            }
            k := scanner.Key()
            vals := [][]byte{scanner.Val()}

            // Collect the remaining duplicates of k with NextDup.
            scanner.SetNext(nil, nil, lmdb.NextDup, lmdb.NextDup)
            for scanner.Scan() {
                vals = append(vals, scanner.Val())
            }
            if scanner.Err() != nil {
                return scanner.Err()
            }
            log.Printf("k=%q vals=%q", k, vals)
        }
    })
    if err != nil {
        panic(err)
    }
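For comparison, here is a minimal sketch of the plain Cursor.Next approach mentioned above: walk the items in order, remember the current key, and start a new group whenever the key changes. It relies on the duplicates of a key being adjacent, as they are when a DupSort database is read with Cursor.Next. To stay self-contained it runs over an in-memory, sorted slice of key/value pairs instead of an LMDB cursor; the `pair` and `group` types and the `groupDups` function are hypothetical and not part of lmdb-go.

```go
package main

import (
	"bytes"
	"fmt"
)

// pair is a hypothetical stand-in for one key/value item read with Cursor.Next.
type pair struct {
	key, val []byte
}

// group holds every value stored under one key.
type group struct {
	key  []byte
	vals [][]byte
}

// groupDups walks pairs in order and collects the values of each key.
// It assumes duplicates of a key are adjacent, as they are when a
// DupSort database is read with Cursor.Next.
func groupDups(pairs []pair) []group {
	var out []group
	for _, p := range pairs {
		if n := len(out); n > 0 && bytes.Equal(out[n-1].key, p.key) {
			// Same key as the previous item: one more duplicate.
			out[n-1].vals = append(out[n-1].vals, p.val)
			continue
		}
		// The key changed, so start a new group.
		out = append(out, group{key: p.key, vals: [][]byte{p.val}})
	}
	return out
}

func main() {
	pairs := []pair{
		{[]byte("a"), []byte("1")},
		{[]byte("a"), []byte("2")},
		{[]byte("b"), []byte("3")},
	}
	for _, g := range groupDups(pairs) {
		fmt.Printf("k=%q vals=%q\n", g.key, g.vals)
	}
}
```

With a real cursor the same loop would call Cursor.Get with lmdb.Next until lmdb.IsNotFound reports the end. Keys and values returned by LMDB generally point into the memory map and are only valid while the transaction is open, so a version that keeps groups after env.View returns would need to copy them.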

It should not be so clumsy. In particular, scanner.SetNext(nil, nil, lmdb.NextDup, lmdb.NextDup) is pretty lame, since it passes lmdb.NextDup twice just to keep reading the current key's duplicates.

@bmatsuo
Owner Author

bmatsuo commented Nov 1, 2015

I think a straightforward solution to this problem may be to have the Scanner.Set and Scanner.SetNext methods actually call Cursor.Get, setting Scanner.Key, Scanner.Val, and Scanner.Err, and buffering the result for the next call to Scanner.Scan. Both methods would also return a bool so they can drive a loop directly:

    err := env.View(func(txn *lmdb.Txn) (err error) {
        scanner := lmdbscan.New(txn, dbi)
        defer scanner.Close()

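        // SetNext reads the first value of the next key right away; the first
        // Scan returns that buffered value and later Scans use NextDup.
        for scanner.SetNext(nil, nil, lmdb.NextNoDup, lmdb.NextDup) {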
            var vals [][]byte
            k := scanner.Key()
            for scanner.Scan() {
                vals = append(vals, scanner.Val())
            }
            log.Printf("k=%q vals=%q", k, vals)

            // this may even be optional
            if scanner.Err() != nil {
                return scanner.Err()
            }
        }
        return scanner.Err()
    })
    if err != nil {
        panic(err)
    }
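To make the buffering concrete, here is a minimal sketch of the proposed Set/SetNext semantics. It is not lmdbscan's implementation and does not use lmdb-go: `memCursor` is a hypothetical in-memory cursor over key/value pairs sorted by key, supporting only NextDup-like and NextNoDup-like moves (`opNextDup`, `opNextNoDup`), and `Scanner` is a simplified stand-in that drops the key/value arguments and the Key/Val accessors. `Set` and `SetNext` perform the read immediately and mark the result as buffered, so the next `Scan` returns it without moving the cursor; `scanGroups` mirrors the loop above.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	opNextDup   = iota // hypothetical stand-in for lmdb.NextDup
	opNextNoDup        // hypothetical stand-in for lmdb.NextNoDup
)

var errNotFound = errors.New("not found") // plays the role of LMDB's MDB_NOTFOUND
type pair struct{ key, val string }

// memCursor is a hypothetical cursor over pairs sorted by key, then value.
type memCursor struct {
	pairs []pair
	pos   int // index of the current item; -1 before the first read
}

// Get moves the cursor according to op and returns the new item.
func (c *memCursor) Get(op int) (string, string, error) {
	next := c.pos + 1
	same := func(i int) bool { return i < len(c.pairs) && c.pairs[i].key == c.pairs[c.pos].key }
	for op == opNextNoDup && c.pos >= 0 && same(next) {
		next++ // skip the remaining duplicates of the current key
	}
	if next >= len(c.pairs) || (op == opNextDup && (c.pos < 0 || !same(next))) {
		return "", "", errNotFound // nothing to move to; the cursor stays put
	}
	c.pos = next
	return c.pairs[next].key, c.pairs[next].val, nil
}

// Scanner is a simplified, hypothetical stand-in for lmdbscan.Scanner.
type Scanner struct {
	cur      *memCursor
	op       int // op used by Scan when nothing is buffered
	key, val string
	err      error
	buffered bool // Set/SetNext already fetched the item for the next Scan
}

// Set reads with opset immediately and buffers the result for Scan.
func (s *Scanner) Set(opset int) bool {
	s.key, s.val, s.err = s.cur.Get(opset)
	s.buffered = true
	return s.err == nil
}

// SetNext is like Set but also changes the op used by later Scans.
func (s *Scanner) SetNext(opset, opnext int) bool {
	s.op = opnext
	return s.Set(opset)
}

// Scan consumes a buffered result if there is one, else advances with s.op.
func (s *Scanner) Scan() bool {
	if !s.buffered {
		s.key, s.val, s.err = s.cur.Get(s.op)
	}
	s.buffered = false
	return s.err == nil
}

// Err treats errNotFound as the normal end of a scan, not a failure.
func (s *Scanner) Err() error {
	if s.err == errNotFound {
		return nil
	}
	return s.err
}

// scanGroups follows the SetNext/Scan loop proposed above, one line per key.
func scanGroups(pairs []pair) ([]string, error) {
	s := &Scanner{cur: &memCursor{pairs: pairs, pos: -1}}
	var lines []string
	for s.SetNext(opNextNoDup, opNextDup) {
		k := s.key
		var vals []string
		for s.Scan() { // the first Scan returns the value buffered by SetNext
			vals = append(vals, s.val)
		}
		if err := s.Err(); err != nil {
			return nil, err
		}
		lines = append(lines, fmt.Sprintf("k=%q vals=%q", k, vals))
	}
	return lines, s.Err()
}

func main() {
	lines, err := scanGroups([]pair{{"a", "1"}, {"a", "2"}, {"b", "3"}})
	for _, l := range lines {
		fmt.Println(l)
	}
	fmt.Println("err:", err)
}
```

The sketch also exhibits the compatibility caveat noted in the commit message below: after `Set`, the scanner's key and value already describe the new cursor position, not the item returned by the preceding `Scan`.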

I believe the same approach also works for scanning DupFixed databases, using lmdb.GetMultiple and lmdb.NextMultiple:

    err := env.View(func(txn *lmdb.Txn) (err error) {
        scanner := lmdbscan.New(txn, dbi)
        defer scanner.Close()

        for scanner.Set(nil, nil, lmdb.NextNoDup) {
            var vals [][]byte
            k := scanner.Key()
            dup0 := scanner.Val()
            if scanner.SetNext(nil, nil, lmdb.GetMultiple, lmdb.NextMultiple) {
                vals = [][]byte{dup0}
            }
            for scanner.Scan() {
                multi := lmdb.WrapMulti(scanner.Val(), len(dup0))
                vals = append(vals, multi.Vals()...)
            }
            log.Printf("k=%q vals=%q", k, vals)

            // this may even be optional
            if scanner.Err() != nil {
                return scanner.Err()
            }
        }
        return scanner.Err()
    })
    if err != nil {
        panic(err)
    }
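In a DupFixed database all duplicates of a key have the same size, so a GetMultiple or NextMultiple read returns several values packed back to back in one byte slice, which the example above splits with lmdb.WrapMulti(scanner.Val(), len(dup0)). The helper below is a hypothetical, standard-library-only sketch of that splitting step (`splitFixed` is not an lmdb-go function); it assumes, as the example does, that the stride equals the length of a single value.

```go
package main

import "fmt"

// splitFixed cuts page into consecutive values of stride bytes each,
// similar in spirit to splitting a multi-value page with lmdb.WrapMulti.
// It returns an error if the page length is not a multiple of stride.
func splitFixed(page []byte, stride int) ([][]byte, error) {
	if stride <= 0 || len(page)%stride != 0 {
		return nil, fmt.Errorf("page length %d is not a multiple of stride %d", len(page), stride)
	}
	vals := make([][]byte, 0, len(page)/stride)
	for i := 0; i < len(page); i += stride {
		// The full slice expression caps each value's capacity so that
		// appending to one value cannot overwrite the next one in the page.
		vals = append(vals, page[i:i+stride:i+stride])
	}
	return vals, nil
}

func main() {
	// Three 2-byte duplicates packed back to back, like a NextMultiple result.
	page := []byte("aabbcc")
	vals, err := splitFixed(page, 2)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", vals) // ["aa" "bb" "cc"]
}
```

The values alias the page rather than copying it, so they share the page's lifetime.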

@bmatsuo changed the title from "lmdbscan: better ability handle duplicates in a special way" to "lmdbscan: better ability to handle duplicates in a special way" on Nov 1, 2015
bmatsuo added a commit that referenced this issue Nov 2, 2015
Described in #17.

The Scanner.Set and Scanner.SetNext methods actually call Cursor.Get to
prepare for the next call to Scanner.Scan, which becomes a noop.  The
return value added to the methods should allow more concise
representation of complex scanning behaviors.

Most existing code should behave the same.  Code will not behave
correctly if Scanner.Key, Scanner.Val, or Scanner.Err are accessed
following a call to Scanner.Set, expecting to retrieve values from a
preceding call to Scanner.Scan.
@bmatsuo
Owner Author

bmatsuo commented Nov 2, 2015

I think the proposed solution is pretty good. The lmdb.GetMultiple example is still a little clumsy, but afaict that is because the operation itself is a little clumsy: it forces a distinction between duplicates and a single value.

@bmatsuo
Owner Author

bmatsuo commented Nov 2, 2015

I think this is orthogonal to a separate method for setting opnext, as proposed in #18. If such a method is added, it should probably also advance the cursor when called and return a bool.

@bmatsuo added this to the v1.3.0 milestone on Nov 2, 2015