
Array valuer with delimiter interface #364

Closed · wants to merge 14 commits
Conversation

@cbandy (Contributor) commented Apr 22, 2015

Another solution for #327. Similar to #353, but includes an interface for the delimiter character.

@johto (Contributor) commented Apr 24, 2015

Pardon me if this is a stupid question, but why do we need to support arbitrary delimiters?

@cbandy (Contributor, Author) commented Apr 24, 2015

why do we need to support arbitrary delimiters?

The array delimiter is an attribute of the type. While most types have a comma delimiter, some do not (e.g. box has a semicolon delimiter.)

I thought this would be a convenient way for a driver.Valuer to also leverage pq.Array. For example,

type Box struct { ... }
func (b Box) Value() (driver.Value, error) { ... }
func (b Box) ArrayDelimiter() string { return ";" }
func (b Box) ArrayValue() (driver.Value, error) { return b.Value() }

db.Query(`SELECT $1::box[]`, pq.Array{[]Box{ ... }})

(Looking at it now, ArrayValue() seems redundant. I'll find out if that is true on the backend.)
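To make the delimiter idea concrete, here is a self-contained toy (all names hypothetical, not the pq API) showing how a wrapper could consult the element type's delimiter when joining array elements:

```go
package main

import (
	"fmt"
	"strings"
)

// delimited mirrors the interface proposed in this PR: an element type
// reports the delimiter its PostgreSQL array type uses.
type delimited interface {
	ArrayDelimiter() string
}

// Box stands in for PostgreSQL's box type, whose array delimiter is ";".
type Box struct{ X1, Y1, X2, Y2 float64 }

func (Box) ArrayDelimiter() string { return ";" }

func (b Box) text() string {
	return fmt.Sprintf("(%g,%g),(%g,%g)", b.X1, b.Y1, b.X2, b.Y2)
}

// renderBoxArray joins element text with the delimiter reported by the
// element type, falling back to "," when the type says nothing.
func renderBoxArray(boxes []Box) string {
	delim := ","
	var elem interface{} = Box{}
	if d, ok := elem.(delimited); ok {
		delim = d.ArrayDelimiter()
	}
	parts := make([]string, len(boxes))
	for i, b := range boxes {
		parts[i] = b.text()
	}
	return "{" + strings.Join(parts, delim) + "}"
}

func main() {
	fmt.Println(renderBoxArray([]Box{{0, 0, 1, 1}, {1, 1, 2, 2}}))
}
```

The fallback to "," means a plain driver.Valuer element still works without implementing the extra method.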

@cbandy (Contributor, Author) commented Apr 25, 2015

ArrayValue was redundant.

cbandy added 6 commits April 28, 2015 03:17
The process for interpreting text-format array input explicitly
delegates to the text-format input routines of the element type.

driver.Valuer is the appropriate interface for a type's text-format.
These are faster and less memory-intensive than the reflection-based
Valuer.
@cbandy (Contributor, Author) commented Apr 28, 2015

As @erykwalder demonstrated, the cost of reflection is significant:

$ go test -bench 'ArrayValue' -benchmem
PASS
BenchmarkBoolArrayValue          5000000               278 ns/op              64 B/op          2 allocs/op
BenchmarkFloat64ArrayValue        300000              4906 ns/op             784 B/op          6 allocs/op
BenchmarkInt64ArrayValue          500000              2446 ns/op             784 B/op          6 allocs/op
BenchmarkStringArrayValue         100000             16939 ns/op            2048 B/op          7 allocs/op
BenchmarkGenericArrayValueBools           500000              3429 ns/op             368 B/op         24 allocs/op
BenchmarkGenericArrayValueFloat64s        200000             10172 ns/op            1392 B/op         45 allocs/op
BenchmarkGenericArrayValueInt64s          300000              5894 ns/op            1184 B/op         26 allocs/op
BenchmarkGenericArrayValueByteSlices      100000             15652 ns/op            1872 B/op         17 allocs/op
BenchmarkGenericArrayValueStrings         100000             18311 ns/op            1712 B/op         17 allocs/op

Also, like @erykwalder, I have come up against the choice between escape and hex format, and whether to support arrays of bytea at all. My current inclination is to encode to the hex format only, forgoing support for 8.4 and earlier.
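For reference, emitting the hex format is a one-liner over the standard library; this sketch (not pq code) shows the bare element encoding, before any array-level quoting:

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// encodeByteaHex renders one bytea value in PostgreSQL's hex format
// (available since 9.0). Inside an array literal the element would
// additionally need quoting and backslash doubling; this shows only
// the bare value.
func encodeByteaHex(b []byte) string {
	return `\x` + hex.EncodeToString(b)
}

func main() {
	fmt.Println(encodeByteaHex([]byte{0xde, 0xad, 0xbe, 0xef}))
}
```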

@johto (Contributor) commented Apr 28, 2015

As @erykwalder demonstrated, the cost of reflection is significant:

That was quite expected.

Also, like @erykwalder, I have come up against the choice between escape and hex format, and whether to support arrays of bytea at all. My current inclination is to encode to the hex format only, forgoing support for 8.4 and earlier.

That sounds reasonable to me. 8.4 has been unsupported for a while, anyway.

@johto (Contributor) commented Apr 30, 2015

The array delimiter is an attribute of the type. While most types have a comma delimiter, some do not (e.g. box has a semicolon delimiter.)

Oh, right. That exception.

OK. I'll give this a review over the weekend.

@cbandy (Contributor, Author) commented Apr 30, 2015

I'll try to add ByteaArray tonight.

@pkieltyka commented

Hey guys, thanks for the work on this - what about the array scanners, though?

@cbandy (Contributor, Author) commented May 29, 2015

@pkieltyka, it's probably not too difficult to do this for the []bool, [][]byte, []float64, []int64, and []string types here. I may explore that in another PR.

However, I'm not sure what it would take to support multidimensional arrays or arrays/slices of Scanners. I'll think about it.

@johto mentioned this pull request May 29, 2015
@johto (Contributor) commented May 29, 2015

@cbandy: Do you think this is in a committable state right now? It's missing documentation, but perhaps we should commit this interface as an "experimental" feature to try and get some more feedback on it.

@cbandy (Contributor, Author) commented May 29, 2015

It needs an integration test--something that asserts the backend agrees. I'd also like the documentation to be adequate before merge. Can you please suggest some ways to improve it?

I can address both in the coming days.

@johto (Contributor) commented May 29, 2015

It needs an integration test--something that asserts the backend agrees.

All right.

I'd also like the documentation to be adequate before merge. Can you please suggest some ways to improve it?

Uh. This is going to sound stupid, but I completely forgot that Go projects don't put documentation in a dedicated set of files (which was what I was looking for). That said, I'll be happy to do a pass over them once I have an environment where I can see what the godoc looks like. (Is there a way to make godoc do that for github PRs? That could be awesome.)

@piotrkowalczuk commented

Since the main discussion is right here, I decided to move my comment here. In my opinion, providing types like sql.NullInt64 is most idiomatic and covers 90% of cases. For example:

type.go

package postgres

const (
    // ArraySeparator separates elements in the array text format.
    ArraySeparator = ","
    // ArrayBeginningChar opens an array literal.
    ArrayBeginningChar = "{"
    // ArrayEndChar closes an array literal.
    ArrayEndChar = "}"
)

array_int64.go

package postgres

import (
    "bytes"
    "database/sql/driver"
    "errors"
    "strconv"
    "strings"
)

var (
    // ErrArrayInt64UnsupportedSourceType is returned by Scan for source types other than string and []byte.
    ErrArrayInt64UnsupportedSourceType = errors.New("pq: array int64 unsupported source type")
    // ErrArrayInt64UnsupportedSourceFormat is returned by Scan when the input is not a braced array literal.
    ErrArrayInt64UnsupportedSourceFormat = errors.New("pq: array int64 unsupported source format")
    // ErrArrayInt64UnsupportedValueType is returned by Scan when an element cannot be parsed as an int64.
    ErrArrayInt64UnsupportedValueType = errors.New("pq: array int64 unsupported value type")
)

// ArrayInt64 is a slice of int64s that implements necessary interfaces.
type ArrayInt64 []int64

// Scan satisfies the sql.Scanner interface.
func (a *ArrayInt64) Scan(src interface{}) error {
    var tmp []string
    var srcs string

    switch t := src.(type) {
    case []byte:
        srcs = string(t)
    case string:
        srcs = t
    default:
        return ErrArrayInt64UnsupportedSourceType
    }

    l := len(srcs)

    if l < 2 {
        return ErrArrayInt64UnsupportedSourceFormat
    }

    if string(srcs[0]) != ArrayBeginningChar || string(srcs[l-1]) != ArrayEndChar {
        return ErrArrayInt64UnsupportedSourceFormat
    }

    if l == 2 {
        return nil
    }

    tmp = strings.Split(string(srcs[1:l-1]), ArraySeparator)
    for _, v := range tmp {
        i, err := strconv.ParseInt(v, 10, 64)
        if err != nil {
            return ErrArrayInt64UnsupportedValueType
        }

        *a = append(*a, i)
    }

    return nil
}

// Value satisfies the driver.Valuer interface.
func (a ArrayInt64) Value() (driver.Value, error) {
    var buffer bytes.Buffer

    buffer.WriteString(ArrayBeginningChar)

    for i, v := range a {
        if i > 0 {
            _, err := buffer.WriteString(ArraySeparator)
            if err != nil {
                return nil, err
            }
        }
        _, err := buffer.WriteString(strconv.FormatInt(v, 10))
        if err != nil {
            return nil, err
        }
    }

    buffer.WriteString(ArrayEndChar)

    return buffer.Bytes(), nil
}

array_int64_test.go

package postgres

import (
    "testing"

    "github.com/stretchr/testify/assert"
)

func TestArrayInt64Value(t *testing.T) {
    success := map[string]ArrayInt64{
        "{1,2,3,4}": ArrayInt64{1, 2, 3, 4},
        "{}":           ArrayInt64{},
    }

    for expected, array := range success {
        got, err := array.Value()
        if assert.NoError(t, err) {
            assert.Equal(t, []byte(expected), got)
        }
    }
}

func TestArrayInt64Scan(t *testing.T) {
    success := map[string]ArrayInt64{
        "{1,2,3,4}": ArrayInt64{1, 2, 3, 4},
    }

    for src, expected := range success {
        var got ArrayInt64
        err := got.Scan(src)
        if assert.NoError(t, err) {
            assert.Equal(t, expected, got)
        }
    }

    fail := map[string]error{
        "{string1,string2}": ErrArrayInt64UnsupportedValueType,
        "}":                  ErrArrayInt64UnsupportedSourceFormat,
        "{":                  ErrArrayInt64UnsupportedSourceFormat,
        "{12412s}":           ErrArrayInt64UnsupportedValueType,
    }

    for src, expected := range fail {
        var got ArrayInt64
        err := got.Scan(src)
        assert.EqualError(t, err, expected.Error())
    }
}

Of course, I don't see anything wrong with providing proxy-like functions. Overall, the API in my opinion could look like:

type ArrayInt64 []int64
type ArrayFloat64 []float64
type ArrayString []string
type ArrayBool []bool
type ArrayBytes [][]byte

func FromArray(interface{}) driver.Valuer {}
func ToArray(interface{}) sql.Scanner {}

The idea of GenericArray is a little bit odd. I can't imagine a situation where it would be useful. If I understand correctly, the only reason it's there is because a default case is not an option.
Instead of that, I can suggest something like this:

type errorValuer struct{ err error }

func (ev errorValuer) Value() (driver.Value, error) {
    return nil, ev.err
}

func Array(a interface{}) driver.Valuer {
    switch a := a.(type) {
    case []bool:
        return BoolArray(a)
    case []float64:
        return Float64Array(a)
    case []int64:
        return Int64Array(a)
    case []string:
        return StringArray(a)
    default:
        return errorValuer{errors.New("pq: unsupported data type")} // message could be better
    }
}

Example usage for a more complex situation could be:

type Custom pq.ArrayInt64

// or 

type MoreCustom []int64

func (mc MoreCustom) Value() (driver.Value, error) {
    return pq.ArrayInt64(mc).Value()
}

@johto (Contributor) commented May 29, 2015

@piotrkowalczuk wrote:

Of course, I don't see anything wrong with providing proxy-like functions. Overall, the API in my opinion could look like:

type ArrayInt64 []int64
[ .. etc .. ]

func FromArray(interface{}) driver.Valuer {}

I don't have any strong objections to providing the FromArray "auto-choose" wrapper here. Not sure which version I'd use myself.

The idea of GenericArray is a little bit odd. I can't imagine a situation where it would be useful. If I understand correctly, the only reason it's there is because a default case is not an option.
Instead of that, I can suggest something like this:
return errorValuer{errors.New("pq: unsupported data type")} // message could be better

It's useful in the case where you have an array where the element is any type the driver does not know about. For example, assume you have an array of UUIDs and you want to fetch those from the database. You'd do something like:

// pseudo-code, since I don't have a compiler anywhere near me
uuidArray := ... // whatever
rows, err := db.Query(`SELECT * FROM foo WHERE uuid = ANY($1)`, pq.Array{uuidArray})
// ...

I don't see any case here where limiting what the user is allowed to do would be beneficial for us, or the user. Just sending over an array of the string representation of the elements would Just Work, without the user having to mess around with custom types and what not.
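A runnable toy version of that idea (names hypothetical; the real proposal goes through driver.Valuer): quote each element's text form and join with commas:

```go
package main

import (
	"fmt"
	"strings"
)

// uuid stands in for any element type the driver knows nothing about;
// all a generic array wrapper needs is the element's text representation
// (via driver.Valuer in the real proposal, a plain method here).
type uuid string

func (u uuid) text() string { return string(u) }

// renderGenericArray quotes each element's text form and joins with
// commas, which is how `ANY($1)` with an array of unknown element
// types can Just Work. Escaping is simplified for the sketch.
func renderGenericArray(elems []uuid) string {
	parts := make([]string, len(elems))
	for i, e := range elems {
		parts[i] = `"` + strings.ReplaceAll(e.text(), `"`, `\"`) + `"`
	}
	return "{" + strings.Join(parts, ",") + "}"
}

func main() {
	ids := []uuid{"6ba7b810-9dad-11d1-80b4-00c04fd430c8"}
	fmt.Println(renderGenericArray(ids))
}
```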

@cbandy (Contributor, Author) commented May 30, 2015

(Is there a way to make godoc do that for github PRs? That could be awesome.)

Not that I can see. It seems to find the code from the import path the same way as go get.

@cbandy (Contributor, Author) commented May 30, 2015

The idea of GenericArray is a little bit odd. I can't imagine a situation where it would be useful. If I understand correctly, the only reason it's there is because a default case is not an option.
Instead of that, I can suggest something like this:

return errorValuer{errors.New("pq: unsupported data type")} // message could be better

It's useful in the case where you have an array where the element is any type the driver does not know about.

This includes something as simple as []int. When one calls db.Query() with an int, uint, byte, or rune, the sql package converts it to int64 before passing it to the pq package. The pq package doesn't have code to handle int arguments, but the user expects them to work. I think users would expect slices of int, uint, etc to behave in a similar way.

GenericArray also supports nested or multidimensional arrays (see SQL:2003.) For example, a matrix of integers could be held in [][]int in Go, passed to sql/pq via GenericArray, and stored in int[][] in Postgres. I think our "array" feature must include this.
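A sketch of what that nesting looks like in the text format (illustrative only, not the GenericArray implementation):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// renderIntMatrix shows the shape of the multidimensional case: each
// inner slice becomes a nested {...} group, so [][]int in Go maps onto
// a PostgreSQL int[][] literal.
func renderIntMatrix(m [][]int) string {
	rows := make([]string, len(m))
	for i, row := range m {
		cells := make([]string, len(row))
		for j, v := range row {
			cells[j] = strconv.Itoa(v)
		}
		rows[i] = "{" + strings.Join(cells, ",") + "}"
	}
	return "{" + strings.Join(rows, ",") + "}"
}

func main() {
	fmt.Println(renderIntMatrix([][]int{{1, 2}, {3, 4}}))
}
```

Note that PostgreSQL requires all inner slices to have the same length; the sketch leaves that validation out.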

@johto (Contributor) commented Jun 1, 2015

@cbandy wrote:

This includes something as simple as []int. When one calls db.Query() with an int, uint, byte, or rune, the sql package converts it to int64 before passing it to the pq package. The pq package doesn't have code to handle int arguments, but the user expects them to work. I think users would expect slices of int, uint, etc to behave in a similar way.

Good point.

@johto (Contributor) commented Jun 1, 2015

I'm going to look into committing this as an "experimental" feature next. I think it'd be nice to have both Values and Scanners committed while still reserving the "right" to make changes to the interface, in case we find something that doesn't quite work as initially committed.

@johto (Contributor) commented Jun 16, 2015

What I've done in my local branch yesterday is to change pq.Array() to return an interface{}, which currently only implements driver.Valuer, but would in the future implement sql.Scanner.

Though there is a problem with that: a Scanner will require a pointer to a slice, whereas a Valuer can work with just a slice "value". One option is to divide the structs into Valuers and Scanners; e.g. Int64Valuer, Int64Scanner. Another option would be to do something like this:

type Int64Array struct { A *[]int64 }

// Value implements the driver.Valuer interface.
func (arr Int64Array) Value() (driver.Value, error) {
    a := *arr.A
    if a == nil {
        return nil, nil
    }
    // ...
}

I've already gone ahead and done the work for option two. Does anybody want to object to this way of writing these? Or anyone have any better ideas?
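For concreteness, here is a self-contained sketch of option two with both halves filled in (assumed details, not the committed code): the same pointer-wrapping struct serves as a Valuer by reading through the pointer and as a Scanner by writing through it.

```go
package main

import (
	"database/sql/driver"
	"fmt"
	"strconv"
	"strings"
)

// Int64Array wraps a pointer to a slice so one struct can act as both
// a driver.Valuer and an sql.Scanner.
type Int64Array struct{ A *[]int64 }

// Value implements driver.Valuer: a nil slice maps to SQL NULL.
func (arr Int64Array) Value() (driver.Value, error) {
	if arr.A == nil || *arr.A == nil {
		return nil, nil
	}
	parts := make([]string, len(*arr.A))
	for i, v := range *arr.A {
		parts[i] = strconv.FormatInt(v, 10)
	}
	return "{" + strings.Join(parts, ",") + "}", nil
}

// Scan handles simple unquoted input; a real implementation would also
// need to handle quoting and NULL elements.
func (arr Int64Array) Scan(src interface{}) error {
	s, ok := src.(string)
	if !ok || len(s) < 2 || s[0] != '{' || s[len(s)-1] != '}' {
		return fmt.Errorf("pq: unable to scan %v", src)
	}
	*arr.A = (*arr.A)[:0] // reuse the caller's slice
	if s == "{}" {
		return nil
	}
	for _, part := range strings.Split(s[1:len(s)-1], ",") {
		v, err := strconv.ParseInt(part, 10, 64)
		if err != nil {
			return err
		}
		*arr.A = append(*arr.A, v)
	}
	return nil
}

func main() {
	a := []int64{1, 2, 3}
	v, _ := Int64Array{A: &a}.Value()
	fmt.Println(v)
}
```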

@johto (Contributor) commented Jun 16, 2015

Also, all of these methods are currently returning byte slices rather than strings, which quite badly interferes with #357. There appears to be a small performance penalty (~1010 ns/op vs. ~1070 ns/op on my computer) for adding a conversion back to string to the return statements of the Valuers, but I don't see that as an issue. Does anyone have a reason for not doing that?

@cbandy (Contributor, Author) commented Jun 17, 2015

The generic array valuer seems to be having a bit of a problem with types where the underlying type is a slice.

Addressed in 166afcf.

cbandy added 2 commits June 17, 2015 03:23
For example, UUID is often implemented as an array of bytes. Such an
array should be treated as a value rather than another dimension.
The Value() method returns the representation in text format.
@cbandy (Contributor, Author) commented Jun 17, 2015

Also, all of these methods are currently returning byte slices rather than strings, which quite badly interferes with #357. There appears to be a small performance penalty (~1010 ns/op vs. ~1070 ns/op on my computer) for adding a conversion back to string to the return statements of the Valuers, but I don't see that as an issue. Does anyone have a reason for not doing that?

It incurs another alloc and copy of the contents. I'm not opposed, however. Addressed in 2810e56.

@cbandy (Contributor, Author) commented Jun 17, 2015

What I've done in my local branch yesterday is to change pq.Array() to return an interface{}, which currently only implements driver.Valuer, but would in the future implement sql.Scanner.

Though there is a problem with that: a Scanner will require a pointer to a slice, whereas a Valuer can work with just a slice "value".

Which is why I think we should wait to add any function(s) that try to lump these together. 😉

Does anybody want to object to this way of writing these? Or anyone have any better ideas?

I think one should be scanning into our types directly, e.g.

var a pq.Int64Array
err := db.QueryRow(`...`).Scan(&a)

The implementation looks normal:

func (a *Int64Array) Scan(src interface{}) error {
  for ... src ... {
    *a = append(*a, intValue)
  }
}
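A concrete, simplified version of that Scan loop, assuming comma-delimited one-dimensional input with no quoting or NULL handling:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Int64Array demonstrates scanning directly into a named slice type:
// the method appends through the pointer receiver, as suggested.
type Int64Array []int64

func (a *Int64Array) Scan(src interface{}) error {
	var s string
	switch t := src.(type) {
	case []byte:
		s = string(t)
	case string:
		s = t
	default:
		return fmt.Errorf("pq: cannot convert %T to Int64Array", src)
	}
	if len(s) < 2 || s[0] != '{' || s[len(s)-1] != '}' {
		return fmt.Errorf("pq: unable to parse array %q", s)
	}
	*a = (*a)[:0] // clear any previous contents
	if s == "{}" {
		return nil
	}
	for _, part := range strings.Split(s[1:len(s)-1], ",") {
		v, err := strconv.ParseInt(part, 10, 64)
		if err != nil {
			return err
		}
		*a = append(*a, v)
	}
	return nil
}

func main() {
	var a Int64Array
	if err := a.Scan("{1,2,3}"); err != nil {
		panic(err)
	}
	fmt.Println(a)
}
```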

@erykwalder (Contributor) commented

I think ideally we'd have an interface that encompasses both Scanner and Valuer, and return that from pq.Array.

As far as what's passed in, would we be able to do something like this for Valuing:

rows, err := db.Query(q, pq.Array(someArr))

and something like this for scanning:

err := rows.Scan(pq.Array(&someArr))

But maybe I'm just not thinking about this right. I know it works pretty simply for the generic, not sure about the others.

@johto (Contributor) commented Jun 18, 2015

As far as what's passed in, would we be able to do something like this for Valuing:

rows, err := db.Query(q, pq.Array(someArr))
and something like this for scanning:

err := rows.Scan(pq.Array(&someArr))

Yeah, this is what I have in my local branch right now. In the first case we store a pointer to someArr inside the chosen array implementation and dereference it when Value()ing.

@cbandy (Contributor, Author) commented Jun 18, 2015

If you all are determined to prescribe the interface for Scanners, then let's just implement Scanners now...

Some questions I've encountered while thinking about Scanners:

  1. Should a user be able to var a pq.Int64Array; a.Scan("{1,2,3}")?
  2. What should happen when a user tries to scan dimensions that don't match, e.g. a PostgreSQL int[] into a Go [][]int?
  3. What should we do when the user explicitly sets the lower bound, e.g. '[5:6]={2,3}'::int[]?
  4. What destination types should we support? The Valuer implementation delegated this responsibility to driver.DefaultParameterConverter, but I don't see any exported access to the Scan equivalent, convertAssign.

@johto (Contributor) commented Jun 18, 2015

Some questions I've encountered while thinking about Scanners:

Should a user be able to var a pq.Int64Array; a.Scan("{1,2,3}")?

I think that could work. Given:

type Int64Array struct { A *[]int64 }

Scanning into that directly would realize the A pointer is nil and would create a new slice. If you went through the pq.Array() interface and gave a pointer to your own slice, that would be first cleared and then appended to instead. AFAICT that would work nicely for all three use cases (two different types of Scanning, and one Valuer).
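The clearing/appending semantics described here can be sketched with a hypothetical helper (assign stands in for the body of Scan; not pq API):

```go
package main

import "fmt"

// wrapper mirrors the struct under discussion: a nil destination
// pointer gets a fresh slice allocated; an existing one is cleared
// and then appended to, so the caller's slice is reused.
type wrapper struct{ A *[]int64 }

func (w *wrapper) assign(values []int64) {
	if w.A == nil {
		w.A = new([]int64) // no destination given: allocate one
	}
	*w.A = append((*w.A)[:0], values...) // clear, then append
}

func main() {
	var w wrapper // A is nil, so assign allocates
	w.assign([]int64{1, 2})
	fmt.Println(*w.A)

	mine := []int64{9, 9, 9}
	v := wrapper{A: &mine} // caller's slice: cleared and appended to
	v.assign([]int64{7})
	fmt.Println(mine)
}
```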

What should happen when a user tries to scan dimensions that don't match, e.g. a PostgreSQL int[] into a Go [][]int?

An error.

What should we do when the user explicitly sets the lower bound, e.g. '[5:6]={2,3}'::int[]?

I don't really care. I see exactly zero use for arrays like this. Perhaps just an error from the depths of hell would work.

What destination types should we support? The Valuer implementation delegated this responsibility to driver.DefaultParameterConverter, but I don't see any exported access to the Scan equivalent, convertAssign.

Hmm. Perhaps we could just support the "native" ones, and then the "generic" implementation would require Scanner to be implemented for the types of the elements?

@erykwalder (Contributor) commented

There's also the possibility of something like json.Unmarshal, which just tries to figure out scanning for generic types, if possible, returning an error otherwise.

@cbandy (Contributor, Author) commented Jun 19, 2015

type Int64Array struct { A *[]int64 }

Or anyone have any better ideas?

The following appears to work:

func (a *BoolArray) Scan(src interface{}) error { ... }

func TestBoolArrayScan(t *testing.T) {
        var b []bool
        var c = (*BoolArray)(&b)

        err := c.Scan([]byte(`{t,f,t}`))
        if err != nil {
                t.Fatalf("Expected no error, got %v", err)
        }
        if !reflect.DeepEqual(b, []bool{true, false, true}) {
                t.Errorf("got %v", b)
        }
}

I think this means we can implement the Scanner interface on *{Bool,Bytea,Float64,Int64,String}Array (i.e. pointers) normally and have Array() perform a conversion.

I would very much like to keep the types/definitions the same as they are in this PR currently.
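The conversion in Array() could then look like this (signature illustrative, not the final pq API):

```go
package main

import "fmt"

// BoolArray is a named slice type; its pointer is what would implement
// sql.Scanner. Array sketches the proposed constructor: convert the
// caller's *[]bool to *BoolArray so Scan can be called on it.
type BoolArray []bool

func Array(a interface{}) interface{} {
	switch a := a.(type) {
	case *[]bool:
		return (*BoolArray)(a)
	}
	return a
}

func main() {
	var b []bool
	fmt.Printf("%T\n", Array(&b))
}
```

The conversion is free at runtime: *BoolArray and *[]bool share the same underlying representation.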

@erykwalder (Contributor) commented

Would it be helpful for me to spike out a generic scanner?

@cbandy (Contributor, Author) commented Jun 26, 2015

Would it be helpful for me to spike out a generic scanner?

It would definitely help me see how it can work.

@cbandy (Contributor, Author) commented Jun 28, 2015

I've written some tests for the basic scanners, but I haven't come upon implementations that I like.

@johto (Contributor) commented Aug 12, 2015

I've written some tests for the basic scanners, but I haven't come upon implementations that I like.

The work here seems to have been sitting idle for a while. I don't see the problem with doing this the way I suggested:

type Int64Array struct { A *[]int64 }

AFAICT this will provide a nice interface for both scanners and valuers.

@cbandy (Contributor, Author) commented Aug 12, 2015

I've written some tests for the basic scanners, but I haven't come upon implementations that I like.

The work here seems to have been sitting idle for a while.

@johto I've finished implementing Scanners on the basic/one-dimensional arrays in another branch. I was going to post here once I finished the generic implementation, but I've stalled out re-implementing convertAssign.

What destination types should we support? The Valuer implementation delegated this responsibility to driver.DefaultParameterConverter, but I don't see any exported access to the Scan equivalent, convertAssign.

Hmm. Perhaps we could just support the "native" ones, and then the "generic" implementation would require Scanner to be implemented for the types of the elements?

I will post patches for this approach ASAP. Shall I go ahead and add Scanners to this PR?

@johto (Contributor) commented Aug 12, 2015

Shall I go ahead and add Scanners to this PR?

Sounds good.

I will post patches for this approach ASAP.

.. though don't hurry too much. I'm going to try to spend the next couple of weeks on Postgres stuff, which means I'll likely not have much time and/or energy to spend on reviewing this patch.

@xentek commented Oct 7, 2015

Thank you for all of the hard work on implementing this type in a way that has a nice interface and conforms to go idioms and interfaces. Any chance this will see the light of day soon?

@ibash commented Mar 8, 2016

Is this still being worked on?

@cbandy (Contributor, Author) commented Mar 11, 2016

Is this still being worked on?

Things got busy before and after a new baby. As I recall, the pared-down goal of a slice of Scanners still had a problem because the method is defined on a pointer:

var x Thing
rows.Scan(&x) // success
rows.Scan(x)  // failure

I think a user would expect to do the following:

var y []Thing
rows.Scan(&y)

However, Thing doesn't implement Scanner (*Thing does.) This isn't a slice of Scanner. Creating and checking the pointer of an element at each level started to feel like magic. Maybe it wasn't so bad...
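The "pointer of an element at each level" dance can be sketched with reflection (a toy, not the PR's code): take the address of each addressable slice element and check whether that pointer satisfies sql.Scanner.

```go
package main

import (
	"database/sql"
	"fmt"
	"reflect"
)

// Thing implements sql.Scanner on its pointer receiver only, which is
// exactly the situation described: []Thing is not a slice of Scanner,
// but the pointer to an addressable element is one.
type Thing struct{ v string }

func (t *Thing) Scan(src interface{}) error {
	t.v = fmt.Sprint(src)
	return nil
}

// scanSlice allocates the destination slice, then addresses each
// element and scans through the resulting pointer.
func scanSlice(dest interface{}, values []interface{}) error {
	rv := reflect.ValueOf(dest).Elem()
	rv.Set(reflect.MakeSlice(rv.Type(), len(values), len(values)))
	for i, v := range values {
		s, ok := rv.Index(i).Addr().Interface().(sql.Scanner)
		if !ok {
			return fmt.Errorf("element %d is not a Scanner", i)
		}
		if err := s.Scan(v); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	var y []Thing
	if err := scanSlice(&y, []interface{}{"a", "b"}); err != nil {
		panic(err)
	}
	fmt.Println(y)
}
```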

At this point, I'd like to get something published. I think the quickest way forward is to support sending and receiving only one-dimensional arrays, and in the generic case, only Valuer and Scanner types.

Are these two limitations acceptable to others?


  • The "native" types are done.
  • The code to send multidimensional arrays is done.
  • The code to parse the multidimensional array format is done.
  • ToDo: Scan the parsed elements into a slice of (almost) Scanner.

@ibash commented Mar 11, 2016

@cbandy congrats!
I have no qualms with publishing with the two limitations mentioned.

@douglarek commented

Is this still being worked on?

@cbandy (Contributor, Author) commented May 26, 2016

I rebased this work, included scanners, and opened a fresh PR. Please take a look. I think I addressed all the comments here, but if not, I'd like to continue the discussion there.

@freeformz commented

I think this can be closed since #466 was merged. Please re-open if I am wrong.

@freeformz freeformz closed this Aug 9, 2016