Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XMLCodec.DecodeValue always returns nil #2227

Open
lovromazgon opened this issue Jan 10, 2025 · 11 comments · May be fixed by #2228
Open

XMLCodec.DecodeValue always returns nil #2227

lovromazgon opened this issue Jan 10, 2025 · 11 comments · May be fixed by #2228
Labels

Comments

@lovromazgon
Copy link

Describe the bug

When trying to decode an XML value into a Go type, the driver always returns nil, because it tries to unmarshal the value into an any typed value in this line. According to the documentation, xml.Unmarshal will discard data that doesn't fit into the supplied value:

Unmarshal parses the XML-encoded data and stores the result in the value pointed to by v, 
which must be an arbitrary struct, slice, or string. 
Well-formed data that does not fit into v is discarded.

To Reproduce

Runnable example: https://go.dev/play/p/yLdFqZPaoUo

package main

import (
	"encoding/xml"
	"fmt"
)

func main() {
	bxml := []byte(`<root><key1>v1</key1><key2>v2</key2></root>`)

	var vxml any
	if err := xml.Unmarshal(bxml, &vxml); err != nil {
		fmt.Println(err)
	}
	
	// vxml is always nil
	fmt.Printf("xml value: %v, type: %T\n", vxml, vxml)
}

Expected behavior

Probably that raw XML bytes are returned.

Actual behavior

The returned value contains nil.

Version

  • pgx: v5.7.0+

Additional context

Discovered by @lyuboxa #2083 (comment).

@felix-roehrich
Copy link
Contributor

Please provide an example using pgx. Your example is not equivalent to the line you highlighted and the line itself is correct.

@lyuboxa
Copy link

lyuboxa commented Jan 10, 2025

Please provide an example using pgx. Your example is not equivalent to the line you highlighted and the line itself is correct.

pgx/pgtype/xml.go

Lines 195 to 196 in c2175fe

var dst any
err := c.Unmarshal(src, &dst)

@lovromazgon
Copy link
Author

There you go.

package main

import (
	"encoding/xml"
	"fmt"

	"github.com/jackc/pgx/v5/pgtype"
)

func main() {
	codec := &pgtype.XMLCodec{Marshal: xml.Marshal, Unmarshal: xml.Unmarshal}
	out, err := codec.DecodeValue(
		pgtype.NewMap(),
		uint32(pgtype.XMLOID),
		0,
		[]byte(`<root><key1>v1</key1><key2>v2</key2></root>`),
	)
	if err != nil {
		panic(err)
	}
	// out is always nil
	fmt.Println(out)
}

@felix-roehrich
Copy link
Contributor

Sorry, but this is not a proper example. At no point, it is guaranteed the pgtype.XMLCodec.Unmarshal is xml.Unmrashal, rather this should not be the case. That is why a full example with Conn.Query (or similar) is necessary, i.e. it should connect to a database and query a column. Then, I can help find a bug in pgx or in your example.

@felix-roehrich
Copy link
Contributor

For example, if you look at the pgtype.JSONCodec, you will see that the code there is exactly the same. If there is a bug, then the wrong Unmarshal function is set somewhere, but without a concrete example, I don't believe I can find the problem.

Note that &dst is a pointer to an interface, thus a correct Unmarshal should first allocate the proper type, then xml.Unmarshal and finally set dst.

@lovromazgon
Copy link
Author

lovromazgon commented Jan 10, 2025

At no point, it is guaranteed the pgtype.XMLCodec.Unmarshal is xml.Unmrashal

It's not guaranteed, but it's the default:

defaultMap.RegisterType(&Type{Name: "xml", OID: XMLOID, Codec: &XMLCodec{Marshal: xml.Marshal, Unmarshal: xml.Unmarshal}})

Note that &dst is a pointer to an interface

Please have another look at these two lines and note that dst will always be a nil value of type any, and the implementation of xml.Unmarshal (the default for this codec) will always return nil for a valid XML without populating dst.

pgx/pgtype/xml.go

Lines 195 to 196 in c2175fe

var dst any
err := c.Unmarshal(src, &dst)

We can go back and forth on a "proper" example, but connecting to the database won't change the outcome here. More importantly - did you check the original comment that sparked this issue? @jackc confirmed that this is indeed an issue #2083 (comment).

@felix-roehrich
Copy link
Contributor

More importantly - did you check the original comment that sparked this issue? @jackc confirmed that this is indeed an issue #2083 (comment).

He said: "I think you're right.". That is no confirmation, rather just looking at these two lines of code might have confused himself.

Yes, xml.Unmarshal will always return nil. But that is only important if the planned scan falls back on this, otherwise scanning values will work, so this would be a nonissue.

Ultimately, I believe DecodeValue(m *Map, oid uint32, format int16, src []byte, dst any) error is a better signature, but this would be breaking change. With the current signature there is nothing that can be improved here.

@lyuboxa
Copy link

lyuboxa commented Jan 10, 2025

@felix-roehrich I'm at a loss trying to understand what is not clear here. The library can be used in multiple ways, data can be scanned or it can be relied on the library to pick the best type (which in this case may be bytes). Not only this, but the default serder used, explicitly suggested that using any will result in nothing, which this library is using.

See the following example. You are correct that scanning directly into a known type works, however requesting the values directly results in nil and it is unexpected. This is not a nondeterministic problem, since the type is quite evidently determined to be xml. The codec can do better knowing all this.

package main

import (
	"context"
	"os"
	"log"

	"github.com/jackc/pgx/v5"
)

func main(){
	ctx := context.Background()

	c, err := pgx.Connect(ctx, os.Getenv("DATABASE_URL"))
	defer c.Close(ctx)

	if _, err := c.Exec(ctx, `CREATE TABLE IF NOT EXISTS xmldata (data xml)`); err != nil {
		log.Fatal(err)
	}

	if _, err := c.Exec(ctx, `TRUNCATE TABLE xmldata`); err != nil {
		log.Fatal(err)
	}

	bxml := []byte(`<root><key1>v1</key1><key2>v2</key2></root>`)
	if _, err := c.Exec(ctx, `INSERT INTO xmldata (data) VALUES ($1)`, bxml); err != nil {
		log.Fatal(err)
	}

	rows, err := c.Query(ctx, `SELECT data FROM xmldata`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var s string
		if err := rows.Scan(&s); err != nil {
			log.Fatal(err)
		}

		log.Printf("svalue: %s\n", s)

		vals, err := rows.Values()
		if err != nil {
			log.Fatal(err)
		}

		for _, v := range vals {
			log.Printf("value: %v, type: %T", v, v)
		}
	}
}

Results in

go run main.go
2025/01/10 13:55:03 svalue: <root><key1>v1</key1><key2>v2</key2></root>
2025/01/10 13:55:03 value: <nil>, type: <nil>

Another usage where the library is used indirectly via something like pglogical and has to use the codecs directly (e.g.DecodeValue()) without a choice to provide the destination type, although in this specific case the DecodeSQLDatabaseValue() can be used, but this aside.

@felix-roehrich
Copy link
Contributor

Thanks, for your example. In this case the best that can be done, is returning the raw bytes or string. Do you mean that this should be the default behaviour?
But even so, the problem is not the line you mentioned but rather the default codec is poorly chosen. Since before, it was only about this line, which on its own is correct. What do you expect me to see?

@lyuboxa
Copy link

lyuboxa commented Jan 10, 2025

Thanks, for your example. In this case the best that can be done, is returning the raw bytes or string. Do you mean that this should be the default behaviour?

Yes, when the target type is unavailable I believe this is the best option, then the user can do whatever they need with the bytes, e.g. serder into map or custom type.

But even so, the problem is not the line you mentioned but rather the default codec is poorly chosen. Since before, it was only about this line, which on its own is correct. What do you expect me to see?

I disagree, the issue is represented by the assumption where the default un/marshal codec can decode in any, which is what that line did and that is not supported as per encoding/xml. Besides I have no expectations, I have addressed this is in my code, this is a courtesy call to the library.

jackc added a commit that referenced this issue Jan 11, 2025
Previously, DecodeValue would always return nil with the default
Unmarshal function.

fixes #2227
@jackc jackc linked a pull request Jan 11, 2025 that will close this issue
jackc added a commit that referenced this issue Jan 11, 2025
Previously, DecodeValue would always return nil with the default
Unmarshal function.

fixes #2227
@jackc
Copy link
Owner

jackc commented Jan 11, 2025

Thanks for the report. It does appear that DecodeValue with the default xml.Unmarshal is wrongly returning nil. I've created a PR that always returns the []byte contents.

Also, perhaps the original author of the XMLCodec may have some feedback - @nickcruess-soda

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants