Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need proper handling for invalid character for CDATA field #138

Open
firodj opened this issue Jul 7, 2022 · 13 comments · Fixed by #140
Open

Need proper handling for invalid character for CDATA field #138

firodj opened this issue Jul 7, 2022 · 13 comments · Fixed by #140
Assignees
Milestone

Comments

@firodj
Copy link

firodj commented Jul 7, 2022

When output tests contains colored output log that may include escape character (0x1b) on test output report, the escape character is not treated well by encoding/xml package for CDATA section, so it will generate invalid XML document (as spec), that may result the test report being unreadable by the CI (in my case GitLab CI).

Related issue: golang/go#53728

@jstemmer
Copy link
Owner

jstemmer commented Jul 7, 2022

Thanks. It sounds like we should at least strip the invalid characters from the CDATA section. However, since ANSI escape sequences also contain regular characters we'd probably want to strip the entire ANSI sequence rather than just the (invalid) control characters.

@emtammaru
Copy link

I just ran into the same issue with Jenkins junit parser:

org.dom4j.DocumentException: Error on line 5 of document  : An invalid XML character (Unicode: 0x1b) was found in the CDATA section.

@janisz
Copy link
Contributor

janisz commented Aug 2, 2022

Same here, I've tried to update to v2 stackrox/stackrox#2586 and prow does not parse XML as it's not valid.

error on line 2291 at column 25: Input is not proper UTF-8, indicate encoding !
Bytes: 0x00 0x63 0x32 0x68

@jstemmer jstemmer self-assigned this Aug 3, 2022
@jstemmer
Copy link
Owner

jstemmer commented Aug 3, 2022

@janisz Where do these invalid bytes come from, this doesn't look like an ansi escape character. Any concerns if they are just dropped completely from the output?

@jstemmer jstemmer pinned this issue Aug 3, 2022
@janisz
Copy link
Contributor

janisz commented Aug 4, 2022

This bytes came from key that is logged. Key is generated as prefix + separator + id where separator is []byte("\x00") which is a control character. I found a workaround of this and print the key with %q instead of %s. From my perspective it's save to ignore them or escape.
https://github.com/stackrox/stackrox/blob/1c51bfebc23b2748bc103a583b5460611dfcad53/pkg/dbhelper/db.go#L10

Minimal example:

package main

import "testing"

func TestEscape(_ *testing.T) {
	println("Test '\x00'")
}
 $ go test ./... -v  2>&1 | go-junit-report -set-exit-code | file -bi -
+ Actual  : application/octet-stream; charset=binary
- Expected: text/xml; charset=us-ascii

janisz added a commit to janisz/go-junit-report that referenced this issue Aug 4, 2022
@janisz
Copy link
Contributor

janisz commented Aug 5, 2022

Created a PR to drop control characters from output #140

janisz added a commit to janisz/go-junit-report that referenced this issue Sep 15, 2022
@jstemmer jstemmer reopened this Sep 17, 2022
@jstemmer
Copy link
Owner

Leaving this issue open for now, because it would be nice if we could also detect ANSI escape sequences and remove them from the output.

@sruehl
Copy link

sruehl commented Jun 2, 2023

@jstemmer is #162 sufficient?

@TafThorne
Copy link

I have run into a similar problem when trying out the newer v2 tool with my GitLab CI pipelines. Before this update the tool was working very well, attempting to run with it today caused GitLab to show:
image
With the error text listed as:
JUnit XML parsing failed: 4106:21: FATAL: CData section not finished 2023/07/20 10:27:5
Looking in the junit-report.xml file that was generated and that is being complained about I find the following lines:

	<testsuite name="gitlab.company.com/some/paths/gomodule" tests="72" failures="0" errors="0" id="82" hostname="runner-bvfrfxhz-project-185-concurrent-0rd9jq" time="0.550" timestamp="2023-07-20T10:29:44Z">
		<testcase name="Test_Buffer_Is_Full" classname="gitlab.company.com/some/paths/gomodule" time="0.000"></testcase>
		<testcase name="Test_Buffer_Processing" classname="gitlab.company.com/some/paths/gomodule" time="0.000"></testcase>
		<testcase name="Test_VariablesCanBeRead" classname="gitlab.company.com/some/paths/gomodule" time="0.020">
			<system-out><![CDATA[Calibration [ERROR] precondition not met
Calibration [ERROR] precondition not met
Calibration [ERROR] precondition not met
Calibration [ERROR] precondition not met
properties [WARN] Warning no property file located]]></system-out>
		</testcase>
		<testcase name="Test_ChangeDbPath" classname="gitlab.company.com/some/paths/gomodule" time="0.020"></testcase>
		<testcase name="Test_InvalidPathCannotBeApplied" classname="gitlab.company.com/some/paths/gomodule" time="0.000">
			<system-out><![CDATA[
2023/07/20 10:27:56 �[35m/builds/gitlab.company.com/some/paths/gomodule/persistence.go:53
�[0m�[31m[error] �[0mfailed to initialize database, got error unable to open database file: is a directory]]></system-out>
		</testcase>

(company identifying path names altered)
It is the second set of CDATA being complained about. As others have said, likely due to the presence of an escape or UNICODE charter.

A cheap work around would be to loose the "system-out" data with a parameter or configuration setting. Stripping or replacing invalid characters would obviously be nicer if possible.

Let me know if I can be of any help in diagnosing or fixing the problem.

@janisz
Copy link
Contributor

janisz commented Jul 20, 2023

@TafThorne what version are you using? I thought special characters will be replaced with #140

@sruehl
Copy link

sruehl commented Jul 20, 2023

I would assume #140 was not sufficient. Had the same problem and needed to use the fork / the PR #162 to fix it.

@TafThorne
Copy link

@TafThorne what version are you using? I thought special characters will be replaced with #140

I ran with the latest v2 tagged code I think:

go install github.com/jstemmer/go-junit-report/v2@latest && go install github.com/t-yuki/gocover-cobertura@latest�[0;m
go: downloading github.com/jstemmer/go-junit-report/v2 v2.0.0
go: downloading github.com/jstemmer/go-junit-report v1.0.0
go: downloading github.com/t-yuki/gocover-cobertura v0.0.0-20180217150009-aaee18c8195c

So that would be v2.0.0 7fde464 which is from Jul 1, 2022. I can try to repeat with the latest code on master and see if it fixes it... one second....

@TafThorne
Copy link

TafThorne commented Jul 20, 2023

It was more than a second but I can confirm that by using the latest changes on main things worked correctly for me.

go install github.com/jstemmer/go-junit-report/v2@7933520f1e28509e5ba6e95a6c3157d53c223561 && go install github.com/t-yuki/gocover-cobertura@latest�[0;m
go: downloading github.com/jstemmer/go-junit-report/v2 v2.0.1-0.20230321231245-7933520f1e28
go: downloading github.com/t-yuki/gocover-cobertura v0.0.0-20180217150009-aaee18c8195c

The above are the details of the changeset that worked for me.

Is there a plan to release some kind of 2.0.1 set of bug fixes or similar in the near future or should I wait for the 2.1.0 release I can see pencilled in?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants