Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add functions for Extended Diagnostic Notation (RFC 8610 Appendix G) #386

Merged
merged 14 commits into from
May 14, 2023

Conversation

zensh
Copy link
Contributor

@zensh zensh commented Jan 26, 2023

Signed-off-by: Yan Qing txr1883@gmail.com

Description

PR Was Proposed and Welcomed in Currently Open Issue

Checklist (for code PR only, ignore for docs PR)

  • Include unit tests that cover the new code
  • Pass all unit tests
  • Pass all 18 ci linters (golint, gosec, staticcheck, etc.)
  • Sign each commit with your real name and email.
    Last line of each commit message should be in this format:
    Signed-off-by: Firstname Lastname firstname.lastname@example.com
  • Certify the Developer's Certificate of Origin 1.1
    (see next section).

Certify the Developer's Certificate of Origin 1.1

  • By marking this item as completed, I certify
    the Developer Certificate of Origin 1.1.
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
660 York Street, Suite 102,
San Francisco, CA 94110 USA

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@zensh zensh force-pushed the feature/diagnostic-notation branch from fad762f to f8033f8 Compare January 28, 2023 13:51
Copy link
Owner

@fxamacker fxamacker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this PR. Really amazing work! 👍

I really like this PR because it:

  • implements both DN and EDN
  • includes RFC 8949 Diagnostic Examples as test cases
  • decouples decoding from diagnosis and avoids any performance hit

Given the size of this PR, I need to do another review but I have some initial thoughts to share.

API

Maybe we can make diagnostic notation API be more similar to decoding API. For example, Unmarshal() uses default decoding options and DecMode.Unmarshal() uses user defined decoding options. Maybe something like this:

type DiagMode interface {
	Diagnostics([]byte) (string, error)
}

// Readonly options for concurrent use
type diagMode struct {
	...
}

// User defined options used to create diagMode
type DiagOptions struct {
	...
}

func DiagOptions.DiagMode() (DiagMode, error) {
	...
}

// Diagnostics returns extended diagnostic notation (EDN) of CBOR data items using default diagnostic mode
func Diagnostics([]byte) (string, error) {
	return defaultDiagMode.Diagnose()
}

// Diagnostic returns extended diagnostic notation (EDN) of CBOR data items using dm mode
func (dm *diagMode) Diagnostics([]byte) (string, error) 

DiagOptions

  • We can include these options MaxNestedLevels, MaxArrayElements, and MaxMapPairs
  • We can change ByteStringEncoding from string to enum to reduce user error.

CBOR Sequences (RFC 8742)

  • We should probably avoid returning partial/incomplete EDN. For example, if the third CBOR data item in sequence has error, we can return the EDN only for the first two data items.

Thanks again for opening this amazing PR!

@zensh
Copy link
Contributor Author

zensh commented Feb 6, 2023

@fxamacker Updated. Do you have any other suggestions?

@fxamacker
Copy link
Owner

Hey @zensh Nice work! I need to work this weekend but I can take a closer look hopefully next weekend.

Thanks again for adding this really great feature!

Copy link
Owner

@fxamacker fxamacker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updated PR!

I only had time to review some of the PR and I left some suggestions. I can resume reviewing PR this coming weekend.

BTW, thanks again for opening this PR, I would like to include it in v2.5.0 release. 👍

diagnose.go Outdated Show resolved Hide resolved
diagnose.go Outdated Show resolved Hide resolved
diagnose.go Outdated Show resolved Hide resolved
diagnose.go Outdated Show resolved Hide resolved
diagnose.go Outdated Show resolved Hide resolved
@zensh
Copy link
Contributor Author

zensh commented Feb 21, 2023

Good suggestions. 👍

Copy link
Owner

@fxamacker fxamacker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating the PR! 👍

This is a partial review and I need to continue review next weekend. Last few weeks have been unusually busy for me.

diagnose.go Show resolved Hide resolved
@zensh zensh force-pushed the feature/diagnostic-notation branch from eda2717 to 363f256 Compare March 20, 2023 12:27
Copy link
Owner

@fxamacker fxamacker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for this PR! 👍 Reviewing this PR led to opening issue #399 and PR #400 because the Valid() function checks for well-formedness instead of validity. It was documented correctly but it needed to be improved.

In this PR, a check for validity is needed in two places to avoid possible runtime errors.

Reproducers:

func TestDiagnoseC201(t *testing.T) {
        data := []byte{0xc2, 0x01}
        cbor.Diagnose(data)
}

func TestDiagnoseC301(t *testing.T) {
        data := []byte{0xc3, 0x01}
        cbor.Diagnose(data)
}

diagnose.go Show resolved Hide resolved
diagnose.go Show resolved Hide resolved
zensh added 7 commits May 1, 2023 21:33
Signed-off-by: Yan Qing <txr1883@gmail.com>
Signed-off-by: Yan Qing <txr1883@gmail.com>
Signed-off-by: Yan Qing <txr1883@gmail.com>
Signed-off-by: Yan Qing <txr1883@gmail.com>
Signed-off-by: Yan Qing <txr1883@gmail.com>
…2. update from valid() to wellformed().

Signed-off-by: Yan Qing <txr1883@gmail.com>
@zensh zensh force-pushed the feature/diagnostic-notation branch from 363f256 to 72e9118 Compare May 1, 2023 14:19
@fxamacker fxamacker added the enhancement New feature or request label May 6, 2023
@fxamacker fxamacker added this to the v2.5.0 milestone May 6, 2023
Copy link
Owner

@fxamacker fxamacker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience and updates! 🙏

I finished reviewing the non-test code. 😅 I had to reread sections of RFCs due to the nature of the PR and edge cases.

Some thoughts:

  • Special handling is needed for indef-length byte string & text string without chunks.
  • For now, we can just return error on invalid UTF-8 text strings instead of encoding to diagnostic notation. If needed, we can add option after v2.5.0 to print invalid UTF-8.
  • We can add DiagnoseFirst() to match recently added UnmarshalFirst().
  • We can remove CBORSequence option, because DiagnoseFirst() can be used instead.

Adding DiagnoseFirst() will make API symmetrical with newly added UnmarshalFirst(). And we can support diagnostic notation for CBOR data even if it has remaining bytes that is extraneous non-CBOR data.

diagnose.go Show resolved Hide resolved
}

if di.dm.byteStringEmbeddedCBOR {
di2 := newDiagnose(val, di.dm.decMode, di.dm)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May need to change wellformed() to receive a bool flag for CBOR sequence, because wellformed uses di.dm.cborSequence and we need to use di.dm.byteStringEmbeddedCBOR here for embedded CBOR sequence.

diagnose.go Outdated Show resolved Hide resolved
diagnose.go Outdated Show resolved Hide resolved
diagnose.go Outdated Show resolved Hide resolved
@zensh
Copy link
Contributor Author

zensh commented May 9, 2023

@fxamacker Thanks for your review.
All the suggestions have been processed, except CBORSequence option:

  1. Notating a CBOR sequence is simple and practical.
  2. Maybe we can remove CBORSequence option and use Diagnose(data []byte, withCBORSequence ...bool) instead.

Signed-off-by: Yan Qing <txr1883@gmail.com>
@fxamacker
Copy link
Owner

@fxamacker Thanks for your review. All the suggestions have been processed, except CBORSequence option:

Thanks so much! I'll take a closer look this weekend to see if we missed anything.

  1. Notating a CBOR sequence is simple and practical.
  2. Maybe we can remove CBORSequence option and use Diagnose(data []byte, withCBORSequence ...bool) instead.

Yeah, you are right, I agree with 1. 👍 Although users can call DiagnoseFirst() in a loop, it's more convenient to just call seqmode.Diagnose() on a CBOR sequence if it doesn't contain trailing non-CBOR data.

We can keep CBORSequence option as-is, so:

  • CBORSequence is not passed as function parameter, so API is consistent with other parts of codec.
  • CBORSequence is false by default, so Diagnose() matches Unmarshal() for ExtraneousDataError.
  • DiagnoseFirst() can be used in a loop by users who want to ignore extraneous non-CBOR data following single data item or following a sequence of data items.

If you agree, I think I just need to review tests this weekend and hopefully merge. 😄

@zensh
Copy link
Contributor Author

zensh commented May 9, 2023

Agreed

Copy link
Owner

@fxamacker fxamacker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Thanks again for your contributions! 👍

Really appreciate your patience with multiple delays caused by special circumstances. 🙏

@fxamacker fxamacker merged commit 92d0a5a into fxamacker:master May 14, 2023
@zensh zensh deleted the feature/diagnostic-notation branch May 14, 2023 23:56
@fxamacker fxamacker changed the title Implements diagnostic notation (#384) Implement functions that return Extended Diagnostic Notation (RFC 8610 Appendix G) May 16, 2023
@fxamacker fxamacker changed the title Implement functions that return Extended Diagnostic Notation (RFC 8610 Appendix G) Add functions for Extended Diagnostic Notation (RFC 8610 Appendix G) May 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants