Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding US sections #5

Merged
merged 12 commits into from
Dec 20, 2022
Merged

Adding US sections #5

merged 12 commits into from
Dec 20, 2022

Conversation

camrice
Copy link
Collaborator

@camrice camrice commented Nov 17, 2022

No description provided.

@@ -0,0 +1,62 @@
package sections
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unused right now, but was an idea to try to simplify the parsing of the segments

GPCSegment USPNATGPCSegment
}

func initUSPNATCoreSegment(bs *util.BitStream) (USPNATCoreSegment, error) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These kind of inits I'm looking to improve. I think they can be simplified.

@camrice
Copy link
Collaborator Author

camrice commented Nov 17, 2022

This is a work in progress right now. I'm going to be adding more of the sections and tests and improving the structure.

@@ -226,3 +256,22 @@ func ParseUInt16(data []byte, bitStartIndex uint16) (uint16, error) {
}
return binary.BigEndian.Uint16([]byte{leftByte, rightByte}), nil
}

func (bs *BitStream) ReadTwoBitField(numFields int) ([]byte, error) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be adding test for this function

@hhhjort
Copy link
Collaborator

hhhjort commented Nov 17, 2022

Do we want to consider some helper methods to interrogate these structures like IntRange has? Or are the structures simple enough that reading them straight from the struct is straightforward and it is not worth the effort of coding methods to pull them out? I haven't dived into the specs for these strings deep enough yet to have an answer here.

@camrice
Copy link
Collaborator Author

camrice commented Nov 17, 2022

I think some helper functions would be good. Many of the fields are just Integers (usually parsed from two bits, though something like Version is 6 bits). The Bitfields are a bit different as they are arrays of integers where each entry represents a different value.

@camrice
Copy link
Collaborator Author

camrice commented Nov 18, 2022

Added another pass at the decoding portion. Here I tried to simplify, where I'm trying to decode the bitstream in order but keeping the error as a reference so that it can stop parsing on the first error. This loses the field name in the error message however. Not sure if that is needed necessarily, but this at least makes the segments a bit cleaner. I will be adding more tests and sections next.

@@ -275,3 +278,24 @@ func (bs *BitStream) ReadTwoBitField(numFields int) ([]byte, error) {

return result, nil
}

func (bs *BitStream) ReadByteSize(size int, err error) (byte, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to use ReadByteSize(N, err) rather than just ReadByteN()? And why pass an error in just to pass it back out without doing anything? (That might be the reason for ReadByteSize())

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially used this function to try to clean up the setting of all the segments. However, I ended up removing this function now and just using the bitstream functions directly.

@hhhjort
Copy link
Collaborator

hhhjort commented Nov 23, 2022

When you encounter an error in one of the Read functions, you just pass the error back up as is, without adding any context. This will make it difficult to debug the error messages generated. See the main GPP header parser, where each level adds context to the error being returned up the chain. Consider adding context so the message better indicates where the problem lies, and what was attempted to be read.

@@ -226,3 +256,46 @@ func ParseUInt16(data []byte, bitStartIndex uint16) (uint16, error) {
}
return binary.BigEndian.Uint16([]byte{leftByte, rightByte}), nil
}

func (bs *BitStream) ReadTwoBitField(numFields int, err error) ([]byte, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to define the return value as its own type, so we can attach a helper function to it to decompose the two bits into the two booleans that those two bits correspond to. Might also allow us to label the indices in some way so that the user of the library doesn't have to reference the original spec too closely.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still looking into changing this. The current implementations use an array of booleans (for 1 bit values). I would like a better way to reference which values are used besides the array directly.

Referring to the two booleans, is that needed for this approach? The spec for these fields is referring to the 2 bit value representing either 0, 1 or 2. Would it need to be parsed as two separate booleans? For example, SensitiveDataProcessing from here:
https://github.com/InteractiveAdvertisingBureau/Global-Privacy-Platform/blob/main/Sections/US-States/CA/GPP%20Extension:%20IAB%20Privacy%E2%80%99s%20California%20Privacy%20Technical%20Specification.md

@camrice camrice changed the title WIP Initial pass at adding US sections Adding US sections Dec 1, 2022
@camrice
Copy link
Collaborator Author

camrice commented Dec 1, 2022

Updated the error handling to return the field name that it fails on. This makes the section_util.go longer but I think it'll be needed anyways for debugging purposes.

@@ -226,3 +256,22 @@ func ParseUInt16(data []byte, bitStartIndex uint16) (uint16, error) {
}
return binary.BigEndian.Uint16([]byte{leftByte, rightByte}), nil
}

func (bs *BitStream) ReadTwoBitField(numFields int) ([]byte, error) {
result := []byte{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to make a byte array of the desired size. We know at the start how big this is going to be, and starting like this means append() will re reallocating memory as it expands the size of the array.

result := make([]byte, 0, numFields)

This will give you an empty slice with room to expand to numFields without triggering a new malloc when you append.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point! I updated this to initialize with numFields.

}

maxFields := numFields * 2
for i := 0; i < maxFields; i += 2 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maxFields was an idea you abandoned. Just use numFields and count by 1.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maxFields usage removed and moved to using numFields and incrementing by 1.

@@ -16,6 +17,29 @@ type testDefinition struct {
value uint64 // The value we expect the function to return (64 bit to allow for future functions that extract larger ints)
}

var test2Bits = []testDefinition{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My error PR revamps how we are doing this testing. You may want to look at that and alter this to match.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased my PR to add your test changes and updated my added tests to better match the structure

return commonUSGPC, nil
}

func NewUSPCACoreSegment(bs *util.BitStream) (USPCACoreSegment, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to move the non-common initializers into their corresponding packages.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved these to their respective sections.

var commonUSGPC CommonUSGPCSegment
var err error

commonUSGPC.Gpc, err = bs.ReadByte1()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the first two bits are for subsection type in the spec.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed that the spec was updated. Fixed this to match the new GPC structure.

type USPCO struct {
SectionID constants.SectionID
Value string
CoreSegment sections.CommonUSCoreSegment
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is also the GPC subsection.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added GPC section

type USPCT struct {
SectionID constants.SectionID
Value string
CoreSegment sections.CommonUSCoreSegment
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also the GPC subsection

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added GPC section

if err != nil {
return commonUSCore, errorHelper("CoreSegment.KnownChildSensitiveDataConsentsArr", err)
}
} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this else the same as commonUSCore.KnownChildSensitiveDataConsentsArr, err = bs.ReadTwoBitField(1). It might be clearer to use 1 rather than zero so the reader knows there is one field/result in there.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to simplify this by just making it an array. There are sections that use it as an array and some that use it as an int. I thought if it's kept an array, the caller can use the single entry as the int value. Let me know if this is a better approach or if I should keep the values separated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It ends up being an array in either case, as the struct defines it as an array. That is what I mean by it being "the same", the exact same values are stored in the same fields, just a different code path that leads to the same result.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha! I think in that case, it seems more straight forward to have it just be an array. If the caller needs to grab the int value, it can just be accessed in the array. Otherwise, this keeps this block more clear I think.

MspaServiceProviderMode: 1,
},
GPCSegment: sections.CommonUSGPCSegment{
SubsectionType: 0,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SubsectionType should be a 1

if err != nil {
return usputCore, errorHelper("CoreSegment.SaleOptOutNotice", err)
return commonUSGPC, ErrorHelper("GPCSegment.SubsectionType", err)
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also check that SubsectionType is 1? According to the spec it seems to be the only allowed value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an error or should we just skip setting the GPC boolean (setting it to false)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to return some sort of error, so the upstream consumer can decide if they trust it or not. One weakness of this thing is that nearly any random sequence can be resolved into a structure, as long as it is long enough. We always ignore any extra bits after we have pulled all the fields. This is one place where a corrupted string could actually be detected.

Perhaps we should define a couple of error types in an error package, so we can return "Invalid GPC" and "Unused bytes" errors upstream that can be detected as instances of those error types. Upstream can then decided if it is worried about hitting those states.

0, 2, 1,
},
MspaCoveredTransaction: 2,
MspaOptOutOptionMode: 1,
MspaServiceProviderMode: 1,
},
GPCSegment: sections.CommonUSGPCSegment{
SubsectionType: 0,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subsection should be 1

@camrice
Copy link
Collaborator Author

camrice commented Dec 12, 2022

In regards to the GPC section, should that information be coming from the user-agent headers rather than the string itself?

@hhhjort
Copy link
Collaborator

hhhjort commented Dec 13, 2022

In regards to the GPC section, should that information be coming from the user-agent headers rather than the string itself?

We cannot count on the header information existing. In server to server models in particular, headers can be lost. My read was that when talking to the CMP, the header should be there. It then gets saved as part of the GPP string in case the header information gets lost. Also, since only a CMP should create/edit a GPP string, we should not try to modify it. I think this should include adding info to the decoded copy of the string.

@camrice
Copy link
Collaborator Author

camrice commented Dec 19, 2022

Rebase this PR and fixed the GPC usage. I realized that it is also expected to be available in a sub section in the section string, separated by a . character. I based the new behavior off of the TypeScript implementation here:
https://github.com/IABTechLab/iabgpp-es/blob/master/modules/cmpapi/test/encoder/section/UspCaV1.test.ts

Essentially, GPC should have a section type of 1 and then a section.GPC value of what's in the string. A default GPC object is used if the string is not available.


// GPC segment subsection type can only be 1
// read 2 bits from the bitstream to ensure proper formatting
_, err = bs.ReadByte2()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we be checking that the value is actually 1? Otherwise things are looking good.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this block here to check if the subsection type is 1. If it is not 1, then we return an error.

Copy link
Collaborator

@hhhjort hhhjort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@camrice camrice merged commit f23d6d3 into prebid:main Dec 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants