-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding US sections #5
Conversation
@@ -0,0 +1,62 @@ | |||
package sections |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unused right now, but was an idea to try to simplify the parsing of the segments
sections/uspnat/uspnat.go
Outdated
GPCSegment USPNATGPCSegment | ||
} | ||
|
||
func initUSPNATCoreSegment(bs *util.BitStream) (USPNATCoreSegment, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These kind of inits I'm looking to improve. I think they can be simplified.
This is a work in progress right now. I'm going to be adding more of the sections and tests and improving the structure. |
@@ -226,3 +256,22 @@ func ParseUInt16(data []byte, bitStartIndex uint16) (uint16, error) { | |||
} | |||
return binary.BigEndian.Uint16([]byte{leftByte, rightByte}), nil | |||
} | |||
|
|||
func (bs *BitStream) ReadTwoBitField(numFields int) ([]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be adding test for this function
Do we want to consider some helper methods to interrogate these structures like IntRange has? Or are the structures simple enough that reading them straight from the struct is straightforward and it is not worth the effort of coding methods to pull them out? I haven't dived into the specs for these strings deep enough yet to have an answer here. |
I think some helper functions would be good. Many of the fields are just Integers (usually parsed from two bits, though something like Version is 6 bits). The Bitfields are a bit different as they are arrays of integers where each entry represents a different value. |
Added another pass at the decoding portion. Here I tried to simplify, where I'm trying to decode the bitstream in order but keeping the error as a reference so that it can stop parsing on the first error. This loses the field name in the error message however. Not sure if that is needed necessarily, but this at least makes the segments a bit cleaner. I will be adding more tests and sections next. |
util/bitstream.go
Outdated
@@ -275,3 +278,24 @@ func (bs *BitStream) ReadTwoBitField(numFields int) ([]byte, error) { | |||
|
|||
return result, nil | |||
} | |||
|
|||
func (bs *BitStream) ReadByteSize(size int, err error) (byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to use ReadByteSize(N, err)
rather than just ReadByteN()
? And why pass an error in just to pass it back out without doing anything? (That might be the reason for ReadByteSize()
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially used this function to try to clean up the setting of all the segments. However, I ended up removing this function now and just using the bitstream functions directly.
When you encounter an error in one of the Read functions, you just pass the error back up as is, without adding any context. This will make it difficult to debug the error messages generated. See the main GPP header parser, where each level adds context to the error being returned up the chain. Consider adding context so the message better indicates where the problem lies, and what was attempted to be read. |
util/bitstream.go
Outdated
@@ -226,3 +256,46 @@ func ParseUInt16(data []byte, bitStartIndex uint16) (uint16, error) { | |||
} | |||
return binary.BigEndian.Uint16([]byte{leftByte, rightByte}), nil | |||
} | |||
|
|||
func (bs *BitStream) ReadTwoBitField(numFields int, err error) ([]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to define the return value as its own type, so we can attach a helper function to it to decompose the two bits into the two booleans that those two bits correspond to. Might also allow us to label the indices in some way so that the user of the library doesn't have to reference the original spec too closely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still looking into changing this. The current implementations use an array of booleans (for 1 bit values). I would like a better way to reference which values are used besides the array directly.
Referring to the two booleans, is that needed for this approach? The spec for these fields is referring to the 2 bit value representing either 0, 1 or 2. Would it need to be parsed as two separate booleans? For example, SensitiveDataProcessing
from here:
https://github.com/InteractiveAdvertisingBureau/Global-Privacy-Platform/blob/main/Sections/US-States/CA/GPP%20Extension:%20IAB%20Privacy%E2%80%99s%20California%20Privacy%20Technical%20Specification.md
Updated the error handling to return the field name that it fails on. This makes the |
util/bitstream.go
Outdated
@@ -226,3 +256,22 @@ func ParseUInt16(data []byte, bitStartIndex uint16) (uint16, error) { | |||
} | |||
return binary.BigEndian.Uint16([]byte{leftByte, rightByte}), nil | |||
} | |||
|
|||
func (bs *BitStream) ReadTwoBitField(numFields int) ([]byte, error) { | |||
result := []byte{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to make a byte array of the desired size. We know at the start how big this is going to be, and starting like this means append()
will re reallocating memory as it expands the size of the array.
result := make([]byte, 0, numFields)
This will give you an empty slice with room to expand to numFields
without triggering a new malloc when you append.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good point! I updated this to initialize with numFields
.
util/bitstream.go
Outdated
} | ||
|
||
maxFields := numFields * 2 | ||
for i := 0; i < maxFields; i += 2 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maxFields
was an idea you abandoned. Just use numFields
and count by 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maxFields
usage removed and moved to using numFields
and incrementing by 1.
util/bitstream_test.go
Outdated
@@ -16,6 +17,29 @@ type testDefinition struct { | |||
value uint64 // The value we expect the function to return (64 bit to allow for future functions that extract larger ints) | |||
} | |||
|
|||
var test2Bits = []testDefinition{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My error PR revamps how we are doing this testing. You may want to look at that and alter this to match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebased my PR to add your test changes and updated my added tests to better match the structure
sections/section_util.go
Outdated
return commonUSGPC, nil | ||
} | ||
|
||
func NewUSPCACoreSegment(bs *util.BitStream) (USPCACoreSegment, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may want to move the non-common initializers into their corresponding packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved these to their respective sections.
sections/section_util.go
Outdated
var commonUSGPC CommonUSGPCSegment | ||
var err error | ||
|
||
commonUSGPC.Gpc, err = bs.ReadByte1() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the first two bits are for subsection type in the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed that the spec was updated. Fixed this to match the new GPC structure.
type USPCO struct { | ||
SectionID constants.SectionID | ||
Value string | ||
CoreSegment sections.CommonUSCoreSegment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is also the GPC subsection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added GPC section
type USPCT struct { | ||
SectionID constants.SectionID | ||
Value string | ||
CoreSegment sections.CommonUSCoreSegment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also the GPC subsection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added GPC section
sections/section_util.go
Outdated
if err != nil { | ||
return commonUSCore, errorHelper("CoreSegment.KnownChildSensitiveDataConsentsArr", err) | ||
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this else
the same as commonUSCore.KnownChildSensitiveDataConsentsArr, err = bs.ReadTwoBitField(1)
. It might be clearer to use 1 rather than zero so the reader knows there is one field/result in there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to simplify this by just making it an array. There are sections that use it as an array and some that use it as an int. I thought if it's kept an array, the caller can use the single entry as the int value. Let me know if this is a better approach or if I should keep the values separated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It ends up being an array in either case, as the struct defines it as an array. That is what I mean by it being "the same", the exact same values are stored in the same fields, just a different code path that leads to the same result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha! I think in that case, it seems more straight forward to have it just be an array. If the caller needs to grab the int value, it can just be accessed in the array. Otherwise, this keeps this block more clear I think.
sections/uspco/uspco_test.go
Outdated
MspaServiceProviderMode: 1, | ||
}, | ||
GPCSegment: sections.CommonUSGPCSegment{ | ||
SubsectionType: 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SubsectionType should be a 1
if err != nil { | ||
return usputCore, errorHelper("CoreSegment.SaleOptOutNotice", err) | ||
return commonUSGPC, ErrorHelper("GPCSegment.SubsectionType", err) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also check that SubsectionType is 1
? According to the spec it seems to be the only allowed value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be an error or should we just skip setting the GPC boolean (setting it to false)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to return some sort of error, so the upstream consumer can decide if they trust it or not. One weakness of this thing is that nearly any random sequence can be resolved into a structure, as long as it is long enough. We always ignore any extra bits after we have pulled all the fields. This is one place where a corrupted string could actually be detected.
Perhaps we should define a couple of error types in an error package, so we can return "Invalid GPC" and "Unused bytes" errors upstream that can be detected as instances of those error types. Upstream can then decided if it is worried about hitting those states.
sections/uspct/uspct_test.go
Outdated
0, 2, 1, | ||
}, | ||
MspaCoveredTransaction: 2, | ||
MspaOptOutOptionMode: 1, | ||
MspaServiceProviderMode: 1, | ||
}, | ||
GPCSegment: sections.CommonUSGPCSegment{ | ||
SubsectionType: 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subsection should be 1
* Updated added bitstream tests to use new structure * Updated sections to match spec additions (new GPC form, field names) * Moved non-common US sections to their own package
In regards to the GPC section, should that information be coming from the user-agent headers rather than the string itself? |
We cannot count on the header information existing. In server to server models in particular, headers can be lost. My read was that when talking to the CMP, the header should be there. It then gets saved as part of the GPP string in case the header information gets lost. Also, since only a CMP should create/edit a GPP string, we should not try to modify it. I think this should include adding info to the decoded copy of the string. |
2261a4e
to
da5eabb
Compare
Rebase this PR and fixed the GPC usage. I realized that it is also expected to be available in a sub section in the section string, separated by a Essentially, GPC should have a section type of 1 and then a |
sections/section_util.go
Outdated
|
||
// GPC segment subsection type can only be 1 | ||
// read 2 bits from the bitstream to ensure proper formatting | ||
_, err = bs.ReadByte2() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we be checking that the value is actually 1? Otherwise things are looking good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated this block here to check if the subsection type is 1. If it is not 1, then we return an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
No description provided.