Adding US sections #5

camrice · 2022-11-17T19:43:41Z

No description provided.

camrice · 2022-11-17T19:44:30Z

sections/section_util.go

@@ -0,0 +1,62 @@
+package sections


This is unused right now, but was an idea to try to simplify the parsing of the segments

camrice · 2022-11-17T19:45:05Z

sections/uspnat/uspnat.go

+	GPCSegment  USPNATGPCSegment
+}
+
+func initUSPNATCoreSegment(bs *util.BitStream) (USPNATCoreSegment, error) {


These kinds of inits are something I'm looking to improve. I think they can be simplified.

camrice · 2022-11-17T19:46:03Z

This is a work in progress right now. I'm going to be adding more of the sections and tests and improving the structure.

camrice · 2022-11-17T19:46:24Z

util/bitstream.go

@@ -226,3 +256,22 @@ func ParseUInt16(data []byte, bitStartIndex uint16) (uint16, error) {
 	}
 	return binary.BigEndian.Uint16([]byte{leftByte, rightByte}), nil
 }
+
+func (bs *BitStream) ReadTwoBitField(numFields int) ([]byte, error) {


I will be adding a test for this function.

hhhjort · 2022-11-17T20:08:08Z

Do we want to consider some helper methods to interrogate these structures like IntRange has? Or are the structures simple enough that reading them straight from the struct is straightforward and it is not worth the effort of coding methods to pull them out? I haven't dived into the specs for these strings deep enough yet to have an answer here.

camrice · 2022-11-17T21:05:09Z

I think some helper functions would be good. Many of the fields are just integers (usually parsed from two bits, though something like Version is 6 bits). The bitfields are a bit different, as they are arrays of integers where each entry represents a different value.

camrice · 2022-11-18T22:24:39Z

Added another pass at the decoding portion. Here I tried to simplify things: the bitstream is decoded in order, and the error is passed along by reference so that parsing stops on the first error. However, this loses the field name in the error message. I'm not sure the field name is strictly needed, but this at least makes the segments a bit cleaner. I will be adding more tests and sections next.
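A minimal sketch of the "decode in order, stop on the first error" idea described above. The `bitReader` type and the field names other than Version are hypothetical stand-ins, not the PR's actual `util.BitStream` API. It also shows one possible way to keep the field name: each read takes a label that is recorded only when that read is the first to fail.

```go
package main

import "fmt"

// bitReader is a hypothetical stand-in for the PR's util.BitStream.
// It remembers the first error so callers can check once at the end.
type bitReader struct {
	data []byte
	pos  int   // index of the next unread bit
	err  error // first error seen; once set, every read is a no-op
}

// readBits returns the next n bits (n <= 8), most significant bit first.
// The field label is only used to build the error message.
func (r *bitReader) readBits(field string, n int) byte {
	if r.err != nil {
		return 0
	}
	if r.pos+n > len(r.data)*8 {
		r.err = fmt.Errorf("%s: need %d bits at offset %d", field, n, r.pos)
		return 0
	}
	var v byte
	for i := 0; i < n; i++ {
		bit := (r.data[r.pos/8] >> (7 - uint(r.pos%8))) & 1
		v = v<<1 | bit
		r.pos++
	}
	return v
}

// coreSegment uses illustrative field names and widths.
type coreSegment struct {
	Version       byte
	SaleOptOut    byte
	SharingOptOut byte
}

// decodeCore reads every field in order and checks the error once.
func decodeCore(data []byte) (coreSegment, error) {
	r := &bitReader{data: data}
	var seg coreSegment
	seg.Version = r.readBits("Version", 6)
	seg.SaleOptOut = r.readBits("SaleOptOut", 2)
	seg.SharingOptOut = r.readBits("SharingOptOut", 2)
	return seg, r.err
}

func main() {
	fmt.Println(decodeCore([]byte{0x06, 0x40}))
	fmt.Println(decodeCore([]byte{0x06})) // too short for the third field
}
```

The trade-off is that fields after a failed read are left at zero, so the returned struct should not be used when the error is non-nil.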

hhhjort · 2022-11-21T21:06:38Z

util/bitstream.go

@@ -275,3 +278,24 @@ func (bs *BitStream) ReadTwoBitField(numFields int) ([]byte, error) {

 	return result, nil
 }
+
+func (bs *BitStream) ReadByteSize(size int, err error) (byte, error) {


Is there a reason to use ReadByteSize(N, err) rather than just ReadByteN()? And why pass an error in just to pass it back out without doing anything? (That might be the reason for ReadByteSize())

I initially used this function to try to clean up the setting of all the segments. However, I have since removed it and now use the bitstream functions directly.

hhhjort · 2022-11-23T15:40:49Z

When you encounter an error in one of the Read functions, you just pass the error back up as is, without adding any context. This will make it difficult to debug the error messages generated. See the main GPP header parser, where each level adds context to the error being returned up the chain. Consider adding context so the message better indicates where the problem lies, and what was attempted to be read.
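As a rough illustration of adding context at each level, the sketch below wraps errors with `fmt.Errorf` and `%w`. The function names, the `errOutOfBits` value, and the message text are hypothetical; the main GPP header parser mentioned above may structure its messages differently.

```go
package main

import (
	"errors"
	"fmt"
)

// errOutOfBits is a hypothetical low-level error; the PR's BitStream
// may report this condition differently.
var errOutOfBits = errors.New("bitstream ran out of bits")

// readByte2 stands in for a 2-bit read that fails when too few bits remain.
func readByte2(bitsLeft int) (byte, error) {
	if bitsLeft < 2 {
		return 0, errOutOfBits
	}
	return 1, nil
}

// parseCore adds the field name as context.
func parseCore(bitsLeft int) error {
	if _, err := readByte2(bitsLeft); err != nil {
		return fmt.Errorf("CoreSegment.SaleOptOutNotice: %w", err)
	}
	return nil
}

// parseSection adds the section name on top of the field context.
func parseSection(bitsLeft int) error {
	if err := parseCore(bitsLeft); err != nil {
		return fmt.Errorf("uspca section: %w", err)
	}
	return nil
}

func main() {
	err := parseSection(0)
	fmt.Println(err)
	// The original cause is still detectable through the wrapping.
	fmt.Println(errors.Is(err, errOutOfBits))
}
```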

hhhjort · 2022-11-23T16:36:09Z

util/bitstream.go

@@ -226,3 +256,46 @@ func ParseUInt16(data []byte, bitStartIndex uint16) (uint16, error) {
 	}
 	return binary.BigEndian.Uint16([]byte{leftByte, rightByte}), nil
 }
+
+func (bs *BitStream) ReadTwoBitField(numFields int, err error) ([]byte, error) {


We may want to define the return value as its own type, so we can attach a helper function to it to decompose the two bits into the two booleans that those two bits correspond to. Might also allow us to label the indices in some way so that the user of the library doesn't have to reference the original spec too closely.

I'm still looking into changing this. The current implementations use an array of booleans (for 1 bit values). I would like a better way to reference which values are used besides the array directly.

Regarding the two booleans: is that needed for this approach? The spec describes each of these fields as a 2-bit value representing 0, 1, or 2. Would it need to be parsed as two separate booleans? For example, see SensitiveDataProcessing here:
https://github.com/InteractiveAdvertisingBureau/Global-Privacy-Platform/blob/main/Sections/US-States/CA/GPP%20Extension:%20IAB%20Privacy%E2%80%99s%20California%20Privacy%20Technical%20Specification.md
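One way to reconcile the two views in this thread is a small named type for a single 2-bit value, so helpers can hang off it without splitting it into booleans. The sketch below is an assumption, not code from the PR: the type name, the constant labels, and the `Valid` helper are all hypothetical, and the spec defines what 0, 1, and 2 mean separately for each field.

```go
package main

import "fmt"

// TwoBitValue is a hypothetical named type for one 2-bit field value.
type TwoBitValue byte

// Illustrative labels only; the spec defines the meaning of 0, 1 and 2
// separately for each field.
const (
	NotApplicable TwoBitValue = 0
	NoConsent     TwoBitValue = 1
	Consent       TwoBitValue = 2
)

// Valid reports whether v is one of the three values named above.
// Two bits can also hold 3, which this sketch treats as invalid.
func (v TwoBitValue) Valid() bool { return v <= Consent }

// String gives a readable label so callers need not consult the spec
// just to print a value.
func (v TwoBitValue) String() string {
	switch v {
	case NotApplicable:
		return "NotApplicable"
	case NoConsent:
		return "NoConsent"
	case Consent:
		return "Consent"
	default:
		return fmt.Sprintf("TwoBitValue(%d)", byte(v))
	}
}

func main() {
	field := []TwoBitValue{0, 2, 1}
	for i, v := range field {
		fmt.Println(i, v, v.Valid())
	}
}
```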

camrice · 2022-12-01T01:08:52Z

Updated the error handling to return the name of the field it fails on. This makes section_util.go longer, but I think it'll be needed anyway for debugging purposes.

hhhjort · 2022-12-01T19:39:08Z

util/bitstream.go

@@ -226,3 +256,22 @@ func ParseUInt16(data []byte, bitStartIndex uint16) (uint16, error) {
 	}
 	return binary.BigEndian.Uint16([]byte{leftByte, rightByte}), nil
 }
+
+func (bs *BitStream) ReadTwoBitField(numFields int) ([]byte, error) {
+	result := []byte{}


It would be better to make a byte slice of the desired size. We know at the start how big this is going to be, and starting like this means append() will keep reallocating memory as it expands the slice.

result := make([]byte, 0, numFields)

This will give you an empty slice with room to expand to numFields without triggering a new malloc when you append.

Ah good point! I updated this to initialize with numFields.
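A small sketch of the preallocation suggestion above. The `fill` helper and its placeholder values are hypothetical; the point is only that a slice created with `make([]byte, 0, numFields)` keeps the same capacity while `numFields` values are appended.

```go
package main

import "fmt"

// fill appends numFields placeholder values to a slice preallocated with
// make([]byte, 0, numFields) and reports whether append had to grow it.
func fill(numFields int) (result []byte, grew bool) {
	result = make([]byte, 0, numFields)
	startCap := cap(result)
	for i := 0; i < numFields; i++ {
		result = append(result, byte(i%3))
	}
	return result, cap(result) != startCap
}

func main() {
	result, grew := fill(5)
	fmt.Println(len(result), cap(result), grew)
}
```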

hhhjort · 2022-12-01T19:40:06Z

util/bitstream.go

+	}
+
+	maxFields := numFields * 2
+	for i := 0; i < maxFields; i += 2 {


I think maxFields was an idea you abandoned. Just use numFields and count by 1.

Removed the maxFields usage; the loop now uses numFields and increments by 1.
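For illustration, a loop that counts fields by 1 might look like the sketch below. This is a free function over a byte slice with an explicit start offset, which is an assumption made to keep the example self-contained; it is not the PR's `BitStream` method.

```go
package main

import "fmt"

// readTwoBitField is a hypothetical, standalone version of the idea:
// read numFields consecutive 2-bit values starting at bit startBit,
// most significant bit first.
func readTwoBitField(data []byte, startBit, numFields int) ([]byte, error) {
	if startBit < 0 || numFields < 0 || startBit+2*numFields > len(data)*8 {
		return nil, fmt.Errorf("need %d bits at offset %d, have %d bits",
			2*numFields, startBit, len(data)*8)
	}
	result := make([]byte, 0, numFields)
	for i := 0; i < numFields; i++ { // one iteration per field
		var v byte
		for j := 0; j < 2; j++ {
			bit := startBit + 2*i + j
			v = v<<1 | (data[bit/8]>>(7-uint(bit%8)))&1
		}
		result = append(result, v)
	}
	return result, nil
}

func main() {
	// 0x1B is 00 01 10 11 in binary, i.e. the fields 0, 1, 2, 3.
	fmt.Println(readTwoBitField([]byte{0x1B}, 0, 4))
}
```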

hhhjort · 2022-12-01T19:41:00Z

util/bitstream_test.go

@@ -16,6 +17,29 @@ type testDefinition struct {
 	value  uint64 // The value we expect the function to return (64 bit to allow for future functions that extract larger ints)
 }

+var test2Bits = []testDefinition{


My error PR revamps how we are doing this testing. You may want to look at that and alter this to match.

Rebased my PR to add your test changes and updated my added tests to better match the structure.

hhhjort · 2022-12-06T14:47:53Z

sections/section_util.go

+	return commonUSGPC, nil
+}
+
+func NewUSPCACoreSegment(bs *util.BitStream) (USPCACoreSegment, error) {


I think we may want to move the non-common initializers into their corresponding packages.

Moved these to their respective sections.

hhhjort · 2022-12-06T14:52:25Z

sections/section_util.go

+	var commonUSGPC CommonUSGPCSegment
+	var err error
+
+	commonUSGPC.Gpc, err = bs.ReadByte1()


It looks like the first two bits are for subsection type in the spec.

I missed that the spec was updated. Fixed this to match the new GPC structure.

hhhjort · 2022-12-06T15:00:01Z

sections/uspco/uspco.go

+type USPCO struct {
+	SectionID   constants.SectionID
+	Value       string
+	CoreSegment sections.CommonUSCoreSegment


There is also the GPC subsection.

Added GPC section

hhhjort · 2022-12-06T15:01:37Z

sections/uspct/uspct.go

+type USPCT struct {
+	SectionID   constants.SectionID
+	Value       string
+	CoreSegment sections.CommonUSCoreSegment


There is also the GPC subsection

Added GPC section

hhhjort · 2022-12-06T15:12:19Z

sections/section_util.go

+		if err != nil {
+			return commonUSCore, errorHelper("CoreSegment.KnownChildSensitiveDataConsentsArr", err)
+		}
+	} else {


Isn't this else the same as commonUSCore.KnownChildSensitiveDataConsentsArr, err = bs.ReadTwoBitField(1)? It might be clearer to use 1 rather than zero so the reader knows there is one field/result in there.

I tried to simplify this by just making it an array. There are sections that use it as an array and some that use it as an int. I thought if it's kept an array, the caller can use the single entry as the int value. Let me know if this is a better approach or if I should keep the values separated.

It ends up being an array in either case, as the struct defines it as an array. That is what I mean by it being "the same", the exact same values are stored in the same fields, just a different code path that leads to the same result.

Gotcha! In that case, it seems more straightforward to have it just be an array. If the caller needs to grab the int value, it can just be accessed in the array. Otherwise, I think this keeps the block clearer.

hhhjort · 2022-12-08T16:43:53Z

sections/uspco/uspco_test.go

+					MspaServiceProviderMode:         1,
+				},
+				GPCSegment: sections.CommonUSGPCSegment{
+					SubsectionType: 0,


SubsectionType should be a 1

hhhjort · 2022-12-08T16:45:21Z

sections/section_util.go

 	if err != nil {
-		return usputCore, errorHelper("CoreSegment.SaleOptOutNotice", err)
+		return commonUSGPC, ErrorHelper("GPCSegment.SubsectionType", err)
 	}



Should we also check that SubsectionType is 1? According to the spec it seems to be the only allowed value.

Should this be an error or should we just skip setting the GPC boolean (setting it to false)?

It would be nice to return some sort of error, so the upstream consumer can decide if they trust it or not. One weakness of this thing is that nearly any random sequence can be resolved into a structure, as long as it is long enough. We always ignore any extra bits after we have pulled all the fields. This is one place where a corrupted string could actually be detected.

Perhaps we should define a couple of error types in an error package, so we can return "Invalid GPC" and "Unused bytes" errors upstream that can be detected as instances of those error types. Upstream can then decide if it is worried about hitting those states.
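A minimal sketch of what detectable error values could look like, using sentinel errors and `errors.Is`. The names `ErrInvalidGPC`, `ErrUnusedBytes`, `validate`, and `acceptable` are hypothetical, and the policy shown (tolerate trailing bytes, reject everything else) is just one choice an upstream consumer might make.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors that an errors package could export.
var (
	ErrInvalidGPC  = errors.New("invalid GPC subsection")
	ErrUnusedBytes = errors.New("unused bytes after last field")
)

// validate wraps the sentinels with detail so the message stays useful
// while the category remains detectable.
func validate(subsectionType byte, unreadBytes int) error {
	if subsectionType != 1 {
		return fmt.Errorf("GPCSegment.SubsectionType=%d: %w", subsectionType, ErrInvalidGPC)
	}
	if unreadBytes > 0 {
		return fmt.Errorf("%d bytes left over: %w", unreadBytes, ErrUnusedBytes)
	}
	return nil
}

// acceptable is one possible upstream policy: tolerate trailing bytes,
// reject any other error.
func acceptable(err error) bool {
	return err == nil || errors.Is(err, ErrUnusedBytes)
}

func main() {
	for _, err := range []error{validate(1, 0), validate(1, 2), validate(0, 0)} {
		fmt.Println(err, acceptable(err))
	}
}
```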

hhhjort · 2022-12-08T16:45:40Z

sections/uspct/uspct_test.go

 						0, 2, 1,
 					},
 					MspaCoveredTransaction:  2,
 					MspaOptOutOptionMode:    1,
 					MspaServiceProviderMode: 1,
 				},
+				GPCSegment: sections.CommonUSGPCSegment{
+					SubsectionType: 0,


Subsection should be 1

* Updated added bitstream tests to use new structure
* Updated sections to match spec additions (new GPC form, field names)
* Moved non-common US sections to their own package

camrice · 2022-12-12T20:17:47Z

In regards to the GPC section, should that information be coming from the user-agent headers rather than the string itself?

hhhjort · 2022-12-13T15:19:26Z

> In regards to the GPC section, should that information be coming from the user-agent headers rather than the string itself?

We cannot count on the header information existing. In server to server models in particular, headers can be lost. My read was that when talking to the CMP, the header should be there. It then gets saved as part of the GPP string in case the header information gets lost. Also, since only a CMP should create/edit a GPP string, we should not try to modify it. I think this should include adding info to the decoded copy of the string.

camrice · 2022-12-19T23:30:24Z

Rebased this PR and fixed the GPC usage. I realized that it is also expected to be available in a subsection of the section string, separated by a . character. I based the new behavior on the TypeScript implementation here:
https://github.com/IABTechLab/iabgpp-es/blob/master/modules/cmpapi/test/encoder/section/UspCaV1.test.ts

Essentially, GPC should have a section type of 1 and then a section.GPC value of what's in the string. A default GPC object is used if the string is not available.
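The behavior described above could be sketched roughly as follows. The `gpcSegment` type, the helper names, and the sample strings are hypothetical placeholders (they are not real encoded sections), and decoding of each part is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// gpcSegment is a hypothetical stand-in for the PR's GPC segment struct.
type gpcSegment struct {
	SubsectionType byte
	Gpc            bool
}

// defaultGPC is used when the section string carries no GPC subsection.
func defaultGPC() gpcSegment {
	return gpcSegment{SubsectionType: 1, Gpc: false}
}

// splitSection separates the core part from an optional GPC part,
// which follows the first "." in the section string.
func splitSection(section string) (core, gpc string, hasGPC bool) {
	parts := strings.SplitN(section, ".", 2)
	if len(parts) == 2 {
		return parts[0], parts[1], true
	}
	return parts[0], "", false
}

func main() {
	// Placeholder strings, not real encoded sections.
	for _, s := range []string{"COREPART.GPCPART", "COREPART"} {
		core, gpc, ok := splitSection(s)
		if !ok {
			fmt.Println(core, "-> no GPC part, using", defaultGPC())
			continue
		}
		fmt.Println(core, "-> GPC part to decode:", gpc)
	}
}
```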

hhhjort · 2022-12-20T14:17:05Z

sections/section_util.go

+
+	// GPC segment subsection type can only be 1
+	// read 2 bits from the bitstream to ensure proper formatting
+	_, err = bs.ReadByte2()


Shouldn't we be checking that the value is actually 1? Otherwise things are looking good.

Updated this block here to check if the subsection type is 1. If it is not 1, then we return an error.
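A sketch of such a check, assuming the layout discussed in this thread: the first two bits hold the subsection type and the next bit holds the GPC flag. The `parseGPC` function, its single-byte input, and the error text are hypothetical and do not reproduce the PR's code.

```go
package main

import "fmt"

// gpcSegment is a hypothetical stand-in for the PR's GPC segment struct.
type gpcSegment struct {
	SubsectionType byte
	Gpc            bool
}

// parseGPC reads the subsection type from the top two bits of b and the
// GPC flag from the third bit, rejecting any subsection type other than 1.
func parseGPC(b byte) (gpcSegment, error) {
	subsectionType := b >> 6
	if subsectionType != 1 {
		return gpcSegment{}, fmt.Errorf(
			"GPCSegment.SubsectionType: expected 1, got %d", subsectionType)
	}
	return gpcSegment{
		SubsectionType: subsectionType,
		Gpc:            (b>>5)&1 == 1,
	}, nil
}

func main() {
	fmt.Println(parseGPC(0x60)) // bits 011.....: type 1, GPC set
	fmt.Println(parseGPC(0x20)) // bits 001.....: type 0, rejected
}
```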

hhhjort

LGTM

camrice commented Nov 17, 2022

hhhjort reviewed Nov 21, 2022

hhhjort reviewed Nov 23, 2022

camrice changed the title ~~WIP Initial pass at adding US sections~~ Adding US sections Dec 1, 2022

hhhjort reviewed Dec 1, 2022

hhhjort reviewed Dec 6, 2022

hhhjort reviewed Dec 8, 2022

camrice and others added 7 commits December 12, 2022 14:52

WIP Initial pass at adding US sections

c516c2c

Adding helper functions to try to simplify the section parsing

c3a88fe

Added section_util and added more tests

8e134a0

Updates from code review

b185eab

Fixing ReadTwoBitField function

44a3a47

Updates from code review comments

98bf1ad

* Updated added bitstream tests to use new structure
* Updated sections to match spec additions (new GPC form, field names)
* Moved non-common US sections to their own package

Adding helper functions to try to simplify the section parsing

2818ab0

camrice added 3 commits December 12, 2022 14:56

Added section_util and added more tests

05cfc37

Updates from code review

edf2fc7

Fixing merge issues

5ec2a0a

Updating GPC segment usage

da5eabb

camrice force-pushed the add-us-sections branch from 2261a4e to da5eabb Compare December 19, 2022 23:26

hhhjort reviewed Dec 20, 2022

Adding check for GPC subsection type

759de0d

hhhjort approved these changes Dec 20, 2022

camrice merged commit f23d6d3 into prebid:main Dec 20, 2022
