Proposal for Taxonomy Segments in FPD #6057

Closed
sfrancolla opened this issue Dec 1, 2020 · 51 comments
Labels
pinned won't be closed by stalebot taxonomy

Comments

@sfrancolla
Contributor

sfrancolla commented Dec 1, 2020

Type of issue

Improvement

Description

This is a request for comment prior to next meeting (Dec 8).

The Taxonomy Taskforce has been presented with the proposal below from its Prebid-IAB Tech Lab subcommittee. The subcommittee was created to zero in on an agreeable structure for context and audience taxonomy segments and bring it back to the larger group.

Objectives were to support publisher, proprietary, and standardized segments with a simple structure that avoids significant repetitive text.

Additional Background

Proposal

With:

  • ext.segtax
    • [Optional] The ID for a taxonomy that is centrally registered. At present, it's assumed that the IAB would manage the registrations. An example taxonomy registry is sketched below the JSON.
  • segment[].id
    • [Required] The ID of the segment that may or may not have a segtax affiliation. I.e. it’s fully proprietary when no ext.segtax is provided.

The proposed structure:

  {
    "user": {
      "data": [
        {
          "name": "www.dataprovider1.com",
          "ext": {
            "segtax": 3
          },
          "segment": [
            { "id": "1001" },
            { "id": "1002" }
          ]
        },
        {
          "name": "www.dataprovider1.com",
          "ext": {
            "segtax": 501
          },
          "segment": [
            { "id": "123" },
            { "id": "456" }
          ]
        }
      ]
    },
    "site": {
      "content": {
        "data": [
          {
            "name": "www.dataprovider1.com",
            "ext": {
              "segtax": 2
            },
            "segment": [
              { "id": "2001" },
              { "id": "2002" }
            ]
          },
          {
            "name": "www.dataprovider1.com",
            "ext": {
              "segtax": 502
            },
            "segment": [
              { "id": "123" },
              { "id": "456" }
            ]
          }
        ]
      }
    }
  }

Example Taxonomy Registry (Managed by IAB Tech Lab):

| Taxonomy ID | Taxonomy Type | Version | Description | Link |
|---|---|---|---|---|
| 1 | Content | 2.1 | IAB - Content Taxonomy version 2.1 | iab-content-2.1 |
| 2 | Content | 2.2 | IAB - Content Taxonomy new version 2.2 | iab-content-2.2 |
| 3 | Audience | 1.1 | IAB - Audience | iab-audience-1.1 |
| 501 | Audience | 1.0 | Custom Audience | custom-aud-1.0 |
| 502 | Content | 1.0 | Custom Content | custom-cont-1.0 |

*Note - numbers from 1-500 are reserved for standard taxonomy and 501 onwards can be used for custom / community agreed taxonomies.
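
As a rough illustration of how a consumer might read the structure above, the following Python sketch groups segment IDs by their `ext.segtax` value and looks each value up in a registry. The `REGISTRY` dict only mirrors the example table, and `segments_by_taxonomy` is a hypothetical helper name, not part of Prebid or OpenRTB. Entries without `ext.segtax` are treated as fully proprietary, as the proposal describes.

```python
from collections import defaultdict

# Mirrors the example registry table above (illustrative only, not an official list).
REGISTRY = {
    1: "iab-content-2.1",
    2: "iab-content-2.2",
    3: "iab-audience-1.1",
    501: "custom-aud-1.0",
    502: "custom-cont-1.0",
}


def segments_by_taxonomy(data):
    """Group segment IDs from a `data` array by taxonomy link.

    Entries without ext.segtax are grouped under "proprietary"; segtax values
    missing from REGISTRY are grouped under "unknown-<id>".
    """
    grouped = defaultdict(list)
    for entry in data:
        segtax = entry.get("ext", {}).get("segtax")
        if segtax is None:
            key = "proprietary"
        else:
            key = REGISTRY.get(segtax, f"unknown-{segtax}")
        grouped[key].extend(seg["id"] for seg in entry.get("segment", []))
    return dict(grouped)


user_data = [
    {"name": "www.dataprovider1.com", "ext": {"segtax": 3},
     "segment": [{"id": "1001"}, {"id": "1002"}]},
    {"name": "www.dataprovider1.com", "ext": {"segtax": 501},
     "segment": [{"id": "123"}, {"id": "456"}]},
    # No ext.segtax: a fully proprietary segment list (made-up entry).
    {"name": "www.dataprovider1.com", "segment": [{"id": "abc"}]},
]

result = segments_by_taxonomy(user_data)
```

Keying on the integer `segtax` rather than on the provider name is what lets two `data` entries from the same provider carry segments from different taxonomies.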

@bretg
Collaborator

bretg commented Dec 1, 2020

Thanks @sfrancolla - I fixed "context.data" to "site.content.data"

@bretg bretg added pinned won't be closed by stalebot and removed stale labels Jan 7, 2021
@jdwieland8282
Member

I don't have a strong opinion about the data model, but I do have a strong opinion about using custom taxonomies. Magnite is probably going to pick one to support. We will map all the segments in the taxonomy we pick into our platform so that when we observe one in an ad request our exchange knows what to do with it.

@antlauzon
Contributor

Having a default standard taxonomy (that we decide upon in the committee) while also allowing the flexibility to specify some type of taxonomy id for backends that want to support multiple types of taxonomies is something that I think would make sense. This would make segment ids look something like user ids, where we have a type/id binding in Prebid that the modules can interact with.

@patmmccann
Collaborator

@sfrancolla can we add any doc on taxonomyname or other parts of this spec to https://docs.prebid.org/features/firstPartyData.html and close this out?

@amitshetty

Question for this group in the context of the OpenRTB 2.6 discussions. If this is brand new, should this be an extension or be proposed as a change to 2.6?
If new, and if things are stable / well defined, we might want to consider an actual change to the object model rather than go down the extensions path.
Just putting that option out there.

@patmmccann
Collaborator

patmmccann commented Apr 9, 2021

Notes from discussion for consideration:

  • Change taxonomyname -> segtax
  • Keep segtax in ext
  • IAB Audience Taxonomy -> iab_aud (or similar short code)
  • IAB Content Taxonomy -> iab_cat (or similar short code)
  • segtax version in the ext?
  • Convert segtax from string to an enum? This would require people to open a PR to add their number to the enum list.

@sfrancolla
Contributor Author

In response to @patmmccann's recommendation:

  • I have updated the ext.taxonomyname -> ext.segtax change in the issue.

Did we agree to the short codes at this time?
If yes, I will update that as well.

Conversion from string to enum to be discussed.

@abhinavsinha001

I am envisioning this kind of structure and enumeration:

{
  "data": [
    {
      "name": "www.dataprovider1.com",
      "ext": { "segtax": "1" },
      "segment": [
        { "id": "1001" },
        { "id": "1002" }
      ]
    },
    {
      "name": "www.dataprovider2.com",
      "ext": { "segtax": "501" },
      "segment": [
        { "id": "123" },
        { "id": "456" }
      ]
    }
  ]
}

| Taxonomy ID | Taxonomy Type | Version | Description | Link |
|---|---|---|---|---|
| 1 | Content | 2.1 | IAB - Content Taxonomy version 2.1 | iab-content-2.1 |
| 2 | Content | 2.2 | IAB Content Taxonomy new version 2.2 | iab-content-2.2 |
| 3 | Audience | 1.1 | IAB - Audience | iab-audience-1.1 |
| 501 | Audience | 1.0 | Custom | custom-aud-1.0 |

*Note - numbers from 1-500 are reserved for standard taxonomies; 501 onwards can be used for custom / community agreed taxonomies.

@jdwieland8282
Member

Thanks @sfrancolla for updating the example at the top. One last question: custax is optional, right? It can be omitted if a publisher decides it's not relevant. In that case, the only change required of RTD maintainers and pubs (who have implemented this already) is to rename user.data.ext.taxonomyname to user.data.ext.segtax.

@abhinavsinha001

I don't think using different keys to convey the same information is a good design or standards approach. It causes parsing overhead and requires special logic.

In short, we should not have both segtax and custax represent the taxonomy name. This approach to custom taxonomies can also lead to conflicts and collisions.

If I understand the intention correctly, it is to allow an unrestricted way of defining and using taxonomies without needing an IAB doc update. If that is the case, I would propose using a taxonomy provider (taxprovider), which would only require a provider ID update in the IAB documentation.

If we want to do away with the overhead of updating the provider ID as well, we can use a domain as the taxonomy provider, e.g. iab.com or prebid.org as taxprovider.

{
  "data": [
    {
      "name": "www.dataprovider1.com",
      "ext": { "segtax": "1", "taxprovider": "1" },
      "segment": [
        { "id": "1001" },
        { "id": "1002" }
      ]
    },
    {
      "name": "www.dataprovider2.com",
      "ext": { "segtax": "1", "taxprovider": "2" },
      "segment": [
        { "id": "123" },
        { "id": "456" }
      ]
    }
  ]
}

| Taxonomy ID | Taxonomy Type | Version | Description | Link |
|---|---|---|---|---|
| 1 | Content | 2.1 | IAB - Content Taxonomy version 2.1 | iab-content-2.1 |
| 2 | Content | 2.2 | IAB Content Taxonomy new version 2.2 | iab-content-2.2 |
| 3 | Audience | 1.1 | IAB - Audience | iab-audience-1.1 |

| Taxonomy Provider ID | Provider Name | Taxonomy Location |
|---|---|---|
| 1 | IAB | https://iabtechlab.com/wp-content/uploads/ |
| 2 | Prebid | https://prebid.org/custom-taxonomy/ |
| 3 | New Provider | https://newprovider.com/taxonomy/ |
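
To make the intent of the taxprovider idea concrete, here is a minimal Python sketch in which a taxonomy is identified by the pair (taxprovider, segtax), so that the same segtax number can mean different things under different providers. The `TAXONOMIES` table, the `resolve` helper, and the name "prebid-custom-1" are made up for illustration; only the provider locations come from the table above.

```python
# Provider locations taken from the example provider table above.
PROVIDERS = {
    1: "https://iabtechlab.com/wp-content/uploads/",
    2: "https://prebid.org/custom-taxonomy/",
}

# Each provider numbers its own taxonomies, so segtax 1 can recur per provider.
TAXONOMIES = {
    (1, 1): "iab-content-2.1",
    (2, 1): "prebid-custom-1",  # hypothetical taxonomy name, for illustration only
}


def resolve(ext):
    """Return (provider location, taxonomy name) for an ext object, or None."""
    # The example above carries both fields as strings, so convert them here.
    key = (int(ext["taxprovider"]), int(ext["segtax"]))
    name = TAXONOMIES.get(key)
    if name is None:
        return None
    return PROVIDERS[key[0]], name


first = resolve({"segtax": "1", "taxprovider": "1"})
second = resolve({"segtax": "1", "taxprovider": "2"})
```

With this keying, `first` and `second` resolve to different taxonomies even though both carry segtax 1, which is the collision-avoidance property the proposal is after.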

@antlauzon
Contributor

Agree @abhinavsinha001. There will be scenarios where non-IAB taxonomies are utilized and being able to specify the provider is important in order to avoid potential id range conflicts.

@sfrancolla
Contributor Author

Quick note that the proposal has been updated in full.

This is open to a 2-week comment period, having started from Tues, April 13.

Comments will be reviewed as they come in with the aim of a final discussion taking place at the next Taxonomy Taskforce meeting on Tues, April 27.

Please continue the discussion here.

Thanks!

@simontrasler

I will add to @abhinavsinha001's objection to adding custtax. The point of making the taxonomy an enumeration is to save space over the wire. Adding a string field means 1. potential for the two fields to disagree and 2. the string is the path of least resistance. It seems the string has been added back on the assumption that there will be some problem governing the enumerated list.

It would be vastly better to tackle governance. E.g., if this is a Prebid-owned extension, then Prebid can decide to govern the list -- if a simple/fast approach is deemed sufficient and satisfactory, so be it.

In addition, segtax and taxprovider should be typed as integers/enums, not strings, for the same reason as above -- allowing arbitrary strings means the enumerability can/will be ignored. (Anybody coding ahead of the spec being final does so at their own risk.)

Finally, if it's necessary to break out taxprovider from segtax, then why not go all the way and have separate fields for the provider, type, and version? (To follow OpenRTB, which I would encourage, the single enumerated field should be sufficient.)
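
The typing point can be sketched as a small validation pass. This is only an illustration under assumptions: `segtax_errors` is a hypothetical helper, and the "must be positive" rule is assumed from the example registry starting at 1, not stated in any spec quoted here.

```python
def segtax_errors(data):
    """Return a list of problems found in the ext.segtax fields of a data array."""
    errors = []
    for index, entry in enumerate(data):
        ext = entry.get("ext", {})
        if "segtax" not in ext:
            continue  # optional field; absence means a proprietary segment list
        value = ext["segtax"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"data[{index}].ext.segtax must be an integer, got {value!r}")
        elif value < 1:
            # Assumption: registry IDs start at 1, as in the example table.
            errors.append(f"data[{index}].ext.segtax must be positive, got {value}")
    return errors


good = [{"name": "www.dataprovider1.com", "ext": {"segtax": 1},
         "segment": [{"id": "1001"}]}]
bad = [{"name": "www.dataprovider2.com", "ext": {"segtax": "501"},
        "segment": [{"id": "123"}]}]
```

Here `segtax_errors(good)` returns an empty list, while `segtax_errors(bad)` reports the integer-as-string value.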

@patmmccann
Collaborator

patmmccann commented Apr 16, 2021

I do not follow the case for taxprovider; it appears completely redundant with other information.

Agreed on segtax being an integer.

The case for custtax is largely around concerns on the gatekeeping of the enum 5xx + list, particularly by a body that publishes one of the competing taxonomies. @simontrasler are you suggesting it is possible for Prebid to be that gatekeeper? I think that would alleviate those concerns.

@simontrasler

> are you suggesting it is possible for Prebid to be that gatekeeper?

Yes. I don't see why that would be a problem, and I'd certainly choose that over propagating strings.

@abhinavsinha001

> I do not follow the case for taxprovider; it appears completely redundant with other information.

@patmmccann Introducing a taxonomy provider is a middle ground: instead of registering each taxonomy version, update, and ID in a central doc (IAB governed), consortiums or providers (e.g. Prebid) can register once (providing a taxonomy hosting location) and then manage, create, and update their own taxonomies and IDs. This will reduce overall governance overhead.

@antlauzon
Contributor

antlauzon commented Apr 18, 2021

@abhinavsinha001 do you foresee taxprovider always being the same as the data provider, just a numerical id? @patmmccann i can imagine that data providers may under some circumstances want to reference a taxonomy from a different provider than themselves. also, having a range of taxonomy identifiers [1..n] that is unique for each taxonomy provider makes sense to me. it minimizes potential collisions and red-tape around what segtax id ranges mean what and what taxonomy provider gets which id ranges allocated to them by who. the default segtax range in the absence of taxprovider could just be iab. if taxprovider happens to be present the overhead byte-wise is negligible.

@abhinavsinha001

abhinavsinha001 commented Apr 18, 2021

> @abhinavsinha001 do you foresee taxprovider always being the same as the data provider, just a numerical id?

@anthonylauzon No I do not think that is the case and ideally we should not have that level of fragmentation where everyone defines their own taxonomy. Defeats the purpose of "standard".

@simontrasler

I don't understand why this "testing" scenario can't simply use a test range of IDs arranged between the parties involved in the test.

However, for the sake of moving on, can we just clarify in documentation please that taxprovider usage is intended to be for testing only? Or that it is only meaningful when the top-level flag test is set? Otherwise it'll be used all the time in normal operations, and defeat the point of the enumeration.

@antlauzon
Contributor

antlauzon commented Apr 23, 2021

> I don't understand why this "testing" scenario can't simply use a test range of IDs arranged between the parties involved in the test.
>
> However, for the sake of moving on, can we just clarify in documentation please that taxprovider usage is intended to be for testing only? Or that it is only meaningful when the top-level flag test is set? Otherwise it'll be used all the time in normal operations, and defeat the point of the enumeration.

Testing is one of potentially many reasons why someone might want to differentiate their private taxonomies from the public, open taxonomies. The 'taxprovider' field opens up flexibility and allows for organizations to be creative around its use cases while being optional and not introducing any byte-level overhead on the wire universally. We're suggesting that 'taxprovider' simply mean 'taxonomy sourced from this provider as opposed to the central repo' and leave use case details up to those who choose to utilize it. It's a small documentation change for a field that could represent a lot of value if it is standardized upon between orgs.

A decent analogy to think about here is phone numbers. If someone dials a 7 digit phone number, it will connect with the default local area code. That capability shouldn't prevent people from potentially specifying an area code though, which is what the single optional field of 'taxprovider' allows for.

@sfrancolla
Contributor Author

sfrancolla commented Apr 27, 2021

Today is the end of the 2 week comment period. Thank you to all that joined the Taxonomy Taskforce meeting today. We have concluded with the following way forward.

Group Decision =

  1. We commit to segtax and the enum list.
  2. The list may be managed by IAB Tech Lab as outlined by @bjd326 in his comment here.
  3. We will create a new, separate issue to allow a continued discussion about private taxonomy support with differentiation made declarable by way of, for example, a custtax or providerid approach.

This issue description/proposal has been updated.

Thank you!

@antlauzon
Contributor

antlauzon commented Apr 27, 2021

Spoke with @bjd326 about 'custtax' and 'taxprovider' after today's taxonomy meeting. A lot was clarified, and I now fully support vanilla segtax since it seems to provide for all the possible use cases being discussed around 'custtax' and 'taxprovider' and detailed below:

  • segtax ranges will be able to be given to any party who requests them.
  • organizations can reserve specific segtax id sub-ranges from what they are allotted for private taxonomy and testing purposes.
  • segment id meanings and taxonomy definitions are handled separately to the taxonomy id range allotted, and can be pushed out publicly or passed between third parties privately.

Here's a quick example of what this might look like, assuming taxonomy ids 1893-2021 have been given to a company called 'AdBuzz', who might be a data provider, an SSP, publisher, DSP or other organization.

| Taxonomy ID | Taxonomy Type | Version | Description | Link |
|---|---|---|---|---|
| 1893 | Audience | 1.0 | AdBuzz Public Taxonomy 1 | http://adbuzz.example/pubtax1.json |
| 1894 | Audience | 1.8 | AdBuzz Public Taxonomy 2 | http://adbuzz.example/pubtax2.json |
| 1895 | Audience | 1.5 | AdBuzz Private Taxonomy 1 | N/A |
| 1896 | Audience | 17.5 | AdBuzz Private Taxonomy 2 | N/A |
| ... | ... | ... | ... | ... |
| 2020 | Audience | 2.1 | AdBuzz Test Taxonomy 1 | N/A |
| 2021 | Audience | 3.1 | AdBuzz Test Taxonomy 2 | N/A |

Taxonomy composition would be public for 1893 and 1894 and publicly accessible via the links provided to the enum repository.

By contrast, composition & taxonomy specifics would be privately distributed for taxonomies 1895 and 1896 even though their enum value is publicly available. In other words, 1895 and 1896 have reserved public taxonomy ids but the private definition files such as privtax1.json and privtax2.json would be provided to other platforms directly or distributed through private APIs.

Similar to 1895 and 1896, 2020 and 2021 are examples of private, testing taxonomies and effectively placeholders that AdBuzz can use for one-offs and testing different shifts in their own private taxonomies. AdBuzz would be able to keep an arbitrary range within what they have been allotted for private/testing taxonomies, alleviating the need for 'custtax' or 'taxprovider'. The only real difference between a "testing" and "private" taxonomy would be how AdBuzz chooses to distribute the underlying definitions.
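
The range-allocation idea can be sketched in a few lines of Python. The `ALLOCATIONS` list and both helper names are hypothetical: only the AdBuzz range comes from the example above, and the "standard" range follows the 1-500 note in the issue description rather than any published registry.

```python
# Hypothetical allocations as (first id, last id, owner) tuples.
ALLOCATIONS = [
    (1, 500, "standard"),    # per the note in the issue description
    (1893, 2021, "AdBuzz"),  # per the example above
]


def owner_of(segtax):
    """Return the owner of the range containing segtax, or None if unallocated."""
    for first, last, owner in ALLOCATIONS:
        if first <= segtax <= last:
            return owner
    return None


def overlaps(first, last):
    """Check whether a requested inclusive range collides with an existing allocation."""
    return any(first <= hi and lo <= last for lo, hi, _ in ALLOCATIONS)
```

Under these assumptions, `owner_of(1895)` identifies AdBuzz without revealing whether 1895 is a public, private, or test taxonomy; that distinction lives in how AdBuzz distributes the definition files, not in the ID itself.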

@antlauzon
Contributor

antlauzon commented May 6, 2021

I want to raise an additional potential issue with the data provider name as it is currently included. Identifying the data provider by a name containing a URL introduces significant byte overhead, which is the kind of cost that 'segtax' avoids by using numeric taxonomy identifiers. I suggest we replace the "name" field with "id" and use the IAB GVLID numeric identifier to differentiate between vendors. The numeric vendor identifier list is available here: https://iabeurope.eu/vendor-list-tcf-v2-0

Example (before):

{
  "name": "www.dataprovider1.com",
  "ext": { "segtax": "1" },
  "segment": [
    { "id": "1001" }, 
    { "id": "1002" }
  ]
}

Example (after):

{
  "id": "1970",
  "ext": { "segtax": "1" },
  "segment": [
    { "id": "1001" }, 
    { "id": "1002" }
  ]
}
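
For a sense of scale, the following Python sketch compares the compact JSON size of the two example objects above. `wire_size` is a hypothetical helper, and real savings would depend on the actual provider names, the IDs used, and any compression applied on the wire.

```python
import json

before = {
    "name": "www.dataprovider1.com",
    "ext": {"segtax": "1"},
    "segment": [{"id": "1001"}, {"id": "1002"}],
}
after = {
    "id": "1970",
    "ext": {"segtax": "1"},
    "segment": [{"id": "1001"}, {"id": "1002"}],
}


def wire_size(obj):
    """Size in bytes of the compact UTF-8 JSON encoding of obj."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


# Bytes saved per data entry by swapping the name for a short numeric id.
saved = wire_size(before) - wire_size(after)
```

The saving applies once per `data` entry, so it grows with the number of providers attached to a request rather than with the number of segments.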

@simontrasler

FWIW the difference is that name is the existing OpenRTB field for this purpose -- the argument is with the IAB Tech Lab, not this group.

Note the spec is still wrong, it needs to show segtax as an integer field, not integer-as-string.

@antlauzon
Contributor

antlauzon commented May 7, 2021

@simontrasler noted, although the OpenRTB spec allows for both "id" and "name", i figure it would make more sense from a byte conservation perspective to use "id" and fill it with GVLIDs.. i will leave that argument up to the SSPs/DSPs & just make sure it is mentioned here if there are any object size concerns

@jdwieland8282
Member

edited example to align user.data.segment.id with ADCOM which treats id as a string instead of an int.

@amitshetty

I just noticed that the enumeration for the IABTL audience taxonomy is set to 3 in the example above - that is actually the ad product taxonomy in the AdCOM enumeration. I'd suggest using 4 for the audience taxonomy (and we also need to add the audience taxonomy to the AdCOM enumeration - looks like we missed that)
I know this is just an example, but might be worth fixing.

@bretg
Collaborator

bretg commented Jun 23, 2021

@amitshetty - where's the definitive list of 'segtax' values? We did our best to anticipate what they would be at https://docs.prebid.org/features/firstPartyData.html#segments-and-taxonomy

Some bid adapters are already coding to these segtax values, so we need to nail them down...

@amitshetty

This is the definitive list (with the caveat that audience taxonomy needs to be added). https://github.com/InteractiveAdvertisingBureau/AdCOM/blob/master/AdCOM%20v1.0%20FINAL.md#list--category-taxonomies-

@amitshetty

Also, I just realized there is another discrepancy: 1 should point to content taxonomy 1.0 and 2 to content taxonomy 2.0. That of course raises the question of whether we need separate entries for 2.1 and 2.2. Personally I think that a 1.x and 2.x reference is sufficient, but I will discuss it in the working group.

@bretg
Collaborator

bretg commented Jun 23, 2021

Thanks Amit - but are you sure the table in https://github.com/InteractiveAdvertisingBureau/AdCOM/blob/master/AdCOM%20v1.0%20FINAL.md#list--category-taxonomies- is the one we want to consider the master for this segtax value?

My understanding is that it will be a mix of both content and user taxonomy IDs, so is the plan to add id 4 (audience) to this table? I'd rather just point right here than duplicate the values in docs.prebid.org.

@amitshetty

I am currently confirming with the working group that we can add audience taxonomy too to that list, but I am pretty confident that is where we will land. Stay tuned - should wrap that up quickly.

@antlauzon
Contributor

just wanted to add a note in support of changing the prebid iab audience segtax id to 4 in order to align with the audience taxonomy added to adcom

@bretg
Collaborator

bretg commented Jun 24, 2021

doc updated. rubicon adapter updated.

@SyntaxNode

> *Note - numbers from 1-500 are reserved for standard taxonomy and 501 onwards can be used for custom / community agreed taxonomies.

To be clear, the IAB specification reserves 1-499 for standard taxonomy and 500 onwards for custom values.
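
The practical effect of the two boundaries can be checked with a short Python sketch. The function and constant names are hypothetical; the two boundary values are taken from the quoted note and from the correction above.

```python
def is_custom(segtax, first_custom):
    """True when segtax falls in the custom range that starts at first_custom."""
    return segtax >= first_custom


ISSUE_NOTE_FIRST_CUSTOM = 501  # quoted note: 1-500 standard, 501 onwards custom
IAB_SPEC_FIRST_CUSTOM = 500    # correction: 1-499 standard, 500 onwards custom

# IDs that the two readings classify differently.
disputed = [
    segtax for segtax in range(1, 1000)
    if is_custom(segtax, ISSUE_NOTE_FIRST_CUSTOM) != is_custom(segtax, IAB_SPEC_FIRST_CUSTOM)
]
```

Only ID 500 ends up in `disputed`, so the correction matters solely for whoever would be assigned that one value.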

@patmmccann
Collaborator

The standard has been agreed upon in committee and in collaboration with the IAB (e.g., InteractiveAdvertisingBureau/openrtb#65).
