Types of user ids #5775

jdwieland8282 · 2020-09-21T19:48:53Z

Type of issue

Question

Description

Do consumers of user IDs set within either a new or existing user ID module need to know more about how the UUID was generated, or is the ID itself sufficient? Would more DSPs integrate against a particular user ID if they knew more about how it was generated?

We should consider a new attribute called "stype" (source type). The stype value would be passed alongside the UUID to SSPs and DSPs.

Steps to reproduce

NA

Test page

NA

Expected results

pbjs.setConfig({
    userSync: {
        userIds: [{
            name: "pubProvidedId",
            params: {
                eids: [{
                    source: "example.com",
                    uids: [{
                        id: "value read from cookie or local storage",
                        atype: 1, // agent type is set on each uid object
                        ext: {
                            stype: "sha256email"
                        }
                    }]
                }, {
                    source: "id-partner.com",
                    uids: [{
                        id: "value read from cookie or local storage",
                        atype: 1,
                        ext: {
                            stype: "ppuid"
                        }
                    }]
                }]
            }
        }]
    }
});
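To illustrate what a consumer could do with the proposed `stype`, here is a minimal sketch of a hypothetical helper (not part of Prebid.js or any bidder adapter) that walks an `eids` array shaped like the config above and groups the IDs by their `ext.stype` value. IDs without an `stype` are kept in an `unknown` bucket rather than dropped.

```javascript
// Hypothetical consumer-side helper: groups uids from an eids array by
// ext.stype so a bidder or DSP can treat each kind of ID differently.
function uidsByStype(eids) {
  const groups = {};
  for (const eid of eids || []) {
    for (const uid of eid.uids || []) {
      // Fall back to "unknown" when the uid carries no stype.
      const stype = (uid.ext && uid.ext.stype) || 'unknown';
      (groups[stype] = groups[stype] || []).push({ source: eid.source, id: uid.id });
    }
  }
  return groups;
}

// Example input shaped like the eids in the config above.
const eids = [
  { source: 'example.com', uids: [{ id: 'abc', atype: 1, ext: { stype: 'sha256email' } }] },
  { source: 'id-partner.com', uids: [{ id: 'xyz', atype: 1, ext: { stype: 'ppuid' } }] },
];
console.log(uidsByStype(eids));
```

Keying on `stype` rather than on `source` lets a buyer apply one policy (for example, ignoring hashed emails) regardless of which domain supplied the ID.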

The text was updated successfully, but these errors were encountered:

smenzer · 2020-09-24T08:17:53Z

I would suggest NOT including any form of hashed email in the options for stype. a hashed email is like a fingerprint in that you can't "reset" it - once you have it, you will always have the same link to a set of user data, even if they've asked someone upstream to reset/clear their data. since there's no efficient way (today) to tell EVERY platform in the industry to wipe data for a user, the best way today is simply to generate a new user id. this is just like apple and android allowing you to reset your MAID. identity providers that base an ID off of a hashed email are fine since they can change the ID they generate if the user has asked to reset/opt out at some point.
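To make the resettability point concrete, here is a minimal sketch using Node's built-in `crypto` module. The normalization step (trim and lowercase) and the per-user salt in the hypothetical `providerId` function are assumptions for illustration, not how any particular identity provider derives its IDs.

```javascript
const crypto = require('crypto');

// SHA-256 of a normalized email: the same input always yields the same hash,
// so the user has no way to "reset" it the way they can reset a cookie or MAID.
function sha256Email(email) {
  const normalized = email.trim().toLowerCase(); // assumed normalization
  return crypto.createHash('sha256').update(normalized).digest('hex');
}

// Hypothetical provider ID derived from the email with a per-user salt.
// Rotating the salt (e.g. after a reset or opt-out request) produces a new ID
// that downstream platforms cannot match to the old one.
function providerId(email, salt) {
  return crypto.createHmac('sha256', salt).update(email.trim().toLowerCase()).digest('hex');
}

console.log(sha256Email('User@Example.com') === sha256Email(' user@example.com ')); // true
console.log(providerId('user@example.com', 'salt-v1') === providerId('user@example.com', 'salt-v2')); // false
```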

jdwieland8282 · 2020-09-30T18:44:03Z

So far we have:

DMP - added by a third-party ID provider like ID5, LiveRamp, Lotame, etc.
PPUID - added by the publisher; the publisher can be identified in eids.source

dmdabbs · 2020-10-01T00:54:56Z

FWIW, in late July there was a discussion on https://openrtb-iabtechlab.slack.com/archives/C3Y6GHUTH/p1595433523254200 (TechLab Programmatic, general Slack). The use case was similar to this one, signaling a stable, publisher-generated UID:

As a DSP, I want a site-specific/publisher-provided ID, so to enable basic per-site frequency-capping at least, in the absence of cross-site identifier, though there are probably other uses. I want this to be an ID generated by the site, common to all traffic for that site, i.e. perhaps generated by the PubCommon prebid.js module or similar, but how it gets made is outside of OpenRTB's scope I think.
...
FWIW, we accept eids with a "source" attribute of "pubcid.org" for that scenario.
...
re: eids, I think probably this should happen..
add an "agent type" for site-specific IDs. Somehow deal with that there won't be a "source", necessarily, if they self-generate them. "source" is defined currently as "Source or technology provider responsible for the set of included IDs. Expressed as a top-level domain."

The following was sketched but I don't recall seeing this discussion thread picked up again.

// Agent Type
// 0   A stable, publisher/site-provided identifier.
// ... etc from OpenRTB spec
//
"eids":[
{
   "source": "localhost",
   "uids": [
      { "id": "c4a4c843-2368-4b5e-b3b1-6ee4702b9ad6", "atype": 0 }
   ]
},
...

I pinged the channel to see if there was more discussion on eids enhancements.
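The per-site frequency-capping use case quoted above only needs a stable site-specific ID, not a cross-site one. A minimal sketch of a hypothetical DSP-side capper, assuming impressions are counted per eid source, site-specific ID and campaign in an in-memory map (a real system would use a shared store with expiry):

```javascript
// Hypothetical DSP-side frequency capper keyed on a site-specific ID.
class SiteFrequencyCap {
  constructor(maxImpressions) {
    this.maxImpressions = maxImpressions;
    this.counts = new Map(); // "source|id|campaign" -> impressions served
  }

  key(source, id, campaignId) {
    return `${source}|${id}|${campaignId}`;
  }

  // Returns true and records the impression if the cap is not yet reached.
  tryServe(source, id, campaignId) {
    const k = this.key(source, id, campaignId);
    const served = this.counts.get(k) || 0;
    if (served >= this.maxImpressions) return false;
    this.counts.set(k, served + 1);
    return true;
  }
}

const cap = new SiteFrequencyCap(2);
const uid = 'c4a4c843-2368-4b5e-b3b1-6ee4702b9ad6';
console.log(cap.tryServe('pubcid.org', uid, 'cmp-1')); // true
console.log(cap.tryServe('pubcid.org', uid, 'cmp-1')); // true
console.log(cap.tryServe('pubcid.org', uid, 'cmp-1')); // false: cap of 2 reached
```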

smenzer · 2020-10-01T06:02:43Z

to me, DMP is too generic, and also identifiable simply by looking in the source field. I'm not sure exactly what all the right values are, but I think it's important to get some ideas from the consumers of the IDs (i.e. DSPs) to make sure it's useful.

On the ID5 side for example, we provide a field we call linkType that we use to signal how we linked two 1p IDs together - through no link (i.e. it's a publisher-only ID), through our probabilistic algo, or via deterministic signals. This would let consumers of the ID know the strength of the cross-domain reconciliation and allow them to make decisions on it. Perhaps standardizing something along these lines would be useful for the DSPs?
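As a rough illustration of how a buyer might act on a link-strength signal like the one described above, here is a minimal sketch. The `ext.linkType` field name and the labels `none`, `probabilistic` and `deterministic` are hypothetical and are not ID5's actual encoding.

```javascript
// Hypothetical link-strength labels, ordered weakest to strongest.
const LINK_STRENGTH = { none: 0, probabilistic: 1, deterministic: 2 };

// Keeps only uids whose (hypothetical) ext.linkType meets a minimum strength,
// e.g. a buyer that only trusts deterministic cross-domain links.
function filterByLinkStrength(uids, minLinkType) {
  const min = LINK_STRENGTH[minLinkType];
  if (min === undefined) throw new Error(`unknown linkType: ${minLinkType}`);
  return uids.filter((uid) => {
    const strength = LINK_STRENGTH[uid.ext && uid.ext.linkType];
    return strength !== undefined && strength >= min;
  });
}

const uids = [
  { id: 'a', ext: { linkType: 'none' } },
  { id: 'b', ext: { linkType: 'probabilistic' } },
  { id: 'c', ext: { linkType: 'deterministic' } },
];
console.log(filterByLinkStrength(uids, 'probabilistic').map((u) => u.id)); // [ 'b', 'c' ]
```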

joshuakoran · 2020-10-01T18:57:42Z

I agree that when describing IDs, it would be useful to distinguish among the various "dimensions" of IDs:

- Describes Person or Device/App
  - Directly-Identifiable (e.g., email)
  - Pseudonymous (e.g., alphanumeric string)
- Set/Link of IDs
  - Device graph
  - First-party sets?
- Source type
  - Publisher or Brand (who from the consumer's point of view is also a publisher)
  - Vendor to publisher or brand (or their agents)
- Actual source
  - Which domain of which organization generated/controls ID
- Age of ID
  - Creation date
  - Last seen date

I would keep all the "dimensions" distinct from uses of ID:

- Preference management (e.g., opt-in/-out of personalization)
- Engagement (or restrictions like frequency cap)
- Measurement (distinct counting)
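One way to keep the "dimensions" of an ID separate from its uses is to hold them in different objects and reject anything that fits neither. A minimal sketch; all field names here are hypothetical labels for the items in the lists above, not part of any spec.

```javascript
// Hypothetical field names for the ID dimensions listed above
// (what the ID is, how it is linked, who controls it, and how old it is).
const DIMENSION_FIELDS = ['identifiability', 'linkage', 'sourceType', 'source', 'created', 'lastSeen'];
// Hypothetical field names for uses of the ID, kept in a separate bucket.
const USE_FIELDS = ['preferences', 'engagement', 'measurement'];

// Splits a flat record into { id, dimensions, uses } and rejects any field
// that is neither, so the two kinds of information never get mixed.
function splitIdRecord(record) {
  const { id, ...rest } = record;
  const dimensions = {};
  const uses = {};
  for (const [key, value] of Object.entries(rest)) {
    if (DIMENSION_FIELDS.includes(key)) dimensions[key] = value;
    else if (USE_FIELDS.includes(key)) uses[key] = value;
    else throw new Error(`field is neither a dimension nor a use: ${key}`);
  }
  return { id, dimensions, uses };
}

const split = splitIdRecord({
  id: 'd88c96-5cb6-410d-827d-b019e476',
  identifiability: 'pseudonymous',
  sourceType: 'publisher',
  source: 'example.com',
  created: 1604429992,
  engagement: { frequencyCap: 3 },
});
console.log(split.dimensions.sourceType); // publisher
console.log(split.uses); // { engagement: { frequencyCap: 3 } }
```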

jdcauley · 2020-10-02T15:25:21Z

Re @dmdabbs's note, I think this is what you might be referring to?

This is the spec we're currently using with OpenRTB, https://github.com/Advertising-ID-Consortium/IdentityLink-in-RTB

jdwieland8282 · 2020-10-14T19:13:59Z

@joshuakoran Just working my way through your list. I think the AdCOM "atype" field handles:

Directly-Identifiable (e.g., email)
Pseudonymous (e.g., alphanumeric string)

- Can you further clarify what you meant by "Set/Link of IDs"? Do you expect this to be an array of other IDs?
- Same question for "source type": what values do you expect here?
- "Actual source" would be the user ID module name value, or "source" if different from name.

Age of ID seems easy enough.
Re dimensions, are you advocating a second array where uses of ID would be enumerated, or just pointing out that we shouldn't add these at all yet?

joshuakoran · 2020-10-14T19:37:51Z

@jdwieland8282

The "set/link of IDs" concept relates to sharing a common ID that maps one ID to other IDs (e.g., x-device link, link across two 1P domains such as required by first-party sets).

The source (perhaps a better name is "controller") of an ID matters because of the permissions/permitted uses tied to that ID.

The dimensions defining the ID (e.g., source/controller, type, time) are orthogonal to information tied to the ID (audience attributes/cohorts, restrictions against use for personalization, event-aggregates such as frequency counter, etc.)

jdwieland8282 · 2020-10-16T17:14:47Z

Hey @joshuakoran, mind adding a few example values for each that you think may be relevant? I want to be sure I'm clear on what you are suggesting.

joshuakoran · 2020-10-28T20:11:52Z

Sorry for the delay, finally coming up for air. Agree that AdCOM is the right model to improve upon.

As we think about reducing discrepancies and adopting cross-publisher common ID schemes, such as being discussed here and in IAB TL, it seems we can improve how we annotate the interoperable IDs being used to improve engagement, measurement and optimization.

The original question as I understood it was to provide enhanced standard descriptors (metadata) around user IDs + information associated with them, rather than the attribute data (e.g., interest taxonomies, demographic taxonomies, geo taxonomies) or event data (activity_type, optional value of activity such as a purchase transaction).

I think ID metadata can be broadly classified into two buckets: better describing the “what-ness” of the ID, and describing its “provenance.”

WHAT concepts
Some IDs describe people/households (such as a home address); others describe web clients (like the alphanumeric strings stored in cookies). Privacy language calls the former “directly-identifiable” (replacing the more ambiguous term “PII”), and when web activity is not associated with directly-identifiable IDs, privacy language calls these IDs “pseudonymous IDs.”

Thus, one example might be to define whether the ID is pseudonymous or not.

The second type of ID is one that merely links other IDs, such as a “cluster ID.” This ID is generated server-side to associate various IDs together, either probabilistically OR deterministically. Marketers often use this for “x-device” or even same device “x-app” use cases. When publishers operating different domains link their IDs deterministically they may wish to create a shared ID for their use, which is analogous to the proposed “first-party” sets.

Thus, a second example might be to define whether the ID is deterministically associated with other IDs or not.

FROM WHERE “provenance” concepts
An orthogonal dimension to the ID we are discussing is its provenance. Which organization created it? Privacy regulations tend to call this the “data controller.”

When was it created? When was the last time it was verified as still active?

Syndicating “stale” IDs to be activated in a walled garden or across the Open Web is technically feasible, but adds no value for marketers. Yet most marketers have no visibility into the age or last-seen date of the data syndicated on their behalf to improve media buying.

Ensuring we know where IDs come from likely requires a compact description and perhaps even signing the data.

USE concepts

I also recommend we keep the above annotations about IDs distinct from what processing operations are associated with them:

Preference management (e.g., opt-in/-out of personalization)
Engagement (or restrictions like frequency cap)
Measurement (distinct counting)
Audit (which ID was sent from which org to which other org, when, and what use restrictions were communicated)

Examples (purely for illustration and not in formal spec format or optimized for transport efficiency):
Zeta_Pseudonymous_ID=123, pseudonymous, created 20200915, last_event=20201025
Zeta_Pseudonymous_ID=234, pseudonymous, created 20201001, last_event=20201026
Zeta_Email_ID=pomacedon@gmail.com, directly-identifiable, created 20201001, last_event=20201027
Zeta_Household_ID=abc, pseudonymous, probabilistic_set {ZPID=123, ZPID=234}, created 20201027
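The created and last-event dates in these illustrative records are what a buyer would need to spot stale IDs. A minimal sketch, assuming the YYYYMMDD dates above are UTC calendar days; the structured record shape and the staleness threshold are assumptions, not part of the thread.

```javascript
// Parses a YYYYMMDD date (number or string) into a UTC timestamp in ms.
function parseYmd(ymd) {
  const s = String(ymd);
  return Date.UTC(Number(s.slice(0, 4)), Number(s.slice(4, 6)) - 1, Number(s.slice(6, 8)));
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Whole days between the record's last event and "now".
function daysSinceLastEvent(record, nowYmd) {
  return Math.round((parseYmd(nowYmd) - parseYmd(record.lastEvent)) / DAY_MS);
}

// An ID is treated as stale once it has not been seen for more than maxAgeDays.
function isStale(record, nowYmd, maxAgeDays) {
  return daysSinceLastEvent(record, nowYmd) > maxAgeDays;
}

// The two pseudonymous records above, restructured (hypothetical shape).
const records = [
  { name: 'Zeta_Pseudonymous_ID', id: '123', created: 20200915, lastEvent: 20201025 },
  { name: 'Zeta_Pseudonymous_ID', id: '234', created: 20201001, lastEvent: 20201026 },
];
for (const r of records) {
  console.log(r.id, daysSinceLastEvent(r, 20201028), isStale(r, 20201028, 2));
}
// 123 3 true
// 234 2 false
```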

jdwieland8282 · 2020-11-03T18:53:51Z

Thanks @joshuakoran. What you're describing is going to be tough to express in JSON in a way that makes sense to everyone; let me take a first stab and we can iterate. Wrt provenance, I feel like the source and stype values do a good job describing that, so I'm going to leave them out for now.

jdwieland8282 · 2020-11-03T19:06:26Z

how about something like this? Anything else to add?

{
   "ext":{
      "eids":[
         {
            "source":"sharedid.org",
            "uids":[
               {
                  "id":"d88c96-5cb6-410d-827d-b019e476",
                  "atype":1,
                  "ext":[
                     {
                        "stype":"ppuid", //ppuid, dmp, sha256email
                        "origin":"person", //person, household, browser, device, gaming console
                        "pseudonymous":true, //boolean
                        "deterministic":false, //boolean
                        "created":"1604429992", //UNIX timestamp
                        "lastseen":"1604430025", //UNIX timestamp
                        "signature":[
                           {
                              "signedby":"cryptoboi",
                              "signature":"cryptostring"
                           }
                        ]
                     }
                  ]
               }
            ]
         }
      ]
   }
}
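Since these fields only help a DSP if they are well defined, a consumer would likely validate them. A minimal sketch of a hypothetical validator for one entry of the proposed `ext`, taking the allowed `stype` and `origin` values from the comments in the proposal. Booleans must be real JSON booleans (an uppercase `TRUE` is not valid JSON), and timestamps are accepted as numbers or numeric strings because the proposal quotes them.

```javascript
// Allowed values copied from the comments in the proposal above.
const STYPES = ['ppuid', 'dmp', 'sha256email'];
const ORIGINS = ['person', 'household', 'browser', 'device', 'gaming console'];

// Returns a list of problems found in one ext entry of the proposed shape.
function validateUidExt(ext) {
  const errors = [];
  if (!STYPES.includes(ext.stype)) errors.push(`unknown stype: ${ext.stype}`);
  if (!ORIGINS.includes(ext.origin)) errors.push(`unknown origin: ${ext.origin}`);
  for (const flag of ['pseudonymous', 'deterministic']) {
    if (typeof ext[flag] !== 'boolean') errors.push(`${flag} must be a boolean`);
  }
  const created = Number(ext.created);
  const lastseen = Number(ext.lastseen);
  if (!Number.isInteger(created)) errors.push('created must be a UNIX timestamp');
  if (!Number.isInteger(lastseen)) errors.push('lastseen must be a UNIX timestamp');
  if (Number.isInteger(created) && Number.isInteger(lastseen) && lastseen < created) {
    errors.push('lastseen is earlier than created');
  }
  for (const sig of ext.signature || []) {
    if (!sig.signedby || !sig.signature) errors.push('signature entries need signedby and signature');
  }
  return errors;
}

const proposedExt = {
  stype: 'ppuid', origin: 'person', pseudonymous: true, deterministic: false,
  created: '1604429992', lastseen: '1604430025',
  signature: [{ signedby: 'cryptoboi', signature: 'cryptostring' }],
};
console.log(validateUidExt(proposedExt)); // []
console.log(validateUidExt({ ...proposedExt, pseudonymous: 'TRUE' })); // [ 'pseudonymous must be a boolean' ]
```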

abhinavsinha001 · 2020-11-04T19:17:44Z

I am assuming all these params and values have to be well defined for any consumer to make sense of them. Wouldn't it be better to map combinations of origin, pseudonymous and deterministic to custom atype values and publish that mapping? That would reduce the payload and make it easy to extend without adding extra parameters.

jdwieland8282 · 2020-11-04T22:24:53Z

Hi @abhinavsinha001, I think you've raised a very good point. To be clear, I don't have a strong opinion yet about what this should look like; I'm channeling the Identity PMC. But on your point about well-defined values, you are exactly right: we need a way to ensure that creators don't declare their ID deterministic when it isn't. Wrt pseudonymous, all IDs except email addresses are pseudonymous, and even email can be pseudonymous. So in my mind pseudonymous should go entirely.

The consumer in this scenario is a DSP.

As for mapping pseudonymous and deterministic to a custom atype: atype isn't well understood or used. In theory that sounds like a good idea to me, but in practice I'm not sure it would work. Thanks for your comments; what would be really helpful is a modified example. I don't want to be the only one doing the data modeling.

joshuakoran · 2020-11-04T23:03:02Z

Hi Jeff -

"Wrt pseudoanonymous, all ids except email address is pseudoanonymous"

I think that while many IDs we rely on may begin as "pseudonymous," the regulations require organizations to have appropriate technical and/or operational measures in place to keep people's activity distinct from their offline identity (a directly-identifiable ID, formerly known as PII) before an ID can be classed as "pseudonymous."

jdwieland8282 · 2020-11-04T23:15:52Z

sure, no disagreement from me on that pt.

smenzer · 2020-11-05T08:41:55Z

since the primary consumers here are DSPs, can we get some of them to weigh in on what they'd want to see and whether they want the granularity of separate fields or a single field like atype?

abhinavsinha001 · 2020-11-05T10:02:28Z

I agree we should get feedback from DSPs on this. I feel most of the parameters do not have any significance individually and can be represented broadly using atype values.

Sample request leveraging atype value

{
  "eids": [
    {
      "source": "sharedid.org",
      "uids": [
        {
          "id": "d88c96-5cb6-410d-827d-b019e476",
          "atype": 501,
          "ext": [
            {
              "created": "1604429992",
              "lastseen": "1604430025",
              "signature": [
                {
                  "signedby": "cryptoboi",
                  "signature": "cryptostring"
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}

Here is how we can maintain metadata for atype and parameters that define a particular atype value.

Atype Metadata

atype	Description
1	An ID which is tied to a specific web browser or device (cookie-based, probabilistic, or other).
2	In-app impressions, which will typically contain a type of device ID (or rather, the privacy-compliant versions of device IDs).
3	A person-based ID, i.e., that is the same across devices.
500x	All the IDs generated by publishers (`stype:ppuid`)
501	`stype:ppuid`, `origin:browser`, `deterministic:true`, `method:login`, `scope:individual`, `duration:short`
501	`stype:ppuid`, `origin:browser`, `deterministic:true`, `method:localstore`, `scope:individual`, `duration:medium`
600x	All the IDs acquired from some DMP (`stype:dmp`)
601	`stype:dmp`, `origin:browser`, `deterministic:true`, `method:transaction`, `scope:individual`, `duration:long`
601	`stype:dmp`, `origin:browser`, `deterministic:true`, `method:transaction`, `scope:household`, `duration:long`
700x	All identifiers generated using some link like IP/Device (`stype:probabilistic`)
701	`stype:probabilistic`, `origin:gaming-console`, `deterministic:false`, `method:algo`, `scope:household`, `duration:short`

ID metadata Params

Parameter	Description
stype	Type of source which generated this ID
origin	Where this ID was generated / stored
deterministic	Whether the ID can be confidently tied to a browser/person
method	How the ID was acquired: login, a transaction such as a purchase, an algorithm, or a traditional sync
scope	Whether this ID represents an individual or a household
duration	How long this ID typically lasts: short < 7 days, medium < 30 days, long > 30 days
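A consumer reading these atype values would need a lookup like the one sketched below. AdCOM defines agent types 1 to 3 and leaves 500 and above for vendor-specific codes; reading the "500x/600x/700x" rows as one stype per hundred-block (500 to 599 for ppuid, and so on) is an assumption about this proposal, not a published standard.

```javascript
// Standard agent types, per the first three rows of the table above.
const STANDARD_ATYPES = {
  1: 'browser or device ID (cookie-based, probabilistic, or other)',
  2: 'in-app device ID',
  3: 'person-based ID (same across devices)',
};

// Assumed reading of the 500x/600x/700x rows: one stype per hundred-block.
const STYPE_BY_BLOCK = { 5: 'ppuid', 6: 'dmp', 7: 'probabilistic' };

function describeAtype(atype) {
  if (STANDARD_ATYPES[atype]) {
    return { atype, standard: true, description: STANDARD_ATYPES[atype] };
  }
  // Only values >= 500 are candidates for the proposed custom blocks.
  const stype = atype >= 500 ? STYPE_BY_BLOCK[Math.floor(atype / 100)] : undefined;
  return { atype, standard: false, stype: stype || 'unknown' };
}

console.log(describeAtype(3).description); // person-based ID (same across devices)
console.log(describeAtype(501).stype); // ppuid
console.log(describeAtype(701).stype); // probabilistic
console.log(describeAtype(42).stype); // unknown
```

Note that the table currently reuses 501 and 601 for different parameter combinations, so an atype alone cannot yet recover `method`, `scope` or `duration`; each combination would need its own code for a full reverse lookup.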

abhinavsinha001 · 2020-11-05T18:18:17Z

Update: I just realized during the IAB Tech Lab meeting that most of these fields and data are part of the Data Transparency Standard 1.0, and there is an active discussion about mapping these fields to the oRTB User object. We can use the same standards for the eids type as well.

jdwieland8282 · 2020-11-09T14:41:07Z

ok, so it sounds like we have something that describes the type of user ID in the atype field, and it's just a matter of defining how we want to support these fields alongside the atype designation:

created
last seen
signed

I'd like to pause here, now that we have some firmer requirements and wait for DSPs to weigh in. Any disagreement with that approach?
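The thread leaves the signature format open (the samples use placeholder `signedby`/`signature` values). A minimal sketch of how a signer and a verifier could use those two fields, assuming Ed25519 via Node's built-in `crypto` module, a hypothetical canonical payload of source, id, atype, created and lastseen serialized as a JSON array, and `ext` treated as a single object rather than the one-element array in the sample.

```javascript
const crypto = require('crypto');

// Hypothetical canonical payload: a fixed field order so that signer and
// verifier sign/verify exactly the same bytes.
function canonicalPayload(source, uid) {
  return Buffer.from(JSON.stringify([source, uid.id, uid.atype, uid.ext.created, uid.ext.lastseen]));
}

// Produces an entry in the proposed { signedby, signature } shape.
function signUid(source, uid, signedby, privateKey) {
  const sig = crypto.sign(null, canonicalPayload(source, uid), privateKey); // Ed25519 takes a null algorithm
  return { signedby, signature: sig.toString('base64') };
}

function verifyUid(source, uid, entry, publicKey) {
  return crypto.verify(null, canonicalPayload(source, uid), publicKey, Buffer.from(entry.signature, 'base64'));
}

const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519');
const uid = { id: 'd88c96-5cb6-410d-827d-b019e476', atype: 501, ext: { created: '1604429992', lastseen: '1604430025' } };
uid.ext.signature = [signUid('sharedid.org', uid, 'example-signer', privateKey)];
console.log(verifyUid('sharedid.org', uid, uid.ext.signature[0], publicKey)); // true
```

Because the signature covers `created` and `lastseen`, a party that rewrites those dates to make an ID look fresher would invalidate it; how signer public keys are distributed to DSPs is outside this sketch.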

jdwieland8282 · 2020-11-11T18:40:55Z

@abhinavsinha001 I like your example. For anyone who missed the 11/11 Identity PMC meeting, we agreed to move forward with this feature. The group felt we should proactively provide some real time metadata about the id to buyers in preparation for a future state with diminished 3rd party cookie availability.

Each UserId submodule will need to decide whether to support these fields. The PMC will define the standard. Are there any objections to @abhinavsinha001's data model? I'll cross-post on our Slack channel as well.

{
  "eids": [
    {
      "source": "sharedid.org",
      "uids": [
        {
          "id": "d88c96-5cb6-410d-827d-b019e476",
          "atype": 501,
          "ext": [
            {
              "created": "1604429992",
              "lastseen": "1604430025",
              "signature": [
                {
                  "signedby": "cryptoboi",
                  "signature": "cryptostring"
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}

joshuakoran · 2020-11-20T13:59:41Z

Just FYI, PRAM is suggesting three types of identifiers:

1. system-generated pseudonymous ID (e.g., cookie or MAID),
2. user-provided ID (e.g., hashed email), and
3. directly-identifiable identity (publisher-agnostic offline identity)

We can augment this with creation/last-seen dates as described above, plus source (e.g., publisher, vendor, marketer), such that vendor=apple provides IDFA, and vendor=sharedid.org provides cookie ID.

smenzer · 2020-11-20T14:13:40Z

@joshuakoran I don't really understand the difference between 1. and 2. ... could you please explain a bit?

joshuakoran · 2020-11-20T15:01:05Z

Even if the output is a pseudonymous ID, the input mechanism has different friction/control for users.

The user has binary control of generating / resetting ID in 1), but limited technical control over how the ID can be shared across domains.

The user has 100% technical control of providing (different/same) ID to be shared across domains for 2). Once the ID in 2) is generated it has the same limits as 1), but the generation using different IDs (work email, home email as one example) is different than using the same laptop with same browser cookies at home and work.

stale · 2020-12-25T13:40:59Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

gglas · 2021-05-10T15:39:14Z

@jdwieland8282 did we land on a solution here?

jdwieland8282 · 2021-05-10T16:04:19Z

This hasn't come up lately, my recollection is that we would use the atype field and leave it at that. If anyone else has a different recollection feel free to reopen and propose a standard.

jdwieland8282 linked a pull request Sep 21, 2020 that will close this issue

New PubProvided Id UserId Submodule #5767

Merged

9 tasks

jdwieland8282 mentioned this issue Sep 23, 2020

Add pubProvided to userId.md prebid/prebid.github.io#2357

Merged

bretg closed this as completed in #5767 Sep 29, 2020

jdwieland8282 reopened this Sep 30, 2020

stale bot added the stale label Dec 25, 2020

gglas removed the stale label Jan 13, 2021

gglas added feature pinned won't be closed by stalebot labels May 10, 2021

jdwieland8282 closed this as completed May 10, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Types of user ids #5775

Types of user ids #5775

jdwieland8282 commented Sep 21, 2020 •

edited

Loading

smenzer commented Sep 24, 2020

jdwieland8282 commented Sep 30, 2020

dmdabbs commented Oct 1, 2020 •

edited

Loading

smenzer commented Oct 1, 2020

joshuakoran commented Oct 1, 2020

jdcauley commented Oct 2, 2020

jdwieland8282 commented Oct 14, 2020

joshuakoran commented Oct 14, 2020

jdwieland8282 commented Oct 16, 2020

joshuakoran commented Oct 28, 2020

jdwieland8282 commented Nov 3, 2020

jdwieland8282 commented Nov 3, 2020 •

edited

Loading

abhinavsinha001 commented Nov 4, 2020

jdwieland8282 commented Nov 4, 2020

joshuakoran commented Nov 4, 2020

jdwieland8282 commented Nov 4, 2020

smenzer commented Nov 5, 2020

abhinavsinha001 commented Nov 5, 2020

abhinavsinha001 commented Nov 5, 2020

jdwieland8282 commented Nov 9, 2020

jdwieland8282 commented Nov 11, 2020 •

edited

Loading

joshuakoran commented Nov 20, 2020

smenzer commented Nov 20, 2020

joshuakoran commented Nov 20, 2020

stale bot commented Dec 25, 2020

gglas commented May 10, 2021

jdwieland8282 commented May 10, 2021

Types of user ids #5775

Types of user ids #5775

Comments

jdwieland8282 commented Sep 21, 2020 • edited Loading

Type of issue

Description

Steps to reproduce

Test page

Expected results

smenzer commented Sep 24, 2020

jdwieland8282 commented Sep 30, 2020

dmdabbs commented Oct 1, 2020 • edited Loading

smenzer commented Oct 1, 2020

joshuakoran commented Oct 1, 2020

jdcauley commented Oct 2, 2020

jdwieland8282 commented Oct 14, 2020

joshuakoran commented Oct 14, 2020

jdwieland8282 commented Oct 16, 2020

joshuakoran commented Oct 28, 2020

jdwieland8282 commented Nov 3, 2020

jdwieland8282 commented Nov 3, 2020 • edited Loading

abhinavsinha001 commented Nov 4, 2020

jdwieland8282 commented Nov 4, 2020

joshuakoran commented Nov 4, 2020

jdwieland8282 commented Nov 4, 2020

smenzer commented Nov 5, 2020

abhinavsinha001 commented Nov 5, 2020

abhinavsinha001 commented Nov 5, 2020

jdwieland8282 commented Nov 9, 2020

jdwieland8282 commented Nov 11, 2020 • edited Loading

joshuakoran commented Nov 20, 2020

smenzer commented Nov 20, 2020

joshuakoran commented Nov 20, 2020

stale bot commented Dec 25, 2020

gglas commented May 10, 2021

jdwieland8282 commented May 10, 2021

jdwieland8282 commented Sep 21, 2020 •

edited

Loading

dmdabbs commented Oct 1, 2020 •

edited

Loading

jdwieland8282 commented Nov 3, 2020 •

edited

Loading

jdwieland8282 commented Nov 11, 2020 •

edited

Loading