Refining the cookie-sync process #2173

Closed
bretg opened this issue Mar 4, 2022 · 16 comments

@bretg
Contributor

bretg commented Mar 4, 2022

The /cookie_sync and /setuid endpoints have grown organically over the years with features added like GDPR, coop-sync, and filters. In addition, with the growth of the number of server-side bid adapters, we're aware that the uids cookie is regularly filling up to the 4KB cookie limit for some host companies.

This issue aims to take a step back and establish a holistic set of requirements for how the overall cookie sync process should work. It may be that both PBS-Go and PBS-Java conform pretty closely to this already, but there are certainly a few changes, particularly in relation to how /setuid deals with the full-cookie scenario.

It might make sense to combine this effort with #1985, but that would likely be a larger effort so it may make sense to consider them separately.

Note, placeholders have been added to cover 'multisync' issue #1986

Config

Several configurations were placed at the "coop-sync" level that really belong up at the parent cookie-sync level. The proposal is to move them around to align better with usage.

Host-Level Config

cookie-sync.default-limit
cookie-sync.max-limit
cookie-sync.default-coop-sync // remove: this is a duplication of cookie-sync.coop-sync.default
host-cookie.max-cookie-size-bytes
cookie-sync.default-timeout-ms
cookie-sync.coop-sync.default
cookie-sync.coop-sync.pri // move to the cookie-sync level
cookie-sync.coop-sync.default-limit // move to the cookie-sync level
cookie-sync.coop-sync.max-limit // move to the cookie-sync level

Account-Level Config

cookie-sync.default-timeout-ms // overrides host level config of same name
cookie-sync.coop-sync.default // overrides host-level cookie-sync.default-coop-sync
cookie-sync.coop-sync.pri // move out of coop-sync -- override host config cookie-sync.pri
cookie-sync.coop-sync.default-limit // move out of coop-sync -- override host config cookie-sync.default-limit
cookie-sync.coop-sync.max-limit // move out of coop-sync -- override host config cookie-sync.max-limit
cookie-sync.multisync-bidders: ["bidderA"] // new config for multisync

/cookie_sync

/cookie_sync requirements

First, it's important to note that the unit of cookie-syncing is the "cookie family", not a bidder code. If the page requests syncing for a bidder code that's a hardcoded alias, what comes back from PBS would be the 'family', not that bidder code. e.g. a request for 'trustx' would cause a sync for 'grid'.
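As a rough illustration of the family mapping, here is a minimal Python sketch. The `COOKIE_FAMILY` table and the `families_for` helper are hypothetical names; the 'trustx' to 'grid' and 'districtm' to 'adnxs' pairs come from this issue, and a real host would derive the table from its bidder configuration.

```python
# Hypothetical alias table: bidder code -> cookie family.
# The pairs mirror examples from this issue; a real table comes from host config.
COOKIE_FAMILY = {
    "grid": "grid",
    "trustx": "grid",
    "appnexus": "adnxs",
    "districtm": "adnxs",
}


def families_for(bidders):
    """Map requested bidder codes to cookie families, de-duplicated in request order."""
    families = []
    unknown = []
    for bidder in bidders:
        family = COOKIE_FAMILY.get(bidder)
        if family is None:
            unknown.append(bidder)      # candidate for an "unknown bidder" debug message
        elif family not in families:
            families.append(family)     # aliases collapse onto a single family
    return families, unknown


families, unknown = families_for(["trustx", "grid", "districtm", "nosuchbidder"])
print(families)  # ['grid', 'adnxs']
print(unknown)   # ['nosuchbidder']
```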

Here are the high level requirements for the whole process

  1. if the incoming request defines parameters, those take precedence
  2. try to fill the response with as many uids as possible
  3. enforce the uid expiration dates
  4. don't sync bidders that GDPR doesn't allow
  5. choose iframe or redirect syncs as directed
  6. don't over-fill the uids cookie
  7. if the cookie is full, consider replacing a less important bidder's uid with one that's more important
  8. log metrics for GDPR blocking scenarios
  9. (multisync) if a bidder has a multi-sync compatible iframe sync and the publisher allows it, then keep calling that bidder's iframe sync until it reports being done. Note that multisync bidders do consume a max-limit entry.
  10. if a bidder is rejected for any reason (e.g. GDPR, unknown bidder), it should be reflected back in the response. Note that a bidder without cookie-sync config should only respond with an error if it was specified on the incoming request.
  11. The cookie_sync limit only applies to actual sync URLs returned to the client. Any number of warnings/errors can be returned.
  12. All 'host-cookie' behavior remains the same as before... it's not directly related to syncing. This is an alternate path for the PBS host company to read an existing ID cookie that's not the uids cookie.
  13. biddercode aliases resolve to the "cookie family". It's really these "families" that are synced, not biddercodes. For GDPR purposes, the biddercode also maps to a GVLID which must be checked when in-scope.
  14. The /cookie_sync POST body should support a 'debug' flag. When true, the response should contain any errors encountered, e.g. 'rejected by TCF', 'no sync config', etc. Debug defaults to false.
  15. /cookie_sync should re-sync the host cookie family if the values in uids and the actual host cookie ever get out of sync.
  16. Accept user.ext.prebid.buyeruids as an alternate source of values

/cookie_sync endpoint

Algorithm for /cookie_sync:

  1. Figure out which cookie-families need to be synced
    a) The 'bidders' parameter links to a 'cookie-family'. Aliased bidders may map to the same cookie-family as the source bidder. So the /cookie-sync endpoint needs to map 'bidders' to 'cookie-families' and de-dupe.
    b) if coop-sync is off, then the list of cookie-families is (A) those present on the request that aren't already in the uids cookie or are present but expired. (B) Any unexpired cookie-families in the cookie for which multisync is active.
    c) otherwise, if coop-sync is on, then add to the list all cookie-families that supply one or more sync URLs, but remove those that are already in the uids cookie and done with multisync
    d) we now have the list of all cookie-families that want to drop a pixel or iframe
    e) if any bidders present on the request are unknown, add the "unknown bidder" error to the response if the debug flag is true
    f) if any bidders present on the request don't have cookie-sync config, add the "no sync config" error to the response if the debug flag is true
    g) check to see if the host-cookie needs to be resynced. This is an old use-case that we discovered sometimes the host-cookie's entry in uids can get out of sync with the host-cookie itself. It's arguable that the host-cookie doesn't need to be in the uids cookie at all, but will push that feature to another issue. Just documenting what the current endpoint does.
    i) Find the host-cookie value by reading the host-cookie.family and host-cookie.cookie-name configurations and looking for such a cookie coming into the /cookie_sync endpoint.
    ii) If not found, done. Else if found, then look for the existing value of host-cookie.family in the uids cookie.
    iii) If the host-cookie-family has the same value in both uids and from the direct host-cookie, then we're done. Else, add the host-cookie-family to the list of families to (re)sync.
  2. Apply filterSettings rules to the cookie-families list (iframe or redirect) if requested
    a) If a cookie-family is removed due to filter settings, log a metric cookie-sync.FAMILY.filtered. Add this to the client response if debug is true.
  3. Apply GDPR filtering rules if:
    a) request specifies gdpr:1
    b) request doesn't specify gdpr and IP lookup shows the user in the EEA
    c) if a cookie-family's GVLID is blocked, log cookie-sync.FAMILY.tcf.blocked metric
    d) add the "Rejected by TCF" error to the response if the debug flag is true
  4. Now we have all the cookie-families that are allowed to drop a pixel/iframe
  5. Figure out how many cookie-families we're allowed to sync
    a) /cookie_sync request limit (capped by cookie-sync.max-limit)
    b) cookie-sync.default-limit
    c) otherwise, if neither config is specified, let the default cookie-sync.default-limit be 2.
  6. One at a time, add cookie-families from the list (step 4) to the response up to the limit (step 5) in this order:
    a) add a cookie-family specified on the request
    b) add any multisync cookie-families already in the uids cookie and not done with multisync
    c) if coop-sync is on
    i) add a cookie-family in cookie-sync.pri that's not already in the cookie
    ii) if no bidders remaining in 'pri', add a random cookie-family from the remainder of the list
    d) check if the cookie size limit has been reached. If it has, skip the cookie-family, although we may decide to emit the sync url anyhow as in the case below. Resyncing multisync cookie-families already in the uids cookie doesn't count towards the cookie size limit since they don't go in more than once.
    i) if the current cookie-family is on the 'pri' list but not in the cookie, go ahead and add it to the response anyhow. /setuid may eject a non-priority cookie-family
    e) check if the limit of new cookie-families has been reached for this /cookie_sync. If so, we're done.
  7. If any bidders were specifically requested but could not be fit due to the limit, log them with the "limit reached" warning.
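Steps 5 through 7 can be sketched roughly as follows. This is a simplified Python illustration, not the PBS-Go or PBS-Java code: `effective_limit` and `pick_families` are hypothetical names, `allowed` stands for the step 4 list, and the cookie-size check from step 6d is left out.

```python
import random


def effective_limit(request_limit=None, default_limit=None, max_limit=None):
    """Step 5: request limit (capped by max-limit), else default-limit, else 2."""
    if request_limit is not None:
        if max_limit is not None:
            return min(request_limit, max_limit)
        return request_limit
    if default_limit is not None:
        return default_limit
    return 2


def pick_families(allowed, requested, multisync_pending, pri, coop_sync, limit, rng=None):
    """Step 6 ordering: requested, pending multisync, then (coop-sync) pri and random rest."""
    rng = rng or random.Random()
    ordered = list(requested) + list(multisync_pending)
    if coop_sync:
        ordered += list(pri)
        rest = [f for f in sorted(allowed) if f not in ordered]
        rng.shuffle(rest)               # "a random cookie-family from the remainder"
        ordered += rest
    chosen = []
    for family in ordered:
        if len(chosen) >= limit:
            break
        if family in allowed and family not in chosen:
            chosen.append(family)
    # Step 7: requested families that were eligible but did not fit.
    limit_reached = [f for f in requested if f in allowed and f not in chosen]
    return chosen, limit_reached


limit = effective_limit(request_limit=10, default_limit=3, max_limit=5)
print(limit)  # 5
print(pick_families({"a", "b"}, ["a", "b"], [], [], False, 1))  # (['a'], ['b'])
```

Whether cookie-sync.default-limit should also be capped by max-limit is not stated above, so the sketch caps only the request value.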

Here are the messages that come out of the /cookie_sync process:

if the bidder/family was listed in the original request:

  • "Unsupported bidder" - a totally unknown bidder code doesn't map to a cookie family
  • "Bidder not enabled" - a known bidder, but not enabled on this PBS host
  • "No sync config" - an enabled bidder, but no sync config for this family
  • "Already in sync" - family already in uids cookie
  • "synced as FAMILY" - lets the user know that this bidder is synced as "FAMILY", which may have one of the other statuses in this list for both directly requested families and coop-synced families
  • "Rejected by TCF"
  • "Rejected by CCPA"
  • "Rejected by request filter"
  • "limit reached" - only seen for specifically-requested bidders once the limit is reached. e.g. if they ask for 5 bidders but set the limit to 1.
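To make the message rules concrete, here is a hedged Python sketch covering the first five messages only. `debug_entries` and its inputs (`known`, `enabled`, `family_of`, `uids`) are hypothetical stand-ins for host configuration and the parsed uids cookie; the TCF, CCPA, filter, and limit messages are omitted.

```python
def debug_entries(bidder, known, enabled, family_of, uids):
    """Return debug entries for one requested bidder code.

    known / enabled: sets of bidder codes; family_of: bidder code -> cookie family;
    uids: set of families already present (and unexpired) in the uids cookie.
    """
    if bidder not in known:
        return [{"bidder": bidder, "error": "Unsupported bidder"}]
    if bidder not in enabled:
        return [{"bidder": bidder, "error": "Bidder not enabled"}]
    family = family_of.get(bidder)
    if family is None:
        return [{"bidder": bidder, "error": "No sync config"}]
    entries = []
    if family != bidder:
        # The alias is reported separately; the family may carry its own status.
        entries.append({"bidder": bidder, "error": f"synced as {family}"})
    if family in uids:
        entries.append({"bidder": family, "error": "Already in sync"})
    return entries


known = {"appnexus", "districtm", "foo", "bar"}
enabled = {"appnexus", "districtm", "bar"}
family_of = {"appnexus": "adnxs", "districtm": "adnxs"}
print(debug_entries("districtm", known, enabled, family_of, {"adnxs"}))
```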

Creating the cookie-families sync URL

A single usersync type of iframe/redirect is not quite enough to correctly cover all the use cases:

  1. client creates iframe, the cookie-family's iframe drops AJAX. The usersync type is 'iframe' and /setuid default format is f=b
  2. client creates iframe, the cookie-family's iframe drops image. The usersync type is 'iframe'. The default f=b response works, but returns blank into an img, which may not be ideal in some scenarios. Use a format override to set f=i instead of the default f=b.
  3. client creates image: The usersync type is 'redirect', which should default to /setuid of f=i

Proposed algorithm:

  1. Determine whether this sync url should be iframe or redirect
    a) See "Support both image and iframe usersyncs" (#1554) for details
  2. Start with the appropriate usersync.url. Resolve {{}} macros
  3. Scan the configured redirect-url (the PBS /setuid)
    a) Resolve {{gdpr, etc}} macros
    b) support config for the cookie-family's UID macro
    c) support config for overriding the 'format' parameter
    d) it should be possible for a bidder to not have a redirect URL sent. Some, like Rubicon, configure that on their side.
  4. If there is a redirect-url, append it to the end of the syncurl used in the cookie_sync response
  5. the 'bidder' in the /setuid call back to Prebid Server is the cookie-family name
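The proposed algorithm might look roughly like this in Python. `default_format` and `build_sync_url` are hypothetical helpers, `pbs.example.com` is a placeholder host, and the parameter set is trimmed to a few of those visible in the sample URLs in this thread.

```python
from urllib.parse import quote


def default_format(sync_type, override=None):
    """iframe syncs default to f=b, redirect syncs to f=i, unless overridden."""
    if override:
        return override
    return "b" if sync_type == "iframe" else "i"


def build_sync_url(usersync_url, setuid_url, family, user_macro, fmt,
                   gdpr="", gdpr_consent=""):
    """Append the URL-encoded /setuid callback to the cookie-family's sync URL."""
    if setuid_url is None:
        # Step 3d: some bidders configure the redirect on their side.
        return usersync_url
    redirect = (
        f"{setuid_url}?bidder={family}&gdpr={gdpr}"
        f"&gdpr_consent={gdpr_consent}&f={fmt}&uid={user_macro}"
    )
    # safe="" also encodes "/" so the callback travels as a single query value.
    return usersync_url + quote(redirect, safe="")


url = build_sync_url(
    "//ib.adnxs.com/getuid?",
    "https://pbs.example.com/setuid",   # placeholder host
    family="adnxs",
    user_macro="$UID",
    fmt=default_format("redirect"),
)
print(url)
```

Note that the 'bidder' parameter carries the cookie-family name ("adnxs"), per step 5.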

The /setuid endpoint

/setuid requirements

High level requirements for setuid

  1. Enforce GDPR
  2. Support defining the response format of image or html
  3. Enforce cookie size
  4. Allow the host company to define a list of "priority" bidders that can get into full cookies.
  5. Log metrics for GDPR and full-cookie scenarios
  6. Check the multisyncdone parameter and turn off multisync as appropriate

/setuid endpoint

Algorithm for /setuid

  1. Parse the existing uids cookie
  2. Remove any expired entries
  3. If there's room left in the uids cookie for the current cookie-family
    a) check GDPR permissions for the cookie-family if the request specifies gdpr=1 or if gdpr not specified and the IP address resolves to the EEA. Note that this assumes the cookie-family name can be looked up for GVLID.
    b) if allowed by GDPR, add the cookie-family, encode it, and recheck the size
    c) if not allowed by GDPR, log metric 'usersync.BIDDER.tcf.blocked' and respond with the cookie-family not added
  4. Check the multisyncdone parameter. If it's true and the cookie-family's state in the uids cookie is multisync:true, change it to false.
  5. If there's no room left in the uids cookie:
    a) if there is a 'pri' list and the current cookie-family is not on it, ignore the bidder, don't add the cookie-family to the response set-cookie. Log a metric for 'usersync.FAMILY.sizeblocked'
    b) if there is a 'pri' list and the current cookie-family is on it, first check GDPR allows this cookie-family as above. If not, log metric 'usersync.FAMILY.tcf.blocked'
    i) remove the oldest cookie-family uid that's not on the pri list. Log a metric for the kicked-out bidder 'usersync.FAMILY.sizedout'
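A simplified Python sketch of the full-cookie handling follows. Several things here are assumptions rather than part of the proposal: the names `encoded_size` and `set_uid`, measuring size as base64-encoded JSON, using the earliest expiry as a proxy for "oldest", evicting more than one entry if a single eviction is not enough, and logging 'sizeblocked' when nothing can be evicted. The multisync step is omitted.

```python
import base64
import json


def encoded_size(uids):
    """Approximate cookie size: base64 of the JSON-encoded uids map (an assumption)."""
    return len(base64.urlsafe_b64encode(json.dumps(uids, sort_keys=True).encode()))


def set_uid(uids, family, uid, now, ttl, max_bytes, pri, gdpr_allowed):
    """Return (new uids map, metrics). uids: family -> {"uid": str, "expires": number}."""
    metrics = []
    live = {f: e for f, e in uids.items() if e["expires"] > now}   # drop expired entries
    candidate = dict(live)
    candidate[family] = {"uid": uid, "expires": now + ttl}
    fits = encoded_size(candidate) <= max_bytes

    if not fits and family not in pri:
        metrics.append(f"usersync.{family}.sizeblocked")            # step 5a
        return live, metrics
    if not gdpr_allowed:
        metrics.append(f"usersync.{family}.tcf.blocked")            # steps 3c / 5b
        return live, metrics
    if fits:
        return candidate, metrics                                   # step 3b

    # Step 5b.i: evict non-priority families, earliest expiry first, until it fits.
    victims = sorted((f for f in live if f not in pri and f != family),
                     key=lambda f: live[f]["expires"])
    evicted = []
    for victim in victims:
        del candidate[victim]
        evicted.append(victim)
        if encoded_size(candidate) <= max_bytes:
            metrics += [f"usersync.{v}.sizedout" for v in evicted]
            return candidate, metrics
    metrics.append(f"usersync.{family}.sizeblocked")                # nothing left to evict
    return live, metrics
```

The last branch covers a cookie holding only priority families, which the proposal does not spell out.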

Utilizing the uids cookie and the host cookie in the /auction endpoint

There's no change in this area, but for completeness, the final utility of the uids cookie is to pass user.buyeruid to bid adapters.

When creating each bidder-specific ORTB request:

  1. assume there's a function BIDDERMAP that maps "request biddercode" (imp.ext.prebid.bidder.BIDDER) to "base biddercode". This handles case sensitivity, hardcoded (YAML) aliases, and softcoded (request) aliases.
  2. REQBIDDER=imp.ext.prebid.bidder.BIDDER
  3. BASEBIDDER=BIDDERMAP(REQBIDDER)
  4. Try to set user.buyeruid from user.ext.prebid.buyeruids.REQBIDDER (case insensitive). If it was found, we're done.
  5. Else, try to set user.buyeruid from user.ext.prebid.buyeruids.BASEBIDDER (case insensitive) If it was found, we're done.
  6. Else, find the cookie family in the YAML for REQBIDDER. If none, find cookie family in the YAML for BASEBIDDER. If none, skip the rest. It's a bidder with no buyeruid.
  7. Else, try to set user.buyeruid from the uids cookie for COOKIEFAMILY
  8. Else if the family name matches the "host-cookie.family" configuration:
    a. Look for an incoming cookie by the name defined in "host-cookie.cookie-name". e.g. for Rubicon, this is "khaos", for Appnexus, this is "uid".
    b. If the cookie defined by host-cookie.cookie-name exists, copy its value to user.buyeruid
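The lookup order above can be sketched in Python as below. `resolve_buyeruid` and its arguments are hypothetical; `family_of` stands in for the YAML cookie-family lookup, and `host_cookie_value` for the incoming cookie named by host-cookie.cookie-name.

```python
def resolve_buyeruid(req_bidder, base_bidder, buyeruids, family_of,
                     uids_cookie, host_cookie_family=None, host_cookie_value=None):
    """Return the value for user.buyeruid, or None if the bidder has no buyeruid."""
    # Steps 4-5: user.ext.prebid.buyeruids, request code first, case-insensitive.
    lowered = {name.lower(): value for name, value in buyeruids.items()}
    for name in (req_bidder, base_bidder):
        if name.lower() in lowered:
            return lowered[name.lower()]
    # Step 6: cookie family for the request code, else for the base code.
    family = family_of.get(req_bidder) or family_of.get(base_bidder)
    if family is None:
        return None
    # Step 7: the uids cookie.
    if family in uids_cookie:
        return uids_cookie[family]
    # Step 8: fall back to the host cookie when the family matches.
    if family == host_cookie_family and host_cookie_value:
        return host_cookie_value
    return None


family_of = {"appnexus": "adnxs", "districtm": "adnxs"}
print(resolve_buyeruid("districtm", "appnexus", {"AppNexus": "u1"}, family_of, {}))  # u1
```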
@bretg
Contributor Author

bretg commented Apr 4, 2022

Discussed in committee. Still looking for community feedback on the details, but folks are generally ok with the idea that refining the 'full cookie' scenario is higher priority than shrinking the cookie #1985

@bretg
Contributor Author

bretg commented May 28, 2022

Updated to integrate support for the DSP sync feature #1986

@bretg
Contributor Author

bretg commented Jul 21, 2022

Added the "Creating the bidder's sync URL" section

@SyntaxNode
Contributor

SetUID Edge Case Exploration
The proposal refers to pri bidders as a singular thing, but there are different tiers. What should /setuid do if a cookie contains just pri bidders?

  • If the bidder we are trying to sync is a non-pri bidder, reject the uid and log metric 'usersync.BIDDER.sizeblocked'.
  • If the bidder we are trying to sync is a lower tier pri bidder, reject the uid and log metric 'usersync.BIDDER.sizeblocked'. ???
  • If the bidder we are trying to sync is a bucket 0 pri bidder ???

Sync URL Generation
I redesigned the PBS-Go user sync url generation last year to better match with the approach used by PBS-Java. There are a few differences in the algorithm. This is what PBS-Go does today:

  1. Choose the iframe or redirect syncer settings.
  2. Choose the external url to use for server callbacks in the following order:
    a. adapter config: usersync.<iframe/redirect>.external_url
    b. adapter config: usersync.external_url
    c. config: user_sync.external_url
    d. config: external_url
  3. Compose the redirect url (if used)
    a. Resolve macros in the redirect url in the following order (not that order should be important): syncer key (formerly cookie family), sync type (iframe/redirect f parameter), user macro, external host for callback.
  4. Base64 encode (url-safe) the redirect url.
  5. Resolve redirect macro in the user sync url with the value from the previous step. The adapter can simply omit the macro if they don't have a redirect.

There is a hardcoded default redirect url which can be overwritten per adapter by the host, but I expect most to just leave it as is. This default matches the expectations of the setuid endpoint which I view as a PBS configuration and not a per-adapter setting. Instead, the adapters can configure the information which is used to compose the redirect url. The default for PBS-Go is:
{{.ExternalURL}}/setuid?bidder={{.SyncerKey}}&gdpr={{.GDPR}}&gdpr_consent={{.GDPRConsent}}&f={{.SyncType}}&uid={{.UserMacro}}
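For illustration only, the external URL precedence and the macro resolution of the default template can be mimicked in Python. This is not the PBS-Go code (which uses Go templates); `first_non_empty` and `render` are hypothetical helpers, and the example hosts are placeholders.

```python
import re

DEFAULT_REDIRECT = (
    "{{.ExternalURL}}/setuid?bidder={{.SyncerKey}}&gdpr={{.GDPR}}"
    "&gdpr_consent={{.GDPRConsent}}&f={{.SyncType}}&uid={{.UserMacro}}"
)


def first_non_empty(*candidates):
    """Pick the first configured value, mirroring the a-d precedence above."""
    return next((c for c in candidates if c), "")


def render(template, values):
    """Replace each {{.Name}} macro with values['Name']."""
    return re.sub(r"\{\{\.(\w+)\}\}", lambda m: values[m.group(1)], template)


external = first_non_empty(
    "",                                # adapter usersync.<iframe/redirect>.external_url
    "",                                # adapter usersync.external_url
    "https://pbs.example.com",         # user_sync.external_url (placeholder)
    "https://fallback.example.com",    # external_url (placeholder)
)
print(render(DEFAULT_REDIRECT, {
    "ExternalURL": external, "SyncerKey": "adnxs", "GDPR": "",
    "GDPRConsent": "", "SyncType": "i", "UserMacro": "$UID",
}))
# https://pbs.example.com/setuid?bidder=adnxs&gdpr=&gdpr_consent=&f=i&uid=$UID
```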

The syncer key defaults to the bidder name, but can be overwritten. Here's an example of the AppNexus adapter config which uses a custom key:

userSync:
  key: "adnxs"
  redirect:
    url: "https://ib.adnxs.com/getuid?{{.RedirectURL}}"
    userMacro: "$UID"

Adapters may share the same syncer key, which is useful for aliases. If the same syncer key is used, the settings may only be defined once or else it's treated as a fatal configuration error at startup.
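The shared-key rule can be illustrated with a small Python sketch. `build_syncers` and the dict layout are hypothetical and only loosely modeled on the YAML above; PBS-Go performs this validation in its own startup code.

```python
def build_syncers(adapter_configs):
    """Collect syncer settings by key; a key may be shared but defined only once.

    adapter_configs: bidder name -> userSync dict, e.g. {"key": "adnxs", "redirect": {...}}.
    """
    syncers = {}
    defined_by = {}
    for bidder in sorted(adapter_configs):
        cfg = adapter_configs[bidder]
        key = cfg.get("key", bidder)          # the key defaults to the bidder name
        settings = {k: v for k, v in cfg.items() if k != "key"}
        if not settings:
            continue                           # shares a key without redefining it
        if key in defined_by:
            # Treated as a fatal configuration error at startup.
            raise ValueError(
                f"syncer key {key!r} defined by both {defined_by[key]!r} and {bidder!r}"
            )
        defined_by[key] = bidder
        syncers[key] = settings
    return syncers


configs = {
    "appnexus": {
        "key": "adnxs",
        "redirect": {"url": "https://ib.adnxs.com/getuid?{{.RedirectURL}}", "userMacro": "$UID"},
    },
    "districtm": {"key": "adnxs"},             # alias reusing the same syncer key
}
print(sorted(build_syncers(configs)))  # ['adnxs']
```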

PBS-Go Change Summary
Based on the proposal so far, here is a list of changes required for PBS-Go:

  • Move the priority groups from the coop sync level to the cookie sync level.
  • Use the priority groups when deciding to eject a sync from the cookie due to size constraints per the algorithm in this issue.
  • Add support for multi-sync bidders (iframes).
  • Allow a bidder to override its setuid callback type from either iframe or image.

@bretg
Contributor Author

bretg commented Sep 8, 2022

Thanks for the review @SyntaxNode .

First, to clarify - this proposal defines only two tiers: either a bidder is 'priority' for syncing or it's not.

// these are the VIPs. They're all equally important. Everyone else is equally somewhat less important.
cookie-sync.coop-sync.pri: bidderA, bidderB, bidderC  

SetUID Edge Case Exploration

I'm going to make two assumptions:

  1. you're talking about step 5 of the algorithm for /setuid
  2. By 'sync', I'm going to assume you mean "put in the uids cookie".

Maybe it will help to list out some times a /setuid call will be rejected:

  1. unknown bidder
  2. bidder doesn't have TCF Purpose 1
  3. uids cookie is full and bidder is not priority

PBS-Go Change Summary

Sounds right to me. Thanks.

@bretg
Contributor Author

bretg commented Sep 8, 2022

Oh, except perhaps this -- I added cookie_sync req 10

if a bidder is rejected for any reason (e.g. GDPR, unknown bidder), it should be reflected back in the response. Note that a bidder without cookie-sync config should only respond with an error if it was specified on the incoming request.

The idea is that if someone is trying to sync bidderX but that bidder can't be synced (not enabled or didn't supply sync info), the caller needs to know that, or they may open a support ticket.

@bretg
Contributor Author

bretg commented Oct 20, 2022

Updated to add requirements 13 and 14. Renamed "bidders" in most places to "cookie-family" to cover the case of aliases.

@bretg
Contributor Author

bretg commented Oct 25, 2022

Updated to list out the warnings/errors

@bretg
Contributor Author

bretg commented Oct 26, 2022

We've noted that there's an important change happening with this revision that I consider a bug fix, but the community should weigh in...

Districtm is a server-side hard-coded alias of appnexus. Right now the behavior is

If you POST to /cookie_sync

{"bidders":["districtm"],"limit":10}

The current response is:

        {
            "bidder": "districtm",
            "no_cookie": true,
            "usersync": {
                "url": "//ib.adnxs.com/getuid?https%3A%2F%2Fpg-prebid-server-qa.rubiconproject.com%2Fsetuid%3Fbidder%3Dadnxs%26gdpr%3D%26gdpr_consent%3D%26us_privacy%3D%26account%3D%26f%3Di%26uid%3D%24UID",
                "type": "redirect",
                "supportCORS": false
            }
        },

With this revision, the "bidder" field will become "adnxs" instead, because that's the 'cookie family' name.

After this revision,
If you POST to /cookie_sync with debug on

{"bidders":["districtm"],"limit":10, "debug": true}

You would see:

        {
            "bidder": "adnxs",
            "no_cookie": true,
            "usersync": {
                "url": "//ib.adnxs.com/getuid?https%3A%2F%2Fpg-prebid-server-qa.rubiconproject.com%2Fsetuid%3Fbidder%3Dadnxs%26gdpr%3D%26gdpr_consent%3D%26us_privacy%3D%26account%3D%26f%3Di%26uid%3D%24UID",
                "type": "redirect",
                "supportCORS": false
            }
        },
        {
            "bidder": "districtm",
            "error": "synced as adnxs"         // to avoid potential confusion
        }

@bretg
Contributor Author

bretg commented Nov 7, 2022

The team pointed out there's host-cookie functionality in the existing endpoint that wasn't represented here. Added cookie_sync requirement 15 and algorithm step 1.g

@bretg
Contributor Author

bretg commented Nov 23, 2022

Added cookie_sync step 7 and the "limit reached" edge case.

@bretg
Contributor Author

bretg commented Dec 9, 2022

Added reference to user.ext.prebid.buyeruids.BIDDER behavior

@bretg
Contributor Author

bretg commented Dec 19, 2022

released with PBS-Java 1.105

@bretg
Contributor Author

bretg commented Oct 18, 2023

FYI - updated the user.buyeruid section to deal with aliases.

@AlexBVolcy
Contributor

Closing this issue after necessary updates for PBS-Go related to refining the cookie_sync process have been implemented and merged.

@github-project-automation github-project-automation bot moved this from Ready for Dev to Done in Prebid Server Prioritization Mar 1, 2024
@AlexBVolcy
Contributor

AlexBVolcy commented Mar 1, 2024

Re-Opened to show I opened a minor PR: #3558, that updates one of the debug messages to match Java.
