Domain field, subdomains, and ads.txt #1813

Closed
gglas opened this issue Apr 15, 2021 · 9 comments

gglas commented Apr 15, 2021

We are working with a publisher who is leveraging PBS on some subdomains of their primary website.

In the oRTB requests we're sending via PBS, we're passing the value for the 'domain' field as the publisher's top-level domain with the subdomains sanitized and not the individual subdomain of the originating request.

A number of bidders have reported issues with this setup, because their ads.txt lines are on the publisher's subdomain and not on the parent domain. When the domain field is passed through via oRTB, the traffic appears unauthorized because there is no ads.txt entry on the parent, unless the bidders parse the full page path (which would require overriding the domain field in the requests to DSPs, which seems suboptimal).

Looking at the oRTB documentation, it seems that subdomains should be passed through in the domain field.

@hhhjort hhhjort self-assigned this Apr 15, 2021
Collaborator

hhhjort commented Apr 15, 2021

Yes, the documentation certainly seems to allow this. PBS does not care what is in the domain field of the site object; it should pass through whatever happens to be there to the adapters. Have you seen different behavior, or what is your issue exactly?

Contributor

bretg commented Apr 16, 2021

@gglas - what is the desired algorithm? Just pass the raw domain name? www is the problem. It seems useless to me.

www.example.com --> www.example.com
www.subdomain.example.com --> www.subdomain.example.com
subdomain.example.com --> subdomain.example.com

Making this change would break the Prebid-PG targeting logic we depend on -- it's useful from an ad targeting perspective to ignore www.

So instead of simply dropping the helpful domain aggregation service, I would propose moving the aggregated version.

  1. If domain isn't specified, PBS parses referer to get the host name. Raw host name is placed in site.domain, www and all.
  2. PBS processes host name and places it in site.publisher.domain
  3. We update Prebid PG targeting to consider both site.domain and site.publisher.domain
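
Steps 1 and 2 above could be sketched roughly as follows. This is only an illustration, not PBS code: the `Site` and `Publisher` structs, `rawHost`, `aggregateHost`, and `fillDomains` are hypothetical names, and the aggregation rule (dropping a leading `www.`) is an assumption, since the thread does not spell out how the host name is processed. Leaving an already-set `site.publisher.domain` untouched is likewise an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Publisher and Site are hypothetical stand-ins for the oRTB objects,
// reduced to the fields discussed in this thread.
type Publisher struct {
	Domain string
}

type Site struct {
	Domain    string
	Publisher *Publisher
}

// rawHost returns the lowercased host name of the referer, "www" and all.
func rawHost(referer string) (string, error) {
	u, err := url.Parse(referer)
	if err != nil {
		return "", err
	}
	if u.Hostname() == "" {
		return "", fmt.Errorf("no host in referer %q", referer)
	}
	return strings.ToLower(u.Hostname()), nil
}

// aggregateHost is an ASSUMED aggregation rule: it only drops a leading
// "www." label. The real processing is not described in this issue.
func aggregateHost(host string) string {
	return strings.TrimPrefix(host, "www.")
}

// fillDomains applies step 1 (raw host into site.domain when it is not
// specified) and step 2 (processed host into site.publisher.domain).
func fillDomains(site *Site, referer string) error {
	if site.Domain == "" {
		host, err := rawHost(referer)
		if err != nil {
			return err
		}
		site.Domain = host
	}
	if site.Publisher == nil {
		site.Publisher = &Publisher{}
	}
	// Assumption: a publisher domain supplied by the caller is kept as-is.
	if site.Publisher.Domain == "" {
		site.Publisher.Domain = aggregateHost(site.Domain)
	}
	return nil
}

func main() {
	site := &Site{}
	if err := fillDomains(site, "https://www.subdomain.example.com/article"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("site.domain:", site.Domain)
	fmt.Println("site.publisher.domain:", site.Publisher.Domain)
}
```

With this split, a bidder checking ads.txt can read the untouched host from `site.domain`, while targeting logic that wants to ignore `www` can read `site.publisher.domain`.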

Contributor

bretg commented Apr 16, 2021

We discussed in the committee today. We're leaning towards going ahead and making the change to support placing the raw hostname in site.domain and the processed hostname in site.publisher.domain

We plan to reach out to folks who know more about ads.txt and seller transparency to confirm.

Contributor

bretg commented Jun 24, 2021

Released with PBS-Java 1.65

@bretg bretg added the PBS-Go label Jun 24, 2021

stale bot commented Jan 8, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 8, 2022
@GLStephen
Contributor

I can see doing this processing in PBS to clean up the "domain" entry. However, the crawler behavior described is not correct per the ads.txt spec. All ads.txt crawlers should do their own subdomain parsing: request ads.txt from the primary/root domain, and only then consider the subdomain if the root domain declares it via a special line in its ads.txt. This holds regardless of where the request comes from. The default is always the root domain. The subdomain is made relevant --in-- the root domain ads.txt file.

Google docs related: https://support.google.com/adsense/thread/119715799/ads-txt-for-subdomains?hl=en

Ads.txt Spec 4.5: https://iabtechlab.com/wp-content/uploads/2022/04/Ads.txt-1.1.pdf

5.5 from the above:

5.5 SUBDOMAIN DIRECTIVES

When writing crawlers, implementers should request the /ads.txt from the root domains that are driving significant requests for advertising. Publishers should always post the /ads.txt file on their root domain. The crawler should strip the subdomains when creating the crawler’s URL list. The public suffix list [12][16] should be utilized in implementing subdomain stripping.

In cases where specific subdomains have different authorized advertising systems, the publisher should post ads.txt files only on those subdomains and declare each of those subdomains explicitly in the ads.txt on the root domain using the "subdomain=" variable.

Crawlers should only crawl for ads.txt files on subdomains that are listed using the "subdomain=" variable in the ads.txt on the root domain.

When the ads.txt on the root domain declares a subdomain and an ads.txt exists on that subdomain, only advertising systems listed in the subdomain ads.txt are authorized to sell inventory on that subdomain. When the ads.txt on the root domain doesn't declare a subdomain or an ads.txt does not exist on the subdomain, only advertising systems listed in the root domain ads.txt are authorized to sell inventory on that subdomain.
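
To make the quoted rule concrete, here is a minimal Go sketch of the decision a crawler would make. It is not taken from any real crawler: `declaredSubdomains` and `authoritativeHost` are hypothetical names, the root domain is passed in rather than computed (a real crawler would derive it with the public suffix list), fetching is replaced by a `subdomainFileExists` flag, and matching a host against a `subdomain=` value by exact string comparison is an assumption.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// declaredSubdomains collects the hosts named by "subdomain=" lines in the
// body of a root-domain ads.txt. Anything after '#' is treated as a comment.
func declaredSubdomains(rootAdsTxt string) map[string]bool {
	declared := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(rootAdsTxt))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i]
		}
		key, value, ok := strings.Cut(strings.TrimSpace(line), "=")
		if !ok || !strings.EqualFold(strings.TrimSpace(key), "subdomain") {
			continue
		}
		if host := strings.ToLower(strings.TrimSpace(value)); host != "" {
			declared[host] = true
		}
	}
	return declared
}

// authoritativeHost returns the host whose ads.txt governs inventory on host.
// The subdomain file wins only if the root ads.txt declares that subdomain
// AND an ads.txt exists there; in every other case the root domain applies.
func authoritativeHost(host, rootDomain, rootAdsTxt string, subdomainFileExists bool) string {
	host = strings.ToLower(host)
	if host != rootDomain && subdomainFileExists && declaredSubdomains(rootAdsTxt)[host] {
		return host
	}
	return rootDomain
}

func main() {
	// Illustrative root ads.txt; the seller record is made up.
	root := "adexchange.example, 12345, DIRECT\nsubdomain=news.example.com\n"

	fmt.Println(authoritativeHost("news.example.com", "example.com", root, true))
	fmt.Println(authoritativeHost("blog.example.com", "example.com", root, true))
}
```

Under these assumptions, a request arriving with `news.example.com` in the domain field is checked against the subdomain's file, while `blog.example.com` falls back to the root domain's file because the root ads.txt never declares it.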

Contributor

bretg commented Sep 23, 2022

@GLStephen - can you please translate the ads.txt concern to the context of Prebid Server, which doesn't know anything about ads.txt?

My view is that Prebid Server is giving downstream entities the best of both worlds: they get the 'raw' domain in site.domain and the 'rounded' domain in site.publisher.domain. Both are useful.

Contributor

GLStephen commented Sep 23, 2022

@bretg my concern was more related to driving a feature based on the behavior of a system with a broken implementation of ads.txt crawling. I agree that providing the domain parsed in both ways is useful. Beyond that, in PBS context, I'm just reiterating what you said earlier and want to make sure that we're explicitly recommending cleaning both site.domain and site.publisher.domain down to their spec formats.

@bretg bretg moved this from Triage to Clarify Request in Prebid Server Prioritization Oct 7, 2022
@bretg bretg removed the Needs Design label Oct 7, 2022
Contributor

bretg commented Oct 7, 2022

Closing this issue as it will be resolved with #2300, which is under dev.

@bretg bretg closed this as completed Oct 7, 2022
Repository owner moved this from Clarify Request to Done in Prebid Server Prioritization Oct 7, 2022