Intent to Implement: Analytics APIs for AMP #871

Closed
avimehta opened this issue Nov 6, 2015 · 56 comments

Labels
INTENT TO IMPLEMENT Proposes implementation of a significant new feature. https://bit.ly/amp-contribute-code WG: analytics

Comments

@avimehta
Contributor

avimehta commented Nov 6, 2015

Analytics APIs for AMP

Objective

Define a vendor-neutral method for adding analytics to an AMP document.

Overview

A new tag, <amp-analytics>, defines which events to measure and how the measured data is sent to an analytics vendor.

This tag provides built-in support for the most widely used analytics services. For example, the following tag will send measurement requests to XYZ when the document is loaded, and when any element matching the .page_button selector is clicked.

<amp-analytics type="XYZ">
{
  "vars": {
    "account": "123456"
  },
  "triggers": [ {
    "on": "load",
    "request": "predefined-request-type1"
  }, {
    "on": "click",
    "selector": ".page_button",
    "request": "predefined-request-type2",
    "vars": {
      "foo": "Button Click",
      "bar": "Navigation"
    }
  }]
}
</amp-analytics>

Alternatively, this tag can be used to send the analytics data to a custom endpoint, using a custom request format and transport.

In both cases, a simple templating mechanism replaces {{identifier}} tokens with properties extracted from, in descending order of precedence, the DOM event target element, the event configuration object, the amp-analytics configuration object, or platform-provided built-in properties (for example canonical_url or referrer). This system allows declarative access to document properties.

Details

A publisher can configure the <amp-analytics> tag and use the JSON config to listen for various events and filter on elements using CSS selectors. There should ideally be only one tag per analytics vendor.

Analytics Tag

Each analytics vendor's configuration lives in an amp-analytics tag. The configuration that goes with the tag can be either inline or defined through an external JSON resource.

<!-- Predefined vendor -->
<amp-analytics type="google-analytics">
  // JSON Config
</amp-analytics>

<!-- Custom Vendor -->
<amp-analytics>
  // JSON config
</amp-analytics>

<!-- Custom Vendor with linked config -->
<amp-analytics config="https://analytics.com/config.json">
</amp-analytics>

type: An optional string that identifies one of the pre-defined analytics services. If none is specified, the JSON config should contain the host and requests values that describe where the analytics data is sent.

config: A URL to a JSON resource that defines the config to be used with this analytics tag. Details on the format of the config are below. This method lets publishers self-host the config or manage it through third parties such as tag managers, and makes the analytics on a page dynamically configurable. The URL specified here must always use https.

JSON Config

The details about how the hits for a particular vendor are fired can be specified in the JSON config. This config follows the same format for various types of analytics tags above.

<amp-analytics>
{
  // The domain:port to send the analytics data to.
  "host": "my-analytics.com:8080",

  // A map of templates to be used elsewhere in the config. The key is used to identify the template
  // and the value is a string that will be expanded into a full request before it is sent over
  // the wire. Some variables defined in {{}} are platform provided; others are user defined. The naming
  // convention for variables is /[a-z0-9_]+/.
  "requests": {
    "type_1": "log?d={{domain}}&p={{path}}&c={{client_id}}&dd={{domain}}&a={{account}}&s={{section}}&t={{title}}&x={{scroll_x}}&y={{scroll_y}}&sh={{screen_height}}&sw={{screen_width}}&r={{referrer}}",
    "type_2": "log?d={{domain}}&p={{path}}&c={{client_id}}&a={{account}}&r={{referrer}}&t={{title}}",

    // This request starts off with another request (type_2) and extends it with an additional parameter (cd1).
    "type_3": "{{type_2}}&cd1={{value1}}"
  },

  // These are the user defined variables that will be used while creating the request.[1]
  "vars": { "account": "UA-123456-1", "section": "foobar" },

  // An array of triggers that define various items of interest for analytics.
  "triggers": [{

      // This is triggered when document becomes visible and sends a hit of format "type_2".
      "on": "visible",
      "request": "type_2"
    }, {

      // This is triggered when the element #foo is tapped. The request type "type_1" is used to
      // generate the request. Vars specified take precedence over those built into platform or defined above[1].
      "on": "tap",
      "request": "type_1",
      "selector": "#foo",
      "vars": {
        "section": "Something",
        "title": "New Title"
      }
    }, {

      // This example shows how a request that was extended from another request can be used to send analytics hits.
      "on": "visible",
      "request": "type_3",
      "vars": { "value1": "foo" }
    }]}
</amp-analytics>

host: This field describes the host to which the analytics data is sent. The scheme is always https.
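As a rough illustration of how the host and an expanded request could be joined, here is a minimal sketch. The function name buildBeaconUrl is hypothetical, and the handling of a missing leading slash is an assumption: the examples in this proposal write request templates both with and without one.

```typescript
// Hypothetical helper, not part of the proposal: joins the configured host
// with an already-expanded request string. The scheme is always https.
function buildBeaconUrl(host: string, expandedRequest: string): string {
  // Assumption: a request without a leading slash is relative to the host root.
  const path = expandedRequest.charAt(0) === "/" ? expandedRequest : "/" + expandedRequest;
  return "https://" + host + path;
}
```

With the config above, buildBeaconUrl("my-analytics.com:8080", "log?d=example.com") would yield https://my-analytics.com:8080/log?d=example.com.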

requests: This object defines templates that specify the request payload to be sent to an analytics host. Requests can use variables that are defined by the platform or by the publisher elsewhere in the config. If a variable can't be resolved, an empty string is used for the value.
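The expansion rules described here (nested request references such as {{type_2}}, and an empty string for anything unresolved) could be sketched roughly as follows. This is an illustration under stated assumptions, not the runtime's implementation: expandRequest is a hypothetical name, a token that matches a request name is assumed to take priority over a variable of the same name, values are assumed to be encoded with encodeURIComponent, and the depth limit is an invented guard against self-referencing templates.

```typescript
type RequestMap = { [name: string]: string };

// Hypothetical sketch: expands the named request template into a request string.
function expandRequest(name: string, requests: RequestMap, vars: RequestMap, depth: number = 0): string {
  if (depth > 10) {
    // Invented safety limit so a template that references itself cannot loop forever.
    throw new Error("request templates nest too deeply");
  }
  const template = requests[name] || "";
  // Token names follow the /[a-z0-9_]+/ convention from the proposal.
  return template.replace(/\{\{([a-z0-9_]+)\}\}/g, (_match: string, key: string): string => {
    if (Object.prototype.hasOwnProperty.call(requests, key)) {
      // A token naming another request is replaced by that request's expansion.
      return expandRequest(key, requests, vars, depth + 1);
    }
    if (Object.prototype.hasOwnProperty.call(vars, key)) {
      return encodeURIComponent(vars[key]);
    }
    // Unresolved variables become an empty string.
    return "";
  });
}
```

Here vars stands for the already-merged variable map; how the layers are merged is a separate concern.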

vars: The JSON config can contain any number of named properties. These properties are used to fill in the request templates. Variable values that are substituted into templates will be URL-encoded before insertion into the URL.

Variable values are defined in various places. In increasing order of precedence:

  • Built-in values: AMP platform provides these values. All built-in values are defined below.
  • Config level values: The JSON config provides these values. They are defined in the top level vars object and are shared between all the triggers.
  • Trigger level values: Optionally defined in the vars element of each trigger object. These only apply to the trigger under which they are defined.
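A minimal sketch of this precedence order, assuming each level is available as a flat map of strings. The name effectiveVars is hypothetical; later arguments simply overwrite earlier ones.

```typescript
type VarMap = { [name: string]: string };

// Hypothetical sketch: merges the three levels in increasing order of precedence,
// so trigger level values win over config level values, which win over built-ins.
function effectiveVars(builtIns: VarMap, configVars: VarMap, triggerVars: VarMap): VarMap {
  const merged: VarMap = {};
  for (const layer of [builtIns, configVars, triggerVars]) {
    for (const key of Object.keys(layer)) {
      merged[key] = layer[key];
    }
  }
  return merged;
}
```

For the "tap" trigger in the config above, the trigger's "title" and "section" would replace the built-in title and the config level section, while "account" would still come from the top level vars object.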

triggers: This field defines an array of events that are measured by the analytics vendor. Each value in the array is an object with key-value pairs defined below.

on: This value defines the DOM event that is used as the trigger for the entry. Valid values for this field are:

  • tap: When one of the elements in the selector list is tapped
  • click: When one of the elements in the selector list is clicked
  • visible: When the document is made visible
  • hidden: When the document is made invisible
  • timer: Based on a timer which is defined through a timer-spec.

request: Name of the request template to use for analytics requests.

selector: A comma-separated list of one or more CSS selectors identifying the elements that will be listened to for the DOM event. Any selector list that the browser accepts through document.querySelectorAll() is valid.
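To make the trigger fields concrete, here is a hedged sketch of a sanity check for a single trigger entry. The interface and function names are hypothetical, and two of the rules are assumptions rather than part of the proposal: that tap and click need a selector, and that timer needs a timer-spec.

```typescript
// Hypothetical shape of one entry in "triggers"; field names follow the proposal.
interface TriggerSketch {
  on: string;
  request?: string;
  selector?: string;
  "timer-spec"?: { interval: number; "max-count": number };
}

// The values listed for "on" in this proposal.
const KNOWN_EVENTS: string[] = ["tap", "click", "visible", "hidden", "timer"];

// Returns a list of problems; an empty list means the trigger looks usable.
function checkTrigger(trigger: TriggerSketch): string[] {
  const problems: string[] = [];
  if (KNOWN_EVENTS.indexOf(trigger.on) === -1) {
    problems.push("unknown event: " + trigger.on);
  }
  if (!trigger.request) {
    problems.push("missing request");
  }
  // Assumption: tap and click need a selector to know which elements to watch.
  if ((trigger.on === "tap" || trigger.on === "click") && !trigger.selector) {
    problems.push(trigger.on + " requires a selector");
  }
  // Assumption: a timer trigger cannot be scheduled without a timer-spec.
  if (trigger.on === "timer" && !trigger["timer-spec"]) {
    problems.push("timer requires a timer-spec");
  }
  return problems;
}
```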

Built-in Variables

Page

  • domain: AMP document's domain
  • path: AMP document's path
  • canonical_url: The canonical URL for the AMP document
  • title: AMP document's title
  • page_authors: Authors if specified in the page's schema
  • page_section: Section details specified in the page's schema

User metadata

  • client_id: An id associated with the user (persistent). This id will be generated by the runtime and the spec will be published soon
  • client_timezone: Timezone read from the browser

Browsing data

  • referrer: Referrer to page, if any. The analytics vendor has to determine if the traffic is considered as organic/referrer/direct
  • timestamp: The timestamp of when the hit is generated.

Browser data

  • screen_width, screen_height: Dimensions of the screen.

Scroll depth

  • scroll_x, scroll_y: Current scroll depth as offset from top of page, pixels
  • max_scroll_x, max_scroll_y: Maximum scroll depth attained as offset from top of page, pixels
  • page_height: Current height of page, pixels
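One way the current and maximum scroll values could be tracked is sketched below. The class name ScrollDepthSketch is hypothetical, and the sketch assumes the caller feeds it scroll offsets in pixels on every scroll event; it does not touch the DOM itself.

```typescript
// Hypothetical sketch: remembers the latest and the largest scroll offsets seen.
class ScrollDepthSketch {
  private x = 0;
  private y = 0;
  private maxX = 0;
  private maxY = 0;

  // Call with the current scroll offsets (pixels) whenever the page scrolls.
  update(x: number, y: number): void {
    this.x = x;
    this.y = y;
    this.maxX = Math.max(this.maxX, x);
    this.maxY = Math.max(this.maxY, y);
  }

  // Values keyed by the built-in variable names from the proposal.
  toVars(): { [name: string]: string } {
    return {
      scroll_x: String(this.x),
      scroll_y: String(this.y),
      max_scroll_x: String(this.maxX),
      max_scroll_y: String(this.maxY),
    };
  }
}
```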

AMP data

  • is_proxied: A flag indicating that beacons are coming from an AMP page
  • runtime_id: Platform descriptor: A descriptor of which domain/app the AMP file is being viewed from

Timing

  • page_load_time: Page load time (ms)

Misc

  • random: A random string generated for each hit.
  • developer_mode: true/false indicating whether the developer mode was turned on.
  • hit_count: Total number of hits sent to the analytics vendor.

Examples

Note that the named examples below are just meant to show how things could be implemented. They are not really implemented yet.

Custom URL

<amp-analytics>
{
  "host": "my-analytics.com:8080",
  "requests": {
    "base_hit": "/collect?v=1&_v=a0&aip=true&_s={{hit_count}}&dl={{domain}}&dt={{title}}&sr={{screen_width}}x{{screen_height}}&ht={{timestamp}}&jid=&cid={{client_id}}&tid={{account_id}}",
    "pageview": "/r/{{base_hit}}&t=pageview&_r=1",
    "event": "{{base_hit}}&t=event&ec={{event_category}}&ea={{event_action}}&el={{event_label}}&ev={{event_value}}"
  },
  "vars": {
    "account_id": "123456"
  },
  "triggers": [{
    "selector": ".measured",
    "on": "click",
    "vars": {
      "event_category": "All",
      "event_label": "outbound links",
      "event_action": "click"
    },
    "request": "event"
  }, {
    "on": "load",
    "request": "pageview"
  }]
}
</amp-analytics>

Built-in tag for Google Analytics

<amp-analytics type="google-analytics">
{
  "vars": { "account_id": "UA-123456-7" },
  "triggers": [{
    "selector": "a .outbound",
    "on": "click",
    "vars": {
      "event_category": "All",
      "event_label": "outbound links",
      "event_action": "click"
    },
    "request": "event"
  }, {
    "on": "load",
    "request": "pageview"
  }]
}
</amp-analytics>

Built-in tag for chartbeat

<amp-analytics type="chartbeat">
{
  "vars": {
    "account_id": "123456",
    "section": "Politics"
  },
  "triggers": [{
    "on": "timer",
    "timer-spec": { "interval": 15, "max-count": 10 },
    "request": "default"
  }]
}
</amp-analytics>
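The proposal does not spell out the timer-spec semantics, so the following is only a guess at what the example above might mean: it assumes the first hit fires one interval after the timer starts, that hits repeat every interval, and that max-count caps the total number of hits. The function name timerFireTimes is hypothetical, and the unit of interval is left as whatever the spec ends up defining.

```typescript
// Hypothetical sketch: lists the offsets (in interval units) at which a timer
// trigger would fire, under the assumptions stated above.
function timerFireTimes(spec: { interval: number; "max-count": number }): number[] {
  const times: number[] = [];
  for (let i = 1; i <= spec["max-count"]; i++) {
    times.push(i * spec.interval);
  }
  return times;
}
```

Under those assumptions the chartbeat example would fire ten times, at offsets 15, 30, and so on up to 150.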

Remote config

<!-- The config url specifies everything that needs to be measured. -->
<amp-analytics
  config="https://my-config.com/pub=123456"></amp-analytics>

Known Issues

  • No Built in variables for
    • Performance metrics.
    • Heuristics for active engaged time are TBD. Maybe use an open-source implementation?
    • Element viewability data
      • Based on IAB viewability standard
      • Exact built in vars, how to expose it is tbd.
    • Ecommerce data
  • on semantics for first-visible, long-view and other events.
  • Element level overrides for variables

Edits: Removed external link.

@cramforce added the INTENT TO IMPLEMENT label Nov 7, 2015
@cramforce
Member

CC @rudygalfi

@cramforce
Member

I'm working on a related doc to create the "client identifier". Will publish in the coming days and link here.

@rudygalfi
Contributor

Is data like geo and device type accounted for in some way?

@kzap

kzap commented Nov 12, 2015

Would be great to get the segment guys to use this as that would be one gateway to support many platforms without each analytics platform needing to create their own

@avimehta
Contributor Author

@rudygalfi geo is usually calculated from IP address. Apart from that, the timezone (which is part of the spec) is another hint. There is the option of asking the browser for location but I am not aware of any vendor that uses that API.

I'll look into the device type a little more. So far the signals available for device type inference are the useragent and the screen resolution.

@cramforce
Member

Just posted #961 regarding client identifiers.

@philwills

@avimehta IP doesn't seem to be listed in the spec at the moment. Is that an oversight?

@philwills

Is the current intent that you either have an inline config, or config over a URL, or is there an intent to support merging the two?

Keeping it to one or the other seems perfectly reasonable, but it would be good to be explicit.

@cramforce
Member

@philwills I don't believe the client has any way to read the IP even if it wanted to. It would be available server side on the outbound beacon request.

As to the intent of multiple configs:

We plan to support multiple layers of config. Basically:

  1. vendor profile
  2. external config
  3. 1 or more (!) inline configs.

They all get mixed into the same large JS object with configs further down having the ability to override values defined further up.
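A rough sketch of that layering, not the actual implementation. It assumes nested objects are merged key by key while arrays and scalar values from a later layer replace earlier ones wholesale; the merge details were still open at this point in the thread, and the name mergeConfigs is hypothetical.

```typescript
interface JsonObject { [key: string]: unknown; }

function isPlainObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical sketch: layers are ordered from "further up" (vendor profile)
// to "further down" (inline configs); later layers override earlier ones.
function mergeConfigs(layers: JsonObject[]): JsonObject {
  const result: JsonObject = {};
  for (const layer of layers) {
    for (const key of Object.keys(layer)) {
      const existing = result[key];
      const incoming = layer[key];
      // Assumption: only plain objects merge recursively; everything else is replaced.
      result[key] = isPlainObject(existing) && isPlainObject(incoming)
        ? mergeConfigs([existing, incoming])
        : incoming;
    }
  }
  return result;
}
```

Calling mergeConfigs([vendorProfile, externalConfig, inlineConfig]) would then let an inline "vars" entry override the same entry from the vendor profile without discarding the vendor's other vars.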

@philwills

@cramforce So is there some view identifier intended to be consistently available to the outbound beacon and the analytics API enabling us to tie the two together?

Apologies, I'm still trying to get my head round AMP at the moment and am not quite sure where some responsibilities lie.

@cramforce
Member

@philwills I posted about client identifiers yesterday in #961. Not sure whether a view identifier (just a random value that is the same for all requests from the same page view?) is planned, but would certainly be straightforward to add.

@cramforce
Member

Just merged the client identifier PR #963.

You can now do cidFor(window).get('my-cookie-name') to get a client identifier.

@avimehta
Contributor Author

Mostly what @cramforce said.

re IP: @philwills IP is something that servers have to log on their side. JavaScript has no access to it. User-agent is another thing that is sent as part of the HTTP request, so JavaScript won't deal with it (unless there is really some need for user-agent in JavaScript. If this is the case, please explain and I'll include it).

re Merging: I think the intent is to do the merging but the details of how we end up doing the merging and the order of precedence is TBD. Would https://github.com/google/data-layer-helper#the-abstract-data-model merging logic work for you?

re View Identifier: This part is still TBD but there is no reason this can't be supported.

@ryanlombardo

@avimehta We use user-agent for analytics and for customizing experiences in different in-app contexts and for different device types. For example, we might use a different SMS sharing icon if you're on Android, or we add a more prominent "Pin It" button if you're in the Pinterest app.

For in-app, the referrer usually shows as direct. Pinterest, Facebook, and Twitter modify the user-agent though so we detect that and add a class to html for styling. (We do the same for OS.) We also use it for analytics by overriding the referrer so that we can attribute users who would otherwise be labeled as direct to these apps.

@rudygalfi
Contributor

@ryanlombardo please take a look at #945. I think it covers some of the examples you mentioned and I'd like to get your feedback.

@avimehta
Contributor Author

+1 to what @rudygalfi said. @ryanlombardo I am still trying to understand the usecase so pardon my ignorance:

The examples you mention are related to changing something based on the environment (referrer, device, OS). I think we can expose this information but making changes to the page based on the environment is handled by #945. None of the APIs exposed by this intent make changes to the DOM or CSS. We might have to add those APIs but I feel it is better done independently of amp-analytics. What do you think?

You also mentioned overriding the referrer to attribute users to apps. Does this happen on the client side? Do you have some kind of list for such mapping? I think we can try to add this mapping to the spec. I don't completely like this though because these mappings may change over time and thus the data will get messy. A slightly different approach might be to do this on the server where you have the ability to reprocess data if the referrer/user-agent changed. What do you think?

ps: fwiw, adding useragent to the spec is a trivial one-line change. If this is desired, I will add it. I just feel that this might lead to duplication of data.

@gfranczyk

In the examples there's an on value load; however, this value is not listed in eligible values in the on section. Are there any other DOM events that can be bound to? One that would be helpful in particular is the unload event -- this would make it possible to gracefully submit metrics like max_scroll_x and allow the analytics provider to more accurately calculate time on page.

+1 re: Known Issue: Heuristics for active engaged time are TBD. Maybe use an open-source implementation -- however, this can also be approximated on the server side by comparing scroll_x|y values across timer pings, so I wouldn't consider it a blocker.

Also, +1 re: external / linked configs -- this is critical for allowing publishers to define an internal user ID in the analytics config for submission to analytics providers.

@ryanlombardo

@rudygalfi #945 would work. It's similar to what we're doing for that type of customization.

@avimehta We do it on the client-side. Google Analytics and Adobe Analytics have methods for overriding the referrer. I agree that mappings may need to change over time and probably aren't the best solution. How would your server-side proposal work?

@cramforce
Member

We will try our very best not to use unload events. They are bad for browser performance, because they force reload on back button press. They are also meaningless in AMP because pages might be no longer visible but not unload (e.g. after swipe). Page visibility events are more helpful and will be exposed to analytics.

@rudygalfi
Contributor

Different topic: Is there a way to pull parameters out of the URL?

Example: foo.com/section/page.html?a=5#b=3. We'd know there's a parameter a with a value of 5 or that the anchor portion says b=3
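For illustration only, the example URL could be split into query and anchor parameters along these lines. The function names are hypothetical, nothing here is an AMP API, and the sketch deliberately avoids the URL class so it also works on scheme-less strings like the one above; note that decodeURIComponent throws on malformed escapes and does not turn "+" into a space.

```typescript
type ParamMap = { [name: string]: string };

// Hypothetical helper: parses "a=5&c=7" style strings into a map.
function parseParams(part: string): ParamMap {
  const out: ParamMap = {};
  for (const pair of part.split("&")) {
    if (pair === "") {
      continue;
    }
    const eq = pair.indexOf("=");
    const rawKey = eq === -1 ? pair : pair.slice(0, eq);
    const rawValue = eq === -1 ? "" : pair.slice(eq + 1);
    out[decodeURIComponent(rawKey)] = decodeURIComponent(rawValue);
  }
  return out;
}

// Hypothetical helper: separates the query part (after "?") from the anchor part (after "#").
function splitUrlParams(url: string): { query: ParamMap; fragment: ParamMap } {
  const hashAt = url.indexOf("#");
  const beforeHash = hashAt === -1 ? url : url.slice(0, hashAt);
  const fragment = hashAt === -1 ? "" : url.slice(hashAt + 1);
  const queryAt = beforeHash.indexOf("?");
  const query = queryAt === -1 ? "" : beforeHash.slice(queryAt + 1);
  return { query: parseParams(query), fragment: parseParams(fragment) };
}
```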

@avimehta
Contributor Author

I am planning to expose the query and anchor parts in the doc. If there is a requirement, we can consider pulling out specified variables from the URL.

GA can use this for campaign stuff. Probably a lot of other vendors might find it useful as well.

@joshschwartz
Contributor

We would definitely like to pull out UTM parameters from the URL.

@rudygalfi
Contributor

@avimehta I think as far as the analytics framework is concerned it's not necessary to parse out the params. Should there be entries in the design proposal listing the query and anchor parts as built-in variables?

@philwills

@avimehta I am aware that the IP isn't available in JS. I'm still getting to grips with AMP and wasn't sure whether the intent for this was to have the client talk back to the analytics system directly, or if that was to be intermediated.

avimehta added a commit to avimehta/amphtml that referenced this issue Dec 17, 2015
@adsouza

adsouza commented Jan 2, 2016

Validation fails when analytics are added.

@rudygalfi
Contributor

@adsouza Validation for amp-analytics is not yet implemented and is being tracked with #1087.

@breauxc
Contributor

breauxc commented Jan 5, 2016

How do I get the ball rolling on the implementation of active engaged time? In the last couple of check-ins with Josh from Chartbeat and Andrew from Parse.ly, we agreed to move forward with active engaged time once we had a universal definition; we think we are now there.

To summarize, each second (or time interval) we count someone as being engaged if the user has the window in focus and has interacted with the page during the last five seconds. Chartbeat uses [pageload, focus, mousedown, mousemove, scroll, keydown, resize] as a minimal set of event interactions. The reason we don’t include mobile-specific events (e.g., touchstart, touchenter) is that some of our client sites have had performance issues with the handling for these events. Instead, we leverage the fact that essentially all mobile browsers fire desktop mouse/scroll events at the end of a touch tap to better accommodate legacy web pages. The only other substantial difference from Parse.ly’s open source implementation (https://github.com/Parsely/time-engaged) is that we will not consider video in the AMP definition.
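The definition summarized above could be sketched like this. It is only an illustration, not Chartbeat's or Parse.ly's code: the function name is hypothetical, time is modeled as whole-second ticks, interaction timestamps are passed in as plain numbers, and "during the last five seconds" is interpreted as an interaction less than five seconds before the tick.

```typescript
// Hypothetical sketch: counts engaged seconds over a page view.
// interactionTimes: seconds (since page load) at which a qualifying event occurred.
// isFocused: reports whether the window had focus at a given second.
function engagedSeconds(
  totalSeconds: number,
  interactionTimes: number[],
  isFocused: (second: number) => boolean,
  windowSeconds: number = 5
): number {
  let engaged = 0;
  for (let t = 1; t <= totalSeconds; t++) {
    if (!isFocused(t)) {
      continue;
    }
    let recent = false;
    for (const when of interactionTimes) {
      // Assumption: an interaction counts if it happened at or before this tick
      // and less than windowSeconds ago.
      if (when <= t && t - when < windowSeconds) {
        recent = true;
        break;
      }
    }
    if (recent) {
      engaged++;
    }
  }
  return engaged;
}
```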

@philwills

@breauxc Is it worth spawning a separate issue to track this? This one is rather sprawling already.

@rudygalfi
Contributor

Once the bulk of the initial implementation for AMP Analytics is complete, I’d like to be able to close this issue and continue discussions for further feature requests in other issues. To that end, I have opened issues relating to several items that were called out as TODOs in the design or raised as comments on this issue.

Also, relating to the discussion about the view identifier, this should be supported via the PAGE_VIEW_ID substitution as documented here: https://github.com/ampproject/amphtml/blob/master/spec/amp-var-substitutions.md. (So you could construct it as something like ?pvid=PAGE_VIEW_ID or ?pvid=${pageViewId} since the variable pageViewId inherits the value of PAGE_VIEW_ID by default.)

I believe the suggestion of adding a POST option is covered by the transport option of xhrpost.

It should be possible to specify more than one host since the request value now accommodates the entire URL.

avimehta added a commit to avimehta/amphtml that referenced this issue Jan 14, 2016
rudygalfi added a commit that referenced this issue Jan 16, 2016
Added documentation about variables supported by amp-analytics. #871
@rudygalfi
Contributor

I've opened two additional issues to track work first mentioned in this issue:

@rudygalfi
Contributor

amp-analytics should validate (#1087) and is on a path to be in our next release as a non-experimental feature (#1485). Closing this.

If you spot any spec gaps that haven't moved into other issues, please open a new issue to track.

msukmanowsky pushed a commit to Parsely/amphtml that referenced this issue Jan 21, 2016
@alvin-milton

Will this support custom dimensions for Google Analytics?

@rudygalfi
Contributor

@alvin-milton The Google Analytics support does include custom dimensions, according to https://developers.google.com/analytics/devguides/collection/amp-analytics/#extending_googleanalytics

@kzap

kzap commented Feb 10, 2016

@alvin-milton @rudygalfi not exactly. Since there's no optional parameter support, any custom dimensions you want to include have to be explicitly defined in your own request, like the example given by rudy. So yes, you can define whatever parameters you want for GA as long as you add the query string for it in the request object.
