Store Technology meta data in HTTP Archive #733

Open
tunetheweb opened this issue May 25, 2023 · 3 comments

@tunetheweb
Member

It would be useful to pull more of the Wappalyzer metadata into the HTTP Archive.

For example, the meta->description and website fields could be useful to display in the CWV Tech report (and possibly the icon as well).

The Aurora team also mentioned that the implies data could be useful for seeing how many technologies are built on top of other technologies.

@pmeenan
Member

pmeenan commented May 25, 2023

WPT stores both the processed and the raw Wappalyzer output in the HAR. The _detected_technologies field is lightly processed, and _detected_raw holds the raw, unprocessed detection results. Both include the description and the file name of the logo:

For example, for webpagetest.org:

                "_detected": {
                    "Programming languages": "PHP",
                    "Caching": "Varnish",
                    "JavaScript libraries": "jQuery UI 1.8.17,jQuery 1.7.1",
                    "Documentation": "Zendesk",
                    "Issue trackers": "Zendesk",
                    "Live chat": "Zendesk",
                    "Advertising": "Twitter Ads",
                    "Webmail": "Microsoft 365",
                    "Email": "Microsoft 365",
                    "Security": "HSTS,Cloudflare Bot Management",
                    "Tag managers": "Google Tag Manager",
                    "Analytics": "Google Analytics",
                    "CDN": "Cloudflare",
                    "Miscellaneous": "Open Graph"
                },
                "_detected_apps": {
                    "PHP": "",
                    "Varnish": "",
                    "jQuery UI": "1.8.17",
                    "Zendesk": "",
                    "Twitter Ads": "",
                    "Microsoft 365": "",
                    "jQuery": "1.7.1",
                    "HSTS": "",
                    "Google Tag Manager": "",
                    "Google Analytics": "",
                    "Cloudflare Bot Management": "",
                    "Cloudflare": "",
                    "Open Graph": ""
                },
                "_detected_technologies": {
                    "PHP": {
                        "name": "PHP",
                        "description": "PHP is a general-purpose scripting language used for web development.",
                        "slug": "php",
                        "categories": [
                            {
                                "id": 27,
                                "slug": "programming-languages",
                                "groups": [
                                    9
                                ],
                                "name": "Programming languages",
                                "priority": 5
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "PHP.svg",
                        "website": "http://php.net",
                        "pricing": [],
                        "cpe": "cpe:2.3:a:php:php:*:*:*:*:*:*:*:*"
                    },
                    "Varnish": {
                        "name": "Varnish",
                        "description": "Varnish is a reverse caching proxy.",
                        "slug": "varnish",
                        "categories": [
                            {
                                "id": 23,
                                "slug": "caching",
                                "groups": [
                                    7
                                ],
                                "name": "Caching",
                                "priority": 7
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Varnish.svg",
                        "website": "http://www.varnish-cache.org",
                        "pricing": [],
                        "cpe": "cpe:2.3:a:varnish-software:varnish_cache:*:*:*:*:*:*:*:*"
                    },
                    "jQuery UI": {
                        "name": "jQuery UI",
                        "description": "jQuery UI is a collection of GUI widgets, animated visual effects, and themes implemented with jQuery, Cascading Style Sheets, and HTML.",
                        "slug": "jquery-ui",
                        "categories": [
                            {
                                "id": 59,
                                "slug": "javascript-libraries",
                                "groups": [
                                    9
                                ],
                                "name": "JavaScript libraries",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "1.8.17",
                        "icon": "jQuery UI.svg",
                        "website": "http://jqueryui.com",
                        "pricing": [],
                        "cpe": "cpe:2.3:a:jquery:jquery_ui:*:*:*:*:*:*:*:*"
                    },
                    "Zendesk": {
                        "name": "Zendesk",
                        "description": "Zendesk is a cloud-based help desk management solution offering customizable tools to build customer service portal, knowledge base and online communities.",
                        "slug": "zendesk",
                        "categories": [
                            {
                                "id": 4,
                                "slug": "documentation",
                                "groups": [
                                    3
                                ],
                                "name": "Documentation",
                                "priority": 2
                            },
                            {
                                "id": 13,
                                "slug": "issue-trackers",
                                "groups": [
                                    3,
                                    18
                                ],
                                "name": "Issue trackers",
                                "priority": 2
                            },
                            {
                                "id": 52,
                                "slug": "live-chat",
                                "groups": [
                                    4,
                                    16
                                ],
                                "name": "Live chat",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Zendesk.svg",
                        "website": "https://zendesk.com",
                        "pricing": [
                            "low"
                        ],
                        "cpe": null
                    },
                    "Twitter Ads": {
                        "name": "Twitter Ads",
                        "description": "Twitter Ads is an advertising platform for Twitter 'microblogging' system.",
                        "slug": "twitter-ads",
                        "categories": [
                            {
                                "id": 36,
                                "slug": "advertising",
                                "groups": [
                                    2
                                ],
                                "name": "Advertising",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Twitter.svg",
                        "website": "https://ads.twitter.com",
                        "pricing": [
                            "payg"
                        ],
                        "cpe": null
                    },
                    "Microsoft 365": {
                        "name": "Microsoft 365",
                        "description": "Microsoft 365 is a line of subscription services offered by Microsoft as part of the Microsoft Office product line.",
                        "slug": "microsoft-365",
                        "categories": [
                            {
                                "id": 30,
                                "slug": "webmail",
                                "groups": [
                                    4
                                ],
                                "name": "Webmail",
                                "priority": 2
                            },
                            {
                                "id": 75,
                                "slug": "email",
                                "groups": [
                                    4,
                                    2
                                ],
                                "name": "Email",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Microsoft 365.svg",
                        "website": "https://www.microsoft.com/microsoft-365",
                        "pricing": [],
                        "cpe": null
                    },
                    "jQuery": {
                        "name": "jQuery",
                        "description": "jQuery is a JavaScript library which is a free, open-source software designed to simplify HTML DOM tree traversal and manipulation, as well as event handling, CSS animation, and Ajax.",
                        "slug": "jquery",
                        "categories": [
                            {
                                "id": 59,
                                "slug": "javascript-libraries",
                                "groups": [
                                    9
                                ],
                                "name": "JavaScript libraries",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "1.7.1",
                        "icon": "jQuery.svg",
                        "website": "https://jquery.com",
                        "pricing": [],
                        "cpe": "cpe:2.3:a:jquery:jquery:*:*:*:*:*:*:*:*"
                    },
                    "HSTS": {
                        "name": "HSTS",
                        "description": "HTTP Strict Transport Security (HSTS) informs browsers that the site should only be accessed using HTTPS.",
                        "slug": "hsts",
                        "categories": [
                            {
                                "id": 16,
                                "slug": "security",
                                "groups": [
                                    11
                                ],
                                "name": "Security",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "default.svg",
                        "website": "https://www.rfc-editor.org/rfc/rfc6797#section-6.1",
                        "pricing": [],
                        "cpe": null
                    },
                    "Google Tag Manager": {
                        "name": "Google Tag Manager",
                        "description": "Google Tag Manager is a tag management system (TMS) that allows you to quickly and easily update measurement codes and related code fragments collectively known as tags on your website or mobile app.",
                        "slug": "google-tag-manager",
                        "categories": [
                            {
                                "id": 42,
                                "slug": "tag-managers",
                                "groups": [
                                    8
                                ],
                                "name": "Tag managers",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Google Tag Manager.svg",
                        "website": "http://www.google.com/tagmanager",
                        "pricing": [],
                        "cpe": null
                    },
                    "Google Analytics": {
                        "name": "Google Analytics",
                        "description": "Google Analytics is a free web analytics service that tracks and reports website traffic.",
                        "slug": "google-analytics",
                        "categories": [
                            {
                                "id": 10,
                                "slug": "analytics",
                                "groups": [
                                    8
                                ],
                                "name": "Analytics",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Google Analytics.svg",
                        "website": "http://google.com/analytics",
                        "pricing": [],
                        "cpe": null
                    },
                    "Cloudflare Bot Management": {
                        "name": "Cloudflare Bot Management",
                        "description": "Cloudflare bot management solution identifies and mitigates automated traffic to protect websites from bad bots.",
                        "slug": "cloudflare-bot-management",
                        "categories": [
                            {
                                "id": 16,
                                "slug": "security",
                                "groups": [
                                    11
                                ],
                                "name": "Security",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "CloudFlare.svg",
                        "website": "https://www.cloudflare.com/en-gb/products/bot-management/",
                        "pricing": [],
                        "cpe": null
                    },
                    "Cloudflare": {
                        "name": "Cloudflare",
                        "description": "Cloudflare is a web-infrastructure and website-security company, providing content-delivery-network services, DDoS mitigation, Internet security, and distributed domain-name-server services.",
                        "slug": "cloudflare",
                        "categories": [
                            {
                                "id": 31,
                                "slug": "cdn",
                                "groups": [
                                    7
                                ],
                                "name": "CDN",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "CloudFlare.svg",
                        "website": "http://www.cloudflare.com",
                        "pricing": [],
                        "cpe": null
                    },
                    "Open Graph": {
                        "name": "Open Graph",
                        "description": "Open Graph is a protocol that is used to integrate any web page into the social graph.",
                        "slug": "open-graph",
                        "categories": [
                            {
                                "id": 19,
                                "slug": "miscellaneous",
                                "groups": [
                                    6
                                ],
                                "name": "Miscellaneous",
                                "priority": 10
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Open Graph.png",
                        "website": "https://ogp.me",
                        "pricing": [],
                        "cpe": null
                    }
                },
                "_detected_raw": [
                    {
                        "name": "PHP",
                        "description": "PHP is a general-purpose scripting language used for web development.",
                        "slug": "php",
                        "categories": [
                            {
                                "id": 27,
                                "slug": "programming-languages",
                                "groups": [
                                    9
                                ],
                                "name": "Programming languages",
                                "priority": 5
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "PHP.svg",
                        "website": "http://php.net",
                        "pricing": [],
                        "cpe": "cpe:2.3:a:php:php:*:*:*:*:*:*:*:*"
                    },
                    {
                        "name": "Varnish",
                        "description": "Varnish is a reverse caching proxy.",
                        "slug": "varnish",
                        "categories": [
                            {
                                "id": 23,
                                "slug": "caching",
                                "groups": [
                                    7
                                ],
                                "name": "Caching",
                                "priority": 7
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Varnish.svg",
                        "website": "http://www.varnish-cache.org",
                        "pricing": [],
                        "cpe": "cpe:2.3:a:varnish-software:varnish_cache:*:*:*:*:*:*:*:*"
                    },
                    {
                        "name": "jQuery UI",
                        "description": "jQuery UI is a collection of GUI widgets, animated visual effects, and themes implemented with jQuery, Cascading Style Sheets, and HTML.",
                        "slug": "jquery-ui",
                        "categories": [
                            {
                                "id": 59,
                                "slug": "javascript-libraries",
                                "groups": [
                                    9
                                ],
                                "name": "JavaScript libraries",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "1.8.17",
                        "icon": "jQuery UI.svg",
                        "website": "http://jqueryui.com",
                        "pricing": [],
                        "cpe": "cpe:2.3:a:jquery:jquery_ui:*:*:*:*:*:*:*:*"
                    },
                    {
                        "name": "Zendesk",
                        "description": "Zendesk is a cloud-based help desk management solution offering customizable tools to build customer service portal, knowledge base and online communities.",
                        "slug": "zendesk",
                        "categories": [
                            {
                                "id": 4,
                                "slug": "documentation",
                                "groups": [
                                    3
                                ],
                                "name": "Documentation",
                                "priority": 2
                            },
                            {
                                "id": 13,
                                "slug": "issue-trackers",
                                "groups": [
                                    3,
                                    18
                                ],
                                "name": "Issue trackers",
                                "priority": 2
                            },
                            {
                                "id": 52,
                                "slug": "live-chat",
                                "groups": [
                                    4,
                                    16
                                ],
                                "name": "Live chat",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Zendesk.svg",
                        "website": "https://zendesk.com",
                        "pricing": [
                            "low"
                        ],
                        "cpe": null
                    },
                    {
                        "name": "Twitter Ads",
                        "description": "Twitter Ads is an advertising platform for Twitter 'microblogging' system.",
                        "slug": "twitter-ads",
                        "categories": [
                            {
                                "id": 36,
                                "slug": "advertising",
                                "groups": [
                                    2
                                ],
                                "name": "Advertising",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Twitter.svg",
                        "website": "https://ads.twitter.com",
                        "pricing": [
                            "payg"
                        ],
                        "cpe": null
                    },
                    {
                        "name": "Microsoft 365",
                        "description": "Microsoft 365 is a line of subscription services offered by Microsoft as part of the Microsoft Office product line.",
                        "slug": "microsoft-365",
                        "categories": [
                            {
                                "id": 30,
                                "slug": "webmail",
                                "groups": [
                                    4
                                ],
                                "name": "Webmail",
                                "priority": 2
                            },
                            {
                                "id": 75,
                                "slug": "email",
                                "groups": [
                                    4,
                                    2
                                ],
                                "name": "Email",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Microsoft 365.svg",
                        "website": "https://www.microsoft.com/microsoft-365",
                        "pricing": [],
                        "cpe": null
                    },
                    {
                        "name": "jQuery",
                        "description": "jQuery is a JavaScript library which is a free, open-source software designed to simplify HTML DOM tree traversal and manipulation, as well as event handling, CSS animation, and Ajax.",
                        "slug": "jquery",
                        "categories": [
                            {
                                "id": 59,
                                "slug": "javascript-libraries",
                                "groups": [
                                    9
                                ],
                                "name": "JavaScript libraries",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "1.7.1",
                        "icon": "jQuery.svg",
                        "website": "https://jquery.com",
                        "pricing": [],
                        "cpe": "cpe:2.3:a:jquery:jquery:*:*:*:*:*:*:*:*"
                    },
                    {
                        "name": "HSTS",
                        "description": "HTTP Strict Transport Security (HSTS) informs browsers that the site should only be accessed using HTTPS.",
                        "slug": "hsts",
                        "categories": [
                            {
                                "id": 16,
                                "slug": "security",
                                "groups": [
                                    11
                                ],
                                "name": "Security",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "default.svg",
                        "website": "https://www.rfc-editor.org/rfc/rfc6797#section-6.1",
                        "pricing": [],
                        "cpe": null
                    },
                    {
                        "name": "Google Tag Manager",
                        "description": "Google Tag Manager is a tag management system (TMS) that allows you to quickly and easily update measurement codes and related code fragments collectively known as tags on your website or mobile app.",
                        "slug": "google-tag-manager",
                        "categories": [
                            {
                                "id": 42,
                                "slug": "tag-managers",
                                "groups": [
                                    8
                                ],
                                "name": "Tag managers",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Google Tag Manager.svg",
                        "website": "http://www.google.com/tagmanager",
                        "pricing": [],
                        "cpe": null
                    },
                    {
                        "name": "Google Analytics",
                        "description": "Google Analytics is a free web analytics service that tracks and reports website traffic.",
                        "slug": "google-analytics",
                        "categories": [
                            {
                                "id": 10,
                                "slug": "analytics",
                                "groups": [
                                    8
                                ],
                                "name": "Analytics",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Google Analytics.svg",
                        "website": "http://google.com/analytics",
                        "pricing": [],
                        "cpe": null
                    },
                    {
                        "name": "Cloudflare Bot Management",
                        "description": "Cloudflare bot management solution identifies and mitigates automated traffic to protect websites from bad bots.",
                        "slug": "cloudflare-bot-management",
                        "categories": [
                            {
                                "id": 16,
                                "slug": "security",
                                "groups": [
                                    11
                                ],
                                "name": "Security",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "CloudFlare.svg",
                        "website": "https://www.cloudflare.com/en-gb/products/bot-management/",
                        "pricing": [],
                        "cpe": null
                    },
                    {
                        "name": "Cloudflare",
                        "description": "Cloudflare is a web-infrastructure and website-security company, providing content-delivery-network services, DDoS mitigation, Internet security, and distributed domain-name-server services.",
                        "slug": "cloudflare",
                        "categories": [
                            {
                                "id": 31,
                                "slug": "cdn",
                                "groups": [
                                    7
                                ],
                                "name": "CDN",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "CloudFlare.svg",
                        "website": "http://www.cloudflare.com",
                        "pricing": [],
                        "cpe": null
                    },
                    {
                        "name": "Open Graph",
                        "description": "Open Graph is a protocol that is used to integrate any web page into the social graph.",
                        "slug": "open-graph",
                        "categories": [
                            {
                                "id": 19,
                                "slug": "miscellaneous",
                                "groups": [
                                    6
                                ],
                                "name": "Miscellaneous",
                                "priority": 10
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Open Graph.png",
                        "website": "https://ogp.me",
                        "pricing": [],
                        "cpe": null
                    }
                ],
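
As a rough illustration of how the description, website and icon could be pulled out of this structure, here is a minimal Python sketch. It assumes the page-level HAR object has already been parsed into a dict; `SAMPLE_PAGE` is a trimmed copy of the fragment above and `technology_metadata` is a hypothetical helper name, not part of WPT or HTTP Archive.

```python
# Trimmed copy of the `_detected_technologies` fragment shown above.
# Only the fields used below are kept; values are copied from the sample.
SAMPLE_PAGE = {
    "_detected_technologies": {
        "PHP": {
            "name": "PHP",
            "description": "PHP is a general-purpose scripting language used for web development.",
            "categories": [{"id": 27, "name": "Programming languages"}],
            "icon": "PHP.svg",
            "website": "http://php.net",
        },
        "Zendesk": {
            "name": "Zendesk",
            "description": "Zendesk is a cloud-based help desk management solution offering customizable tools to build customer service portal, knowledge base and online communities.",
            "categories": [
                {"id": 4, "name": "Documentation"},
                {"id": 13, "name": "Issue trackers"},
                {"id": 52, "name": "Live chat"},
            ],
            "icon": "Zendesk.svg",
            "website": "https://zendesk.com",
        },
    }
}


def technology_metadata(page):
    """Return one flat row per detected technology (hypothetical helper)."""
    rows = []
    for name, tech in page.get("_detected_technologies", {}).items():
        rows.append({
            "technology": name,
            "description": tech.get("description", ""),
            "website": tech.get("website", ""),
            "icon": tech.get("icon", ""),
            # A technology can belong to several categories (see Zendesk).
            "categories": [c["name"] for c in tech.get("categories", [])],
        })
    return rows


for row in technology_metadata(SAMPLE_PAGE):
    print(row["technology"], row["website"], row["icon"], row["categories"])
```

The same loop should work against `_detected_raw` with small changes, since that field is a list of the same objects rather than a mapping keyed by name.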

@pmeenan
Member

pmeenan commented May 25, 2023

It doesn't look like the "implies" mapping, or whether a detection was direct or implied, is available in this output.
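
If an implies mapping were available separately, the direct/implied distinction could in principle be reconstructed after the fact. The Python sketch below is only an illustration under that assumption: the `IMPLIES` table and the `expand_implied` helper are hypothetical, the entries are made-up examples rather than data taken from Wappalyzer, and it assumes the list of directly detected technologies is known.

```python
# Hypothetical "implies" table: technology -> technologies it implies.
# These entries are illustrative only, not read from Wappalyzer.
IMPLIES = {
    "WooCommerce": ["WordPress"],
    "WordPress": ["PHP", "MySQL"],
    "jQuery UI": ["jQuery"],
}


def expand_implied(direct, implies):
    """Label each technology as "direct" or "implied".

    `direct` is the list of technologies matched by a detection rule.
    Anything reachable only through the `implies` table, including
    transitively, is labelled "implied".
    """
    origin = {name: "direct" for name in direct}
    pending = list(direct)
    while pending:
        current = pending.pop()
        for implied in implies.get(current, []):
            if implied not in origin:  # never downgrade a direct detection
                origin[implied] = "implied"
                pending.append(implied)
    return origin


result = expand_implied(["WooCommerce", "PHP"], IMPLIES)
for name in sorted(result):
    print(name, result[name])
```

Counting the "implied" labels across many pages would give a rough view of how often one technology is present because another one builds on it.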

@max-ostapenko
Contributor

Besides the metrics available in the test results and the crawl.pages table, we now also have:

  • a synced, full metadata snapshot in wappalyzer.apps, for example:
SELECT COUNT(1) -- 71
FROM `httparchive.wappalyzer.apps`
WHERE 'JavaScript frameworks' IN UNNEST(categories)
