new api #116

Merged (3 commits), Feb 6, 2024
4 changes: 3 additions & 1 deletion .github/workflows/ci.yml
@@ -27,11 +27,13 @@ jobs:
- name: Install checkers
run: |
python -mpip install --upgrade pip
python -mpip install black flake8
python -mpip install black flake8 mypy types-PyYaml
- name: flake
run: flake8 .
- name: black
run: black --check --diff --color --quiet .
- name: mypy
run: mypy

# REPLACE BY: job which python -mbuild, and uploads the sdist and wheel to artifacts
# build is not binary so can just build the one using whatever python version
158 changes: 83 additions & 75 deletions README.rst
@@ -1,119 +1,127 @@
uap-python
==========

A python implementation of the UA Parser (https://github.com/ua-parser,
formerly https://github.com/tobie/ua-parser)
Official python implementation of the `User Agent String
Parser <https://github.com/ua-parser>`_ project.

Build Status
------------

.. image:: https://github.com/ua-parser/uap-python/actions/workflows/ci.yml/badge.svg
:alt: CI on the master branch


Installing
----------

Install via pip
~~~~~~~~~~~~~~~

Just run:
Just add ``ua-parser`` to your project's dependencies, or run

.. code-block:: sh

$ pip install ua-parser

Manual install
~~~~~~~~~~~~~~

In the top-level directory run:

.. code-block:: sh

$ python setup.py install

Change Log
---------------
Because this repo is mostly a python wrapper for the User Agent String Parser repo (https://github.com/ua-parser/uap-core), the changes made to this repo are best described by the update diffs in that project. Please see the diffs for this submodule (https://github.com/ua-parser/uap-core/releases) for a list of what has changed between versions of this package.
to install in the current environment.

Getting Started
---------------

Retrieve data on a user-agent string
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Retrieve all data on a user-agent string
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

>>> from ua_parser import user_agent_parser
>>> import pprint
>>> pp = pprint.PrettyPrinter(indent=4)
>>> from ua_parser import parse
>>> ua_string = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.104 Safari/537.36'
>>> parsed_string = user_agent_parser.Parse(ua_string)
>>> pp.pprint(parsed_string)
{ 'device': {'brand': 'Apple', 'family': 'Mac', 'model': 'Mac'},
'os': { 'family': 'Mac OS X',
'major': '10',
'minor': '9',
'patch': '4',
'patch_minor': None},
'string': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) '
'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.104 '
'Safari/537.36',
'user_agent': { 'family': 'Chrome',
'major': '41',
'minor': '0',
'patch': '2272'}}

Extract browser data from user-agent string
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> parse(ua_string) # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS
ParseResult(user_agent=UserAgent(family='Chrome',
major='41',
minor='0',
patch='2272',
patch_minor='104'),
os=OS(family='Mac OS X',
major='10',
minor='9',
patch='4',
patch_minor=None),
device=Device(family='Mac',
brand='Apple',
model='Mac'),
string='Mozilla/5.0 (Macintosh; Intel Mac OS...

Any datum not found in the user agent string is set to ``None``::

>>> parse("")
ParseResult(user_agent=None, os=None, device=None, string='')

Extract only browser data from user-agent string
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

>>> from ua_parser import user_agent_parser
>>> import pprint
>>> pp = pprint.PrettyPrinter(indent=4)
>>> from ua_parser import parse_user_agent
>>> ua_string = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.104 Safari/537.36'
>>> parsed_string = user_agent_parser.ParseUserAgent(ua_string)
>>> pp.pprint(parsed_string)
{'family': 'Chrome', 'major': '41', 'minor': '0', 'patch': '2272'}
>>> parse_user_agent(ua_string)
UserAgent(family='Chrome', major='41', minor='0', patch='2272', patch_minor='104')

..
For specific domains, a match failure just returns ``None``::

⚠️Before 0.15, the convenience parsers (``ParseUserAgent``,
``ParseOs``, and ``ParseDevice``) were not cached, which could
result in degraded performances when parsing large amounts of
identical user-agents (which might occur for real-world datasets).

For these versions (up to 0.10 included), prefer using ``Parse``
and extracting the sub-component you need from the resulting
dictionary.
>>> parse_user_agent("")

Extract OS information from user-agent string
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

>>> from ua_parser import user_agent_parser
>>> import pprint
>>> pp = pprint.PrettyPrinter(indent=4)
>>> from ua_parser import parse_os
>>> ua_string = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.104 Safari/537.36'
>>> parsed_string = user_agent_parser.ParseOS(ua_string)
>>> pp.pprint(parsed_string)
{ 'family': 'Mac OS X',
'major': '10',
'minor': '9',
'patch': '4',
'patch_minor': None}

Extract Device information from user-agent string
>>> parse_os(ua_string)
OS(family='Mac OS X', major='10', minor='9', patch='4', patch_minor=None)

Extract device information from user-agent string
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

>>> from ua_parser import user_agent_parser
>>> import pprint
>>> pp = pprint.PrettyPrinter(indent=4)
>>> from ua_parser import parse_device
>>> ua_string = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.104 Safari/537.36'
>>> parsed_string = user_agent_parser.ParseDevice(ua_string)
>>> pp.pprint(parsed_string)
{'brand': 'Apple', 'family': 'Mac', 'model': 'Mac'}
>>> parse_device(ua_string)
Device(family='Mac', brand='Apple', model='Mac')

Parser
~~~~~~

Parsers expose the same functions (``parse``, ``parse_user_agent``,
``parse_os``, and ``parse_device``) as the top-level of the package,
however these are all *utility* methods.

The actual protocol of parsers, and the one method which must be
implemented / overridden is::

def __call__(self, str, Components, /) -> ParseResult:

It's similar to but more flexible than ``parse``:

- The ``str`` is the user agent string.
- The ``Components`` is a hint, through which the caller requests the
domain (component) they are looking for, any combination of
``Components.USER_AGENT``, ``Components.OS``, and
``Components.DEVICE``. ``Components.ALL`` exists as a convenience alias
for the combination of all three.

The parser *must* return at least the requested information, but it
*can* return more if that is more convenient or no more expensive.
- The ``ParseResult`` is similar to ``CompleteParseResult``, except
all the attributes are ``Optional`` and it has a ``components:
Components`` attribute which specifies whether a component was never
requested (its value for the user agent string is unknown) or it has
been requested but could not be resolved (no match was found for the
user agent).

``ParseResult.complete()`` converts to a ``CompleteParseResult`` if
all the components are set, and raises an exception otherwise. If
some of the components are set to ``None``, they'll be swapped for a
default value.

Calling the parser directly is part of the public API. One advantage
is that it does not return default values, which makes it easier to
differentiate between a non-match (= ``None``) and a
default fallback (``family = "Other"``).
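
The parser protocol described above can be illustrated with a small, self-contained sketch. Nothing here is taken from the ``ua_parser`` source: ``Components``, ``ParseResult``, ``CompleteResult``, and ``ToyParser`` are hypothetical stand-ins, results are reduced to a single family string per component, and matching is a plain substring check rather than the real regex rules. It only shows the shape of the contract: a flag hint goes in, a partial result comes out, and ``complete()`` fills in defaults.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Flag
from typing import Optional, Tuple


class Components(Flag):
    """Hypothetical stand-in for the flag the README calls ``Components``."""

    USER_AGENT = 1
    OS = 2
    DEVICE = 4
    ALL = 7  # combination of the three flags above


@dataclass(frozen=True)
class CompleteResult:
    """Simplified complete result: every field holds a value or a default."""

    user_agent: str
    os: str
    device: str
    string: str


@dataclass(frozen=True)
class ParseResult:
    """Simplified partial result: a field is meaningful only if its flag is set."""

    components: Components
    user_agent: Optional[str]
    os: Optional[str]
    device: Optional[str]
    string: str

    def complete(self) -> CompleteResult:
        # Assumption: "all components set" means all three were requested.
        if Components.ALL not in self.components:
            raise ValueError("cannot complete a partial result")
        # A requested component that did not match (None) gets a default.
        return CompleteResult(
            user_agent=self.user_agent or "Other",
            os=self.os or "Other",
            device=self.device or "Other",
            string=self.string,
        )


def _first_match(ua: str, candidates: Tuple[str, ...]) -> Optional[str]:
    """Return the first candidate found in ``ua``, or None (toy matching)."""
    for candidate in candidates:
        if candidate in ua:
            return candidate
    return None


class ToyParser:
    """Implements only ``__call__``, the one method the protocol requires."""

    def __call__(self, ua: str, components: Components, /) -> ParseResult:
        # Resolve only the requested components; the rest stay None.
        want = lambda flag: flag in components
        return ParseResult(
            components=components,
            user_agent=_first_match(ua, ("Chrome", "Firefox"))
            if want(Components.USER_AGENT) else None,
            os=_first_match(ua, ("Mac OS X", "Windows"))
            if want(Components.OS) else None,
            device=_first_match(ua, ("Macintosh", "iPhone"))
            if want(Components.DEVICE) else None,
            string=ua,
        )
```

In this sketch, a caller can tell "not requested" from "requested but unmatched" by checking ``result.components``, while ``complete()`` collapses unmatched values to ``"Other"``, which mirrors the distinction between a non-match and a default fallback drawn above.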
55 changes: 53 additions & 2 deletions pyproject.toml
@@ -5,11 +5,11 @@ build-backend = "setuptools.build_meta"
[project]
name = "ua-parser"
description = "Python port of Browserscope's user agent parser"
version = "1.0.0a"
version = "1.0.0a1"
readme = "README.rst"
requires-python = ">=3.8"
dependencies = []
optional-dependencies = { yaml = ["PyYaml"] }
optional-dependencies = { yaml = ["PyYaml"], re2 = ["google-re2"] }

license = {text = "Apache 2.0"}
urls = {repository = "https://github.com/ua-parser/uap-python"}
@@ -42,3 +42,54 @@ classifiers = [
"Programming Language :: Python :: Implementation :: CPython",
"Programming Language :: Python :: Implementation :: PyPy"
]

[tool.mypy]
python_version = "3.8"
files = "src,tests"

# can't use strict because it's only global

# these two are global
warn_unused_configs = true
warn_redundant_casts = true

# these can be overridden (maybe?)
strict_equality = true
strict_concatenate = true
check_untyped_defs = true
disallow_subclassing_any = true
disallow_untyped_decorators = true
disallow_any_generics = true
disallow_untyped_calls = true
disallow_incomplete_defs = true
disallow_untyped_defs = true
no_implicit_reexport = true
warn_return_any = true

[[tool.mypy.overrides]]
module = "ua_parser.user_agent_parser"

#check_untyped_defs = false
disallow_untyped_calls = false
#disallow_incomplete_defs = false
disallow_untyped_defs = false

[[tool.mypy.overrides]]
module = [
"test_core",
"test_caches",
"test_parsers_basics",
]

#check_untyped_defs = false
#disallow_untyped_calls = false
#disallow_incomplete_defs = false
disallow_untyped_defs = false

[[tool.mypy.overrides]]
module = "test_legacy"

#check_untyped_defs = false
disallow_untyped_calls = false
#disallow_incomplete_defs = false
disallow_untyped_defs = false