Initial implementation #1

Merged
merged 28 commits into from
Jul 12, 2024
788dbe2
Initial implementation and docs
Gallaecio Jul 1, 2024
612dd72
Remove the AI mention from the docs
Gallaecio Jul 2, 2024
83d0200
Support formaction and formmethod, and raise NotImplementedError for …
Gallaecio Jul 2, 2024
564f5fc
docs/conf.py: remove leftover
Gallaecio Jul 2, 2024
d4d0b61
Use from None to hide internal exception
Gallaecio Jul 2, 2024
c227a4d
Solve issues reported by CI
Gallaecio Jul 2, 2024
29d1f6f
Solve additional CI issues
Gallaecio Jul 2, 2024
ec0a7c9
Add doctest to GitHub Actions
Gallaecio Jul 2, 2024
3475f17
Complete test coverage
Gallaecio Jul 2, 2024
ba3eb10
Install pytest for mypy
Gallaecio Jul 2, 2024
991ab50
Run pre-commit
Gallaecio Jul 2, 2024
e62c738
Allow method override
Gallaecio Jul 2, 2024
f37e269
Update docs/usage.rst
Gallaecio Jul 3, 2024
51e733f
request_from_form → form2request
Gallaecio Jul 3, 2024
0d0ded7
Add parsel support
Gallaecio Jul 3, 2024
9830c9f
Only raise NotImplementedError for the dialog method
Gallaecio Jul 3, 2024
0337a40
Do not make form and data position-only
Gallaecio Jul 3, 2024
e098d54
Support text/plain enctype, only raise NotImplementedError for mutipa…
Gallaecio Jul 3, 2024
1907424
Remove cast usages
Gallaecio Jul 3, 2024
259ddde
Allow overriding enctype
Gallaecio Jul 3, 2024
e54ae78
Shorten attribute override docs
Gallaecio Jul 3, 2024
6974d95
Minor refactoring
Gallaecio Jul 3, 2024
ba5da92
Update exception messages and test expectations after adding multipar…
Gallaecio Jul 3, 2024
4d6e25d
Cover method and enctype in the docstring
Gallaecio Jul 3, 2024
f9ec401
Improve the docstring
Gallaecio Jul 3, 2024
864a7e3
Minor doc improvements
Gallaecio Jul 12, 2024
aa5ec68
Fix typo (of → or)
Gallaecio Jul 12, 2024
689d7ee
Clarify a test scenario
Gallaecio Jul 12, 2024
3 changes: 3 additions & 0 deletions .bandit.yml
@@ -0,0 +1,3 @@
skips:
- B101 # assert_used, needed for mypy
exclude_dirs: ['tests']
3 changes: 3 additions & 0 deletions .coveragerc
@@ -0,0 +1,3 @@
[report]
exclude_lines =
    if TYPE_CHECKING:
4 changes: 4 additions & 0 deletions .flake8
@@ -1,6 +1,10 @@
[flake8]
extend-select = TC, TC1
ignore =
max-line-length = 88
per-file-ignores =
    # F401: Imported but unused
    form2request/__init__.py:F401
    # D100-D104: Missing docstring
    docs/conf.py:D100
    tests/__init__.py:D104
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -41,7 +41,7 @@ jobs:
fail-fast: false
matrix:
python-version: ['3.12']
-tox-job: ["pre-commit", "mypy", "docs", "twinecheck"]
+tox-job: ["pre-commit", "mypy", "docs", "doctest", "twinecheck"]
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,2 +1,4 @@
/.coverage
/coverage.xml
/dist/
/.tox/
12 changes: 12 additions & 0 deletions .pre-commit-config.yaml
@@ -17,8 +17,20 @@ repos:
    - flake8-debugger
    - flake8-docstrings
    - flake8-string-format
    - flake8-type-checking
- repo: https://github.com/asottile/pyupgrade
  rev: v3.16.0
  hooks:
  - id: pyupgrade
    args: [--py38-plus]
- repo: https://github.com/pycqa/bandit
  rev: 1.7.9
  hooks:
  - id: bandit
    args: [-r, -c, .bandit.yml]
- repo: https://github.com/adamchainz/blacken-docs
  rev: 1.18.0
  hooks:
  - id: blacken-docs
    additional_dependencies:
    - black==24.4.2
4 changes: 2 additions & 2 deletions README.rst
@@ -20,8 +20,8 @@ form2request

.. description starts

-``form2request`` is an AI-powered Python 3.8+ library to build HTTP requests
-out of HTML forms.
+``form2request`` is a Python 3.8+ library to build HTTP requests out of HTML
+forms.

.. description ends

Binary file removed dist/form2request-0.0.0.tar.gz
Binary file not shown.
6 changes: 5 additions & 1 deletion docs/api.rst
@@ -2,4 +2,8 @@
API reference
=============

.. autofunction:: form2request.form2request

.. autoclass:: form2request.Request
    :members:
    :undoc-members:
17 changes: 15 additions & 2 deletions docs/conf.py
@@ -9,11 +9,24 @@

html_theme = "sphinx_rtd_theme"

autodoc_member_order = "groupwise"

intersphinx_disabled_reftypes = []
intersphinx_mapping = {
    "lxml": ("https://lxml.de/apidoc/", None),
    "parsel": ("https://parsel.readthedocs.io/en/stable", None),
    "python": ("https://docs.python.org/3", None),
    "scrapy": ("https://docs.scrapy.org/en/latest", None),
}

nitpick_ignore = [
    *(
        ("py:class", cls)
        for cls in (
            # https://github.com/sphinx-doc/sphinx/issues/11225
            "FormdataType",
            "FormElement",
            "HtmlElement",
            "Selector",
            "SelectorList",
        )
    ),
]
219 changes: 218 additions & 1 deletion docs/usage.rst
@@ -2,4 +2,221 @@
Usage
=====

:ref:`Given an HTML form <form>`:

.. _parsel-example:

>>> from parsel import Selector
>>> html = b"""<form><input type="hidden" name="foo" value="bar" /></form>"""
>>> selector = Selector(body=html, base_url="https://example.com")
>>> form = selector.css("form")

You can use :func:`~form2request.form2request` to generate form submission
request data:

>>> from form2request import form2request
>>> req = form2request(form)
>>> req
Request(url='https://example.com?foo=bar', method='GET', headers=[], body=b'')

:func:`~form2request.form2request` does not make requests, but you can use its
output to build requests with any HTTP client software, e.g. with the requests_
library:

.. _requests: https://requests.readthedocs.io/en/latest/

.. _requests-example:

>>> import requests
>>> requests.request(req.method, req.url, headers=req.headers, data=req.body) # doctest: +SKIP
<Response [200]>

:func:`~form2request.form2request` supports :ref:`user-defined form data
<data>`, :ref:`choosing a specific submit button (or none) <click>`, and
:ref:`overriding form attributes <override>`.


.. _form:

Getting a form
==============

:func:`~form2request.form2request` requires an HTML form object. You can get
one using :doc:`parsel <parsel:index>`, as :ref:`seen above <parsel-example>`,
or you can use :doc:`lxml <lxml:index>`:

.. _fromstring-example:

>>> from lxml.html import fromstring
>>> root = fromstring(html, base_url="https://example.com")
>>> form = root.xpath("//form")[0]

If you use a library or framework based on :doc:`parsel <parsel:index>` or
:doc:`lxml <lxml:index>`, chances are they also let you get a form object. For
example, when using a :doc:`Scrapy <scrapy:index>` response:

>>> from scrapy.http import TextResponse
>>> response = TextResponse("https://example.com", body=html)
>>> form = response.css("form")

Here are some examples of XPath expressions that can be useful to get a form
using parsel’s :meth:`Selector.xpath <parsel.selector.Selector.xpath>` or
lxml’s :meth:`HtmlElement.xpath <lxml.html.HtmlElement.xpath>`:

- To find a form by one of its attributes, such as ``id`` or ``name``, use
  ``//form[@<attribute>="<value>"]``. For example, to find ``<form id="foo"
  …``, use ``//form[@id="foo"]``.

  When using :meth:`Selector.css <parsel.selector.Selector.css>`, ``#<id>``
  (e.g. ``#foo``) finds by ``id``, and ``[<attribute>="<value>"]`` (e.g.
  ``[name=foo]`` or ``[name="foo bar"]``) finds by any other attribute.

- To find a form by index, by order of appearance in the HTML code, use
  ``(//form)[n]``, where ``n`` is a 1-based index. For example, to find the
  2nd form, use ``(//form)[2]``.
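
The two lookups above can be tried without parsel or lxml. The following is a minimal sketch that uses the standard library's ``xml.etree.ElementTree`` as a stand-in, assuming well-formed markup. Its XPath support is limited and does not cover ``(//form)[n]``, so the index lookup goes through a Python list instead. The sample page and its ``id`` values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page with two forms.
PAGE = """
<html><body>
  <form id="search"><input name="q" /></form>
  <form id="login"><input name="user" /></form>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

# By attribute: similar in spirit to //form[@id="login"].
by_id = root.find(".//form[@id='login']")

# By index: XPath's (//form)[2] is 1-based, Python lists are 0-based.
forms = root.findall(".//form")
second = forms[2 - 1]
```

With parsel or lxml, the full XPath expressions from the list can be used directly, and real-world HTML that is not well-formed XML is handled as well.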

If you prefer, you could use the XPath of an element inside the form, and then
visit parent elements until you reach the form element. For example:

.. code-block:: python

    # Assumes the matched element is nested inside a <form> element.
    element = root.xpath('//input[@name="zip_code"]')[0]
    while True:
        if element.tag == "form":
            break
        # Move one level up in the tree.
        element = element.getparent()
    form = element


.. _data:

Setting form data
=================

Review comment (Member): Love the docs and the explanations, great work! It's something which is currently missing in Scrapy's FormRequest docs.

While there are forms made entirely of hidden fields, like :ref:`the one above
<fromstring-example>`, most often you will work with forms that expect
user-defined data:

>>> html = b"""<form><input type="text" name="foo" /></form>"""
>>> selector = Selector(body=html, base_url="https://example.com")
>>> form = selector.css("form")

Use the ``data`` parameter of :func:`~form2request.form2request` to define
the corresponding data:

>>> form2request(form, {"foo": "bar"})
Request(url='https://example.com?foo=bar', method='GET', headers=[], body=b'')

You may sometimes find forms where more than one field has the same ``name``
attribute:

>>> html = b"""<form><input type="text" name="foo" /><input type="text" name="foo" /></form>"""
>>> selector = Selector(body=html, base_url="https://example.com")
>>> form = selector.css("form")

To specify values for all same-name fields, instead of a dictionary, use an
iterable of key-value tuples:

>>> form2request(form, (("foo", "bar"), ("foo", "baz")))
Request(url='https://example.com?foo=bar&foo=baz', method='GET', headers=[], body=b'')
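
The reason a dictionary is not enough for same-name fields can be seen with the standard library alone. This is a small sketch using ``urllib.parse.urlencode``, not form2request itself: a dictionary keeps a single value per key, while an iterable of key-value tuples keeps every pair.

```python
from urllib.parse import urlencode

pairs = (("foo", "bar"), ("foo", "baz"))

# Converting to a dict keeps only the last value for each repeated key.
collapsed = dict(pairs)

query_from_pairs = urlencode(pairs)
query_from_dict = urlencode(collapsed)
```

Here ``query_from_pairs`` carries both values, while ``query_from_dict`` carries only ``baz``.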

.. _remove-data:

Sometimes you might want to prevent a field value from being included in the
generated request data, for example because the field is removed or disabled
through JavaScript, or because the field or a parent element has the
``disabled`` attribute (currently not supported by form2request):

Review comment (Member): TIL

>>> html = b"""<form><input name="foo" value="bar" disabled /></form>"""
>>> selector = Selector(body=html, base_url="https://example.com")
>>> form = selector.css("form")

To remove a field value, set it to ``None``:

>>> form2request(form, {"foo": None})
Request(url='https://example.com', method='GET', headers=[], body=b'')
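
The ``None`` convention can be pictured as a merge between the values found in the form and the user-defined data. The helper below is hypothetical and is not taken from form2request. It is a minimal sketch that assumes user data overrides form values by name and that ``None`` drops the field, and it does not try to reproduce the field order of a real submission.

```python
from typing import Iterable, List, Mapping, Optional, Tuple


def merge_form_data(
    defaults: Iterable[Tuple[str, str]],
    data: Mapping[str, Optional[str]],
) -> List[Tuple[str, str]]:
    """Hypothetical helper: combine values found in the form with user data."""
    # Keep form values only for names that the user did not mention.
    merged = [(name, value) for name, value in defaults if name not in data]
    # Add user values, skipping names explicitly set to None.
    merged.extend(
        (name, value) for name, value in data.items() if value is not None
    )
    return merged


removed = merge_form_data([("foo", "bar")], {"foo": None})
replaced = merge_form_data([("foo", "bar"), ("token", "abc")], {"foo": "baz"})
```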


.. _click:

Choosing a submit button
========================

When an HTML form is submitted, the way form submission is triggered has an
impact on the resulting request data.

Given a submit button with ``name`` and ``value`` attributes:

>>> html = b"""<form><input type="submit" name="foo" value="bar" /></form>"""
>>> selector = Selector(body=html, base_url="https://example.com")
>>> form = selector.css("form")

If you submit the form by clicking that button, those attributes are included
in the request data, which is what :func:`~form2request.form2request` does
by default:

>>> form2request(form)
Request(url='https://example.com?foo=bar', method='GET', headers=[], body=b'')

However, it is sometimes possible to submit a form without clicking a submit
button, even when there is such a button. In such cases, the button data should
not be part of the request data, so set ``click`` to ``False``:

>>> form2request(form, click=False)
Request(url='https://example.com', method='GET', headers=[], body=b'')

You may also find forms with more than one submit button:

>>> html = b"""<form><input type="submit" name="foo" value="bar" /><input type="submit" name="foo" value="baz" /></form>"""
>>> selector = Selector(body=html, base_url="https://example.com")
>>> form = selector.css("form")

By default, :func:`~form2request.form2request` clicks the first submit button:

>>> form2request(form)
Request(url='https://example.com?foo=bar', method='GET', headers=[], body=b'')

To change that, set ``click`` to the element that should be clicked:

>>> submit_baz = form.css("[value=baz]")
>>> form2request(form, click=submit_baz)
Request(url='https://example.com?foo=baz', method='GET', headers=[], body=b'')
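
The three behaviors described in this section (first button by default, no button, a specific button) can be modeled with plain tuples. The function below is a hypothetical illustration, not form2request's code: buttons are represented as ``(name, value)`` pairs rather than elements, and ``None`` stands for "not specified", which is an assumption about how a default could be expressed.

```python
from typing import List, Sequence, Tuple, Union
from urllib.parse import urlencode

Pair = Tuple[str, str]  # (name, value)


def submitted_pairs(
    fields: Sequence[Pair],
    buttons: Sequence[Pair],
    click: Union[None, bool, Pair] = None,
) -> List[Pair]:
    """Hypothetical model of how the clicked button affects submitted data."""
    pairs = list(fields)
    if click is False:
        # The form is submitted without clicking any button.
        return pairs
    if click is None or click is True:
        # Fall back to the first submit button, if the form has one.
        click = buttons[0] if buttons else None
    if click is not None:
        pairs.append(click)
    return pairs


buttons = [("foo", "bar"), ("foo", "baz")]
default_query = urlencode(submitted_pairs([], buttons))
no_click_query = urlencode(submitted_pairs([], buttons, click=False))
second_query = urlencode(submitted_pairs([], buttons, click=buttons[1]))
```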


.. _override:

Overriding form attributes
==========================

You can override the method_ and enctype_ attributes of a form:

.. _enctype: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/form#enctype
.. _method: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/form#method

>>> form2request(form, method="POST", enctype="text/plain")
Request(url='https://example.com', method='POST', headers=[('Content-Type', 'text/plain')], body=b'foo=bar')
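
The outputs in this document show form data going into the URL for ``GET`` and into the body for ``POST``. The sketch below illustrates that split with the standard library only. ``RequestData`` and ``build_request_data`` are hypothetical names, only ``GET`` and ``POST`` are considered, and the ``POST`` branch uses ``application/x-www-form-urlencoded``, the default enctype of HTML forms, rather than the ``text/plain`` override shown above.

```python
from typing import List, NamedTuple, Sequence, Tuple
from urllib.parse import urlencode


class RequestData(NamedTuple):
    """Hypothetical container with the four fields seen in the outputs."""

    url: str
    method: str
    headers: List[Tuple[str, str]]
    body: bytes


def build_request_data(
    url: str, pairs: Sequence[Tuple[str, str]], method: str = "GET"
) -> RequestData:
    encoded = urlencode(pairs)
    if method.upper() == "GET":
        # GET: form data travels in the query string, the body stays empty.
        full_url = f"{url}?{encoded}" if encoded else url
        return RequestData(full_url, "GET", [], b"")
    # POST: form data travels in the body, described by Content-Type.
    headers = [("Content-Type", "application/x-www-form-urlencoded")]
    return RequestData(url, method.upper(), headers, encoded.encode())


get_data = build_request_data("https://example.com", [("foo", "bar")])
post_data = build_request_data("https://example.com", [("foo", "bar")], "POST")
```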


.. _request:

Using request data
==================

The output of :func:`~form2request.form2request`,
:class:`~form2request.Request`, is a simple request data container:

>>> req = form2request(form)
>>> req
Request(url='https://example.com?foo=bar', method='GET', headers=[], body=b'')

While :func:`~form2request.form2request` does not make requests, you can use
its output request data to build an actual request with any HTTP client
software, like the requests_ library (see an example :ref:`above
<requests-example>`) or the :doc:`Scrapy <scrapy:index>` web scraping
framework:

.. _Scrapy: https://docs.scrapy.org/en/latest/

>>> from scrapy import Request
>>> Request(req.url, method=req.method, headers=req.headers, body=req.body)
<GET https://example.com?foo=bar>
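
The same mapping works with the standard library's ``urllib.request``. The sketch below is an illustration under assumptions: ``RequestData`` is a stand-in that mirrors the four fields shown in the ``Request`` output above, and ``to_urllib_request`` is a hypothetical helper. Nothing is sent over the network.

```python
import urllib.request
from typing import List, NamedTuple, Tuple


class RequestData(NamedTuple):
    """Stand-in with the same four fields shown in the Request output."""

    url: str
    method: str
    headers: List[Tuple[str, str]]
    body: bytes


def to_urllib_request(req: RequestData) -> urllib.request.Request:
    """Hypothetical helper: map request data to a urllib request object."""
    return urllib.request.Request(
        req.url,
        data=req.body or None,  # urllib expects None when there is no body
        headers=dict(req.headers),  # note: drops repeated header names
        method=req.method,
    )


req = RequestData("https://example.com?foo=bar", "GET", [], b"")
urllib_req = to_urllib_request(req)
# urllib.request.urlopen(urllib_req) would send it; omitted to stay offline.
```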
2 changes: 2 additions & 0 deletions form2request/__init__.py
@@ -1 +1,3 @@
"""Build HTTP requests out of HTML forms."""

from ._base import Request, form2request