gh-125063: marshal: Add version 5, improve documentation (GH-126829)
* Document that slices can be marshalled
* Deduplicate and organize the list of supported types
  in docs
* Organize the type code list in marshal.c, to make
  it more obvious that this is a versioned format
* Back-fill some historical info

Co-authored-by: Michael Droettboom <mdboom@gmail.com>
encukou and mdboom authored Nov 15, 2024
1 parent e174869 commit d00f7b1
Showing 7 changed files with 110 additions and 41 deletions.
9 changes: 5 additions & 4 deletions Doc/c-api/marshal.rst
@@ -13,11 +13,12 @@ binary mode.

Numeric values are stored with the least significant byte first.

The module supports two versions of the data format: version 0 is the
historical version, version 1 shares interned strings in the file, and upon
unmarshalling. Version 2 uses a binary format for floating-point numbers.
``Py_MARSHAL_VERSION`` indicates the current file format (currently 2).
The module supports several versions of the data format; see
the :py:mod:`Python module documentation <marshal>` for details.

.. c:macro:: Py_MARSHAL_VERSION

   The current format version. See :py:data:`marshal.version`.

.. c:function:: void PyMarshal_WriteLongToFile(long value, FILE *file, int version)
64 changes: 46 additions & 18 deletions Doc/library/marshal.rst
@@ -38,23 +38,39 @@ supports a substantially wider range of objects than marshal.
maliciously constructed data. Never unmarshal data received from an
untrusted or unauthenticated source.

There are functions that read/write files as well as functions operating on
bytes-like objects.

.. index:: object; code, code object

Not all Python object types are supported; in general, only objects whose value
is independent from a particular invocation of Python can be written and read by
this module. The following types are supported: booleans, integers, floating-point
numbers, complex numbers, strings, bytes, bytearrays, tuples, lists, sets,
frozensets, dictionaries, and code objects (if *allow_code* is true),
where it should be understood that
tuples, lists, sets, frozensets and dictionaries are only supported as long as
the values contained therein are themselves supported. The
singletons :const:`None`, :const:`Ellipsis` and :exc:`StopIteration` can also be
marshalled and unmarshalled.
For format *version* lower than 3, recursive lists, sets and dictionaries cannot
be written (see below).
this module. The following types are supported:

* Numeric types: :class:`int`, :class:`bool`, :class:`float`, :class:`complex`.
* Strings (:class:`str`) and :class:`bytes`.
  :term:`Bytes-like objects <bytes-like object>` like :class:`bytearray` are
  marshalled as :class:`!bytes`.
* Containers: :class:`tuple`, :class:`list`, :class:`set`, :class:`frozenset`,
  and (since :data:`version` 5), :class:`slice`.
  It should be understood that these are supported only if the values contained
  therein are themselves supported.
  Recursive containers are supported since :data:`version` 3.
* The singletons :const:`None`, :const:`Ellipsis` and :exc:`StopIteration`.
* :class:`code` objects, if *allow_code* is true. See note above about
  version dependence.

.. versionchanged:: 3.4

   * Added format version 3, which supports marshalling recursive lists, sets
     and dictionaries.
   * Added format version 4, which supports efficient representations
     of short strings.

.. versionchanged:: next

   Added format version 5, which allows marshalling slices.
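
As a rough illustration of the supported-types list above, the following sketch round-trips a few values through `marshal.dumps` and `marshal.loads`. The helper names `roundtrip` and `try_slice` are made up for this example. The slice case assumes an interpreter whose `marshal.version` is at least 5; on older interpreters the helper simply reports `None`.

```python
import marshal


def roundtrip(obj, version=marshal.version):
    """Serialize obj with the given format version and read it back."""
    return marshal.loads(marshal.dumps(obj, version))


def try_slice(version=marshal.version):
    """Round-trip a slice, or return None if this format version rejects it."""
    try:
        return roundtrip(slice(1, 2, 3), version)
    except ValueError:
        # Raised for unmarshallable objects, e.g. slices before version 5.
        return None


# Values of the documented types survive a round trip unchanged.
for value in (7, True, 1.5, 2 + 3j, "text", b"raw", (1, [2, {3}]), frozenset({4})):
    assert roundtrip(value) == value

# A bytearray is written as bytes, so it comes back as bytes.
restored = roundtrip(bytearray(b"ab"))
print(type(restored).__name__, restored)

# The singletons come back as the same objects.
print(roundtrip(None) is None, roundtrip(StopIteration) is StopIteration)

# Slices need format version 5; requesting an older version raises ValueError.
print(try_slice(), try_slice(version=4))
```

Catching `ValueError` in `try_slice` keeps the sketch usable on interpreters that predate version 5, where slices are not marshallable at all.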

There are functions that read/write files as well as functions operating on
bytes-like objects.

The module defines these functions:

@@ -140,11 +156,24 @@ In addition, the following constants are defined:

.. data:: version

Indicates the format that the module uses. Version 0 is the historical
format, version 1 shares interned strings and version 2 uses a binary format
for floating-point numbers.
Version 3 adds support for object instancing and recursion.
The current version is 4.
Indicates the format that the module uses.
Version 0 is the historical first version; subsequent versions
add new features.
Generally, a new version becomes the default when it is introduced.

======= =============== ====================================================
Version Available since New features
======= =============== ====================================================
1       Python 2.4      Sharing interned strings
------- --------------- ----------------------------------------------------
2       Python 2.5      Binary representation of floats
------- --------------- ----------------------------------------------------
3       Python 3.4      Support for object instancing and recursion
------- --------------- ----------------------------------------------------
4       Python 3.4      Efficient representation of short strings
------- --------------- ----------------------------------------------------
5       Python 3.14     Support for :class:`slice` objects
======= =============== ====================================================
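
To make the version table more concrete, here is a minimal sketch that passes an explicit *version* to `marshal.dumps` and looks at two of the differences listed: the float encoding (version 2) and recursive containers (version 3). The variable names are illustrative only, and the type codes `'f'` and `'g'` are the ones defined in `Python/marshal.c` as `TYPE_FLOAT` and `TYPE_BINARY_FLOAT`.

```python
import marshal

# Versions 0 and 1 write floats as text (type code 'f');
# version 2 switched to a binary encoding (type code 'g').
text_float = marshal.dumps(1.5, 0)
binary_float = marshal.dumps(1.5, 2)
print(text_float[:1], binary_float[:1])

# Either encoding is still readable by loads().
assert marshal.loads(text_float) == marshal.loads(binary_float) == 1.5

# Version 3 added references, which make recursive containers writable.
recursive = []
recursive.append(recursive)
try:
    marshal.dumps(recursive, 2)
    rejected = False
except (ValueError, RecursionError):
    # CPython reports the nesting limit as ValueError;
    # RecursionError is caught only defensively.
    rejected = True

copy = marshal.loads(marshal.dumps(recursive, 3))
print(rejected, copy[0] is copy)
```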


.. rubric:: Footnotes
@@ -154,4 +183,3 @@ In addition, the following constants are defined:
around in a self-contained form. Strictly speaking, "to marshal" means to
convert some data from internal to external form (in an RPC buffer for instance)
and "unmarshalling" for the reverse process.
2 changes: 1 addition & 1 deletion Include/marshal.h
@@ -13,7 +13,7 @@ PyAPI_FUNC(PyObject *) PyMarshal_ReadObjectFromString(const char *,
Py_ssize_t);
PyAPI_FUNC(PyObject *) PyMarshal_WriteObjectToString(PyObject *, int);

#define Py_MARSHAL_VERSION 4
#define Py_MARSHAL_VERSION 5

PyAPI_FUNC(long) PyMarshal_ReadLongFromFile(FILE *);
PyAPI_FUNC(int) PyMarshal_ReadShortFromFile(FILE *);
27 changes: 24 additions & 3 deletions Lib/test/test_marshal.py
@@ -28,6 +28,13 @@ def helper(self, sample, *extra):
finally:
os_helper.unlink(os_helper.TESTFN)

def omit_last_byte(data):
    """return data[:-1]"""
    # This file's code is used in CompatibilityTestCase,
    # but slices need marshal version 5.
    # Avoid the slice literal.
    return data[slice(0, -1)]
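
The comment in `omit_last_byte` can be checked with a small sketch. The function names below (`with_literal`, `with_call`, `has_slice_constant`) are hypothetical and not part of the test suite. The sketch assumes, without the diff confirming it, that some interpreters store a constant slice such as `[:-1]` in the code object's constants, which is what would force marshal version 5. Whether that happens depends on the interpreter, so the example inspects the result rather than asserting it for the literal form.

```python
def with_literal(data):
    return data[:-1]


def with_call(data):
    # Builds the slice at run time, so no slice object is stored as a constant.
    return data[slice(0, -1)]


def has_slice_constant(func):
    """Report whether func's code object stores a slice among its constants."""
    return any(isinstance(const, slice) for const in func.__code__.co_consts)


sample = b"abc"
assert with_literal(sample) == with_call(sample) == b"ab"

# The first value depends on the interpreter; the call form stays False.
print(has_slice_constant(with_literal), has_slice_constant(with_call))
```

Both spellings return the same result; only the contents of the compiled code object may differ.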

class IntTestCase(unittest.TestCase, HelperMixin):
def test_ints(self):
# Test a range of Python ints larger than the machine word size.
@@ -241,7 +248,8 @@ def test_bug_5888452(self):
def test_patch_873224(self):
self.assertRaises(Exception, marshal.loads, b'0')
self.assertRaises(Exception, marshal.loads, b'f')
self.assertRaises(Exception, marshal.loads, marshal.dumps(2**65)[:-1])
self.assertRaises(Exception, marshal.loads,
omit_last_byte(marshal.dumps(2**65)))

def test_version_argument(self):
# Python 2.4.0 crashes for any call to marshal.dumps(x, y)
@@ -594,6 +602,19 @@ def testNoIntern(self):
s2 = sys.intern(s)
self.assertNotEqual(id(s2), id(s))

class SliceTestCase(unittest.TestCase, HelperMixin):
    def test_slice(self):
        for obj in (
            slice(None), slice(1), slice(1, 2), slice(1, 2, 3),
            slice({'set'}, ('tuple', {'with': 'dict'}, ), self.helper.__code__)
        ):
            with self.subTest(obj=str(obj)):
                self.helper(obj)

                for version in range(4):
                    with self.assertRaises(ValueError):
                        marshal.dumps(obj, version)

@support.cpython_only
@unittest.skipUnless(_testcapi, 'requires _testcapi')
class CAPI_TestCase(unittest.TestCase, HelperMixin):
@@ -654,7 +675,7 @@ def test_read_last_object_from_file(self):
self.assertEqual(r, obj)

with open(os_helper.TESTFN, 'wb') as f:
f.write(data[:1])
f.write(omit_last_byte(data))
with self.assertRaises(EOFError):
_testcapi.pymarshal_read_last_object_from_file(os_helper.TESTFN)
os_helper.unlink(os_helper.TESTFN)
@@ -671,7 +692,7 @@ def test_read_object_from_file(self):
self.assertEqual(p, len(data))

with open(os_helper.TESTFN, 'wb') as f:
f.write(data[:1])
f.write(omit_last_byte(data))
with self.assertRaises(EOFError):
_testcapi.pymarshal_read_object_from_file(os_helper.TESTFN)
os_helper.unlink(os_helper.TESTFN)
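
The truncated-input checks in these tests go through `_testcapi`, but the same behaviour can be sketched with the public Python API alone. The helper name `fails_to_load` is made up for this example, and an in-memory `io.BytesIO` stands in for the temporary file the tests use.

```python
import io
import marshal

data = marshal.dumps(2**65)
truncated = data[:-1]  # drop the final byte


def fails_to_load(payload):
    """Return the exception type raised while unmarshalling, or None."""
    try:
        marshal.load(io.BytesIO(payload))
    except (EOFError, ValueError, TypeError) as exc:
        return type(exc)
    return None


print(fails_to_load(data), fails_to_load(truncated))
```

The complete payload loads cleanly, while the payload missing its last byte is rejected instead of producing a partial object.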
@@ -0,0 +1,2 @@
:mod:`marshal` now supports :class:`slice` objects. The marshal format
version was increased to 5.
1 change: 1 addition & 0 deletions Programs/_freeze_module.c
@@ -121,6 +121,7 @@ compile_and_marshal(const char *name, const char *text)
return NULL;
}

assert(Py_MARSHAL_VERSION >= 5);
PyObject *marshalled = PyMarshal_WriteObjectToString(code, Py_MARSHAL_VERSION);
Py_CLEAR(code);
if (marshalled == NULL) {
46 changes: 31 additions & 15 deletions Python/marshal.c
@@ -50,41 +50,52 @@ module marshal
# define MAX_MARSHAL_STACK_DEPTH 2000
#endif

/* Supported types */
#define TYPE_NULL '0'
#define TYPE_NONE 'N'
#define TYPE_FALSE 'F'
#define TYPE_TRUE 'T'
#define TYPE_STOPITER 'S'
#define TYPE_ELLIPSIS '.'
#define TYPE_INT 'i'
/* TYPE_INT64 is not generated anymore.
Supported for backward compatibility only. */
#define TYPE_INT64 'I'
#define TYPE_FLOAT 'f'
#define TYPE_BINARY_FLOAT 'g'
#define TYPE_COMPLEX 'x'
#define TYPE_BINARY_COMPLEX 'y'
#define TYPE_LONG 'l'
#define TYPE_STRING 's'
#define TYPE_INTERNED 't'
#define TYPE_REF 'r'
#define TYPE_TUPLE '('
#define TYPE_BINARY_FLOAT 'g' // Version 0 uses TYPE_FLOAT instead.
#define TYPE_BINARY_COMPLEX 'y' // Version 0 uses TYPE_COMPLEX instead.
#define TYPE_LONG 'l' // See also TYPE_INT.
#define TYPE_STRING 's' // Bytes. (Name comes from Python 2.)
#define TYPE_TUPLE '(' // See also TYPE_SMALL_TUPLE.
#define TYPE_LIST '['
#define TYPE_DICT '{'
#define TYPE_CODE 'c'
#define TYPE_UNICODE 'u'
#define TYPE_UNKNOWN '?'
// added in version 2:
#define TYPE_SET '<'
#define TYPE_FROZENSET '>'
// added in version 5:
#define TYPE_SLICE ':'
#define FLAG_REF '\x80' /* with a type, add obj to index */
// Remember to update the version and documentation when adding new types.

/* Special cases for unicode strings (added in version 4) */
#define TYPE_INTERNED 't' // Version 1+
#define TYPE_ASCII 'a'
#define TYPE_ASCII_INTERNED 'A'
#define TYPE_SMALL_TUPLE ')'
#define TYPE_SHORT_ASCII 'z'
#define TYPE_SHORT_ASCII_INTERNED 'Z'

/* Special cases for small objects */
#define TYPE_INT 'i' // All versions. 32-bit encoding.
#define TYPE_SMALL_TUPLE ')' // Version 4+

/* Supported for backwards compatibility */
#define TYPE_COMPLEX 'x' // Generated for version 0 only.
#define TYPE_FLOAT 'f' // Generated for version 0 only.
#define TYPE_INT64 'I' // Not generated any more.

/* References (added in version 3) */
#define TYPE_REF 'r'
#define FLAG_REF '\x80' /* with a type, add obj to index */


// Error codes:
#define WFERR_OK 0
#define WFERR_UNMARSHALLABLE 1
#define WFERR_NESTEDTOODEEP 2
@@ -615,6 +626,11 @@ w_complex_object(PyObject *v, char flag, WFILE *p)
        PyBuffer_Release(&view);
    }
    else if (PySlice_Check(v)) {
        if (p->version < 5) {
            w_byte(TYPE_UNKNOWN, p);
            p->error = WFERR_UNMARSHALLABLE;
            return;
        }
        PySliceObject *slice = (PySliceObject *)v;
        W_TYPE(TYPE_SLICE, p);
        w_object(slice->start, p);