From 00667195d6a5bae8bc2622577ff335b9b5947e7f Mon Sep 17 00:00:00 2001 From: Inada Naoki Date: Thu, 14 Sep 2023 05:16:50 +0900 Subject: [PATCH 1/3] update README --- README.md | 122 +++++++++++++++++++++++++----------------------------- 1 file changed, 57 insertions(+), 65 deletions(-) diff --git a/README.md b/README.md index 61f99e1f..84e6333f 100644 --- a/README.md +++ b/README.md @@ -10,53 +10,6 @@ It lets you exchange data among multiple languages like JSON. But it's faster and smaller. This package provides CPython bindings for reading and writing MessagePack data. - -## Very important notes for existing users - -### PyPI package name - -Package name on PyPI was changed from `msgpack-python` to `msgpack` from 0.5. - -When upgrading from msgpack-0.4 or earlier, do `pip uninstall msgpack-python` before -`pip install -U msgpack`. - - -### Compatibility with the old format - -You can use `use_bin_type=False` option to pack `bytes` -object into raw type in the old msgpack spec, instead of bin type in new msgpack spec. - -You can unpack old msgpack format using `raw=True` option. -It unpacks str (raw) type in msgpack into Python bytes. - -See note below for detail. - - -### Major breaking changes in msgpack 1.0 - -* Python 2 - - * The extension module does not support Python 2 anymore. - The pure Python implementation (`msgpack.fallback`) is used for Python 2. - -* Packer - - * `use_bin_type=True` by default. bytes are encoded in bin type in msgpack. - **If you are still using Python 2, you must use unicode for all string types.** - You can use `use_bin_type=False` to encode into old msgpack format. - * `encoding` option is removed. UTF-8 is used always. - -* Unpacker - - * `raw=False` by default. It assumes str types are valid UTF-8 string - and decode them to Python str (unicode) object. - * `encoding` option is removed. You can use `raw=True` to support old format. - * Default value of `max_buffer_size` is changed from 0 to 100 MiB. - * Default value of `strict_map_key` is changed to True to avoid hashdos. - You need to pass `strict_map_key=False` if you have data which contain map keys - which type is not bytes or str. - - ## Install ``` @@ -65,12 +18,9 @@ $ pip install msgpack ### Pure Python implementation -The extension module in msgpack (`msgpack._cmsgpack`) does not support -Python 2 and PyPy. - -But msgpack provides a pure Python implementation (`msgpack.fallback`) -for PyPy and Python 2. +The extension module in msgpack (`msgpack._cmsgpack`) does not support PyPy. +But msgpack provides a pure Python implementation (`msgpack.fallback`) for PyPy. ### Windows @@ -82,10 +32,6 @@ Without extension, using pure Python implementation on CPython runs slowly. ## How to use -NOTE: In examples below, I use `raw=False` and `use_bin_type=True` for users -using msgpack < 1.0. These options are default from msgpack 1.0 so you can omit them. - - ### One-shot pack & unpack Use `packb` for packing and `unpackb` for unpacking. @@ -97,16 +43,16 @@ msgpack provides `dumps` and `loads` as an alias for compatibility with ```pycon >>> import msgpack ->>> msgpack.packb([1, 2, 3], use_bin_type=True) +>>> msgpack.packb([1, 2, 3]) '\x93\x01\x02\x03' ->>> msgpack.unpackb(_, raw=False) +>>> msgpack.unpackb(_) [1, 2, 3] ``` `unpack` unpacks msgpack's array to Python's list, but can also unpack to tuple: ```pycon ->>> msgpack.unpackb(b'\x93\x01\x02\x03', use_list=False, raw=False) +>>> msgpack.unpackb(b'\x93\x01\x02\x03', use_list=False) (1, 2, 3) ``` @@ -127,11 +73,11 @@ from io import BytesIO buf = BytesIO() for i in range(100): - buf.write(msgpack.packb(i, use_bin_type=True)) + buf.write(msgpack.packb(i)) buf.seek(0) -unpacker = msgpack.Unpacker(buf, raw=False) +unpacker = msgpack.Unpacker(buf) for unpacked in unpacker: print(unpacked) ``` @@ -162,8 +108,8 @@ def encode_datetime(obj): return obj -packed_dict = msgpack.packb(useful_dict, default=encode_datetime, use_bin_type=True) -this_dict_again = msgpack.unpackb(packed_dict, object_hook=decode_datetime, raw=False) +packed_dict = msgpack.packb(useful_dict, default=encode_datetime) +this_dict_again = msgpack.unpackb(packed_dict, object_hook=decode_datetime) ``` `Unpacker`'s `object_hook` callback receives a dict; the @@ -191,8 +137,8 @@ It is also possible to pack/unpack custom data types using the **ext** type. ... return ExtType(code, data) ... >>> data = array.array('d', [1.2, 3.4]) ->>> packed = msgpack.packb(data, default=default, use_bin_type=True) ->>> unpacked = msgpack.unpackb(packed, ext_hook=ext_hook, raw=False) +>>> packed = msgpack.packb(data, default=default) +>>> unpacked = msgpack.unpackb(packed, ext_hook=ext_hook) >>> data == unpacked True ``` @@ -263,3 +209,49 @@ You can use `gc.disable()` when unpacking large message. List is the default sequence type of Python. But tuple is lighter than list. You can use `use_list=False` while unpacking when performance is important. + + +## Very important notes for old version users + +### PyPI package name + +Package name on PyPI was changed from `msgpack-python` to `msgpack` from 0.5. + +When upgrading from msgpack-0.4 or earlier, do `pip uninstall msgpack-python` before +`pip install -U msgpack`. + + +### Compatibility with the old format + +You can use `use_bin_type=False` option to pack `bytes` +object into raw type in the old msgpack spec, instead of bin type in new msgpack spec. + +You can unpack old msgpack format using `raw=True` option. +It unpacks str (raw) type in msgpack into Python bytes. + +See note below for detail. + + +### Major breaking changes in msgpack 1.0 + +* Python 2 + + * The extension module does not support Python 2 anymore. + The pure Python implementation (`msgpack.fallback`) is used for Python 2. + +* Packer + + * `use_bin_type=True` by default. bytes are encoded in bin type in msgpack. + **If you are still using Python 2, you must use unicode for all string types.** + You can use `use_bin_type=False` to encode into old msgpack format. + * `encoding` option is removed. UTF-8 is used always. + +* Unpacker + + * `raw=False` by default. It assumes str types are valid UTF-8 string + and decode them to Python str (unicode) object. + * `encoding` option is removed. You can use `raw=True` to support old format. + * Default value of `max_buffer_size` is changed from 0 to 100 MiB. + * Default value of `strict_map_key` is changed to True to avoid hashdos. + You need to pass `strict_map_key=False` if you have data which contain map keys + which type is not bytes or str. From 93e2996ef403e9ae68021eb7976b9101b190411b Mon Sep 17 00:00:00 2001 From: Inada Naoki Date: Thu, 14 Sep 2023 05:33:48 +0900 Subject: [PATCH 2/3] fixup --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 84e6333f..ec51cc2e 100644 --- a/README.md +++ b/README.md @@ -156,7 +156,7 @@ in a map, can be unpacked or skipped individually. ## Notes -### string and binary type +### string and binary type in old msgpack spec Early versions of msgpack didn't distinguish string and binary types. The type for representing both string and binary types was named **raw**. @@ -229,7 +229,7 @@ object into raw type in the old msgpack spec, instead of bin type in new msgpack You can unpack old msgpack format using `raw=True` option. It unpacks str (raw) type in msgpack into Python bytes. -See note below for detail. +See [string and binary type in old msgpack spec](#string-and-binary-type-in-old-msgpack-spec) for detail. ### Major breaking changes in msgpack 1.0 From 584955930d82bd848326e0dae39cac13eadcf1bb Mon Sep 17 00:00:00 2001 From: Inada Naoki Date: Thu, 14 Sep 2023 05:42:00 +0900 Subject: [PATCH 3/3] update more --- README.md | 36 ++++++++++++++---------------------- 1 file changed, 14 insertions(+), 22 deletions(-) diff --git a/README.md b/README.md index ec51cc2e..d3247bdf 100644 --- a/README.md +++ b/README.md @@ -211,9 +211,9 @@ But tuple is lighter than list. You can use `use_list=False` while unpacking when performance is important. -## Very important notes for old version users +## Major breaking changes in the history -### PyPI package name +### msgpack 0.5 Package name on PyPI was changed from `msgpack-python` to `msgpack` from 0.5. @@ -221,37 +221,29 @@ When upgrading from msgpack-0.4 or earlier, do `pip uninstall msgpack-python` be `pip install -U msgpack`. -### Compatibility with the old format +### msgpack 1.0 -You can use `use_bin_type=False` option to pack `bytes` -object into raw type in the old msgpack spec, instead of bin type in new msgpack spec. - -You can unpack old msgpack format using `raw=True` option. -It unpacks str (raw) type in msgpack into Python bytes. - -See [string and binary type in old msgpack spec](#string-and-binary-type-in-old-msgpack-spec) for detail. - - -### Major breaking changes in msgpack 1.0 - -* Python 2 +* Python 2 support * The extension module does not support Python 2 anymore. The pure Python implementation (`msgpack.fallback`) is used for Python 2. + + * msgpack 1.0.6 drops official support of Python 2.7, as pip and + GitHub Action (setup-python) no longer support Python 2.7. * Packer - * `use_bin_type=True` by default. bytes are encoded in bin type in msgpack. - **If you are still using Python 2, you must use unicode for all string types.** - You can use `use_bin_type=False` to encode into old msgpack format. - * `encoding` option is removed. UTF-8 is used always. + * Packer uses `use_bin_type=True` by default. + Bytes are encoded in bin type in msgpack. + * The `encoding` option is removed. UTF-8 is used always. * Unpacker - * `raw=False` by default. It assumes str types are valid UTF-8 string + * Unpacker uses `raw=False` by default. It assumes str types are valid UTF-8 string and decode them to Python str (unicode) object. - * `encoding` option is removed. You can use `raw=True` to support old format. - * Default value of `max_buffer_size` is changed from 0 to 100 MiB. + * `encoding` option is removed. You can use `raw=True` to support old format (e.g. unpack into bytes, not str). + * Default value of `max_buffer_size` is changed from 0 to 100 MiB to avoid DoS attack. + You need to pass `max_buffer_size=0` if you have large but safe data. * Default value of `strict_map_key` is changed to True to avoid hashdos. You need to pass `strict_map_key=False` if you have data which contain map keys which type is not bytes or str.