BIP: 276
Layer: Applications
Title: Scheme for encoding typed bitcoin data
Author: Roger Taylor <roger.taylor.email@gmail.com>
Status: Draft
Type: Standards Track
Created: 2019-10-11
License: PD
This BIP proposes a scheme for encoding typed bitcoin related data in a user friendly way.
In the past, due to the constraints placed on the Bitcoin protocol, the ways it could be used were artificially limited and in most cases it was enough to share addresses, or to copy and paste them where needed. Because addresses were relatively short, and it was possible that a user might choose to type them in manually, base58 was a useful choice.
With the restoration of the protocol, and the wider variety of possibilities such as payment destinations that are scripts which cannot be summarised by an address, it becomes necessary to define a way to copy and paste data of arbitrary length. No real person will be typing addresses or scripts by hand, so we do not need to care about keeping the encoding short, or about the danger of characters being mistranscribed. Instead we should aim to give the user a general context of what they are pasting, by using a textual prefix. We should also aim to give developers some easy insight into the data, by not attempting to pack it and by encoding all data following the prefix in hexadecimal, a format that can to some degree be read visually.
We intentionally retain what base58 offers, including the relevant network, a version for the prefixed data, and a checksum using the same algorithm. However, we aim to do so in a less opaque manner.
The scheme follows this form:
<prefix>:<version hex><network hex><data hex><checksum hex>
The version relates to this encoding structure. It has no bearing on the format of the embedded data; any type of data encoded in this structure that has its own subformat is responsible for taking care of its own versioning.
Where the currently supported versions are:
Version | Description |
---|---|
1 | The initial version. |
Where the currently supported prefixes are:
Prefix | Description |
---|---|
bitcoin-script | A complete script as intended to be included in a transaction output. |
bitcoin-template | ... TODO: some standardised specification for incomplete scripts |
Where the encoded segments are:
Segment | Structure | Description |
---|---|---|
version hex | unsigned byte[1] as hexadecimal | Provides the ability to update the structure of the data that follows it. |
network hex | unsigned byte[1] as hexadecimal | Provides the ability to specify that the data is only valid for use on the given network. |
data hex | unsigned byte[] as hexadecimal | The payload data. |
checksum hex | unsigned byte[4] as hexadecimal | Provides the ability to detect whether the complete encoding is fully intact. |
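
Because every segment after the prefix has a fixed width except the data, the layout in the table above can be illustrated by slicing the hexadecimal text directly. The following is a minimal sketch only: the `Segments` container and the `split_segments` helper are hypothetical names introduced for illustration, and no checksum or version validation is performed here.

```python
from typing import NamedTuple


class Segments(NamedTuple):
    """Hypothetical container for the raw hexadecimal segments."""
    prefix: str
    version_hex: str
    network_hex: str
    data_hex: str
    checksum_hex: str


def split_segments(text: str) -> Segments:
    """Slice an encoded string into its segments without validating them."""
    prefix, payload_hex = text.split(":", 1)
    # One byte is two hexadecimal characters, so the fixed-width segments
    # take 2 (version) + 2 (network) + 8 (checksum) = 12 characters.
    if len(payload_hex) < 12 or len(payload_hex) % 2 != 0:
        raise ValueError("payload is too short or not a whole number of bytes")
    return Segments(
        prefix=prefix,
        version_hex=payload_hex[0:2],
        network_hex=payload_hex[2:4],
        data_hex=payload_hex[4:-8],
        checksum_hex=payload_hex[-8:],
    )


# The example string used by this document's reference implementation.
parts = split_segments("bitcoin-script:010166616b65207363726970746f0cd86a")
print(parts)
```

The data segment is whatever remains between the network byte and the final four checksum bytes, which is why it can be of any length.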
Where the currently supported network byte values are:
Network | Value |
---|---|
not applicable | 0 |
mainnet | 1 |
testnet | 2 |
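
One way an implementation might represent these values is as an enumeration, so that unknown network bytes are rejected when parsing. This is a sketch under that assumption; the `Network` class and the `network_to_hex` and `network_from_hex` helpers are hypothetical names, not part of this proposal.

```python
from enum import IntEnum


class Network(IntEnum):
    """Network byte values from the table above."""
    NOT_APPLICABLE = 0
    MAINNET = 1
    TESTNET = 2


def network_to_hex(network: Network) -> str:
    """Render the network as the single hexadecimal byte used in the encoding."""
    return bytes([int(network)]).hex()


def network_from_hex(network_hex: str) -> Network:
    """Parse the network segment; raises ValueError for unsupported values."""
    return Network(int(network_hex, 16))


print(network_to_hex(Network.TESTNET))
print(network_from_hex("01").name)
```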
The data bytes are of arbitrary length, and are followed by four checksum bytes. The checksum is created by taking the first four bytes of the double SHA256 hash of the text preceding it. Note that it is not the checksum of the data segment, but the checksum of the rest of the encoded string, and is intended to ensure the entire encoded text is intact.
The checksum is calculated over the prefix and all other segments:
<prefix>:<version hex><network hex><data hex>
Then the hexadecimal encoded checksum is appended to that text:
<prefix>:<version hex><network hex><data hex><checksum hex>
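The two steps above can be sketched in isolation, to show that the checksum covers the prefix text as well as the hexadecimal segments. The helper names `append_checksum` and `checksum_matches` are assumptions made for this illustration.

```python
from hashlib import sha256


def append_checksum(text: str) -> str:
    """Append the first four bytes of the double SHA256 of the text, as hex."""
    digest = sha256(sha256(text.encode()).digest()).digest()
    return text + digest[:4].hex()


def checksum_matches(encoded: str) -> bool:
    """Recompute the checksum over everything before the last 8 hex characters."""
    return append_checksum(encoded[:-8]) == encoded


# <prefix>:<version hex><network hex><data hex>
unchecked = "bitcoin-script:" + "01" + "01" + b"fake script".hex()
encoded = append_checksum(unchecked)

# Changing only the prefix leaves the hexadecimal payload untouched,
# but the checksum no longer matches the preceding text.
tampered = encoded.replace("bitcoin-script", "bitcoin-template")

print(checksum_matches(encoded), checksum_matches(tampered))
```

Because the prefix is part of the checksummed text, data encoded under one prefix cannot be relabelled with another without the mismatch being detected.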
The data segment for version 1 encodings is solely the bytes that make up a script.
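As an illustration of what those bytes might look like, the sketch below assembles a conventional pay-to-public-key-hash output script and renders it as the data segment. The `p2pkh_script` helper is a hypothetical name, and the all-zero public key hash is a placeholder rather than a real destination; any other complete output script would be handled the same way.

```python
# Standard Bitcoin script opcode values.
OP_DUP = 0x76
OP_HASH160 = 0xA9
OP_EQUALVERIFY = 0x88
OP_CHECKSIG = 0xAC


def p2pkh_script(pubkey_hash: bytes) -> bytes:
    """Build OP_DUP OP_HASH160 <20 byte hash> OP_EQUALVERIFY OP_CHECKSIG."""
    if len(pubkey_hash) != 20:
        raise ValueError("public key hash must be 20 bytes")
    # 0x14 is the direct push of the following 20 bytes.
    return (
        bytes([OP_DUP, OP_HASH160, 0x14])
        + pubkey_hash
        + bytes([OP_EQUALVERIFY, OP_CHECKSIG])
    )


placeholder_hash = bytes(20)  # placeholder: twenty zero bytes
script = p2pkh_script(placeholder_hash)
data_hex = script.hex()  # this is the <data hex> segment
print(data_hex)
```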
When looking to add additional types, or to extend the data embedded in a type, it is worth considering whether it is better to wrap data you have encoded in this scheme in a higher level protocol.
One possible extension to the script type is an expiration. If someone copies a payment destination and a UI eventually receives it, it is useful to be able to warn the user that the given data is old and that they might want to obtain a fresh version. However, it is more than likely a much better fit to include the encoded script, along with this context, in a higher level protocol.
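To make the idea of a wrapping protocol concrete, the following sketch places an encoded script inside a JSON envelope that carries an expiry time. Everything about the envelope is an assumption made for illustration: the `script` and `expires_at` field names and the `wrap_with_expiry` and `is_stale` helpers are hypothetical and are not defined by this proposal.

```python
import json
import time
from typing import Optional


def wrap_with_expiry(encoded_script: str, lifetime_seconds: int,
                     now: Optional[float] = None) -> str:
    """Wrap an encoded script in a hypothetical JSON envelope with an expiry."""
    issued = time.time() if now is None else now
    envelope = {
        "script": encoded_script,  # the encoded text, left untouched
        "expires_at": int(issued + lifetime_seconds),  # seconds since the epoch
    }
    return json.dumps(envelope)


def is_stale(envelope_json: str, now: Optional[float] = None) -> bool:
    """Report whether a UI should warn the user that the data is old."""
    envelope = json.loads(envelope_json)
    current = time.time() if now is None else now
    return current >= envelope["expires_at"]


encoded = "bitcoin-script:010166616b65207363726970746f0cd86a"
envelope_json = wrap_with_expiry(encoded, lifetime_seconds=600, now=1_000_000)
print(is_stale(envelope_json, now=1_000_300))
print(is_stale(envelope_json, now=1_000_600))
```

The encoded script is carried unchanged, so the encoding itself needs no new version to support the expiry.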
Another possible extension is what the script is, and how it should be used. Is it spendable? Is it an output script? Is it an input script? These are also things that can be defined based on the context in which the encoded script is supplied. An example of a higher level protocol might be one of the URL formats that includes an address. If an encoded script is included in lieu of an address, then it is implicitly a payment destination, given in the context of an amount and potentially other related data.
```python
from hashlib import sha256
from typing import Tuple

PREFIX_SCRIPT = "bitcoin-script"
PREFIX_TEMPLATE = "bitcoin-template"

CURRENT_VERSION = 1

NETWORK_MAINNET = 1


def _checksum(data: bytes) -> bytes:
    return sha256(sha256(data).digest()).digest()[0:4]


def bip276_encode(prefix: str, data: bytes, network: int = NETWORK_MAINNET,
                  version: int = CURRENT_VERSION) -> str:
    assert version == CURRENT_VERSION
    payload_bytes = bytearray()
    payload_bytes.append(version)
    payload_bytes.append(network)
    payload_bytes.extend(data)
    payload_hex = payload_bytes.hex()
    result = prefix + ":" + payload_hex
    return result + _checksum(result.encode()).hex()


class ChecksumMismatchError(Exception):
    pass


def bip276_decode(text: str) -> Tuple[str, int, int, bytes]:
    text = text.strip()
    prefix, payload_hex = text.split(":", 1)
    payload_bytes = bytes.fromhex(payload_hex)
    checksum = payload_bytes[-4:]
    version = payload_bytes[0]
    assert version == CURRENT_VERSION
    network = payload_bytes[1]
    data = payload_bytes[2:-4]
    checksummed_bytes = text[:-8].encode()
    local_checksum = _checksum(checksummed_bytes)
    if checksum != local_checksum:
        raise ChecksumMismatchError(f"expected {checksum.hex()}, got {local_checksum.hex()}")
    return prefix, version, network, data
```
Expected output:
```python
>>> s = bip276_encode(PREFIX_SCRIPT, b"fake script")
>>> s
'bitcoin-script:010166616b65207363726970746f0cd86a'
>>> bip276_decode(s)
('bitcoin-script', 1, 1, b'fake script')
```
Thanks go to Curtis Ellis for his assistance in developing this encoding.
This document is placed in the public domain.