Base-UTF8 is an encoding for storing arbitrary binary data in environments that accept only UTF-8 text. Encoding proceeds in the following steps:
- Add a reserved byte at the beginning of the data to store the length of padding to be added later.
- Pad the end of the data with zero bytes until its total length, including the reserved byte, is a multiple of 7. This ensures the data can be divided into blocks of exactly 7 bytes.
- Store the number of padding bytes in the reserved byte at the beginning of the data.
- Divide the padded data into blocks of 7 bytes each.
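- Encode each 7-byte block as an 8-byte block. The first byte of the encoded block holds, in its low 7 bits, the most significant bit of each of the 7 bytes in the original block. The remaining 7 bytes of the encoded block are the bytes of the original block with their most significant bit set to 0. Every encoded byte is therefore in the range 0x00 to 0x7F, which is a valid single-byte UTF-8 sequence.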
- Output the encoded data.
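
The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. The function names `base_utf8_encode` and `base_utf8_decode` are hypothetical. The order in which the seven high bits are packed into the first byte of each block is an assumption, since the description does not specify it. The decoder is simply the inverse implied by the encoding steps.

```python
def base_utf8_encode(data: bytes) -> bytes:
    """Encode arbitrary bytes so that every output byte is <= 0x7F."""
    # One reserved byte is prepended, then zero bytes are appended until
    # the total length (reserved byte included) is a multiple of 7.
    pad = (-(len(data) + 1)) % 7
    padded = bytes([pad]) + data + bytes(pad)

    out = bytearray()
    for i in range(0, len(padded), 7):
        block = padded[i:i + 7]
        # Assumption: the high bit of block[j] is stored in bit (6 - j)
        # of the header byte. The description does not fix this order.
        header = 0
        for b in block:
            header = (header << 1) | (b >> 7)
        out.append(header)
        # The remaining 7 bytes are the originals with the high bit cleared.
        out.extend(b & 0x7F for b in block)
    return bytes(out)


def base_utf8_decode(encoded: bytes) -> bytes:
    """Inverse of base_utf8_encode, using the same assumed bit order."""
    if not encoded or len(encoded) % 8 != 0:
        raise ValueError("encoded length must be a non-zero multiple of 8")

    padded = bytearray()
    for i in range(0, len(encoded), 8):
        header = encoded[i]
        for j, b in enumerate(encoded[i + 1:i + 8]):
            high = (header >> (6 - j)) & 1
            padded.append(b | (high << 7))

    pad = padded[0]  # the reserved byte holds the padding length
    if pad > 6:
        raise ValueError("invalid padding length")
    return bytes(padded[1:len(padded) - pad])


if __name__ == "__main__":
    sample = bytes([0x00, 0x80, 0xFF, 0x41])
    encoded = base_utf8_encode(sample)
    encoded.decode("utf-8")  # does not raise: every byte is <= 0x7F
    assert base_utf8_decode(encoded) == sample
```

Because each 7-byte block becomes 8 bytes, the encoded output is always a multiple of 8 bytes long, and even an empty input produces one 8-byte block (the reserved byte plus 6 padding bytes).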