Hides/reveals secret messages in text. Optimized for instant messaging.
Messages are encoded as zero-width Unicode characters, as a casual form of steganography.
Web app: https://dblspk.io/
Chrome extension:
Tab / Shift+Tab — cycle through fields
The encoded message is automatically copied when you tab to or click on its field.
Drag and drop files onto the page to encode them.
- File transmission
- CRC-32 error checking
- Multi-message decoding
- Linkifies URLs, emails, phone numbers, and Twitter hashtags
- Preview URLs for images, video, and audio
- Progressive Web App — can be pinned to your Android homescreen
What can be hidden:
- Text
- URLs (similar use to QR codes)
- Watermarks
- Small files
Possible places for storage:
- Chat messages
- Social media posts
- User profile information
- Forums
- HTML
- Emails
- Digital documents
- File names (very short messages only)
Unicode contains some zero-width, non-printing characters. We use 16 of them as hexadecimal digits, so any data can be encoded at 4 bits per character, using the following arbitrary mapping:
Decimal | Hex | Binary | Character | Description |
---|---|---|---|---|
0 | 0 | 0000 | U+200C | zero-width non-joiner |
1 | 1 | 0001 | U+200D | zero-width joiner |
2 | 2 | 0010 | U+2060 | word joiner |
3 | 3 | 0011 | U+2061 | function application |
4 | 4 | 0100 | U+2062 | invisible times |
5 | 5 | 0101 | U+2063 | invisible separator |
6 | 6 | 0110 | U+2064 | invisible plus |
7 | 7 | 0111 | U+206A | inhibit symmetric swapping |
8 | 8 | 1000 | U+206B | activate symmetric swapping |
9 | 9 | 1001 | U+206C | inhibit Arabic form shaping |
10 | A | 1010 | U+206D | activate Arabic form shaping |
11 | B | 1011 | U+206E | national digit shapes |
12 | C | 1100 | U+206F | nominal digit shapes |
13 | D | 1101 | U+FE00 | variation selector-1 |
14 | E | 1110 | U+FE01 | variation selector-2 |
15 | F | 1111 | U+FEFF | zero-width non-breaking space |
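The mapping in the table above can be sketched in a few lines of Python. This is an illustration rather than the project's actual code: the function names are made up for this example, and emitting the high nibble of each byte first is an assumption that the table does not confirm.

```python
# Code points from the table above, indexed by nibble value 0..15.
ALPHABET = [
    0x200C, 0x200D, 0x2060, 0x2061, 0x2062, 0x2063, 0x2064, 0x206A,
    0x206B, 0x206C, 0x206D, 0x206E, 0x206F, 0xFE00, 0xFE01, 0xFEFF,
]
NIBBLE_TO_CHAR = [chr(cp) for cp in ALPHABET]
CHAR_TO_NIBBLE = {ch: value for value, ch in enumerate(NIBBLE_TO_CHAR)}


def encode_bytes(data: bytes) -> str:
    """Turn each byte into two invisible characters (assumed: high nibble first)."""
    out = []
    for byte in data:
        out.append(NIBBLE_TO_CHAR[byte >> 4])
        out.append(NIBBLE_TO_CHAR[byte & 0x0F])
    return "".join(out)


def decode_chars(text: str) -> bytes:
    """Rebuild bytes from the encoding characters in text, skipping everything else."""
    nibbles = [CHAR_TO_NIBBLE[ch] for ch in text if ch in CHAR_TO_NIBBLE]
    if len(nibbles) % 2:
        raise ValueError("odd number of encoding characters")
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))
```

For example, the signature byte 0x44 becomes two U+2062 characters, and `decode_chars("a" + encode_bytes(b"hi") + "b")` returns `b"hi"` because visible characters are ignored.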
A header, encoded in the same way, is prepended:
Size | Field | Description |
---|---|---|
1 byte | Protocol signature | ASCII letter "D", or 0x44 |
1 byte | Protocol version | 0x00 |
4 bytes | CRC-32 | Calculated on decoded data field |
1 byte | Data type | 0x00: Encryption wrapper; 0x01: UTF-8 text; 0x02: File |
1+ bytes | Data length | Variable length quantity, representing length of the data field |
Varies | Data | Depends on data type |
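As a rough sketch of the layout in the header table, the following Python builds the raw bytes of one message before they are mapped to invisible characters. Several details are assumptions, since the table does not state them: the CRC-32 is the common zlib variant stored big-endian, and the variable-length quantity uses the MIDI-style format (7 bits per byte, most significant group first, high bit set on every byte except the last). The function names are hypothetical; the protocol specification is the authority.

```python
import zlib


def encode_vlq(n: int) -> bytes:
    """Assumed MIDI-style VLQ: 7-bit groups, big-endian, continuation bit 0x80."""
    if n < 0:
        raise ValueError("length must be non-negative")
    groups = [n & 0x7F]  # last byte has the continuation bit clear
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(groups))


def build_message(data: bytes, data_type: int = 0x01) -> bytes:
    """Prepend signature, version, CRC-32, data type and length to the data."""
    crc = zlib.crc32(data)  # calculated on the data field only
    return (
        b"D"                      # protocol signature, 0x44
        + b"\x00"                 # protocol version
        + crc.to_bytes(4, "big")  # assumed byte order
        + bytes([data_type])      # 0x01 = UTF-8 text
        + encode_vlq(len(data))
        + data
    )
```

Under these assumptions, `build_message(b"hi")` is 10 bytes long: an 8-byte header followed by the 2 data bytes.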
The resulting string of invisible characters is then inserted at a random location in the cover text. More details in the protocol specification.
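A minimal sketch of the insertion step, assuming that any index in the cover text (including the very start and end) is an acceptable position. The `embed` function and its `rng` parameter are hypothetical names used only for this example.

```python
import random
from typing import Optional


def embed(cover: str, hidden: str, rng: Optional[random.Random] = None) -> str:
    """Insert the invisible string at a random index of the cover text."""
    rng = rng or random.Random()
    # randint is inclusive on both ends, so the payload may land before
    # the first character or after the last one.
    position = rng.randint(0, len(cover))
    return cover[:position] + hidden + cover[position:]
```

Passing a seeded `random.Random` makes the position reproducible, which is convenient in tests.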
Each invisible character represents 4 bits but takes 3 bytes (24 bits) to store in UTF-8. The hidden data therefore occupies 6 times as much space as the original data, not counting the header and cover text.
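The arithmetic can be checked directly. The sketch below assumes UTF-8 storage and ignores the header and cover text; `encoded_size_utf8` is a hypothetical helper name.

```python
# The 16 code points used as hexadecimal digits.
ALPHABET = [
    0x200C, 0x200D, 0x2060, 0x2061, 0x2062, 0x2063, 0x2064, 0x206A,
    0x206B, 0x206C, 0x206D, 0x206E, 0x206F, 0xFE00, 0xFE01, 0xFEFF,
]

# Set of UTF-8 lengths across the alphabet; a single value means every
# encoding character costs the same number of bytes.
BYTES_PER_CHAR = {len(chr(cp).encode("utf-8")) for cp in ALPHABET}


def encoded_size_utf8(raw_len: int) -> int:
    """UTF-8 bytes needed to hide raw_len bytes: 2 characters per byte, 3 bytes each."""
    return raw_len * 2 * 3
```

So 100 bytes of original data occupy 600 bytes once hidden.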
When decoding, input is treated as a stream of an arbitrary number of messages. This allows users to paste in any text and decode all messages within it at once. It also allows messages that have been split into chunks to be decoded, as long as each chunk contains an even number of encoding characters, to maintain byte alignment.
Each message header stores the length of the data field, to allow decoding of multiple concatenated messages.
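To illustrate how the stored length makes concatenated messages separable, here is a sketch that walks a byte stream (already converted back from invisible characters) and collects every message it can verify. It is not the project's decoder. It assumes a big-endian zlib CRC-32 and a MIDI-style variable-length quantity, and it simply stops at the first malformed or corrupted message.

```python
import zlib


def read_vlq(buf: bytes, pos: int):
    """Read an assumed MIDI-style VLQ starting at pos; return (value, next_pos)."""
    value = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated length field")
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # continuation bit clear: last byte of the length
            return value, pos


def parse_messages(buf: bytes):
    """Return a list of (data_type, data) for each valid message in buf."""
    messages = []
    pos = 0
    while pos < len(buf):
        try:
            if buf[pos] != 0x44 or buf[pos + 1] != 0x00:
                break  # wrong signature or unknown version
            crc = int.from_bytes(buf[pos + 2:pos + 6], "big")
            data_type = buf[pos + 6]
            length, start = read_vlq(buf, pos + 7)
        except (IndexError, ValueError):
            break  # header was cut short
        data = buf[start:start + length]
        if len(data) != length or zlib.crc32(data) != crc:
            break  # truncated or corrupted data field
        messages.append((data_type, data))
        pos = start + length  # the next message starts right after the data
    return messages
```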
During parsing, the decoder keeps track of consecutive sequences of encoding characters in the cover text. If some encoding characters have been corrupted or truncated, the CRC fails and the remainder of a sequence must be discarded. However, decoding will resume from the next sequence. This prevents one corrupted message from making all following messages undecodable.
Sequences of insufficient length, such as might occur naturally when encoding characters are used for their original purpose, are discarded.
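The sequence tracking described above might look roughly like this. The sketch only finds runs of consecutive encoding characters and drops the short ones; decoding and CRC checking of each run are left out. The threshold of 16 characters is an assumption derived from the smallest possible header in the table (8 bytes, 2 characters per byte), and the real decoder may use a different cutoff.

```python
import re

# Character class covering the 16 encoding characters.
RUN = re.compile("[\u200c\u200d\u2060-\u2064\u206a-\u206f\ufe00\ufe01\ufeff]+")

# Assumed minimum: an 8-byte header with an empty data field.
MIN_CHARS = 16


def find_sequences(text: str):
    """Return each run of consecutive encoding characters that is long enough."""
    return [m.group() for m in RUN.finditer(text) if len(m.group()) >= MIN_CHARS]
```

Because each run is returned separately, a caller can decode them one at a time and move on to the next run when a CRC check fails.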
The following planned features are defined in the protocol specification:
- Automatic compression, only when size will be reduced
- Optional built-in encryption for convenience (users can still provide their own encryption without this feature)
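The "only when size will be reduced" rule for compression can be sketched as follows. The compression algorithm is not named here, so `zlib` is a stand-in, and the function name and return shape are hypothetical.

```python
import zlib


def maybe_compress(data: bytes):
    """Return (compressed_flag, payload), compressing only if it saves space."""
    candidate = zlib.compress(data)  # stand-in algorithm, not from the spec
    if len(candidate) < len(data):
        return True, candidate
    # Short or high-entropy inputs often grow when compressed; keep them as-is.
    return False, data
```

A short input such as `b"hi"` is returned unchanged, while a long repetitive input is replaced by its compressed form.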
To suggest a feature, please create an issue.
- Produces no visible alteration in the text.
- Can theoretically store a near-unlimited amount of data regardless of length of the cover text.
- Can be used with applications that do not support file transfers.
- Reduces suspicion by not requiring the frequent transfer of large files during communication.
- Can be filtered or corrupted by applications that do not support Unicode, or that attempt to format user input.
- Extremely easy to detect. Any digital text can be checked for the possible presence of a message by pasting it into a decoder, or a text editor that displays non-printing characters. Large messages may create line breaks in some applications.
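To show how little effort detection takes, the following sketch counts the encoding characters in a piece of text and makes them visible. The function names are hypothetical. A non-zero count is not proof of a hidden message, since these characters also occur in ordinary text.

```python
# The 16 code points used by the encoding scheme.
ENCODING_CODE_POINTS = frozenset([
    0x200C, 0x200D, 0x2060, 0x2061, 0x2062, 0x2063, 0x2064, 0x206A,
    0x206B, 0x206C, 0x206D, 0x206E, 0x206F, 0xFE00, 0xFE01, 0xFEFF,
])


def count_hidden_characters(text: str) -> int:
    """Count how many characters in text belong to the encoding alphabet."""
    return sum(1 for ch in text if ord(ch) in ENCODING_CODE_POINTS)


def reveal(text: str) -> str:
    """Replace each encoding character with a visible <U+XXXX> marker."""
    return "".join(
        "<U+%04X>" % ord(ch) if ord(ch) in ENCODING_CODE_POINTS else ch
        for ch in text
    )
```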
If you are serious about concealing your payload, you should use another form of steganography.
As with any method of communication, security is only as good as the encryption applied. This only provides a casual level of security through obscurity.
This project began at Cal Hacks 3.0 under a much less memorable name.
- Joshua Fan — web app
- Samuel Arnold — encoding algorithm and Python app
- Nitzan Orr — decoding algorithm
Cross-browser testing courtesy of BrowserStack.