Doublespeak

Hides/reveals secret messages in text. Optimized for instant messaging.

Messages are encoded as zero-width Unicode characters, as a casual form of steganography.

Web app: https://dblspk.io/

Chrome extension:

Usage

Web app

Tab / Shift+Tab — cycle through fields

Encoded message is automatically copied by tabbing to or clicking on the field.

Drag and drop files onto page to encode.

Features

File transmission
CRC-32 error checking
Multi-message decoding
Linkifies URLs, emails, phone numbers, and Twitter hashtags
Preview URLs for images, video, and audio
Progressive Web App — can be pinned to your Android homescreen

Possible uses

What can be hidden:

Text
URLs (similar use to QR codes)
Watermarks
Small files

Possible places for storage:

Chat messages
Social media posts
User profile information
Forums
HT⁢⁢‌‌⁮⁯︁⁮︀⁢⁠⁣‌‍‍⁡⁣⁬⁤⁪⁣⁠⁪⁪⁠⁤⁣⁠‌⁪‌⁪⁠⁤⁣⁪⁢⁪⁢⁪⁬⁠‌⁤⁪⁤⁤⁤⁢⁠︁ML
Emails
Digital documents
File names (very short messages only)

How it works

Unicode contains some zero-width, unprintable characters. We use 16 of them to encode any data in hexadecimal, using our arbitrary encoding scheme:

Decimal	Hex	Binary	Character	Description
0	0	0000	`U+200C`	zero-width non-joiner
1	1	0001	`U+200D`	zero-width joiner
2	2	0010	`U+2060`	word joiner
3	3	0011	`U+2061`	function application
4	4	0100	`U+2062`	invisible times
5	5	0101	`U+2063`	invisible separator
6	6	0110	`U+2064`	invisible plus
7	7	0111	`U+206A`	inhibit symmetric swapping
8	8	1000	`U+206B`	activate symmetric swapping
9	9	1001	`U+206C`	inhibit Arabic form shaping
10	A	1010	`U+206D`	activate Arabic form shaping
11	B	1011	`U+206E`	national digit shapes
12	C	1100	`U+206F`	nominal digit shapes
13	D	1101	`U+FE00`	variation selector-1
14	E	1110	`U+FE01`	variation selector-2
15	F	1111	`U+FEFF`	zero-width non-breaking space

A header, encoded in the same way, is prepended:

Size	Field	Description
1 byte	Protocol signature	ASCII letter "D", or `0x44`
1 byte	Protocol version	`0x00`
4 bytes	CRC-32	Calculated on decoded data field
1 byte	Data type	`0x00`: Encryption wrapper `0x01`: UTF-8 text `0x02`: File
1+ bytes	Data length	Variable length quantity, representing length of the data field
Varies	Data	Depends on data type

The resulting string of invisible characters is then inserted at a random location in the cover text. More details in the protocol specification.

Efficiency

Each invisible character represents 4 bits, while taking 3 bytes (24 bits) to store. Thus, the hidden data consumes 6 times as much memory as the original data, not including header data and cover text.

Robustness

Multiple/split messages

When decoding, input is treated as a stream of an arbitrary number of messages. This allows users to paste in any text and decode all messages within at once. This also allows messages that have been split into chunks to be decoded, as long each chunk contains an even number of encoding characters, to maintain byte alignment.

Concatenated messages

Each message header stores the length of the data field, to allow decoding of multiple concatenated messages.

Corrupted and uncorrupted messages mixed together

During parsing, the decoder keeps track of consecutive sequences of encoding characters in the cover text. If some encoding characters have been corrupted or truncated, the CRC fails and the remainder of a sequence must be discarded. However, decoding will resume from the next sequence. This prevents one corrupted message from making all following messages undecodable.

Sequences of insufficient length, such as might occur naturally when encoding characters are used for their original purpose, are discarded.

Roadmap

The following planned features are defined in the protocol specification:

Automatic compression, only when size will be reduced
Optional built-in encryption for convenience (users can still provide own encryption without this feature)

To-do list

To suggest a feature, please create an issue.

Comparison to other steganography techniques

Pros

Produces no visible alteration in the text.
Can theoretically store a near-unlimited amount of data regardless of length of the cover text.
Can be used with applications that do not support file transfers.
Reduces suspicion by not requiring the frequent transfer of large files during communication.

Cons

Can be filtered or corrupted by applications that do not support Unicode, or that attempt to format user input.
Extremely easy to detect. Any digital text can be checked for the possible presence of a message by pasting it into a decoder, or a text editor that displays non-printing characters. Large messages may create line breaks in some applications.

If you are serious about concealing your payload, you should use another form of steganography.

As with any method of communication, security is only as good as the encryption applied. This only provides a casual level of security through obscurity.

Credits

This project began at Cal Hacks 3.0 by a much less memorable name.

Joshua Fan — web app
Samuel Arnold — encoding algorithm and Python app
Nitzan Orr — decoding algorithm

Services used

Cross-browser testing courtesy of BrowserStack.

License

MIT License

Name		Name	Last commit message	Last commit date
Latest commit History 163 Commits
images		images
lib @ 7222aaa		lib @ 7222aaa
polyfills		polyfills
.gitmodules		.gitmodules
CNAME		CNAME
README.md		README.md
favicon.ico		favicon.ico
index.css		index.css
index.html		index.html
index.js		index.js
manifest.json		manifest.json
sw.js		sw.js

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Doublespeak

Usage

Web app

Features

Possible uses

How it works

Efficiency

Robustness

Multiple/split messages

Concatenated messages

Corrupted and uncorrupted messages mixed together

Roadmap

Comparison to other steganography techniques

Pros

Cons

Credits

Services used

License

About

Contributors 2

Languages

dblspk/web-app

Folders and files

Latest commit

History

Repository files navigation

Doublespeak

Usage

Web app

Features

Possible uses

How it works

Efficiency

Robustness

Multiple/split messages

Concatenated messages

Corrupted and uncorrupted messages mixed together

Roadmap

Comparison to other steganography techniques

Pros

Cons

Credits

Services used

License

About

Topics

Resources

Stars

Watchers

Forks

Contributors 2

Languages