🤖 Hide text within other natural language text 🍅

Tomato is a proof-of-concept steganography tool that hides encrypted messages in natural language, using the minimum-entropy coupling code provided by ssokota! ⭐

🧠 How It Works

  • LLM-Generated Cover Text: The LLM generates coherent text from a prompt, exactly as it would in normal use.
  • Embedding with MEC: MEC is applied to merge the probability distribution of the hidden message (ciphertext) with the distribution of the LLM-generated covertext. This coupling minimizes the joint entropy, ensuring that the stegotext (covertext with the embedded message) retains the statistical properties of natural language, making the hidden message effectively undetectable.
  • Decoding Process: During decoding, the LLM assists by providing a context-aware interpretation of the stegotext. MEC is then used in reverse to decouple the hidden message from the covertext. The process leverages the same probability distributions used during embedding, ensuring that the message is accurately extracted without compromising the integrity of the covertext.

This method ensures that the hidden message is seamlessly integrated into the text and can be securely and precisely retrieved later, with minimal risk of detection.
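
To make the coupling step concrete, here is a small self-contained sketch of a simple greedy approximation to minimum-entropy coupling (computing the exact minimum-entropy coupling is NP-hard). The function name and toy distributions are illustrative only; Tomato itself relies on the separate mec package for its coupling rather than this snippet.

import heapq

def greedy_min_entropy_coupling(p, q):
    """Greedy approximation of a minimum-entropy coupling between two
    discrete distributions p and q (sequences of probabilities summing to 1).
    Returns a dict mapping (i, j) index pairs to joint probability mass."""
    # Max-heaps (negated probabilities) holding the remaining mass of each marginal.
    hp = [(-pi, i) for i, pi in enumerate(p) if pi > 0]
    hq = [(-qj, j) for j, qj in enumerate(q) if qj > 0]
    heapq.heapify(hp)
    heapq.heapify(hq)
    joint = {}
    while hp and hq:
        pi, i = heapq.heappop(hp)
        qj, j = heapq.heappop(hq)
        pi, qj = -pi, -qj
        mass = min(pi, qj)  # put as much mass as possible into one joint cell
        joint[(i, j)] = joint.get((i, j), 0.0) + mass
        if pi - mass > 1e-12:  # return any leftover mass to its heap
            heapq.heappush(hp, (-(pi - mass), i))
        if qj - mass > 1e-12:
            heapq.heappush(hq, (-(qj - mass), j))
    return joint

# Toy example: couple a uniform ciphertext-symbol distribution with a skewed
# next-token distribution from the language model.
cipher_dist = [0.5, 0.5]
token_dist = [0.6, 0.3, 0.1]
for (c, t), mass in sorted(greedy_min_entropy_coupling(cipher_dist, token_dist).items()):
    print(f"cipher symbol {c} <-> token {t}: {mass:.2f}")

Because one marginal of the joint table is the model's next-token distribution, sampling a token conditioned on the current ciphertext symbol still produces tokens with the model's original statistics; decoding inverts the same table to recover the ciphertext symbols.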

📙 Example

from tomato import Encoder

encoder = Encoder()

plaintext = "hello"
formatted_stegotext, stegotext = encoder.encode(plaintext)
estimated_plaintext, estimated_bytetext = encoder.decode(stegotext)

Output:

Stegotext: After hours, I like to walk. Sometimes I will travel by train to a station I’ve never been, and walk from there in no particular direction. As I walk, I find the world reveals itself, in small, inexplicable ways. Tonight, a rabbit darted across the track ahead of the train with such urgency I thought, for a moment, it was a fox, or something more menacing. And when I pulled my phone out to
------
Decoded Plaintext:  helloAAAAAAAAAA # The A's are padding up to the encryption key length
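
The trailing A's are the padding mentioned above: the plaintext is padded out to the cipher length before encryption, so short messages come back with filler appended. A tiny illustration (not Tomato's internal code, and assuming the padding length matches cipher_len, whose default of 15 is consistent with the output above):

cipher_len = 15                # default cipher length
plaintext = "hello"
padded = plaintext + "A" * (cipher_len - len(plaintext))
print(padded)                  # helloAAAAAAAAAA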

⚙️ Setup

Tomato requires NVIDIA CUDA. Follow the steps below:

Install the dependencies using:

pip install git+https://github.com/user1342/mec
git clone https://github.com/user1342/Tomato.git
cd Tomato
pip install -r requirements.txt
pip install -e .
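
Because Tomato needs CUDA and loads a 4-bit quantised Mistral model by default, it is worth confirming that PyTorch can see your GPU before running anything. This is a generic PyTorch check, assuming PyTorch is pulled in by the requirements:

import torch

# Sanity check that CUDA is visible before running Tomato.
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # your GPU model
else:
    print("CUDA not available - Tomato will not run on this machine")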

🏃 Running

You can use the Tomato Encoder/Decoder Tool directly from the command line. Here are the available commands:

Encode a Message

To encode a plaintext message into stegotext:

tomato-encode.exe "Your secret message here" --cipher_len 20 --shared_private_key 123abc... --prompt "Good evening."

Output:

Stegotext: [Your encoded message here]

Decode a Message

To decode a stegotext back into its original plaintext (the cipher_len, shared_private_key, and prompt must match those used during encoding):

tomato-decode.exe "Your stegotext here" --cipher_len 20 --shared_private_key 123abc... --prompt "Good evening."

Output:

Estimated Plaintext: [Your decoded plaintext]
Estimated Bytetext: [Your decoded bytetext]

Programmatic Example

Check out the example playbook! For a quick demonstration, you can try encoding and decoding a simple message using the following code snippet:

from tomato import Encoder

encoder = Encoder()

plaintext = "I am a hidden code"
formatted_stegotext, stegotext = encoder.encode(plaintext)
estimated_plaintext, estimated_bytetext = encoder.decode(stegotext)

print(formatted_stegotext)
print("------")
print(estimated_plaintext)

🛡️ Customization Options

The Tomato Encoder/Decoder Tool offers several customizable parameters (a programmatic sketch of setting them follows the list):

  • cipher_len: Length of the cipher (default is 15).
  • shared_private_key: Shared private key in hex format. If not provided, a random key will be generated.
  • prompt: Prompt for the language model (default is "Good evening.").
  • max_len: Maximum length of the covertext (default is 100).
  • temperature: Sampling temperature for the language model (default is 1.0).
  • k: Top-k sampling parameter for the language model (default is 50).
  • model_name: Name of the language model to be used (default is "unsloth/mistral-7b-instruct-v0.3-bnb-4bit").
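
For programmatic use, the same options can plausibly be passed when constructing the Encoder. Whether the constructor accepts exactly these keyword arguments is an assumption here; the CLI flags above are the documented interface.

from tomato import Encoder

# Sketch only: these keyword-argument names mirror the options listed above,
# but the Encoder signature is assumed, not confirmed.
encoder = Encoder(
    cipher_len=20,
    shared_private_key="123abc...",  # hex-encoded shared key (placeholder)
    prompt="Good evening.",
    max_len=150,
    temperature=0.8,
    k=50,
    model_name="unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
)

formatted_stegotext, stegotext = encoder.encode("Your secret message here")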

🙏 Contributions

Tomato is an open-source project and welcomes contributions from the community. If you would like to contribute to Tomato, please follow these guidelines:

  • Fork the repository to your own GitHub account.
  • Create a new branch with a descriptive name for your contribution.
  • Make your changes and test them thoroughly.
  • Submit a pull request to the main repository, including a detailed description of your changes and any relevant documentation.
  • Wait for feedback from the maintainers and address any comments or suggestions.
  • Once your changes have been reviewed and approved, they will be merged into the main repository.

⚖️ Code of Conduct

Tomato follows the Contributor Covenant Code of Conduct. Please make sure to review and adhere to this code of conduct when contributing to Tomato.

🐛 Bug Reports and Feature Requests

If you encounter a bug or have a suggestion for a new feature, please open an issue in the GitHub repository. Please provide as much detail as possible, including steps to reproduce the issue or a clear description of the proposed feature. Your feedback is valuable and will help improve Tomato for everyone.

📜 License

MIT