Security, privacy, and AI safety
When you use Petals as a client or run your own server, our top priority is to make sure that your computer is safe. Here, we answer popular questions about safety, security, and privacy.
Q: If I join Petals, do I allow other people to execute code on my computer?
While participating in the swarm, peers only exchange tensors (embeddings, gradients) and never send code to each other. No other peer can execute arbitrary code on your computer - they can only request that you run one of the pre-defined BLOOM layers. Peers serialize their tensors using safe protocols, such as protobuf. We never use pickle because it is unsafe.
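For illustration, here is a minimal Python sketch (a toy example, not Petals' actual serialization code) of why exchanging raw tensor bytes is safe while unpickling data from an untrusted peer is not:

```python
# A toy illustration (not Petals' actual code) of why raw tensor bytes are
# safe to receive while pickled objects from an untrusted peer are not.
import pickle
import numpy as np

# Safe: raw bytes plus shape/dtype metadata can only be interpreted as numbers.
tensor = np.random.rand(4, 8).astype(np.float32)
payload = tensor.tobytes()                                  # what a peer sends
restored = np.frombuffer(payload, dtype=np.float32).reshape(4, 8)
assert np.array_equal(tensor, restored)                     # data, not code

# Unsafe: pickle lets the sender smuggle in a callable that runs on load.
class Malicious:
    def __reduce__(self):
        return (print, ("this could have been an arbitrary shell command",))

evil_payload = pickle.dumps(Malicious())
pickle.loads(evil_payload)  # executes the sender's code - never do this
```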
Q: What should I worry about when using Petals?
Here are some common concerns:
- Data privacy. If you work with sensitive data, do not use the public swarm. This is important because it's technically possible for peers serving model layers to recover input data and model outputs, or modify the outputs in a malicious way. Instead, you can set up a private Petals swarm hosted by people and organizations you trust, who are authorized to process this data.
- Opening ports for a Petals server. We recommend opening only the port the server is going to use. Otherwise, there is a risk of exposing another application running on a different port (e.g., a Jupyter server misconfigured to listen on all interfaces instead of just `localhost`).
- Large amount of traffic. During peak loads, a Petals server may generate a lot of Internet traffic and interfere with other apps on the same network. If this is not acceptable, we recommend setting a bandwidth limit while running the server.
If possible, we encourage you to do your own research to make sure your computing infrastructure is safe.
Q: Is there anything I can do to further increase security?
If you run a Petals server accepting requests from the Internet, consider isolating it from the rest of the machine by running it inside a Docker container (see instructions for doing that in our README) or a virtual machine.
If you run the server inside a corporate or university cluster, we encourage you to run it from a sub-network that does not have access to internal privacy-sensitive systems.
Q: What about my privacy? What information do I share with other peers?
When you use a Petals client, it's technically possible for peers serving model layers to recover input data and model outputs, or modify the outputs in a malicious way. Therefore, please do not use the public swarm for processing sensitive data. Instead, you can set up a private Petals swarm hosted by people and organizations you trust, who are authorized to process this data. In this case, you can also anonymize your data to further increase the privacy of your datasets.
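For reference, pointing a client at a private swarm roughly looks like the sketch below. It follows the usage shown in the Petals README, but the `initial_peers` multiaddresses are placeholders for your own bootstrap peers, and the exact argument names should be checked against the README for your Petals version:

```python
import torch
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

# Placeholder multiaddresses of bootstrap peers in your *private* swarm.
INITIAL_PEERS = ["/ip4/10.0.0.2/tcp/31337/p2p/QmExamplePeerID"]

tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-petals")
model = DistributedBloomForCausalLM.from_pretrained(
    "bigscience/bloom-petals",
    initial_peers=INITIAL_PEERS,  # connect only to peers you trust
)

# Sensitive inputs are now processed only by servers inside your swarm.
inputs = tokenizer("A confidential prompt:", return_tensors="pt")["input_ids"]
with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))
```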
When you run a Petals server, you share your compute throughput and which BLOOM layers you currently serve - this can potentially be used to identify your server hardware. Also, Petals lets other peers know your IP address. If this is a concern, we recommend using a VPN or other means to hide it.
Q: The networking used in Petals is different from what I'm used to. Is it secure?
Petals uses standard TCP and QUIC communication via the libp2p framework. Libp2p has a detailed security analysis. We recommend reading it directly, but in summary: an attacker can make it difficult for you to communicate with other peers or temporarily make your server run BLOOM computations in vain, but they cannot make you run custom code.
Libp2p has been around since 2016 (same as PyTorch!) and is used in many security-sensitive applications, such as:
- IPFS - a distributed filesystem that anyone can join
- Berty - a secure peer-to-peer messaging application
- Several cryptocurrencies, such as Polkadot and Filecoin
Q: Does Petals guarantee that model outputs are correct?
Not by default. A faulty or malicious server could give you incorrect outputs. There are a few things you can do about this:
- Verify outputs. Send some of your tensors to two or more peers and check that the answers match (see the sketch after this list).
- Distill the model before using it in production. If you have trained a Petals-based model for production use, you can distill it into a smaller, task-specific model that runs on your own hardware.
- Set up a private swarm. You can launch your own swarm hosted by people and organizations you trust, who are authorized to process your data.
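As a rough illustration of the first option, the sketch below runs the same forward pass twice and compares the results; since the swarm may route each request through different servers, a mismatch is a red flag. This is only a heuristic sketch (consecutive calls are not guaranteed to hit different peers), not the automatic verification mentioned below:

```python
import torch
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-petals")
model = DistributedBloomForCausalLM.from_pretrained("bigscience/bloom-petals")

inputs = tokenizer("The capital of France is", return_tensors="pt")["input_ids"]

# Run the same input twice; the swarm may route each request to different servers.
with torch.no_grad():
    logits_a = model(inputs).logits
    logits_b = model(inputs).logits

# A small tolerance absorbs floating-point noise; large deviations suggest
# that a faulty or malicious server handled one of the requests.
if torch.allclose(logits_a, logits_b, atol=1e-3):
    print("Outputs match: the servers agree on this input.")
else:
    print("Outputs differ: at least one server returned inconsistent results.")
```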
In the future, we plan to implement automatic verification and a reputation system, so that clients can select servers they can trust.
Q: What about AI safety?
Petals runs BLOOM-176B - an open-source large language model developed by the BigScience collaboration and publicly available on the Hugging Face Hub. BigScience did a lot of work to reduce potential harm from using BLOOM (see the details in its model card). Still, there is a risk that the model will be misused. Weidinger et al. (2021) provide a survey of potential ethical and societal risks from large language models, and Petals shares the same risks as the original BLOOM.
However, we believe that Petals does not add to the existing security risks in any significant way. The original BLOOM weights are already public and have been downloaded tens of thousands of times, making it difficult to "erase the model from the Internet". Without Petals, a potential attacker could download these weights and run BLOOM locally, given enough hardware. Thus, Petals could make some misuse scenarios more affordable, but it does not create new threats.
If you have any other questions, please join the #discussion channel on our Discord or reach out to us by any other means.