Users only connect to other users with the same dat link. Anyone with a dat link can see other users that are sharing that link and their IP addresses.
We are thinking more about how to ensure reader privacy. See this blog post for more discussion.
Yes, data shared over Dat is encrypted in transit using the public key (Dat link). When you share a Dat, you must share the public key with another user so they can download it. Both ends use that key to encrypt the data, so both users can read it while it is never transferred over the internet unencrypted.
One of the key elements of Dat privacy is that the public key is never used in any discovery network. The public key is hashed, creating the discovery key. Whenever peers attempt to connect to each other, they use the discovery key.
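As a rough illustration of how a discovery key can be derived from a public key, here is a minimal Python sketch. It assumes a keyed BLAKE2b hash over the constant string `hypercore`, with the public key as the hash key; treat that exact construction, and the random stand-in key, as assumptions rather than a specification of Dat's implementation.

```python
import hashlib
import os


def discovery_key(public_key: bytes) -> bytes:
    """Derive a 32-byte discovery key from a 32-byte public key.

    Assumption: keyed BLAKE2b over the constant b"hypercore".
    """
    # The public key is used as the hash key, so the result is a one-way
    # function of it: peers who know the public key can compute the
    # discovery key, but the discovery key does not reveal the public key.
    return hashlib.blake2b(b"hypercore", digest_size=32, key=public_key).digest()


# Stand-in for a real Dat public key (hypothetical value for illustration).
public_key = os.urandom(32)
print(discovery_key(public_key).hex())
```

Peers would announce and look up only the printed value on the discovery network, keeping the public key itself private.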
Data is encrypted using the public key, so it is important that this key stays secure.
Only someone with the key can download data for a Dat. It is the user's responsibility to share the Dat link only with people who should have access to the data. The key is never sent over the network via Dat. We do not track keys centrally. It is almost impossible for two keys to overlap, and therefore almost impossible to guess a key.
As long as the public key isn't shared outside of your team, the content will be secure (though the IP addresses and discovery key may become known). You can take a few further steps to improve privacy (generally at the cost of ease of use):
- Disable BitTorrent DHT discovery (using only DNS discovery) with the `--no-dht` flag in the CLI.
- Whitelist IP addresses
- Run your own discovery servers
- Encrypt contents before adding to dat (content is automatically encrypted in transit but this would also require decrypting after arrival).
Only some of these options can be done in the current command line tool. Feel free to PR options to make these easier to configure!
Dat uses the concept of a Merkle tree to make sure content is not tampered with. When content is added to a Dat we cryptographically fingerprint it and add it to the tree. On download, we can use the tree to make sure the content has not changed and the parent hashes match.
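The general idea can be sketched in a few lines of Python. This is a generic binary Merkle tree, not Dat's actual tree layout: the BLAKE2b hash, the one-byte leaf/parent prefixes, and the function names are all illustrative assumptions.

```python
import hashlib
from typing import List


def leaf_hash(chunk: bytes) -> bytes:
    # The 0x00 prefix marks a leaf so it cannot be confused with a parent.
    return hashlib.blake2b(b"\x00" + chunk, digest_size=32).digest()


def parent_hash(left: bytes, right: bytes) -> bytes:
    # The 0x01 prefix marks a parent node built from two child hashes.
    return hashlib.blake2b(b"\x01" + left + right, digest_size=32).digest()


def merkle_root(chunks: List[bytes]) -> bytes:
    """Fingerprint every chunk, then hash pairs upward until one root remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    level = [leaf_hash(c) for c in chunks]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(parent_hash(level[i], level[i + 1]))
            else:
                nxt.append(level[i])  # an unpaired node is promoted unchanged
        level = nxt
    return level[0]


chunks = [b"row 1", b"row 2", b"row 3"]
trusted_root = merkle_root(chunks)

tampered = [b"row 1", b"row X", b"row 3"]
print(merkle_root(chunks) == trusted_root)    # True: content is unchanged
print(merkle_root(tampered) == trusted_root)  # False: one chunk was altered
```

A downloader that trusts only the root can detect a change to any chunk, because the change propagates up through every parent hash.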
Dat uses an append-only log to track changes over time. An append-only log shows all of the changes for a given Dat since it was shared. We use this for version control but it can also bolster transparency for a dataset. Any changes to a dataset will be tracked and you can see what changed and when.
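To make the append-only idea concrete, here is a small Python sketch of a hash-linked log. It is a simplification under stated assumptions: the `AppendOnlyLog` class and its methods are hypothetical, and the simple hash chain stands in for whatever structure Dat actually uses to record and verify entries.

```python
import hashlib
from typing import List, Tuple

GENESIS = b"\x00" * 32  # placeholder "previous hash" for the first entry


class AppendOnlyLog:
    """Entries can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._entries: List[Tuple[bytes, bytes]] = []  # (entry_hash, payload)

    def append(self, payload: bytes) -> int:
        # Each entry's hash covers the previous entry's hash, so rewriting
        # an old entry would invalidate every entry after it.
        prev = self._entries[-1][0] if self._entries else GENESIS
        entry_hash = hashlib.blake2b(prev + payload, digest_size=32).digest()
        self._entries.append((entry_hash, payload))
        return len(self._entries) - 1  # index of the new entry (its version)

    def history(self) -> List[bytes]:
        return [payload for _, payload in self._entries]

    def verify(self) -> bool:
        # Recompute the chain from the start and compare stored hashes.
        prev = GENESIS
        for entry_hash, payload in self._entries:
            expected = hashlib.blake2b(prev + payload, digest_size=32).digest()
            if expected != entry_hash:
                return False
            prev = entry_hash
        return True


log = AppendOnlyLog()
log.append(b"add data.csv")
log.append(b"update data.csv")
print(log.history())
print(log.verify())
```

Because every change is a new entry, the full history doubles as both a version record and an audit trail.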
As a peer-to-peer network, Dat faces privacy risks similar to BitTorrent's. When you download a dataset, your IP address is exposed to the users sharing that dataset. This may lead to honeypot servers collecting IP addresses, as we've seen with BitTorrent. However, with dataset sharing we can create a web of trust model where specific institutions are trusted as primary sources for datasets, diminishing the sharing of IP addresses. Read more about reader privacy in the p2p web.