Perspectives on Agent Design
This page still contains a lot of good thinking, but we've seen over time that the wiki isn't the right place for discussion. We now have better information about agents in the following places:
- Indy SDK Getting Started Guide -- shows how to write agents using Indy's API
- How DIDs, Keys, Credentials, and Agents Work Together in Sovrin -- covers the basics of how agents are authorized (despite "Sovrin" in the title, this document applies equally well to Indy)
- Sovrin Agent-to-Agent Communication -- covers some theory of how communication between agents works (despite "Sovrin" in the title, this document applies equally well to Indy)
You can also chat with us on chat.hyperledger.org, in the #indy-agent channel.
Definition
A Sovrin agent is a piece of software that acts as the proxy or servant of an identity owner. Ordinary people can have agents; agents can also be used by owners that are institutions or things (in the IoT sense).
The Jobs Agents Do
Agents make it possible to contact another identity owner on the network, and to interact with them using a programmatic API. Agents provide automatable horsepower as well as insulation from direct interaction. This insulation improves privacy, and makes it possible to have some real-time interactions even when an owner is offline. Agents may also participate in secret-sharing schemes and mix network routing to add security and robustness features to the Sovrin ecosystem.
Living Without
It is possible to accomplish tasks on Sovrin without an agent. However, many tasks have drastically better performance, scale, and usability if agents are involved. Without an agent, a university might hire humans to manually update thousands of student transcript claims on Sovrin at the end of each semester, and might further task these humans with writing emails to notify their students about the updated claims. With an agent, a university might dump new transcripts to a folder, and let all the details of updating claims and notifying the student body unfold on autopilot, with the agent’s help.
Where Agents Run
Agents can run in many places--on a mobile device possessed by an owner, on a server in a cloud, on a desktop, maybe even in an on-demand harness like Amazon Lambda. The use cases that they address are broader if they are always on, so we believe that the cloud is the most natural home in the near term (Alice’s configuration in the diagram above). However, as mobile devices and the IoT advance mesh-based contexts, perhaps that will change.
About Agencies
A Sovrin agency provides SaaS-style agent services to identity owners. Some Sovrin participants may use them, and some may not. Different agencies will offer different features with their agents; they will compete on features and price. Sovrin wants to encourage innovation in agents, and does not want to constrain them unnecessarily. However, there must be a minimum spec that all agents implement (both in terms of functionality and communication); this guarantees that arbitrary agents can interact, and it makes owner data at least partly portable with respect to agents, preventing the most egregious forms of vendor lock-in. This design doc focuses on the least-common-denominator aspects of agent design (the parts that should not vary from vendor to vendor).
How Much Agents Are Trusted
Although agents are trusted to do useful work on behalf of their owners, a general design guideline is that they should not be "trusted third parties" in the cybersecurity sense. That is, their access to the secrets of their owners should be non-existent or at least severely curtailed. In this sense, they are like butlers who carry sealed envelopes on silver platters for their employers: they convey information, and do useful work--but they are kept at arm’s length, not knowing intimate details. A hacked agent should not be capable of impersonating its owner, absconding with data, or spying.
Inbound Communication
Agents need to be externally callable. A natural way to do that would be over a RESTful web service interface, but there are other options as well (e.g., a message queue-oriented approach, raw sockets with a custom protocol, etc.). The choice of mechanism here cannot be left up to free variation, because agent interop is necessary for the ecosystem to deliver significant value.
Because agents are vital to many trusted interactions, the mechanism we choose must be secured. If we choose https, we’ll be using traditional certificate authority mechanisms to establish trust. This would be a familiar and well-supported path--but it would also make the Sovrin ecosystem vulnerable to many problems with certificates, such as expiration, revocation, broken chains of trust, and man-in-the-middle attacks. Perhaps more importantly, trust in the ecosystem would now be no stronger than the certificates, which would undermine much of the value that Sovrin’s all about. And this would be doubly ironic, because the whole point of certificates is to prove the identity of parties in an interaction--something that Sovrin has a much stronger story for, anyway.
We are actively investigating several solutions, and will report back to the community in short order with a proposal backed by a thoughtful analysis. In the meantime, assume something like a RESTful web service interface over https, with a bookmark in that decision. Where "http" and "https" are used in this doc, please remember the bookmark.
Outbound Push
Agents also typically provide outbound push notification features that allow them to reach their owners. Different push notification mechanisms are possible, including push that reaches mobile devices over mobile networks (common when owners are humans), push that’s web service based (common when owners are IoT devices or institutions), push that uses email, push that uses snail mail, and so forth. The general agent spec does not constrain how these mechanisms work, except to say that they need to maintain the same level of security as other parts of the system.
Other Parties
Agents will commonly communicate with one another. They may also be contacted by code that is not Sovrin-centric; such code may have a variety of purposes and a variety of behavioral and design constraints. Agents should always be designed for robustness and security in these communications; they cannot assume that they will only be contacted by trustworthy parties.
Endpoints
A fundamental assumption of the Sovrin ecosystem is that identities are (at least potentially) multifaceted. Alice may interact with Faber College as a student, with Acme Corp as an employee, and with Thrift Bank as a customer. All of these facets of Alice’s identity feel like part of a larger whole to Alice, but each of them is independent and unknown to other parties, except as Alice discloses them. Each portion of Alice’s compound identity has a different identifier; Faber may know Alice as 123, Acme may know Alice as 456, and so forth. (Here, 123 and 456 are placeholders for standard DIDs or CIDs, which are much larger numbers.)
Each time Alice’s agent interacts with the outside world, it does so through one of these identifiers. This means that the agent may have different endpoints, depending on who calls it. When Faber College talks to Alice’s agent, it may address it at https://agents.myagency.com/123; when Acme does the same, it may use https://agents.myagency.com/456. Some identity owners on Sovrin may choose to expose a public and universal agent URI; this may be common for certain institutions. But individuals who do so have exposed a correlatable piece of PII; they should only do this after careful forethought, never as a default encouraged by an agency.
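The per-relationship endpoint idea above can be illustrated with a minimal sketch. The `EndpointRegistry` class and its methods are hypothetical names invented for this example, and the short strings stand in for real DIDs; only the host name and the 123/456 placeholders come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict

# Example agency host taken from the text; a real agency would differ.
AGENCY_BASE = "https://agents.myagency.com"


@dataclass
class EndpointRegistry:
    """Hypothetical map from a relationship to its pairwise identifier."""

    # relationship name -> pairwise identifier (a stand-in for a DID)
    identifiers: Dict[str, str] = field(default_factory=dict)

    def register(self, relationship: str, identifier: str) -> str:
        # Reusing one identifier for two relationships would hand both
        # parties the same correlatable value, so refuse it.
        for other, used in self.identifiers.items():
            if used == identifier and other != relationship:
                raise ValueError(f"identifier {identifier!r} is already in use")
        self.identifiers[relationship] = identifier
        return self.endpoint_for(relationship)

    def endpoint_for(self, relationship: str) -> str:
        # Each relationship sees only the endpoint built from its own identifier.
        return f"{AGENCY_BASE}/{self.identifiers[relationship]}"


registry = EndpointRegistry()
faber_endpoint = registry.register("Faber College", "123")
acme_endpoint = registry.register("Acme Corp", "456")
```

In this sketch Faber and Acme each learn a different URI, so neither can tell from the endpoint alone that they are talking to the same owner.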
It is possible for an identity owner to use multiple agents. However, at most one agent (ignoring load balancing and similar schemes) should be tasked with proxying the identity owner at a given identifier. That is, it should never be the case that two agents having different state and different features might respond to a given endpoint in free variation; a given party wishing to interact with Alice should see consistent behavior when doing so.
Interfaces
All agents MUST support the following types of core interactions.
- Establish a connection: exchange public keys with the agent or other software of another identity owner, so a new pair-wise communication channel can be created. This channel will be authenticated by using each party’s public key.
- Respond to a ping: This is a Sovrin trust ping, not just an ICMP ping. It can only happen between an agent and another party that have exchanged keys and have agreed to communicate already; it is never anonymous. The response validates not just availability of an agent at an endpoint, but the validity of the keys that maintain the communication channel between the two parties, and the status of any link contracts associated with the channel. This must be one of an agent’s supported features, but it does not need to be enabled. In other words, it is valid to not respond to a ping, but it is not valid to claim that a ping is unsupported or unrecognized.
- Report available interfaces: List the names and versions of interfaces that the agent supports. This list is context-dependent; an agent may report different interfaces depending on who it is talking to. Interfaces are named and use semantic versioning.
- Export data: provide a downloadable package that the identity owner can use to externalize data belonging to the identity owner, and held in trust by the agent. This might include just configuration data, secure backups of claims and link contracts, and many other things. Supporting this behavior is an important hedge against vendor lock-in, but its importance goes beyond that purpose: even proprietary agents that service a sponsoring institution need export so they can do predictable upgrades.
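As a rough illustration of "Report available interfaces", the sketch below returns a different list depending on the caller. The interface names, version numbers, and the `grants` structure are all assumptions made for this example; no spec defines them yet.

```python
from typing import Dict, List, Set

# Hypothetical catalog: interface name -> semantic version.
CORE_INTERFACES: Dict[str, str] = {
    "connection": "1.0.0",
    "export-data": "1.0.0",
    "report-interfaces": "1.0.0",
    "trust-ping": "1.0.0",
}
OPTIONAL_INTERFACES: Dict[str, str] = {
    "receive-claim": "0.3.0",
    "send-claim": "0.3.0",
}


def report_interfaces(caller: str, grants: Dict[str, Set[str]]) -> List[Dict[str, str]]:
    """Return the name/version pairs that `caller` is allowed to see."""
    # Core interfaces are always reported, because every agent MUST support them.
    visible = dict(CORE_INTERFACES)
    # Optional interfaces are reported only to callers that were granted them.
    for name in grants.get(caller, set()):
        if name in OPTIONAL_INTERFACES:
            visible[name] = OPTIONAL_INTERFACES[name]
    return [{"name": n, "version": v} for n, v in sorted(visible.items())]


# Faber has been granted claim delivery; Acme has not.
example_grants = {"faber": {"send-claim"}}
faber_view = report_interfaces("faber", example_grants)
acme_view = report_interfaces("acme", example_grants)
```

Here `faber_view` includes `send-claim` while `acme_view` lists only the core interfaces, which is one way to read "context-dependent".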
Agents MAY support the following types of standard interactions:
- Transfer private data: Route a secure message between two devices that are both in the possession of the agent’s owner. This might be used to send a secure backup of a vault from one device to another, for example.
- Send a claim
- Receive a claim
- Revoke (rotate) a key
- Route a message to the owner
- Route a message from the owner
- Add a device: owner buys an iPad in addition to a phone; both devices now need to work with the agent
- Remove a device: owner decommissions an old phone
- Respond to a Shamir secret sharing request
- Configure Shamir secret sharing
The exact nature of each of these interactions is not specified here, but it is fair to assume that each requires an authenticated connection followed by the exchange of one or more messages. Where messages can use text practically, JSON is preferred; binary payloads may be required in some cases. A future design doc will describe these interactions in detail. The format of the documents that are exchanged should be reflected in the semantic versioning of the interfaces.
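Until that design doc exists, a JSON message exchange might look something like the sketch below. The message shape (`type` and `id` fields), the `Agent` class, and the type names are assumptions for illustration. The sketch also shows the trust ping rule from the core list: a disabled ping is met with silence, never with an "unsupported" error.

```python
import json
from typing import Optional


class Agent:
    """Hypothetical handler for messages arriving on an authenticated channel."""

    def __init__(self, ping_enabled: bool = True) -> None:
        self.ping_enabled = ping_enabled

    def handle(self, raw: str) -> Optional[str]:
        try:
            msg = json.loads(raw)
        except json.JSONDecodeError:
            return self._error("malformed JSON")
        if not isinstance(msg, dict):
            return self._error("message must be a JSON object")

        msg_type = msg.get("type")
        if msg_type == "trust-ping":
            if not self.ping_enabled:
                # Valid: stay silent. Not valid: claim the ping is unsupported.
                return None
            # A real response would also attest to key validity and link
            # contract status; that is not modeled in this sketch.
            return json.dumps({"type": "trust-ping-response", "id": msg.get("id")})
        return self._error(f"unrecognized message type: {msg_type!r}")

    @staticmethod
    def _error(reason: str) -> str:
        return json.dumps({"type": "error", "reason": reason})


reply = Agent().handle('{"type": "trust-ping", "id": "abc"}')
```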
Advanced Features
In addition to these core and standard behaviors, particular agents may support advanced features that have meaning for their particular identity owners. These behaviors can be added in any fashion that suits the purpose, but supported features SHOULD be discoverable with the standard "Report available interfaces" feature, and should use standard naming and semantic versioning.
Not all agent features are interactions. For example, agents might host an AI that acts on behalf of the owner; they might capture an audit trail of their own activities, or of login attempts by external parties; they might impute reputation to various entities; they might queue and even triage messages for their owner; they might detect patterns of fraud; they might interact with an escrow service...
Other Considerations
Agents need to be upgradeable. A purpose of semantic versioning is to give predictable behavior with respect to backward compatibility and interop.
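As an illustration of how semantic versioning supports predictable interop, the following sketch checks whether the version an agent offers satisfies the version a consumer was built against. The function names are hypothetical, pre-release and build tags are ignored, and treating 0.x minor bumps as breaking is a common convention assumed here rather than something this document specifies.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string (no pre-release or build tags)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible(offered: str, required: str) -> bool:
    """True if an agent offering `offered` can serve a consumer built for `required`."""
    o, r = parse_version(offered), parse_version(required)
    if o[0] != r[0]:
        # A different major version signals a breaking change.
        return False
    if o[0] == 0:
        # Assumed convention: during 0.x development, a minor bump may break.
        return o[1] == r[1] and o[2] >= r[2]
    # Same major: the offered minor/patch must be at least what is required.
    return o[1:] >= r[1:]


can_talk = is_compatible("1.4.2", "1.2.0")
```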
Parties that call agents are called agent consumers. Agent consumers should not assume:
- That agents have a specific version.
- That endpoints for an agent are always on.
- That communication with agents is reliable or high-performance.
- That agents are trustworthy with respect to data belonging to the agent consumer or to the identity owner.
- That URIs for agent endpoints are immutable.
- That the public key for an agent is immutable. (They must look it up periodically on the ledger.)
They may assume:
- That URIs for agent endpoints are relatively stable (change infrequently enough that reusing the URI multiple times in a single session is a useful simplification).
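One way an agent consumer could honor these assumptions is to cache an agent's endpoint and key briefly and refresh them from the ledger afterwards. This is only a sketch: `AgentDirectory`, `AgentRecord`, the `lookup` callable, and the five-minute default are all hypothetical, and the ledger query itself is left to the caller.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class AgentRecord:
    endpoint: str
    verkey: str  # stand-in for the agent's public key


class AgentDirectory:
    """Caches ledger lookups briefly; never treats them as permanent."""

    def __init__(
        self,
        lookup: Callable[[str], AgentRecord],  # hypothetical ledger query
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, AgentRecord]] = {}

    def resolve(self, identifier: str) -> AgentRecord:
        now = self._clock()
        cached = self._cache.get(identifier)
        if cached is not None and now - cached[0] < self._ttl:
            # Endpoints are relatively stable, so reuse within the TTL is fine.
            return cached[1]
        # Neither the URI nor the key is immutable: look them up again.
        record = self._lookup(identifier)
        self._cache[identifier] = (now, record)
        return record

    def invalidate(self, identifier: str) -> None:
        # Call this after a failed request or a signature mismatch.
        self._cache.pop(identifier, None)
```

A consumer would call `resolve` before each session and `invalidate` whenever a request fails, so a moved endpoint or rotated key is picked up on the next attempt.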
Agents should treat all incoming data as suspect, and undertake careful data sanitization on it. This includes data purported to come from the identity owner. Agents should also have a robust error handling strategy.
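A minimal sketch of what such sanitization might involve is shown below. The size limit, the three-field schema, and the `RejectedMessage` exception are assumptions made for illustration; a real agent would validate against whatever message formats the interface specs eventually define.

```python
import json
from typing import Any, Dict

MAX_MESSAGE_BYTES = 64 * 1024  # hypothetical limit
# Hypothetical schema: required field name -> expected JSON type.
REQUIRED_FIELDS = {"type": str, "id": str, "body": dict}


class RejectedMessage(ValueError):
    """Raised when incoming data fails validation."""


def sanitize(raw: bytes) -> Dict[str, Any]:
    """Validate untrusted bytes and return the parsed message."""
    if len(raw) > MAX_MESSAGE_BYTES:
        raise RejectedMessage("message too large")
    try:
        msg = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise RejectedMessage("not valid UTF-8 JSON") from exc
    if not isinstance(msg, dict):
        raise RejectedMessage("top-level value must be an object")

    # Reject fields outside the schema rather than silently passing them on.
    unknown = set(msg) - set(REQUIRED_FIELDS)
    if unknown:
        raise RejectedMessage(f"unexpected fields: {sorted(unknown)}")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in msg:
            raise RejectedMessage(f"missing field: {name}")
        if not isinstance(msg[name], expected):
            raise RejectedMessage(f"wrong type for field: {name}")
    return msg
```

The same checks apply regardless of the claimed sender, including messages that purport to come from the identity owner.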
Agents should report errors using reasonable HTTP status codes, possibly augmented by a response body that contains additional detail.
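For instance, an error could pair a standard status code with a small JSON body. The body layout and the `error_response` helper below are assumptions for illustration, not a defined format.

```python
import json
from http import HTTPStatus
from typing import Tuple


def error_response(status: HTTPStatus, detail: str) -> Tuple[int, str]:
    """Build a (status code, JSON body) pair for an error reply."""
    body = json.dumps(
        {
            "status": status.value,   # numeric code, e.g. 400
            "error": status.phrase,   # standard reason phrase
            "detail": detail,         # agent-specific explanation
        }
    )
    return status.value, body


code, body = error_response(HTTPStatus.BAD_REQUEST, "missing field: type")
```

Keeping the numeric code in the body as well as in the response line lets a consumer log or forward the body on its own without losing the status.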
Agents are strongly encouraged to deal in UTF-8 text, ISO 8601 timestamps, UTC, and other standards that are broadly accepted. In particular, agents are discouraged from serializing data in a format that’s specific to a single programming language, that’s sensitive to endianness or the precision of floating point numbers, and so forth.
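A small sketch of these recommendations in practice: timestamps are normalized to UTC, written in ISO 8601, and the payload is emitted as UTF-8 JSON rather than a language-specific format. The `serialize_event` function and its field names are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def serialize_event(name: str, when: datetime) -> bytes:
    """Serialize an event as UTF-8 JSON with an ISO 8601 UTC timestamp."""
    if when.tzinfo is None:
        # A naive timestamp is ambiguous, so refuse it instead of guessing.
        raise ValueError("timestamp must be timezone-aware")
    stamp = when.astimezone(timezone.utc).isoformat(timespec="seconds")
    payload = {"event": name, "timestamp": stamp}
    # ensure_ascii=False keeps non-ASCII text readable; the result is UTF-8 bytes.
    return json.dumps(payload, ensure_ascii=False, sort_keys=True).encode("utf-8")


# A local time two hours ahead of UTC is normalized before it leaves the agent.
local_time = datetime(2017, 5, 1, 14, 0, tzinfo=timezone(timedelta(hours=2)))
wire_bytes = serialize_event("claim-updated", local_time)
```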