CIP0006 Stake pool extended metadata #15
Happy to see our work as a baseline for this task! Well-processed structure, etc., I would just like to disagree:
Otherwise great, thanks. |
Abuse contact information is a long-established method for urgent communication between decentralized and often previously unknown entities. So it also seems appropriate as optional but recommended contact information, based on a very common and widespread contact medium: e-mail. Email abuse reports also already have some standardized formats.
Sure, I didn't understand the purpose of that or how it was different from the marketing accounts. It makes sense to me to add that back in.
The "recommend if full" idea is a good concept, but I'm unclear how best to implement this in practice and it brings up some strategic questions. Is extended metadata meant to be different for each pool registered on chain? or is the concept to have one extended metadata file for each pool you have registered. It seems like your metadata concept was allowing multiple pools to share the same extended metadata so I headed down that path. I think we all have to agree on this fundamental question first before we sort out the rest. |
CIP6/CIP6.md

## Motivation

As the ecosystem around Cardano stake pools proliferates, so will the desire to slice, organize and search pool information dynamically. Currently the metadata referenced on chain provides 512 bytes that can be allocated across the four information categories ([delegation-design-specification Section 4.2](https://hydra.iohk.io/build/790053/download/1/delegation_design_spec.pdf)):
Note that the 512-byte limitation only exists in software currently dealing with metadata (i.e. cardano-wallet and SMASH). Having a rather constrained max size prevents some DoS attacks on software consuming metadata. Yet, it should be possible to increase that limit to make it possible to include more information, while still keeping a reasonable size.
"Note that the 512-byte limitation only exists ... it should be possible to increase that limit to make it possible to include more information, while still keeping a reasonable size."
It is an interesting option indeed, and it should be considered as a possible way to go. In that case any change requires a re-registration on chain with the new hash. Having it only linked from the main to the extended JSON file makes it more flexible but also a little bit less trusted.
The limit for on chain data is great for those of us parsing, doubling it would be fine but I do think keeping it constrained and allowing the extended metadata to be the place for flexible and fast changes is the right tradeoff.
Agreed, it's way better to have a strictly defined set of mandatory metadata and a more flexible set of extended metadata. While mandatory metadata may be subject to an extensive CIP review process, extended metadata provides the ability to innovate, with after-the-fact formalization through the CIP process for greater interoperability.
Thank you. Extended metadata is ideally unique for every pool (that was the plan), like the basic meta.json. It is true that someone can use one extended file for more than one pool, but the main target is 1 = 1. I can imagine that in time we will come up with something new here that will strictly require 1 = 1. @gufmar yes, abuse contacts are a great use case for a lot of services (e.g. when you are sending spam, when you have an attacking script / backdoor on web hosting, when your client does something he shouldn't, something illegal for removal...), but in this case I just do not see a specific use. Do we have a specific case where it would really make more sense than, for example, an official IOHK newsletter?
we should define the intention and use case for abuse contacts. |
Ok, I would recommend that we keep it 1 = 1 for now then. The specific pool attributes would make more sense then inside the
I'd prefer not to do this as I think it's confusing, but could also change
to
if we wanted to build in support for multiple pools from the start. |
It is unclear to me what the security model is here.
With the on-chain referenced metadata, the checksum is on chain and signed by the pool operator and owners. So we know the metadata has not been tampered with and that they endorse the metadata content.
With the current proposed extended metadata, there's an additional level of indirection and no checksum. This has the benefit of not having to re-register the pool cert to change the metadata, but it also means we lose the tamper-resistance and lose assurance that the metadata content is endorsed by the operator or owners. We know they point to a URL but there are many points of weakness (like DNS hijacking, proxy or origin server hacking) so it's hard to trust the content.
What is the intended security model? What are we supposed to be able to rely on?
Is the extra level of indirection deliberate to avoid having to re-register the pool? Or is it because you think you cannot put more data into the on-chain referenced metadata?
If we want the same security model as the on-chain referenced metadata, but for performance or other reasons want to have more data in a separate file, we can just use the same trick of a URL + content hash.
So I'm not asking for any specific design: you propose what you want to propose. I'm asking for these security questions to be addressed in the proposal.
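The "URL + content hash" option mentioned in this comment could look roughly like the following on the consumer side. This is only a minimal sketch: the function name is hypothetical, and the choice of a 32-byte Blake2b digest is an assumption made to mirror the hash used for the on-chain referenced metadata.

```python
import hashlib
import hmac


def matches_registered_hash(ext_data: bytes, expected_hash_hex: str) -> bool:
    """Return True if the fetched bytes hash to the registered value.

    `ext_data` is the raw body downloaded from the extended metadata URL.
    `expected_hash_hex` is the hex digest that was published alongside the
    URL (hypothetical field, not part of any agreed schema).
    """
    # Assumption: Blake2b with a 32-byte (256-bit) digest.
    digest = hashlib.blake2b(ext_data, digest_size=32).hexdigest()
    # Constant-time comparison; the expected value is lower-cased first.
    return hmac.compare_digest(digest, expected_hash_hex.lower())
```

With this model any change to the extended file requires publishing a new hash, which is the re-registration cost discussed in this thread.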
@papacarp it is true that since PoolTool no longer provides a link to the initial metadata JSON file, it is hard to verify the root of trust (chain -> metadata -> extended metadata). Before, when you could click on a link that pointed to the on-chain registered metadata file, you could verify e.g. that both metadata and extended metadata live on the same domain with the same TLS certificate. This is not possible on PoolTool and I am not sure it was ever available on ADA Pools. With that said, when using a site such as PoolTool or ADA Pools, you are explicitly trusting a centralized counterparty to safely retrieve, process and display the embedded metadata. However, there should probably be recommendations on secure retrieval and processing of extended metadata, including e.g. how often data consumers are expected to update such metadata. When on-chain registered metadata is updated, there is an on-chain event to trigger on; when extended metadata is updated, there is no on-chain event to trigger on.
Yes, that is the main purpose of creating this: re-registering is not required. I can imagine that re-registering can in addition cause a lot of operators a lot of unexpected problems.
to be defined (or redefined)
Another variant that keeps the whole metadata secure, verifiable and trustworthy is to not have any checksum on chain, but to add a signed witness to the metadata file. Then the pool owner can alter the metadata file whenever he wants without adding any load to the chain.
The mentioned weaknesses seem to put in question more than half of the existing internet, I would say. As it is already an existing requirement to provide the metadata.json over HTTPS only, the same should of course be obligatory for an extended file. As an additional requirement, it might even need to be the same hostname as the main metadata file, but as the linked extended URL is signed by the pool owner it should be safe to trust. Then DNS- and proxy-based MITM attacks are the challenge for an attacker in order to modify the extended data, like the Twitter handle or the owner logo (nothing fund or delegation related).
Can you explain please?
definitively as it's important. |
Not a KO, but it is susceptible to version downgrade / rollback / replay for anyone bootstrapping their app from chain genesis and fetching metadata as pool certificates appear on the chain. The replay can be partially mitigated by using timestamping instead of serials and a forced metadata update period, just like we do with KES, I suppose.
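To make the timestamp idea a bit more concrete, here is a rough sketch of the check a consumer might run before accepting a freshly fetched extended metadata file. Everything in it is hypothetical: the proposal does not define an `issued_at` field or a maximum age, and the names are for illustration only.

```python
from typing import Optional


def accept_update(issued_at: int, now: int,
                  last_accepted: Optional[int], max_age: int) -> bool:
    """Decide whether a fetched extended metadata file should be accepted.

    issued_at     -- timestamp claimed by the signed file (hypothetical field)
    now           -- the consumer's current time, same unit as issued_at
    last_accepted -- issued_at of the newest file accepted so far, if any
    max_age       -- forced update period: older files count as expired
    """
    # A timestamp from the future cannot be trusted.
    if issued_at > now:
        return False
    # Rollback / replay: never go back to a file older than one already seen.
    if last_accepted is not None and issued_at < last_accepted:
        return False
    # Forced update period: the operator has to re-sign periodically.
    if now - issued_at > max_age:
        return False
    return True
```

Note that this only helps if the timestamp is covered by the signature; a consumer bootstrapping from genesis with no `last_accepted` value is protected by the expiry check alone.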
In preparation for our meeting in a few hours I've updated. There are a few conversations I'm hoping we can close out during our meeting:
|
CIP6/CIP6.md
},
"operator": {
"country": "UK",
"sex": "2"
probably this should be "gender" instead
Why is gender (sex) of the operator included at all? |
I just read this proposal in detail and have a few comments, questions and suggestions.
This turned out to be a lot but I hope it is helpful. Example
|
Wow, thank you for your thoughtful feedback. I generally have no problem with the structure you propose. As mentioned earlier, the structure I put in was evolved from the adapools implementation in an attempt to make this all as easy as possible. I'd be fine switching to your structure if we can all agree.
It does not seem irrelevant yet. We are still at 50% staked in Cardano, so that's a lot of stake still looking for homes.
I assume another CIP?
My goal was simply a region defined by an internationally recognized country code. The address details came from adapools.
adapools added that. I'm not sure where they were going with it.
I assume another CIP. I'm not sure how the process would work for lookup details like that.
Feel free to create a better list. These are the lists we used in pooltool.
It's arbitrary. If you think it makes more sense as active then go for it. I envision these will end up translated so wanted a code to abstract a bit from the word.
Seems like it. If you add a pool, then the pool ID is required, but I could envision someone wanting to "erase" all metadata and they could do so by submitting an empty extended json with a new serial.
Again, adapools. I'm fine with those changes. I know telegram_admin is referring to a private, direct contact point whereas telegram_handle is likely a public one, or more specifically a channel as you point out.
Well, I think it's an important distinguishing strategy for pools that we should capture IF the pool wants to market themselves that way. I also know CF has placed an emphasis on promoting gender in blockchain. Ethnicity would be the same, and leaving it out was not so much an oversight as just a decision to keep this incremental. Feel free to add in ethnicity if you can find a standard designator set for them (IEC? ISO?)
We used the ISO standard for this. Since gender identity has evolved considerably over the last 20 years I think it's best to just allow the standards bodies to create the identifiers and we just follow that. Look up ISO/IEC 5218 sex
in pooltool we gave you two options. owner location and server location. Many of us have nodes spread all over the world and the public facing ones are often registered and easily traceable. Again, the goal here is to give the operators a country to affiliate with not necessarily document or capture reality. So I'm fine if you want to make the nodes an array, but it wasn't really my intention in capturing the data.
It's very helpful. I really appreciate your thoughtful ideas and practical recommendations.
That would be great! I commented on the changes I had feedback on above. If I didn't comment on it, then I'm fine with your recommendation. My goal is not so much to control this standard, but to have something in place so we can start sharing data. |
First, thank you for this feedback. I believe this also makes evident how and why CIP efforts can help to end up with better, combined results. The ITN section might seem irrelevant, but often the desired effect is invisible. In this case the ITN ticker proof has prevented (and still prevents) imitating duplicates. At least I'm not aware of any duplicate tickers for all those who published their ITN proof.
Thanks for clarifying those points. Regarding ethnicity, I did not find any international standard. I think both "operator" and "owner" could have the info separated into "person" and "organization". You could have either or both. Here is an updated example:
|
A couple questions about the lists like "os" and "infrastructure".
For the "os" and "infrastructure" lists I would suggest:
|
I agree with all changes, thanks to all. From my view only:
Please extend this to 128 characters max. I believe we really don't want to have another redirect-mania there; 128 should be okay for GitHub raw URLs etc.
@papacarp can you please rename the PR to reflect what is the CIP about? |
CIP6/CIP6.md
"server": "long description of server details",
"company": "long description of company details"
},
"rss": "https://mycoolpool.com/xml/poolrss.xml"
RSS as well as Atom are very painful standards that are becoming obsolete, they have been discontinued by most browsers already. We might want to investigate alternatives, such as JSON Feed.
interesting
Indeed, RSS seems to have slowly died out over the last 15 years (https://trends.google.com/trends/explore?date=all&q=rss)
I had to lookup what https://www.jsonfeed.org/ is.
Exists since 2017
has surprisingly many supported plugins and libraries https://jsonfeed.org/code/
here is the github home: https://github.com/manton/JSONFeed
https://en.wikipedia.org/wiki/JSON_Feed
https://en.wikipedia.org/wiki/Comparison_of_feed_aggregators
We might need to understand first how SPOs want to use news feeds (besides the social networks and chat services)
we need two URLs (data and hash file) |
@cardanians @papacarp @SebastienGllmt @ashisherc @dmitrystas based on the CIP-Editor call conversations, I'm going to refine the proposal. One question towards known metadata consumers (portal operators) is about the absolutely required fields. The current state of the proposed JSON structure is
A second question is related to the RSS feed technology. some thoughts on required fields and potential risks Do we need a Generally I see 3 categories of "claims" an SPO can make here
|
Folks, can we please concentrate on the mechanism for the extended metadata, and postpone all the bikeshedding about the content of the extended metadata for later? The mechanism is what needs to be carefully described and agreed for us to be able to implement anything. Once we have that in place we can discuss additions to the schema with relatively little technical risk. Any time we spend now on the schema delays getting the mechanism in place. So I'd again recommend that we go with an absolutely minimal metadata schema (e.g. with one single example uncontroversial entry), and firm up the description of the mechanism for the extended metadata. Once that's agreed we can open the door to bikeshedding about what new metadata we want to add, and that can be done bit by bit, one thing at a time so we don't have to have uncontroversial items block on controversial items.
My recent commit b53da37 describes one possible mechanism. It actually seems to need a small extension of the CLI tool to calculate the signature of any JSON schema. Or we have to build the schema in a similar way as the current signature commands already expect. There would be an alternative way for the signature by using TX metadata. The drawback is that a consumer would need a fully synced chain and probably a proper db-sync instance to have access to this validation data. On the other hand, it would be an elegant technique using the chain itself. We could even think about designing the whole thing according to the current DID design (https://www.w3.org/TR/did-core/). Based on the feedback I have received from those already using the existing, non-standardised extended metadata, I see the need to already include the most frequently used fields (e.g. logo, contact handles) in order to encourage rapid and significant adoption.
This is getting much better.
I'd still like to see more explicit and precise details: say what exactly the new fields in the main metadata are. What is their format exactly. There's some confusion of hashes vs signatures to clear up.
You've sort-of described how an operator can create the files, but not exactly and not clearly.
What steps do tools need to do to validate the extended metadata? I.e. explain what has to be downloaded, what signatures have to be checked with what keys, don't just assume everyone understands the scheme already.
CIP-0006/CIP-0006.md

Then a new (not available yet) `cardano-cli` command generates the signed hash (`extData.sign`).

```shell
cardano-cli shelley stake-pool rawdata-hash
```
cardano-cli shelley stake-pool rawdata-hash
cardano-cli shelley stake-pool rawdata-sign
CIP-0006/CIP-0006.md

| `homepage` | A website URL for the pool | 64 Characters Maximum, must be a valid URL |
| `name` | A name for the pool | 50 Characters Maximum |
| `extDataUrl` | A URL for extended metadata | optional, 128 Characters Maximum, must be a valid URL |
| `extHashUrl` | A URL with the extended metadata hash | optional, 128 Characters Maximum, must be a valid URL |
I think you don't mean `extHashUrl` but rather `extSigUrl`. This file contains the signature of the data file, which can be verified with the `extVkey`.
CIP-0006/CIP-0006.md

The operator now:

- has the `extData.json` and `extData.sign` files
- will publish them at some https:// URL (probably same host as the main metadata)
- uses the `extData.vkey` string and the two extended file URLs to re-register the main metadata

This re-registration of the main metadata file with the `extData.vkey` and the two URLs is only necessary once. Afterwards, the operator can update his extended metadata at any time, generate the new signature and put both files online.
I think I understand, but the text is not clear.
Make clear which fields in the main metadata file these things correspond to. Say that you need the URL of the extended metadata json file, and signature file, that these are to fill in the `extDataUrl` and `extSigUrl` fields.
And what format exactly is the `extVkey`? Bech32 I presume? What prefix?
@gufmar - can we get this addressed to have it merged next meeting?
How about `poolmd_vk` for the bech32 prefix?
Please pick a title to better reflect what this CIP is about (SPOs...) - Currently too vague
Todo: |
This looks good now.
Just would like a simple clear section that says explicitly what software implementing this spec needs to do to verify the extended metadata.
So it should say that the `extVkey` is an ordinary 32-byte ed25519 verification key. That the `extSigUrl` resolves to a raw 64-byte ed25519 signature. What is the signature of? Is it the raw data we find at the end of `extDataUrl` or is it the hash of that data? Everywhere else in Cardano that we use signatures we sign hashes only, not variable-sized raw data. We should do the same here. So sign the Blake2b 256-bit hash of the raw data we find at `extDataUrl`.
Thus the verification of the extended metadata is simply to do an ed25519 verification of the signature found at `extSigUrl` using the vkey from the main metadata, over the hash of the data found at `extDataUrl`.
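The verification steps requested in this comment could be sketched as follows. This is an illustration under stated assumptions, not the agreed procedure: it assumes a 32-byte (256-bit) Blake2b digest, it expects the `extVkey` to have been decoded to raw bytes already (bech32 decoding is not shown), and the actual ed25519 check is passed in as a callable because the Python standard library has no ed25519 implementation. The function and type names are hypothetical.

```python
import hashlib
from typing import Callable

# (verification_key, signature, message) -> bool, supplied by whatever
# ed25519 library the implementer already uses (assumption).
Ed25519Verify = Callable[[bytes, bytes, bytes], bool]


def verify_extended_metadata(ext_vkey: bytes, signature: bytes,
                             ext_data: bytes,
                             ed25519_verify: Ed25519Verify) -> bool:
    """Check the signature from extSigUrl over the hash of the extDataUrl body.

    ext_vkey  -- raw 32-byte verification key from the main metadata
    signature -- raw 64-byte body downloaded from extSigUrl
    ext_data  -- raw body downloaded from extDataUrl
    """
    # Reject malformed inputs before doing any cryptography.
    if len(ext_vkey) != 32 or len(signature) != 64:
        return False
    # The signed message is the hash of the data, not the raw data itself.
    digest = hashlib.blake2b(ext_data, digest_size=32).digest()
    return ed25519_verify(ext_vkey, signature, digest)
```

Injecting the verifier keeps the sketch independent of any particular crypto library; the important detail is that the raw bytes as downloaded are hashed, with no re-serialisation of the JSON.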
| `description` | Pool Description. Text that describes the pool | 50 Characters Maximum |
| `homepage` | A website URL for the pool | 64 Characters Maximum, must be a valid URL |
Is this really accurate? As far as I can tell, the Design Specification (Section 4.2) doesn't impose an explicit character limit on the "description" and "homepage" metadata fields. All it requires is a total size of the on-chain metadata of 512 bytes or less.
Edit: To be clear, I am talking about the current metadata specification here. CIP0006 is proposing to shorten the "description" character limit to 50 and the "homepage" character limit to 64, correct?
good point. I'm not sure what is effectively true, but I'm aware of multiple different origins and specifications.
For example the incentivised testnet metadata registry
https://github.com/cardano-foundation/incentivized-testnet-stakepool-registry#submission-well-formedness-rules
afaik this was used as a template for the mainnet metadata registration file.
I also remember another proposal/definition/example I can't find now.
And there is what currently is implemented by the SMASH server as input validation
https://github.com/input-output-hk/smash/blob/479fc6d8fa62537cd6c8560176d6a8a7ffda6e9e/smash-servant-types/src/Cardano/SMASH/Types.hs#L212-L252
With the additional fields, we need to increase the current max size from 512 to (proposed) 1024 bytes
Not specifying additional max sizes for individual fields would make it very unpredictable (for example for UI design).
One could consciously use the shortest possible URLs, and fill an unlimited description or name field with >900 characters.
So I propose to align with what the SMASH server currently has implemented and is also using in its SQL DB schema.
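To illustrate why a total size cap and per-field caps complement each other, here is a small validation sketch. The numbers are simply the ones quoted in this thread (1024 bytes proposed in total, and the character limits from the table rows under review); they are not confirmed against what SMASH actually enforces, and the function name is hypothetical.

```python
import json
from typing import Dict, List

# Values taken from this discussion; treat them as placeholders.
MAX_TOTAL_BYTES = 1024
MAX_CHARS: Dict[str, int] = {
    "name": 50,
    "description": 50,
    "homepage": 64,
    "extDataUrl": 128,
    "extSigUrl": 128,
}


def check_limits(raw: bytes) -> List[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    problems: List[str] = []
    # Total size is measured on the raw bytes as served.
    if len(raw) > MAX_TOTAL_BYTES:
        problems.append(f"total size {len(raw)} bytes, limit is {MAX_TOTAL_BYTES}")
    doc = json.loads(raw)
    if not isinstance(doc, dict):
        return problems + ["top-level JSON value is not an object"]
    # Per-field limits are measured in characters, not bytes.
    for field, limit in MAX_CHARS.items():
        value = doc.get(field)
        if isinstance(value, str) and len(value) > limit:
            problems.append(f"{field}: {len(value)} characters, limit is {limit}")
    return problems
```

A file with a 900-character description would pass the total-size check alone but is caught by the per-field check, which is the UI predictability concern raised above.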
In my opinion, what is missing from the proposed extended metadata specification is a way to specify the fingerprint of a public PGP key, or alternatively a download URL for the public key. Something like this:
|
Initial proposal for extended metadata that can be a basis for dialog