chore(p2p): dnsaddr recursive resolution #2204

Merged
merged 16 commits into from
Sep 17, 2024

Conversation

rymnc
Member

@rymnc rymnc commented Sep 16, 2024

Linked Issues/PRs

Description

  • Creates a new module, dnsaddr_resolution, which recursively resolves dnsaddr entries and adds the results to the DHT without suffix matching. This way we can connect to all peers behind a domain, such as /dnsaddr/bootstrap.libp2p.io, without specifying the exact PeerId.
  • Makes some of the methods related to mounting the p2p service async.
  • Makes some of the methods related to mounting the p2p service return a Result.
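
For readers unfamiliar with the scheme: under the dnsaddr convention, a `/dnsaddr/<domain>` entry is resolved by querying the TXT records of `_dnsaddr.<domain>`, and each relevant record has the form `dnsaddr=<multiaddr>`. The following is a minimal sketch of that record handling, using plain strings rather than the `Multiaddr` type. The helper names `txt_query_name` and `parse_dnsaddr_records` are hypothetical and are not taken from this PR.

```rust
/// Builds the DNS name whose TXT records describe `domain`.
fn txt_query_name(domain: &str) -> String {
    format!("_dnsaddr.{}", domain)
}

/// Keeps only TXT records of the form `dnsaddr=<multiaddr>` and returns
/// the multiaddr part of each one as an owned string.
fn parse_dnsaddr_records(txt_records: &[&str]) -> Vec<String> {
    txt_records
        .iter()
        // Records without the `dnsaddr=` prefix (SPF, verification tokens, ...)
        // are silently skipped.
        .filter_map(|record| record.strip_prefix("dnsaddr="))
        .map(str::to_owned)
        .collect()
}
```

Each returned entry may itself start with `/dnsaddr/`, which is what makes the resolution recursive.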

Checklist

  • Breaking changes are clearly marked as such in the PR description and changelog
  • New behavior is reflected in tests
  • The specification matches the implemented behavior (link update PR if changes are needed)

Before requesting review

  • I have reviewed the code myself
  • I have created follow-up issues caused by this PR and linked them here

After merging, notify other teams

[Add or remove entries as needed]

@@ -50,6 +50,7 @@ thiserror = "1.0.47"
 tokio = { workspace = true, features = ["sync"] }
 tracing = { workspace = true }
 void = "1"
+hickory-resolver = "0.25.0-alpha.2"
Collaborator

Can we use resolver from libp2p?

Member Author

do you mean the same version as libp2p or to use their resolver?

their resolver is in libp2p-dns (https://docs.rs/libp2p-dns/latest/libp2p_dns/tokio/type.Transport.html) and doesn't export any helper functions to resolve dnsaddr's.

Collaborator

I see, they use the same library to do it. Yeah, we need to use the same version so we don't bloat the Cargo.lock.

Member Author

addressed in 4f4b662

@rymnc rymnc self-assigned this Sep 17, 2024
@rymnc rymnc marked this pull request as ready for review September 17, 2024 10:15
@rymnc rymnc requested review from xgreenx and a team September 17, 2024 10:15
let mut dnsaddr_multiaddrs = vec![];

for dnsaddr in dnsaddr_urls {
    let multiaddrs = dns_resolver.lookup_dnsaddr(dnsaddr.as_ref()).await?;
Collaborator

What if a dnsaddr resolves to another dnsaddr? Will it work?

Member Author

yup, resolution is recursive. the test case handles that :)
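
To illustrate the behaviour being asked about, here is a rough sketch of recursive resolution with a lookup limit. It is not the PR's implementation: DNS is replaced by an in-memory map, multiaddrs are plain strings, any suffix after the nested domain is ignored, and `MAX_DNS_LOOKUPS` is treated as a total lookup budget, which may differ from how the PR applies it. The names `Records`, `resolve_recursive` (as written here) and `resolve` are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Upper bound on lookups per top-level resolution (the PR uses 10).
const MAX_DNS_LOOKUPS: usize = 10;

/// Stand-in for DNS: maps a domain to the multiaddrs found in its TXT records.
type Records = HashMap<&'static str, Vec<&'static str>>;

/// Resolves `domain`, following nested `/dnsaddr/...` entries until only
/// non-dnsaddr multiaddrs remain or the lookup budget is exhausted.
fn resolve_recursive(records: &Records, domain: &str, lookups_left: &mut usize) -> Vec<String> {
    if *lookups_left == 0 {
        // Budget spent: stop instead of recursing forever on cyclic records.
        return Vec::new();
    }
    *lookups_left -= 1;

    let mut resolved = Vec::new();
    for addr in records.get(domain).into_iter().flatten() {
        match addr.strip_prefix("/dnsaddr/") {
            // Nested dnsaddr: the next path segment is the domain to look up.
            Some(rest) => {
                let nested = rest.split('/').next().unwrap_or(rest);
                resolved.extend(resolve_recursive(records, nested, lookups_left));
            }
            // Anything else is treated as a final, dialable address.
            None => resolved.push(addr.to_string()),
        }
    }
    resolved
}

/// Entry point that starts every resolution with a fresh budget.
fn resolve(records: &Records, domain: &str) -> Vec<String> {
    let mut budget = MAX_DNS_LOOKUPS;
    resolve_recursive(records, domain, &mut budget)
}
```

With this shape, a record set that points back at itself simply runs out of budget and yields no addresses rather than looping.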

const MAX_DNS_LOOKUPS: usize = 10;

#[async_trait::async_trait]
pub trait DnsLookup {
Collaborator

Why do we need trait?

Member Author

removed :)

@@ -97,7 +102,7 @@ impl Config {
         self
     }

-    pub fn finish(self) -> Behaviour {
+    pub async fn finish(self) -> anyhow::Result<Behaviour> {
Collaborator

Just curious, have you considered an implementation that doesn't require async during construction? Maybe it is possible to resolve the addresses in the fn start?

Member Author

do you mean here -

pub async fn start(&mut self) -> anyhow::Result<()> {
?

Member Author

addressed in 5b4671f

@netrome netrome self-requested a review September 17, 2024 11:53
Contributor

@netrome netrome left a comment

I hate to block this PR, but there are two pretty substantial issues I can see so far that I would like to have addressed.

  1. I don't think we should use .now_or_never() to assert that config.finish().await is ready in build_behavior_fn. It is not clear to me that this future will always be ready when this is invoked.
  2. The DNS resolution tests should not depend on the host machine's local DNS cache or network connection. Instead, I'd suggest creating a port for this, which would allow us to create more tests for different DNS configurations.

Moreover, if a port is introduced to allow the DNS lookup logic to be tested I'd be interested in seeing a test case covering recursive lookups to make sure they work as expected.

Comment on lines 271 to 274
async move { config.finish().await }
    .now_or_never()
    .unwrap()
    .unwrap()
Contributor

I don't see how we can guarantee this future to be immediately ready. What happens if the DnsResolver::new().await? is pending for example? As I read this, we'd panic in that scenario. That feels quite brittle to me, in which case it would feel safer to just use the synchronous resolver instead.

My preferred option though would be to allow build_behavior_fn to return a function which returns a future, so we can return this future without having to panic if it isn't ready here.
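
The concern can be reproduced without any networking. The sketch below hand-rolls a "poll exactly once" helper on top of the standard library to mirror the idea behind `now_or_never`; it is not the `futures` crate's implementation, and `poll_once`, `NoopWake` and `YieldOnce` are names invented for this example. A future that is still waiting on something yields `None`, and unwrapping that `None` is the panic described above.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling a future a single time.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` exactly once and returns its output only if it was already
/// ready on that first poll.
fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(output) => Some(output),
        Poll::Pending => None,
    }
}

/// Future that returns `Pending` on its first poll, standing in for I/O
/// (such as a DNS query) that has not completed yet.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            // Ask to be polled again, then report that we are not done.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}
```

An already-completed future passes through `poll_once` as `Some(..)`, while `YieldOnce { yielded: false }` comes back as `None`, so whether a chained `.unwrap()` panics depends entirely on whether every await point inside the future happens to be ready.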

Member Author

deprecated in 5b4671f

Contributor

Nice, thank you!

Comment on lines 179 to 188
let dnsaddr_urls = multiaddrs
    .iter()
    .filter_map(|node| {
        if let Protocol::Dnsaddr(multiaddr) = node.iter().next()? {
            Some(multiaddr.clone())
        } else {
            None
        }
    })
    .collect::<Vec<_>>();
Contributor

Nit: This reads a bit funky to me. Wouldn't the singular of multiaddrs be multiaddr rather than node? I'd suggest this naming instead:

Suggested change
- let dnsaddr_urls = multiaddrs
-     .iter()
-     .filter_map(|node| {
-         if let Protocol::Dnsaddr(multiaddr) = node.iter().next()? {
-             Some(multiaddr.clone())
-         } else {
-             None
-         }
-     })
-     .collect::<Vec<_>>();
+ let dnsaddr_urls = multiaddrs
+     .iter()
+     .filter_map(|multiaddr| {
+         if let Protocol::Dnsaddr(dnsaddr_url) = multiaddr.iter().next()? {
+             Some(dnsaddr_url.clone())
+         } else {
+             None
+         }
+     })
+     .collect::<Vec<_>>();
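
For readers without the libp2p types at hand, the same filtering step can be sketched on plain strings. This is only an analogy for the `Protocol::Dnsaddr` match above: `leading_dnsaddr_domain` and `dnsaddr_domains` are hypothetical helpers, and real multiaddr parsing is more involved than splitting on `/`.

```rust
/// Returns the domain of a multiaddr string whose first component is
/// `dnsaddr`, e.g. "/dnsaddr/example.org/tcp/4001" -> Some("example.org").
fn leading_dnsaddr_domain(multiaddr: &str) -> Option<&str> {
    let mut parts = multiaddr.strip_prefix('/')?.split('/');
    match (parts.next()?, parts.next()?) {
        ("dnsaddr", domain) if !domain.is_empty() => Some(domain),
        _ => None,
    }
}

/// Collects the domains of all entries that start with a dnsaddr component;
/// entries starting with any other protocol are dropped.
fn dnsaddr_domains<'a>(multiaddrs: &[&'a str]) -> Vec<&'a str> {
    multiaddrs
        .iter()
        .copied()
        .filter_map(leading_dnsaddr_domain)
        .collect()
}
```

As in the suggested naming, the closure argument is one multiaddr and the extracted value is the dnsaddr domain, which keeps the two roles distinct.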

Member Author

addressed in 0482f80

/// This limit is for preventing malicious or misconfigured DNS records from causing infinite recursion.
const MAX_DNS_LOOKUPS: usize = 10;

#[async_trait::async_trait]
Contributor

Async traits have been stable since Rust 1.75, so I'd prefer not to add this declaration for new traits. Instead, just have the functions return impl Future<Output = ...> and use async fn in the implementations.

Member Author

removed the trait

Comment on lines 40 to 42
) -> Pin<
    Box<dyn std::future::Future<Output = anyhow::Result<Vec<Multiaddr>>> + Send + 'a>,
> {
Contributor

Do we need to do the Pin<Box<...>> here? That feels like an artifact of async_trait which should preferably be removed (as per my other suggestion), or if you want to keep the async_trait, this should happen in the trait implementation and not in this private helper method.

Member Author

recursion needs us to pin the future

Contributor

Ah right, makes sense 👍
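
The point about recursion can be shown in isolation. Without some indirection, an async function that awaits itself would need a future type that contains itself, so the usual workaround is to return a boxed, pinned future. The sketch below assumes nothing from the PR: `BoxFuture` is a local alias, `count_levels` is a toy recursive function, and `block_on` is a deliberately naive busy-polling executor built on std so the example needs no runtime crate.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Boxed, pinned future: a fixed-size handle to a future of unknown size.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Counts the elements of `chain` by awaiting itself on the tail.
/// Boxing the returned future is what makes the self-reference possible.
fn count_levels<'a>(chain: &'a [&'a str]) -> BoxFuture<'a, usize> {
    Box::pin(async move {
        match chain.split_first() {
            Some((_head, rest)) => 1 + count_levels(rest).await,
            None => 0,
        }
    })
}

/// Waker that does nothing; sufficient for the busy-polling loop below.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor for the sketch: polls in a loop until the future is done.
fn block_on<T>(mut fut: BoxFuture<'_, T>) -> T {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

The return type has the same shape as the signature under review, `Pin<Box<dyn Future<Output = ...> + Send + 'a>>`, which is why it survives even after the `async_trait` attribute is removed.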

Comment on lines 91 to 99
// given
let resolver = DnsResolver::new().await.unwrap();
// when
let multiaddrs = resolver
    .lookup_dnsaddr("bootstrap.libp2p.io")
    .await
    .unwrap();
// then
assert!(!multiaddrs.is_empty());
Contributor

Do we assume here that any machine running this test will always be able to look up the "bootstrap.libp2p.io" address in their environment?

For example, what happens if I'm flushing my local DNS cache and disconnect from the internet before running this test? As I read this, the test case would fail in that scenario.

I think we need to introduce a port to manage the DNS lookup to make this logic testable.

Collaborator

I think the introduction of the port will make these tests useless, since they will just test how the port itself works. Plus, I'm not sure how hard it is to write your own DNS resolver.

I'm okay with the idea that this test fails without a connection or local DNS resolution.

Having a test with real DNS proves that it works in a real environment. The libp2p library tests it in the same way=)

Contributor

While I see your point, I have to respectfully disagree. I think the port will help us make the tests even more useful. Right now we can't assert much more than !multiaddrs.is_empty(), because there's no way for us to control which DNS records we get when we look up bootstrap.libp2p.io. With the port we can, for example, test that our logic parses the TXT records correctly.

As for another example, there's currently no way to know from this test if this test is doing a recursive lookup or just a plain single lookup - and this can vary depending on how libp2p.io configures their records. With the trait we can write multiple test cases to test and make clear assertions about different scenarios.

I don't think it would be too hard to put a trait between our code and the TokioAsyncResolver, because as far as I can see, we're only calling TokioAsyncResolver::txt_lookup so we only need to mock one method for the tests.
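
A rough sketch of what such a port could look like follows. It is kept synchronous for brevity, whereas the resolver discussed here is async, and every name in it (`TxtLookup`, `FakeDns`, this `lookup_dnsaddr` signature, the string error type) is an assumption for illustration rather than code from the PR.

```rust
use std::collections::HashMap;

/// Hypothetical port: the only DNS capability the resolution logic needs.
trait TxtLookup {
    /// Returns the TXT records stored under `name`.
    fn txt_lookup(&self, name: &str) -> Result<Vec<String>, String>;
}

/// Test double that serves records from memory instead of the network.
struct FakeDns {
    records: HashMap<String, Vec<String>>,
}

impl TxtLookup for FakeDns {
    fn txt_lookup(&self, name: &str) -> Result<Vec<String>, String> {
        self.records
            .get(name)
            .cloned()
            .ok_or_else(|| format!("no TXT records for {}", name))
    }
}

/// Logic under test: query `_dnsaddr.<domain>` and keep `dnsaddr=` entries.
fn lookup_dnsaddr(dns: &impl TxtLookup, domain: &str) -> Result<Vec<String>, String> {
    let txt = dns.txt_lookup(&format!("_dnsaddr.{}", domain))?;
    Ok(txt
        .iter()
        .filter_map(|record| record.strip_prefix("dnsaddr="))
        .map(str::to_owned)
        .collect())
}
```

With a fake like this, a test can assert on the exact addresses returned and on the error path, instead of only checking that the result is non-empty.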

Collaborator

Okay, I will try to put that in another way=)

The main functionality that I would like to see tested is that we can resolve the real addresses from IPFS (or Fuel testnet/mainnet). So, I still want to see a test that uses a real dnsaddr and verifies that TokioAsyncResolver works as expected.

The functionality of resolve_recursive and how this function works are not so important to me.
The approach with the port for the internal resolver can help to test the behaviour of the resolve_recursive function in different use cases, and it looks like the right call.

It is up to @rymnc. I'm okay with doing that in a separate PR dedicated to better test coverage of resolve_recursive, since the main feature is implemented and we have an integration test that covers this feature=)

Member Author

how about this? I have some dnsaddr's on my personal domain anyway, we can just reuse those with recursion and assert strictly about the multiaddrs this function spits out?

Contributor

Sure. I'd prefer that we test against fuel domains, but I'm okay with any domain within our control, so if you use your personal domain for now and create a follow-up to set up fuel domains, that would be my preferred option. Any owner of a hard-coded domain in our tests will have the power to block our CI temporarily, so I'd rather have it be someone in the team.

I'd still find the port solution more readable, since this scenario is only verifiable by manually doing DNS lookups to check which records exist on the host machine.

Member Author

addressed in 8c77c1d

i won't touch the records, i promise 😂

Contributor

Hahah pinky promise 😂 Nice comment also, super helpful thank you! 🙏

@netrome netrome self-requested a review September 17, 2024 14:20
Contributor

@netrome netrome left a comment

I'll re-review tonight, but my main concerns have been addressed. Thank you!

@netrome netrome dismissed their stale review September 17, 2024 14:22

Concerns have been addressed, but I need to re-review before approving.

.lookup_dnsaddr("bootstrap.libp2p.io")
.await
.unwrap();
// run a `dig +short txt rymnc.com` to get the TXT records
Member Author

err, this should be _dnsaddr.rymnc.com. patching.

Collaborator

@xgreenx xgreenx left a comment

LGTM=)

crates/services/p2p/src/p2p_service.rs (outdated, resolved)
// notice that it contains -
// `dnsaddr=/dnsaddr/zone-1.rymnc.com/tcp/4001/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN`
// which is a recursive call
let multiaddrs = resolver.lookup_dnsaddr("rymnc.com").await.unwrap();
Collaborator

Maybe it is better to use main net dnsaddr=)

Member Author

addressed in 92f5f22

xgreenx previously approved these changes Sep 17, 2024
@xgreenx xgreenx requested a review from netrome September 17, 2024 16:29
@xgreenx xgreenx enabled auto-merge (squash) September 17, 2024 16:30
// when
// run a `dig +short txt _dnsaddr.mainnet.fuel.network` to get the TXT records
let multiaddrs = resolver
    .lookup_dnsaddr("mainnet.fuel.network")
Member

Can we switch this to core-test.fuellabs.net? The records on mainnet.fuel.network can and will change over time, but I just set up the core-test DNS record to be static.

Contributor

Thanks for setting this up. I created this follow-up issue to use the core-test.fuellabs.net hostname in the test, since this PR got auto-merged after my approval.

Contributor

@netrome netrome left a comment

Looks good to me 👍 It would be nice to use the test records Mike suggested, but that can be done as a follow-up if you want to get this merged promptly for release.

@@ -5,6 +5,7 @@ pub mod behavior;
 pub mod codecs;
 pub mod config;
 pub mod discovery;
+mod dnsaddr_resolution;
Contributor

Nit: I'd move this declaration to a separate block from the public modules.

@xgreenx xgreenx merged commit 20f3091 into master Sep 17, 2024
33 of 35 checks passed
@xgreenx xgreenx deleted the fix/p2p branch September 17, 2024 20:21
@xgreenx xgreenx mentioned this pull request Sep 17, 2024
xgreenx added a commit that referenced this pull request Sep 18, 2024
## Version v0.36.0

### Added
- [2135](#2135): Added metrics
logging for number of blocks served over the p2p req/res protocol.
- [2151](#2151): Added
limitations on gas used during dry_run in API.
- [2188](#2188): Added the new
variant `V2` for the `ConsensusParameters` which contains the new
`block_transaction_size_limit` parameter.
- [2163](#2163): Added
runnable task for fetching block committer data.
- [2204](#2204): Added
`dnsaddr` resolution for TLD without suffixes.

### Changed

#### Breaking
- [2199](#2199): Applying
several breaking changes to the WASM interface from backlog:
- Get the module to execute WASM byte code from the storage first, and
fall back to the built-in version in the case of the
`FUEL_ALWAYS_USE_WASM`.
- Added `host_v1` with a new `peek_next_txs_size` method, that accepts
`tx_number_limit` and `size_limit`.
- Added new variant of the return type to pass the validation result. It
removes block serialization and deserialization and should improve
performance.
- Added a V1 execution result type that uses `JSONError` instead of
postcard serialized error. It adds flexibility of how variants of the
error can be managed. More information about it in
FuelLabs/fuel-vm#797. The change also moves
`TooManyOutputs` error to the top. It shows that `JSONError` works as
expected.
- [2145](#2145): feat:
Introduce time port in PoA service.
- [2155](#2155): Added trait
declaration for block committer data
- [2142](#2142): Added
benchmarks for varied forms of db lookups to assist in optimizations.
- [2158](#2158): Log the
public address of the signing key, if it is specified
- [2188](#2188): Upgraded the
`fuel-vm` to `0.57.0`. More information in the
[release](https://github.com/FuelLabs/fuel-vm/releases/tag/v0.57.0).

## What's Changed
* chore(p2p_service): add metrics for number of blocks requested over
p2p req/res protocol by @rymnc in
#2135
* Weekly `cargo update` by @github-actions in
#2149
* Debug V1 algorithm and use more realistic values in gas price
analysis by @MitchTurner in
#2129
* feat(gas_price_service): include trait declaration for block committer
data by @rymnc in #2155
* Convert gas price analysis tool to CLI by @MitchTurner in
#2156
* chore: add benchmarks for varied forms of lookups by @rymnc in
#2142
* Add label nochangelog on weekly cargo update by @AurelienFT in
#2152
* Log consensus-key signer address if specified by @acerone85 in
#2158
* chore(rocks_db): move ShallowTempDir to benches crate by @rymnc in
#2168
* chore(benches): conditional dropping of databases in benchmarks by
@rymnc in #2170
* feat: Introduce time port in PoA service by @netrome in
#2145
* Get DA costs from predefined data by @MitchTurner in
#2157
* chore(shallow_temp_dir): panic if not panicking by @rymnc in
#2172
* chore: Add initial CODEOWNERS file by @netrome in
#2179
* Weekly `cargo update` by @github-actions in
#2177
* fix(db_lookup_times): rework core logic of benchmark by @rymnc in
#2159
* Add verification on transaction dry_run that they don't spend more
than block gas limit by @AurelienFT in
#2151
* bug: fix algorithm overflow issues by @MitchTurner in
#2173
* feat(gas_price_service): create runnable task for expensive background
polling for da metadata by @rymnc in
#2163
* Weekly `cargo update` by @github-actions in
#2197
* Fix bug with gas price factor in V1 algorithm by @MitchTurner in
#2201
* Applying several breaking changes to the WASM interface from backlog
by @xgreenx in #2199
* chore(p2p): dnsaddr recursive resolution by @rymnc in
#2204

## New Contributors
* @acerone85 made their first contribution in
#2158

**Full Changelog**:
v0.35.0...v0.36.0
Development

Successfully merging this pull request may close these issues.

feat(p2p): Handle dnsaddr resolution
4 participants