pool: fix spurious send_event timeout error #375

Merged

Conversation

DanConwayDev
Contributor

start to listen for OK msg immediately before event is sent

this prevents the OK msg from being received
before we start listening for it

this bug was introduced in v0.28 and discovered when upgrading
v0.27 ~> v0.29 in ngit (https://gitworkshop.dev/ngit)
as it caused many integration tests to fail
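
For illustration, a minimal, self-contained sketch of the ordering issue using tokio's broadcast channel (requires only `tokio = { version = "1", features = ["sync"] }`; the names are illustrative, not the actual nostr-relay-pool internals). A broadcast receiver only sees messages sent after it subscribes, so subscribing after the event has been sent can miss the relay's OK entirely:

```rust
use tokio::sync::broadcast;

fn main() {
    // `tx` stands in for the relay notification sender; `_keep_alive`
    // just keeps the channel open so `send` never errors.
    let (tx, _keep_alive) = broadcast::channel::<String>(16);

    // Buggy order (pre-fix): the relay's OK arrives before we subscribe,
    // so the late receiver never sees it and the caller times out.
    tx.send("OK".to_string()).unwrap();
    let mut late_rx = tx.subscribe();
    assert!(late_rx.try_recv().is_err()); // nothing to read: spurious timeout

    // Fixed order (what this commit does): subscribe first, then send.
    let mut early_rx = tx.subscribe();
    tx.send("OK".to_string()).unwrap();
    assert_eq!(early_rx.try_recv().unwrap(), "OK"); // the OK is observed
}
```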

Description

Notes to the reviewers

Changelog notice

Checklists

All Submissions:

New Features:

  • I've added tests for the new feature
  • I've added docs for the new feature

Bugfixes:

  • This pull request breaks the existing API
  • I've added tests to reproduce the issue which are now passing
  • I'm linking the issue being fixed by this PR

Comment on lines 1064 to 1067
let handle_responses = time::timeout(Some(opts.timeout), async {
    let mut published: HashSet<EventId> = HashSet::new();
    let mut not_published: HashMap<EventId, String> = HashMap::new();
    let mut notifications = self.internal_notification_sender.subscribe();
Member

Maybe it's enough to move the `let mut notifications = self.internal_notification_sender.subscribe();` outside the `time::timeout`, before the `self.batch_msg(msgs, opts).await?;`. That should be a cleaner solution.

What do you think?
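
A sketch of that reordering against the snippet quoted above; the surrounding send_event body is abbreviated and the exact code may differ:

```rust
// Subscribe before anything is sent, so the OK cannot arrive in the gap
// between sending and listening.
let mut notifications = self.internal_notification_sender.subscribe();

// Send the EVENT message(s) only after the listener exists.
self.batch_msg(msgs, opts).await?;

// The timeout now wraps only the response handling.
let handle_responses = time::timeout(Some(opts.timeout), async {
    let mut published: HashSet<EventId> = HashSet::new();
    let mut not_published: HashMap<EventId, String> = HashMap::new();
    // ... drain `notifications`, matching OK messages to event ids ...
});
```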

@DanConwayDev DanConwayDev force-pushed the fix-send-event-spurious-timeout branch from 5e4901b to 7362f7c on March 26, 2024 at 15:49
@DanConwayDev
Contributor Author

That's much more elegant! I've force pushed your suggestion to this branch.

Member

@yukibtc yukibtc left a comment


ACK 7362f7c

@yukibtc yukibtc merged commit 7362f7c into rust-nostr:master Mar 26, 2024
6 checks passed
@yukibtc
Member

yukibtc commented Mar 26, 2024

Just published nostr-relay-pool v0.29.1

Make sure to bump it in your Cargo.lock with cargo update -p nostr-relay-pool

@DanConwayDev
Contributor Author

Thanks.
I am still getting frequent, but intermittent, errors on many integration tests after upgrading from rust-nostr v0.27 ~> v0.29.

They seem to be failing at points that use Relay::get_events_of.
Maybe there is a similar problem, as the handler is created after the subscription is initiated in get_events_of_with_callback?

Also, possibly unrelated, but clippy is warning about

large future with a size of 21016 bytes
for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#large_futures
`#[warn(clippy::large_futures)]` implied by `#[warn(clippy::pedantic)]`

for relay.connect and client.connect_relay post upgrade.
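
(Not the fix itself, just context on the lint: clippy::large_futures fires when a future's state machine is big, and its usual workaround is to box the future so only a pointer is held across the .await. A self-contained toy sketch; in ngit the same pattern would apply to the flagged relay.connect / client.connect_relay calls.)

```rust
// Toy illustration of the clippy::large_futures workaround: a big buffer held
// across an await point makes the future's state machine itself large.
async fn large_future() -> usize {
    let buf = [0u8; 16 * 1024];
    tokio::task::yield_now().await;
    buf.len()
}

#[tokio::main]
async fn main() {
    // Box::pin moves the large state machine to the heap, so only the
    // Pin<Box<...>> pointer is stored in the caller's future.
    let n = Box::pin(large_future()).await;
    println!("future completed, buffer size = {n}");
}
```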

@yukibtc
Member

yukibtc commented Mar 26, 2024

Can you point me to where the issues are in the ngit code? I'll try to take a look.

@yukibtc
Member

yukibtc commented Mar 26, 2024

I found the get_events_of call. Maybe the timeout is too low (500 ms), even if the relay is on the same machine. What error do you receive?

@DanConwayDev
Contributor Author

Thanks :-)

rust-nostr v0.29 upgrade branch

When running cargo test, an inconsistent set of integration tests fails with the same error:

Timeout Error: Expected EOF (End of File) but got "searching for profile and relay updates...`\r``\n`" (after waiting 2000 ms)

The first thing the tool does with a fresh config (integration test conditions) is fetch the user's metadata and relays. This is where it intermittently fails to receive the events sent by the relay. The test times out before the end of the get_events_of timeout.

You highlighted the function where this occurs.
https://github.com/DanConwayDev/ngit-cli/blob/3c534c24e5ff96d8dcd88ed419ae7d7d07e96f92/src/client.rs#L243

@DanConwayDev
Contributor Author

the clippy errors:

large future with a size of 21016 bytes
for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#large_futures
`#[warn(clippy::large_futures)]` implied by `#[warn(clippy::pedantic)]`

appear on these two lines:
https://github.com/DanConwayDev/ngit-cli/blob/3c534c24e5ff96d8dcd88ed419ae7d7d07e96f92/src/client.rs#L127
https://github.com/DanConwayDev/ngit-cli/blob/3c534c24e5ff96d8dcd88ed419ae7d7d07e96f92/src/client.rs#L249

@yukibtc
Member

yukibtc commented Mar 27, 2024

I tried to take a look, but most of the time I receive this:

X has been running for over 60 seconds

Always in tests/list.rs.


Regarding the warning: maybe it's related to the size of InternalRelay::try_connect, or it's a false positive. Not sure yet; I've never seen this warning before. From the clippy book:

Note that clippy::pedantic contains some very aggressive lints prone to false positives.

https://doc.rust-lang.org/stable/clippy/configuration.html?highlight=clippy%3A%3Apedantic#allowingdenying-lints

I'll try to rewrite the try_connect method for better readability, which should also resolve that warning.

@DanConwayDev
Contributor Author

Try running cargo test --test init a few times until one of the tests fails.

The list tests are probably failing for the same reason, but in a 'prep' phase, which means they don't fail gracefully and the test never ends.

The main branch passes CI:
https://github.com/DanConwayDev/ngit-cli/actions/runs/8392897271/job/22987395219

@yukibtc
Member

yukibtc commented Mar 27, 2024

Ok, yes, I received that error.

Bumping nostr-sdk to commit d0c5f15 solves the issue. I'm going to release a new patch for nostr-relay-pool.
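
(For anyone hitting this before the patch lands, one option is to pin the dependency to that commit with a git dependency; a hedged sketch, assuming the project pulls in nostr-sdk directly. Adjust the crate name to match your Cargo.toml.)

```toml
[dependencies]
# Temporary pin to the fix commit; switch back to a crates.io version once
# the patched release is published.
nostr-sdk = { git = "https://github.com/rust-nostr/nostr", rev = "d0c5f15" }
```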

@yukibtc
Member

yukibtc commented Mar 27, 2024

Released nostr-relay-pool v0.29.2

DanConwayDev added a commit to DanConwayDev/ngit-cli that referenced this pull request Mar 28, 2024
rust-nostr author suggested this might be a false positive
rust-nostr/nostr#375 (comment)
@DanConwayDev
Contributor Author

There were a few minor behavioral changes that were breaking other tests. I updated the tests, and they are all now passing, so I have merged the update to v0.29.

Thanks for your assistance.
