Performance regression Producer 3.5.0 vs 3.4.0 #1321

Closed
atnoya opened this issue Apr 24, 2024 · 9 comments · Fixed by #1322

Comments

@atnoya
Contributor

atnoya commented Apr 24, 2024

We upgraded fs2-kafka from 3.4.0 to 3.5.0, and both throughput and CPU usage took a serious hit.

[Screenshots: 2024-04-24 at 14:23:04, 14:23:34, and 14:47:37]

In the screenshots above, you can see the point where we switched back to 3.4.0, at 14:18.

Looking at the list of changes between 3.4.0 and 3.5.0, we tried downgrading only fs2-core to 3.9.4, as we thought it could be the root cause, but performance was still terrible.

I don't know the internals of cats-effect or fs2 that well, but it looks to me like the problem might be the change from Sync[F].blocking to Sync[F].interruptible. From my limited experience, I don't see how any of the other changes could cause this.

I am happy to provide more information if I can. In the meantime, I will try to test my hypothesis above and will report back if I find anything.
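For readers less familiar with the two calls named in the hypothesis above, here is a minimal sketch of the difference. It is not the fs2-kafka code: the object and method names are hypothetical, and `call` stands in for any blocking client call. Both methods are part of the cats-effect 3 `Sync` type class.

```scala
import cats.effect.Sync

// Hypothetical helpers for illustration only; not taken from fs2-kafka.
object BlockingVsInterruptible {

  // Runs `call` as a blocking operation. Per the cats-effect docs,
  // the running thunk cannot be interrupted by cancelation.
  def viaBlocking[F[_], A](call: => A)(implicit F: Sync[F]): F[A] =
    F.blocking(call)

  // Also runs `call` as a blocking operation, but cancelation
  // attempts to interrupt the thread that is running the thunk.
  def viaInterruptible[F[_], A](call: => A)(implicit F: Sync[F]): F[A] =
    F.interruptible(call)
}
```

Both variants return the same result for a given `call`; the difference is only in how cancelation is handled, which is where any extra per-call bookkeeping would have to come from.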

@abestel
Contributor

abestel commented Apr 24, 2024

I reached the exact same conclusion this afternoon: PR #1126 seems to be the culprit (tested by reverting it and publishing a local version).

@aartigao
Contributor

Wow... I'm astonished...

@aartigao
Contributor

I'm also not well versed in the CE internals, and judging by the docs for interruptible it didn't look like it would hurt, which is why I merged that PR 😢

Of course, I'm going to create a fix for this now, but I'm really curious about why there is such a performance drop 🤔 cc @armanbilge, was that expected?

@aartigao
Contributor

Maybe interruptible doesn't shift to the blocking pool?

@atnoya
Contributor Author

atnoya commented Apr 25, 2024

https://github.com/typelevel/cats-effect/blob/769a89ef5d39f35d3a2cd00ffedbd22c91df48cc/core/jvm/src/main/scala/cats/effect/IOFiberPlatform.scala#L28

I can see some complex logic there, including semaphores, AtomicRefs and busy waits that do not run in the blocking thread pool. Could that explain the increased CPU usage?

https://github.com/typelevel/cats-effect/blob/769a89ef5d39f35d3a2cd00ffedbd22c91df48cc/core/jvm/src/main/scala/cats/effect/IOFiberPlatform.scala#L177

But it does seem that the action itself is run in the blocking thread pool.

@atnoya
Contributor Author

atnoya commented Apr 25, 2024

We can possibly open an issue for clarification in the cats-effect repo.

@atnoya
Contributor Author

atnoya commented Apr 25, 2024

Btw thanks a million for the quick reaction and the release 🙇

@atnoya
Contributor Author

atnoya commented Apr 25, 2024

Just confirming that the new release, 3.5.1, fixes the issue:
[Screenshots: 2024-04-25 at 12:34:48, 12:34:55, and 12:35:03]

I am happy to close the issue, unless you want to keep it open for tracking the interruptible issue.
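For anyone else hitting this, picking up the fixed release should only need a version bump. A minimal build.sbt fragment, assuming an sbt build and the usual `com.github.fd4s` coordinates for fs2-kafka (adjust to your own build setup):

```scala
// build.sbt fragment: move from the affected 3.5.0 to the fixed 3.5.1
libraryDependencies += "com.github.fd4s" %% "fs2-kafka" % "3.5.1"
```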

@aartigao
Contributor

It's fine to close it. Thank you!
