Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix jar file reference release race condition #43221

Closed

Conversation

pcasaes
Copy link
Contributor

@pcasaes pcasaes commented Sep 11, 2024

Per instance, close should only be performed once.

Possible solution for #43158

Per instance, close should only be performed once.
@quarkus-bot quarkus-bot bot added the area/devtools Issues/PR related to maven, gradle, platform and cli tooling/plugins label Sep 11, 2024
@geoand
Copy link
Contributor

geoand commented Sep 12, 2024

cc @mariofusco

Copy link
Contributor

@mariofusco mariofusco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We designed this class to keep the concurrent state in one single variable, the referenceCounter. In general adding a second variable makes the concurrent state no longer atomic and much harder to be coordinated and kept consistent, even though I must admit that in this specific case this additional state is only used to ensure that the close is called only once, so probably there's no need for a coordination with the other variable.

However the atomic state implemented by the referenceCounter actually works only under the strict assumption that the close method is called only once, while with this proposed change you're trying to cover the possibility of multiple close invocation, which breaks that assumption.

In essence while this pull request could be acceptable and make the whole design safer, I believe that it is fixing a symptom not the root cause. In other words, before dealing with the possibility that the close is called more than once, I would like to investigate why this is happening and possibly prevent it.

@franz1981 any comment on your side?

@franz1981
Copy link
Contributor

franz1981 commented Sep 12, 2024

I see 2 possible actions:

https://github.com/quarkusio/quarkus/blob/3.14.2/independent-projects/bootstrap/runner/src/main/java/io/quarkus/bootstrap/runner/RunnerClassLoader.java#L323

add a check in the synchronized here to not call the reset twice

AND

do as you've done - because we need close specifically - which is exposed to outside, let's say, to be idempotent and cannot be triggered more than once.

We can save throwing an exception (and just return) when we try to transition (again) from 1 -> 0 in the ref count, but we risk to not capture ref cnt's other issues, if we become more relaxed there.

Another possibility is to use a parity bit in the ref count (or just turning it negative?) when we request a close - ignoring any attempt to set it again

i.e.

  • ref cnt is 2
  • close is called once: ref cnt become -1
  • close called again: ref cnt is already negative, it is ignored
  • release happen: ref cnt move from -1 -> 0

Which means that acquire/release increase/decrease the absolute value of the counter but cannot change sign
while close decrease the absolute value of the counter and change its sign in negative, only if the value is positive; if is negative is ignored.

I can help to impl that one if is not clear.

@pcasaes
Copy link
Contributor Author

pcasaes commented Sep 12, 2024

In essence while this pull request could be acceptable and make the whole design safer, I believe that it is fixing a symptom not the root cause.

I agree 100%. Which is why in the issue I comment that I'm not sure this is the solution
#43158 (comment)

close can be called more than once because an instance of a reference can be added and removed from the runner's cache many times, all while a happy path acquire/release is taking place.

Maybe a proper solution is to not add references back into the cache once close has been called (called, not necessarily completed).

@gsmet
Copy link
Member

gsmet commented Sep 12, 2024

FWIW, I think we need a safe and backportable solution very soon as it seems like something we want to fix before 3.15.0 (and the last backports will be made on Tuesday).

@pcasaes
Copy link
Contributor Author

pcasaes commented Sep 12, 2024

I wonder if this would be a better change

    void close(JarResource jarResource) {
        while (true) {
            var jarFileReference = jarResource.jarFileReference.get();
            if (jarFileReference == null) {
                break;
            } else if (jarResource.jarFileReference.compareAndSet(jarFileReference, null)) {
                release(jarResource);
                break;
            }
        }
    }

This would prevent a reference that has been marked as closed from being reused even if there's still counters available.

This does prevent the issue from happening but it would allow multiple instances of the same jar file to "live" concurrently.

EDIT: didn't work. Left a reference open without ever closing.

@franz1981
Copy link
Contributor

franz1981 commented Sep 12, 2024

@pcasaes I have proposed an alternative solution using a negative value scheme which should work, do you need any help to implement it?

Consider that currently acquire cannot happen if the counter is 0, so we have here a problem of double close, and not of "resurrection" : @mariofusco the only case where something similar could happen it's if an acquire happen after a close, but a previous acquire hasn't release it yet - but I think this is fine

@pcasaes
Copy link
Contributor Author

pcasaes commented Sep 12, 2024

@franz1981 What is the solution?

@pcasaes
Copy link
Contributor Author

pcasaes commented Sep 12, 2024

Oh, here? #43221 (comment)

@pcasaes
Copy link
Contributor Author

pcasaes commented Sep 12, 2024

Probably best if yuo do the implementation. I can certainly test it.

@franz1981
Copy link
Contributor

FYI @pcasaes #43257

@pcasaes
Copy link
Contributor Author

pcasaes commented Sep 12, 2024

@pcasaes
Copy link
Contributor Author

pcasaes commented Sep 12, 2024

@franz1981 Seems to have fixed it. But two things:

I do believe this field needs to be set as volatile (or use atomic reference)
https://github.com/quarkusio/quarkus/pull/43257/files#diff-f20997e90aae1d2fe046b512defb1a3e14215b7a834177d43369542bae99a7f6R17

Can we just return here instead of checking on count again?
https://github.com/quarkusio/quarkus/pull/43257/files#diff-f20997e90aae1d2fe046b512defb1a3e14215b7a834177d43369542bae99a7f6R126-R128

@franz1981
Copy link
Contributor

franz1981 commented Sep 13, 2024

Thanks for the check @pcasaes

For the volatile is not needed thanks to the barriers surrounding it

  1. Completable future's allocation store "release" the field value
  2. Reading the field requires to call Completable future::getNow/join before, which load "acquire" it

The acquire/release semantics implied happens-before, so we should be fine with weaker architectures like ARM, here, been compliant with the Java Memory Model 9.

Re

Can we just return here instead of checking on count again?

The existing algorithm doesn't forcibly free immediately the jar file ref, but try to keep it around till none needed it anymore (an acquire can still happen after a mark close, to save creating a useless new jar file ref again); it means that both release/markForClosing are in the position to move the counter to 0, and that means that both need to check if they have won the race to free it.

For release it can happen only -1 -> 0 (in the code comments) whilst for markForClosing, with 1 -> 0, hence both need to check their own expected ones before freeing the resource and attempt to cleanup the shared atomic reference

@mariofusco
Copy link
Contributor

I believe this pull request can be closed as the problem has been fixed by #43257

@gsmet
Copy link
Member

gsmet commented Sep 13, 2024

Thanks a lot for the efforts! This is superseded by #43257 .

@gsmet gsmet closed this Sep 13, 2024
@quarkus-bot quarkus-bot bot added the triage/invalid This doesn't seem right label Sep 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/devtools Issues/PR related to maven, gradle, platform and cli tooling/plugins triage/invalid This doesn't seem right
Projects
Development

Successfully merging this pull request may close these issues.

5 participants