Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug fix: concurrent increase by CounterWindow may cause PriorityQueue breaken #12505

Merged
merged 3 commits into from
Aug 4, 2024

Conversation

kael-aiur
Copy link
Contributor

@kael-aiur kael-aiur commented Aug 2, 2024

Fix <#12503>

  • Add a unit test to verify that the fix works.
  • Explain briefly why the bug exists and how to fix it.

as org.apache.skywalking.oap.meter.analyzer.dsl.counter.CounterWindow are single instance, the windows which type ConcurrentHashMap will return the same PriorityQueue object for the same meter, when the same meter increase with concurrent call, some unexcpected exception will occur in PriorityQueue object, source code:

public class CounterWindow {

    public static final CounterWindow INSTANCE = new CounterWindow();

    private final Map<ID, Tuple2<Long, Double>> lastElementMap = new ConcurrentHashMap<>();
    private final Map<ID, Queue<Tuple2<Long, Double>>> windows = new ConcurrentHashMap<>();

    public Tuple2<Long, Double> increase(String name, ImmutableMap<String, String> labels, Double value, long windowSize, long now) {
        ID id = new ID(name, labels);
        // there will return the same PriorityQueue object for concurrent operation
        Queue<Tuple2<Long, Double>> window = windows.computeIfAbsent(id, unused -> new PriorityQueue<>());
        window.offer(Tuple.of(now, value));  // some unexpected error will occur in this line when concurrent operation
        long waterLevel = now - windowSize;
        Tuple2<Long, Double> peek = window.peek();
        if (peek._1 > waterLevel) {
            return peek;
        }

        Tuple2<Long, Double> result = peek;
        while (peek._1 < waterLevel) {
            result = window.poll();  // some unexpected error will occur in this line when concurrent operation
            peek = window.element();  // some unexpected error will occur in this line when concurrent operation
        }

        // Choose the closed slot to the expected timestamp
        if (waterLevel - result._1 <= peek._1 - waterLevel) {
            return result;
        }

        return peek;
    }
    //....
}
  • If this pull request closes/resolves/fixes an existing issue, replace the issue number. Closes #12503.
  • Update the CHANGES log.

@wu-sheng wu-sheng added this to the 10.1.0 milestone Aug 2, 2024
@wu-sheng wu-sheng added backend OAP backend related. enhancement Enhancement on performance or codes labels Aug 2, 2024
@wu-sheng
Copy link
Member

wu-sheng commented Aug 2, 2024

@kezhenxu94 Docker compose somehow missing. Could you check CI env as well?

@peachisai
Copy link
Contributor

@kezhenxu94 Docker compose somehow missing. Could you check CI env as well?

I had the same problem as well, I think the command docker compose should instead docker-compose
Ref actions/runner-images#9557

@wu-sheng
Copy link
Member

wu-sheng commented Aug 4, 2024

@kezhenxu94 Docker compose somehow missing. Could you check CI env as well?

I had the same problem as well, I think the command docker compose should instead docker-compose

Ref actions/runner-images#9557

We fixed it through new e2e tool version.

@kezhenxu94 kezhenxu94 merged commit 5c33ee0 into apache:master Aug 4, 2024
158 checks passed
@kael-aiur kael-aiur deleted the bugfix/counterwindow branch August 5, 2024 06:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend OAP backend related. enhancement Enhancement on performance or codes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants