Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix: make ringbuffer right #1558

Merged
merged 1 commit into from
Jun 22, 2018
Merged

bugfix: make ringbuffer right #1558

merged 1 commit into from
Jun 22, 2018

Conversation

fuweid
Copy link
Contributor

@fuweid fuweid commented Jun 20, 2018

Signed-off-by: Wei Fu fhfuwei@163.com

Ⅰ. Describe what this PR did

Make ringbuffer right

Ⅱ. Does this pull request fix one issue?

NONE

Ⅲ. Describe how you did it

As the original design says, the ringbuffer uses two pointer to represent
push and pop position respectively in the circle list. The ringbuffer
allows to override oldest data if the buffer is full.

However, the Close action will be hang if the push pointer doesn't equal to
pop pointer. Please check the case.

E = empty, H = has data

At this time, the circle has full buffer, which means each node contains
the data.

   0H -> 1H -> 2H -> 3H -> 4H -> 0H
               ^     ^
               |     |
               |   push pointer
              pop pointer

In the following goroutine scheduled, the pop action will consume all
the data.

   0E -> 1E -> 2E -> 3E -> 4E -> 0E
               ^     ^
               |     |
               |   push pointer
              pop pointer

However, if there are not enough data to make the push pointer equals to
the pop pointer, the Close action will be hang.

In order to make the ringbuffer right, use FIFO queue to as buffer. The
front pointer contains the oldest data. When the buffer is full and
there is new data coming, the ringbuffer can drop the front one and push
new data into the back.

However, the ringbuffer still contains the data after closed. The
containerio should drain all the data and write it into the backend.

Ⅳ. Describe how to verify it

It should be closed successfully.

➜  /tmp cat ./run.sh
#!/usr/bin/env bash

for _ in $(seq 1 1000); do
  pouch run --rm reg.docker.alibaba-inc.com/base/busybox sh -c 'for i in $(seq 1 10000); do echo hello-$i; done;' | wc -l
done

Ⅴ. Special notes for reviews

The ringbuffer means non-blocked IO, which loss the data if the consumer pops slowly.
Next action:

  • refactor containerio
  • make blocked IO if necessary

@fuweid fuweid requested review from YaoZengzeng and HusterWan June 20, 2018 13:01
@pouchrobot pouchrobot added kind/bug This is bug report for project size/XL labels Jun 20, 2018
@allencloud
Copy link
Collaborator

CRI tools validation fails:

[k8s.io] Container runtime should support basic operations on container 
  runtime should support execSync [Conformance]
  /home/travis/gopath/src/github.com/kubernetes-incubator/cri-tools/pkg/validate/container.go:121
[BeforeEach] [k8s.io] Container
  /home/travis/gopath/src/github.com/kubernetes-incubator/cri-tools/pkg/framework/framework.go:50
[BeforeEach] [k8s.io] Container
  /home/travis/gopath/src/github.com/kubernetes-incubator/cri-tools/pkg/validate/container.go:63
[BeforeEach] runtime should support basic operations on container
  /home/travis/gopath/src/github.com/kubernetes-incubator/cri-tools/pkg/validate/container.go:72
[It] runtime should support execSync [Conformance]
  /home/travis/gopath/src/github.com/kubernetes-incubator/cri-tools/pkg/validate/container.go:121
STEP: create container
STEP: Get image status for image: busybox:1.26
STEP: Create container.
Jun 20 13:09:09.131: INFO: Created container "517bed7d36812d99d5be947992a537f12d9b595710d374c7644da4942bd15fe7"
STEP: start container
STEP: Start container for containerID: 517bed7d36812d99d5be947992a537f12d9b595710d374c7644da4942bd15fe7
Jun 20 13:09:09.249: INFO: Started container "517bed7d36812d99d5be947992a537f12d9b595710d374c7644da4942bd15fe7"
STEP: test execSync
STEP: verify execSync output
STEP: execSync for containerID: 517bed7d36812d99d5be947992a537f12d9b595710d374c7644da4942bd15fe7
Jun 20 13:09:09.312: INFO: Execsync succeed
[AfterEach] runtime should support basic operations on container
  /home/travis/gopath/src/github.com/kubernetes-incubator/cri-tools/pkg/validate/container.go:76
STEP: stop PodSandbox
STEP: delete PodSandbox
[AfterEach] [k8s.io] Container
  /home/travis/gopath/src/github.com/kubernetes-incubator/cri-tools/pkg/framework/framework.go:51
• Failure [0.908 seconds]
[k8s.io] Container
/home/travis/gopath/src/github.com/kubernetes-incubator/cri-tools/pkg/framework/framework.go:72
  runtime should support basic operations on container
  /home/travis/gopath/src/github.com/kubernetes-incubator/cri-tools/pkg/validate/container.go:68
    runtime should support execSync [Conformance] [It]
    /home/travis/gopath/src/github.com/kubernetes-incubator/cri-tools/pkg/validate/container.go:121
    The stdout output of execSync should be hello
    
    Expected
        <string>: 
    to equal
        <string>: hello
        
    /home/travis/gopath/src/github.com/kubernetes-incubator/cri-tools/pkg/validate/container.go:380

ping @YaoZengzeng @starnop
And I retrigger this to have another try.

@codecov-io
Copy link

codecov-io commented Jun 21, 2018

Codecov Report

Merging #1558 into master will decrease coverage by 0.73%.
The diff coverage is 83.19%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1558      +/-   ##
==========================================
- Coverage   41.59%   40.85%   -0.74%     
==========================================
  Files         267      268       +1     
  Lines       17366    17778     +412     
==========================================
+ Hits         7223     7264      +41     
- Misses       9251     9618     +367     
- Partials      892      896       +4
Impacted Files Coverage Δ
daemon/containerio/container_io.go 62.43% <57.89%> (+0.89%) ⬆️
pkg/ringbuffer/list.go 93.75% <93.75%> (ø)
pkg/ringbuffer/ringbuff.go 95.91% <95.91%> (ø)
daemon/mgr/network.go 48.86% <0%> (-19.13%) ⬇️
pkg/user/user.go 49.23% <0%> (-14.11%) ⬇️
apis/server/server.go 39.39% <0%> (-9.36%) ⬇️
daemon/mgr/container.go 45.3% <0%> (-5.36%) ⬇️
daemon/logger/jsonfile/utils.go 71.54% <0%> (-1.63%) ⬇️
... and 2 more

elem.val = val

at := q.root.prev
n := at.next
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we need this local var n, isn't it always q.root?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make it clear.

@YaoZengzeng
Copy link
Contributor

LGTM, @yyb196 @HusterWan May review this PR more in detail 😄

As the original design says, the ringbuffer uses two pointer to represent
push and pop position respectively in the circle list. The ringbuffer
allows to override oldest data if the buffer is full.

However, the Close action will be hang if the push pointer doesn't equal to
pop pointer. Please check the case.

```
E = empty, H = has data

At this time, the circle has full buffer, which means each node contains
the data.

   0H -> 1H -> 2H -> 3H -> 4H -> 0H
               ^     ^
               |     |
               |   push pointer
              pop pointer

In the following goroutine scheduled, the pop action will consume all
the data.

   0E -> 1E -> 2E -> 3E -> 4E -> 0E
               ^     ^
               |     |
               |   push pointer
              pop pointer

However, if there are not enough data to make the push pointer equals to
the pop pointer, the Close action will be hang.
```

In order to make the ringbuffer right, use FIFO queue to as buffer. The
front pointer contains the oldest data. When the buffer is full and
there is new data coming, the ringbuffer can drop the front one and push
new data into the back.

However, the ringbuffer still contains the data after closed. The
containerio should drain all the data and write it into the backend.

Signed-off-by: Wei Fu <fhfuwei@163.com>
@fuweid
Copy link
Contributor Author

fuweid commented Jun 21, 2018

@yyb196 I have updated the code. Please take a look.

cc @YaoZengzeng

elem := elemPool.Get().(*element)
elem.val = val

at := q.root.prev
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename at to last or tail, it may more clear

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @shaloulcy , the q.root.prev means the tail or last. at means the position. I think we can keep it here.

return nil
}

at := q.root.next
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename at to first or head, it may more clear

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the last comment

@fuweid
Copy link
Contributor Author

fuweid commented Jun 22, 2018

@yyb196 @shaloulcy any comments? 😄

@shaloulcy
Copy link
Contributor

LGTM

@pouchrobot pouchrobot added the LGTM one maintainer or community participant agrees to merge the pull reuqest. label Jun 22, 2018
@allencloud allencloud merged commit 758acd9 into AliyunContainerService:master Jun 22, 2018
@fuweid fuweid deleted the bugfix_ringbuffer branch August 3, 2018 02:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug This is bug report for project LGTM one maintainer or community participant agrees to merge the pull reuqest. size/XL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants