
Workspaces take too long to stop #1410

Closed
2 of 4 tasks
ScrewTSW opened this issue May 15, 2019 · 14 comments

Comments

@ScrewTSW
Member

ScrewTSW commented May 15, 2019

Issue problem:
Recently it very often happens that workspaces take upwards of 200 seconds, and sometimes up to 1000 seconds, to stop.

This issue is visible across all clusters with both ephemeral and PVC workspaces (che7)

Workspace IDs that failed to stop:

These workspace stop failures were collected between 10:00 pm yesterday and 10:00 am this morning.

PVC workspaces

1a: osiotest-workspace-1a total: 17

  • workspacewvafn1awvb8x1oax
  • workspace0azosssqnwnnamrz
  • workspacedkjdne9qb8dmb12r
  • workspacecbkrswrlmka8b3ie
  • workspaceqe47ta7okc0k22bl
  • workspace8mkxnf9r8gw4htt6
  • workspacedkjdne9qb8dmb12r
  • workspace3ijyyynk4bwr6iaw
  • workspaceqnftnnwr4pr1u6qf
  • workspaceva62dih9lwdf7lgi
  • workspacexlzrrp441tsgn2u3
  • workspacevb9nska2aepvl0eh
  • workspacefx6y3b7bvt1g7bol
  • workspace8hydyq200ryuss3v
  • workspacexma5mpptcrndud8j
  • workspace3eg3yrinj90uue88
  • workspacel37oqbcm8o86tj6y

1b: osiotest-workspace-1b total: 14

  • workspaceogj3ha7japkfemx3
  • workspacettkovts9egov9gog
  • workspacehqths5cyy6j2bhal
  • workspacezr4oyzfxgp0bj7sj
  • workspacevgwa7bcfaijwiub8
  • workspacehqths5cyy6j2bhal
  • workspace8fao0hsnk5tj7uli
  • workspacegga733wvpsf7xz7y
  • workspaceadwg3au7qdojhuhk
  • workspacesd5cexl3pcfdjz1k
  • workspacekkwvaf12p13fddks
  • workspacepzk2gmr7wt4mdcwr
  • workspaceqc5uxcwlrj5fk7s7
  • workspacesxjjz7ds74h1v9fi

2: osiotest-workspace-2 total: 17

  • workspacei9e6bx9kx4eityiy
  • workspacevel3v8ult7hwp08h
  • workspace7p75afwbgi60jruq
  • workspace64uzf7zmnt4wb3bc
  • workspacezze7v1iq54xm64af
  • workspace83sdj3hqpkxvgmfa
  • workspaceerl2qg7eatn1r0h8
  • workspace7p75afwbgi60jruq
  • workspace310hqbfb34r8zttz
  • workspacexjxzw8ldxcnx3jju
  • workspace7y0ay94u16f0yfr1
  • workspaceaxr0pi3iv9euk1jk
  • workspacecp0g4tronp98eud3
  • workspaceq0hbv7qessfaikv8
  • workspacenbwiwoafe2hx5iji
  • workspacexd92qfc99i0icmfq
  • workspaceb22uadtawo00mdt4

2a: osiotest-workspace-2a

  • not available - account has stuck running pods, excluded from tests

Ephemeral workspaces

1a: osiotest-workspace-eph-1a total: 19

  • workspaceou8o8uvtz6bo9jkg
  • workspace2uxbkrid6zt4lpjp
  • workspaceqncuy89bfg82q6tl
  • workspaceax7sdu2h3r7ydasy
  • workspacewn5w6nq664bk6kt1
  • workspace2kxh2y40azafvjv8
  • workspace30106z5xev9m3i9g
  • workspacel4u0ravw3u4u8eun
  • workspacew6dzf3yqt8zi70jh
  • workspacevwuer5ekskobrrlg
  • workspacec7fk0z2byzfyveol
  • workspace8sipoo5i7pxs3u8w
  • workspace4zcrn6gziomkf04s
  • workspaceqoh05jhn37dbijsr
  • workspaceej6a8jgo2yhhonfg
  • workspaceukqsxl6b1shd8hal
  • workspacejbtru1kb3q7w0qja
  • workspacek5qg1s9f4o39ssur
  • workspace68e16k41nsbwab3t

1b: osiotest-workspace-eph-1b total: 14

  • workspace6ha3eip3gz57yax2
  • workspacewgptydgtgfu5ffcx
  • workspaceoj25u1kqka53xj18
  • workspacer6cfys4loueixexr
  • workspacedp0i8hlwduusofhh
  • workspacecgtr5sayu56sfqph
  • workspaceub0wq5ncvmn50k7x
  • workspaceley7yoaaxurjmek9
  • workspacecmfm0a1ci9v0ms0q
  • workspace9zjbhuk42qeqd2y0
  • workspacea5h1numb4mhu1amg
  • workspace0zihl6vdo5d71s0h
  • workspaceavolf11tilqt9u37
  • workspaceanbvsauhz463lm6c

2: osiotest-workspace-eph-2 total: 13

  • workspacem6l5clm37ux4azud
  • workspace5r1p0fqyc13wc0w0
  • workspacezzw9j451cgwqjdvq
  • workspacegugzf4tlfmbla7r6
  • workspace1bujuwsbdx4fbbof
  • workspacez9jecg5j0drb7pty
  • workspace4ezlpjlu8yq45vjo
  • workspaceamnv55ydlqa01k06
  • workspace809oiss80n4v4hwu
  • workspacep5rfxkcrhyp5jovg
  • workspacegrb5lhdpr1v2wycs
  • workspace2ymd2zf8heoj1xh3
  • workspacepaslvi2iaebb1plx

2a: osiotest-workspace-eph-2a-new total: 17

  • workspacet5kay8v6h1ibv70f
  • workspaceeaepvep5h5cbe2us
  • workspaceigvn3yboxj2vn4fw
  • workspacednvznsmjxz9x5ttt
  • workspaceh5hsgt7ofn9ltq7m
  • workspace33ctthytiglvo8v3
  • workspaceyl4lcd5r6eih3eca
  • workspace07aslyt84o94d0d5
  • workspacey2eh54yg5crmrfbd
  • workspacey8w9e41j95l6mzaz
  • workspace4aporvtc89j86kj4
  • workspacehp6jxdkj9uwp3sfv
  • workspace734ip7lefagbdti1
  • workspacei66ve9ih1ht4xmfp
  • workspace4r3srlfh3qy7o2ld
  • workspace7dac8pao5lrqozm7
  • workspace5f20vp9xtdtcaie0

Red Hat Che version:

version: @eclipse-che/theia-assembly 0.5.0

  • I can reproduce it on latest official image

Reproduction Steps:

This issue happens at random, with intervals where the workspace stops instantly and others where it takes an extremely long time. The test loop is as follows (a minimal sketch is shown after the list):

  • Create workspace
  • Start workspace
  • Wait for it to be in state READY
  • Stop workspace
  • Delete workspace
  • repeat
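For reference, a minimal sketch of that loop is below. It assumes a Che-style REST API; the base URL, token handling, the `/workspace/{id}/runtime` endpoints, and the status names are illustrative assumptions, not taken from the actual periodic-test code.

```python
# Minimal sketch of the start/stop/delete loop used to measure stop times.
# Base URL, auth, endpoint paths, and status names are assumptions for
# illustration only, not the actual periodic-test implementation.
import time
import requests

CHE_API = "https://che.example.com/api"        # assumed base URL
HEADERS = {"Authorization": "Bearer <token>"}  # assumed auth header

def wait_for_status(ws_id, wanted, timeout=900, poll=5):
    """Poll the workspace until it reports the wanted status, or time out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        ws = requests.get(f"{CHE_API}/workspace/{ws_id}", headers=HEADERS).json()
        if ws.get("status") == wanted:
            return time.time()
        time.sleep(poll)
    raise TimeoutError(f"{ws_id} did not reach {wanted} within {timeout}s")

def run_iteration(ws_id):
    # Start the workspace and wait until it is ready.
    requests.post(f"{CHE_API}/workspace/{ws_id}/runtime", headers=HEADERS)
    wait_for_status(ws_id, "RUNNING")

    # Stop it and measure how long the stop takes end to end.
    stop_requested = time.time()
    requests.delete(f"{CHE_API}/workspace/{ws_id}/runtime", headers=HEADERS)
    stopped_at = wait_for_status(ws_id, "STOPPED")
    print(f"{ws_id}: stop took {stopped_at - stop_requested:.1f}s")

    # Delete the workspace before the next iteration.
    requests.delete(f"{CHE_API}/workspace/{ws_id}", headers=HEADERS)
```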

Runtime:

runtime used:

  • minishift (include output of minishift version)
  • OpenShift.io
  • Openshift Container Platform (include output of oc version)
@Katka92
Collaborator

Katka92 commented May 15, 2019

I think something similar is happening with a few workspaces created during the periodic tests.
Affected workspaces (all on production):

workspacer9xqgwoluvvm5s1w 2a 
workspaceo7jttxuwdggqa2ei 1a
workspaceuap750q287ln1epb 1a
workspacexccsqy6sjbj7epr4 1b
workspacetni7njpgioqze2by 2

What I can see in Kibana, e.g. for workspaceo7jttxuwdggqa2ei:

00:10:28.594 Workspace 'osiotest1a/workspace5i4uis' with id 'workspaceo7jttxuwdggqa2ei' created by user 'osiotest1a'
00:10:29.065 Starting workspace 'osiotest1a/workspace5i4uis' with id 'workspaceo7jttxuwdggqa2ei' by user 'osiotest1a'
(workspace is not started in 10 minutes, try to stop and remove it)
00:20:34.699 Workspace 'osiotest1a/workspace5i4uis' with id 'workspaceo7jttxuwdggqa2ei' is stopping by user 'osiotest1a'
00:22:11.050 Workspace 'osiotest1a:workspace5i4uis' with id 'workspaceo7jttxuwdggqa2ei' started by user 'osiotest1a'
00:22:11.079 No workspace start time recorded for workspace workspaceo7jttxuwdggqa2ei
00:27:16.806 Workspace 'osiotest1a/workspace5i4uis' with id 'workspaceo7jttxuwdggqa2ei' is stopped by user 'osiotest1a'
(then four of this:)
00:27:17.007 Could not get runtimeIdentity for workspace 'workspaceo7jttxuwdggqa2ei', and so cannot verify if current subject '593d4c52-dfe0-47cd-9238-6a44b06e4d34' owns workspace
00:27:17.529 Workspace 'workspaceo7jttxuwdggqa2ei' removed by user 'osiotest1a'
(then two of this:)
00:27:21.565 Could not get runtimeIdentity for workspace 'workspaceo7jttxuwdggqa2ei', and so cannot verify if current subject '593d4c52-dfe0-47cd-9238-6a44b06e4d34' owns workspace

So from what I can see here, the events were create - starting - stopping - started - stopped - removed. Please note that from starting to started it took 12 minutes and from started to stopped it took 5 minutes.
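Just to double-check the arithmetic, those durations can be recomputed directly from the quoted Kibana timestamps; nothing in the snippet below is Che-specific.

```python
# Recompute the durations quoted above from the Kibana timestamps.
from datetime import datetime

fmt = "%H:%M:%S.%f"
starting = datetime.strptime("00:10:29.065", fmt)  # "Starting workspace ..."
started  = datetime.strptime("00:22:11.050", fmt)  # "... started by user ..."
stopped  = datetime.strptime("00:27:16.806", fmt)  # "... is stopped by user ..."

print("starting -> started:", started - starting)  # ~11 min 42 s ("12 minutes")
print("started  -> stopped:", stopped - started)   # ~5 min 6 s  ("5 minutes")
```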

@ibuziuk
Member

ibuziuk commented May 15, 2019

@tdancs @ScrewTSW @Katka92 could you please confirm that this is an issue on both prod and prod-preview? Going to bring it to SRE today.

@Katka92
Collaborator

Katka92 commented May 16, 2019

Seen again on production

workspacezjweqmjgo4p19edk 2
workspaceu2geeobgdb6yrmab 1a
workspaceq4jnery4fyew6b5q 1b

@Katka92
Collaborator

Katka92 commented May 16, 2019

@ibuziuk Just a thought - could it be caused by workspaces taking a long time to start? I just realised the implication of what I've posted here #1410 (comment). In the Kibana logs it is shown pretty clearly - the time from when the workspace is started to when it is stopped is just 5 seconds. So the problem is not in the stopping itself.
I think the tests send a request to stop the workspace and wait until it is stopped, but they don't account for the fact that some of the measured time is spent by the pod getting started. The problem is that even when the stop request is sent, the pod somehow ignores it and waits until it has started. Only then does it stop and get deleted.
If I'm correct, this problem is only a consequence of the starting issue.
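In other words, if the stop request is only acted on once the start has finished, the stop duration measured by the test silently absorbs the remaining start time. A toy sketch of that effect (purely illustrative, not Che code):

```python
# Toy model: what the test measures if a stop request is only processed
# after the in-flight start completes.
def measured_stop_seconds(remaining_start_s, actual_stop_s):
    # The test clock starts when the stop request is sent and stops when the
    # workspace reports STOPPED, so any leftover start time is counted too.
    return remaining_start_s + actual_stop_s

# Example: the stop itself takes ~5 s, but the workspace still needs ~5 min
# to finish starting, so the test reports ~305 s.
print(measured_stop_seconds(remaining_start_s=300, actual_stop_s=5))  # 305
```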

@Katka92
Collaborator

Katka92 commented May 16, 2019

If I'm correct, this problem is only a consequence of starting issue.

Ah, from what I can see, e.g. for workspace workspace4zcrn6gziomkf04s it was started in ~3 minutes and stopped after ~4 minutes. So this is still a different issue.

@Katka92
Collaborator

Katka92 commented May 21, 2019

Seen again on cluster 2:

workspaceuj6b71yhiy7xdj9z   4 minutes 
workspacegvsef06s4366xu5z   7 minutes

@Katka92
Collaborator

Katka92 commented May 22, 2019

Seen again:

Cluster 2a:

workspacehe686e199dl4srol   5 minutes

Cluster 1a:

workspaceupqt2m647jcpubfs   10 minutes
workspacebgk5jpg2kech6ix7   10 minutes
workspacec9qxlg9rsgc9v2ly   11 minutes

@ibuziuk ibuziuk self-assigned this May 22, 2019
@ibuziuk
Member

ibuziuk commented May 22, 2019

Prod pods have been restarted and I cannot reproduce the issue anymore.

@ibuziuk ibuziuk added this to the Sprint #167 (Che OSIO) milestone May 22, 2019
@ScrewTSW
Member Author

Since yesterday, the time it takes to stop workspaces has been significantly reduced.
The question is, why does prod-preview show less than half of production's times for both starting and stopping?

PVC stopping times hover around 20-40 seconds on prod and 10 seconds on prod-preview.
PVC starting times are 40-60 seconds on prod and a solid 40 seconds on prod-preview.
Ephemeral stopping times are 15-30 seconds with occasional spikes on prod and 2-10 seconds on prod-preview.
Ephemeral starting times are 40-60 seconds with occasional spikes on prod and a solid 40 seconds on prod-preview.

@ibuziuk
Member

ibuziuk commented May 23, 2019

More than 5 seconds for a workspace stop is an abnormal situation. Are you sure it still persists after a series of deprovisioning actions? For me, workspace stop takes just a couple of seconds.
@tdancs also, could you please compare the startup / stop results on the 2a cluster only (which is used for both prod and prod-preview) to make sure the infrastructure is the same.

@Katka92
Collaborator

Katka92 commented May 24, 2019

@ibuziuk I quickly scanned workspace stop times in the periodic tests. I went through just a few of them, but I can see e.g. this (logs gathered from Kibana):

May 24th 2019, 05:08:57.235 Workspace 'osiotest2/workspaceu35lb6' with id 'workspacesbmbkmtpr51d4sqh' is stopping by user 'osiotest2'
May 24th 2019, 05:09:11.591 Workspace 'osiotest2/workspaceu35lb6' with id 'workspacesbmbkmtpr51d4sqh' is stopped by user 'osiotest2'

This says that stopping this workspace took 14 seconds. The test ran this morning, so I would say the issue is still present.

@ScrewTSW
Member Author

More cases of workspaces taking too long to stop (data from this weekend, 24-27 May):

Ephemeral workspaces

starter-us-east-1a : osio-ci-testcreation

  • workspacevdnzc21vny4es4ar
  • workspace4aiv59mmgtayds12
  • workspacewswquj7uwtb3oh97

starter-us-east-1b : osiotest-workspace-eph-1b

  • workspaceenpoegakfzh8kup2
  • workspace261z3ahrmcsptrcu
  • workspaceyzkzrn9uxg9vnwu4

starter-us-east-2 : osiotest-workspace-eph-2

  • workspacew95id1pdk87rbjq0
  • workspacec52v7wg3yii30dke
  • workspacespbw78nwxr1smpnn
  • workspacevsp0d79n5b9jhren

starter-us-east-2a : osiotest-workspace-eph-2a-new

  • workspacer5himz8cxveyemg6
  • workspacefmq4744402fq6han
  • workspacet19cc5iydbe38ogh

starter-us-east-2a-preview : osiotest-workspace-eph-preview

  • NONE

PVC workspaces:

starter-us-east-1a : osiotest-workspace-1a

  • workspace42etpzz8anwwd0bs
  • workspace1fnywbw4b1iyzkfy

starter-us-east-1b : osiotest-workspace-1b

  • workspacezy5mf8c5y46zj4i7

starter-us-east-2 : kkanova-osiotest1

  • workspacebgblk9ybuc2djn9b
  • workspaceozqhfhszikv3q99l

starter-us-east-2a : che-perf-prod1

  • workspaceasqexxoal9a2h38o
  • workspace4burlydy1epe4ju2
  • workspacepoqvvrezgf9lxya6

starter-us-east-2a-preview : osiotest-workspace-new-preview

  • NONE

@ibuziuk
Member

ibuziuk commented Oct 22, 2019

Fixed by updating the async thread pool settings - the average workspace stop over the last week is 1.6 seconds according to Grafana.
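For context on why a thread pool change helps: if start and stop operations share an undersized async executor, a cheap stop task can sit in the queue behind a long-running start task, which matches the latency pattern reported above. The snippet below is only a minimal Python illustration of that queuing effect; the Che server itself is Java, and the actual property names that were tuned are not shown here.

```python
# Illustration: with a single-worker pool, a quick "stop" task queues behind a
# slow "start" task; a larger pool lets it run immediately.
import time
from concurrent.futures import ThreadPoolExecutor

def start_task():
    time.sleep(3)    # stands in for a slow workspace start

def stop_task():
    time.sleep(0.1)  # the stop itself is cheap

for workers in (1, 4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pool.submit(start_task)          # occupies a worker
        t0 = time.monotonic()
        pool.submit(stop_task).result()  # wait for the stop to finish
        print(f"{workers} worker(s): stop observed after "
              f"{time.monotonic() - t0:.1f}s")
```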

@ibuziuk ibuziuk closed this as completed Oct 22, 2019