Socket overhaul #128

mzhong1 · 2020-03-05T18:21:11Z

Completely overhaul the socket system to use ZMQ STREAM sockets instead of REQ/REP sockets, add multithreading support to the disk-manager, and add message queue handling to Bynar and the disk-manager.
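Some background on the framing work in this PR: a ZMQ STREAM socket exchanges raw TCP bytes rather than discrete messages, so a single read may hold part of a message or several messages back to back. The commit where parse_from_bytes was changed to "drain message bytes properly" is presumably dealing with this. The following is a minimal, std-only sketch of length-prefixed framing. It assumes a 4-byte big-endian length header. encode_frame and drain_frames are hypothetical names, and the PR's actual wire format may differ.

```rust
// Sketch only. Assumption: each message on the STREAM connection is
// prefixed with a 4-byte big-endian length; the PR's real format may differ.

/// Prepends a 4-byte big-endian length header to `payload`.
pub fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(4 + payload.len());
    frame.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    frame.extend_from_slice(payload);
    frame
}

/// Removes every complete message from the front of `buf` and returns the
/// payloads in order. A trailing partial message is left in `buf` so the
/// next read can complete it.
pub fn drain_frames(buf: &mut Vec<u8>) -> Vec<Vec<u8>> {
    let mut messages = Vec::new();
    while buf.len() >= 4 {
        let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
        if buf.len() < 4 + len {
            break; // body not fully received yet
        }
        // Drain header + body from the buffer, keeping only the body bytes.
        let payload: Vec<u8> = buf.drain(..4 + len).skip(4).collect();
        messages.push(payload);
    }
    messages
}
```

The partial tail stays in the buffer instead of being discarded. This allows a message split across two reads to be reassembled on the next call.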

This is intended to support disk-manager operations that may take more than a few minutes to execute, so that Bynar can run them without repeating requests and without hanging on a single operation that, depending on the operation, may run for hours.
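One way to picture the "without repeating" part is a request map. The map records which (disk, operation) pairs are in flight and rejects duplicates until the first request finishes. This roughly matches the later commits that add a req_map to the disk-manager and skip repeat requests. Below is a minimal sketch. RequestMap, OpKind, try_start, and finish are hypothetical names, not the PR's API.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical operation kinds; the real disk-manager has its own types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum OpKind {
    Add,
    Remove,
    SafeToRemove,
}

/// Hypothetical request map: which (disk, operation) pairs are running,
/// and the id of the request that started each one. Cloning shares the map.
#[derive(Clone, Default)]
pub struct RequestMap {
    in_flight: Arc<Mutex<HashMap<(String, OpKind), u64>>>,
}

impl RequestMap {
    /// Registers the request and returns true, or returns false if the same
    /// operation is already running on that disk (the caller skips it).
    pub fn try_start(&self, disk: &str, op: OpKind, request_id: u64) -> bool {
        let mut map = self.in_flight.lock().unwrap();
        let key = (disk.to_string(), op);
        if map.contains_key(&key) {
            return false;
        }
        map.insert(key, request_id);
        true
    }

    /// Clears the entry when the operation finishes so a later request for
    /// the same disk and operation is accepted; returns the original id.
    pub fn finish(&self, disk: &str, op: OpKind) -> Option<u64> {
        self.in_flight.lock().unwrap().remove(&(disk.to_string(), op))
    }
}
```

In this sketch, the key includes both the disk and the operation. So a SafeToRemove check can proceed on a disk whose Add is still running. Keying on the disk alone would serialize every operation per disk.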

0X1A · 2020-03-26T16:26:07Z

I'll post what we went over on Slack here, so the information is available to others or in case we need it later.

Since the ZMQ polling functions can be considered non-blocking, we can rule out any need to ensure the socket polling doesn't spin the CPU at 100%.
We should begin disallowing single-character variable names, as they impede readability and don't give enough context to understand the code at a glance.
We understood that we currently don't have adequate tests to ensure proper functionality, so code reviews can only catch a certain number of the issues that may arise when introducing changes.
Any variables that aren't needed should be replaced with _ or removed completely.

Other than my comment about the magic number and what I've outlined above, this LGTM.

cholcombe973 · 2020-03-26T21:35:18Z

Well, just because it's non-blocking doesn't mean it won't soak up a CPU core. I've made that mistake before, and it helps to insert a small sleep to let the CPU do something else.
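Here is a minimal sketch of the small-sleep approach, along the lines of the later commit "Sleep the poll_events so it doesn't spin." The loop polls without blocking and handles any ready work immediately. It sleeps briefly only when nothing is ready. A std mpsc channel's try_recv stands in for a zero-timeout ZMQ poll. The name poll_loop and the sleep duration are illustrative assumptions.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};
use std::thread;
use std::time::Duration;

/// Polls `rx` without blocking. Ready messages are handled immediately; when
/// nothing is ready the thread sleeps for `idle_sleep`, so an idle loop
/// yields the core instead of spinning at 100%. Stops when the sender hangs up.
/// (Assumption: `try_recv` stands in for a zero-timeout ZMQ poll.)
pub fn poll_loop(rx: Receiver<String>, idle_sleep: Duration) -> Vec<String> {
    let mut handled = Vec::new();
    loop {
        match rx.try_recv() {
            Ok(msg) => handled.push(msg),
            Err(TryRecvError::Empty) => thread::sleep(idle_sleep),
            Err(TryRecvError::Disconnected) => break,
        }
    }
    handled
}
```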

0X1A · 2020-03-31T15:32:25Z

@cholcombe973 I initially brought up the same thing, but reluctantly decided we may not need to guard against that resource getting dunked on. That being said, weird stuff happens, and that weird stuff tends to be the stuff that bites back.

@mzhong1 I think we should fully guarantee that the thread won't get blocked in the loop.
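A bounded wait can provide that guarantee without hand-tuning a sleep. The thread blocks for at most a fixed timeout, then returns to the top of the loop to re-check for shutdown or other work. zmq_poll accepts a timeout in milliseconds, where -1 means wait indefinitely, so any finite value gives the same property. This sketch uses a std channel's recv_timeout as a stand-in for the socket. The serve function and the shutdown flag are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Handles messages until `shutdown` is set. Each wait is bounded by
/// `timeout`: while idle the thread is parked (no spinning), and it never
/// blocks longer than `timeout` before re-checking the flag.
/// (Assumption: `recv_timeout` stands in for a ZMQ poll with a finite timeout.)
pub fn serve(rx: &Receiver<String>, shutdown: &AtomicBool, timeout: Duration) -> Vec<String> {
    let mut handled = Vec::new();
    while !shutdown.load(Ordering::SeqCst) {
        match rx.recv_timeout(timeout) {
            Ok(msg) => handled.push(msg),
            Err(RecvTimeoutError::Timeout) => {} // idle: loop back and re-check the flag
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    handled
}
```

Compared with a fixed sleep, the bounded wait reacts to a new message as soon as it arrives. It still wakes up at least once per timeout to notice a shutdown request.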

cholcombe973 · 2020-05-18T16:09:54Z

just a suggestion at this point but I think this should either be merged or split up. It's quite large

mzhong1 · 2020-05-18T17:35:47Z

> just a suggestion at this point but I think this should either be merged or split up. It's quite large

Yup, I figure at this point I'll probably squash-merge this if everyone is alright with the PR.

cholcombe973 · 2020-05-18T17:36:50Z

Go for it

mzhong1 added 30 commits January 31, 2020 11:10

Changed socket type to Dealer, and added outline for creating message map

213caa3

implement create function for message request map

d4dfb92

Add operation to message map

56371f8

get an operation from the map

7bde826

Remove an op from the message map

c0f9dc5

Update the add_map_op function to update the map if it already has an op and return the old one

9957a99

Get a specific disk hashmap

82bd8f3

Added channels and edited disk-manager's add_disk and listen functions with threadPool for threaded worker add_disk operations

5336b74

Changed disk-manager handler for all functions to use threadpool/DEALER sockets

50a46a3

Swapped DEALER with STREAM in disk-manager

72f3456

Added handling of sending messages from Bynar to disk-manager

9d4b09d

Added macros, handling add_disk

6fd3fb8

Fix a little client stuff

4767549

Some client fixing + macros

c3a59d6

Updated Zmq version

02a9ac4

fix notify?

770147f

Filter states in returned state machines and clippy fixes

bf586c5

Filter disks for WaitingForReplacement and not in-progress

f444936

Switched to STREAM/STREAM system

479ade6

Fix bynar-client to use new socket system

5e50318

parse_from_bytes grabs messages from the end of the byte vector given, added function to parse message properly and drain message bytes properly

51e6041

change remove/safeto-remove to be stricter with config/non-lvm bluestore removals

08bce43

Fix macro use for parsing opresults

6e7b3a5

filter state machines for the disks that need replacement + add in the partitions on the disks to be replaced to the list

3528e8c

Push SafeToRemove + Remove to queue

91ec3b8

Error's cannot be cloned, so change check_for_failed_disk to error out immediately if check_all_disks failed

c9936c2

handle add and remove return values

397e153

Add new message handler to main function

04f690f

Add functions to create a req_map for disk-manager + check if ops are already running

85d3d61

Added request map to disk-manager, skip repeat requests

96ce4b0
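The commits that move add_disk and the other handlers onto a thread pool suggest a common pattern. The socket loop hands each long-running operation to a worker and collects results over a channel. This allows it to keep accepting requests while an operation runs for minutes or hours. Below is a std-only sketch of that pattern. spawn_workers and the Job type are hypothetical, and this is not the PR's code, which may rely on a thread-pool crate instead.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical job type: a long-running disk operation returning a reply.
pub type Job = Box<dyn FnOnce() -> String + Send + 'static>;

/// Starts `size` workers sharing one job queue and returns the queue's sender
/// plus the worker handles. Each result goes out on `results`, so the socket
/// loop can reply later without waiting on the operation itself.
pub fn spawn_workers(size: usize, results: Sender<String>) -> (Sender<Job>, Vec<JoinHandle<()>>) {
    let (job_tx, job_rx) = mpsc::channel::<Job>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let mut handles = Vec::with_capacity(size);
    for _ in 0..size {
        let job_rx = Arc::clone(&job_rx);
        let results = results.clone();
        handles.push(thread::spawn(move || loop {
            // The lock guard is dropped at the end of this statement, so other
            // workers can take jobs while this one runs its job.
            let next = job_rx.lock().unwrap().recv();
            match next {
                Ok(job) => {
                    // Ignore send errors: the receiver may have shut down.
                    let _ = results.send(job());
                }
                Err(_) => break, // queue closed and empty: exit the worker
            }
        }));
    }
    (job_tx, handles)
}
```

Dropping the job sender closes the queue. Each worker's recv then fails and the worker exits, so joining the handles during shutdown returns once all queued jobs have finished.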

cholcombe973 reviewed Mar 6, 2020

src/backend/ceph.rs (outdated)

cholcombe973 reviewed Mar 6, 2020

src/client.rs

cholcombe973 reviewed Mar 6, 2020

src/client.rs (outdated)

cholcombe973 reviewed Mar 6, 2020

src/disk_manager.rs (outdated)

remove unused disk-manager code and update backend

4702c9a

mzhong1 requested a review from cholcombe973 March 16, 2020 13:00

0X1A reviewed Mar 26, 2020

src/backend/ceph.rs

add comment, small loop sleep, and CentOS package dependencies

bbd763f

mzhong1 added 6 commits March 31, 2020 16:49

move the smartctl enable as a soft error, since smartctl -H runs regardless

d383c9b

Fix checking if disk operation already done to account for bluestore non-lvm OSDs and fix database ticket update function to use correct column name

103e7bf

Add check in osd metadata for non-lvm bluestore osds, remove SkipRepeat operations once finished running

50ed446

update TOML to use correct block-utils version and vault crates

e13a72c

update goji and arrange function to prevent opening Client without use

eae5e98

Sleep the poll_events so it doesn't spin

4d8be60

0X1A requested changes Apr 22, 2020

src/util.rs (outdated)

mzhong1 added 7 commits April 22, 2020 16:34

fix the skip signal handling issue and add sleep to loop

9bfc05a

remove extraneous sleep

8413563

add explicit sleep to macro for polling events

0d6dc3f

Fix the fd leak

a43e444

Update libatasmart

af492ac

update ceph crates

73b2ea6

Fix soft error printing so common errors are changed to debug to prevent exploding log sizes

fa685d2

mzhong1 added 2 commits June 12, 2020 10:03

fix JIRA crate version

82e5276

fix remove operation on successful Add so it removes all operations on the disk instead of the partition only

7477053
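The last commit changes the cleanup after a successful Add: it clears every operation on the disk, not just the partition's entry. Suppose in-flight operations are keyed by device path and partitions follow the usual Linux naming (/dev/sdb1 for /dev/sdb, /dev/nvme0n1p1 for /dev/nvme0n1). The cleanup could then look like the following sketch, where belongs_to_disk and clear_disk_ops are hypothetical helpers. A plain prefix match is not enough, because /dev/sdb is also a prefix of /dev/sdba.

```rust
use std::collections::HashMap;

/// True if `path` is `disk` itself or one of its partitions.
/// Assumption: common Linux naming, /dev/sdb -> /dev/sdb1 and
/// /dev/nvme0n1 -> /dev/nvme0n1p1; other schemes need their own rule.
pub fn belongs_to_disk(path: &str, disk: &str) -> bool {
    let rest = match path.strip_prefix(disk) {
        Some(rest) => rest,
        None => return false,
    };
    if rest.is_empty() {
        return true; // the whole disk
    }
    // Disks whose names end in a digit (nvme0n1) separate partitions with 'p'.
    let number = if disk.ends_with(|c: char| c.is_ascii_digit()) {
        match rest.strip_prefix('p') {
            Some(n) => n,
            None => return false,
        }
    } else {
        rest
    };
    !number.is_empty() && number.chars().all(|c| c.is_ascii_digit())
}

/// Drops every in-flight entry for `disk` and its partitions, not just the
/// exact path that finished. Returns how many entries were removed.
pub fn clear_disk_ops(in_flight: &mut HashMap<String, u64>, disk: &str) -> usize {
    let before = in_flight.len();
    in_flight.retain(|path, _| !belongs_to_disk(path, disk));
    before - in_flight.len()
}
```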
