My personal notes, projects and best practices.
Aditya Hajare (Linkedin).
WIP (Work In Progress)!
Open-sourced software licensed under the MIT license.
- Concurrency:
    - Concurrency is about multiple things happening at the same time, in random order.
    - Concurrency is the composition of independently executing computations, which may or may not run in parallel.
    - Parallelism is the ability to execute multiple computations simultaneously.
    - Concurrency enables Parallelism.
    - Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once.
    - Concurrency is about structure, Parallelism is about execution.
    - Memory Access Synchronization tools reduce Parallelism and come with their own limitations.
- Process:
    - A Process is just an instance of a running program.
    - A Process provides the environment for the program to execute.
    - When the program is executed, the Operating System creates a process and:
        - Allocates memory in a virtual address space.
        - The virtual address space contains the Code Segment, which is the compiled machine code.
        - The Data Region contains Global Variables.
        - The Heap Segment is used for Dynamic Memory Allocation.
        - The Stack is used for storing Local Variables of a Function.
- Operating System:
    - The job of the operating system is to give all processes a fair chance to access the CPU, memory and other resources. There are times when higher priority tasks get precedence.
- Threads:
    - Threads are the smallest unit of execution that a CPU accepts.
    - Each Process has at least one thread - the main thread. A Process can have multiple Threads.
    - Threads share the same address space.
    - Each Thread has its own Stack.
    - Threads run independent of each other.
    - The Operating System Scheduler makes scheduling decisions at the thread level, not at the process level!
    - Threads can run concurrently (with each thread taking turns on an individual core) or in parallel (with each thread running on a different core at the same time).
    - Threads communicate with each other by sharing memory.
    - Sharing of memory between threads creates a lot of complexity.
    - Concurrent access to shared memory by two or more threads can lead to a Data Race, and the outcome can be non-deterministic.
    - The actual number of threads we can create is limited.
- Goroutines:
    - We can think of Goroutines as user space threads managed by the Go runtime.
    - Goroutines are extremely lightweight. A Goroutine starts with a 2 KB stack, which grows and shrinks as required.
    - Low CPU overhead: 3 instructions per function call.
    - We can create hundreds of thousands of Goroutines in the same address space.
    - Channels are used for communication of data between Goroutines. Sharing of memory can be avoided.
    - Context Switching between Goroutines is much cheaper than thread Context Switching, as the Go runtime can be more selective in what is persisted for retrieval, how it is persisted and when the persisting needs to occur.
    - The Go runtime creates OS threads.
    - Goroutines run in the context of OS threads.
    - Many Goroutines can execute in the context of a single OS thread.
- Go follows a logical concurrency model called Fork and Join.
    - The go statement Forks a Goroutine. When a Goroutine is done with its job, it Joins back to the main routine.
    - If the main routine doesn't wait for the Goroutine, it is highly likely that the program will finish before the Goroutine gets a chance to run.
    - To create a Join point, we can use sync.WaitGroup.
    - WaitGroups deterministically block the main Goroutine.
    - Example:
    var wg sync.WaitGroup
    wg.Add(1)
    go func() {
        defer wg.Done()
        // Do stuff..
    }()
    wg.Wait()
- Goroutines execute within the same address space they are created in.
- Goroutines can directly modify variables in the enclosing lexical block.
- The Go runtime has a mechanism known as an M:N Scheduler.
- The Go Scheduler runs in user space.
- The Go Scheduler uses OS threads to schedule Goroutines for execution.
- Goroutines run in the context of OS threads.
- The Go runtime creates a number of worker OS threads equal to the GOMAXPROCS environment variable value. The GOMAXPROCS default value is the number of processors/cores on the machine.
- It is the responsibility of the Go Scheduler to distribute runnable Goroutines over the multiple OS threads that are created.
- Channels are used to communicate data between Goroutines.
- Channels can also help synchronize Goroutines execution.
- Channels are typed. They are used to send and receive values of a particular type.
- Channels are Thread Safe! Channel variables can be used to send and receive values concurrently by multiple Goroutines.
- The default value of a Channel is nil.
- Example:
var ch chan int
ch = make(chan int) // make() allocates memory for the channel and returns a reference to the allocated memory
// OR
ch := make(chan int) // Declares and allocates memory for the channel
- The arrow operator (<-) is used for sending and receiving values on a channel. The "arrow" indicates the direction of data flow.
- Channels are blocking! i.e. the sending Goroutine will block until there is a corresponding Goroutine ready to receive the value.
- It is the responsibility of the channel to make the Goroutine runnable again once it has the data.
- The sender Goroutine must indicate to the receiving Goroutine that it has no more values to send. We use close(ch) to close the channel.
- The receiver receives 2 values:
value, ok = <- ch
// value: Received value from the sender
// ok: a boolean value
// True: value generated by a send
// False: value generated by close
- The receiver Goroutine can receive a sequence of values from a Channel by ranging over it.
- The loop automatically breaks when the Channel is closed.
- Range does not return the second boolean value.
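A small runnable sketch of this close-and-range pattern (standard library only): the sender closes the channel when done, and the receiving loop ends on its own.

```go
package main

import "fmt"

func main() {
	ch := make(chan int)
	go func() {
		defer close(ch) // sender closes to signal "no more values"
		for i := 1; i <= 3; i++ {
			ch <- i
		}
	}()

	sum := 0
	for v := range ch { // loop breaks automatically when ch is closed
		sum += v
	}
	fmt.Println(sum) // prints 6
}
```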
- Unbuffered Channels:
    - They are Synchronous.
    - The receiving Goroutine will block until there is a sender, and the sender will block until there is a receiver.
    - To create an Unbuffered Channel:
    ch := make(chan int)
- Buffered Channels:
    - They are Asynchronous.
    - They are in-memory FIFO queues.
    - There is a buffer between the sender and receiver Goroutines.
    - We can specify the capacity, i.e. the buffer size, which indicates the number of elements that can be sent without the receiver being ready.
    - The sender can keep sending values until the buffer gets full. When the buffer is full, the sender will block.
    - The receiver can keep receiving values until the buffer gets empty. When the buffer is empty, the receiver will block.
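A minimal sketch of a buffered channel: with capacity 2, both sends complete without any receiver being ready, and values come out in FIFO order.

```go
package main

import "fmt"

func main() {
	ch := make(chan string, 2) // capacity 2: two sends proceed without a receiver

	ch <- "a"
	ch <- "b" // buffer is now full; a third send would block

	fmt.Println(<-ch, <-ch) // prints: a b (FIFO order)
}
```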
- When using channels as function parameters, we can specify whether a channel is meant to only send or only receive values.
- This increases the Type Safety of our program.
- For e.g.
func(in <-chan string, out chan<- string) {
    // in: values of type string can only be received from this channel
    // out: values of type string can only be sent to this channel
}
- The default value for channels is nil.
var ch chan interface{}
- Reading from and writing to a nil channel will block forever.
var ch chan interface{}
<- ch            // Blocks forever
ch <- struct{}{} // Blocks forever
- Closing a nil channel will panic.
var ch chan interface{}
close(ch) // panic
- Ensure channels are initialized first.
- The owner of a channel is the Goroutine that instantiates, writes to and closes the channel.
- Channel utilizers only have a read-only view into the channel.
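The ownership convention above can be sketched as follows (the `owner` function name is illustrative): the owner creates, writes and closes the channel, and returns only a receive-only view to its callers.

```go
package main

import "fmt"

// owner instantiates, writes to and closes the channel;
// callers only ever see a read-only (<-chan) view.
func owner() <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch) // the owner is responsible for closing
		for i := 0; i < 3; i++ {
			ch <- i
		}
	}()
	return ch
}

func main() {
	for v := range owner() {
		fmt.Println(v) // prints 0, 1, 2
	}
}
```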
- Select:
    - Select allows us to perform whichever Channel operation is ready, without worrying about order.
    - Select is like a switch statement.
    - Each case specifies a send or receive on a specific channel and has an associated block of statements.
    - Each case specifies a communication.
    - All channel operations are considered simultaneously.
    - Select waits until some case is ready to proceed. If none of the channels are ready, the entire Select statement blocks until some case is ready for communication.
    - When one channel is ready, that operation will proceed.
    - If multiple channels are ready, it will pick one of them at random and proceed.
    - Select is helpful for implementing:
        - Timeouts
        - Non-blocking communication
- Select Statement syntax:
select {
case <- ch1:
// Block of statements
case <- ch2:
// Block of statements
case ch3 <- struct{}{}:
// Block of statements
}
- We can specify timeouts on channel operations as below.
- In the below example:
    - Select will wait until there is an event on channel ch or until the timeout is reached (after 3 seconds).
    - The time.After() function takes a time.Duration argument and returns a channel that will send the current time after the duration we specify.
select {
case v := <- ch:
fmt.Println(v)
case <- time.After(3 * time.Second):
fmt.Println("Timeout!")
}
- As channels are blocking, we can implement a Non-Blocking operation using select by specifying a default case.
- In the below example:
    - Send or receive on a channel, but avoid blocking if the channel is not ready.
    - default allows us to exit the select block without blocking.
select {
case m := <- ch:
fmt.Println("Received message: ", m)
default:
fmt.Println("No message received")
}
- An empty select statement will block forever.
select {}
- select on a nil channel will block forever.
// Will block forever
var ch chan string
var v string
select {
case v = <- ch:
case ch <- v:
}
- A Mutex is used to guard access to a shared resource.
- sync.Mutex provides exclusive access to a shared resource.
- If a Goroutine is just reading from memory and not writing to it, we can use a Read-Write Mutex: sync.RWMutex allows multiple readers, while writers get an exclusive lock.
- Channels:
    - They are made to implement communication between Goroutines.
    - Passing a copy of the data.
    - Distributing units of work.
    - Communicating asynchronous results.
- Mutex:
    - When we have data such as Caches, States or Registries which are too big to be sent over channels, and we want access to this data to be thread safe, a classic synchronization tool such as a Mutex comes into the picture.
- Atomic:
    - sync/atomic is used to perform low level atomic operations on memory. It is used by other synchronization utilities.
    - It is a lockless operation.
- A condition variable is one of the synchronization mechanisms.
- A condition variable is basically a container of Goroutines that are waiting for a certain condition.
- Condition variables are of type:
var c *sync.Cond
- We use the constructor method sync.NewCond() to create a condition variable. It takes a sync.Locker interface as input, which is usually a sync.Mutex.
mu := sync.Mutex{}
cond := sync.NewCond(&mu)
- Wait suspends the execution of the calling Goroutine.
- Signal wakes one Goroutine waiting on c.
- Broadcast wakes all Goroutines waiting on c.
- sync.Once is used to run one-time initialization functions.
- The once.Do(funcValue) method accepts the initialization function.
- sync.Once ensures that only one call to Do ever calls the function that is passed in - even on different Goroutines.
- sync.Pool is commonly used to reuse expensive-to-create resources, e.g. database connections, network connections and memory buffers.
- Go provides race detector tool for finding race conditions in Go code.
- Race detector tool is integrated with other Go tools.
- We can use race detector tool as follows:
# Test the package for race conditions
go test -race mypkg
# Compile and run the program and check for race conditions
go run -race myprog.go
# Build Go code and check for race conditions
go build -race myprog
# Install the package
go install -race mypkg
- Binaries need to be built with race detection enabled to run the race tool.
- When racy behaviour is detected, a warning is printed.
- A race-enabled binary can be 10 times slower and consume 10 times more memory.
- Integration tests and load tests are good candidates for testing with a race-enabled binary.
- Go's concurrency primitives make it easy to construct streaming pipelines that enable us to efficiently use I/O and the multiple CPU cores available on the machine.
- Pipelines are often used to process streams or batches of data.
- A Pipeline is a series of stages that are connected by channels. For e.g.
               ch1              ch2              ch3
Goroutine1 -----> Goroutine2 -----> Goroutine3 -----> Goroutine4
- Each stage in a pipeline is represented by a Goroutine.
- Each stage takes data in, performs an operation on it, and then passes the data out.
- By using pipelines, we can separate the concerns of each stage and process individual stages concurrently.
- Stages could consume and return the same type.
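A minimal two-stage pipeline sketch (the `generate` and `square` stage names are illustrative): each stage is a goroutine that consumes one channel and returns another, closing its output when done.

```go
package main

import "fmt"

// generate is the first stage: it emits the input numbers on a channel.
func generate(nums ...int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for _, n := range nums {
			out <- n
		}
	}()
	return out
}

// square is a middle stage: it consumes one channel and returns another.
func square(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for n := range in {
			out <- n * n
		}
	}()
	return out
}

func main() {
	for v := range square(generate(1, 2, 3)) {
		fmt.Println(v) // prints 1, 4, 9
	}
}
```

Because `square` consumes and returns the same type (`<-chan int`), stages like it can be chained or repeated freely.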
- Fan Out and Fan In help us break a computationally intensive stage (Goroutine) in our Pipeline into multiple Goroutines and run them in parallel to speed it up.
- Fan Out:
    - Multiple Goroutines are started to read data from the same, single channel.
- Fan In:
    - The process of combining multiple results from different channels into 1 channel.
- Pass a read-only done channel to the Goroutine.
- Close the channel to send a broadcast signal to all Goroutines.
- On receiving the signal on the done channel, Goroutines need to abandon their work and terminate.
- We use select to make the send/receive operation on a channel pre-emptible. For e.g.
select {
case out <- n:
case <- done:
    return
}