Marcelo's Notes

This post serves as a study guide. All the links for the information can be found at the end of each section.

I do not claim it as my own.

This page is an organization of topics.

Process Synchronization

Process

What is a process?

What are the states of a process?

What are the types of concurrent unit control?

How do concurrent processes interact with each other?

Critical Sections

What is a critical section?

Solution to the critical section problem?

Semaphores

What is a semaphore?

Does the semaphore have atomic operations?

Why use semaphores?

Difference between locks, mutexes, and semaphores?

Producer Consumer Problem

What is this Producer Consumer Problem?

Infinite Buffer Example

Code can be found on page 57 of the Little Book of Semaphores

buffer: number[] = [];
  mutex: semaphore = 1;
  items: semaphore = 0;
  
  Producer {
    mutex.P() // lock the mutex
      buffer.add(item)
      items.V() // increase number of items for consumer threads
    mutex.V() // release mutex
  }

  Consumer {
    items.P() // decrease number of items
    mutex.P() 
      buffer.pop() // remove item from buffer
    mutex.V()
  }

What is wrong with this implementation? How to fix it?

Producer

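Assume $mutex = 1$ at the start of the producer, and that a negative semaphore value counts the threads waiting on it.

mutex.P() inside Producer makes $mutex = 0$
items.V() inside Producer can wake a waiting consumer thread
mutex.P() inside Consumer makes $mutex = -1$ , so the consumer blocks again right after waking up
mutex.V() inside Producer makes $mutex = 0$ and wakes the blocked consumer

The implementation is still correct, but the consumer wakes up only to block immediately on the mutex that the producer still holds. Blocking and waking a thread are comparatively expensive, so doing them unnecessarily hurts performance.

The fix is to signal the consumer after we are out of the critical section.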

buffer: number[] = [];
  mutex: semaphore = 1;
  items: semaphore = 0;
  
  Producer {
    mutex.P() // lock the mutex
      buffer.add(item)
    mutex.V() // release mutex
    items.V() // increase number of items and signal consumer threads
  }

Consumer

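Several consumers can decrement items before any of them acquires the mutex and pops, so for a short time items does not match the number of elements in the buffer. A tempting fix is to move items.P() inside the mutex, but consider what happens then: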

Consumer {
      mutex.P() 
        items.P() // decrease number of items
        buffer.pop() // remove item from buffer
      mutex.V()
    }

Assume $items = 0$

The consumer thread grabs the mutex
items.P() in Consumer gets stuck since the value is not greater than 0
Producer thread cannot increment the value of items, since it's waiting on the consumer to release the mutex

We have reached a deadlock.

Finite Buffer Example

Readers-Writers Problem

Little Book of Semaphores 4.2

What is the readers-writers problem?

What are the constraints?

Implementation

Starvation

Writer priority

Dining Philosophers Problem

Little Book of Semaphores 4.4

What is the Dining Philosophers Problem?

A philosopher needs two forks to eat. Looking at the diagram, we can tell some of them will have to wait for an adjacent philosopher to finish eating.

Each philosopher, representing a thread, runs the following loop:

while (true) {
    think();
    getForks();
    eat();
    putForks();
  }

Each philosopher knows their number, ranging from 0 to $n-1$ . Each fork is also numbered from 0 to $n-1$ . A philosopher $i$ grabs the left fork $i$ and the right fork $(i+1) \bmod n$

The problem lies in the functions $getForks()$ and $putForks()$ . We need to implement them such that:

Only one philosopher can hold a fork at a time
No deadlocks
No starvation
More than one philosopher can eat at the same time

Solutions

const getForks = (i: number) => {
    forks[getLeft(i)].P();
    forks[getRight(i)].P();
  }

  const putForks = (i: number) => {
    forks[getLeft(i)].V();
    forks[getRight(i)].V();
  }

Does this solution work?

Can it be improved?

We can ensure no deadlock as long as we leave one fork available. Say there are 5 forks, 5 philosophers, but we only allow 4 to eat. In the scenario where everyone reaches for a fork on the right, there will be one spare fork since one philosopher is not eating. We can implement this with a semaphore initialized to $n-1$ , $n$ being the number of philosophers.

maxPhilosophers: semaphore = n - 1;

  const getForks = (i: number) => {
    maxPhilosophers.P(); // only the first n - 1 philosophers can proceed

    forks[getLeft(i)].P();
    forks[getRight(i)].P();
  }

  const putForks = (i: number) => {
    forks[getLeft(i)].V();
    forks[getRight(i)].V();

    maxPhilosophers.V(); // notify a waiting philosopher that they can proceed
  }

Monitors

What is a monitor?

Properties of monitors

Producer Consumer Problem

Implementation

Readers-Writers Problem

Readers Priority

Geeks for Geeks:

Semaphores vs Monitors

References

Sources

Process Communication

Distributed Systems 3rd edition (2017) Chapter 4

Message Passing Model

Why do Distributed Systems use message passing?

At a very high level, how does it work?

Say process $P$ wants to communicate with process $Q$ .

$P$ builds the message in its local address space
A system call sends the message to $Q$
Both $P$ and $Q$ need to agree on what was sent

OSI Model

What is the OSI Model?

Why use layers in the OSI Model? How many are there?

Common Protocols

Middleware Protocols

What is middleware?

What effect did it have on the OSI model?

Berkeley Sockets

What is a socket?

What are the socket operations for TCP?

Describe the communication pattern

Server side executes the first 4 operations in order:

$socket \rightarrow bind \rightarrow listen \rightarrow accept$

1. When socket is called, the caller creates a new communication endpoint. The OS reserves resources for the specified transport protocol.
2. bind associates a local address with the newly created socket. This tells the OS to receive messages on the specified address and port.
3. listen is called only in the case of connection-oriented communication. It allows the OS to reserve buffers for the specified number of pending connection requests that are to be accepted.
4. accept blocks the caller until a connection request arrives. When a request arrives, the local OS creates a new socket with the same properties as the original one and returns it to the caller.

Now we look at the client side.

5. connect requires the caller to specify where a connection request is to be sent. The client is blocked until the connection is established.

Both client and server perform send and receive operations.

8. close is a symmetric operation: both the client and the server call it to tear down the connection.

Remote Procedure Call

Distributed Systems 3rd edition (2017) 4.2

What is the basic idea of Remote Procedure Call?

What can cause issues?

Is RPC transparent?

Single machine procedure call

newlist = append(data, dbList);

Assumptions:

dbList is globally defined
data is the element to be appended

Observations:

dbList is a reference to a list
data is a value

$append(data, dbList)$ pushes the representations of data and dbList onto the stack
Representations are now accessible by append's implementation

Figure: (a) the stack before the call to append; (b) the stack while the called procedure is active.

What is a stub?

Why do stubs convert messages?

Client stub vs server stub

Steps of an RPC

Stub Generation

When to use Asynchronous RPC?

Deferred synchronous RPC?

References

Sources

Clocks

Cristian's Clock

Distributed Systems 3rd edition (2017) 6.1

What is the idea?

Issues with this?

Example

$j$ sends a request to $i$ , timestamped with its send time $T_1$
$i$ records the arrival time $T_2$ of the request (from its local clock)
$i$ returns a response timestamped with its send time $T_3$ , together with the recorded $T_2$
$j$ records the arrival time $T_4$ of the response
$j$ calculates the offset: $\theta = \frac{(T_2 - T_1) - (T_4 - T_3)}{2}$
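
The offset formula can be checked with a small worked example. This is only a sketch: the `Exchange` type, the function names, and the sample timestamps (in milliseconds) are made up for illustration, and the round-trip delay is an extra quantity that follows from the same four timestamps rather than something stated above.

```typescript
// Hypothetical container for the four timestamps of one request/response exchange.
interface Exchange {
  t1: number; // request sent by j (j's clock)
  t2: number; // request received by i (i's clock)
  t3: number; // response sent by i (i's clock)
  t4: number; // response received by j (j's clock)
}

// theta = ((T2 - T1) - (T4 - T3)) / 2: how far j's clock is behind i's clock.
const estimateOffset = (e: Exchange): number =>
  ((e.t2 - e.t1) - (e.t4 - e.t3)) / 2;

// Round-trip network delay, excluding the processing time spent at i.
const estimateDelay = (e: Exchange): number =>
  (e.t4 - e.t1) - (e.t3 - e.t2);

// Sample: i's clock runs 50 ms ahead of j's, each direction takes 10 ms,
// and i spends 10 ms processing the request.
const sample: Exchange = { t1: 1000, t2: 1060, t3: 1070, t4: 1030 };
const offset = estimateOffset(sample); // (60 - (-40)) / 2 = 50
const delay = estimateDelay(sample);   // 30 - 10 = 20
```

A positive offset means $j$ is behind $i$ and would have to move its clock forward; a negative one means $j$ is ahead.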

If clocks are to be modified, would it be gradually or abruptly?

The Berkeley algorithm

How does it work?

When to use this algorithm?

Logical clocks

Distributed Systems 3rd edition (2017) 6.2

Is it necessary to have an accurate account of the real time?

What two things did Lamport point out?

Lamport's Logical Clocks

Happens-before

What does $a \rightarrow b$ mean?

$a \rightarrow b$ is read as: event $a$ happens before event $b$ .

It means all processes agree that event $a$ happens first, and event $b$ happens afterwards.

If $a$ and $b$ are events in the same process and $a$ occurs before $b$ , then $a \rightarrow b$ is true
If $a$ is the event of sending a message and $b$ is the event of receiving that message, then $a \rightarrow b$ is true. (You can't receive a message without sending it!)

Is Happens-before a transitive property?

Yes. If $a \rightarrow b$ and $b \rightarrow c$ , then $a \rightarrow c$ .

When are events said to be concurrent?

When they happen in separate processes that do not exchange messages.

$a \rightarrow b$ is not true, nor is $b \rightarrow a$

Nothing can be said about which of the two events happened first.

For every event $a$ , we assign a value $C(a)$ on which all processes agree.

What property must these values have?

If $a$ and $b$ are events in the same process and $a$ occurs before $b$ , then $a \rightarrow b$ and therefore $C(a) < C(b)$
If $a$ is the sending of a message and $b$ is the receipt of that message by another process, then $C(a)$ and $C(b)$ have to be assigned such that everyone agrees that $C(a) < C(b)$

How to make corrections to the clock time?

The clock time, $C$ , must always increase. Corrections to time can be made by adding a positive value, never by subtracting one.

Lamport's Logical Clocks Implementation

Each process, $P_i$ , maintains a local counter, $C_i$

How are the clocks updated?

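Before executing an event, $P_i$ increments $C_i$ : $C_i \leftarrow C_i + d$ , where $d$ is a positive increment (usually 1)
When process $P_i$ sends message $m$ to $P_j$ , it sets $m$ 's timestamp $ts(m) = C_i$ after having executed the previous step
When $P_j$ receives $m$ , it adjusts its own local counter as $C_j \leftarrow \max\{C_j, ts(m)\}$ and then executes the first step to record the receive event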
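
The update rules can be sketched as a small class. This is a minimal illustration rather than a reference implementation: the names `LamportClock`, `Message`, `tick`, `send`, and `receive` are invented here, the increment is fixed at 1, and no real network is involved.

```typescript
// Hypothetical message shape: a payload plus the Lamport timestamp ts(m).
interface Message {
  payload: string;
  ts: number;
}

class LamportClock {
  private counter = 0;

  // Rule 1: increment the counter before executing any event.
  tick(): number {
    this.counter += 1;
    return this.counter;
  }

  // Rule 2: sending is an event, so tick first and stamp the message.
  send(payload: string): Message {
    return { payload: payload, ts: this.tick() };
  }

  // Rule 3: jump forward to the message timestamp if it is larger,
  // then tick to record the receive event itself.
  receive(m: Message): number {
    this.counter = Math.max(this.counter, m.ts);
    return this.tick();
  }
}

const sender = new LamportClock();
const receiver = new LamportClock();
for (let n = 0; n < 5; n++) { sender.tick(); }   // five local events at the sender
for (let n = 0; n < 3; n++) { receiver.tick(); } // three local events at the receiver
const msg = sender.send("update");               // ts(m) = 6
const receivedAt = receiver.receive(msg);        // max(3, 6) + 1 = 7
```

The receive event always ends up with a larger value than the send event, which is exactly the property required of $C$ for message passing.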

If $a \rightarrow b$ then $C(a) < C(b)$ ?

True. But the converse is false: $C(a) < C(b) \not\Rightarrow a \rightarrow b$

Why?

Total-ordered multicasting

Consider a DB has been replicated across several sites in order to improve query performance. A query is forwarded to the nearest copy. Each update operation must be carried out at each replica in the same order.

What is total-ordered multicasting?

Vector Clocks

What can Lamport clocks say about the relationship of events a and b if C(a) < C(b)?

Nothing. It does not imply $a$ happened before $b$ .

Why? Hint: use three processes

$T_s(m_i)$ : time at which $m_i$ was sent
$T_r(m_i)$ : time at which $m_i$ was received

We know that for each $m_i$ : $T_s(m_i) < T_r(m_i)$

Consider messages $m_i$ and $m_j$ , what can we say from $T_r(m_i) < T_s(m_j)$ ?

Let $m_i = m_1$ and $m_j = m_3$

Looking at the figure below, we notice that $m_3$ was sent after receiving $m_1$ . This means that $m_3$ may have depended on both $m_1$ and $m_2$ . We cannot be sure since Lamport clocks do not capture causality.

How to capture causality?

How to track causality?

Assign each event a unique name and an incrementing counter: $p_k$ is the $k^{th}$ event that happened at process $P$ . Then keep track of the causal histories.

Example: Two local events happen at process P, then causal history $H(p_2) = \{ p_1, p_2 \}$

Assume process $P$ sends a message to $Q$ . At the time of arrival, the causal history of $Q$ was $\{q_1\}$ . To track causality, $P$ sends its own causal history $\{p_1, p_2\}$ , extended with $p_3$ for the send event itself. Upon arrival, $Q$ records the receive event as $q_2$ and updates its causal history to $\{p_1, p_2, p_3, q_1, q_2\}$

Why is P's causal history before Q's?

To determine if an event $p$ occurred before $q$ , we check that $p \in H(q)$ , assuming $q$ is the last local event.
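
The merge and the membership check can be sketched with plain string arrays. The event names follow the example above, with $p_3$ assumed to be the send event at $P$; the helper names `mergeHistories` and `occurredBefore` are hypothetical.

```typescript
// A causal history is modelled as a list of unique event names (an assumption
// made for this sketch; any set-like structure would do).
type CausalHistory = string[];

// Union of two histories, keeping the first occurrence of each event.
const mergeHistories = (a: CausalHistory, b: CausalHistory): CausalHistory => {
  const merged = a.slice();
  b.forEach((event) => {
    if (merged.indexOf(event) < 0) { merged.push(event); }
  });
  return merged;
};

// p occurred before q if p is in H(q), where q is the last local event.
const occurredBefore = (p: string, historyOfQ: CausalHistory): boolean =>
  historyOfQ.indexOf(p) >= 0;

const historySentByP: CausalHistory = ["p1", "p2", "p3"]; // p3 is the send event
const historyOfQBefore: CausalHistory = ["q1"];

// On arrival, Q merges both histories and appends its own receive event q2.
const historyOfQ2 = mergeHistories(historySentByP, historyOfQBefore).concat(["q2"]);
// historyOfQ2 is ["p1", "p2", "p3", "q1", "q2"]
```

With this representation, `occurredBefore("p2", historyOfQ2)` holds while `occurredBefore("q2", historySentByP)` does not.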

What is the problem with causal histories?

How to construct vector clocks?

Assign an index to each process
The $j^{th}$ entry of the vector represents the number of events that occurred at $P_j$

We now construct the vector clock by letting each process $P_i$ maintain a vector $VC_i$ with the following properties:

$VC_i[i]$ is the number of events that have occurred so far at $P_i$
If $VC_i[j] = k$ then $P_i$ knows that $k$ events have occurred at $P_j$

Vector Clocks implementation rules

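Before executing an event, $P_i$ executes $VC_i[i] = VC_i[i] + d$ . This records the new event.
When $P_i$ sends message $m$ to $P_j$ , it sets $ts(m) = VC_i$ , the whole vector, after having executed the previous step
When $P_j$ receives $m$ , it adjusts its own vector by setting $VC_j[k] = \max\{VC_j[k], ts(m)[k]\}$ for each $k$ , and then executes the first step to record the receive event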
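
A compact sketch of these rules, assuming a fixed number of processes, an increment of 1, and that the message carries the sender's whole vector. All function names are invented for illustration, and each helper returns a new vector instead of mutating its input.

```typescript
type VectorClock = number[];

// Rule 1: record a new event at process i.
const tick = (vc: VectorClock, i: number): VectorClock =>
  vc.map((v, k) => (k === i ? v + 1 : v));

// Rule 2: sending is an event; the returned vector doubles as ts(m).
const sendEvent = (vc: VectorClock, i: number): VectorClock => tick(vc, i);

// Rule 3: merge entry by entry, then record the receive event at process j.
const receiveEvent = (vc: VectorClock, j: number, ts: VectorClock): VectorClock =>
  tick(vc.map((v, k) => Math.max(v, ts[k])), j);

// a happened before b iff a <= b in every entry and a < b in at least one.
const happenedBefore = (a: VectorClock, b: VectorClock): boolean =>
  a.every((v, k) => v <= b[k]) && a.some((v, k) => v < b[k]);

// Two distinct events are concurrent when neither happened before the other.
const concurrent = (a: VectorClock, b: VectorClock): boolean =>
  !happenedBefore(a, b) && !happenedBefore(b, a);

// Three processes: P0 sends to P1, while P2 performs an unrelated local event.
const atSend = sendEvent([0, 0, 0], 0);               // [1, 0, 0]
const atReceive = receiveEvent([0, 0, 0], 1, atSend); // [1, 1, 0]
const atLocal = tick([0, 0, 0], 2);                   // [0, 0, 1]
```

Unlike Lamport timestamps, comparing two vectors tells us whether the events are causally related: here the send precedes the receive, and the local event at P2 is concurrent with both.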

Example

References

Sources

Mutual Exclusion

What categories are mutual exclusion algorithms classified in?

A centralized algorithm

Straightforward way to achieve mutual exclusion in a distributed system?

Pros and Cons?

Distributed Algorithms

Lamport's Algorithm

Geeks for Geeks

What type of algorithm is Lamport's?

How are critical sections executed?

What are the types of messages in the algorithm?

How does each site keep track of CS requests?

Each site $S_i$ has a local queue denoted by $request\_queue_i$ . This queue orders requests by their timestamps.

Summarize Lamport's Algorithm

Let $S_i$ be the site that wants to enter the critical section.

Critical section request:

$S_i$ sends request with timestamp to other sites while also putting it in its local queue
Site $S_j$ puts $S_i$ 's request in the local request queue
$S_j$ replies to $S_i$

Critical section access:

$S_i$ enters the critical section when:
- $S_i$ 's own request is at the top of its local queue
- $S_i$ has received a message with a timestamp larger than that of its own request from every other site $S_j$
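
The two entry conditions can be written as a pure check over a site's local state. This is a simplified sketch: the names `QueuedRequest`, `precedes`, and `canEnter` are hypothetical, ties between equal timestamps are assumed to be broken by site id, and `lastSeen` is assumed to hold the largest timestamp received so far from each other site.

```typescript
// A queued request, identified by the requesting site and its timestamp.
interface QueuedRequest {
  site: number;
  ts: number;
}

// Total order on requests: smaller timestamp first, ties broken by site id.
const precedes = (a: QueuedRequest, b: QueuedRequest): boolean =>
  a.ts < b.ts || (a.ts === b.ts && a.site < b.site);

const canEnter = (
  own: QueuedRequest,
  queue: QueuedRequest[],
  lastSeen: { [site: number]: number },
  otherSites: number[]
): boolean => {
  // Condition 1: our request is queued and ahead of every other queued request.
  const queued = queue.some((r) => r.site === own.site && r.ts === own.ts);
  const atTop = queue.every((r) => r.site === own.site || precedes(own, r));
  // Condition 2: every other site has sent us something stamped later than our request.
  const heardFromAll = otherSites.every(
    (s) => lastSeen[s] !== undefined && lastSeen[s] > own.ts
  );
  return queued && atTop && heardFromAll;
};

const ownRequest: QueuedRequest = { site: 1, ts: 5 };
const localQueue: QueuedRequest[] = [ownRequest, { site: 2, ts: 7 }];
const allowed = canEnter(ownRequest, localQueue, { 2: 8, 3: 6 }, [2, 3]);
const blockedByEarlier = canEnter(
  ownRequest, localQueue.concat([{ site: 3, ts: 4 }]), { 2: 8, 3: 6 }, [2, 3]
);
```

In the sample, `allowed` is true, while `blockedByEarlier` is false because site 3 holds an older request.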

Critical section release:

$S_i$ removes its request from local queue
$S_i$ sends a timestamped RELEASE message to other sites
All other sites remove $S_i$ 's request from their local queues upon receiving the RELEASE message

Example

Proof by contradiction

Assumptions:

$S_i$ and $S_j$ are both in the CS
$S_i$ 's request timestamp is smaller than $S_j$ 's

With the previous assumptions, $S_i$ and $S_j$ are each at the top of their own request queues. Using FIFO channels, assumption 2, and the fact that $S_j$ is executing, the request from $S_i$ must already be in $S_j$ 's queue.

This is a contradiction: $S_i$ 's request has the smaller timestamp, so it should be at the top of $S_j$ 's queue, ahead of $S_j$ 's own request.

So there cannot be two processes executing at the same time.

Ricart Agrawala Algorithm

Geeks for Geeks

What type of algorithm is Ricart Agrawala's?

How are critical sections executed?

What are the types of messages in the algorithm?

How does each site keep track of CS requests?

Each site $S_i$ has a local queue denoted by $request\_queue_i$ . This queue orders requests by their timestamps.

Summarize Ricart Agrawala's Algorithm

Let $S_i$ be the site that wants to enter the critical section.

Critical section request:

$S_i$ sends message with timestamp to other sites
Site $S_j$ replies immediately to $S_i$ with approval if $S_j$ :
- is neither requesting nor executing the critical section, or
- has its own request for the critical section with a larger timestamp than the incoming request
Site $S_j$ sends a delayed approval to $S_i$ if it is executing the critical section, or if it has its own request with a smaller timestamp than the incoming request; in this case the response is delayed until $S_j$ is done with the critical section
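
The reply-or-defer decision at the receiving site can be sketched as one function. The state names `RELEASED`, `WANTED`, and `HELD`, the `CsRequest` type, and the site-id tie-break are assumptions made for this illustration.

```typescript
// State of the receiving site S_j with respect to the critical section.
type SiteState = "RELEASED" | "WANTED" | "HELD";

interface CsRequest {
  site: number;
  ts: number;
}

// Returns true when S_j must defer its approval, false when it replies at once.
const shouldDefer = (
  state: SiteState,
  own: CsRequest | null,
  incoming: CsRequest
): boolean => {
  if (state === "HELD") {
    return true; // executing the critical section: answer after leaving it
  }
  if (state === "WANTED" && own !== null) {
    // Defer only if our own request is older (ties broken by site id).
    return own.ts < incoming.ts || (own.ts === incoming.ts && own.site < incoming.site);
  }
  return false; // not interested: approve immediately
};

const idleSite = shouldDefer("RELEASED", null, { site: 1, ts: 4 });                // false
const olderOwn = shouldDefer("WANTED", { site: 2, ts: 3 }, { site: 1, ts: 4 });    // true
const newerOwn = shouldDefer("WANTED", { site: 2, ts: 9 }, { site: 1, ts: 4 });    // false
const inSection = shouldDefer("HELD", { site: 2, ts: 3 }, { site: 1, ts: 4 });     // true
```

Deferred requests would be remembered by the site and answered when it releases the critical section.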

Critical section access:

$S_i$ enters critical section after receiving approval from every other site

Critical section release:

$S_i$ replies to all the deferred requests

Example

Does this algorithm require FIFO?

Maekawa's Algorithm

How does it differ from the previous algorithms?

What are the types of messages in the algorithm?

Summarize the algorithm?

Summary of critical section request:

$S_i$ sends a request to the sites in its subset
Site $S_j$ sends a reply to $S_i$ if it has not already sent a reply since the last release
If it has already sent a reply, it puts the request in its queue

Summary of critical section access:

$S_i$ enters the critical section after receiving replies from all sites in the subset

Can deadlocks occur?

Election

Garcia-Molina

What is the bully algorithm?

Explained

Consider $N$ processes $\{P_0, \ldots, P_{N-1}\}$
$id(P_k)=k$

When the coordinator stops responding, a process $P_k$ initiates an election:

$P_k$ sends ELECTION messages to all higher processes ( $P_{k+1}, \ldots, P_{N-1}$ )
If no one responds, $P_k$ is the coordinator
If a higher process responds, it takes over the election
Repeat until a coordinator gets elected
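
The election steps can be simulated without any networking. In this sketch the `alive` array stands in for "responds to an ELECTION message", the function name `runElection` is made up, and only one responder is followed per round, which is enough to reach the same winner.

```typescript
// alive[k] is true when P_k currently responds to messages (a simplified model).
const runElection = (initiator: number, alive: boolean[]): number => {
  let candidate = initiator;
  let handedOver = true;
  while (handedOver) {
    handedOver = false;
    // The candidate sends ELECTION to every process with a higher id.
    for (let k = candidate + 1; k < alive.length; k++) {
      if (alive[k]) {
        candidate = k; // a higher process responded and takes over
        handedOver = true;
        break;
      }
    }
  }
  // Nobody higher responded, so the candidate becomes the coordinator.
  return candidate;
};

// Five processes; the old coordinator P4 has crashed and P1 notices.
const coordinator = runElection(1, [true, true, true, true, false]); // 3
```

Each takeover strictly raises the candidate id, so the loop ends at the highest process that is still alive.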

"Biggest guy in town wins"

Ring Algorithm

Explained

References

Sources

Databases

What is a database?

Transactions

What are transactions?

Actions

Assumptions?

ACID?

What is the transaction model?

What are nested transactions?

What are distributed transactions?

Concurrency Control

What is concurrency control?

What three modules (managers) does a DB system consist of?

Can the previous models be expanded to a Distributed System?

When do transactions conflict with each other?

What is serial execution?

Transaction $b$ executes after transaction $a$ has completed

What is concurrent execution?

Transaction $b$ begins execution before $a$ has completed

Logs

What is a log?

When are logs equivalent?

Serial log vs Serializable Log?

What is a Serialization Graph?

Directed graph to test if a log is serializable.

Transactions serve as nodes
An edge $e_i$ is constructed from node $T_j$ to node $T_k$ if one of the operations in $T_j$ appears in the schedule before some conflicting operation in $T_k$ .

How to draw graph?

Number of nodes = Number of transactions
Directed edge $i \rightarrow j$ for each pair of conflicting operations where $o_i$ (from transaction $i$ ) appears before $o_j$ (from transaction $j$ )
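
Building the graph and testing it can be sketched as follows. This assumes the usual notion of conflict (two operations of different transactions on the same item, at least one of them a write) and that a log is serializable when the graph has no cycle; the `Op` type, the 0-based transaction ids, and the function names are illustrative only.

```typescript
// One operation of the log: transaction id, read or write, and the data item.
interface Op {
  tx: number;
  kind: "r" | "w";
  item: string;
}

const conflicts = (a: Op, b: Op): boolean =>
  a.tx !== b.tx && a.item === b.item && (a.kind === "w" || b.kind === "w");

// edges[i] lists every transaction j with a directed edge i -> j.
const buildGraph = (log: Op[], txCount: number): number[][] => {
  const edges: number[][] = [];
  for (let t = 0; t < txCount; t++) { edges.push([]); }
  for (let i = 0; i < log.length; i++) {
    for (let j = i + 1; j < log.length; j++) {
      if (conflicts(log[i], log[j]) && edges[log[i].tx].indexOf(log[j].tx) < 0) {
        edges[log[i].tx].push(log[j].tx);
      }
    }
  }
  return edges;
};

// Depth-first search; reaching a node that is still on the stack means a cycle.
const hasCycle = (edges: number[][]): boolean => {
  const state: number[] = edges.map(() => 0); // 0 unvisited, 1 on stack, 2 done
  const visit = (n: number): boolean => {
    if (state[n] === 1) { return true; }
    if (state[n] === 2) { return false; }
    state[n] = 1;
    const found = edges[n].some((next) => visit(next));
    state[n] = 2;
    return found;
  };
  return edges.some((_, n) => visit(n));
};

const op = (tx: number, kind: "r" | "w", item: string): Op => ({ tx: tx, kind: kind, item: item });
const serialLog = [op(0, "r", "x"), op(0, "w", "x"), op(1, "r", "x"), op(1, "w", "x")];
const interleavedLog = [op(0, "r", "x"), op(1, "r", "x"), op(0, "w", "x"), op(1, "w", "x")];
const serialOk = !hasCycle(buildGraph(serialLog, 2));           // only edge 0 -> 1
const interleavedOk = !hasCycle(buildGraph(interleavedLog, 2)); // edges 0 -> 1 and 1 -> 0
```

The first log yields a single edge and passes the test; the second produces edges in both directions between the two transactions, so it is not serializable.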

Concurrency Control Algorithms

What do they do?

Lock Based

How do lock-based algorithms work?

Types of Lock

What does it mean for a transaction to be well formed?

Two-phase locking

How to know if all legal logs in a set of transactions are serializable?

Issues with two-phase locking

Time Based Algorithms

What is the idea?

How to timestamp transactions?

Basic Timestamp Ordering

References

Sources
