This post serves as a study guide. All the links for the information can be found at the end of each section.
I do not claim it as my own.
This page is an organization of topics.
What is a process?
We can define a process as a program that is currently being executed.
What are the states of a process?
What are the types of concurrent unit control?
when do they occurr?
How do concurrent processes interact with each other?
What is a critical section?
Region of code that needs to be executed atomically since resources are shared
Solution to the critical section problem?
Busy Waiting
Technique in which a process/task waits and constantly checks for a condition to be satisfied before proceeding with its execution
Disabling Interrupts
Test-and-Set Intruction
boolean lock = false;boolean TestAndSet (boolean &target){boolean rv = target;target = true;return rv;}while(1){while (TestAndSet(lock));critical sectionlock = false;remainder section}
What is a semaphore?
It is similar to an integer but differs in:
Does the semaphore have atomic operations?
Yes
sem = Semaphore(1)// Processsem.P()// critical sectionsem.V()
Why use semaphores?
Difference between locks, mutex semaphore?
What is this Producer Consumer Problem?
Producer threads create items and place them in a buffer, meanwhile consumer threads remove them from the buffer. Producer threads should not be able to create items if the buffer is full, and consumers should not be able to remove items if buffer is empty.
Infinite Buffer Example
Code can be found on page 57 of Litte Book of semaphores
buffer: number[] = [];mutex: semaphore = 1;items: semaphore = 0;Producer {mutex.P() // lock the mutexbuffer.add(item)items.V() // increase number of items for consumer threadsmutex.V() // release mutex}Consumer {items.P() // decrease number of itemsmutex.P()buffer.pop() // remove item from buffermutex.V()}
What is wrong with this implementation? How to fix it?
Producer
Assume mutex=1 at the start of the producer.
This means we cannot enter the critical section inside the consumer. In order to proceed the semaphore value has to be greater than 0
The fix is to signal the consumer after we are out of the critical section.
buffer: number[] = [];mutex: semaphore = 1;items: semaphore = 0;Producer {mutex.P() // lock the mutexbuffer.add(item)mutex.V() // release mutexitems.V() // increase number of items and signal consumer threads}
Consumer
Various consumers can decrement value of items before popping. This leads to incorrect count. Consider this:
Consumer {mutex.P()items.P() // decrease number of itemsbuffer.pop() // remove item from buffermutex.V()}
Assume items=0
We have reached a deadlock.
Finite Buffer Example
Recall that a producer cannot create new items if buffer is full. Also, the consumer cannot retrieve items if buffer is empty.
const bufferSize: number = N;items: semaphore = 0;...if (items >= bufferSize)block()...
Would this work?
No, we cannot read the value of a semaphore. Only increase and decrease.
Solution
We can add one more semaphore and set it equal to the size of the buffer.
buffer: number[bufferSize];mutex: semaphore = 1;items: semaphore = 0;spaces: semaphore = bufferSize;Producer {spaces.P(); // decrement number of available spacesmutex.P();buffer.add()mutex.V();items.V() // signal the consumer}Consumer {items.P(); // decrement number of items in buffermutex.P();buffer.pop();mutex.V();spaces.V(); // increment spaces}
Little Book of Semaphores 4.2
What is the readers-writers problem?
Occurs when a resource is read and written by concurrent threads. Ideally, you do not want to read the resource while it is being written.
What are the constraints?
Implementation
isRoomEmpty: semaphore = 1;mutex: semaphore = 1;readers: number = 0;Reader {mutex.P();readers++;if(readers == 1) isRoomEmpty.P(); // first reader locks the roommutex.V();<READ/>mutex.P();readers--;if(readers == 0) isRoomEmpty.V(); // last readers frees the roommutex.V();}Writer {isRoomEmpty.P(); // writer waits for room to be empty<WRITE/>isRoomEmpty.V();}
Starvation
In the previous implementation, writers can starve. This means that readers can keep coming in and writers might wait forever.
Make it so that when a writer arrives, exisiting readers finish.
isRoomEmpty: semaphore = 1;mutex: semaphore = 1;turnstile: semaphore = 1;readers: number = 0;Reader {turnstile.P(); // decrease numberturnstile.V(); // the final increase to reach 1 is given by the writermutex.P();readers++;if(readers == 1) isRoomEmpty.P(); // first reader locks the roommutex.V();<READ/>mutex.P();readers--;if(readers == 0) isRoomEmpty.V(); // last readers frees the roommutex.V();}Writer {turnstile.P(); // let readers know a writer came inisRoomEmpty.P(); // writer waits for room to be empty<WRITE/>turnstile.V(); // notify readersisRoomEmpty.V(); // room is ready}
Writer priority
Little Book of Semaphores page 79
readerMutex: semaphore = 1;readers: number = 0;writerMutex: semaphore = 1;writers: number = 0;allowReaders: semaphore = 1;allowWriters: semaphore = 1;Reader {allowReaders.P();readerMutex.P();readers++;if(readers == 1) allowWriters.P(); // notify writers that readers are therereaderMutex.V();allowReaders.V();<READ/>readerMutex.P();readers--;if(readers == 0) allowWriters.V(); //signal writers the are no more readersreaderMutex.V();}Writer {writerMutex.P();writers++;if (writers == 1) allowReaders.P(); // notify readers to waitwriterMutex.V();allowWriters.P();<WRITE/>allowWriters.V();writerMutex.P();writers--;if(writers == 0) allowReaders.V();writerMutex.V();}
Little Book of Semaphores 4.4
What is the Dining Philosophers Problem?
A philosopher needs two forks to eat. Looking at the diagram, we can tell some of them will have to wait for an adjacent philosopher to finish eating.
Each philosopher, representing a thread, runs the following loop:
while (true) {think();getForks();eat();putForks();}
Each philosopher knows their number ranging from 0 to n. Each fork is also numbered from 0 to n. A philosopher i grabs the left fork i and the right fork i+1
The problem lies in the function getForks() and putForks(). We need to implement them such that:
Solutions
const getForks = (i: number) => {forks[getLeft(i)].P();forks[getRight(i)].P();}const putForks = (i: number) => {forks[getLeft(i)].V();forks[getRight(i)].V();}
Does this solution work?
No. Imagine each philosopher picks up the fork to the right. There are no more forks available, and each philosphre only has one fork. No one can eat, thus leading to a deadlock.
Can it be improved?
We can ensure no deadlock as long as we leave one fork available. Say there are 5 forks, 5 philosophers, but we only allow 4 to eat. In the scenario where everyone reaches for a fork on the right, there will be one spare fork since one philosopher is not eating. We can implement this with a semaphore initialized to n−1, n being the number of philosophers.
maxPhilosophers: semaphore = n - 1;const getForks = (i: number) => {maxPhilosophers.P(); // only the first n - 1 philosophers can proceedforks[getLeft(i)].P();forks[getRight(i)].P();}const putForks = (i: number) => {forks[getLeft(i)].V();forks[getRight(i)].V();maxPhilosophers.V(); // notify the waiting philopher they can eat}
What is a monitor?
Collection of variables and procedures. Kinda like a class.
interface Condition {wait, // Process performing will be suspended and placed in the local queuesignal, // One of the queued processes is poppedqueue // local queue of pending processes}Monitor Test{variables: any;myConditions: Condition;procedure pn {...}}
Properties of monitors
Implementation
Monitor ProducerConsumer {items: number = 0; // # of items inside bufferfull: condition;empty: condition;Procedure Produce {if (items == N) full.wait(); // can't produce since buffer is full<PRODUCE/> // should have released the waititems++;empty.signal(); // notify buffer is NOT empty & consumer can proceed}Procedure Consume {if (items == 0) empty.wait(); // can't consume since buffer is empty<CONSUME/>items--;full.signal(); // notify producer buffer is no longer full}}
Readers Priority
Monitor ReaderWriter {readers: number = 0;busy: boolean = false;reader: condition;writer: condition;Procedure beginRead {if (busy) reader.wait(); // wait until writer is donereaders++;reader.signal(); // signal readers}Procedure endRead {reader--;if(readers == 0) writers.signal(); // notify writers there are no more readers}Procedure beginWrite {if(readers > 0 || busy) writer.wait(); // either reading or writing in processbusy = true; // preparationwriter.signal(); // notify writer it can proceed}Procedure endWrite {busy = false; // writer has finishedif(reader.queue) reader.signal(); // notify reader it can proceedelse writer.signal(); // if no readers, go to writer}}
Geeks for Geeks:
Semaphores vs Monitors
Sources
Distributed Systems 3rd edition (2017) Chapter 4
Why do Distributed Systems use message passing?
Distributed Systems send and receive low level messages due to the absence of shared memory.
In a very high level, how does it work?
Say process P wants to communicate with with process Q.
What is the OSI Model?
The Open Systems Interconnection Model identifies the various levels of communication, their purpose, as well as naming them. It is designed to allow open systems to communicate by using communication protocols.
Why use layers in the OSI Model? How many are there?
Each layer offers one or more specific communication services to the layer above it. In this way, the problem of getting a message from A to B can be divided into manageable pieces, each of which can be solved independently of the others.
7 Layers:
Common Protocols
What is middleware
Application that lives in the OSI application layer, but contains many general-purpose protocols that warrant their own layers.
What effect did it have on the OSI model?
It replaces the session and presentation layer
What is a socket?
One endpoint of a two way communication link between two programs running on the network.
What are the socket operations for TCP?
Describe the communication pattern
Server side executes the first 4 operations in order:
socket→bind→listen→accept
Now we look at the client side.
5 . connect requires caller to specify where a connection request is to be sent. Client is blocked until connection is established.
8 . close is a symmetric operation.
Distributed Systems 3rd edition (2017) 4.2
What is the basic idea of Remote Procedure Call?
Allow programs to call procedures located on other machines.
When a process on machine A calls a procedure on machine B:
Information can be transported from the caller to the callee in the parameters and can come back in the procedure result
What can cause issues?
Is RPC transparent?
Yes, the goal if that the calling procedure should not be aware that the called procedure is executing on a different machine or vice versa. In simple words, we want to make it look like local procedure.
Single machine procedure call
newlist = append(data, dbList);
Assumptions:
Observations:
stack before the call to append
stack while the called procedure is active
What is a stub?
A stub is a piece of code that translates parameters sent between the client and server during a remote procedure call.
Why do stubs convert messages?
The client and server use different address spaces, so parameters used in function calls have to be converted, otherwise the values of those parameters could not be used, because pointers to parameters in one computer's memory would point to different data on the other computer. The client and server may also use different data representations, even for simple parameters (e.g., big-endian versus little-endian for integers).
Client stub vs server stub
Steps of a RPC
Stub Generation
Conside the following function:
void foobar(char x, float y, int z[5]){...}
Assuming a word is four bytes:
Using these rules, the following message if formed:
Server stub know the format that is being used.
Is agreeing on the format enough?
No. Client and server need to agree on representation of data structures.
When to use Asynchronous RPC?
To support situations in which there no result to return to the client
How does it work?
a. shows traditional RPC (notice how the procedure finishes on the server and then sends results to client)
b. Async RPC (acknowledgment is sent, then starts working on procedure)
Deferrred synchronous RPC?
Occurs when reply will be returned but client is not prepared to wait for it and do nothing in the meantime.
Sources
Distributed Systems 3rd edition (2017) 6.1
What is the idea?
Client's contact a server to retrieve the time.
Issues with this?
Message delays will have outdated the reported time.
Example
If clocks are to be modified, would it be gradually or abruptly?
Graudally. Changing the time in one go can lead to serious problems such as: an object file compiled just after the clock change having a tim
How does it work?
Server constantly asks nodes what time it is. It then calculates the average time and notifies node whether they should speed up or slow down their local clocks.
When to use this algorithm?
Typically to be used in an internal synchronization algorithm.
Distributed Systems 3rd edition (2017) 6.2
Is it necessary to have an acurate account of the real time?
No, we just need nodes to agree on a time.
What two things did Lamport point out?
What does a || b mean?
Also displayed as a→b It is read as event a happens before b.
It means all events agree that a happens firtst, then afterwards b.
Is Happens-before a transitive property?
Yes. if a→b and b→c
⇛a→c
When are events said to be concurrent?
When they happen in separate processes that do not exchange messages.
a→b is not true, nor is b→a
Pretty much nothing can be said about when the events happened first.
For every event a, we assign a value C(a) on which all processes agree.
What property must these values have?
How to make corrections to the clock time?
The clock time, C, must always increase. Corrections to time can be made by adding a positive value, not substracting one.
Lamport's Logical Clocks Implementation
Each process, Pi, mantains a local counter, Ci
How are the clocks updated?
If a||b then C(a) < C(b)?
True. But the converse is false. C(a)<C(b)⇛a→b
Why?
Consider a DB has been replicated across several sites in order to improve query performance. A query is forwarded to the nearest copy. Each update operation must be carried out at each replica in the same order.
What is total-ordered multicasting?
Multicast operation by which all messages are delivered in the same order to each receiver
What can Lamport clocks say about the relationship of events a and be if C(a) < C(b)?
Nothing. It does not imply a happened before b.
Why? Hint: use three processes
We know that for each mi: Ts(mi)<Tr(mi)
Consider messages mi and mj, what can we say from Tr(mi)<Ts(mj)?
Let mi=m1 and mj=m3
Looking at the figure below, we notice that m3 was sent after receiving m1. This means that m3 may have depended on both m1 and m2. We cannot be sure since Lamport clocks do not capture causality.
How to capture causality?
Vector clocks
How to track causality?
Assign each event a unique name and an incrementing counter: pk is the kth event that happened at process P Then keep track of the causal histories
Example: Two local events happen at process P, then causal history H(p2)={p1,p2}
Assume process P sends a message to Q. At the time of arrival, the causal history of Q was q1. To track causality, P sends its own causal history p1,p2. Upon arrival, Q records the event as q2 and updates its causal history to {p1,p2,p3,q1,q2}
Why is the P's causality history before Q's?
To determine if an event p ocurred before q , we check that p∈H(q) Assumming q is the last local event.
What is the problem with causal histories?
How to construct vector clocks?
We now construct the vector clock by: letting each process Pi mantain a vector VCi with the following properties:
Vector Clocks implementation rules
Example
Sources
What categories are mutual exclusion algorithms classified in?
Token Based Solutions: passing a special message between processes. Whoever holds the token gets to access the shared resource
Permission Based Approach: process needs permission to access a resource
Straightforward way to achieve mutual exclusin in DS?
Pros and Cons?
Pros:
Cons:
Geeks for Geeks
What type of algorithm is Lamport's?
Permission based algorithm
How are critical sections executed?
They are executed in the increasing order of timestamps.
What are the types of messages in the algorithm?
How does each site keep track of CS requests?
Each site Si has a local queue denoted by requestqueuei This queue orders requests requests by their timestamps.
Summarize Lamport's Algorithm
Let Si be the site t
Critical section request:
Critical section access:
Critical section release:
Example
Image taken from Distributed Mutual Exclusion Using Logical Clocks
Proof by contradiction
Assumptions:
With the previous assumptions it means Si and Sj are on top of their respective queues Using FIFO, assumption 2, and the fact that Pj is executing, then the request from Pi is in Pj's queue
This is a contradiction since, Pi should be at the top of the queue since it has a smaller timestamp
So there cannot be two processes executing at the same time.
Geeks for Geeks
What type of algorithm is Ricart Agrawala's?
Permission based algorithm
How are critical sections executed?
They are executed in the increasing order of timestamps.
What are the types of messages in the algorithm?
How does each site keep track of CS requests?
Each site Si has a local queue denoted by requestqueuei This queue orders requests requests by their timestamps.
Summarize Ricart Agrawala's Algorithm
Let Si be the site t
Critical section request:
Si sends message with timestamp to other sites
Sites Sj reply immediately to Si with approval if Sj:
The site Sj sends a delayed approval to Si if it had a request for the critical section with a smaller timestamp than the incoming request; in this case the response is delayed until the critical section is done
Critical section access:
Critical section release:
Example
Image taken from Distributed Mutual Exclusion Using Logical Clocks
Does this algorithm require FIFO?
No. "That’s because each process is personally responsible for granting permission. Even with out of order messages due to transmission delays, the REPLY message will never come thanks to the logical clocks." Distributed Mutual Exclusion Using Logical Clocks
How does it differ from the previous algorithms?
Requests are not made to every other site, only a subset of them.
What are the types of messages in the algorithm?
Summarize the algorithm?
Summary of critical section request:
Si sends request to subset of sites Sites Sj sends a reply to Si if it has not already sent one since the last release If it has sent a reply, it puts the request inside the queue.
**Summary of critical section access: **
Si enters critical section when after receiving replies from all sites in the subset
Can deadlocks occur?
Yes.
What is the bully algorithm?
Solution for the electing a coordinator.
Explained
When the coordinator stops responding, a process Pk initiats election:
"Biggest guy in town wins"
Explained
Sources
What is a database?
Collection of shared data that can be accessed by users.
What are transactions?
Sequence of multiple operations performed on a database, and all served as a single logical unit of work
Actions
Assumptions?
They preserve consistency and take finite amount of time
ACID?
What is the transaction model?
What are nested transactions?
When transactions are constructed as a number of subtransactions
What are distributed transactions?
When clients wrap a number of requests, possibly for different servers. The idea is that either all the requests get executed, or none do.
What is concurrency control?
Protocols to mantain consistency
What three modules(managers) does a DB system consists?
Which one has to enforce concurrency control?
Scheduler
Can the previous models be expanded to a Distributed System?
Yes
When do transactions conflict with each other?
If they access the same resource, and at least one of them is a writing operation.
What is serial execution?
Transaction b executes after transaction a has completed
What is concurrent execution?
Transaction b begins execution before a has completed
What is a log?
Order of actions performed by transactions
When are logs equivalent?
Serial log vs Serializable Log?
What is a Serialization Graph?
Directed graph to test if a log is serializable.
How to draw graph?
What do they do?
Control the interleaving of conflicting transactions in order to mantain DB consistency
How do lock-based algorithms work?
Types of Lock
Static Lock
Dyncamic Lock
What does it mean for a transaction to be well formed?
Two-phase locking
How to know if all legal logs in a set of transactions are serializable?
Issues with two-phase locking
What is the idea?
Use a transaction's timestamp to decide the order of execution
How to timestamp transactions?
Lamport's logical clock
Basic Timestamp Ordering
Notes: W_TS(x): largest timestamp of any transaction that executed write(x) R_TS(x): largest timestamp of any transaction that executed read(x)
Sources