May 2017 Shonan Meeting

### A Promising Semantics for Relaxed-Memory Concurrency

#### Seoul National University (Korea)



Jeehoon Kang Chung-Kil Hur

Ori Lahav Viktor Vafeiadis Derek Dreyer





#### A Message Passing Example: No Data Race Initially: D = F = 0while (1) { D = 42LOCK(L) f = FLOCK(L) UNLOCK(L) F = 1UNLOCK(L) if (f) break } d = D

### Sequentially Consistent Concurrency

Initially: D = F = 0

| D = 42 | while (1) {       |
|--------|-------------------|
| F = 1  | f = F             |
|        | if (f) break<br>} |
|        | d = D             |







#### Release & Acquire with Tweak

Initially: D = F = 0[acq] f = FD = 42if (f) { d = D // d = 42? [rel] F = 1} D = 10

### **Concurrency Models**

- Semantics of multi-threaded programs?
  - Sequential consistency (SC): simple but expensive
- Relaxed memory model (C/C++, Java)
  - Many consistency modes (cost vs. consistency tradeoff)
  - Open problem: what is the "right" semantics?

**Conflicting goals** of compilers, hardware & programmers

- Compiler/hardware: validating optimizations (e.g. reordering, merging)
- Programmer: supporting reasoning principles (e.g. DRF theorem, program logic)

**Conflicting goals** of compilers, hardware & programmers

- Java memory model
  Compiler/hardware: validating optimizations (e.g. reordering, merging)
- Programmer: supporting reasoning principles (e.g. DRF theorem, program logic)

**Conflicting goals** of compilers, hardware & programmers

- Java memory model
  Compiler/hardware: validating optimizations (e.g. reordering, merging)
- Programmer: supporting reasoning principles (e.g. DRF theorem, program logic)
   C/C++ memory model

**Conflicting goals** of compilers, hardware & programmers

- Compiler/hardware: validating optimizations (e.g. reordering, merging)
- Programmer: supporting reasoning principles (e.g. DRF theorem, program logic)

C/C++ memory model

### Key problem: "out-of-thin-air"

# Thread 1Thread 2a = Xb = YY = aX = 42

(allowed: a=b=42)

#### "Out-of-thin-air" problem (1/3) Load-Buffering (LB) Registers Thead 1 Thread 2 b = Ya = XY = aX = 42(allowed: a=b=42)





# Thread 1Thread 2a = Xb = YY = aX = 42

(allowed: a=b=42)









"Out-of-thin-air" problem (2/3) Classical Out-of-thin-air (OOTA)

# Thread 1Thread 2a = Xb = YY = aX = b

(forbidden: a=b=42)

"Out-of-thin-air" problem (2/3) Classical Out-of-thin-air (OOTA)

# Thread 1Thread 2a = Xb = YY = aX = b

(forbidden: a=b=42)

42 is out-of-thin-air!

Reasoning principles (e.g. invariant a=b=X=Y=0) Classical Out-of-thin-air (OOTA) Thread 1 Thread 2 (X=Y=0) (sequence

"Out-of-thin-air" problem (2/3)

a = X b = YY = a X = b

(forbidden: a=b=42)

42 is out-of-thin-air!

Reasoning principles (e.g. invariant a=b=X=Y=0) (X=Y=0) (sequencedbefore) Read X,42(read-from)Read Y,42 Write Y,42 Write X,42 Allowed by justification (C/C++) "Out-of-thin-air" problem (2/3) Classical Out-of-thin-air (OOTA)

Thread 1 Thread 2 a = X b = Y Y = a X = b(forbidden: a=b=42)

42 is out-of-thin-air!

Reasoning principles (e.g. invariant a=b=X=Y=0)

(X=Y=0) (sequenced-before) before) Read X,42<sub>(read-from)</sub>Read Y,42 Write Y,42 Write X,42 Allowed by justification C/C++) What does hardware do?

Thread 1Thread 2a = Xb = YY = aX = 42

(a=b=42?)

Thread 1Thread 2a = Xb = YY = aX = b(a=b=42?)

Thread 1Thread 2a = Xb = YY = aX = 42

(a=b=42?)

Thread 1 Thread 2 a = X Y = a(a=b=42?)













# Promising Semantics

- Solving the out-of-thin-air problem
- Supporting optimizations & reasoning principles
- Covering most C/C++ concurrency features
- Operational semantics w/o undefined behavior

Most results are verified in Coq

http://sf.snu.ac.kr/promise-concurrency

## Key Idea 1: Messages & Views

- Memory: pool of **messages** (loc, val, timestamp)
- Per-thread **view** on the memory



## Key Idea 1: Messages & Views

- Memory: pool of messages (loc, val, timestamp)
- Per-thread view on the memory



## Key Idea 1: Messages & Views

- Memory: pool of messages (loc, val, timestamp)
- Per-thread **view** on the memory


# Key Idea 1: Messages & Views

- Memory: pool of messages (loc, val, timestamp)
- Per-thread **view** on the memory



| Threa | d   | 1    |
|-------|-----|------|
| Ý     | ′ = | = 42 |
| З     | l = | = X  |



| Thread 1 |
|----------|
| Y = 42   |
| a = X    |
| Loc. 🕇   |





Timestamp









| Thre | eac | <b>d</b> ' |   |
|------|-----|------------|---|
|      | а   | =          | Х |
|      | Y   | =          | а |

Timestamp



# Key Idea 2: Promises

 A thread can promise to write X=V in the future, after which other threads can read X=V.

 To avoid OOTA, the promising thread must certify that it can write X=V in isolation.

 Until all its promises are fulfilled, the thread can take certifiable steps only.

| Thre | a | d 1 |   |
|------|---|-----|---|
|      | а | =   | Х |
|      | Y | =   | а |



Timestamp

| Thre | eac | d ' |   |
|------|-----|-----|---|
|      | а   | =   | Х |
|      | Y   | =   | а |



Thread 2's 42 promise

| <b>Thread 1</b> | Thread 2                    |
|-----------------|-----------------------------|
| a = X           | b = Y                       |
| Y = a           | X = 42 (allowed: $a=b=42$ ) |
| Loc.            | Thread 2's                  |
| X               | 42 promise                  |
| Y               | in isolation                |
|                 | Timestamp                   |









| <b>Thread 1</b><br>a = X<br>Y = a | Thread 2<br>b = Y<br>X = 42 (allowed: $a=b=42$ ) |
|-----------------------------------|--------------------------------------------------|
| Loc.XV                            | Thread 2's<br>Promise is<br>certified!           |
|                                   | Timestamp                                        |



| <b>Thread 1</b><br>a = X<br>Y = a | Thread 2<br>b = Y<br>X = 42 (allowed: $a=b=42$ )                  |
|-----------------------------------|-------------------------------------------------------------------|
| Loc.<br>X<br>Y                    | Thread 2's<br>tage promise<br>tage promise is<br>certified!<br>42 |
|                                   | Timestamp                                                         |







**Thread 1** a = X Y = a

(forbidden: a=b=42)

Timestamp



**Thread 1** a = X Y = a

Loc.

Х

(forbidden: a=b=42)

Thread 2's 42 promise?



#### Promises: "Semantic Solution" to OOTA



21/28













#### 23/28










#### Release & Acquire with Tweak

Initially: D = F = 0[acq] f = FD = 42if (f) { d = D // d = 42? [rel] F = 1} D = 10

• A thread must **re-certify during the execution** that it can write all its promises **in isolation**.



(forbidden: X=1)

• A thread must **re-certify during the execution** that it can write all its promises **in isolation**.



(forbidden: X=1)

• A thread must **re-certify during the execution** that it can write all its promises **in isolation**.



25/28

• A thread must **re-certify during the execution** that it can write all its promises **in isolation**.



(forbidden: X=1)





















| Thread 1 | Thread 2 |
|----------|----------|
| X = 1    | X = 2    |
| a = X    | b = X    |

| Thre | ead 1 |
|------|-------|
|      | X = 1 |
|      | a = X |

| Thread 1 | Thread 2 |
|----------|----------|
| X = 1    | X = 2    |
| a = X    | b = )    |

$$X = 2$$
  
b = X

| Thread 1 |  |
|----------|--|
| X = 1    |  |
| 🛶 a = X  |  |



| Thread | d 1 |
|--------|-----|
| Х      | = 1 |
| 👝 a    | = X |



| Thre                                   | ac | 1 |   |
|----------------------------------------|----|---|---|
| ************************************** | Х  | = | 1 |
|                                        | Х  | = | 2 |

| Thre | ead 1 |
|------|-------|
|      | X = 1 |
|      | X = 2 |

| Thre | ac | 1 |   |
|------|----|---|---|
|      | Х  | = | 1 |
|      | Х  | = | 2 |

| Thre | ead 1 |
|------|-------|
|      | X = 1 |
|      | X = 2 |



| Thre | ad 1  |
|------|-------|
|      | X = 1 |
|      | X = 2 |



## Results (1/2) Compiler/HW Optimizations

 Operational semantics for C/C++ concurrency: plain/relaxed/release/acquire r/w/u/fence, SC fence

- Compiler optimizations
  (reordering, merge, dead load elim., ...)
- Compilation to x86 🦆 & Power 📝

## Results (2/2) Reasoning Principles

- **DRF-SC: D**ata **R**ace **F**reedom  $\Rightarrow$  SC
  - DRF-PromiseFree:
    DRF ⇒ semantics w/o promises

 Invariant-based logic: soundness of global invariant (e.g. a=b=X=Y=0)

http://sf.snu.ac.kr/promise-concurrency

#### More comprehensive semantics for C/C++ concurrency

- **DRF-SC: D**ata **R**ace **F**reedom  $\Rightarrow$  SC
  - DRF-PromiseFree:
    DRF ⇒ semantics w/o promises

 Invariant-based logic: soundness of global invariant (e.g. a=b=X=Y=0)

http://sf.snu.ac.kr/promise-concurrency

### Future Work

- Supporting SC reads & writes
  (We found a flaw in C/C++11 on SC)
- Supporting **consume** reads
- Developing a rich program logic & Verifying fine-grained concurrent programs