USENIX NSDI 2016
Session: Resource Sharing
2016-05-29 @oraccha
Co-located Events
• ACM Symposium on SDN Research 2016 (SOSR), March 13-17
• 2016 Open Networking Summit (ONS), March 14-17
• The 12th ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS '16), March 17-19
• The 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI '16)
• The USENIX Workshop on Cool Topics in Sustainable Data Centers (CoolDC '16), March 19
Session: Resource Sharing
• "Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics," Shivaram Venkataraman, Zongheng Yang, Michael Franklin, Benjamin Recht, and Ion Stoica, University of California, Berkeley
• "Cliffhanger: Scaling Performance Cliffs in Web Memory Caches," Asaf Cidon and Assaf Eisenman, Stanford University; Mohammad Alizadeh, MIT CSAIL; Sachin Katti, Stanford University
• "FairRide: Near-Optimal, Fair Cache Sharing," Qifan Pu and Haoyuan Li, University of California, Berkeley; Matei Zaharia, Massachusetts Institute of Technology; Ali Ghodsi and Ion Stoica, University of California, Berkeley
• "HUG: Multi-Resource Fairness for Correlated and Elastic Demands," Mosharaf Chowdhury, University of Michigan; Zhenhua Liu, Stony Brook University; Ali Ghodsi and Ion Stoica, University of California, Berkeley, and Databricks Inc.
Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics
• Who?: A graduate student at UC Berkeley's AMPLab, the group behind Spark and Mesos. His focus is systems and algorithms for large-scale data analytics, with publications at SoCC '12, EuroSys '13, OSDI '14, and SIGMOD '16.
• What?: Proposes a framework for efficiently predicting the performance of data-analytics workloads in the cloud, such as machine learning and genome analysis.
[Slide: "Do choices matter?" — run times of Matrix Multiply (400K by 1K) and QR Factorization (1M by 1K) across instance choices (1 r3.8xlarge, 2 r3.4xlarge, 4 r3.2xlarge, 8 r3.xlarge, 16 r3.large); one workload is network bound, the other memory-bandwidth bound.]
[Slide: "Do choices matter? Matrix Multiply" — run time of a 400K by 1K matrix multiply across instance choices (1 r3.8xlarge, 2 r3.4xlarge, 4 r3.2xlarge, 8 r3.xlarge), each totaling 16 cores, 244 GB of memory, and $2.66/hr.]
[Slide: "Keystone-ML TIMIT Pipeline" — raw data flows through a Cosine Transform, Normalization, and a Linear Solver over ~100 iterations; the job is iterative (each iteration spawns many jobs), long running and therefore expensive, and numerically intensive.]
[Slide: "Do choices matter?" — actual vs. ideal scaling of QR Factorization (1M by 1K) on r3.4xlarge instances from 0 to 600 cores; computation + communication leads to non-linear scaling.]
Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics
• How?: Predicts performance from the results of small-scale training jobs, and uses optimal experiment design to cut down the number of training jobs needed (see the sketches after the slide notes below).
[Slide: "Optimal Design of Experiments" — candidate training runs combine input fractions (1%, 2%, 4%, 8%) with machine counts (1, 2, 4, 8); the design problem is solved with an off-the-shelf solver (CVX).]
[Slide: "Using Ernest" — given a job binary, a target machine count, and an input size, Ernest runs experiment-designed training jobs (using only a few iterations for training) and fits a linear model that predicts time as a function of machines.]
[Slide: "Ernest basic model" — time = x1 + x2 * (input / machines) + x3 * log(machines) + x4 * machines, where the four terms model serial execution, (linear) computation, a tree DAG, and an all-to-one DAG; collect training data, then fit a linear regression.]
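As a concrete illustration of the "fit a linear regression" step, here is a minimal sketch. It is not the authors' code: the training runs below are made-up numbers, and the use of scipy's non-negative least squares is an assumption for illustration.

    # Illustration only: fit the Ernest-style basic model from a few small runs.
    import numpy as np
    from scipy.optimize import nnls

    def features(input_frac, machines):
        # [serial, computation, tree DAG, all-to-one DAG] terms of the basic model
        return [1.0, input_frac / machines, np.log(machines), machines]

    # (input fraction, machines, measured time) from small training jobs
    runs = [(0.01, 1, 12.0), (0.02, 2, 9.5), (0.04, 4, 8.1), (0.08, 8, 7.9)]

    A = np.array([features(f, m) for f, m, _ in runs])
    y = np.array([t for _, _, t in runs])
    x, _ = nnls(A, y)  # coefficients x1..x4, constrained to be non-negative

    def predict(input_frac, machines):
        return float(np.dot(features(input_frac, machines), x))

    print(predict(1.0, 64))  # predicted time for the full input on 64 machines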
Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics
• Results:
[Slide: "Training time: Keystone-ML" — the TIMIT pipeline on r3.xlarge instances, 100 iterations; training uses 7 data points, up to 16 machines, and up to 10% of the data, and its time is compared against the running time on 42 machines.]
[Slide: "Is experiment design useful?" — prediction error (%) for Regression, Classification, KMeans, PCA, and TIMIT, comparing experiment design against a cost-based baseline.]
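To make the "optimal design of experiments" step concrete, here is a rough sketch of how such a design could be set up with an off-the-shelf convex solver. It is an illustration only: a D-optimal objective solved with cvxpy stands in for the CVX-based formulation mentioned on the slide, and the cost proxy and candidate set are assumptions, not Ernest's actual formulation.

    # Illustration only: pick informative small training runs within a cost budget.
    import numpy as np
    import cvxpy as cp

    # Candidate training configurations: (input fraction, number of machines).
    candidates = [(f, m) for f in (0.01, 0.02, 0.04, 0.08) for m in (1, 2, 4, 8)]
    X = np.array([[1.0, f / m, np.log(m), m] for f, m in candidates])  # model features
    cost = np.array([f * m for f, m in candidates])                    # rough cost proxy

    lam = cp.Variable(len(candidates), nonneg=True)   # weight on each candidate run
    M = sum(lam[i] * np.outer(X[i], X[i]) for i in range(len(candidates)))

    # Maximize the information in the chosen runs (D-optimal) under a cost budget.
    prob = cp.Problem(cp.Maximize(cp.log_det(M)), [cost @ lam <= 1.0])
    prob.solve()

    chosen = [c for c, w in zip(candidates, lam.value) if w > 1e-3]
    print(chosen)   # the small runs that would actually be executed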
Cliffhanger: Scaling Performance Cliffs in Web Memory Caches
• Who?: A Stanford CS alumnus, now CEO and co-founder of the cloud-security company Sookasa. He works on cloud storage, with publications at SIGCOMM '12 and USENIX ATC '13 and '15.
• What?: Improves Memcached's dynamic cache allocation (slab allocator) to cope with performance cliffs.
[Slide: "Hit-rate Curve" — hit rate vs. number of items in the LRU queue (Application 19, Slab 0) together with its concave hull; the region where the curve plateaus and then jumps is a performance cliff (Talus [HPCA15]). A +1% cache hit rate translates into roughly a +35% speedup, and the hit rate of Facebook's Memcached pool is already 98.2% [SIGMETRICS12].]
Cliffhanger: Scaling Performance Cliffs in Web Memory Caches
• How?: shadow queues
– Hill climbing algorithm: moves memory from the queue (slab) whose hit-rate curve has the smaller gradient to the queue with the larger gradient (a rough sketch follows the slide notes below).
– Cliff scaling algorithm: finds where a performance cliff (the concave region) begins and ends.
[Slide: "Using Shadow Queues to Estimate Local Gradient" — each queue has a physical queue plus a shadow queue of recently evicted keys; shadow-queue hits earn credits (e.g., Queue 1: +2, Queue 2: -2), and the credits drive queue resizing.]
[Slide: "Cliffhanger Runs Both Algorithms in Parallel" — the original queue is partitioned, with partitions tracking the left and right of a pointer as well as the hill-climbing state. Algorithm 1 incrementally optimizes memory across queues (across slab classes and across applications); Algorithm 2 scales performance cliffs.]
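A minimal sketch of the shadow-queue hill-climbing idea (illustration only; the data structures, credit accounting, and resize step below are assumptions, not Cliffhanger's Memcached implementation):

    # Illustration only: count hits in a "shadow" region beyond each queue's
    # physical capacity, then shift memory toward the queue with the steeper
    # hit-rate gradient.
    from collections import OrderedDict

    class SlabQueue:
        def __init__(self, capacity, shadow_capacity):
            self.items = OrderedDict()    # physical LRU queue (keys only)
            self.shadow = OrderedDict()   # shadow queue of recently evicted keys
            self.capacity = capacity
            self.shadow_capacity = shadow_capacity
            self.credits = 0

        def access(self, key):
            if key in self.items:         # physical hit
                self.items.move_to_end(key)
                return True
            if key in self.shadow:        # shadow hit: more memory would have helped
                self.credits += 1
                del self.shadow[key]
            self.items[key] = True        # admit the key
            while len(self.items) > self.capacity:
                evicted, _ = self.items.popitem(last=False)
                self.shadow[evicted] = True
                if len(self.shadow) > self.shadow_capacity:
                    self.shadow.popitem(last=False)
            return False

    def hill_climb(q1, q2, step=1):
        # Move memory from the flatter queue to the steeper one, then reset credits.
        if q1.credits > q2.credits and q2.capacity > step:
            q1.capacity, q2.capacity = q1.capacity + step, q2.capacity - step
        elif q2.credits > q1.credits and q1.capacity > step:
            q2.capacity, q1.capacity = q2.capacity + step, q1.capacity - step
        q1.credits = q2.credits = 0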
Cliffhanger: Scaling Performance Cliffs in Web Memory Caches
• A technique that seems broadly applicable. Unlike FairRide in the next talk, it does not address fairness.
[Slide: "Cliffhanger Reduces Misses and Can Save Memory" — average misses reduced by 36.7%; average potential memory savings of 45%.]
[Slide: "Cliffhanger Outperforms Default and Optimized Schemes" — average Cliffhanger hit-rate increase of 1.2%.]
FairRide: Near-Optimal, Fair Cache Sharing
• Who?: A graduate student at UC Berkeley's AMPLab, with publications at MobiCom '13 and SIGCOMM '15.
• What?: Proposes a file-cache sharing policy that satisfies isolation guarantee and strategy proofness while coming near-optimally close to Pareto efficiency.
[Slide: cache sharing today — caches are either statically allocated per user or globally shared in front of the backend (storage/network); what we want is isolation, strategy-proofness, higher utilization, and shared data.]
[Slide: properties table — existing policies (max-min fairness, priority allocation / max-min rate, static allocation) each fail at least one of Isolation Guarantee, Strategy Proofness, and Pareto Efficiency; FairRide satisfies the first two and is near-optimal on Pareto Efficiency.]
SIP theorem: when sharing files, the three properties above (isolation guarantee, strategy proofness, Pareto efficiency) cannot all be satisfied at the same time.
FairRide: Near-Optimal, Fair Cache Sharing
• How?
– Adds probabilistic blocking to the max-min policy, creating a disincentive to cheat (a sketch of the blocking rule follows the slide notes below).
– Implemented on top of Alluxio (Tachyon) [SoCC14].
[Slide: Figure 3 from the paper — an example with 2 users, 3 files, and a total cache size of 2, where numbers are access frequencies: (a) max-min fairness, (b) the second user cheats with extra accesses, (c) FairRide blocks the free-riding access.]
Probabilistic blocking
• FairRide blocks a user with probability p(nj) = 1/(nj + 1)
– nj is the number of other users caching file j
– e.g., p(1) = 50%, p(4) = 20%
• This is the best you can do in the general case
– Less blocking does not prevent cheating
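A minimal sketch of the blocking rule (illustration only; the read path and the cached_by bookkeeping below are assumptions, not the Alluxio-based implementation):

    # Illustration only: FairRide-style probabilistic blocking on the read path.
    import random

    def blocking_probability(n_other):
        # p(n_j) = 1 / (n_j + 1), where n_j is the number of *other* users
        # caching file j; e.g., p(1) = 50%, p(4) = 20%.
        return 1.0 / (n_other + 1)

    def read(user, file_id, cached_by, read_from_cache, fetch_from_backend):
        owners = cached_by.get(file_id, set())   # users whose allocation holds the file
        if not owners:
            return fetch_from_backend(file_id)
        if user not in owners and random.random() < blocking_probability(len(owners)):
            # Treat the access as a miss: the free rider pays the backend cost,
            # so not caching a shared file no longer pays off.
            return fetch_from_backend(file_id)
        return read_from_cache(file_id)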
FairRide: Near-Optimal, Fair Cache Sharing
[Slide: "Cheating under FairRide" — miss ratio over time for two users as user 2 and then user 1 start cheating; FairRide dis-incentivizes users from cheating.]
[Slide: Facebook experiments — average response time (ms); FairRide outperforms max-min fairness by 29%. A second chart shows the reduction in median job time (%) per bin of job size (#tasks) for max-min vs. FairRide.]
HUG: Multi-Resource Fairness for Correlated and Elastic Demands
• Who?: An assistant professor at the University of Michigan and a UC Berkeley AMPLab alumnus. His focus is networking (coflow-based networking, multi-resource allocation in datacenters, compute and storage for big data, network virtualization), and he publishes at SIGCOMM almost every year. This work builds on DRF [NSDI11] and FairCloud [SIGCOMM12].
• What?: An optimization problem for allocating network bandwidth among tenants.
[Slide: network model — machines M1..MN attach to a congestion-less core through links L1..L2N and host Tenant-A's and Tenant-B's VMs; how should the links be shared between multiple tenants so as to (1) provide optimal performance guarantees and (2) maximize utilization?]
HUG: Multi-Resource Fairness for Correlated and Elastic Demands
• Highest Utilization with the Optimal Isolation Guarantee
[Slide: isolation guarantee vs. utilization — a chart places Per-Flow Fairness, PS-P, DRF, and HUG along two axes, isolation guarantee (low to optimal) and utilization (low to work-conserving). In the cooperative setting HUG provides (1) the optimal isolation guarantee and (2) work conservation; in the non-cooperative setting it provides (1) the optimal isolation guarantee, (2) the highest utilization, and (3) strategy-proofness.]
Excerpt from the paper: "Intuitively, we want to maximize the minimum progress over all tenants, i.e., maximize min_k M_k, where min_k M_k corresponds to the isolation guarantee of an allocation algorithm. We make three observations. First, when there is a single link in the system, this model trivially reduces to max-min fairness. Second, getting more aggregate bandwidth is not always better. For tenant-A in the example, ⟨50Mbps, 100Mbps⟩ is better than ⟨90Mbps, 90Mbps⟩ or ⟨25Mbps, 200Mbps⟩, even though the latter ones have more bandwidth in total. Third, simply applying max-min fairness to individual links is not enough. In our example, max-min fairness allocates equal resources to both tenants on both links, resulting in allocations ⟨1/2, 1/2⟩ on both links (Figure 1b). The corresponding progress (M_A = M_B = 1/2) results in a suboptimal isolation guarantee (min{M_A, M_B} = 1/2). Dominant Resource Fairness (DRF) [33] extends max-min fairness to multiple resources [...]"
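HUG's first stage maximizes this minimum progress. A minimal sketch of that computation for correlated demands (illustration only; the function below and the demand vectors, which are chosen to match the flavor of the example above, are assumptions, and the work-conserving second stage is omitted):

    # Illustration only: the largest progress M that every tenant can be
    # guaranteed simultaneously, given correlated (normalized) demand vectors.
    def optimal_isolation_guarantee(demands, capacities):
        # demands[k][i]: tenant k's normalized demand on link i.
        # Giving every tenant progress M needs M * sum_k demands[k][i] on link i,
        # so the best guarantee is min_i capacities[i] / sum_k demands[k][i].
        best = float("inf")
        for i, cap in enumerate(capacities):
            total = sum(d[i] for d in demands)
            if total > 0:
                best = min(best, cap / total)
        return best

    # Two tenants on two unit-capacity links: A demands <1, 0.5>, B demands <0.5, 1>.
    print(optimal_isolation_guarantee([[1.0, 0.5], [0.5, 1.0]], [1.0, 1.0]))  # 2/3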
[Slide: cloud network sharing taxonomy — dynamic sharing vs. reservation (SecondNet, Oktopus, Pulsar, Silo; uses admission control). Dynamic sharing splits into flow-level (per-flow fairness; no isolation guarantee), VM-level (Seawall, GateKeeper; no isolation guarantee), and tenant-/network-level. At the tenant/network level, non-cooperative environments require strategy-proofness and are covered by HUG (Highest Utilization for the Optimal Isolation Guarantee); cooperative environments do not require strategy-proofness and contain DRF (optimal isolation guarantee but low utilization), PS-P/EyeQ/NetShare (work-conserving but suboptimal isolation guarantee), and HUG (work-conserving with the optimal isolation guarantee).]
HUG: Multi-Resource Fairness for Correlated and Elastic Demands
• Experiments on 100 EC2 instances.
• Three tenants:
– Tenants A and C: pairwise one-to-one communication
– Tenant B: all-to-all communication
[Figure 10 from the paper — bandwidth consumption (total allocation in Gbps) over time for three tenants arriving over time in a 100-machine EC2 cluster, under (a) per-flow fairness (TCP) and (b) HUG. Each tenant has 100 VMs but uses a different communication pattern (§5.1.1); with TCP, tenant-B dominates the network by creating more flows, while HUG isolates tenants A and C from tenant B.]
Impressions
• The common theme of this session is resource management inside the datacenter.
• None of the papers hinges on a radically new idea, but most are model examples of research that carefully formalizes a problem and then builds a practical system on top of that formulation. As expected of NSDI.
• Being able to hear every talk in a single track is nice, but 20 minutes per talk is short (some parts are hard to follow from the slides alone).
• UC Berkeley's AMPLab is strong.
• I would love to have the Facebook trace data.
All figures used in this material are taken from the proceedings and slides on the NSDI 2016 website.