DICE
Horizon 2020 Research & Innovation Action
Grant Agreement no. 644869
http://www.dice-h2020.eu
Funded by the Horizon 2020 Framework Programme of the European Union
Configuration Optimization Tool
Pooyan Jamshidi
Imperial College London
DICE Framework
2©DICE
[Figure: DICE framework overview. Labels: DICE IDE; Profile; Plugins; Sim, Ver, Opt; DPIM, DTSM, DDSM; TOSCA; Methodology; Deploy, Config, Test; Mon; Anomaly; Trace; Iter. Enh.; Cont. Int.; Fault Inj.; Data Intensive Application (DIA); Big Data Technologies; Cloud (Priv/Pub). Work packages: WP1, WP2, WP3, WP4, WP5, WP6 - Demonstrators.]
Configuration Optimization Tool
The problem:
o Big Data technologies have hundreds of tuneable parameters
o SMEs face a knowledge gap in configuring these technologies
What does the tool do?
o Automatically runs experiments on DIAs
o Returns recommended configuration parameters for Big Data technologies
Innovation:
o Automates DIA configuration across release cycles
o Prior art focuses on manual configuration
Impact & stakeholders:
o Reduces the cost and time of testing between releases
o Supports operators of DIAs
CO Tool Architecture
[Figure: CO tool architecture. Labels: Configuration Optimisation Tool; GP model; Data Preparation; Doc; performance repository; historical data; previous versions (V1, V2, ..., Vn); Experimental Suite; Testbed; Deployment Service; Monitoring; Data Broker; Kafka; Workload Generator; Technology Interface (Storm, Cassandra, Spark); System Under Test; Tester; experiment time; polling interval; configuration parameters; configuration parameters values.]
Two implementations of CO: BO4CO, TL4CO
Tool Input (Parameters and Options)
1- Information about the experiment: budget, config file, duration of each experiment
2- Information about the configuration parameters and their options, as determined by testers
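The two kinds of input can be pictured as a small data structure. The sketch below is only an illustration: the class names, field names, the file name "experiment.conf" and the budget and duration values are hypothetical and do not describe the tool's actual config file format. The parameter names and option values are borrowed from the performance repository sample shown in this deck.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Parameter:
    """One tunable parameter and the options a tester allows for it."""
    name: str
    options: tuple


@dataclass(frozen=True)
class ExperimentInput:
    """Experiment-level settings (item 1) plus the parameter list (item 2)."""
    budget: int        # maximum number of experiments to run (assumed meaning)
    config_file: str   # hypothetical path, not the tool's real file name
    duration_s: int    # duration of each experiment, in seconds (assumed unit)
    parameters: tuple  # tuple of Parameter

    def space_size(self) -> int:
        """Number of distinct configurations spanned by the options."""
        return prod(len(p.options) for p in self.parameters)


tool_input = ExperimentInput(
    budget=20,
    config_file="experiment.conf",
    duration_s=300,
    parameters=(
        Parameter("emit_freq", (1, 10, 60)),
        Parameter("chunk_size", (100_000, 1_000_000, 2_000_000, 10_000_000)),
        Parameter("message_size", (1_000, 10_000, 100_000)),
    ),
)
```

Even with three parameters the space is larger than the assumed budget, which is why the next configuration to try has to be chosen carefully rather than enumerated.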
Optimization Component (Matlab)
- This component selects the next configuration to experiment with, considering the current measurements.
- This continues until the optimum configuration is located or the experimental budget is exhausted.
The optimization overhead is negligible compared with the time spent on measurements.
This component relies only on the royalty-free MCR component.
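The select-measure-repeat loop can be sketched as follows. This is not the tool's Matlab code and it does not use a GP model: as a stand-in it uses a distance-weighted average of the measured latencies as the prediction, the distance to the nearest measured point as the uncertainty, and a lower-confidence-bound rule to pick the next configuration. All function names, the `kappa` weight and the synthetic `fake_latency` function are assumptions made for illustration. Only the budget stopping rule is implemented.

```python
import itertools
import math


def normalise(config, space):
    """Map each option to its index position in [0, 1] so distances compare."""
    return tuple(
        opts.index(v) / max(len(opts) - 1, 1) for v, opts in zip(config, space)
    )


def predict(candidate, measured, space):
    """Return (distance-weighted mean latency, distance to nearest measurement)."""
    x = normalise(candidate, space)
    dists = [math.dist(x, normalise(c, space)) for c in measured]
    weights = [1.0 / (d + 1e-9) for d in dists]
    mean = sum(w * y for w, y in zip(weights, measured.values())) / sum(weights)
    return mean, min(dists)


def optimise(space, measure, budget, kappa=1.0):
    """Run at most `budget` experiments (at least one); return best (config, latency)."""
    candidates = list(itertools.product(*space))
    measured = {candidates[0]: measure(candidates[0])}  # seed experiment
    while len(measured) < min(budget, len(candidates)):
        untried = [c for c in candidates if c not in measured]

        def lcb(c):
            # Lower confidence bound: favour low predicted latency or high uncertainty.
            mean, uncertainty = predict(c, measured, space)
            return mean - kappa * uncertainty

        nxt = min(untried, key=lcb)
        measured[nxt] = measure(nxt)  # the expensive step in practice
    return min(measured.items(), key=lambda kv: kv[1])


def fake_latency(config):
    """Hypothetical stand-in for a real experiment on the system under test."""
    emit_freq, message_size = config
    return message_size / 1000 + 5 / emit_freq


space = [(1, 10, 60), (1000, 10000, 100000)]
best_config, best_latency = optimise(space, fake_latency, budget=5)
```

The selection step only does arithmetic over the measurements taken so far, while each call to `measure` stands for a full experiment, which is the sense in which the optimization overhead is small next to the measurements.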
Experimental Suite
This component runs the experiments and measures the performance of the system under test. The data are flushed to a CSV file and passed to the optimization component.
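Flushing one measurement to a CSV file might look like the sketch below. The function name and the comma-separated layout are assumptions, not the Experimental Suite's actual code; the column names are the ones shown on the Performance Repository slide.

```python
import csv
import os

FIELDS = ["spouts", "max_spout", "sorters", "emit_freq",
          "chunk_size", "message_size", "throughput", "latency"]


def flush_measurement(path, config, throughput, latency):
    """Append one experiment result to the CSV file, writing the header once."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({**config, "throughput": throughput, "latency": latency})
        fh.flush()  # push the row out so a reader sees it promptly
```

Appending one row per experiment keeps earlier measurements in the file, so a reader of the file always sees the full history collected so far.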
Performance	Repository
spouts max_spout sorters emit_freq chunk_size message_size throughput latency
1 10 1 1 1.00E+05 1000 22657 3.9833
1 10 1 1 1.00E+05 10000 3596.3 18.415
1 10 1 1 1.00E+05 1.00E+05 112.56 217.63
1 10 1 1 1.00E+06 1000 12273 5.1952
1 10 1 1 1.00E+06 10000 1174.9 24.247
1 10 1 1 1.00E+06 1.00E+05 111.88 205.49
1 10 1 1 2.00E+06 1000 12024 5.2935
1 10 1 1 2.00E+06 10000 1151.3 25.039
1 10 1 1 2.00E+06 1.00E+05 94.294 220.62
1 10 1 1 1.00E+07 1000 11552 6.2867
1 10 1 1 1.00E+07 10000 1228.1 24.975
1 10 1 1 1.00E+07 1.00E+05 102.29 236.19
1 10 1 10 1.00E+05 1000 25978 3.4782
1 10 1 10 1.00E+05 10000 10112 9.2847
1 10 1 10 1.00E+05 1.00E+05 1023.8 83.236
1 10 1 10 1.00E+06 1000 24147 3.6594
1 10 1 10 1.00E+06 10000 8400.2 11.804
1 10 1 10 1.00E+06 1.00E+05 1197.4 73.786
1 10 1 10 2.00E+06 1000 22858 3.7151
1 10 1 10 2.00E+06 10000 7141.3 10.755
1 10 1 10 2.00E+06 1.00E+05 1095.1 78.624
1 10 1 10 1.00E+07 1000 22693 4.3637
1 10 1 10 1.00E+07 10000 6281.5 14.308
1 10 1 10 1.00E+07 1.00E+05 951.27 71.492
1 10 1 60 1.00E+05 1000 25862 3.8521
1 10 1 60 1.00E+05 10000 10859 8.6452
1 10 1 60 1.00E+05 1.00E+05 1128.8 79.862
1 10 1 60 1.00E+06 1000 23553 3.9048
1 10 1 60 1.00E+06 10000 9734.3 9.345
1 10 1 60 1.00E+06 1.00E+05 982 66.852
1 10 1 60 2.00E+06 1000 25408 3.5738
1 10 1 60 2.00E+06 10000 7993.9 9.2784
Columns spouts to message_size: configuration. Columns throughput and latency: measured metrics.
The performance repository mediates between the optimization component and the experimental suite.
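Reading the repository back on the optimization side can be sketched as follows. The four data rows are copied from the table above; the function names are hypothetical, and treating the file as whitespace-separated text mirrors the slide rather than the tool's real storage format.

```python
SAMPLE = """\
spouts max_spout sorters emit_freq chunk_size message_size throughput latency
1 10 1 1 1.00E+05 1000 22657 3.9833
1 10 1 1 1.00E+05 10000 3596.3 18.415
1 10 1 10 1.00E+05 1000 25978 3.4782
1 10 1 60 1.00E+05 1.00E+05 1128.8 79.862
"""


def load_repository(text):
    """Parse whitespace-separated rows into dicts of floats keyed by column name."""
    header, *rows = [line.split() for line in text.splitlines() if line.strip()]
    return [dict(zip(header, map(float, row))) for row in rows]


def best_by_latency(records):
    """Return the record with the lowest measured latency."""
    return min(records, key=lambda r: r["latency"])


records = load_repository(SAMPLE)
best = best_by_latency(records)
```

In this sample the lowest-latency row is also the highest-throughput one, but that need not hold in general, so the metric to optimize should be chosen explicitly.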
Configuration	Parameters	(Output)
Technology	Support
Technologies:	Storm,	Spark,	Cassandra

Configuration Optimization Tool