On codes, machines, and environments: reflections and experiences
Vincenzo De Florio
University of Antwerp
vincenzo.deflorio@gmail.com

Agenda
• Short introduction
• Intro of main characters
• Quality and drift
• Drift containment strategies
• Off-line / on-line
• Adaptation services
• Context awareness / reactive behaviors
• Reactive behaviors:
• Elasticity / resilience
• Conclusions
(Account of a number of experiences)

Career
• MOSAIC / Universiteit Antwerpen
• adaptive and dependable software
• resilience and antifragility
• cyber-physical societies
• ACCA / ESAT / K.U.Leuven
• parallel and distributed systems
• advanced computer architectures
• linguistic support to fault-tolerance
• SASIAM / Tecnopolis (I)
• parallel and distributed systems
• complex systems modeling
• image processing operators.
https://goo.gl/wRlzkZ
http://goo.gl/PNCVJy

Code
• Code explicitly refers to a reference machine
• A physical or virtual machine
• In fact, a family of "interpreters"
• Code also refers, implicitly, to a set of conditions: what we expect from the machine and what we expect the environment will do
• The system model and the fault model.

Codes, Machines, Environments
• First, code is deployed on a machine: C → M
• Secondly, the machine is deployed into an environment: (C, M) → E
• (C, M, E) produces a set of behaviors: the "service"
• We observe those behaviors and give a measure of the service quality
• Qualitatively or quantitatively.

Quality of the service
• What do we measure?
• We tell whether the service, e.g., is
• trustworthy; reliable; available; safe; secure; efficient; etc.
• Important issue: all dynamic properties!
• Dynamic systems! trustworthiness(t), safety(t), efficiency(t), ...
• A drift is possible
• Service mutates its characteristics.

Quality in terms of M properties
• We can express QoS in terms of M properties
• For instance: "the service shall express an algorithmic parallelism (AP) that is very close to the physical parallelism (PP) expressed by M."
• Efficiency(t) = inv. distance (AP, PP)
• Drift(t) = how efficiency(t) varies with t
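A minimal sketch of the two definitions above, in C. The slide only says "inv. distance", so the concrete formula 1 / (1 + |AP - PP|) is an assumption, and so is the choice of counting AP and PP as numbers of parallel units (say, threads and cores). The function names are illustrative.

```c
#include <assert.h>

/* Efficiency as an inverse distance between algorithmic parallelism (AP)
   and physical parallelism (PP). Assumption: both are plain counts, and
   the inverse distance is taken as 1 / (1 + |AP - PP|), which yields 1.0
   for a perfect match and decays towards 0 as the mismatch grows. */
double efficiency(int ap, int pp)
{
    assert(ap >= 0 && pp >= 0);
    int d = ap > pp ? ap - pp : pp - ap;   /* distance between AP and PP */
    return 1.0 / (1.0 + (double)d);
}

/* Drift between two observations of efficiency(t):
   negative when the service got less efficient. */
double drift(double eff_before, double eff_after)
{
    return eff_after - eff_before;
}
```

Moving the same code from a 4-way machine it was tuned for to one where AP and PP differ by 4 gives a negative drift, which is the situation the next slides discuss.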

Quality in terms of E properties
• We can express QoS in terms of E properties too
• For instance: "the service must tolerate up to 2 physical or design faults"
• Resilience(t) = a majority of redundant modules can be found at t
• Drift(t) = how majority varies with t

Quality drift
• What if we observe a significant drift?
• Example 1:
• (C → M1) ⊢ p
• C manifests property p on machine M1
• (C → M2) ⊢ ~p
• On M2, C does not!
• Example 2:
• (C → M1) ∧ M1(s1) ⊢ p
• When M1 is in state s1, then p
• (C → M1) ∧ M1(s2) ⊢ ~p

Drift strategies
• Drift: due to failures; attacks; software aging...
• What can we do?
1) Focus on M and, e.g., bring M1(s2) back to M1(s1) or to a new M1(s3)
• Backward/forward (BW/FW) error recovery
2) Focus on E: impose restrictions on E's behaviors
• Regulations (e.g., safety regs)
3) Or focus on C: "correct" / transform my code

Experience #1
• Over the years, a software house develops a large amount of code
• for a proprietary target machine
• using a proprietary programming language
• and a proprietary OS
• to be executed on proprietary terminals...
• Times changed. Machine/OS/... no longer supported. What to do?

Experience #1 (continued)
• A translator and a set of run-time libraries
• Program transformation: f: (proprietary code) → (standard C)
• Net result?

Experience #1 (continued)
• Lots of problems!
• Phase 1: "Code: perfectly running"
• Phase 2: "...yes but's" (many of them!)
• Hidden relationships, undocumented features, idiosyncrasies: I want 'em all.
→ Porting C does not port the service!
• A large number of M- and E-specific behaviors had to be emulated
• Role: responsible for the design of several parts of the translators and for several run-time functions (overall system was conceived / designed by someone else).

Experience #2
• f: (C + message passing) → (C + live data structures)
• In the Domain: scheduler distributes work units to workers and then collects intermediate results
• In the Range:
1. Tuple space of work units
2. Cloud of workers that autonomously feed themselves according to their own speed, and publish their results.

Experience #2 (continued)
• Simple production system to match tuple patterns with tuple elements
• Emerging results: autonomic load balancing; graceful degradation; crash-failure tolerance
• In practice, efficiency and reliability
• Role: I conceived/designed the system; system developed by two M.Sc. students that I promoted and supervised.
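The pull-based scheme of Experience #2 can be illustrated with a small single-threaded simulation. This is only a sketch under simplifying assumptions: the tuple space is reduced to a counter of pending work units, rounds stand in for time, and the names `worker_t`, `run_bag` and `crash` are invented here; none of this is the original system.

```c
#include <assert.h>
#include <stddef.h>

/* A worker that feeds itself from the shared bag of work units. */
typedef struct {
    int speed;   /* units it can take per round; 0 models a crashed worker */
    int done;    /* results published so far */
} worker_t;

/* Crash failure: the worker simply stops taking units. */
void crash(worker_t *w)
{
    w->speed = 0;
}

/* Lets the workers drain a bag of `units` work units. No scheduler assigns
   work: each worker pulls according to its own speed.
   Returns the number of rounds needed, or -1 if work remains but no
   worker is alive to take it. */
int run_bag(int units, worker_t *w, size_t n)
{
    assert(units >= 0 && w != NULL);
    int rounds = 0;
    while (units > 0) {
        int taken = 0;
        for (size_t i = 0; i < n && units > 0; i++) {
            assert(w[i].speed >= 0);
            int k = w[i].speed < units ? w[i].speed : units;
            units -= k;          /* withdraw k units from the bag */
            w[i].done += k;      /* publish k results */
            taken += k;
        }
        if (taken == 0)
            return -1;           /* every worker has crashed */
        rounds++;
    }
    return rounds;
}
```

Faster workers end up with more results without anyone telling them to (load balancing), and after `crash()` the remaining workers still empty the bag, only more slowly (graceful degradation).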

Experience #3
• Instead of translating C, add a C'
• A software architecture supporting two cooperating application layers
– A service language to express functional concerns
– A recovery language to express dependability strategies
• Design time: separation of concerns
• Run time: separable codes
• Actions: similar to production rules: nested IF/THEN/ELSE's.
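To make the "guard, then actions" idea concrete, here is a hedged C sketch of a recovery layer kept apart from the service code. It is not the recovery language of the thesis: the working-memory fields, the two rules and the function names are all hypothetical, and real guards would query facts stored by error detection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical recovery working memory: facts stored by error detection. */
typedef struct {
    bool faulty;     /* the watched task has been flagged as faulty */
    int  restarts;   /* how many times it has been restarted so far */
    bool isolated;   /* the task has been taken out of service */
} rwm_t;

typedef bool (*guard_fn)(const rwm_t *);
typedef void (*action_fn)(rwm_t *);

/* One recovery rule: IF guard THEN action. */
typedef struct {
    guard_fn  guard;
    action_fn action;
} rule_t;

static bool can_restart(const rwm_t *m)  { return m->faulty && m->restarts < 2; }
static bool must_isolate(const rwm_t *m) { return m->faulty && m->restarts >= 2; }
static void restart(rwm_t *m) { m->restarts++; m->faulty = false; }
static void isolate(rwm_t *m) { m->isolated = true; m->faulty = false; }

/* The recovery code C': separate from the service code C, replaceable as a unit. */
const rule_t recovery_rules[] = {
    { can_restart,  restart },
    { must_isolate, isolate },
};

/* Recovery executive: evaluates every guard in order and fires the action
   of each guard that holds. Returns the number of actions fired. */
size_t run_recovery(const rule_t *rules, size_t n, rwm_t *m)
{
    assert(rules != NULL && m != NULL);
    size_t fired = 0;
    for (size_t i = 0; i < n; i++) {
        if (rules[i].guard(m)) {
            rules[i].action(m);
            fired++;
        }
    }
    return fired;
}
```

Because the rules live in their own table, a different strategy can be swapped in without touching the functional code, which is the separation of concerns the slide refers to.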

Recovery languages
[Figure: interaction among the application, error detection, the recovery working memory, and the recovery executive. Labels: Store; Recovery starts; Query; Skip / fire actions; Result; Recovery ends; OK.]

Optimizations
[Figure: User application (C); Recovery code (C'); Recovery code 2; Broker; Recovery working memory.]
• Currently, all guards are re-evaluated
• Full re-evaluations could be avoided (maybe through Rete? RWM deltas...)
• Separable code = meta-adaptation

Experience #3 (continued)
• Role: system conceived, designed, implemented.
• More information: "A Fault-Tolerance Linguistic Structure for Distributed Applications", Ph.D. thesis, Oct. 2000, http://win.uantwerpen.be/~vincenz/theses/
• "Transformer: an adaptation framework with contextual adaptation behavior composition support," Gui, N. and De Florio, V. Software: Practice & Experience, Vol. 43, Issue 8, 2013.

Strategies
• Quality drifting can be managed in several ways
• In what follows, three such ways
A. Do nothing!
• Service has complete faith in M and E
• Formally: synchronous system model and empty fault model.

Pious software ;-)
• "M is immutable. Computation is dependable. Communication is dependable..."
• Facilitates development, though...
• ...anything breaks the code (Maximum fragility.)
• Ataraxic code (ἀταραξία "impassiveness")
Sitting ducks

Strategies B and C
B. Off-line adaptation (examples: Experiences #1 and #2)
C. On-line adaptation (Exp. #3)
• Two requirements:
1) C must be context-aware
2) C must be able to autonomously react and adapt after changes in both M and E
• Corresponds to the two blocks of production systems: sensory preconditions (LHSs) and actions (RHSs)
• In what follows, focus on two services: context awareness; reactivity.

S1: Context Awareness
• Goal: reify in the application layer changes pertaining to the context and in particular to M and E.
• How? Different ways
• In what follows I briefly describe one answer
• I chose it because it is related to one of my past experiences and to programming languages

Experience #4: reflective variables
• Main idea: memory accesses as a metaphor for detecting changes (and reacting to changes)
• Reflective variables (RR vars) = volatile variables associated with M or E probes (e.g., sensors, RFIDs, OS services...) that continuously update the variables
• Akin to signals in Elm: "values that change over time"

Example
RRvars also support callbacks. Example:
int PrintCpu();
rrparse("(cpu>0);", PrintCpu);
[Figure: M and E plotted over time t.]

Tracking CPU and mplayer
• int mplayer returns the following values:
void SystemIsSlow(void) {
    mplayer = HARDFRAMEDROP;
}
...
rrparse("(cpu>98)&&(mplayer==2);", SystemIsSlow);
By coupling an M fact with an E fact, I can deduce conditions
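The callback mechanism can be approximated in a few lines of plain C. The sketch below does not reproduce rrparse: instead of parsing a guard string such as "(cpu>98)&&(mplayer==2);" it takes the guard as a C predicate, and the probe is simulated by calling `rr_update` by hand. All names here (`rr_watch`, `rr_update`, `context_t`, `slow_events`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WATCHES 8

/* The reflected context: one M fact (CPU load) and one E fact (player state). */
typedef struct {
    int cpu;       /* CPU usage, in percent */
    int mplayer;   /* state code reported by the instrumented player */
} context_t;

typedef bool (*guard_fn)(const context_t *);
typedef void (*callback_fn)(void);

typedef struct {
    guard_fn    guard;
    callback_fn cb;
} watch_t;

static watch_t   watches[MAX_WATCHES];
static size_t    n_watches;
static context_t ctx;

/* Registers a guard and the callback to run whenever the guard holds.
   Returns false when the table is full. */
bool rr_watch(guard_fn guard, callback_fn cb)
{
    assert(guard != NULL && cb != NULL);
    if (n_watches == MAX_WATCHES)
        return false;
    watches[n_watches].guard = guard;
    watches[n_watches].cb = cb;
    n_watches++;
    return true;
}

/* Called by a (simulated) probe each time it refreshes the variables:
   stores the new values, then re-evaluates every registered guard. */
void rr_update(int cpu, int mplayer)
{
    ctx.cpu = cpu;
    ctx.mplayer = mplayer;
    for (size_t i = 0; i < n_watches; i++)
        if (watches[i].guard(&ctx))
            watches[i].cb();
}

/* Example guard and callback, mirroring the condition on the slide. */
static int slow_events;
static bool cpu_and_player_slow(const context_t *c)
{
    return c->cpu > 98 && c->mplayer == 2;
}
static void system_is_slow(void)
{
    slow_events++;
}
```

Registering `cpu_and_player_slow` with `system_is_slow` and then feeding updates fires the callback only when both facts hold at once.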

Tracking users' behaviors too!
int ui is now == X
int ui is now == Y
HCI interaction actions are logged... transcoded... analyzed... and reified...

Janus system
[Figure: RR client, mplayer, ui]

Currently, simple analyses
• Typing frequency as simple user stereotype
• Too high a frequency → discomfort
• (cf. Therac-25 accidents...)

Another example
• int linkbeacons[«MAC address»]:
– Number of beacons received by MANET peer during observation period
• int linkrates[«MAC address»]:
– Estimated bandwidth

Experience #4 (continued)
• RRvars: conceived / designed / implemented by me
• including instrumenting mplayer
• including simple Tcl/Tk user interface
• More information:
• "A framework for trustworthiness assessment based on fidelity in cyber and physical domains," https://arxiv.org/abs/1502.01899
• "Safety enhancement through situation-aware user interfaces," https://arxiv.org/abs/1504.03731

S2: Reactive Behaviors
• How to react to context changes? In different ways.
• Two major methods: mask changes / tolerate changes:
A. elasticity
B. resilience
• Elasticity requires an estimation of a worst-case scenario.

S2A: Elastic strategy
• The worst-case scenario is used to define a point of yielding
• Some algorithm is then used to implement the point of yielding
• Cf. information theory; Shannon
• Typical algorithm: modular redundancy + voting

Example
• Worst-case scenario = "At most one disturbance per processing stage"
• Yielding point: single disturbance.
• Algorithm:
• triplicate objects
• write: multiplex to each replica
• read: demultiplex via majority voting
• "Redundant data structures"
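The three steps of the algorithm map directly onto a small C type. This is a sketch of the general triplication-and-voting technique, not of the aRDS code itself; the names `tmr_int`, `tmr_write` and `tmr_read` are invented here, and repairing the dissenting replica on read is an addition that the slide does not mention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A 4-byte variable stored as three replicas. */
typedef struct {
    int32_t replica[3];
} tmr_int;

/* write: multiplex the value to each replica. */
void tmr_write(tmr_int *v, int32_t x)
{
    assert(v != NULL);
    for (size_t i = 0; i < 3; i++)
        v->replica[i] = x;
}

/* read: demultiplex via majority voting.
   Returns true and stores the voted value in *out when at least two
   replicas agree. Returns false when all three differ, i.e. when the
   "at most one disturbance" hypothesis has been violated.
   Addition not on the slide: a successful read also rewrites the
   replicas, so a single disturbance does not accumulate. */
bool tmr_read(tmr_int *v, int32_t *out)
{
    assert(v != NULL && out != NULL);
    int32_t a = v->replica[0], b = v->replica[1], c = v->replica[2];
    int32_t winner;
    if (a == b || a == c)
        winner = a;
    else if (b == c)
        winner = b;
    else
        return false;            /* no majority: read failure */
    tmr_write(v, winner);        /* repair the dissenting replica, if any */
    *out = winner;
    return true;
}
```

One corrupted replica is masked on the next read; two replicas corrupted to different values between reads produce a read failure, which is the undershooting case discussed next.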

Elasticity: intrinsic limitations
• Two "syndromes":
• Undershooting (US): Worst-case hypothesis is wrong.
• Overshooting (OS): Worst-case hypothesis is correct, though it wastes too many resources
• I will illustrate US and OS through an example

aRDS: redundant data structures
• Three threads: scrambler + aRDS + reader
1. scrambler: fault injection interpreter
2. aRDS: "protects" 20,000 4-byte variables
– Fixed allocation stride = 20
3. reader: round-robin read accesses
• Experiments record
– number of scrambled cells
– number of read failures

Scrambler's "little language"
Case #1: undershooting
Case #2: overshooting

S2B: Resilient strategy
• Point of yielding = dynamic system
• Employed redundancy = f (estimated risk of yielding)
• DTOF: distance to failure. DTOF = indirect deduction of risk
[Figure: voting outcomes labeled OS = 6, OS = 4, OS = 2, OS = 0, and US!]
DTOF = OS / (n-1)
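A possible reading of the figure labels, sketched in C. The slide gives only DTOF = OS / (n-1) and the sequence OS = 6, 4, 2, 0, US; the formula OS = n - (2m + 1), with m the number of replicas that disagree with the majority, is an assumption that reproduces that sequence for n = 7. The adaptation policy and its thresholds are purely hypothetical.

```c
#include <assert.h>

/* Overshooting: replicas beyond the minimum (2m + 1) that would still have
   outvoted m dissenting replicas. Assumed formula, see the lead-in.
   Returns -1 on undershooting, i.e. when no majority exists. */
int overshoot(int n, int dissent)
{
    assert(n >= 1 && n % 2 == 1);
    assert(dissent >= 0 && dissent <= n);
    int os = n - (2 * dissent + 1);
    return os >= 0 ? os : -1;
}

/* Distance to failure, DTOF = OS / (n-1), normalized to [0, 1].
   1.0 means unanimity; 0.0 means the vote was won by the smallest
   possible margin, or lost. */
double dtof(int n, int dissent)
{
    int os = overshoot(n, dissent);
    if (os <= 0)
        return 0.0;
    return (double)os / (double)(n - 1);
}

/* Hypothetical policy for "redundancy = f(estimated risk)":
   add two replicas when DTOF is low, drop two after a unanimous vote,
   and stay within [n_min, n_max]. */
int next_redundancy(int n, int dissent, int n_min, int n_max)
{
    double d = dtof(n, dissent);
    if (d < 0.5 && n + 2 <= n_max)
        return n + 2;            /* risk is high: raise redundancy */
    if (d == 1.0 && n - 2 >= n_min)
        return n - 2;            /* no disturbance observed: save resources */
    return n;
}
```

A real policy would probably smooth DTOF over several voting rounds before shedding replicas; reacting to a single unanimous vote is only the simplest choice for illustration.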

Case #3: DTOF, n(0)=5
Redundancy evolution
[Figure: redundancy plotted over time t.]
Hypothesis about E: a dynamic system

Conclusions
• Quality of "service": f (C, M, E)
• A complex problem of intertwined behaviors!
• Application layer(s)
• Metaprograms/protocols
• Compilers
• OS
• HW
• Stigmergy complicates solutions
Environment

Conclusions
• How to deal with this complex problem?
• My hypothesis: Game Theory
• M entities and E: GT players
• Energy budgets shared by M entities
• GT payoffs associated with behaviors
• Nested compositional hierarchies of payoff matrices
• Interconnected and mutually influencing payoff "spreadsheets" (cf. reactive prog.)
• Future research action: "Resilience as concurrent interplays of opponents", https://goo.gl/Mz8foA + antifragility

Further detail
• System/fault models: "Application-layer fault-tolerance protocols", https://bit.ly/1WNJj6V
• Drift: "Antifragility = Elasticity + Resilience + Machine Learning: Models and Algorithms for Open System Fidelity", http://goo.gl/rdwMQH; "A Framework for Trustworthiness Assessment based on Fidelity in Cyber and Physical Domains", http://goo.gl/fsYxqT
• Resilience: "On Resilient Behaviors in Computational Systems and Environments", http://goo.gl/3eU12a; "On environments as systemic exoskeletons: Crosscutting optimizers and antifragility enablers", http://goo.gl/82RsKw

Thanks for your attention!
Questions?

On codes, machines, and environments: reflections and experiences

  • 1.
    On codes, machines,andOn codes, machines, and environments: reflections andenvironments: reflections and experiencesexperiences Vincenzo De FlorioVincenzo De Florio University of AntwerpUniversity of Antwerp [email protected]@gmail.com
  • 2.
    22 AgendaAgenda •• Short introductionShortintroduction •• Intro of main charactersIntro of main characters •• Quality and driftQuality and drift •• Drift containement strategiesDrift containement strategies •• Off-line / on-lineOff-line / on-line •• Adaptation servicesAdaptation services •• Context awareness /Context awareness / reactive behaviorsreactive behaviors •• Reactive behaviors:Reactive behaviors: •• Elasticity / resilienceElasticity / resilience •• ConclusionsConclusions Accountofanumberof experiences
  • 3.
    33 CareerCareer •• MOSAIC /Universiteit AntwerpenMOSAIC / Universiteit Antwerpen •• adaptive and dependable softwareadaptive and dependable software •• resilience and antifragilityresilience and antifragility •• cyber-physical societiescyber-physical societies •• ACCA / ESAT / K.U.LeuvenACCA / ESAT / K.U.Leuven •• parallel and distributed systemsparallel and distributed systems •• advanced computer architecturesadvanced computer architectures •• linguistic support to fault-tolerancelinguistic support to fault-tolerance •• SASIAM / Tecnopolis (I)SASIAM / Tecnopolis (I) •• parallel and distributed systemsparallel and distributed systems •• complex systems modelingcomplex systems modeling •• image processing operators.image processing operators. https://goo.gl/wRlzkZ
  • 4.
  • 5.
    55 CodeCode •• Code explicitlyrefers to a referenceCode explicitly refers to a reference machinemachine •• A physical or virtual machineA physical or virtual machine •• In fact, aIn fact, a familyfamily of "interpreters"of "interpreters" •• Code also refers, implicitly, to a setCode also refers, implicitly, to a set of conditions: what we expect fromof conditions: what we expect from the machine and what we expect thethe machine and what we expect the environment will doenvironment will do •• The system model and the fault model.The system model and the fault model.
  • 6.
    66 Codes, Machines, EnvironmentsCodes,Machines, Environments •• First, code is deployed on a machine:First, code is deployed on a machine: CC →→ MM •• Secondly, machine is deployed intoSecondly, machine is deployed into an environment: (an environment: (CC,, MM)) →→ EE •• ((CC,, MM,, EE) produces a set of) produces a set of behaviors: the "service"behaviors: the "service" •• We observe those behaviors and giveWe observe those behaviors and give a measure of the service qualitya measure of the service quality •• Qualitatively or quantitatively.Qualitatively or quantitatively.
  • 7.
    77 Quality of theserviceQuality of the service •• What do we measure?What do we measure? •• We tell whether the service, e.g., isWe tell whether the service, e.g., is •• trustworthy; reliable; available; safe;trustworthy; reliable; available; safe; secure; efficient; etcsecure; efficient; etc •• Important issue: all dynamicImportant issue: all dynamic properties!properties! •• Dynamic systems! trustworthiness(Dynamic systems! trustworthiness(tt),), safety(safety(tt), efficiency(), efficiency(tt), ...), ... •• AA driftdrift is possibleis possible •• Service mutates its characteristics.Service mutates its characteristics.
  • 8.
    88 Quality in termsofQuality in terms of MM propertiesproperties •• We can express QoS in terms ofWe can express QoS in terms of MM propertiesproperties •• For instance: "the service shallFor instance: "the service shall express an algorithmic parallelismexpress an algorithmic parallelism (AP) that is very close to the physical(AP) that is very close to the physical parallelism (PP) expressed byparallelism (PP) expressed by MM."." •• Efficiency(Efficiency(tt) = inv.distance (AP, PP)) = inv.distance (AP, PP) •• Drift(Drift(tt) = how efficiency() = how efficiency(tt) varies) varies withwith tt
  • 9.
    99 Quality in termsofQuality in terms of EE propertiesproperties •• We can express QoS in terms ofWe can express QoS in terms of EE properties tooproperties too •• For instance: "the service mustFor instance: "the service must tolerate up to 2 physical or designtolerate up to 2 physical or design faults"faults" •• Resilience(Resilience(tt) = a majority of) = a majority of redundant modules can be found atredundant modules can be found at tt •• Drift(Drift(tt) = how majority varies with) = how majority varies with tt
  • 10.
    1010 Quality driftQuality drift ••What if we observe a significant drift?What if we observe a significant drift? •• Example 1:Example 1: •• (C(C →→ MM11)) ⱵⱵ pp •• C manifests property p on machine MC manifests property p on machine M11 •• (C(C →→ MM22)) ⱵⱵ ~~pp •• On MOn M22, C, C does not!does not! •• Example 2:Example 2: •• (C(C →→ MM11)) ΛΛ MM11(s(s11)) ⱵⱵ pp •• When MWhen M11 is in state sis in state s11, then p, then p •• (C(C →→ MM11)) ΛΛ MM11(s(s22)) ⱵⱵ ~~pp
  • 11.
    1111 Drift strategiesDrift strategies ••Drift: due to failures; attacks;Drift: due to failures; attacks; software aging...software aging... •• What can we do?What can we do? 1)1)Focus onFocus on MM and, e.g., bring Mand, e.g., bring M11(s(s22)) back to Mback to M11(s(s11) or to a new M) or to a new M11(s(s33)) •• BW/FW error recoveryBW/FW error recovery 2)2)Focus onFocus on EE: impose: impose restrictions onrestrictions on EE's behaviors's behaviors •• Regulations (e.g., safety regs)Regulations (e.g., safety regs) 3)3)Or focus onOr focus on CC: "correct" /: "correct" / transform my codetransform my code
  • 12.
    1212 Experience #1Experience #1 ••A software house develops throughA software house develops through the years a large amount of codethe years a large amount of code •• for a proprietary target machinefor a proprietary target machine •• using a proprietary programmingusing a proprietary programming languagelanguage •• and a proprietary OSand a proprietary OS •• to be executed on proprietary terminals...to be executed on proprietary terminals... •• Times changed. Machine/OS/... noTimes changed. Machine/OS/... no more supported.more supported. What to doWhat to do??
  • 13.
    1313 Experience #1 (continued)Experience#1 (continued) •• A translator and a set of run-timeA translator and a set of run-time librarieslibraries •• Program transformation:Program transformation: f: (proprietary code)f: (proprietary code) →→ (standard C)(standard C) •• Net result?Net result?
  • 14.
    1414 Experience #1 (continued)Experience#1 (continued) •• Lots of problems!Lots of problems! •• Phase 1: "Code: perfectly running"Phase 1: "Code: perfectly running" •• Phase 2: "...yes but's" (many of them!)Phase 2: "...yes but's" (many of them!) •• Hidden relationships, undocumentedHidden relationships, undocumented features, idiosyncrasies:features, idiosyncrasies: I wantI want ‘‘em allem all.. →→ PortingPorting CC does not port the service!does not port the service! •• A large number ofA large number of MM- and- and EE-specific-specific behaviors had to be emulatedbehaviors had to be emulated •• RoleRole: responsible for the design of several: responsible for the design of several parts of the translators and for severalparts of the translators and for several run-time functions (overall system wasrun-time functions (overall system was conceived / designed by someone else.)conceived / designed by someone else.)
  • 15.
    1515 Experience #2Experience #2 ••f: (C + message passing)f: (C + message passing) →→ (C + live data structures)(C + live data structures) •• In the DomainIn the Domain: scheduler distributes: scheduler distributes work units to workers and thenwork units to workers and then collects intermediate resultscollects intermediate results •• In the RangeIn the Range:: 1.1.Tuple space of work unitsTuple space of work units 2.2.Cloud of workers that autonomouslyCloud of workers that autonomously feed themselves according to their ownfeed themselves according to their own speed, and publish their results.speed, and publish their results.
  • 16.
    1616 Experience #2 (continued)Experience#2 (continued) •• Simple production system to matchSimple production system to match tuple patterns with tuples elementstuple patterns with tuples elements •• Emerging results: autonomic loadEmerging results: autonomic load balancing; graceful degradation;balancing; graceful degradation; crash-failure tolerancecrash-failure tolerance •• In practice, efficiency and reliabilityIn practice, efficiency and reliability •• RoleRole: I conceived/designed the: I conceived/designed the system; system developed by twosystem; system developed by two M.Sc students that I promoted andM.Sc students that I promoted and supervised.supervised.
  • 17.
    1717 Experience #3Experience #3 ••Instead of translatingInstead of translating CC, add aadd a C'C' •• A software architecture supportingA software architecture supporting two cooperating application layerstwo cooperating application layers –– A service language to express functionalA service language to express functional concernsconcerns –– AA recovery languagerecovery language to expressto express dependability strategiesdependability strategies •• Design time: separation of concernsDesign time: separation of concerns •• Run time: separable codesRun time: separable codes •• Actions: similar to production rules:Actions: similar to production rules: nested IF/THEN/ELSE's.nested IF/THEN/ELSE's.
  • 18.
  • 19.
    1919 OptimizationsOptimizations Recovery working mem User applicationRecovery code C C' Broker Recovery code 2 •• Currently, all guards are re-evaluatedCurrently, all guards are re-evaluated •• Full re-evaluations could be avoidedFull re-evaluations could be avoided (maybe through Rete? RWM deltas...)(maybe through Rete? RWM deltas...) •• Separable code = meta-adaptationSeparable code = meta-adaptation
  • 20.
    2020 Experience #3 (continued)Experience#3 (continued) •• RoleRole: system conceived, designed,: system conceived, designed, implemented.implemented. •• More information: "A Fault-Tolerance LinguisticMore information: "A Fault-Tolerance Linguistic Structure for Distributed Applications", Ph.D.Structure for Distributed Applications", Ph.D. thesis, Oct. 2000,thesis, Oct. 2000, http://win.uantwerpen.be/~vincenz/theses/http://win.uantwerpen.be/~vincenz/theses/ •• "Transformer: an adaptation framework with"Transformer: an adaptation framework with contextual adaptation behavior compositioncontextual adaptation behavior composition support," Gui, N. and De Florio, V. Software:support," Gui, N. and De Florio, V. Software: Practice & Experience, Vol. 43, Issue 8, 2013.Practice & Experience, Vol. 43, Issue 8, 2013.
  • 21.
    Strategies
    • Quality drifting can be managed in several ways
    • In what follows, three such ways
    A. Do nothing!
    • Service has complete faith in M and E
    • Formally: synchronous system model and empty fault model.
  • 22.
    Pious software ;-)
    • "M is immutable. Computation is dependable. Communication is dependable..."
    • Facilitates development, though...
    • ...anything breaks the code (Maximum fragility.)
    • Ataraxic code (ἀταραξία "impassiveness")
    [Callout: Sitting ducks]
  • 23.
    Strategies B and C
    B. Off-line adaptation (examples: Experiences #1 and #2)
    C. On-line adaptation (Exp. #3)
    • Two requirements:
      1) C must be context-aware
      2) C must be able to autonomously react and adapt after changes in both M and E
    • Corresponds to the two blocks of production systems
    • In what follows, focus on two services: context awareness; reactivity.
    [Callouts: Sensory preconditions: LHSs; Actions: RHSs]
  • 24.
    S1: Context Awareness
    • Goal: reify in the application layer changes pertaining to the context and in particular to M and E.
    • How? Different ways
    • In what follows I briefly describe one answer
    • I chose it because it is related to one of my past experiences and to programming languages
  • 25.
    Experience #4: reflective variables
    • Main idea: memory accesses as a metaphor for detecting changes (and reacting to changes)
    • Reflective variables (RR vars) = volatile variables associated to M or E probes (e.g. sensors, RFIDs, OS services...) that continuously update the variables
    • Akin to signals in Elm: "values that change over time"
  • 26.
    Example
    RR vars also support callbacks. Example:
        int PrintCpu();
        rrparse("(cpu>0);", PrintCpu);
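How a callback might be attached to a reflective variable can be sketched in a few lines of C. This is an illustration under assumptions, not the RR vars implementation: `rr_var`, `rr_watch`, and `rr_update` are hypothetical names, the guard is a C function rather than a parsed string as in `rrparse`, and the probe is modeled as an explicit call instead of a background thread.

```c
#include <assert.h>
#include <stddef.h>

typedef int  (*rr_guard)(int value);  /* condition on the latest reading */
typedef void (*rr_callback)(void);    /* reaction when the guard holds   */

/* Hypothetical reflective variable: a value refreshed by a probe,
   plus an optional guard/callback pair. */
typedef struct {
    volatile int value;
    rr_guard     guard;
    rr_callback  callback;
} rr_var;

/* Register a guard and the callback to run when the guard holds. */
void rr_watch(rr_var *v, rr_guard g, rr_callback cb)
{
    assert(v != NULL);
    v->guard = g;
    v->callback = cb;
}

/* Called by the probe on every new reading: store the reading, then
   check the guard. Returns 1 if the callback fired, 0 otherwise. */
int rr_update(rr_var *v, int reading)
{
    assert(v != NULL);
    v->value = reading;
    if (v->guard != NULL && v->callback != NULL && v->guard(reading)) {
        v->callback();
        return 1;
    }
    return 0;
}

/* Illustrative guard and callback, mirroring the "cpu > 0" condition. */
int  cpu_positive(int cpu) { return cpu > 0; }
int  cpu_events = 0;
void count_cpu_event(void) { cpu_events++; }
```

Application code reads `value` like any other variable; only the probe calls `rr_update`, so detecting a change and reacting to it stay outside the functional code.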
  • 27.
  • 28.
    Tracking CPU and mplayer
    • int mplayer returns the following values: [table of values shown on the slide]
        void SystemIsSlow(void) {
            mplayer = HARDFRAMEDROP;
        }
        ...
        rrparse("(cpu>98)&&(mplayer==2);", SystemIsSlow);
    By coupling an M fact with an E fact, I can deduce conditions
  • 29.
  • 30.
    Tracking users' behaviors too!
    • HCI interaction actions are logged... transcoded... analyzed... and reified...
    • int ui is now == X
    • int ui is now == Y
  • 31.
    Janus system
    [Diagram: RR client, mplayer, ui]
  • 32.
    Currently, simple analyses
    • Typing frequency as simple user stereotype
    • Too high a frequency ⇾ discomfort
    • (cf. Therac-25 accidents...)
  • 33.
    Another example
    • int linkbeacons [«MAC address»]:
      – Number of beacons received by MANET peer during observation period
    • int linkrates [«MAC address»]:
      – Estimated bandwidth
  • 34.
    Experience #4 (continued)
    • RR vars: conceived / designed / implemented by me
      – including instrumenting mplayer
      – including simple TCL/TK user interface
    • More information:
      – "A framework for trustworthiness assessment based on fidelity in cyber and physical domains," https://arxiv.org/abs/1502.01899
      – "Safety enhancement through situation-aware user interfaces," https://arxiv.org/abs/1504.03731
  • 35.
    S2: Reactive Behaviors
    • How to react to context changes? In different ways.
    • Two major methods: mask changes / tolerate changes:
      A. elasticity
      B. resilience
    • Elasticity requires an estimation of a worst-case scenario.
  • 36.
    S2A: Elastic strategy
    • The worst-case scenario is used to define a point of yielding
    • Some algorithm is then used to implement the point of yielding
    • Cf. information theory; Shannon
    • Typical algorithm: modular redundancy + voting
  • 37.
    Example
    • Worst-case scenario = "At most one disturbance per processing stage"
    • Yielding point: single disturbance.
    • Algorithm:
      – triplicate objects
      – write: multiplex to each replica
      – read: demultiplex via majority voting
    • "Redundant data structures"
  • 38.
    Elasticity: intrinsic limitations
    • Two "syndromes":
      – Undershooting (US): worst-case hypothesis is wrong.
      – Overshooting (OS): worst-case hypothesis is correct, though it wastes too many resources
    • I will illustrate US and OS through an example
  • 39.
    aRDS: redundant data structures
    • Three threads: scrambler + aRDS + reader
      1. scrambler: fault injection interpreter
      2. aRDS: "protects" 20,000 4-byte variables
         – Fixed allocation stride = 20
      3. reader: round-robin read accesses
    • Experiments record
      – number of scrambled cells
      – number of read failures
  • 40.
  • 41.
    Case #1: undershooting
  • 42.
    Case #2: overshooting
  • 43.
    S2B: Resilient strategy
    • Point of yielding = dynamic system
    • Employed redundancy = f (estimated risk of yielding)
    • DTOF: distance to failure.
  • 44.
    DTOF = Indirect deduction of risk
    [Figure labels: OS = 6; OS = 4; OS = 2; OS = 0, US!]
    DTOF = OS / (n-1)
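The formula DTOF = OS / (n-1) can be written as a one-line C function. The sketch assumes that OS is a non-negative overshooting margin no larger than n-1 and that n is the current number of replicas; the slide gives neither the value of n nor the way OS is measured, so the function takes both as inputs.

```c
#include <assert.h>

/* Distance to failure, normalized to [0, 1]:
   1 means the full redundancy margin is unused,
   0 means no margin is left (the next disturbance is an undershoot).
   Assumptions: n > 1 replicas, 0 <= os <= n - 1. */
double dtof(int os, int n)
{
    assert(n > 1);
    assert(os >= 0 && os <= n - 1);
    return (double)os / (double)(n - 1);
}
```

If the figure were drawn for n = 7 (an assumption, chosen so that OS = 6 maps to a DTOF of 1), the labels OS = 6, 4, 2, 0 would correspond to DTOF values of 1, 2/3, 1/3 and 0.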
  • 45.
    Case #3: DTOF, n(0)=5
  • 46.
  • 47.
    Hypothesis about E: a dynamic system
  • 48.
    Conclusions
    • Quality of "service": f (C, M, E)
    • A complex problem of intertwined behaviors!
      – Application layer(s)
      – Metaprograms/protocols
      – Compilers
      – OS
      – HW
    • Stigmergy complicates solutions
    [Callout: Environment]
  • 49.
    Conclusions
    • How to deal with this complex problem?
    • My hypothesis: Game Theory
      – M entities and E: GT players
      – Energy budgets shared by M entities
      – GT payoffs associated to behaviors
      – Nested compositional hierarchies of payoff matrices
      – Interconnected and mutually influencing payoff "spreadsheets" (cf. reactive prog.)
    • Future research action: "Resilience as concurrent interplays of opponents", https://goo.gl/Mz8foA + antifragility
  • 50.
    Further detail
    • System/fault models: "Application-layer fault-tolerance protocols", https://bit.ly/1WNJj6V
    • Drift: Antifragility = "Elasticity + Resilience + Machine Learning: Models and Algorithms for Open System Fidelity", http://goo.gl/rdwMQH; "A Framework for Trustworthiness Assessment based on Fidelity in Cyber and Physical Domains", http://goo.gl/fsYxqT
    • Resilience: "On Resilient Behaviors in Computational Systems and Environments", http://goo.gl/3eU12a; "On environments as systemic exoskeletons: Crosscutting optimizers and antifragility enablers", http://goo.gl/82RsKw
  • 51.
    Thanks for your attention! Questions?