scaling μ-services at Gilt
Adrian Trenaman, SVP Engineering, Gilt
ade@gilt.com · @adrian_trenaman · @gilttech
JavaOne, San Francisco, 26th October 2015
gilt: luxury designer brands at discounted prices
we shoot the product in our studios
we receive, store, pick, pack and ship...
we sell every day at noon...
stampede...
this is what the stampede really looks like...
rails to riches: 2007 - ruby-on-rails monolith
2011: java, loosely-typed, monolithic services
(1) Large loosely-typed JSON/HTTP services
(2) Lots of duplicated code :(
(3) Teams focused on business lines
(4) Monolithic Java App; huge bottleneck for innovation.
(5) Hidden linkages; buried business logic
enter: µ-services
“How can we arrange our teams around strategic initiatives? How can we make it fast and easy to get a change to production?”
2015: micro-services
driving forces behind gilt’s emergent
architecture
● team autonomy
● voluntary adoption (tools, techniques,
processes)
● kpi or goal-driven initiatives
● failing fast and openly
● open and honest, even when it’s difficult
service growth over time: point of inflexion === scala.
anatomy of a gilt service
anatomy of a gilt service - typical choices
gilt-service-framework, log4j, CloudWatch, Cave, JavaScript, ...
service discovery: straightforward
zookeeper
Brocade Traffic Manager (aka Zeus, Stingray, SteelApp, ...)
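For flavour, a minimal sketch of the registration half of this pattern in Scala, assuming Apache Curator; the ensemble address, znode path and payload are illustrative, not Gilt's actual discovery code. Each instance registers an ephemeral node, and the traffic manager only routes to instances whose nodes are present.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.retry.ExponentialBackoffRetry
import org.apache.zookeeper.CreateMode

object RegisterWithZooKeeper extends App {
  // Connect to the ZooKeeper ensemble used for service discovery
  // (connect string is illustrative).
  val client = CuratorFrameworkFactory.newClient(
    "zk1:2181,zk2:2181,zk3:2181",
    new ExponentialBackoffRetry(1000, 3))
  client.start()

  // Register this instance as an ephemeral, sequential node: it vanishes
  // automatically if the process dies, so traffic stops being routed to it.
  val payload = """{"host":"10.0.1.17","port":9000}""".getBytes(UTF_8)
  client.create()
    .creatingParentsIfNeeded()
    .withMode(CreateMode.EPHEMERAL_SEQUENTIAL)
    .forPath("/services/product-catalog/instance-", payload)
}
```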
what are all these services doing?
… we used a “spreadsheet”.
‘The Gilt Genome Project’
It’s hard to think of architecture in one dimension.
We added ‘Functional Area’, ‘System’ and ‘Subsystem’ columns to Gilt Genome; this provides a stronger (although subjective) taxonomy than the previous ‘tags’.
It turns out we have an elegant, emergent architecture.
Some services / components are deceptively simple.
Others are simply deceptive, and require knowledge of their surrounding ‘constellation’.
n = 265, where n is the number of services.
Deceptively Simple - many services are small; < 2048 loc
Deceptively Simple - many services are small, < 32 files.
Gilt Logical Architecture - Back Office Systems

Gilt Admin (Legacy Ruby on Rails Application): City, Discounts, Financial Reporting, Fraud Mgmt, Gift Cards, Inventory Mgmt, Order Mgmt, Sales Mgmt, Product Catalog, Purchase Orders, Targeting, Billing

Other Admin Applications (Scala + Play Framework)*: City, Creative (2), CS, Discounts, Distribution, i18n, Inventory (2), Order Processing (2), Util

Service Constellations (Scala, Java)*: Auth (1), Billing (1), City (6), Creative (4), CS (2), Discounts (1), Distribution (9), i18n (3), Inventory (6), Order Processing (8), Payments (3), Product Catalog (5), Referrals (1), Util (2)

Core Database - ‘db3’
Job System (Java, Ruby)

* counts denote number of service / app components.

Simply deceptive: service context only makes sense in its constellation.
from bare-metal (PHX, IAD)...
… to vapour.
Lift-and-shift + elastic teams
Existing Data Centre to ‘Legacy VPC’: dual 10Gb direct connect line, 2ms latency.
Mobile, Common, Personalisation, Admin, Data
(1) Deploy to VPC
(2) ‘Department’ accounts for elasticity & devops
single tenant: one EC2 instance per service instance
reproducible, immutable deployments: docker
service discovery: same pattern, different LB
zookeeper
Amazon ELB
# running instances per service: ‘rule of three’
AWS instance sizing
evolution of architecture and tech organisation
We (heart) μ-services
● Lessen dependencies between teams: faster code-to-prod
● Lots of initiatives in parallel
● Your favourite <tech/language/framework> here
● Graceful degradation of service (see the sketch below)
● Disposable code: easy to innovate, easy to fail and move on.
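As an illustration of graceful degradation, a small Scala sketch with hypothetical names: if a non-critical downstream call fails, fall back to a default rather than failing the whole response.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.control.NonFatal

object Fallbacks {
  case class Product(id: String, name: String)   // hypothetical model

  // `fetch` stands in for a remote recommendations call that may fail.
  def recommendationsOrDefault(fetch: => Future[Seq[Product]],
                               default: Seq[Product]): Future[Seq[Product]] =
    // If the downstream service is slow or down, serve a curated default
    // list rather than propagating the failure to the whole page.
    fetch.recover { case NonFatal(_) => default }
}
```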
We (heart) cloud
● Do devops in a meaningful way.
● Low barrier of entry for new tech (DynamoDB, Kinesis, ...)
● Isolation
● Cost visibility
● Security tools (IAM)
● Well documented
● Resilience is easy
● Hybrid is easy
● Performance is great
seven μ-service challenges (& some solutions)
no one ever said this was gonna be easy
1. staging vs test-in-prod
We find it hard to maintain staging environments across multiple teams with lots of services.
● We think TiP is the way to go: invest in automation, use dark canaries in prod.
● However, some teams have found TiP counter-productive, and use minimal staging environments.
2. ownership
Who ‘owns’ that service? What happens if that person decides to work on something else?
We have chosen for teams and departments to own and maintain their services. No throwing this stuff over the fence.
1. Software is owned by departments, tracked in ‘genome project’. Directors assign services to teams.
2. Teams are responsible for building & running their services; directors are accountable for their overall estate.
bottom-up ownership, RACI-style
‘ownership donut’ informs tech strategy
3. Ownership is classified: active, passive, at-risk.
‘done’ === 0% ‘at risk’
3. deployment
Services need somewhere to live. We’ve open-sourced tooling over docker and AWS to give:
elasticity + fast provisioning + service isolation + fast rollback + repeatable, immutable deployment.
https://github.com/gilt/ionroller
4. lightweight APIs
We’ve settled on REST-style APIs, using http://apidoc.me. Separate interface from implementation; ‘an Avro for REST’ (Mike Bryzek, Gilt Founder).
We strongly recommend zero-dependency, strongly-typed clients.
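A hypothetical sketch of what a generated, strongly-typed client surface looks like; the resource and field names below are illustrative, not actual apidoc output. The point is that models and signatures come from the service's spec and compile against standard JDK types only.

```scala
import java.time.Instant
import java.util.UUID
import scala.concurrent.Future

// Illustrative only: models and method signatures are generated from the
// service's apidoc spec, so a breaking API change fails at compile time
// rather than in production.
case class Sale(id: UUID, name: String, startsAt: Instant)

trait SalesClient {
  def getSaleById(id: UUID): Future[Option[Sale]]
  def getSales(limit: Long = 25, offset: Long = 0): Future[Seq[Sale]]
}
```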
5. audit + alerting
How do we stay compliant while giving engineers full autonomy in prod?
Really smart alerting: http://cavellc.github.io. For example, alert when no US-bound orders in the last five minutes:
orders[shipTo: US].count.5m == 0
6. io explosion
Each service call begets more service calls, some of which are redundant...
=> unintended complexity and performance problems.
Looking to lambda architecture for critical-path APIs: precompute, real-time updates, O(1) lookup (see the sketch below).
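A rough sketch of that idea in Scala (names and types are illustrative, not Gilt's implementation): a periodically rebuilt batch view plus a stream of real-time deltas, merged at read time for constant-time lookups with no fan-out to other services.

```scala
import scala.collection.concurrent.TrieMap

final class ProductPriceView {
  private val batchView    = TrieMap.empty[String, BigDecimal] // precomputed snapshot
  private val realtimeView = TrieMap.empty[String, BigDecimal] // deltas from the event stream

  // Batch layer: swap in a freshly precomputed snapshot; older deltas are superseded.
  def loadSnapshot(snapshot: Map[String, BigDecimal]): Unit = {
    batchView.clear(); batchView ++= snapshot
    realtimeView.clear()
  }

  // Speed layer: apply a real-time update as events arrive.
  def applyUpdate(productId: String, price: BigDecimal): Unit =
    realtimeView.put(productId, price)

  // Serving layer: O(1) lookup; real-time deltas win over the batch view.
  def priceOf(productId: String): Option[BigDecimal] =
    realtimeView.get(productId) orElse batchView.get(productId)
}
```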
7. reporting
Many services => many databases => data is de-centralized.
Solution: real-time event queues into a data lake (see the sketch below).
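A minimal sketch of the producing side, assuming the AWS SDK Kinesis client; the stream name and event shape are illustrative. Each service publishes its domain events, and a downstream consumer lands them in the data lake for reporting.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder
import com.amazonaws.services.kinesis.model.PutRecordRequest

object OrderEvents {
  private val kinesis = AmazonKinesisClientBuilder.defaultClient()

  // Publish an order event; partitioning by order id keeps each order's
  // events in sequence within a shard.
  def publish(orderId: String, eventJson: String): Unit = {
    val request = new PutRecordRequest()
      .withStreamName("order-events")                     // assumed stream name
      .withPartitionKey(orderId)
      .withData(ByteBuffer.wrap(eventJson.getBytes(UTF_8)))
    kinesis.putRecord(request)
  }
}
```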
scaling μ-services at Gilt
Adrian Trenaman, SVP Engineering, Gilt
ade@gilt.com · @adrian_trenaman · @gilttech
JavaOne, San Francisco, 26th October 2015
