An Approach for Multi-tenant Applications with Apache Knox
Larry McCay
Architect and Manager for Security Infra -
Hortonworks
Sumit Gupta
Technical Lead for Knox - Hortonworks
April 5th 2017 – DataWorks Summit Munich
2 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Disclaimer
• This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately never be developed.
• Product capabilities are based on information that is publicly available within the Apache Software Foundation websites (“Apache”). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery.
• This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product.
• Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Since this document may contain an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
Agenda
• Apache Knox
  – Overview
  – Topologies
  – Identity Assertion and Authorization
• Multi-tenant Applications
  – What are they?
  – What are the concerns?
• Loanscore SaaS Application
  – Overview, Requirements, Design
  – Loanscore via Knox, Design
• Demo
• Q&A
Apache Knox
Apache Knox History and Community Growth
• Mar 2013: Entered Incubator
• Oct 2013: 0.1.0 - 0.3.0 Incubator Releases
• Feb 2014: Graduates to Apache TLP
• Apr 2014: 0.4.0 TLP Release
• Nov 2014: 0.5.0
• May 2015: 0.6.0
• Dec 2015: 0.7.0
• Feb 2016: 0.8.0
• Apr/Aug 2016: 0.9.0/0.9.1
• Nov 2016: 0.10.0
• Dec 2016: 0.11.0
• Mar 2017: 0.12.0
• TBD: 1.0.0 Target Release Date
• Committers: 17
• Contributors from: Hortonworks, IBM, CGI, Uber, Oracle, Blue Talon
Apache 0.12.0/HDP 2.6
• Client SDK/DSL Improvements
• Apache Zeppelin Proxying
• YARN RM UI HA Support
• Knox Token Service
• Solr API and UI
Apache 0.11.0
• LDAP Improvements
• Hadoop Group Lookup Support
• Phoenix Server Support (Avatica)
• Management UI
• Metrics
@apache_knox
Apache Knox Overview
Proxying Services
A primary goal of the Apache Knox
project is to provide access to Apache
Hadoop via proxying of HTTP resources.
Authentication Services
Authentication for REST API access as
well as WebSSO flow for UIs. LDAP/AD,
Header based PreAuth, Kerberos, SAML,
OAuth are all available options.
Client DSL/SDK Services
Client development can be done with
scripting through the DSL or by using the
KnoxShell classes directly as an SDK.
[Diagram: Apache Knox services overview]
• HTTP Proxying Services: UIs, REST APIs, WebSockets
• Authentication Services: WebSSO, authentication and federation providers, KnoxSSO/Token
• Client DSL/SDK Services: Groovy based DSL, KnoxShell SDK, token sessions, REST API classes
• Labels shown in the diagram: Hive, Ambari, HBase, WebHCat, WebHDFS, Hadoop UIs, YARN, YARN RM, Ranger, Zeppelin, Oozie, Phoenix, Gremlin, SQL/DB, SAML, OAuth, LDAP/AD, SPNEGO, Header Based
Knox Topologies
• Which services to proxy
  – For instance: Hive, WebHDFS, WebHCat, HBase, etc.
• Unique URLs per topology
  – For instance: https://localhost:8443/gateway/TOPOLOGY/webhdfs/v1
• Separate Hadoop clusters
  – For example: dev.xml and prod.xml
• Different access requirements for the same cluster (through providers)
  – For example: token.xml and basic.xml
• Tenant specific access to the Knox services
  – For example: acme1.xml and acme2.xml
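
The URL pattern above can be sketched in a few lines of Python. This is only an illustration, not Knox code: the gateway base is the one from the slide's example URL, the helper name `topology_url` is hypothetical, and it assumes a topology is addressed by its file name without the `.xml` extension.

```python
from urllib.parse import quote

# Gateway base taken from the slide's example URL; adjust for a real deployment.
GATEWAY_BASE = "https://localhost:8443/gateway"


def topology_url(topology: str, service_path: str, base: str = GATEWAY_BASE) -> str:
    """Join the gateway base, a topology name and a service path into one URL."""
    # Assumption: a topology file such as acme1.xml is addressed as "acme1".
    name = topology[:-4] if topology.endswith(".xml") else topology
    return f"{base.rstrip('/')}/{quote(name, safe='')}/{service_path.lstrip('/')}"


if __name__ == "__main__":
    # Each topology file yields its own URL space for the same service.
    for topo in ("dev.xml", "prod.xml", "acme1.xml", "acme2.xml"):
        print(topology_url(topo, "webhdfs/v1"))
```

Because the topology name is part of the path, two tenants (acme1 and acme2) can reach the same backend service through different provider configurations.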
Identity Assertion and Authorization
• Establish the effective identity
• Can alter the effective identity through:
  – Principal mapping
  – Regular expressions
  – Concatenation of prefixes, suffixes
• Establishes security context for service level authorization checks through:
  – The principal and group mapping or transforms described above
  – Group lookup
• Service Level Authorization for the effective user
  – Simple ACL based authorization provider
  – Ranger Knox plugin
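
To make the three identity transforms and the ACL check concrete, here is a minimal Python sketch. It is not the Knox provider implementation or its configuration syntax; all function names, the sample regular expression and the ACL shape (a set of users plus a set of groups) are assumptions chosen for illustration.

```python
import re
from typing import Dict, Set


def map_principal(user: str, mapping: Dict[str, str]) -> str:
    """Principal mapping: replace a user name with an explicitly mapped one."""
    return mapping.get(user, user)


def regex_transform(user: str, pattern: str, template: str) -> str:
    """Regular expression transform: rewrite the name if the whole name matches."""
    match = re.fullmatch(pattern, user)
    return match.expand(template) if match else user


def concat(user: str, prefix: str = "", suffix: str = "") -> str:
    """Concatenation: add a prefix and/or suffix to the user name."""
    return f"{prefix}{user}{suffix}"


def is_authorized(user: str, groups: Set[str],
                  acl_users: Set[str], acl_groups: Set[str]) -> bool:
    """Simple ACL: allow if the effective user or one of its groups is listed."""
    return user in acl_users or bool(groups & acl_groups)


if __name__ == "__main__":
    effective = concat("bob", suffix="_goodloans")
    print(effective)
    print(is_authorized(effective, {"originators"}, set(), {"originators"}))
```

The authorization check runs against the effective (transformed) identity, which is why the transform has to happen before service level authorization.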
Multi-tenant Applications
What is a Multi-tenant Application?
Shared Infrastructure
– Deployment
– Application
– Data
Account Context
– Users have accounts within an Organization’s Account
– Each organization is a tenant
Multi-tenancy Concerns
Data Protection
– Tenants cannot view, modify or delete each other’s data
– Tenant admins may only affect tenant specific settings
– Application admins cannot access tenant data
Authentication
– Users authenticate using their typical or chosen usernames
– Security context must include tenant membership (username ‘bob’ is too ambiguous)
– Only authenticated and authorized users may access the system
– Authentication provider flexibility
  • Application managed providers
  • Tenant specific provider integrations
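
A small Python sketch of why the security context needs the tenant: two users named bob in different organizations must resolve to different principals and different data. The `SecurityContext` class, the `/tenants/<tenant>` directory layout and `may_access` are hypothetical names used only for illustration; they are not part of Knox or Hadoop.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class SecurityContext:
    username: str  # name typed at login, e.g. "bob"
    tenant: str    # organization the login happened under, e.g. "goodloans"

    @property
    def principal(self) -> str:
        # Tenant-qualified name, following the slide's bob -> bob_goodloans example.
        return f"{self.username}_{self.tenant}"


def tenant_root(tenant: str) -> PurePosixPath:
    # Assumed layout: one directory per tenant under /tenants.
    return PurePosixPath("/tenants") / tenant


def may_access(ctx: SecurityContext, path: str) -> bool:
    """Allow access only to paths at or below the caller's own tenant directory."""
    target = PurePosixPath(path)
    if ".." in target.parts:  # refuse traversal out of the tenant directory
        return False
    root = tenant_root(ctx.tenant)
    return target == root or root in target.parents


if __name__ == "__main__":
    good = SecurityContext("bob", "goodloans")
    unwise = SecurityContext("bob", "unwise")
    print(good.principal, unwise.principal)
    print(may_access(unwise, "/tenants/goodloans/scores/1.json"))
```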
Loanscore SaaS Application
Loanscore SaaS Application
Continually improve risk assessment with a central risk model, analytics and
machine learning with tenant specific thresholds
• Application Provides
  – machine learning capabilities
  – models for scoring risk
  – small businesses and individuals can be scored
  – configurable datasources (e.g. Yelp)
• Tenants are Lending Institutions
  – Users are employees of the lending institution (e.g. an originator)
  – Tenant specific authentication integrations
  – Tenants have their own configuration/settings
  – Tenants get their own sub-domain and branding
Loanscore SaaS Application
[Diagram: Loanscore SaaS]
• Loan Scoring Business Logic and Branding
• Tenant Specific Authentication (Login form, LDAP, SAML, etc.)
• Identity sources: SAML IDP, Corp AD, LDAP
• User Disambiguation for access to Hadoop (bob -> bob_goodloans)
• Hadoop Access (Kerberos + doas)
Authentication
The application must account for authentication configuration per tenant. This covers different LDAP search bases within a shared LDAP, tenant specific LDAP servers, or IdP integrations.
Business Logic of the App
The business logic and branding of the application for each tenant.
User Disambiguation
The effective security context for backend interactions must contain the tenant affiliation for authorization policy to be enforced properly.
Hadoop Access Patterns
REST API calls to Hadoop services generally require Kerberos + doas for secure clusters: the calling service authenticates with Kerberos and names the end user in the doas parameter.
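
As a rough illustration of the doas part of this access pattern, the sketch below builds a WebHDFS REST URL that carries the disambiguated user in the `doas` query parameter. The host name, port and HDFS path are hypothetical, and the Kerberos (SPNEGO) authentication of the calling service is deliberately not shown.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(namenode: str, path: str, op: str, doas: str) -> str:
    """Build a WebHDFS URL that asks Hadoop to act on behalf of `doas`."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = urlencode({"op": op, "doas": doas})
    return f"{namenode.rstrip('/')}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    # Hypothetical NameNode address and tenant directory.
    print(webhdfs_url("http://namenode.example.com:50070",
                      "/tenants/goodloans", "LISTSTATUS", "bob_goodloans"))
```

In this design the application itself has to hold Kerberos credentials and perform the user-name transformation before every call.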
Loanscore SaaS Application v2.0
[Diagram: Loanscore SaaS v2.0]
• Loan Scoring Business Logic and Branding
• Tenant Specific Authentication (Login form, LDAP, SAML, etc.)
• Identity sources: SAML IDP, Corp AD, LDAP
• User Disambiguation for access to Hadoop (bob -> bob_goodloans)
• Hadoop Access (Kerberos + doas)
Authentication
The application must account for authentication configuration per tenant. This covers different LDAP search bases within a shared LDAP, tenant specific LDAP servers, or IdP integrations.
Business Logic of the App
The business logic and branding of the application for each tenant.
User Disambiguation
The effective security context for backend interactions must contain the tenant affiliation for authorization policy to be enforced properly.
Hadoop Access Patterns
REST API calls to Hadoop services generally require Kerberos + doas for secure clusters.
Loanscore SaaS with Knox Services
[Diagram: Loanscore SaaS behind Knox Authentication and Proxying Services]
• Loanscore SaaS: Business Logic, Branding, Knox Client SDK
• KnoxSSO: Tenant Specific Authentication
• Identity sources: SAML IDP, Corp AD, LDAP
• Proxying: User Disambiguation (identity assertion)
• Access to Hadoop: kerberos+doas or simple auth
Proxying Services
By proxying the app through Apache Knox, the gateway can require authentication before the user reaches the actual application. Hadoop API access is also proxied through Knox, and the dispatch within the gateway handles the Kerberos + doas and user disambiguation requirements.
Authentication Services
The authentication or federation provider within the tenant's proxying topology may contain the actual authentication configuration or may redirect to KnoxSSO for a WebSSO flow.
Client SDK
The backend of the application may consume Hadoop REST APIs via the KnoxShell client classes.
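
The slide refers to the KnoxShell client classes; the sketch below swaps in plain HTTP from the Python standard library to show what such a client call amounts to when it goes through the gateway. It assumes a topology that accepts HTTP Basic credentials, and the gateway address, topology name, HDFS path and function name are all hypothetical. The request is only constructed, not sent.

```python
import base64
from urllib.parse import quote
from urllib.request import Request


def knox_webhdfs_request(gateway: str, topology: str, hdfs_path: str,
                         op: str, username: str, password: str) -> Request:
    """Prepare a WebHDFS call that goes through a Knox topology."""
    url = f"{gateway.rstrip('/')}/{topology}/webhdfs/v1{quote(hdfs_path)}?op={op}"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    # No doas parameter and no Kerberos here: in this design the gateway
    # adds them when it dispatches the call to the cluster.
    return Request(url, headers={"Authorization": f"Basic {token}"}, method="GET")


if __name__ == "__main__":
    req = knox_webhdfs_request("https://localhost:8443/gateway", "goodloans",
                               "/tenants/goodloans", "LISTSTATUS", "bob", "secret")
    print(req.get_method(), req.full_url)
```

Compared with calling Hadoop directly, the client only needs the gateway URL and its own credentials.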
Loanscore SaaS with Knox Services
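[Diagram: Loanscore request flow through Knox]
• Tenant URLs: https://goodloans.loanscore.com and https://unwise.loanscore.com
• Login: username: bob, password: ***
• KnoxSSO topology: goodloans-sso.xml
• Identity sources: SAML IDP, Corp AD, LDAP
• Proxying topology: goodloans.xml (user disambiguation)
• Loanscore Application
• Hadoop access: doas=bob_goodloans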
1. Goodloans originator bob navigates to the Goodloans Loanscore app URL.
2. Since he has yet to authenticate, he is redirected to the KnoxSSO topology for Goodloans.
3. He is authenticated against the identity provider configured for Goodloans. He provides his username and password (bob:***).
4. Upon successful authentication he is redirected back to the Loanscore application and granted access.
5. The user principal propagated to the Loanscore app has been disambiguated by adding the tenant name to the end of the username (bob_goodloans) in the identity assertion provider.
6. The Loanscore app adds a file to a tenant specific directory within HDFS using KnoxShell SDK classes.
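
The tenant-specific parts of the steps above can be sketched as follows. This is an illustrative Python model, not Knox configuration: it assumes the tenant is the sub-domain of loanscore.com and that the topology files follow the `<tenant>-sso.xml` and `<tenant>.xml` naming shown in the diagram. `TenantRoute`, `route_for_host` and `effective_principal` are hypothetical names.

```python
from typing import NamedTuple

APP_DOMAIN = "loanscore.com"


class TenantRoute(NamedTuple):
    tenant: str
    sso_topology: str    # KnoxSSO topology used for login (steps 2 and 3)
    proxy_topology: str  # proxying topology in front of the app (steps 1 and 4)


def route_for_host(host: str) -> TenantRoute:
    """Derive the tenant and its topology file names from the request host."""
    name = host.lower().split(":")[0]  # drop an optional port
    suffix = "." + APP_DOMAIN
    tenant = name[: -len(suffix)] if name.endswith(suffix) else ""
    if not tenant or "." in tenant:
        raise ValueError(f"not a tenant host: {host}")
    return TenantRoute(tenant, f"{tenant}-sso.xml", f"{tenant}.xml")


def effective_principal(username: str, tenant: str) -> str:
    """Step 5: append the tenant name to the authenticated user name."""
    return f"{username}_{tenant}"


if __name__ == "__main__":
    route = route_for_host("goodloans.loanscore.com")
    print(route)
    print(effective_principal("bob", route.tenant))
```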
Demo
Q&A
