PCI Compliance With Hadoop
Rommel Garcia
Currently Global Security SME @ Hortonworks
Was Tokenization Solutions Engineer @ Liaison Technologies
PCI Focus
Hadoop environment
This is what we will cover
Business process
Let’s leave this to business folks
PCI In A Nutshell
Level of Paranoia
PCI Extra Measures Cybersecurity
PCI Compliance Guideline
Transmission of identity must be encrypted
Two factor authentication
Key management
location
encryption
expiration of keys/tokens
Management of user access to resources
Services
Data
Geography
PCI Compliance Guideline
Strong encryption protocols at rest (AES-256, etc.) and in motion (latest TLS/SSL)
No passwords in the clear
System audit information based on resource, time, client info, user ID and function
Prove “Chain of Custody”
No sensitive data stored in logs
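The audit and logging bullets above can be sketched roughly as follows. This is a minimal illustration, not a prescribed format: the field names, the `audit_record` and `redact` helpers, and the card-number pattern are all assumptions made for the example.

```python
import json
import re
from datetime import datetime, timezone

# Assumed pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
PAN_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def redact(message: str) -> str:
    """Replace anything that looks like a card number before it reaches a log."""
    return PAN_PATTERN.sub("[REDACTED]", message)


def audit_record(resource, client_info, userid, function, detail=""):
    """Build one audit entry with the fields listed on the slide (hypothetical layout)."""
    return {
        "resource": resource,
        "time": datetime.now(timezone.utc).isoformat(),
        "client_info": client_info,
        "userid": userid,
        "function": function,
        "detail": redact(detail),  # free text is scrubbed, never logged raw
    }


record = audit_record(
    "/data/payments", "10.0.0.5", "alice", "read",
    "looked up card 1234-567890-12345",
)
print(json.dumps(record))
```

Scrubbing at the point where the record is built, rather than downstream, keeps card numbers out of every sink the log is later shipped to.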
PCI Scope
De-scope using Tokenization
100% in-scope using Encryption
De-Scoping Through Tokenization
Reduce sensitive data footprint
What is tokenization?
Process of turning sensitive data into a value with no meaning, called a token, e.g. 1234-567890-12345 => $^hAt_786Ab}+=-12345
If a token is compromised, the original value cannot be derived from the token alone
Recipient of token is out-of-scope for PCI compliance
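A minimal sketch of the idea, assuming a vault-style design in which the token is random and the mapping lives only inside the tokenization app. The `TokenVault` class, its in-memory dictionaries, and the choice to keep the last five characters (mirroring the example above) are assumptions for illustration, not a description of any particular product.

```python
import secrets


class TokenVault:
    """Toy in-memory vault; a real one would live inside the in-scope tokenization app."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str, keep_last: int = 5) -> str:
        """Return a random token for value, reusing the token if one already exists."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        while True:
            # The random part is not derived from the value, so it cannot be reversed.
            token = secrets.token_urlsafe(12) + "-" + value[-keep_last:]
            if token not in self._token_to_value:
                break
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        """Look up the original value; only the vault can do this."""
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("1234-567890-12345")
print(token)
```

Systems that only ever see `token` hold no way back to the card number; the lookup table is the part that stays in scope.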
De-scoping
[Diagram. Labels: Tokenization App (keys, tokens/encrypted data), create token, lookup token, batch/realtime data sources, Hadoop Environments, Non-Hadoop Environments, tokens/non-sensitive data. Legend: marked components = in scope.]
Sample De-Scoping Architecture v1
[Diagram. Components: Data Sources, CDC, Kafka, NiFi, HBase, HDFS, API, Tokenization App.]
Sample De-Scoping Architecture v2
[Diagram. Components: Data Sources, CDC, Kafka, NiFi, HBase, HDFS, API, Tokenization App.]
Compliance Thru Encryption
100% In-Scope
Hadoop Security Quadrant
Encryption Scope
In-Motion: network links, data streams
At-Rest: local storage, HDFS
At-Rest Encryption
[Diagram. Components: Data Sources, CDC, Kafka, NiFi, HBase, HDFS, API, each tagged with a number from the legend below.]
1 HDFS TDE
2 EncryptContent Processor or LUKS
3 LUKS
4 LUKS / Encryption Appliance / Native Encryption
In-Motion Encryption
[Diagram. Components: Data Sources, CDC, Kafka, NiFi, HBase, HDFS, API, with each link tagged with a number from the legend below.]
1 FTPS / SFTP / HTTPS / JDBC, ODBC over SSL
2 SSL
3 SSL
4 TLS/SSL
5 SSL
6 TLS/SSL
7 RPC / DTP / SSL
Secure DR Link
DC DR
1 distcp (mapred over ssl)
2 vpn (guaranteed)
2 SSL
Separation Of Concerns
Admins
Operators
Developers
Analyst
Data Scientist
InfoSec
Infrastructure Engineer
What To Watch Out For
Kerberos is a MUST
If using a tokenization app, choose one with a NoSQL backend (HBase, Redis, etc.)
No RC4 or MD5
Use TLSv1.2 or newer
Use key length greater than 128 bits
All passwords must be encrypted
No superuser (root or hdfs) should have access to encryption keys
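The protocol and cipher bullets above can be illustrated on the client side with Python's standard `ssl` module. This is a sketch under assumptions: the helper name `strict_client_context` and the exact cipher string are illustrative choices, and a Hadoop cluster would apply the same restrictions in its own service configuration rather than in Python.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Client-side TLS context that refuses old protocols and weak ciphers."""
    ctx = ssl.create_default_context()            # verifies certificates and hostnames
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse SSLv3, TLS 1.0 and TLS 1.1
    # OpenSSL cipher string for TLS 1.2 and below: strong suites only, no RC4, no MD5.
    ctx.set_ciphers("HIGH:!aNULL:!RC4:!MD5")
    return ctx


ctx = strict_client_context()
cipher_names = [c["name"] for c in ctx.get_ciphers()]
print(ctx.minimum_version, len(cipher_names))
```

Listing `cipher_names` is a quick way to confirm what a given OpenSSL build will actually negotiate.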
What To Watch Out For
Do not delete encryption keys, including rolled-over keys
LDAPS is a MUST
If operators, not just admins, have access to machines at the OS level, LUKS won’t work (a mounted volume is readable in the clear).
Lock down permissions to OS security config files
Use CA-signed certificates if possible
Only open ports you will use
Guarantee “ordered” processing from a batch source
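Two of the key-handling points in this list (keys longer than 128 bits, and never deleting rolled-over keys) can be sketched as a versioned key ring. The `KeyRing` class is hypothetical and keeps keys in memory purely for illustration; in practice the keys would sit in a key management service, away from root and hdfs.

```python
import secrets

KEY_BYTES = 32  # 256-bit keys, comfortably above the 128-bit floor


class KeyRing:
    """Hypothetical versioned key store: rolling over adds a key, never removes one."""

    def __init__(self):
        self._keys = {}
        self._current = 0

    def rotate(self) -> int:
        """Create a new key version and make it the one used for new writes."""
        self._current += 1
        self._keys[self._current] = secrets.token_bytes(KEY_BYTES)
        return self._current

    def current(self):
        """Return (version, key) to use when encrypting new data."""
        return self._current, self._keys[self._current]

    def get(self, version: int) -> bytes:
        """Return an older key so data written under it can still be decrypted."""
        return self._keys[version]


ring = KeyRing()
v1 = ring.rotate()
k1 = ring.get(v1)
v2 = ring.rotate()
print(v1, v2, len(ring.get(v1)) * 8)
```

Storing the key version alongside each encrypted record is what makes the old key findable later; deleting a rolled-over key makes everything written under it unreadable.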