How to Get Hadoop Application Intelligence with Driven
Confidential
WHY NOW
As Hadoop applications become the
engine of your data management strategy,
they must meet higher standards of
quality, reliability, and manageability.
WE’RE THE FORCE BEHIND CASCADING…
Cascading is a proven platform for building and deploying big data applications on Hadoop, with 10,000+ production deployments
Java, Scala (Scalding), SQL




SIMPLE
Ensure best practices at any scale thanks to easy-to-learn design principles

FLEXIBLE
Leverage existing Java, Scala, and SQL skills and easily adapt to new systems

RELIABLE
Always get optimal performance and reliability for big data applications
… POWERING BIG DATA APPS ACROSS INDUSTRIES
Social Media, Consumer & Retail, Business Services, Ad & Marketing, Financial, Telecom
What people are saying…
WHO ARE WE

TRUSTED
by over 10,000 companies as their big data app platform

BACKED
by top Silicon Valley investors True Ventures, Rembrandt VP, Bain Capital

FOUNDED
in 2008, with headquarters in San Francisco
HADOOP APP INTELLIGENCE
DEVELOPERS, OPS TEAMS, AND CIOS ASKED US
Can you help us improve the quality, reliability, and manageability of all our big data applications?

By visualizing our entire data pipeline

By tracking exactly how our big data apps behave at runtime and pinpointing bottlenecks

By helping us understand how our departments, teams, and other segments consume big data resources and deliver value
PERFORMANCE MANAGEMENT FOR HADOOP APPS
BUILD higher quality Hadoop apps
RUN Hadoop apps more reliably
MANAGE Hadoop apps more effectively
BUILD HIGHER QUALITY APPS
BUILD HIGHER QUALITY HADOOP APPS
[Pipeline diagram: SOURCES, OPERATIONS (functions, filters, joins, and aggregators), RESULTS]
Fully visualize your entire data pipeline
Quickly and easily identify execution errors
RUN APPS MORE RELIABLY
RUN HADOOP APPS MORE RELIABLY
CURRENTLY EXECUTING
Watch your apps execute in real time
Easily detect apps that violate SLAs and policies
Pinpoint bottlenecks and identify causes
[Screenshot labels: EXECUTING, WAITING, DETAILED MAPPER/REDUCER STATS]
View metrics for all apps on the production cluster that failed to execute in under 5 minutes…
…or all applications that use more than their allotment of mappers
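
The two example views above are, in effect, filters over per-run metadata. A minimal Python sketch of that idea, assuming hypothetical run records with `cluster`, `duration_s`, `mappers_used`, and `mapper_allotment` fields (none of these names come from Driven; they are illustrative only), and reading "failed to execute in under 5 minutes" as "ran for 5 minutes or longer":

```python
from dataclasses import dataclass

# Hypothetical record shape; field names are illustrative, not Driven's schema.
@dataclass
class AppRun:
    name: str
    cluster: str
    duration_s: float
    mappers_used: int
    mapper_allotment: int

SLA_SECONDS = 5 * 60  # the 5-minute threshold mentioned on the slide

def over_sla(runs, cluster="production"):
    """Runs on the given cluster that took 5 minutes or longer."""
    return [r for r in runs if r.cluster == cluster and r.duration_s >= SLA_SECONDS]

def over_mapper_allotment(runs):
    """Runs that used more mappers than they were allotted."""
    return [r for r in runs if r.mappers_used > r.mapper_allotment]

# Made-up sample data for illustration.
runs = [
    AppRun("etl-daily", "production", 420.0, 40, 50),
    AppRun("ad-report", "production", 95.0, 80, 50),
    AppRun("qa-smoke", "qa", 900.0, 10, 50),
]

slow = [r.name for r in over_sla(runs)]
greedy = [r.name for r in over_mapper_allotment(runs)]
print(slow, greedy)
```

The thresholds and the record shape would come from whatever policy and metadata a team actually tracks; the point is only that both views reduce to simple predicates over run records.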
MANAGE APPS MORE EFFECTIVELY
MANAGE BIG DATA APPS MORE EFFECTIVELY
See how all apps consume resources as they run
Segment performance by team, department, or custom tag for role-based views, chargeback models, and capacity planning
View the performance of all apps owned by the DevOps team
[Screenshot segments: Marketing, Sales, Compliance, Data science team, QA cluster, Production cluster]
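
Segmenting by team, department, or custom tag for chargeback amounts to grouping resource usage by a label. A rough sketch, assuming hypothetical per-run records that carry a `tags` dict and a `cpu_hours` figure (both invented for illustration; Driven's actual data model is not shown in this deck):

```python
from collections import defaultdict

# Hypothetical usage records; keys and numbers are made up for illustration.
runs = [
    {"app": "clickstream", "tags": {"team": "DevOps", "cluster": "production"}, "cpu_hours": 12.0},
    {"app": "campaigns", "tags": {"team": "Marketing", "cluster": "production"}, "cpu_hours": 3.5},
    {"app": "deploy-check", "tags": {"team": "DevOps", "cluster": "qa"}, "cpu_hours": 0.5},
]

def usage_by(runs, tag):
    """Sum cpu_hours per value of the given tag; runs without it go under 'untagged'."""
    totals = defaultdict(float)
    for run in runs:
        totals[run["tags"].get(tag, "untagged")] += run["cpu_hours"]
    return dict(totals)

by_team = usage_by(runs, "team")
by_cluster = usage_by(runs, "cluster")
print(by_team)
print(by_cluster)
```

The same grouping function serves a per-team chargeback view and a per-cluster capacity view, which is why free-form tags are a convenient segmentation key.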
MANAGE HADOOP APPS FOR COMPLIANCE
Visualize Lineage – See exactly how each app ingests, manipulates
and outputs data
Further inspect lineage by detecting apps that write to, or read from, a
given dataset
[Pipeline diagram: SOURCES, OPERATIONS (functions, filters, joins, and aggregators), RESULTS]
For example, show all apps that interact with the dataset in “rain.txt”
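
A lineage lookup like this one can be thought of as a search over each app's declared sources and sinks. A minimal sketch, assuming a hypothetical mapping from app name to the datasets it reads and writes (the app names and every path other than `rain.txt` are invented for illustration):

```python
# Hypothetical lineage metadata: app name -> datasets read and written.
lineage = {
    "weather-ingest": {"reads": {"stations.csv"}, "writes": {"rain.txt"}},
    "rain-report": {"reads": {"rain.txt"}, "writes": {"report.tsv"}},
    "billing": {"reads": {"orders.csv"}, "writes": {"invoices.csv"}},
}

def apps_touching(dataset, lineage):
    """Return (readers, writers) of a dataset as sorted lists of app names."""
    readers = sorted(a for a, io in lineage.items() if dataset in io["reads"])
    writers = sorted(a for a, io in lineage.items() if dataset in io["writes"])
    return readers, writers

readers, writers = apps_touching("rain.txt", lineage)
print("reads rain.txt:", readers)
print("writes rain.txt:", writers)
```

Keeping readers and writers separate matters for compliance questions, since "who produced this data" and "who consumed it" usually have different owners.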
MANAGE HADOOP APPS WITH COLLABORATION
Create JIRA issues with views and data to collaborate quickly on resolving performance problems
Integrate alerts with popular notification platforms like HipChat, PagerDuty, and Nagios
With one click, create a JIRA issue with a link to this view
MANAGE HADOOP APPS WITH INTEGRATION
Automatically send app status notifications via webhooks or JMX
ARCHITECTURE / DEMO
DRIVEN ARCHITECTURE
End-to-end operational telemetry metadata for big data applications
Accessible via Web browser, command-line interface (CLI), or simple search queries
Easy integrations through JMX and upcoming Driven SDK
[Architecture diagram labels: HADOOP APPS AND INFRASTRUCTURE; HADOOP CLUSTERS; YARN; APPLICATIONS; Plugin; Telemetry metadata (SSL); WAR files; Web App; Server; Web, CLI, JMX]
DELIVERING OPERATIONAL EXCELLENCE
“The coolest part about Driven
is being able to visualize data
pipelines and inspect
components in real time for
easy troubleshooting and
optimization. I don't know of
any other tool that's close in
functionality.”

- Neville Li
Software Engineer, Spotify
“With Driven, it’s easy to see
how our apps use the data.
When there’s an exception,
Driven shows the history, so we
can learn exactly what went
wrong. That’s a huge time
saver.”

- Niels Boldt
Lead Software Engineer, Mojn
QUESTIONS
