Copyright © 2012 Splunk Inc.




Introducing Splunk –
The Big Data Engine
5th Big Data Usergroup Meeting
Zurich, 21.01.2012
Splunk – The Big Data Company
              Company (NASDAQ: SPLK)
                  Founded 2004, first software release in 2006
                  HQ: San Francisco / Regional HQ: London, Hong Kong
                  Over 600 employees, based in 12 countries
                  FY2012 revenue: $120 million; +83% year-over-year


              5,000+ Customers
                  Customers in over 80 countries
                  54 of the Fortune 100
                  Largest license: 100 Terabytes per day



              2
Over 3,000 Customers in 70+ Countries

Cloud and Online Services   Education        Energy and Utilities   Financial Services and Insurance




      Government            Healthcare         Manufacturing                    Media




         Retail             Technology       Telecommunications           Travel and Leisure

                                         4
Some Splunk Big Data Customers
Customer          Daily Data Volume

                        12 TB
                         6 TB
                         4 TB
                       1.2 TB
                      900 GB
                      800 GB
              5
Big Data Comes from Machines
           Volume | Velocity | Variety | Variability


                                                                      GPS,
 Machine-generated data is one of the                                RFID,
    fastest growing, most complex                               Hypervisor,
and most valuable segments of big data                        Web Servers,
                                                          Email, Messaging
                                                     Clickstreams, Mobile,
                                                Telephony, IVR, Databases,
                                             Sensors, Telematics, Storage,
                                      Servers, Security Devices, Desktops




                               6
Big Data Technologies
                                                Aster Data        Cassandra
                                                Greenplum         HBase
                                                                  MongoDB
                                                         Hadoop




 Single      Single           RDBMS               SQL &                  NoSQL
RDBMS       Bigger           Sharding           Map/Reduce
            RDBMS
                                                                        Map / Reduce


          Relational Database (highly structured)                 Key/Value, Tables or     Temporal, Unstructured
                                                                 Other (semi-structured)      Heterogeneous
                                                                                                                    Time
                                                             7
Splunk: the Platform for Machine Data
     Innovative, Easy to Use and Powerful

                             Ad hoc   Monitor     Report and      Custom      Developer
                             search   and alert    analyze      dashboards     Platform




           Data collection
            and indexing



                                      Splunk storage           Other Big Data stores




                                8
Apps and Solutions
Application       IT                                        Web          Business
                              Security     Compliance
Monitoring    Operations                                Intelligence     Analytics



   User Interface                        APIs                     SDK



                           Core Functions
   Access          Stats/
                                     Alerts         Reports            Dashboards
  Controls        Analytics



                                   Search

                                 Indexing

                                Collection

                                                                                     9
Scales to TBs/day and Thousands of Users
  Automatic load balancing linearly scales indexing
  Distributed search and MapReduce linearly scale search and reporting




                                             10
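The idea of scaling search with MapReduce can be sketched roughly as follows: each indexer computes a partial result over its own slice of the data (map), and a search head merges the partial results (reduce). This is a generic illustration under assumed names; the shard contents, `map_phase` and `reduce_phase` are made up for the example and do not describe Splunk's internal implementation.

```python
from collections import Counter
from functools import reduce

def map_phase(shard, keyword):
    """Runs on one indexer: count matching events per host in the local shard."""
    partial = Counter()
    for host, text in shard:
        if keyword in text:
            partial[host] += 1
    return partial

def reduce_phase(partials):
    """Runs on the search head: merge the per-indexer counts into one result."""
    return reduce(lambda left, right: left + right, partials, Counter())

# Hypothetical data: two indexers, each holding part of the events.
SHARDS = [
    [("web1", "ERROR timeout"), ("web2", "INFO ok")],
    [("web1", "ERROR disk full"), ("web2", "ERROR timeout")],
]

RESULT = reduce_phase(map_phase(shard, "ERROR") for shard in SHARDS)
```

Because each shard is processed independently, adding indexers spreads the map work, which is the intuition behind the scaling claim on this slide.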
What Does Machine Data Look Like?
  Sources

Order Processing



  Middleware
     Error




    Care IVR




    Twitter


                               11
Machine Data Contains Critical Insights
  Sources                                   Customer ID    Order ID            Product ID


Order Processing

                                                            Order ID    Customer ID
  Middleware
     Error

                   Time Waiting On Hold


    Care IVR
                                    Customer ID


                                                       Twitter ID      Customer’s Tweet


    Twitter
                     Company’s Twitter ID
                                                  12
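A minimal sketch of how the shared fields on this slide could tie the four sources together. All events below are invented for illustration, and the lookup from customer ID to Twitter ID is an assumption: the slide does not say how the two are linked.

```python
# Made-up events, one list per source shown on the slide.
ORDERS = [{"customer_id": "C42", "order_id": "O1001", "product_id": "P7"}]
MIDDLEWARE_ERRORS = [{"order_id": "O1001", "customer_id": "C42", "error": "timeout"}]
IVR_CALLS = [{"customer_id": "C42", "seconds_on_hold": 540}]
TWEETS = [{"twitter_id": "@example_user", "text": "Order failed, then 9 minutes on hold"}]

# Hypothetical lookup table linking a customer ID to a Twitter ID.
TWITTER_IDS = {"C42": "@example_user"}

def customer_journey(customer_id):
    """Collect everything known about one customer across the four sources."""
    handle = TWITTER_IDS.get(customer_id)
    return {
        "orders": [e for e in ORDERS if e["customer_id"] == customer_id],
        "errors": [e for e in MIDDLEWARE_ERRORS if e["customer_id"] == customer_id],
        "calls": [e for e in IVR_CALLS if e["customer_id"] == customer_id],
        "tweets": [e for e in TWEETS if e["twitter_id"] == handle],
    }

JOURNEY = customer_journey("C42")
```

Here `JOURNEY` groups the order, the middleware error, the hold time and the tweet under one customer, which is the kind of cross-source insight the slide points at.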
What do we do? Collect and index Machine Data

Customer                                                                                                                 Outside the
Facing Data                                                                                                              Datacenter
  Click-stream data                                                                                                       Manufacturing,
  Shopping cart data                                                                                                      logistics…
  Online transaction data                                                                                                 CDRs & IPDRs
                                                                                                                          Power consumption
                                Logfiles   Configs Messages   Traps         Metrics   Scripts     Changes   Tickets       RFID data
                                                              Alerts                                                      GPS data




 Windows                    Linux/Unix           Virtualization              Applications             Databases           Networking
   Registry                  Configurations      & Cloud                       Web logs                 Configurations      Configurations
   Event logs                syslog                Hypervisor                  Log4J, JMS, JMX          Audit/query         syslog
   File system               File system           Guest OS, Apps              .NET events              logs                SNMP
   sysinternals              ps, iostat, top       Cloud                       Code and scripts         Tables              netflow
                                                                                                        Schemas



                                                                       13
What do we do? Collect and index Machine Data

  • Any amount, any location, any source.
  • No upfront schema
  • No custom connectors
  • No RDBMS
  • No need to filter/forward

                                                                           14
Inside Universal Indexing

                                          Automatic event boundary identification



Automatic timestamp normalization




 ...enable accurate searching and
 trending by time across all data:


                                     15
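As a rough illustration of the two steps named on this slide, the sketch below starts a new event wherever a line begins with a recognizable timestamp and converts that timestamp to UTC epoch seconds. The two timestamp formats, the sample log text and the function names are assumptions made for this example; it does not show how Splunk implements either step.

```python
import re
from datetime import datetime, timezone

# Hypothetical timestamp formats; real data would need many more.
FORMATS = [
    (re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"), "%Y-%m-%d %H:%M:%S"),
    (re.compile(r"^\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}"), "%d/%b/%Y:%H:%M:%S"),
]

def parse_timestamp(line):
    """Return UTC epoch seconds if the line starts with a known timestamp, else None."""
    for pattern, fmt in FORMATS:
        match = pattern.match(line)
        if match:
            parsed = datetime.strptime(match.group(0), fmt)
            # Assumption: timestamps without a zone are treated as UTC.
            return int(parsed.replace(tzinfo=timezone.utc).timestamp())
    return None

def split_events(raw):
    """Group lines into (epoch, text) events; lines without a timestamp
    are treated as continuations of the previous event."""
    events = []
    for line in raw.splitlines():
        epoch = parse_timestamp(line)
        if epoch is not None:
            events.append([epoch, line])
        elif events:
            events[-1][1] += "\n" + line
    return [(epoch, text) for epoch, text in events]

RAW = """2012-01-21 10:00:00 ERROR order failed
  at OrderService.process
21/Jan/2012:10:00:05 GET /cart 200"""

EVENTS = split_events(RAW)
```

With both formats reduced to the same epoch scale, events from different sources can be sorted and compared on one timeline.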
Inside Universal Indexing
                                      Segmentation & dense
                                      indexing of every term




     ...enable Boolean search on
    anything in the original event:




                     16
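The link between indexing every term and Boolean search can be sketched with a small inverted index: each term maps to the set of events that contain it, and AND, OR and NOT become set operations. The segmentation rule (split on any non-alphanumeric character), the sample events and the function names are assumptions for illustration, not Splunk's actual segmenter.

```python
import re
from collections import defaultdict

def segment(event):
    """Split an event into lowercase terms on any non-alphanumeric character."""
    return {term for term in re.split(r"[^A-Za-z0-9]+", event.lower()) if term}

def build_index(events):
    """Map every term to the set of event ids that contain it."""
    index = defaultdict(set)
    for event_id, event in enumerate(events):
        for term in segment(event):
            index[term].add(event_id)
    return index

def search_and(index, *terms):
    """Events containing all of the terms."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()

def search_or(index, *terms):
    """Events containing at least one of the terms."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))

def search_not(index, total, term):
    """Events that do not contain the term."""
    return set(range(total)) - index.get(term.lower(), set())

# Hypothetical raw events.
EVENTS = [
    "10.2.1.44 GET /cart status=500",
    "10.2.1.44 GET /home status=200",
    "sshd failed password for root from 10.9.9.9",
]
INDEX = build_index(EVENTS)
```

For example, `search_and(INDEX, "get", "500")` selects only the first event, without any field having been defined up front.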
Inside Search-time Knowledge Extraction
              Automatically discovered fields
                                                     And user-defined fields




... enable statistics and precise search on
               specific fields:




                                                17
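A minimal sketch of search-time extraction: the raw events stay untouched, and fields are derived only when a search runs. Here `key=value` pairs stand in for automatically discovered fields and a regular expression stands in for a user-defined field. The patterns, field names and sample events are assumptions for illustration only.

```python
import re
from collections import Counter

# Automatically discovered fields: any key=value pair in the raw text.
KV_PATTERN = re.compile(r"(\w+)=(\S+)")

# User-defined field (hypothetical): an IPv4 address at the start of the event.
USER_FIELDS = {"client_ip": re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})")}

def extract(event):
    """Derive fields from a raw event at search time."""
    fields = dict(KV_PATTERN.findall(event))
    for name, pattern in USER_FIELDS.items():
        match = pattern.search(event)
        if match:
            fields[name] = match.group(1)
    return fields

def count_by(events, field):
    """Count events per value of one extracted field."""
    counts = Counter()
    for event in events:
        fields = extract(event)
        if field in fields:
            counts[fields[field]] += 1
    return counts

# Hypothetical raw events.
EVENTS = [
    "10.2.1.44 action=purchase status=200 product_id=A17",
    "10.2.1.44 action=purchase status=500 product_id=B02",
    "10.7.0.3 action=view status=200 product_id=A17",
]

STATUS_COUNTS = count_by(EVENTS, "status")
IP_COUNTS = count_by(EVENTS, "client_ip")
```

Because extraction happens at search time, adding a new entry to `USER_FIELDS` changes what can be queried without re-indexing the stored events.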
New Approach to Heterogeneous Data
  Universal Indexing      Search-time Knowledge          Flexibility and
                                                       Fast Time to Value

• No data normalization   • Knowledge applied at      • Normalization as it’s
• Automatically handles     search-time                 needed
  timestamps              • No brittle schema to      • Faster implementation
• Parsers not required      work around               • Easy search language
• Index every term &      • Multiple views into the   • Multiple views into the
  pattern “blindly”         same data                   same data
• No attempt to           • Splunk helps find
  “understand” up front     transactions, patterns
                            and trends


                                      18
Splunk Used Across IT and the Business
                                Application
                               Management

                                Operations
                               Management

                                 Security &
                                Compliance

                                 Web and
                             Business Analytics



                  19
Provides Strong Machine Data Governance
   Provides comprehensive controls for data security, retention and integrity
   Single sign-on integration enables pass-through authentication of user credentials




                                         20
Splunk Big Data Strategy
Deliver ease of use, real-time analytics and enterprise capabilities
                                                                         Ad hoc
                                                                         search


                                                                        Monitor
                                                                        and alert

            Data collection
                                                                       Report and
             and indexing                                               analyze


                                Splunk storage
                                                           Other
                                                                         Custom
                                                           Stores      dashboards


                                                                       Developer
                                                                        Platform

                                   21
Deploying New Technologies is a Challenge




                    22
Splunk-Hadoop: Co-existence use cases
                                                          Real-time Analytics
      Side by Side
                                                          ETL / recommendation
                                                                  system


Splunk in front of Hadoop
                            Collect, Visualize, Report              ETL, Archival, Long Running
                                                                              Queries




   Splunk visualize and
   secure Hadoop Data
                                                         } Combine
                            Splunk Index Hadoop Data
Splunk: Enabling the Big Data Ecosystem
  Real-time       Dashboards,
Collection and      Reports,
   Analysis      Access Controls


                                        Splunk Hadoop Connect
                                        • Reliable Data Export
                                        • Import Hadoop Data
                                        Splunk App for HadoopOps
                                        • End-to-end monitoring,
                                          troubleshooting, analysis of
                                          Hadoop environment

                                   24
Splunk Hadoop Connect

               Delivers reliable integration
               between Splunk and Hadoop
                 Export events to Hadoop
                 Explore and Browse Hadoop
                 directories
                 Import and Index Hadoop data
                 into Splunk



          25
Splunk App for HadoopOps
Monitoring the full Hadoop environment – Hadoop, Switch, OS, AS, and Database

     Splunk HadoopOps Forwarder Package on every host

     Add Knowledge | Collect & Index Data | Distributed Search | Monitor & Alert | Rich UI Framework

     Splunk HadoopOps Dashboards, alerts and notifications, powered by Splunk search




               Host




         Operating System


           Infrastructure



                                                                  26
Splunk and Big Data

Product-based                   Integrated and                Performance
   Solution                       End-to-end                    at scale
Easy to download and           Collects data from tens of    Proven at multi-terabyte
deploy                         thousands of sources          scale per day
Pre-integrated, end-to-        Advanced real-time and        Upwards of PB under
end functionality              historical analysis of data   management
Enterprise-grade features      Fast, custom visualizations   Thousands of enterprise
                               for IT and business users     customers
                               Developer API, SDKs



                                           27
Thank You
