Skip to content

werowe/sparkStreamingScala

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 

Repository files navigation

This Scala Spark Streaming program reads streaming CSV price data from files, transforms them into DataFrames, and performs live analytics using SQL. Here’s a step-by-step explanation:


1. Spark Streaming Setup

  • Imports Spark SQL and streaming libraries.
  • Creates a Spark Streaming context (ssc) that processes new data every 5 seconds.
  • Sets up a stream to watch the /tmp/amazon/ directory for new text files (each file assumed to contain price data).

2. Data Modeling

  • Defines a Scala case class Prices with two fields: price and time, representing a price value and its timestamp (both as strings).

3. Parsing and DataFrame Generation

  • For each RDD (batch) of lines:
    • Splits each line by commas (assumes format "price,time").
    • Converts the split arrays to Prices case class objects.
    • Converts the collection of Prices into a Spark SQL DataFrame.
    • Registers the DataFrame as a temporary table named "prices".
    • Shows the contents of the DataFrame with .show().

4. SQL Analytics

  • Executes a SQL query (select count(*) from prices) to count how many price entries exist in the current batch.
  • Prints the count to the console.

5. Streaming Output

  • Also prints all incoming lines directly with lines.print().
  • Starts the streaming context to begin processing and continues until stopped.

Summary

  • Purpose: Performs real-time streaming ingestion, transformation, and SQL-based analysis of Amazon-style (price, time) CSV files dropped in a local directory.
  • Features: Live monitoring, immediate conversion of each batch of records into a DataFrame, on-the-fly SQL, and streaming batch output.
  • Use Case: Useful for ingesting live price or log feeds into Spark for ETL, visualization, or monitoring dashboards.

This workflow demonstrates how to go from live file data to structured SQL analysis in Spark Streaming, using minimal code and strong type safety.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages