GitHub - werowe/sparkStreamingScala

This Scala Spark Streaming program reads streaming CSV price data from files, transforms them into DataFrames, and performs live analytics using SQL. Here’s a step-by-step explanation:

1. Spark Streaming Setup

Imports Spark SQL and streaming libraries.
Creates a Spark Streaming context (ssc) that processes new data every 5 seconds.
Sets up a stream to watch the /tmp/amazon/ directory for new text files (each file assumed to contain price data).

2. Data Modeling

Defines a Scala case class Prices with two fields: price and time, representing a price value and its timestamp (both as strings).

3. Parsing and DataFrame Generation

For each RDD (batch) of lines:
- Splits each line by commas (assumes format "price,time").
- Converts the split arrays to Prices case class objects.
- Converts the collection of Prices into a Spark SQL DataFrame.
- Registers the DataFrame as a temporary table named "prices".
- Shows the contents of the DataFrame with .show().

4. SQL Analytics

Executes a SQL query (select count(*) from prices) to count how many price entries exist in the current batch.
Prints the count to the console.

5. Streaming Output

Also prints all incoming lines directly with lines.print().
Starts the streaming context to begin processing and continues until stopped.

Summary

Purpose: Performs real-time streaming ingestion, transformation, and SQL-based analysis of Amazon-style (price, time) CSV files dropped in a local directory.
Features: Live monitoring, immediate conversion of each batch of records into a DataFrame, on-the-fly SQL, and streaming batch output.
Use Case: Useful for ingesting live price or log feeds into Spark for ETL, visualization, or monitoring dashboards.

This workflow demonstrates how to go from live file data to structured SQL analysis in Spark Streaming, using minimal code and strong type safety.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
README.md		README.md
stockStreamingMonitor.scala		stockStreamingMonitor.scala

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

1. Spark Streaming Setup

2. Data Modeling

3. Parsing and DataFrame Generation

4. SQL Analytics

5. Streaming Output

Summary

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

1. Spark Streaming Setup

2. Data Modeling

3. Parsing and DataFrame Generation

4. SQL Analytics

5. Streaming Output

Summary

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages