r-siddiq/Markov-Text-Generation


Markov Text Generation (Java)

This repository showcases a Java implementation of a simple Markov-chain text generator. It ingests a plain-text corpus, learns word transitions (using a first‑order Markov model), and generates whimsical new sentences that statistically resemble the source text. For sample I/O traces and additional information, visit the project page:
https://www.rsiddiq.com/software-design.html

Why it matters: Markov chains are a foundational concept in probabilistic language modeling. This project demonstrates data ingestion, frequency‑based modeling, and random sampling, all core ideas that carry over to modern NLP workflows.

Highlights

  • Clean Java implementation of a first‑order Markov chain for text generation
  • Corpus‑driven modeling: preserves frequency by storing duplicate successors
  • Sentence bootstrapping: seeds generation from a special BEGINS_SENTENCE key
  • Robust parsing: ignores blank lines, handles punctuation end‑markers
  • Small API surface: addFromFile, addLine, addWord, getSentence, randomWord, endsWithPunctuation
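
To illustrate why storing duplicate successors preserves frequency, here is a minimal sketch. The class SuccessorFrequencyDemo and its method names are hypothetical and not part of the repository; the only idea taken from the project is that each token maps to a list of followers with duplicates kept.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; not part of the repository.
class SuccessorFrequencyDemo {

    // Builds token -> list of observed followers, keeping duplicates.
    static Map<String, List<String>> buildSuccessors(String text) {
        Map<String, List<String>> successors = new HashMap<>();
        String[] tokens = text.trim().split("\\s+");
        for (int i = 0; i + 1 < tokens.length; i++) {
            successors.computeIfAbsent(tokens[i], k -> new ArrayList<>())
                      .add(tokens[i + 1]);
        }
        return successors;
    }

    // Share of the follower list equal to `next`. A uniform random pick
    // from the list returns `next` with exactly this probability.
    static double transitionProbability(Map<String, List<String>> successors,
                                        String word, String next) {
        List<String> followers = successors.get(word);
        if (followers == null || followers.isEmpty()) {
            return 0.0;
        }
        long hits = followers.stream().filter(next::equals).count();
        return (double) hits / followers.size();
    }

    public static void main(String[] args) {
        Map<String, List<String>> s =
                buildSuccessors("the cat sat the cat ran the dog sat");
        System.out.println(s.get("the")); // prints [cat, cat, dog]
        System.out.println(transitionProbability(s, "the", "cat"));
    }
}
```

Because "cat" appears twice among the three followers of "the", a uniform pick from that list chooses "cat" two times out of three on average, so no separate count table is needed.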

Repository Structure

├── Markov-Text-Generation/
│   ├── app/
│   │   ├── src/
│   │   │   ├── main/
│   │   │   │   ├── java/
│   │   │   │   │   ├── Main.java
│   │   │   │   │   └── Markov.java
│   │   │   ├── test/
│   │   │   │   ├── java/
│   │   │   │   │   └── MarkovTest.java
│   │   ├── build.gradle
│   │   ├── azkaban.txt
│   │   ├── cloudy.txt
│   │   ├── hamlet.txt
│   │   ├── phrases.txt
│   │   └── spam.txt
│   ├── gradle/
│   │   ├── wrapper/
│   │   │   ├── gradle-wrapper.jar
│   │   │   └── gradle-wrapper.properties
│   │   └── libs.versions.toml
│   ├── .gitattributes
│   ├── .gitignore
│   ├── gradle.properties
│   ├── gradlew
│   ├── gradlew.bat
│   └── settings.gradle

Quick Start

  1. Compile (class files go to out/)
    javac -d out Markov-Text-Generation/app/src/main/java/Main.java Markov-Text-Generation/app/src/main/java/Markov.java Markov-Text-Generation/app/src/test/java/MarkovTest.java
  2. Run (example driver)
    java -cp out Main

Tip: Try different corpora (e.g., spam.txt, hamlet.txt) to see how style and vocabulary affect generated sentences.

Design Overview

  • State: HashMap<String, ArrayList<String>> words maps each token to the multiset of followers.
  • Frequency proxy: Duplicates are stored to preserve empirical transition probabilities.
  • Sentence logic: Start from BEGINS_SENTENCE; stop when a token ends with punctuation in PUNCTUATION_MARKS (e.g., . ! ?).
  • Error handling: File I/O is guarded; punctuation checks are isolated in endsWithPunctuation(String) for testability.
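
The design points above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's Markov.java: the class name MarkovSketch, the sentinel value, the contents of PUNCTUATION_MARKS, the seeded constructor, and the MAX_WORDS safety cap are all hypothetical, and file loading is left out to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Random;

// Hypothetical sketch; the repository's Markov.java may differ in detail.
class MarkovSketch {
    static final String BEGINS_SENTENCE = "__$";   // assumed sentinel value
    static final String PUNCTUATION_MARKS = ".!?"; // assumed contents
    static final int MAX_WORDS = 1000;             // assumed cap against cycles

    private final HashMap<String, ArrayList<String>> words = new HashMap<>();
    private final Random random;
    private String prevWord = BEGINS_SENTENCE;

    MarkovSketch(long seed) {
        random = new Random(seed);
        words.put(BEGINS_SENTENCE, new ArrayList<>());
    }

    // Splits a line on whitespace and records each token; blank lines are ignored.
    void addLine(String line) {
        if (line == null || line.trim().isEmpty()) {
            return;
        }
        for (String token : line.trim().split("\\s+")) {
            addWord(token);
        }
    }

    // Appends `word` to the followers of the previous token (duplicates kept),
    // then resets to BEGINS_SENTENCE if `word` closes a sentence.
    void addWord(String word) {
        words.computeIfAbsent(prevWord, k -> new ArrayList<>()).add(word);
        prevWord = endsWithPunctuation(word) ? BEGINS_SENTENCE : word;
    }

    // Walks the chain from BEGINS_SENTENCE until a token ends with punctuation,
    // a token has no followers, or MAX_WORDS is reached.
    String getSentence() {
        StringBuilder sentence = new StringBuilder();
        String current = randomWord(BEGINS_SENTENCE);
        int count = 0;
        while (current != null && count++ < MAX_WORDS) {
            if (sentence.length() > 0) {
                sentence.append(' ');
            }
            sentence.append(current);
            if (endsWithPunctuation(current)) {
                break;
            }
            current = randomWord(current);
        }
        return sentence.toString();
    }

    // Uniform pick from the follower list, or null if there are no followers.
    String randomWord(String word) {
        ArrayList<String> followers = words.get(word);
        if (followers == null || followers.isEmpty()) {
            return null;
        }
        return followers.get(random.nextInt(followers.size()));
    }

    static boolean endsWithPunctuation(String word) {
        if (word == null || word.isEmpty()) {
            return false;
        }
        return PUNCTUATION_MARKS.indexOf(word.charAt(word.length() - 1)) >= 0;
    }

    public static void main(String[] args) {
        MarkovSketch m = new MarkovSketch(42L);
        m.addLine("Hello World.");
        System.out.println(m.getSentence()); // prints Hello World.
    }
}
```

With a one-sentence corpus every follower list has a single entry, so the output is fixed regardless of the seed. Larger corpora produce varied sentences because randomWord samples from lists with several, possibly repeated, followers.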

This mirrors a classic teaching pattern: simple data structures + clear APIs + randomized outputs for quick visual feedback.

Testing Ideas

  • Verify that the constructor initializes words with the BEGINS_SENTENCE key
  • Unit-test addFromFile on tiny corpora (e.g., Hello World.)
  • Assert getSentence() returns a non-empty string ending with punctuation
  • Validate endsWithPunctuation for positive and negative cases

What This Demonstrates Professionally

  • Ability to translate a written specification into a maintainable Java implementation
  • Competence with collections (HashMap, ArrayList) and file I/O
  • Sensible API design enabling unit testing and reuse
  • Clear documentation and repository hygiene

✅ License / Usage

Feel free to use, modify, and extend for learning and portfolio purposes.

About

Java Gradle project implementing Markov chain text generation, combining file I/O for input, collections for transitions, and randomness to generate probabilistic, varied output text.
