This repository showcases a Java implementation of a simple Markov-chain text generator. It ingests a plain-text corpus, learns word transitions (using a first‑order Markov model), and generates whimsical new sentences that statistically resemble the source text. For sample I/O traces and additional information, visit the project page:
https://www.rsiddiq.com/software-design.html
Why it matters: Markov chains are a foundational concept in probabilistic language modeling. This project demonstrates data ingestion, frequency‑based modeling, and random sampling, core ideas that carry over to modern NLP workflows.
- Clean Java implementation of a first‑order Markov chain for text generation
- Corpus‑driven modeling: preserves transition frequencies by storing duplicate successors
- Sentence bootstrapping: seeds generation from a special `BEGINS_SENTENCE` key
- Robust parsing: ignores blank lines and treats words ending in punctuation as sentence end‑markers
- Small, well-defined API surface: `addFromFile`, `addLine`, `addWord`, `getSentence`, `randomWord`, `endsWithPunctuation`
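Why do duplicate successors preserve frequencies? If a follower list keeps one entry per observed transition, a uniform pick from that list selects successor w′ of word w with probability count(w → w′) / count(w → ·), which is exactly the empirical first‑order estimate. The sketch below demonstrates this with a made-up follower list; the class and method names (`DuplicateSuccessorDemo`, `sampleFollower`, `tally`) are hypothetical and not part of the repository.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/**
 * Hypothetical demo: a follower list with duplicates acts as a frequency table.
 * The follower list and seeds are illustrative assumptions.
 */
final class DuplicateSuccessorDemo {

    /** Picks one follower uniformly; repeated entries are proportionally likelier. */
    static String sampleFollower(List<String> followers, Random rng) {
        return followers.get(rng.nextInt(followers.size()));
    }

    /** Draws `trials` samples with a seeded RNG and counts how often each follower appears. */
    static Map<String, Integer> tally(List<String> followers, int trials, long seed) {
        Random rng = new Random(seed);
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < trials; i++) {
            counts.merge(sampleFollower(followers, rng), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Suppose the corpus contained "the cat", "the cat", and "the dog":
        // the follower list of "the" then holds "cat" twice and "dog" once.
        List<String> followersOfThe = new ArrayList<>(List.of("cat", "cat", "dog"));
        Map<String, Integer> counts = tally(followersOfThe, 30_000, 42L);
        // "cat" should come up roughly twice as often as "dog".
        System.out.println(counts);
    }
}
```

A list with duplicates costs more memory than a map of counts, but it keeps sampling to a single `nextInt` call, which suits a teaching implementation.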
```
├── Markov-Text-Generation/
│ ├── app/
│ │ ├── src/
│ │ │ ├── main/
│ │ │ │ ├── java/
│ │ │ │ │ ├── Main.java
│ │ │ │ │ └── Markov.java
│ │ │ ├── test/
│ │ │ │ ├── java/
│ │ │ │ │ └── MarkovTest.java
│ │ ├── build.gradle
│ │ ├── azkaban.txt
│ │ ├── cloudy.txt
│ │ ├── hamlet.txt
│ │ ├── phrases.txt
│ │ └── spam.txt
│ ├── gradle/
│ │ ├── wrapper/
│ │ │ ├── gradle-wrapper.jar
│ │ │ └── gradle-wrapper.properties
│ │ └── libs.versions.toml
│ ├── .gitattributes
│ ├── .gitignore
│ ├── gradle.properties
│ ├── gradlew
│ ├── gradlew.bat
│ └── settings.gradle
```
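- Compile the driver and model from `Markov-Text-Generation/app/`, where the sample corpora live (`MarkovTest.java` is left out because unit tests typically need a test framework such as JUnit on the classpath)
```bash
javac -d out src/main/java/Main.java src/main/java/Markov.java
```
- Run (example driver)
```bash
java -cp out Main
```
- Test (through the Gradle wrapper, from `Markov-Text-Generation/`)
```bash
./gradlew test
```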
Tip: Try different corpora (e.g., `spam.txt`, `hamlet.txt`) to see how style and vocabulary affect the generated sentences.
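Switching corpora only changes which file is ingested. The following is a minimal sketch of the ingestion step described under "Robust parsing" and "Error handling", assuming a UTF-8 text file: it skips blank lines and reports I/O errors instead of crashing. The names `CorpusLoader` and `readNonBlankLines` are hypothetical; this is not the repository's `addFromFile`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of corpus ingestion with blank-line skipping and guarded I/O. */
final class CorpusLoader {

    /**
     * Returns the trimmed, non-blank lines of a UTF-8 text file.
     * On I/O failure it prints the error and returns an empty list.
     */
    static List<String> readNonBlankLines(Path corpus) {
        List<String> lines = new ArrayList<>();
        try {
            for (String line : Files.readAllLines(corpus)) {
                String trimmed = line.strip();
                if (!trimmed.isEmpty()) {
                    lines.add(trimmed); // keep only lines that carry words
                }
            }
        } catch (IOException e) {
            System.err.println("Could not read " + corpus + ": " + e.getMessage());
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a tiny throwaway corpus with blank lines in the middle.
        Path tmp = Files.createTempFile("corpus", ".txt");
        Files.writeString(tmp, "Hello World.\n\n   \nTo be or not to be?\n");
        System.out.println(readNonBlankLines(tmp)); // prints [Hello World., To be or not to be?]
        Files.delete(tmp);
    }
}
```

Each returned line would then be handed to a per-line tokenizer such as `addLine`.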
- State: `HashMap<String, ArrayList<String>> words` maps each token to the multiset of its followers.
- Frequency proxy: duplicates are stored to preserve empirical transition probabilities.
- Sentence logic: start from `BEGINS_SENTENCE`; stop when a token ends with a mark in `PUNCTUATION_MARKS` (e.g., `.`, `!`, `?`).
- Error handling: file I/O is guarded, and the punctuation check is isolated in `endsWithPunctuation(String)` for testability.
This mirrors a classic teaching pattern: simple data structures + clear APIs + randomized outputs for quick visual feedback.
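To make the design above concrete, here is a compact, self-contained sketch of a first‑order generator built on the same pieces. It reuses the README's names (`words`, `BEGINS_SENTENCE`, `PUNCTUATION_MARKS`, `addLine`, `addWord`, `randomWord`, `getSentence`, `endsWithPunctuation`), but the signatures, the sentinel value `"__$"`, the mark set `".!?"`, the seeded constructor, and the `prevWord` cursor are all assumptions, so read it as an illustration rather than the repository's `Markov.java`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical first-order Markov text generator sketch.
 * Names follow the README; signatures and constant values are assumed.
 */
final class MarkovSketch {
    static final String BEGINS_SENTENCE = "__$";   // assumed sentinel value
    static final String PUNCTUATION_MARKS = ".!?"; // assumed end-of-sentence marks

    private final HashMap<String, ArrayList<String>> words = new HashMap<>();
    private final Random rng;
    private String prevWord = BEGINS_SENTENCE;     // hypothetical cursor for addWord

    MarkovSketch(long seed) {
        rng = new Random(seed);
        words.put(BEGINS_SENTENCE, new ArrayList<>()); // start key exists from the outset
    }

    /** Splits a line on whitespace and records each word; blank lines are ignored. */
    void addLine(String line) {
        if (line.isBlank()) {
            return;
        }
        for (String word : line.trim().split("\\s+")) {
            addWord(word);
        }
    }

    /** Appends word to prevWord's followers (duplicates kept); sentence ends reset to the start key. */
    void addWord(String word) {
        words.computeIfAbsent(prevWord, k -> new ArrayList<>()).add(word);
        prevWord = endsWithPunctuation(word) ? BEGINS_SENTENCE : word;
    }

    /** Uniform pick among recorded followers, or null if the word has none. */
    String randomWord(String word) {
        List<String> followers = words.get(word);
        if (followers == null || followers.isEmpty()) {
            return null; // dead end, e.g. an unpunctuated final word
        }
        return followers.get(rng.nextInt(followers.size()));
    }

    /** True if the word's last character is one of PUNCTUATION_MARKS. */
    static boolean endsWithPunctuation(String word) {
        return !word.isEmpty()
                && PUNCTUATION_MARKS.indexOf(word.charAt(word.length() - 1)) >= 0;
    }

    /** Walks the chain from BEGINS_SENTENCE until a word ends with punctuation or no follower exists. */
    String getSentence() {
        List<String> out = new ArrayList<>();
        String current = randomWord(BEGINS_SENTENCE);
        while (current != null) {
            out.add(current);
            if (endsWithPunctuation(current)) {
                break;
            }
            current = randomWord(current);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        MarkovSketch markov = new MarkovSketch(1L);
        markov.addLine("The cat sat on the mat.");
        markov.addLine("");
        markov.addLine("The dog sat on the cat!");
        System.out.println(markov.getSentence());
    }
}
```

With this two-line corpus every generated sentence has the shape "The (cat|dog) sat on the (mat.|cat!)", because tokens are case-sensitive and "mat." and "cat!" are the only sentence enders. Returning `null` from `randomWord` is one way to avoid an exception when a corpus ends without punctuation; the repository may handle that case differently.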
- Verify the constructor initializes `words` with the `BEGINS_SENTENCE` key
- Unit-test `addFromFile` on tiny corpora (e.g., `Hello World.`)
- Assert `getSentence()` returns a non-empty string ending with punctuation
- Validate `endsWithPunctuation` for positive and negative cases
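The last check lends itself to a table of positive and negative inputs. The sketch below writes those cases as plain-Java assertions so they run with only the JDK; `PunctuationChecks`, its `check` helper, and the `".!?"` mark set are hypothetical stand-ins, and the repository's own cases live in `MarkovTest.java`.

```java
/**
 * Hypothetical plain-Java checks for an endsWithPunctuation helper.
 * The mark set is an assumption; no test framework is required.
 */
final class PunctuationChecks {
    static final String PUNCTUATION_MARKS = ".!?"; // assumed end-of-sentence marks

    /** True if the word is non-null, non-empty, and ends with one of PUNCTUATION_MARKS. */
    static boolean endsWithPunctuation(String word) {
        return word != null && !word.isEmpty()
                && PUNCTUATION_MARKS.indexOf(word.charAt(word.length() - 1)) >= 0;
    }

    /** Throws an AssertionError naming the input when the result differs from expected. */
    static void check(String input, boolean expected) {
        boolean actual = endsWithPunctuation(input);
        if (actual != expected) {
            throw new AssertionError("endsWithPunctuation(\"" + input + "\") = " + actual);
        }
    }

    public static void main(String[] args) {
        // Positive cases: one per assumed end-of-sentence mark.
        check("World.", true);
        check("Really?", true);
        check("Wow!", true);
        // Negative cases: plain word, trailing comma, inner period, empty string.
        check("Hello", false);
        check("however,", false);
        check("e.g", false);
        check("", false);
        System.out.println("All punctuation checks passed.");
    }
}
```

Inner periods such as in "e.g" are a useful negative case because only the last character decides the result.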
- Ability to translate a written specification into a maintainable Java implementation
- Competence with collections (`HashMap`, `ArrayList`) and file I/O
- Sensible API design enabling unit testing and reuse
- Clear documentation and repository hygiene
Feel free to use, modify, and extend for learning and portfolio purposes.