
🛰️ Disaster Tweets NLP – Model Benchmarking

End-to-end NLP text classification on Kaggle “Disaster Tweets” — baseline ML + multiple deep learning architectures + transfer learning (USE) in a single reproducible notebook



Built with the following tools and technologies:

Python | TensorFlow / Keras | TensorFlow Hub (USE) | scikit-learn | pandas | NumPy | Matplotlib | Jupyter Notebook

Screenshot

Disaster Tweets NLP notebook preview



Overview

Disaster Tweets NLP – Model Benchmarking is a portfolio-style NLP project built from a learning notebook and upgraded into a clean, reproducible mini-project.

The goal is to classify tweets into:

  • Real disaster (target = 1)
  • Not a disaster (target = 0)

This repository is intentionally structured like a mini case study:

  • Start with a strong classical ML baseline (fast and competitive)
  • Add multiple neural architectures (Dense, RNNs, CNN)
  • Add transfer learning using Universal Sentence Encoder (USE) from TensorFlow Hub
  • Compare everything using the same evaluation metrics

Problem Statement

Tweets are short, noisy, and often ambiguous.

Examples:

  • “Forest fire near La Ronge Sask. Canada” → likely a real disaster
  • “This exam destroyed me 😭” → contains “disaster-like” words but not a real disaster

The challenge is to build models that learn context rather than reacting to keywords alone.


Dataset

This project uses Kaggle’s “Natural Language Processing with Disaster Tweets” (NLP Getting Started) dataset.

Typical columns:

  • id: unique tweet id
  • keyword: (optional) keyword for the tweet
  • location: (optional) user location
  • text: tweet text (main input)
  • target: label (only in train.csv) where 1 = disaster, 0 = not disaster

Important repo note:

  • For licensing and repository-hygiene reasons, do not commit the raw Kaggle dataset to GitHub.
  • Instead, place it locally (see Dataset Setup).

Project Highlights

  • End-to-end workflow in one notebook: data → split → vectorization → multiple models → evaluation → comparison
  • Fair comparison:
    • consistent train/validation split
    • consistent evaluation metrics
  • Covers both classic NLP and modern embedding-based workflows
  • Includes reusable helper utilities (TensorBoard callback + plotting + metrics)

Approach

Data Pipeline

  1. Load dataset from CSV files
  2. Inspect class balance and sample texts
  3. Split into training and validation sets
  4. Prepare text for deep learning models using:
    • TextVectorization layer (strings → integer token sequences)
    • Embedding layer (token IDs → dense vectors)
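The pipeline above can be sketched as follows on a toy corpus (the vocabulary size, sequence length, and embedding dimension are illustrative assumptions, not the notebook’s actual values):

```python
import tensorflow as tf
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split

# Toy stand-ins for the Kaggle train split.
sentences = ["Forest fire near La Ronge Sask. Canada",
             "This exam destroyed me",
             "Earthquake hits the city centre",
             "What a lovely sunny day"]
labels = [1, 0, 1, 0]

# Step 3: split into training and validation sets.
train_sentences, val_sentences, train_labels, val_labels = train_test_split(
    sentences, labels, test_size=0.5, random_state=42)

# Step 4a: strings -> integer token sequences.
text_vectorizer = layers.TextVectorization(
    max_tokens=10000, output_mode="int", output_sequence_length=15)
text_vectorizer.adapt(train_sentences)  # learn the vocabulary from training text only

# Step 4b: token IDs -> trainable dense vectors.
embedding = layers.Embedding(input_dim=10000, output_dim=128)

tokens = text_vectorizer(["forest fire near town"])  # shape: (1, 15)
vectors = embedding(tokens)                          # shape: (1, 15, 128)
```

Adapting the vectorizer only on the training split keeps the validation set unseen, which is what makes the later model comparison fair.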

Modeling Strategy

This notebook progresses from simplest to strongest:

  • A strong baseline model first
  • Then neural models trained from scratch
  • Then transfer learning (USE) as the high-performance benchmark
  • Finally a “10% data” experiment to show label-efficiency of transfer learning

Evaluation Metrics

For each model we compute:

  • Accuracy
  • Precision
  • Recall
  • F1 score

Why F1 matters:

  • Tweets can be ambiguous and noisy
  • Precision/recall tradeoffs matter in “disaster detection” scenarios
  • F1 gives a balanced view of classification quality
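A minimal metrics helper along these lines (the name `calculate_results` and the weighted averaging are assumptions about `helper_functions.py`, sketched with scikit-learn):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def calculate_results(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 (weighted) as a dict."""
    accuracy = accuracy_score(y_true, y_pred)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted")
    return {"accuracy": accuracy,
            "precision": precision,
            "recall": recall,
            "f1": f1}
```

Calling the same helper on every model’s validation predictions is what produces the comparable rows in the results table below.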

Models Implemented

Classical Baseline

Baseline pipeline:

  • TF-IDF Vectorizer
  • Multinomial Naive Bayes classifier

Why this baseline is important:

  • Fast to train
  • Surprisingly strong for short-text classification
  • Sets a minimum bar that neural models must beat
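A sketch of this baseline as a scikit-learn pipeline (variable names like `train_sentences` are placeholders for the notebook’s split):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# TF-IDF features feeding a Multinomial Naive Bayes classifier.
model_baseline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultinomialNB()),
])

# Usage:
# model_baseline.fit(train_sentences, train_labels)
# model_baseline.score(val_sentences, val_labels)
```

Wrapping both steps in one `Pipeline` keeps the vectorizer fitted only on training data whenever the pipeline is fit, which avoids leakage into the validation metrics.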

Deep Learning Models

All neural models follow the pattern:

  • Input: raw tweet strings
  • TextVectorization: strings → sequences of token IDs
  • Embedding: token IDs → trainable dense vectors
  • Architecture-specific layers
  • Output: sigmoid probability for binary classification

Neural models included:

  1. simple_dense
  2. lstm
  3. gru
  4. bidirectional
  5. conv1d

Why benchmark multiple architectures?

  • Different inductive biases:
    • RNNs capture sequential dependencies
    • CNNs capture local n-gram patterns efficiently
    • Dense baselines test if simple pooling is sufficient
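As a concrete instance of the shared pattern, a `simple_dense`-style model might look like this (layer sizes, pooling choice, and the toy corpus are assumptions; the notebook adapts the vectorizer on the real train split):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Toy corpus standing in for the training sentences.
train_sentences = ["forest fire near la ronge", "this exam destroyed me",
                   "earthquake hits the city", "what a lovely day"]

text_vectorizer = layers.TextVectorization(
    max_tokens=10000, output_sequence_length=15)
text_vectorizer.adapt(train_sentences)

inputs = layers.Input(shape=(1,), dtype="string")          # raw tweet strings
x = text_vectorizer(inputs)                                # -> token IDs
x = layers.Embedding(input_dim=10000, output_dim=128)(x)   # -> dense vectors
x = layers.GlobalAveragePooling1D()(x)                     # simple pooling
outputs = layers.Dense(1, activation="sigmoid")(x)         # disaster probability
model_simple_dense = tf.keras.Model(inputs, outputs, name="simple_dense")

model_simple_dense.compile(loss="binary_crossentropy",
                           optimizer="adam", metrics=["accuracy"])
```

Swapping the pooling layer for `LSTM`, `GRU`, `Bidirectional(LSTM(...))`, or `Conv1D` + pooling yields the other from-scratch architectures while the rest of the pipeline stays identical.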

Transfer Learning

Universal Sentence Encoder (USE) via TensorFlow Hub:

  • Encodes an entire sentence/tweet into a pretrained embedding vector
  • A small classifier head is trained on top

Why USE is strong:

  • Pretrained sentence embeddings often generalize well on small/medium datasets
  • Particularly useful for short texts where handcrafted features can miss semantics

Low-Data Experiment

USE model trained on only 10% of training data:

  • Shows how transfer learning behaves when labeled data is limited
  • Mirrors real-world situations where labels are expensive
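One way to draw such a subsample (a stratified 10% split; the `random_state` and the toy data are assumptions, not the notebook’s seed):

```python
from sklearn.model_selection import train_test_split

# Toy stand-ins; in the notebook these come from the Kaggle train split.
train_sentences = [f"tweet {i}" for i in range(100)]
train_labels = [i % 2 for i in range(100)]

# Stratified 10% subsample preserves the original class balance.
small_sentences, _, small_labels, _ = train_test_split(
    train_sentences, train_labels,
    train_size=0.1, stratify=train_labels, random_state=42,
)
```

Stratifying matters here: with only 10% of the labels, a random subsample could otherwise skew the class ratio and distort the comparison against the full-data models.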

Results

Final benchmark results from this notebook run:

Model                      Accuracy    Precision   Recall      F1
baseline                   0.792651    0.811139    0.792651    0.786219
simple_dense               0.776903    0.779556    0.776903    0.774487
lstm                       0.772966    0.775649    0.772966    0.770438
gru                        0.769029    0.773154    0.769029    0.765780
bidirectional              0.750656    0.752468    0.750656    0.747956
conv1d                     0.780840    0.783458    0.780840    0.778533
tf_hub_sentence_encoder    0.814961    0.815272    0.814961    0.814225
tf_hub_10_percent_data     0.784777    0.790478    0.784777    0.781448

Key takeaways:

  • Best overall model here: tf_hub_sentence_encoder (highest F1)
  • Strong classical baseline: baseline (TF-IDF + NB)
  • conv1d performs well among from-scratch neural models
  • Transfer learning remains competitive even with only 10% training data

Getting Started

Project Structure

Recommended repository layout:

disaster-tweets-nlp-model-benchmarks/
├─ NLP.ipynb
├─ helper_functions.py
├─ requirements.txt
├─ screenshots/
│  ├─ results-table.png
│  ├─ dataset-preview.png
│  ├─ label-distribution.png
│  ├─ training-curves-use.png
│  └─ training-curves-baseline.png
├─ .gitignore
├─ LICENSE
└─ README.md

Prerequisites

  • Python 3.10+ recommended
  • pip installed
  • Optional: GPU for faster deep learning training

Installation

  1. Clone the repository

    git clone https://github.com/brej-29/disaster-tweets-nlp-model-benchmarks.git
    cd disaster-tweets-nlp-model-benchmarks

  2. Create and activate a virtual environment

Windows (PowerShell):

python -m venv .venv
.\.venv\Scripts\Activate.ps1

macOS / Linux:

python3 -m venv .venv
source .venv/bin/activate

  3. Install dependencies

    pip install -r requirements.txt


Dataset Setup

This notebook expects Kaggle dataset files to be available locally.

Option A: download Kaggle dataset zip

After unzipping, you should have:

  • train.csv
  • test.csv
  • sample_submission.csv

Option B: already extracted files

  • Place train.csv/test.csv/sample_submission.csv in the same folder as NLP.ipynb
  • Ensure filenames match the notebook expectations

Run the Notebook

  1. Start Jupyter Notebook

    jupyter notebook

(or)

jupyter lab

  2. Open NLP.ipynb
  3. Run cells top-to-bottom
  4. Recommended: Restart Kernel & Run All for full reproducibility

Optional: GPU Setup

If you have an NVIDIA GPU and want faster training:

  • Follow official TensorFlow installation guidance for your OS/CUDA setup
  • If you are on Google Colab:
    • Runtime → Change runtime type → GPU

How to Reproduce Exactly

To reproduce results cleanly:

  1. Create a fresh virtual environment
  2. Install dependencies from requirements.txt
  3. Ensure dataset files are present
  4. Run the notebook from top to bottom without skipping cells
  5. Save the results table screenshot into screenshots/results-table.png

Optional:

  • Capture package versions for strict reproducibility:

    pip freeze > requirements-freeze.txt


Notes on helper_functions.py

helper_functions.py provides reusable utilities commonly used in ML notebooks, such as:

  • TensorBoard callback creation
  • Training curve plotting
  • Metric calculation helpers

Keeping helpers separate makes the notebook easier to read and the evaluation more consistent.


Common Issues & Fixes

  1. FileNotFoundError: nlp_getting_started.zip
    • Confirm the zip file exists next to NLP.ipynb
    • Confirm the filename matches exactly
  2. Training is slow
    • Use a GPU if available
    • Reduce epochs temporarily while testing
    • Use the baseline model for quick sanity checks
  3. TensorFlow Hub download takes time
    • The first run may download the model from TF Hub
    • Ensure a stable internet connection and rerun the cell if needed

Future Improvements

Ideas to upgrade this from “notebook project” to “full ML project”:

  • Add an inference script: input tweet → output disaster probability
  • Save best model + preprocessing artifacts for deployment
  • Add confusion matrix and error analysis:
    • inspect false positives/false negatives
  • Try transformer baselines (DistilBERT/BERT) and threshold tuning
  • Add experiment tracking:
    • structured results logging (CSV/JSON)
    • TensorBoard organization per model run

License

This project is licensed under the MIT License.
See the LICENSE file in this repository for full details.


Contact

If you’d like to discuss this project, provide feedback, or connect, feel free to fork the repo, open issues, or suggest improvements!
