
Profiler: Black-box AI-generated Text Origin Detection via Context-aware Inference Pattern Analysis

This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.


Table of Contents

  • Overview
  • Dataset
  • Code Implementation
  • Cite Our Work

Overview

  • This is the official implementation of the EMNLP 2025 paper "Profiler: Black-box AI-generated Text Origin Detection via Context-aware Inference Pattern Analysis".

  • [video] | [slides] | [poster] | [paper]

Dataset

  • Our datasets are located in ./Dataset and ./Paraphrased_Dataset.
  • Dataset Statistics:
    • 6 Domains: Arxiv, Code, Creative, Essay, GCJ, Yelp

    • 6 Sources: Human + 5 AI models (GPT-3.5-Turbo, GPT-4-Turbo-Preview, Claude-3-Sonnet, Claude-3-Opus, Gemini-1.0-Pro)

    • Total Samples: 51,407 text samples (4,788 human + 46,619 AI)

      Domain     Human   AI-Generated   AI-Paraphrased    Total
      Arxiv        350          1,750            1,750    3,850
      Code         164            819              819    1,802
      Creative   1,000          4,840            4,743   10,583
      Essay      1,000          4,897            4,883   10,780
      GCJ          274          1,370            1,370    3,014
      Yelp       2,000          9,739            9,639   21,378
      Total      4,788         23,415           23,204   51,407

Code Implementation

Code Structure

The project is organized into two main Python files:

  • profiler.py: This is the main script that:

    • Parses command line arguments.
    • Sets up evaluation strategies:
      • In-domain evaluation: 5-fold cross validation when training and testing datasets are from the same distribution.
      • Out-of-domain (OOD) evaluation: Train on normal dataset, test on paraphrased dataset to evaluate robustness.
    • Implements one-vs-all strategy for multi-class origin detection:
      • For each of the 6 sources (human + 5 AI models), trains a separate binary classifier.
      • Reports ROC-AUC score for each source individually.
      • Computes average ROC-AUC across all 6 sources.
    • Saves feature files and evaluation results in a structured output directory.
  • profiler_utils.py: Contains utility functions for feature extraction using pretrained language models from Hugging Face Transformers. The features are computed by:

    • Tokenizing text with a simple completion prompt.
    • Computing context loss sequences (controlled by context_window).
    • Extracting 5 statistics per context loss sequence: mean, max, min, std, median.
    • Computing KL divergence between different pairs of context loss sequences.
    • Aggregating features across multiple detection models (if ensemble is used).
    • Saving the features as pickle files.
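The feature construction steps above can be sketched as follows. This is a minimal illustration, not the repository's actual profiler_utils.py: the function names, the normalization inside the KL term, and the pairwise-KL layout are assumptions.

```python
import numpy as np

def sequence_stats(losses):
    """The 5 summary statistics per context loss sequence: mean, max, min, std, median."""
    a = np.asarray(losses, dtype=float)
    return [a.mean(), a.max(), a.min(), a.std(), np.median(a)]

def kl_divergence(p_losses, q_losses, eps=1e-8):
    """KL divergence between two loss sequences, treated as distributions
    after normalization (an assumption; the paper may normalize differently)."""
    p = np.asarray(p_losses, dtype=float) + eps
    q = np.asarray(q_losses, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def build_features(context_loss_sequences):
    """Concatenate 5 stats per sequence plus KL divergences between sequence pairs."""
    feats = []
    for seq in context_loss_sequences:
        feats.extend(sequence_stats(seq))
    for i in range(len(context_loss_sequences)):
        for j in range(len(context_loss_sequences)):
            if i != j:
                feats.append(kl_divergence(context_loss_sequences[i],
                                           context_loss_sequences[j]))
    return np.array(feats)
```

With an ensemble, features like these would be extracted once per detection model and concatenated.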

Running the Code

1. Set up your environment:

conda create -n profiler_env python=3.11
conda activate profiler_env
# Install PyTorch. See https://pytorch.org/get-started/previous-versions/ for the
# command that matches your device/CUDA version; the command below installs the latest release.
pip3 install torch torchvision
# install other dependencies
pip install transformers scikit-learn tqdm numpy accelerate

2. Run the code:

First, set the GPU devices visible to the script, e.g.,

export CUDA_VISIBLE_DEVICES=0,1

Then, you can run the code with the following command:

python profiler.py --task={task} --train_dataset={train_dataset} --test_dataset={test_dataset} --data_generation=1

Example commands:

Standard Setting (5-fold CV on normal data):

python profiler.py \
    --task Arxiv \
    --train_dataset normal_Arxiv \
    --test_dataset normal_Arxiv \
    --data_generation 1

Paraphrased Setting (5-fold CV on paraphrased data):

python profiler.py \
    --task Yelp \
    --train_dataset paraphrased_Yelp \
    --test_dataset paraphrased_Yelp \
    --data_generation 1

OOD Setting (train on normal, test on paraphrased):

python profiler.py \
    --task Essay \
    --train_dataset normal_Essay \
    --test_dataset paraphrased_Essay \
    --data_generation 1

Argument Reference:

  • --seed (default: 42): Random seed for reproducibility across Python's random, NumPy, and PyTorch.
  • --task (required; choices: Arxiv, Code, Yelp, Essay, Creative, GCJ): Task/domain for text origin detection.
  • --base_model (default: all; choices: all, llama2-7b, llama2-13b, llama3-8b, gemma-2b, gemma-7b, mistral-7b, or a comma-separated list, e.g. "llama3-8b,gemma-7b"): Detection model(s) used for feature extraction. The default all uses all 6 models as an ensemble; specify a single model or a comma-separated list for ablation studies.
  • --sample_clip (default: 4000): Maximum character length per text sample; longer samples are truncated.
  • --context_window (default: 6): Full context window size for context analysis. For example, context_window=6 creates 6 target positions (±3 from the center). Common values: 2, 4, 6, 8.
  • --train_dataset (required; format: {normal or paraphrased}_{task}, e.g. normal_Arxiv, paraphrased_Yelp): Training dataset. The first part selects the normal or paraphrased split; the second part selects the task/domain.
  • --test_dataset (required; same format as --train_dataset): Testing dataset. If identical to --train_dataset, the script performs 5-fold cross-validation; if different (e.g., train on normal_Essay, test on paraphrased_Essay), it performs OOD evaluation.
  • --data_generation (default: 0; choices: 0, 1): Whether to generate features (1) or load pre-computed features (0). Set to 1 on the first run to extract features.
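The one-vs-all evaluation described above (one binary classifier per source, ROC-AUC per source, then the average) can be sketched as follows. This is a minimal illustration, not the repository's code: the classifier choice (scikit-learn LogisticRegression) and function name are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def one_vs_all_auc(X, labels, n_splits=5, seed=42):
    """5-fold CV: for each source, train source-vs-rest and report ROC-AUC."""
    labels = np.asarray(labels)
    sources = sorted(set(labels))
    aucs = {}
    for src in sources:
        y = (labels == src).astype(int)  # this source vs. all others
        skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
        scores, truth = [], []
        for tr, te in skf.split(X, y):
            clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
            scores.extend(clf.predict_proba(X[te])[:, 1])
            truth.extend(y[te])
        aucs[src] = roc_auc_score(truth, scores)
    aucs["average"] = float(np.mean([aucs[s] for s in sources]))
    return aucs
```

In the OOD setting, the cross-validation loop would instead train each binary classifier on the normal split and score the paraphrased split.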

Ablation Studies

Profiler supports flexible ablation studies through the --base_model argument:

1. Test individual detection models:

# Test with llama3-8b only
python profiler.py --task Yelp --train_dataset normal_Yelp --test_dataset normal_Yelp --base_model llama3-8b

# Test with gemma-7b only
python profiler.py --task Yelp --train_dataset normal_Yelp --test_dataset normal_Yelp --base_model gemma-7b

# Test with mistral-7b only
python profiler.py --task Yelp --train_dataset normal_Yelp --test_dataset normal_Yelp --base_model mistral-7b

2. Test model combinations:

# Test with llama models only
python profiler.py --task Yelp --train_dataset normal_Yelp --test_dataset normal_Yelp --base_model "llama2-7b,llama3-8b,llama2-13b"

# Test with all models (default ensemble)
python profiler.py --task Yelp --train_dataset normal_Yelp --test_dataset normal_Yelp --base_model all

Output Organization

  • Results are automatically organized by model configuration
  • Format: ./results/{train_dataset}_vs_{test_dataset}_model_{model_suffix}_context_window_{context_window}/
  • Examples:
    • All models: normal_Arxiv_vs_normal_Arxiv_model_all_context_window_3/
    • Single model: normal_Arxiv_vs_normal_Arxiv_model_llama3-8b_context_window_3/
    • Multiple models: normal_Arxiv_vs_normal_Arxiv_model_llama3-8b+gemma-7b_context_window_3/
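The directory naming scheme can be reproduced with a small helper. This is illustrative only; the joining of comma-separated model names with "+" is inferred from the examples above, and the function name is an assumption.

```python
def results_dir(train_dataset, test_dataset, base_model, context_window):
    """Build the results path following the naming scheme shown above."""
    if base_model == "all":
        suffix = "all"
    else:
        # comma-separated model lists appear joined with '+' in the directory name
        suffix = "+".join(m.strip() for m in base_model.split(","))
    return (f"./results/{train_dataset}_vs_{test_dataset}"
            f"_model_{suffix}_context_window_{context_window}")
```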

Cite Our Work

If you find our work helpful, please consider citing our paper and giving us a star ☆:

@inproceedings{profiler2025,
  title={Profiler: Black-box AI-generated Text Origin Detection via Context-aware Inference Pattern Analysis},
  author={Guo, Hanxi and Cheng, Siyuan and Jin, Xiaolong and Zhang, Zhuo and Shen, Guangyu and Zhang, Kaiyuan and An, Shengwei and Tao, Guanhong and Zhang, Xiangyu},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  pages={24892--24912},
  year={2025}
}
