Logparser provides a toolkit and benchmarks for automated log parsing, which is a crucial step towards structured log analytics. By applying logparser, users can automatically learn event templates from unstructured logs and convert raw log messages into a sequence of structured events. In the literature, the process of log parsing is sometimes referred to as message template extraction, log key extraction, or log message clustering.

An illustrative example of log parsing
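As a rough, text-only illustration of the idea, the sketch below turns raw messages into event templates by masking variable fields with `<*>` and grouping messages that share a template. The masking rules, function names, and sample messages are hypothetical and chosen only for this example; the parsers in this toolkit learn templates from the data rather than relying on a fixed rule list like this one.

```python
import re
from collections import defaultdict

# Hypothetical masking rules, applied in order. Real log parsers infer
# which tokens are variable instead of using hand-written patterns.
MASKS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),  # IPv4 address, optional port
    re.compile(r"\bblk_-?\d+\b"),                          # HDFS-style block ID
    re.compile(r"\b\d+\b"),                                # plain integers
]


def to_template(message: str) -> str:
    """Replace every variable-looking field with the wildcard <*>."""
    for pattern in MASKS:
        message = pattern.sub("<*>", message)
    return message


def group_by_template(messages):
    """Map each template to the indices of the messages that produce it."""
    groups = defaultdict(list)
    for index, message in enumerate(messages):
        groups[to_template(message)].append(index)
    return dict(groups)


# Made-up sample messages, not taken from any benchmark dataset.
logs = [
    "Received block blk_123 of size 67108864 from 10.0.0.1",
    "Received block blk_-456 of size 1024 from 10.0.0.2",
    "Deleting block blk_789",
]
events = group_by_template(logs)
for template, indices in events.items():
    print(template, indices)
```

The first two messages collapse into one template and the third forms its own, which is the "sequence of structured events" view described above.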
👉 Read the docs: https://logparser.readthedocs.io
🔭 If you use any of our tools or benchmarks in your research for publication, please kindly cite the following papers.
- [ICSE'19] Jieming Zhu, Shilin He, Jinyang Liu, Pinjia He, Qi Xie, Zibin Zheng, Michael R. Lyu. Tools and Benchmarks for Automated Log Parsing. To appear in International Conference on Software Engineering (ICSE), 2019.
- [DSN'16] Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu. An Evaluation Study on Log Parsing and Its Use in Log Mining. IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2016.
- [Arxiv'18] Pinjia He, Jieming Zhu, Hongyu Zhang, Pengcheng Xu, Zibin Zheng, and Michael R. Lyu. A Directed Acyclic Graph Approach to Online Log Parsing, 2018.
Input: A raw log file. Each line of the file follows "ID\tword1 word2 word3"
Output: Two parts. The first is the split log messages, written to separate text files (each contains only log IDs for simplicity). The second is the templates file, which contains all templates.
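The input format can be read with a few lines of Python. This is a minimal sketch, assuming each line holds an ID, a single tab, and then space-separated words; the function names `parse_raw_line` and `read_raw_log` and the sample lines are hypothetical and not part of the toolkit.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_raw_line(line: str) -> Tuple[str, List[str]]:
    r"""Split one 'ID\tword1 word2 word3' line into (log_id, words)."""
    log_id, sep, content = line.rstrip("\r\n").partition("\t")
    if not sep:
        raise ValueError(f"missing tab separator: {line!r}")
    return log_id, content.split()


def read_raw_log(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (log_id, words) for every non-empty line."""
    for line in lines:
        if line.strip():
            yield parse_raw_line(line)


# Made-up sample lines in the documented format.
sample = [
    "1\tReceived block blk_1 of size 91178\n",
    "2\tDeleting block blk_2\n",
]
records = list(read_raw_log(sample))
print(records)
```

To read from disk instead, pass an open file object, for example `read_raw_log(open("rawlog.log"))`.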
Examples: Before running the examples, please copy the parser source file to the same directory.
- Example1: This file is a simple example to demonstrate the usage of Drain. The usage of other log parsers is similar.
- Evaluation of Drain: This folder provides a package for evaluating the Drain log parser on the 2k HDFS dataset. To run the evaluation, execute the evaluateDrain.py file.
In data, there are 11 datasets for you to play with. Each dataset contains several text files.
- rawlog.log: The raw log messages with IDs, one per line in the format "ID\tword1 word2 word3".
- template[0-9]+: The log messages that belong to a given template.
- templates: The text of the templates.
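The sketch below shows one way to load such a dataset folder and look up the raw messages behind each template. It rests on assumptions that the description above does not confirm: that each `templateN` file lists one log ID per line, and that `templates` holds one template per line. The function name `load_dataset` and the demo files are hypothetical.

```python
import re
import tempfile
from pathlib import Path


def load_dataset(folder: Path):
    """Return (templates, groups) for one dataset folder.

    groups maps the number N of each templateN file to the raw
    messages whose IDs appear in that file.
    """
    # rawlog.log: "ID\tword1 word2 word3" on each line.
    messages = {}
    for line in (folder / "rawlog.log").read_text().splitlines():
        if line.strip():
            log_id, _, content = line.partition("\t")
            messages[log_id] = content

    # Assumption: the templates file holds one template per line.
    templates = (folder / "templates").read_text().splitlines()

    # Assumption: each templateN file lists log IDs separated by whitespace.
    groups = {}
    for path in folder.iterdir():
        match = re.fullmatch(r"template(\d+)", path.name)
        if match:
            ids = path.read_text().split()
            groups[int(match.group(1))] = [messages[i] for i in ids]
    return templates, groups


# Build a tiny made-up dataset in a temporary folder to exercise the loader.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "rawlog.log").write_text(
        "1\tDeleting block blk_1\n"
        "2\tDeleting block blk_2\n"
        "3\tVerification succeeded for blk_3\n"
    )
    (demo / "templates").write_text(
        "Deleting block *\nVerification succeeded for *\n"
    )
    (demo / "template1").write_text("1\n2\n")
    (demo / "template2").write_text("3\n")
    templates, groups = load_dataset(demo)

print(templates)
print(groups)
```

If the files in a given dataset follow a different layout, only the two lines marked as assumptions need to change.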
| Tools | References |
|---|---|
| SLCT | [IPOM'03] Risto Vaarandi. A Data Clustering Algorithm for Mining Patterns from Event Logs, 2003 |
| AEL | [QSIC'08] Zhen Ming Jiang, Ahmed E. Hassan, Parminder Flora, Gilbert Hamann. Abstracting Execution Logs to Execution Events for Enterprise Applications, 2008; [JSME'08] Zhen Ming Jiang, Ahmed E. Hassan, Gilbert Hamann, Parminder Flora. An Automated Approach for Abstracting Execution Logs to Execution Events, 2008 |
| IPLoM | [KDD'09] Adetokunbo Makanju, A. Nur Zincir-Heywood, Evangelos E. Milios. Clustering Event Logs Using Iterative Partitioning, 2009; [TKDE'12] Adetokunbo Makanju, A. Nur Zincir-Heywood, Evangelos E. Milios. A Lightweight Algorithm for Message Type Extraction in System Application Logs, 2012 |
| LKE | [ICDM'09] Qiang Fu, Jian-Guang Lou, Yi Wang, Jiang Li. Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis, 2009 |
| LFA | [MSR'10] Meiyappan Nagappan, Mladen A. Vouk. Abstracting Log Lines to Log Event Types for Mining Software System Logs, 2010 |
| LogSig | [CIKM'11] Liang Tang, Tao Li, Chang-Shing Perng. LogSig: Generating System Events from Raw Textual Logs, 2011 |
| SHISO | [SCC'13] Masayoshi Mizutani. Incremental Mining of System Log Format, 2013 |
| LogCluster | [CNSM'15] Risto Vaarandi, Mauno Pihelgas. LogCluster - A Data Clustering and Pattern Mining Algorithm for Event Logs, 2015 |
| LenMa | [CNSM'15] Keiichi Shima. Length Matters: Clustering System Log Messages using Length of Words, 2015 |
| LogMine | [CIKM'16] Hossein Hamooni, Biplob Debnath, Jianwu Xu, Hui Zhang, Geoff Jiang, Abdullah Mueen. LogMine: Fast Pattern Recognition for Log Analytics, 2016 |
| Spell | [ICDM'16] Min Du, Feifei Li. Spell: Streaming Parsing of System Event Logs, 2016 |
| Drainconf | [ICWS'17] Pinjia He, Jieming Zhu, Zibin Zheng, and Michael R. Lyu. Drain: An Online Log Parsing Approach with Fixed Depth Tree, 2017 |
| MoLFI | [ICPC'18] Salma Messaoudi, Annibale Panichella, Domenico Bianculli, Lionel Briand, Raimondas Sasnauskas. A Search-based Approach for Accurate Identification of Log Message Formats, 2018 |
| Drainjournal | [Arxiv'18] Pinjia He, Jieming Zhu, Hongyu Zhang, Pengcheng Xu, Zibin Zheng, and Michael R. Lyu. A Directed Acyclic Graph Approach to Online Log Parsing, 2018 |
Please follow the installation steps and demo in the docs to get started.
- [ICSE'19] Jieming Zhu, Shilin He, Jinyang Liu, Pinjia He, Qi Xie, Zibin Zheng, Michael R. Lyu. Tools and Benchmarks for Automated Log Parsing. To appear in International Conference on Software Engineering (ICSE), 2019.
- [TDSC'18] Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu. Towards Automated Log Parsing for Large-Scale Log Data Analysis. IEEE Transactions on Dependable and Secure Computing (TDSC), 2018.
- [ICWS'17] Pinjia He, Jieming Zhu, Zibin Zheng, Michael R. Lyu. Drain: An Online Log Parsing Approach with Fixed Depth Tree. IEEE International Conference on Web Services (ICWS), 2017.
- [DSN'16] Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu. An Evaluation Study on Log Parsing and Its Use in Log Mining. IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2016.
Logparser is implemented based on a number of existing open-source projects:
- SLCT (C++)
- LogCluster (Perl)
- LenMa (Python 2)
- MoLFI (Python 3)
For any questions or feedback, please post to the issue page.
Copyright © 2018, LogPAI, CUHK