Datasets

This page lists the 11 datasets used in our journal submission, which are freely accessible for research purposes. Some of the logs are production data released by previous studies; others were collected from real systems in our lab environment.

Among these datasets, six (Hadoop, Spark, Windows, Linux, Apache, Thunderbird) were collected for the extended experiments in our journal submission. For the datasets used in the previous conference paper (BGL, HPC, HDFS, Zookeeper, Proxifier), we host only sample datasets (2k lines) here because the full logs are large. If you are interested in the full datasets, please request the logs at Zenodo or visit the source links where applicable. We will send you the full datasets upon request.

Details

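| Software System | Dataset Name | #Messages | Message Length | #Events | Source Link |
|---|---|---|---|---|---|
| **Distributed systems** | | | | | |
| HDFS | HDFS | 4,747,963 | 10-102 | 376 | Link |
| Hadoop | Hadoop | 2,000 | 6-48 | 116 | |
| Spark | Spark | 2,000 | 6-22 | 36 | |
| Zookeeper | Zookeeper | 74,380 | 8-27 | 80 | |
| **Operating systems** | | | | | |
| Windows | Windows | 2,000 | 6-22 | 50 | |
| Linux | Linux | 2,000 | 7-25 | 123 | |
| **Server applications** | | | | | |
| Apache Web server | Apache | 2,000 | 5-10 | 6 | |
| **Supercomputers** | | | | | |
| Blue Gene/L | BGL | 4,747,963 | 10-102 | 376 | Link |
| HPC | HPC | 433,490 | 6-104 | 105 | |
| Thunderbird | Thunderbird | 2,000 | 11-133 | 154 | |
| **Standalone software** | | | | | |
| Proxifier | Proxifier | 10,108 | 10-27 | 8 | |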
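
To sanity-check a downloaded sample against the #Messages and Message Length columns, a short script along these lines may help. It is only a sketch: this page does not define how message length is measured, so the code assumes it is the number of whitespace-separated tokens per log line, and the function name `summarize_log` and the sample lines are hypothetical.

```python
from typing import Iterable, Tuple


def summarize_log(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (message count, shortest length, longest length).

    Assumption: "length" is the number of whitespace-separated tokens
    in a line. Blank lines are skipped and not counted as messages.
    """
    count = 0
    shortest = None
    longest = 0
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # ignore blank lines
        count += 1
        n = len(tokens)
        shortest = n if shortest is None else min(shortest, n)
        longest = max(longest, n)
    return count, (shortest if shortest is not None else 0), longest


# Hypothetical sample lines, not taken from any of the datasets above.
sample = [
    "INFO server started on port 8080",
    "",
    "WARN connection reset",
]
count, shortest, longest = summarize_log(sample)
print(f"#Messages: {count}, Message Length: {shortest}-{longest}")
```

To run it on a real file, pass an open file handle instead of the list, for example `summarize_log(open(path, errors="replace"))`, where `path` points at a sample log you have downloaded.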

Publications using these datasets

License

The log datasets are freely available ONLY for research purposes.

LogPAI Team, 2018