The Data Module is a collection of APIs that simplify working with datasets stored in Hadoop filesystems such as HDFS.
This project is part of the Cloudera Development Kit (CDK), an open-source set of APIs that helps developers build robust systems and applications with CDH and Cloudera Manager.
See the online documentation for more information.