HADOOP

Main Purpose

Apache™ Hadoop® is an open source software project that enables distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software's ability to detect and handle failures at the application layer.

Benefits of this Resource

Incredibly robust for large sets of data
Does not require high-end hardware
Currently open source but there are other vendors that have more sophisticated software utilizing the Hadoop platform (Cloudera, IBM BigInsights)

Limitations

Primarily used for visualizing big data. SQL based.

File Type Compatibility

Level of Expertise Required

High

Where to access

Apache Hadoop
Cloudera
SAS
IBM

Sources to Learn

coming soon