Authors: Steve Bryson, David Kenwright, Michael Cox, David Ellsworth and Robert Haimes
Date: Aug. 1999
From: Communications of the ACM (Vol. 42, Issue 8)
Publisher: Association for Computing Machinery, Inc.
Document Type: Article
Length: 4,557 words

Analyzing large amounts of data presents a number of technical challenges. Simply getting all the data into place for analysis stresses even high-end hardware; it can take an hour to load a 100GB data set into memory, assuming you have 100GB of memory to use. Loading the data little by little means that even a single pass through the data takes a long time. These large data sets are interesting precisely because they contain new and often complex phenomena. An overview of a data set can be crucial to understanding the phenomena it represents, as well as their context.

