eDiscovery and Litigation Support


How Does Digital Reef’s eDiscovery Solution Utilize “Grid Computing”?

  
  
  
By Michael McClelland, Senior Systems Engineer

I am asked this question quite often, mostly because “Grid Computing” has become such an overloaded marketing term that people want to understand what it means with regard to the way Digital Reef’s eDiscovery solution does processing and indexing.  I will try to explain how we use the “grid” in the simplest terms, because these are the least likely to get me in trouble with the hardcore computer scientists who actually develop and optimize these ideas (like Steve Akers, Digital Reef’s Founder).

Grid computing is a term that has evolved from its early days as something akin to a single “virtual supercomputer” to a more utilitarian “cloud computing” model (though the term “cloud” is loaded in and of itself).  This is mostly because people are getting better at solving problems in parallel, and the way computing resources are used increasingly reflects that.  You are far better off with discrete, redundant computing units doing simple things in parallel than with trying to emulate a massive, monolithic computing unit (that was probably never designed to be massively parallel in the first place).

Electronically stored information (ESI) processing is a good problem space for “cloud” or “grid” computing because the problem (extraction and indexing) can easily be subdivided into smaller, discrete problems.  All you need is a coordinator to watch the sub-jobs and collect the results.  By contrast, computing an MD5 hash or a checksum of a single file is a bad candidate for grid computing, because each step of the calculation depends on the result of the previous one.  Hashes and checksums are often an extremely linear problem space.
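To make the contrast concrete, here is a minimal Python sketch of the idea, not Digital Reef's actual implementation. The names index_document, coordinate and chained_checksum are hypothetical, a local thread pool stands in for the nodes of a grid, and the checksum is a toy formula chosen only to show the step-by-step dependency (it is not MD5).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def index_document(doc_id, text):
    """Sub-job (hypothetical): build a tiny inverted index for one document."""
    index = defaultdict(set)
    for word in text.lower().split():
        index[word].add(doc_id)
    return index


def coordinate(documents, workers=4):
    """Coordinator (hypothetical): hand out one sub-job per document,
    then collect and merge the partial indexes."""
    merged = defaultdict(set)
    # Threads stand in for grid nodes here; the sub-jobs never talk to each other.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(index_document, doc_id, text)
                   for doc_id, text in documents.items()]
        for future in futures:
            for word, doc_ids in future.result().items():
                merged[word] |= doc_ids
    return dict(merged)


def chained_checksum(data, state=0):
    """Toy checksum (not MD5): each step needs the state left by the
    previous byte, so one input cannot simply be split across workers."""
    for byte in data:
        state = (state * 31 + byte) % 65521
    return state
```

Each call to index_document only needs its own document, so the coordinator can farm the calls out in any order and merge whatever comes back.  chained_checksum, on the other hand, can only continue from where the previous step left off, which is what makes a single hash such a linear job.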

When it comes to ESI extraction, indexing and especially export, it is extremely helpful to be able to spread that job across lots of CPUs, RAM and, importantly, network cards (why go 1 or 2 Gb/second when you can go 60?).

Digital Reef was designed with a grid architecture from day one.  We haven’t had to retrofit “distributed computing” into the product design like some vendors have, and you can run our product on commodity hardware, making it cheaper and easier for us to be faster than anything else out there.  Combine that with the massive drop in the price of storage over the last few years (3 TB drives for $99?  Insanity) and you have the basis for a highly scalable processing tool.

In a nutshell, we are a grid because we were “born this way.”  (Okay, that was pretty terrible… mental note: don’t quote Lady Gaga lyrics when discussing parallelization; next time go with Amdahl’s law or something.)

Questions?

Ask the Experts at Digital Reef