eDiscovery and Litigation Support

Turning Data into Actionable Information

Let's talk about a couple of concepts: making storage infrastructure more cost-effective, and extending cost-effective storage into a "data management lifestyle" in which you turn data into actionable information. The topics may not seem related, but they are closely connected.

The first concept is simple and involves storage inventory and reclamation. In our travels, we run into a significant number of people who tell us, "I don't know what I have"; "I don't know where certain types of documents reside"; and "I don't know the value or the risk associated with the information that I am storing". Getting out ahead of this problem has been a tricky, if not impossible, exercise in the past.

Presently, a number of customers are using our technology to take inventory of what is stored in their infrastructure. Once they know what they have, they can decide whether it is worth keeping or safe to discard. This sounds like a straightforward task, but doing it across large collections of data is very complicated and hasn't been possible with traditional search engine solutions. Here are some reasons why:

1. The data inventory problem is a very large one, because many organizations are dealing with terabytes or petabytes of data. Traditional products are not built to handle enough information to make data inventory a tractable task (in terms of the human effort required to manage and administer the process). Search engines, for example, build a single index of this information, and once that index reaches a certain size it becomes unmanageable, because search was not designed for the volume of data found in most enterprises.

2. The task is not just about indexing all of the data. It is about enabling search, grouping by content, and comparison with other content across documents, spreadsheets, PDFs, email, and instant messages (IMs). Doing this lets you see the context of the information and place a value on it. Digital Reef does just that, on a very large scale, even if you didn't know what data you had to begin with. This is a first, and a necessary one, because the old approach (keyword roulette) doesn't work when you have 100 terabytes of data that changes every day.

3. Enterprises need much more than a file count or keyword search capability. Context identification is critical, because it helps you understand what is actually in your data collections. In other words, you get an "enterprise topology", so you know what you and everyone else in the organization have generated or downloaded and put into your content stores. Context identification and analysis functionality is also a Digital Reef first.

4. Other approaches rely on too much manual effort and require too much a priori knowledge of the data stores. Digital Reef automates the process, so that an operator can see the data they have and take action on it.

5. Large amounts of data need to be managed in place. The Digital Reef platform is the first (that we know of) to manage data in place: you don't have to move the data in order to manage it. When data does need to move, one operator can move and convert it from one area of storage to another and from one file format to another. With Digital Reef, this is a largely automated process.
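
To make the idea of an in-place inventory concrete, here is a minimal Python sketch of a read-only scan. It is not Digital Reef's implementation. The function name `inventory` and the choice to group by file extension are assumptions made purely for illustration, and a real enterprise inventory would group by content rather than by extension.

```python
import os
from collections import defaultdict


def inventory(root):
    """Walk `root` read-only and tally file count and total bytes per extension.

    Hypothetical helper for illustration: nothing is moved, copied, or modified.
    """
    totals = defaultdict(lambda: {"files": 0, "bytes": 0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Normalize the extension so ".TXT" and ".txt" land in one bucket.
            ext = os.path.splitext(name)[1].lower() or "(none)"
            totals[ext]["files"] += 1
            totals[ext]["bytes"] += size
    return dict(totals)
```

Calling `inventory("/some/share")` (a placeholder path) returns a dictionary keyed by extension. Even a tally this crude starts to answer "I don't know what I have", though a single-threaded walk like this would not be practical at the terabyte-to-petabyte scale discussed above.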

The benefits of having this type of inventory and actionable window into data stores include savings in storage space and in the administrative overhead of managing storage that houses unnecessary data. You can also identify and de-duplicate or delete unnecessary data. By knowing what data categories exist, you can find documents quickly, see other documents that are most like them (context), and make use of information that would have been hard to find using keyword search. This means that responding to electronic discovery requests becomes much less expensive, because one person can find all documents that may be responsive to a case from one location. You don't have to index multiple sub-divisions of the content store. You can also convert the data into a form that is "consumable" by a legal review platform. This was a manual process in the past, but today it can be automated.
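
As a rough illustration of the de-duplication step, the Python sketch below groups byte-identical files by their SHA-256 digest. This is a generic technique, not a description of how Digital Reef works. The names `file_digest` and `find_duplicates` are hypothetical, and the sketch only catches exact copies, not the "most like them" similarity described above.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under `root` whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                by_digest[file_digest(path)].append(path)
            except OSError:
                continue  # unreadable file; leave it out of the report
    # Only digests seen more than once indicate duplicates.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Each returned group is a set of candidates for reclamation: keep one copy and review the rest. The sketch only reports. Deciding what is safe to delete remains a policy decision for the operator.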

So there is a huge ROI in having a scalable system that can inventory data, make it available by category, and then give one operator the capability to de-duplicate, move, and otherwise work with it. This is what I refer to as "turning data into information": providing the capabilities for understanding and controlling data in a way that allows you to use it for the maximum benefit of the organization. This is what turns data into true information assets.