A blog focusing on unstructured data: the challenges, best practices, and technologies

Navigating Unstructured Data

Why We (All) Need to Pay Attention to Unstructured Data

Unstructured data is growing at a greater rate than any other form of enterprise data (see figure below: consumption of enterprise storage). And volume is not the only issue. There is also risk (litigation risk, compliance risk, and security risk) associated with the unstructured data stored on servers scattered throughout corporations and government organizations. When I talk to enterprises, one of the most common statements I hear is, "We don't know what we have, because we don't have any reasonable way to determine what is in our unstructured data."

When it comes to structured data (data in databases), enterprises know what they have. But unstructured data is exactly what its name implies, "unstructured," which makes it very difficult to get a handle on. In an attempt to control it, users impose a contrived structure known as a taxonomy. To create taxonomies, they use products that are imprecise and error-prone, and that do not account for the dynamic nature of unstructured content. To add to the challenge, unstructured content is constantly changing. Users download content from the Internet (both appropriate and inappropriate) and save it on hard drives. They create documents and emails by the thousands and send them to other users at nearly the speed of light, literally. Every day the unstructured data component of the risk equation grows, morphing to take the shape of whatever is happening in the business at that moment.

Unstructured data is fraught with risk and constantly changing: a data management nightmare. And, as I mentioned above, traditional tools rely on a predefined taxonomy that requires very specific expertise to create. Unfortunately for users, this approach isn't practical in a world where the content of the data changes hourly.

So, on one side of the coin is the risk associated with unstructured data. On the other side is value. Until an enterprise understands what intellectual property and other valuable unstructured data assets it has on its servers, in its SharePoint environments, and in its storage infrastructure, it cannot leverage the expertise of its own people, which remains locked away somewhere within the company. Realizing the maximum value of this data requires unstructured data management tools that can handle enormous volumes of data and present them in the context of a user's interests. Such tools are a must-have in today's data environment.

I also want to touch on traditional keyword search, because the current "state of the art" in keyword search exposes the inadequacy of data analysis as most people know it today. What is required is a more comprehensive and useful view into unstructured data, one that can grow as the enterprise's data grows and can provide a foundation that makes search more effective and accurate.

Next time, I'll look at some of the technology issues that make this problem difficult to solve. I'll also touch on the Digital Reef approach, including a look at our similarity engine and why it is designed the way it is. Finally, I'll discuss why unstructured data should "speak for itself" if an enterprise really wants to gain control over it.

I'll be interested to hear your thoughts.

Steve Akers
Founder & CTO
Digital Reef

A highly successful entrepreneur, he has spent his career designing technology solutions that solve complicated, large scale business problems.