A blog focusing on unstructured data - topics that address the challenges, best practices, and technologies

Navigating Unstructured Data

Inaugural Blog

I am the founder and CEO of Digital Reef, an enterprise software company that has recently emerged from stealth mode after two years of intensive design and development work on our unstructured data management platform. I started the company for two reasons. First, I discovered that the management of unstructured data was an unsolved problem at every large company I came in contact with. Second, I realized that it was unsolved because it was a technically difficult problem. I am a business person and a technologist. I don't believe in technology for technology's sake. My career has been devoted to solving large-scale business problems with technology. Presented with a problem like the growth and complexity of understanding and working with unstructured data, I was hard pressed to let it go.

When I first encountered this problem, it was presented to me as a large and growing issue that emerged from the Sarbanes-Oxley mandates of the early part of this decade. As a result of this legislation, and other regulations springing from issues with organizations caught up in the excesses of the dot-com "crash", companies had been forced to save vast stores of electronic business records. The obvious consequence of complying with these regulations was that the existing tools proved woefully underpowered for evaluating the content within these vast stores of information.

This content evaluation task was strangely similar to others I had encountered while working with scientists at Bell Laboratories, back when I was CTO of Lucent Technologies' Wireline Business Unit. There, we learned that evaluating large amounts of content to identify insidious threats to network and server infrastructure required new approaches and functionality that did not exist in current network security solutions. The same concepts can be applied to understanding large content stores. That is what we are doing at Digital Reef: making content easier to understand and manage.

I am embarking on this blog in the hopes of sparking discussion around technology solutions to difficult business problems. My plan is to blog about business challenges created by unstructured data--topics including eDiscovery, data storage, knowledge reuse, data security, compliance and data governance, to name a few. I also plan to provide my assessment of some of the technology solutions out there today and my thoughts about what is coming in the future.

I look forward to getting the discussion started.


6 Comments:

Blogger Tony Asaro said...

Steve,

The problem is a major one but often gets obscured because ownership is scattered within many enterprises. Right now eDiscovery seems to be the driving force - and in these litigious times even greater demand will be created. However, I also believe a new era of content use is required going forward.

February 27, 2009 11:04 AM
 
Blogger Steve Akers said...

Tony,

You make a good point. eDiscovery is a critical and top-of-mind problem these days, but the companies we talk to also bring up initiatives like knowledge re-use, efficient compliance auditing, and risk management. So you are correct; enterprises are looking to unearth the value in their content. And this type of advanced 'content mining', as you have referred to it in the past, is a very powerful concept and one that our platform can easily address in the real world.
— Steve

March 2, 2009 12:31 PM
 
Blogger Barry said...

Steve;
While there is probably a viable business opportunity in helping to bridge the gap between the increasing demand for logical structure in content and the level at which most content is created, it would seem that this entire question is indicative of a deeper problem in our information culture.

While the consumption side of information has moved ahead at an astonishing pace, demanding more and more logical handles on content of all types, the creation side has stayed pretty much the same for nearly 30 years, generating text in word processing format with internal tagging designed to identify document parts (heads, paragraphs, tables, etc.) and visible style (font, color, size, spacing, etc.).
While we have masked the growing gap between creation and delivery through software, rework, and other band-aids, the exponential growth of content must, soon, overwhelm our ability to "fix" the problem after the fact.

As a part of the text world since the 60s, I have seen the dramatic advances in the ability to generate text that is both visibly and logically structured. (Goldfarb once said, with thinly veiled irony, that SGML does little more than enable text to act like a database.) I have also seen, however, an almost complete failure of the software industry and the vast majority of information creators of all types to understand that our ability to leverage what we create is based on the structure built into our text, that once created, text is very difficult to uptranslate to infer or embed structure, and that our myopia can take us only so far before we hit the wall.

So I guess the upshot of these musings is that while efforts such as yours should be encouraged, someone or some groups must simultaneously raise the issue of what and how we are creating textual data. If we do not embrace a more structured creation approach, we are going to find ourselves buried in unusable content.

Regards,

Barry

March 2, 2009 3:35 PM
 
Blogger XSPRADA_MAN said...

Steelpoint, ZANTAZ, it looks like your guys hail from Autonomy. Is that accurate? I'm also interested in your opinion on how your technology might (or not) dovetail with very large analytical databases for BI purposes.
Thanks.

March 10, 2009 1:23 AM
 
Blogger Steve Akers said...

One of the newer members of our team is from Autonomy-Zantaz. The rest come from Boston area technology companies and most of us worked together at Spring Tide Networks, where we focused on highly available network systems that dealt with per-user security and personalized routing. After Spring Tide, a bunch of the team branched off and worked for a spyware company. When we reunited for this effort, I had developed the mathematical approaches that we use in our similarity engine and the guys from the spyware effort were well-versed in the ways that content can be hidden inside files. Along with our common heritage from Stratus Computer (fault-tolerant computing) and Spring Tide (highly available and scalable security products), we were able to come up with a very scalable architecture for the platform we market today.
Steve

March 18, 2009 2:57 PM
 
Blogger Steve Akers said...

Barry,

First, thank you for the thoughtful comments. You point out an interesting aspect of content generation: humans do not recognize how the data they create is structured. They also change the focus of their interests every day or every hour, so asking them to subscribe to a structure for organizing content is an impossible task. We have tried to build a system that can recognize the similarities within the semantics of content so that it can be organized automatically into logical containers. It makes the vast sea of content more recognizable to users because, as you point out, there is so much content that it can become indistinguishable and meaningless. We felt that the automated approach was more realistic than a human categorization approach.

Steve

March 18, 2009 3:00 PM
 


Steve Akers
Founder & CTO
Digital Reef


A highly successful entrepreneur, he has spent his career designing technology solutions that solve complicated, large scale business problems.
