Next Generation Information Management Systems
Posted by Steve Akers on Fri, Sep 18, 2009 @ 11:15 AM
In my last post I cited an article by George Crump, the founder of Storage Switzerland. He peers into the future and predicts that the enterprise will want to make strategic use of its data as well as respond effectively to regulatory requirements.
The question that comes to mind is, "Why don't more enterprise IT managers have capabilities to discover, analyze, and act on their data now?" I would like to address that question in this post.
As George mentioned in his article, the key missing piece is a "next generation" solution for supplying these functions. He mentions scalable processing and indexing capabilities (beyond those in appliance-type solutions) as being necessary to achieve the next set of required functionality, but I will go one step further: the next generation solution has to add analysis and management capabilities to simple indexing and search functionality. The scale of the solution is important, but the combined power of a discovery, analysis, and management platform did not exist until recently. There are components that can be integrated with great effort, but until now there was no single, integrated discovery, analysis, and management platform to provide the services the enterprise needs to make its vast stores of data useful.
In the Beginning
The search and information access companies have done a good job of providing technology to index and make certain data repositories useful. These are often tied to critical enterprise applications or to a revenue-generating aspect of the business, like an e-commerce website. These solutions are not typically implemented across all data sources within the enterprise, however. In many cases, they are implemented to provide access to only certain repositories of data. Extending their use across all data repositories would become prohibitively expensive (due to system configuration, management, and data preparation costs).
These existing technologies have been built and tuned to generate the largest number of relevant results as quickly as possible -- for the data they are able to serve. Once the data within one of these "searchable repositories" is available, it is useful, but the user of the system is left with the task of collecting and managing the data those results represent. These systems return "links" or documents as results. Returning results is the only function that was ever envisioned as necessary for such a system, so information management capabilities on top of the search functions do not exist. A second-generation solution is one that can provide analysis and management functions on top of the basic function of finding results.
Key Capabilities of Second-Generation Systems
In a second-generation system, ease of use is a key attribute: for example, data can be collected dynamically from repositories like departmental SharePoint servers without a major logistical effort on the part of IT. Second-generation systems are capable of connecting to "native" repositories and indexing and classifying them quickly (with minimal setup and configuration). Second-generation systems simply attach to the network and pull in various types of data, then present it in a useful context for users. Complex configuration and setup of the system or its data is not required.
Another key difference between first-generation and second-generation information access systems is the addition of useful analysis and management applications on top of the basic functions of indexing and searching. These second-generation systems provide classification functions that reveal relative document similarities and semantic labels common to groups of documents. Self-learning algorithms identify the existing relationships among data items found throughout the enterprise, and those relationships are what make the data useful. This self-learning nature does not require any effort on the part of the IT staff implementing the solutions. With existing technology, great effort can be required to prepare data for navigation by an information access system, and painful configuration of system options can also be required to get the data into the system. With very little administrative effort, second-generation systems provide users with "contextual views" into data that are not available with the current "index and search" appliances or solutions.
Most importantly, a second-generation solution will allow data to be managed. The items in a large data set can be grouped, tagged (intelligently marked up), and moved between storage locations. The second-generation system can discover data that is "similar" (semantically similar) to retention policy documents and tag, de-duplicate, and move the relevant material to a specific SharePoint library with minimal user intervention. Versions (near-duplicate copies) of documents can be identified and tagged by the system and moved as a related collection of material.
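To make the idea of identifying "versions" concrete, here is a minimal sketch of one common way to flag near-duplicate documents: compare overlapping word sequences (shingles) and score the overlap with Jaccard similarity. This is an illustration only, not a description of how Digital Reef or any other product does it; the function names, the shingle size, and the 0.5 threshold are all assumptions chosen for the example.

```python
from itertools import combinations


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles in the text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(docs: dict, threshold: float = 0.5) -> list:
    """Return sorted name pairs whose shingle sets overlap at or above threshold.

    The threshold is an arbitrary value for this sketch, not a recommendation.
    """
    sets = {name: shingles(text) for name, text in docs.items()}
    return [
        (x, y)
        for x, y in combinations(sorted(sets), 2)
        if jaccard(sets[x], sets[y]) >= threshold
    ]


if __name__ == "__main__":
    docs = {
        "policy_v1": "records must be retained for seven years after the close of the account",
        "policy_v2": "records must be retained for seven years after the closure of the account",
        "memo": "the cafeteria will be closed on friday for cleaning",
    }
    print(near_duplicate_pairs(docs))
```

Comparing every pair of documents is fine for a handful of files but does not scale to an enterprise data set; a large system would need some form of indexing or hashing to avoid the all-pairs comparison.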
The same type of system can identify data that falls under certain policy criteria and move it to a storage system with retention enforcement capabilities (the system can identify, move, and "lock down" data that needs to be secured as "read only" for a certain period of time). Intelligent movement of data with policy lockdown capabilities is a large benefit that saves manual effort and improves the accuracy of records management initiatives.
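The identify, move, and lock-down flow can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `move_and_lock` function and the `matches_policy` callback are hypothetical names, the "policy" is whatever test the caller supplies, and read-only file permissions stand in for real retention enforcement, which a dedicated storage system would provide far more robustly.

```python
import os
import shutil
import stat
from pathlib import Path
from typing import Callable, List


def move_and_lock(
    source_dir: Path,
    retention_dir: Path,
    matches_policy: Callable[[Path], bool],
) -> List[Path]:
    """Move files that match a policy into retention_dir and mark them read-only.

    matches_policy is a hypothetical callback supplied by the caller; it
    decides whether a given file falls under the retention policy.
    """
    retention_dir.mkdir(parents=True, exist_ok=True)
    locked = []
    for path in sorted(source_dir.iterdir()):
        if path.is_file() and matches_policy(path):
            target = retention_dir / path.name
            shutil.move(str(path), str(target))
            # Read-only for owner, group, and others. This only approximates
            # "lock down"; it is not tamper-proof retention enforcement.
            os.chmod(target, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
            locked.append(target)
    return locked


if __name__ == "__main__":
    # Example policy for illustration: any file whose text mentions "retention".
    def mentions_retention(path: Path) -> bool:
        return "retention" in path.read_text(errors="ignore").lower()

    moved = move_and_lock(Path("inbox"), Path("retention_store"), mentions_retention)
    print([p.name for p in moved])
```

The sketch leaves out the "certain period of time" part of the lockdown; tracking an expiry date per file and releasing the lock afterward would be the job of the retention system itself.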
I want to avoid making this post too much longer, but I wanted to point out some of the differences between existing information access technologies and next-generation solutions like the one produced by Digital Reef. I believe these differences show why the old and the new approaches are very different and why I consider the Digital Reef solution superior.