Digital Information Governance

Devising an Information Governance Strategy

 

The Trends:

  • In 2010, courts within every federal circuit issued at least one e-discovery opinion.
  • The U.S. Supreme Court decided a case that presented potential e-discovery implications.
  • The New Jersey Supreme Court issued an opinion concerning the attorney-client privilege and an employee's personal use of an employer-issued computer.

Steve Akers (Digital Reef founder) gives a concise and reasoned account of how and why corporate legal counsel and their IT departments should get ahead of the problems created by the absence of an information governance strategy.

Originally posted on the AIIM Blog Digital Landfill

1 --Knowing Why You Need It.

Information Governance ultimately means being able to transform un-managed information into valuable business assets.

It provides enterprise readiness to proactively service the legal and compliance policies in today’s business and it requires continuous visibility, trust and control across all of your digital information.

With the combination of new government mandates, increased corporate accountability, and the digital information explosion, it is a necessity to have a holistic view of all information. With the right governance strategy, business will have insight into unstructured content while complementing existing investments in content management, email, archiving and storage management.

2 --Consider the Source.

When devising an information governance strategy, first consider all the different sources of information within your organization.

  • Network Attached Storage devices with potentially hundreds of millions of files
  • NT file server farms that contain shared repositories of essentially invisible information
  • Ubiquitous SharePoint farms that are sprouting like spring flowers.
  • Email repositories and perhaps several types of content management systems (Documentum, FileNet, OpenText, etc.).

In order to govern this information, these sources all need to be accessed, their contents analyzed, and the results made visible from a single interface. In most environments, this single view of data across multiple sources or repositories is impossible to achieve and leads to incomplete collection and identification, let alone governance of data.

3 --Data Analysis Must be Virtualized.

The strategy cannot omit critical sources of information or assume that one single archive will be built (thus doubling the aggregate size of the information involved in the strategy). It is important to have an email archive, but it is incorrect to assume that all the sources of relevant information across an enterprise will be re-committed to an archive for purposes of a governance strategy. The governance of information must be undertaken from a system that can understand data where it lives.
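To make the idea of understanding data where it lives a little more concrete, here is a minimal sketch in Python. It walks a directory and records only metadata (path, size, content hash), leaving every file in place. The function name `index_in_place` and the temporary directory standing in for a file share are assumptions made for illustration; this is not how any particular governance product works.

```python
import hashlib
import os
import tempfile


def index_in_place(root):
    """Walk `root` and record metadata for each file without copying it."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            entries.append({
                "path": path,
                "size": os.path.getsize(path),
                "sha256": digest,
            })
    return entries


# Demo on a throwaway directory standing in for one file share (hypothetical data).
with tempfile.TemporaryDirectory() as share:
    with open(os.path.join(share, "contract.txt"), "w") as f:
        f.write("master services agreement")
    with open(os.path.join(share, "memo.txt"), "w") as f:
        f.write("quarterly retention memo")
    index = index_in_place(share)
    for entry in index:
        print(os.path.basename(entry["path"]), entry["size"], entry["sha256"][:12])
```

The index is small relative to the content it describes, which is the point: the source data is never duplicated into a second archive.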

4 --Search is Not Enough.

The logical first response by IT people is to specify an enterprise search platform and use that as the basis for the corporate governance strategy. The problem with this is that governance transcends the mere identification of information. This approach is fairly inflexible and misses the other aspects of a governance strategy: insight and control or management of content. A governance strategy must include identification of relevant information, insight into the information that was identified (how does the information relate to other content within the enterprise, regardless of location?), and control or management of the information. The first is merely search; the second is accomplished with search combined with analytics.

5 --Automation is the Necessary Ingredient.

The above aspect of the problem highlights that governance requires technology to aid the process by automating the identification of similar content. A self-classification capability is the key to making data governance possible by making relevant data visible. Automation has been the missing ingredient that has kept true governance from being possible. A machine-learning platform that can guide the human reviewer to content with similar characteristics is the key to solving the problems that surface when attempting to implement a strategy of governing information. Human beings must make governance decisions but often don’t know where to start; automated learning techniques give them the place where they should start the process.
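As a rough illustration of guiding a reviewer to content with similar characteristics, the sketch below ranks documents by cosine similarity to a seed text the reviewer has already marked as relevant. Plain term counts stand in for a real machine-learning platform, and the document names and texts are invented for the example.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-case bag-of-words term counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_by_similarity(seed, documents):
    """Return (doc_id, score) pairs, most similar to the seed first."""
    seed_vec = term_vector(seed)
    scored = [(doc_id, cosine(seed_vec, term_vector(text))) for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical documents; the reviewer starts from the seed text.
documents = {
    "doc-a": "retention schedule for customer contracts, invoices and purchase orders",
    "doc-b": "lunch menu for the cafeteria this week",
    "doc-c": "customer contracts must follow the retention schedule",
}
seed = "contracts retention schedule"
ranking = rank_by_similarity(seed, documents)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

The human still makes the governance decision; the ranking only suggests where to look first.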

6 --Scale, Extensibility and Ease of Deployment.

In the modern governance era, solutions will include an ability to extend almost infinitely across larger and larger data sets with little or no provisioning of storage and server capacity being necessary. These solutions will also have a portable indexing capability that can be expanded as the data within the enterprise expands.

To date, there have been appliances that provide some insight into the content being accumulated for a specific purpose or project, but there has not been a scalable governance platform that can aggregate a view of all the data in place that is relevant. In order to allow IT professionals to govern enterprise information, governance architecture must be extensible across commodity hardware, fit into the virtual server environments of modern data centers and deploy easily.

7 --You Need to Go Global.

Having an additional capability to move an entire index and analyze it without having to remove the data from a particular country is a key component of a governance strategy. Emerging technologies will allow an index, not the data itself, to be moved to a remote location where it can be analyzed without forcing the data to leave the local country. These kinds of features are part of a total governance strategy that would be ideal in a global information environment. Data is global, and therefore solutions must be global in scope and local in use and analytics.

8 --Don’t Forget Security.

Data governance, particularly for legal matters, is an ongoing process with a definite life-cycle. Over the course of reviewing content pertinent to legal and other regulatory matters, different individuals with different levels of permitted access will be required to view certain documents. Allowing different classes of users to view data with certain characteristics at different times is a key attribute of a governance platform and this must be accounted for in a governance strategy.
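One simple way to picture different classes of users viewing data at different times is a lookup keyed on role and review phase. The sketch below is only that: the roles, phases, and classifications are hypothetical, and a real deployment would draw them from its directory service and matter workflow.

```python
from dataclasses import dataclass

# Hypothetical rules: (role, phase) -> document classifications that role may view.
ACCESS_RULES = {
    ("paralegal", "collection"): {"general"},
    ("paralegal", "review"): {"general"},
    ("outside_counsel", "review"): {"general", "privileged"},
    ("outside_counsel", "production"): {"general"},
}


@dataclass(frozen=True)
class Document:
    doc_id: str
    classification: str


def can_view(role, phase, document):
    """Allow access only if a rule grants this role the document's class in this phase."""
    return document.classification in ACCESS_RULES.get((role, phase), set())


memo = Document("memo-17", "privileged")
print(can_view("paralegal", "review", memo))             # False
print(can_view("outside_counsel", "review", memo))       # True
print(can_view("outside_counsel", "production", memo))   # False
```

Anything not explicitly granted is denied, which is usually the safer default for legal review.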





Making Data an Asset

 
I ran across this excellent article by George Crump, the founder of Storage Switzerland and wanted to share it with you -- Making Data an Asset 

In the article George echoes a theme that we at Digital Reef hear again and again when we work with customers and potential customers: enterprises (and other organizations) want to understand and unlock the potential inside their ever-expanding stores of data. They really want a "second-generation" set of discovery tools capable of helping IT reach their goal of making data an asset.

IT understands that the first step in turning data into an asset is some form of discovery. Discovery isn't limited to legal discovery, although the primary reason behind many discovery efforts is to satisfy a sudden need to find everything relevant to a legal proceeding. In general though, the IT professional understands that being able to put users in touch with the data they need saves the organization pain. More importantly, it yields strong benefits for everyone and, in some cases, enhances the revenue potential of the enterprise.

The obvious question is, "Why hasn't this already happened and what is needed to make data an asset?" George points out that companies providing first-generation tools (appliances) have built a business supplying indexing and search products that are appropriate for smaller subsets of data. These companies used an appliance-based approach that had difficulty scaling to handle enterprise-sized data stores. This lack of scalability necessitated moving their offerings toward litigation discovery, where the data was typically divided into smaller pieces manually.

These first-generation tools cannot provide a useful view into enterprise data stores. Further, they require the user to shoulder the administrative burden of managing multiple appliances and handling the results of searches from each of these low-powered devices. In the case of some first-generation solutions, the user is required to load and unload data onto the appliance to derive benefit. What is needed (as George points out) is a single solution that the IT manager can engage with and use to index all of the data across the enterprise.

In addition to being able to handle the volume of data contained within the enterprise from a single system, IT professionals must have a system that preserves unique user views into data while preserving corporate security permissions, such as those carefully imposed via Active Directory and LDAP infrastructure. Any system that supplies access to information must also preserve its security permissions and allow groups of users to collaborate on the data and mark or "tag" it as the needs of their project dictate. This user-view model is not available with current appliance-based review tools or most search products. Furthermore, the system should provide capabilities that automate the discovery and data management processes.

Intelligent mark-up and tagging and user views of data that has been placed in the "corporate community data store" are important for making data an asset. A second-generation information management solution not only indexes data and makes it available, it also allows for user-centric views of data and intelligent data mark-up. Policy-based movement of data is another function that second-generation tools provide to the IT professional. This type of function allows the system user to take action on data once it is identified. The system can use classification and analysis algorithms to identify and "group together" data that is similar and that "belongs" together. Such second-generation systems sound like Nirvana, in that they remove much of the manual effort from any task around finding, marking and managing information, or putting data into a form that is meaningful for users (turning it into an asset or "information"). Any IT professional who has used a current-generation search tool knows that once the search query is issued, the real work begins. A second-generation tool that can mark, move, and manage data is what is required for making data into information (and thus an asset).

The second-generation tools that discover, analyze, and manage data are here now. Digital Reef makes such an extensible software solution that runs on standard server hardware and provides these new capabilities that turn data into information. The first-generation appliances have found their place in the limited realm of "load-and-store" litigation review applications. The second generation, with all of its promise, is here now and will unlock the potential within information. It will be exciting in the months and years ahead.

Personal and Corporate Responsibility for Searching, Preserving and Producing Information

 
In March of this year, a court noted that a corporation’s failure to adopt appropriate information policies can result in potentially costly legal sanctions. While the sanctions themselves may or may not be substantial, the legal fees leading up to them are likely to be weighty. See Phillip M. Adams & Assoc. L.L.C. v. Dell, Inc., 2009 U.S. Dist. LEXIS 26964 (D. Utah Mar. 27, 2009). This decision and other recent holdings serve notice that it is in technologists’ best interest to bring potentially sub-standard retention policies, or irresponsible data retention practices that may result in loss of data, to the attention of their legal and business archiving/eDiscovery counterparts. The courts, by holding corporations responsible, are certainly acting within the dictates of logic. A corporation deploying a solution that seamlessly accommodates additional search, preservation, or production burdens without imposing additional burdens on individual employees may be in a stronger position to assert that it falls within the ambit of safe harbor.

Technologists who knowingly withhold such information from their legal and business counterparts place their employers and their own employment at risk. While many grey areas exist as to what constitutes a failure of policies and practices to synchronize with systems, there seems to be clarity on one thing: when policies and practices are in place but the systems fail to retain data, a potentially sizable legal problem may arise for the entity.

Technologists are not policy or legal experts, but it is arguably within their domain of expertise to inform the legal and business creators of these policies about their technological feasibility. Moreover, it is evident that a company’s position around discovery is a great deal stronger when a particular employee is responsible for the execution of the preservation, search, and production of information. However, the expectation that already overworked employees will absorb additional burdens is a fiction, and the information is not likely to be preserved. In addition, companies that elect to place the burden for implementing data retention or preservation orders on their employees, effectively placing the operational execution of preservation, search and production at the mercy of an individual employee’s practice, are making a potentially bad decision.

The extent of personal liability for an individual responsible for ensuring that the corporation’s policies, practices, and systems operate to some standard has yet to be established. Irrespective of the legal finding, such a failure can potentially impact your attractiveness to an employer.

Shaken Proportionality & Enterprise Accountability, Not Stirred, Might Be the eDiscovery Martini

 
America’s trial lawyers, via the American College of Trial Lawyers (ACTL), have recognized that the American trial system is too expensive and that resolving matters takes too much time. Trial lawyers also recognize that the expense exceeds the actual value at stake in all but the most important matters. Several organizations, including the ACTL and the Sedona Conference, advocate shifting the litigator mindset from combative to cooperative.

The great trial litigators need to resist the temptation to “out cost” their opponents. The courts, for their part, deliver this message by applying the construct of “techno-legal proportionality”. The axiom rests on the precept that the value of the information sought should exceed the cost of extracting it, relative to the issue in dispute and accounting for the catch-all of societal benefit. For this to succeed, attorneys and judges must be better informed about the technological side, so that informed common sense can be applied.

The electronic discovery problem is exacerbated by the high costs of identifying, collecting, preserving and reviewing information. The questions are, “Why do organizations today have so much information?” and “Why shouldn’t an organization that preserves less information be rewarded?” Companies might be generating more information than ever before, but should the cost excuse them from following responsible and effective enterprise information management principles? It seems that the problem of electronic discovery might actually be a symptom of a larger problem of digital information responsibility.

Shouldn’t companies that have high discovery costs be forced to ante up? If they did, economics would dictate the evolution of new technologies and the adoption of a smarter information management infrastructure.    No lawyer enjoys mindless document review.  The money earned from review is nothing to sneeze at, but increased job satisfaction on the part of lawyers and substantially lower legal bills for clients might be compelling enough to drive lawyers to forgo the additional revenue.

Finally, it seems to me that companies adopting technologies that allow them to effectively manage their information so that there is less retained and that which is retained can be seamlessly collected, preserved, and reviewed would save millions and millions in legal and technology fees. Of course, selecting the right solution and synchronizing the solution with a company’s policies is critical, but it is certainly feasible that most lawsuit-prone organizations would show a fast ROI by selecting the right technology tools.

Solving the Enterprise Search Dilemma

 

Digital Reef recently came out of stealth mode and is now talking to press openly about their solution. I spoke to a few trade press editors about Digital Reef and they wanted to know what made Digital Reef uniquely valuable in a market that seems to have a wide range of solutions for customers to choose from. It is not enough that a vendor is valuable or unique. If competitors offer the same value then the solution may have no real market traction. If a solution is unique but that singular capability offers no real value then customers will not pay for it.

In the world of high-tech there is often confusion because we use the same terms to mean different things and different terms to mean the same thing. For example, vendor XYZ may say that it provides Enterprise-class search and indexing, that it scales, and that it gives users rapid access to content. So when Digital Reef states that they are “Enterprise-class”, it is important to distinguish and articulate what makes them uniquely valuable.

The ability to provide Enterprise-class search and indexing requires two very different core competencies. The first requirement is to build a platform – IT infrastructure – to address the needs of the Enterprise. These include massive amounts of content that is stored on heterogeneous storage that is most likely geographically dispersed. How do you index all of the existing content – which consists of hundreds of terabytes and perhaps even petabytes – while new data is created continuously? How long will it take the solution to catch up? Days? Weeks? Months? Years? Ever?

Digital Reef has built a scalable system that works like a grid or cluster – enabling you to add more compute resources to tackle this huge challenge. In other words, they have developed and provide sophisticated infrastructure – applying grow-able grid technology leveraging massive amounts of compute power in a unified fashion to index mountains of content.

The other core competency is to quickly access relevant content. Digital Reef provides this through keyword search and their unique similarity engine – I discussed this in greater depth in my last blog – The Power of Similarity. Their search capability enables you to get results based on context. Consider the sentence – “I’m feeling blue” – which has nothing to do with the actual color but a pure keyword search would be swimming with content that included a myriad of references to the color blue including paints, fabrics, the sky, the ocean, etc.
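To see why the rest of the sentence matters, here is a toy comparison in Python. A keyword search for "blue" returns every document that mentions the word, while a crude context score (Jaccard overlap with the whole query, after dropping a few stopwords) puts the mood-related document first. This is a deliberately simple stand-in and not Digital Reef's similarity engine; the corpus and the stopword list are invented for the example.

```python
import re

# A tiny hand-picked stopword list, just enough for this toy corpus.
STOPWORDS = {"i'm", "the", "a", "was", "we", "and", "since"}


def terms(text):
    """Distinct lower-case terms with stopwords removed."""
    return {t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS}


def keyword_search(keyword, corpus):
    """Return every document that contains the keyword at all."""
    return [doc_id for doc_id, text in corpus.items() if keyword in terms(text)]


def context_rank(query, corpus):
    """Rank documents by Jaccard overlap with all of the query's terms."""
    q = terms(query)
    scored = []
    for doc_id, text in corpus.items():
        d = terms(text)
        scored.append((doc_id, len(q & d) / len(q | d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


corpus = {
    "paint": "We painted the fence blue",
    "sky": "The sky was a clear blue",
    "mood": "Feeling blue and lonely since the move",
}
hits = keyword_search("blue", corpus)
ranked = context_rank("I'm feeling blue", corpus)
print(hits)        # all three documents match the keyword
print(ranked[0])   # the mood document scores highest
```

Real similarity engines use far richer mathematics, but the principle is the same: score against the context, not just the word.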

Digital Reef excels when looking for abstract concepts, metaphors, idioms, specifics, vertical terminology, and word associations. And the magic of all of this is mathematics – complex, reasoned, considered and sophisticated algorithms.

It is the combination of their scalable clustered architecture and similarity engine that makes Digital Reef uniquely valuable.

Digital Patternicity

 
Patternicity, as defined by Michael Shermer, a writer for Scientific American, is the tendency to find meaningful patterns in meaningless noise. When I read this article on Patternicity I immediately related it to the challenges we face with information access.

Patternicity deals with false positives, and search tools have a comparable problem: too many responses that may or may not be what we are looking for. Human Patternicity is meant to err on the side of caution because, as Shermer points out, “the cost of believing that the rustle in the grass is a dangerous predator when it is just the wind is relatively low compared with the opposite. Thus, there would have been a beneficial selection for believing that most patterns are real.”

Digital Patternicity is also meant to err on the side of caution because the cost of believing that the keyword matches your intentions is relatively low compared with returning a false negative.  Therefore returning a false positive is better than returning a false negative. 
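The trade-off can be put in numbers with precision (how much of what was returned is relevant) and recall (how much of what is relevant was returned). The short Python sketch below uses made-up document IDs: a broad query that errs on the side of caution finds everything but buries it in false positives, while a narrow query is clean but misses relevant documents.

```python
def precision_recall(returned, relevant):
    """Compute (precision, recall) for a set of returned results."""
    returned, relevant = set(returned), set(relevant)
    true_positives = len(returned & relevant)
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result sets for illustration only.
relevant = {"d1", "d2", "d3"}
broad = ["d1", "d2", "d3", "d4", "d5", "d6"]   # no false negatives, three false positives
narrow = ["d1"]                                # no false positives, two false negatives

print(precision_recall(broad, relevant))
print(precision_recall(narrow, relevant))
```

The broad query scores perfect recall at half the precision; the narrow one is the reverse. Which error is cheaper depends on the stakes.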

The problem in both Human and Digital Patternicity is that the algorithms are limited and have stopped evolving because they don’t need to improve.   Human beings are very successful and don’t require more sophisticated methods for returning fewer false positives.  Likewise, search companies like Google are very successful and have built a huge business in spite of the number of false positives they return. 

However, increasingly within the world of business – where information equates to revenue, competitive advantage and market growth – there is a big price to pay with false positives and a shift in the evolution of Digital Patternicity must occur.  There will always be a place for acceptable false positives in the mass market – but when you get to specialization, when the stakes become too high, when survival is at risk – then evolution aggressively adapts.

Data Growth and Data Girth

 
Alright, we all know that we have a ton of data and it's growing and growing.  And maybe you are sick of hearing about it.  But you should really listen.  I liken the growth of data in business to the human body taking on too much weight.  The result is that we may be able to function for a long time, but eventually there will be serious ramifications if we don’t do the right things to become healthy. 

There is an interesting IDC report that was published in 2007 – a bit old but has some compelling information and insight.  Let’s break down some of it:

  •  In 2006, the amount of digital information created, captured, and replicated was 161 exabytes, or 161 billion gigabytes. This is about 3 million times the information in all the books ever written. Between 2006 and 2010, the information added annually to the digital universe will increase more than sixfold, from 161 exabytes to 988 exabytes. 
My observation:  These numbers illustrate the sheer volume of digital data that is being created and, further, tell you that we ain’t seen nothing yet. 

  •  IDC predicts that by 2010, while nearly 70% of the digital universe will be created by individuals, organizations (businesses of all sizes, agencies, governments, associations, etc.) will be responsible for the security, privacy, reliability, and compliance of at least 85% of that same digital universe.
My observation:  The importance of this is that organizations will have to manage data created by their customers and employees – which will have a real business impact.  And IDC left a few things out – accessing the data and protecting it. 
  • The cost of not responding to the avalanche of information can add up, yet not be immediately visible to CEOs and CFOs.
My observation:  This goes back to my unhealthy body analogy – you may not know what vital organ or system is going to collapse - it may be more than one – and you won’t know until something bad happens. 

  • In surveys of U.S. companies, we have found that information workers spend 14.5 hours per week reading and answering email, 13.3 hours creating documents, 9.6 hours searching for information, and 9.5 hours analyzing information.
  • We estimate that an organization employing 1,000 knowledge workers loses $5.7 million annually just in time wasted having to reformat information as they move among applications.  Not finding information costs that same organization an additional $5.3 million a year.
IDC is saying that poor data management can cost you $11 million annually just based on your users wasting time.  That doesn’t take into account other costs – such as outside audits, e-Discovery, litigation, etc.  You’ve just been told you have a severe case of diabetes and need to do something about it.
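For anyone who wants to check the arithmetic behind the figures quoted above, here is a quick sketch in Python. The inputs are the numbers as quoted in this post; the variable names are mine, and the annual growth rate is derived by assuming steady compounding over the four years from 2006 to 2010.

```python
# Digital universe growth, 2006 to 2010 (exabytes, as quoted above).
start_eb, end_eb, years = 161, 988, 4
growth_factor = end_eb / start_eb
# Assumption: growth compounds at a constant annual rate.
annual_rate = growth_factor ** (1 / years) - 1

# Annual cost of wasted time for an organization of 1,000 knowledge workers.
reformat_cost = 5_700_000
not_finding_cost = 5_300_000
workers = 1_000
total_cost = reformat_cost + not_finding_cost
cost_per_worker = total_cost / workers

# Weekly hours spent on email, document creation, searching and analysis.
weekly_hours = 14.5 + 13.3 + 9.6 + 9.5

print(f"growth factor: {growth_factor:.2f}x")
print(f"implied annual growth: {annual_rate:.0%}")
print(f"total annual cost: ${total_cost:,}")
print(f"cost per worker: ${cost_per_worker:,.0f}")
print(f"hours per week on these four activities: {weekly_hours:.1f}")
```

So "more than sixfold" checks out, and the two cost estimates do add up to $11 million, or about $11,000 per knowledge worker per year.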

We need greater levels of integration between applications, storage systems and data management tools to turn data into information and then to get us the right information when we need it.  Okay?  Go make it happen. :) 

Certainly this is easier said than done.  But the ecosystem – customers and the various vendors – must all move towards this objective.  We already have better tools to accomplish these tasks but we have a long way to go before reaching information utopia.  The first step is to recognize that there is an issue – a problem – and make it a priority to research and begin to address the unhealthiness and the short and long term ramifications.