Posted by Jeff Kurpaska on Mon, Aug 30, 2010 @ 10:05 AM
FRCP rules set eDiscovery requirements
As most people in the enterprise IT and litigation support world know by now, recent amendments to the Federal Rules of Civil Procedure (FRCP) establish that electronically stored information (ESI) is a discoverable record type and should be treated like any other type of evidence.
That has broad implications for the discovery process, in that all electronically stored information that could pertain to a case must be found, protected, and turned over to opposing counsel when requested. This FRCP requirement is the foundation of the eDiscovery requirement and the eDiscovery processes that companies and their counsel will need to create and adhere to going forward.
For the electronic portion of discovery it will be IT departments that will bear the brunt of the work as they may be directed to find specific files (including SharePoint records, emails and metadata) containing specific content, having been sent to or from specific employees, sent between specific dates, having certain content types, or residing in locations unknown within the enterprise.
The search could require wading through terabytes of data to find this information. A failure to comply in a timely and thorough manner with opposing counsel’s request could result in the loss of the case before it ever gets to trial. Such a loss can also mean fines, damages, and opposing counsel’s fees.
SharePoint adds to the eDiscovery Problem in a Big Way
According to a Global Intranet Trends 2009 report, 55 percent of organizations have SharePoint implementations, and with them come the problems SharePoint creates for eDiscovery.
Most companies have no governance model in their build efforts that would institute a framework for managing content of all types.
They have no process for drawing the right lessons from deployment and joining those lessons with a long-term governance model that brings value and managed frameworks to terabytes of corporate data.
For their SharePoint data they struggle to gain new business value from SharePoint’s capabilities as an enterprise content platform.
One symptom of this disjointed SharePoint deployment effort is the document propagation that occurs when the same content is replicated and spread from server to server without concern for information governance practices and document retention requirements.
In essence, these siloed and disjointed SharePoint installations probably are not meeting compliance requirements and are needlessly consuming bandwidth and storage. These inefficiencies are at the root of why many enterprises stop using SharePoint.
Digital Reef SharePoint eDiscovery Solution
Responding to litigation can be a risky, time-consuming, and costly process. Each step is not only laden with its own inherent cost but, if not done correctly, can exacerbate downstream cost and risk. Organizations must scramble to:
• Identify, preserve, and collect all potentially responsive electronically stored information
• Cull it down to a manageable set for review by counsel
• Produce the responsive data set containing electronically stored information to opposing counsel
SharePoint data is stored and maintained differently than data in other types of storage repositories, and its storage method creates a myriad of issues relating to identification, collection, and extraction of content. It is not engineered for content extraction, and it has limited capabilities for managing the context of information. Contextual analysis is essential to eDiscovery so that the legal team can understand the framework for the content, including who created it and who modified it.
The Digital Reef Virtual Governance Warehouse is a platform for eDiscovery and information governance. Digital Reef transforms enterprise-wide content, including data stored in SharePoint infrastructure, into valuable assets by taking data “in the wild” and enabling legal, corporate, and IT visibility, insight, and control.
Posted by Jeff Kurpaska on Thu, Aug 12, 2010 @ 10:59 AM
Bridging the Gap Between Legal and IT
Legal IT Professionals has posted “Bridging the Gap Between Legal and IT,” written by Steve Akers, founder and CTO of Digital Reef. The online coverage profiles the obstacles organizations face in both IT and the associated processes around eDiscovery governance. Additionally, it outlines the requirements that are bubbling to the surface and how the right IT platform can address them.
The Electronic Discovery Reference Model (EDRM) codified the components that the legal department needs and that IT must provide to meet judicial requirements.
Now IT must focus on what it does best: find scalable solutions that meet the needs of a business process at minimum cost.
Posted by Jeff Kurpaska on Tue, Aug 03, 2010 @ 11:27 AM
Thinking Next Generation eDiscovery
ECM Connection has posted “Thinking Next Generation eDiscovery,” written by Steve Akers, founder and CTO of Digital Reef. The online coverage provides a snapshot view of how to solve the problems arising from a lack of access to information critical to eDiscovery assessment, i.e., information held in content management systems, SharePoint, email databases, file servers, storage appliances, social networks, or the cloud. The lack of insight into this information asset results in significant business risk and cost.
Posted by Steve Akers on Tue, Jul 20, 2010 @ 07:37 AM
As the sales force at Digital Reef engages more customers, we learn a great deal. The number one trend that we have seen in the last six months is that the size of matters being undertaken by electronic discovery providers is large and growing. Given the general education that customers have obtained (probably due to the maturation of products in the space and the publishing of the EDRM model so that everyone has a common basis of understanding), they are realizing how much ESI they should address for a given matter. We (Digital Reef) can handle vast quantities of content within hours of receiving it, so we probably get involved in more discussions around multi-terabyte collections that need processing, analysis and production than legacy vendors.
It is clear that the legacy products and the appliance makers cannot handle the kinds of cases that are coming to the courts in a cost-effective way. The sheer volume of some of these cases makes it nearly infeasible to address them with existing (known) technologies. So scale is extremely important. This is surfacing other issues and trends that might have been tolerable in the past, but that now are intolerable.
Given that cases are getting larger, another difficulty with existing technologies is that most of them handle only part of the Early Case Assessment (ECA) function, so they must rely on other tools to cover an entire ECA workflow. Having a platform that can process large quantities of content (including images that need Optical Character Recognition), build an index, cull information, analyze it for duplicates, near-duplicates, and message (conversation) threads, and then produce it without having to load it from one system into another is a key element for success.
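To make the duplicate and near-duplicate step concrete, here is a minimal Python sketch. It is not Digital Reef's method; the function names, the whitespace normalization, the three-word shingles, and the 0.5 threshold are all assumptions chosen for illustration. Exact duplicates are found by hashing normalized text, and near-duplicates by Jaccard similarity over word shingles.

```python
import hashlib
from itertools import combinations


def content_hash(text: str) -> str:
    """SHA-256 of lowercased, whitespace-normalized text (assumed normalization)."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def shingles(text: str, size: int = 3) -> set:
    """Set of overlapping word n-grams used to compare two documents."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def find_duplicates(docs: dict, threshold: float = 0.5):
    """Return (exact, near) lists of document-id pairs."""
    exact, near = [], []
    for (id_a, text_a), (id_b, text_b) in combinations(sorted(docs.items()), 2):
        if content_hash(text_a) == content_hash(text_b):
            exact.append((id_a, id_b))
        elif jaccard(shingles(text_a), shingles(text_b)) >= threshold:
            near.append((id_a, id_b))
    return exact, near


docs = {
    "a": "The quarterly report is attached for your review",
    "b": "the quarterly  report is attached for your review",
    "c": "The quarterly report is attached for your final review",
    "d": "Lunch menu for Friday",
}
exact, near = find_duplicates(docs)
print(exact)  # [('a', 'b')]
print(near)   # [('a', 'c'), ('b', 'c')]
```

The pairwise comparison here is quadratic in the number of documents, so it only illustrates the idea. At multi-terabyte scale a system would need an approximate technique such as MinHash with locality-sensitive hashing.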
Last but not least, another aspect of large scale analysis of electronic information in an e-discovery context is that many different groups with different security privilege levels need to access certain "views" of the information pertinent to a matter. Most products cannot provide this "multi-tenancy" functionality. Our customers tell us that the ability to break out "views" of the data as they relate to technical expert witnesses for review, or other counsel aiding the case, or for opposing counsel to review is extremely valuable. Most products require external security products and manual configuration before this can be achieved. It is generally so labor-intensive that most people only attempt this for very large cases that might go on for years.
As cases are tending to get larger, these attributes of an electronic discovery system become more evident. They are not commonly considered. These also tend to illuminate general enterprise requirements for electronic discovery "behind the firewall" where corporations are setting up infrastructure for governance functions.
As we engage more customers we hear that, in addition to scale of the solution, what they desire is a vision for integrated functions that allow full insight into data sources and then an ability to govern them. We (Digital Reef) have built this platform for total scale with a "one-flow" approach that minimizes the loading and unloading of content from one tool into another.
We are hearing that this is extremely valuable in this day and age where large scale and ubiquitous functional workflow is required.

Steve
Posted by Jeff Kurpaska on Sun, Jul 18, 2010 @ 01:00 AM
The Trends:
- In 2010 courts within every federal circuit have issued at least one e-discovery opinion.
- The U.S. Supreme Court decided a case which presented potential e-discovery implications.
- The New Jersey Supreme Court issued an opinion concerning the attorney-client privilege and an employee's personal use of an employer-issued computer.
Steve Akers (Digital Reef founder) gives a concise and reasoned approach to how and why corporate legal counsel and their IT departments should get ahead of the problems created by the absence of an information governance strategy.
Originally posted on the AIIM Blog Digital Landfill
1 --Knowing Why You Need It.
Information Governance ultimately means being able to transform un-managed information into valuable business assets.
It provides enterprise readiness to proactively service the legal and compliance policies in today’s business and it requires continuous visibility, trust and control across all of your digital information.
With the combination of new government mandates, increased corporate accountability, and the digital information explosion, it is a necessity to have a holistic view of all information. With the right governance strategy, business will have insight into unstructured content while complementing existing investments in content management, email, archiving and storage management.
2 --Consider the Source.
When devising an information governance strategy, first consider all the different sources of information within your organization.
- Network Attached Storage devices with potentially hundreds of millions of files
- NT file server farms that contain shared repositories of essentially invisible information
- Ubiquitous SharePoint farms that are sprouting like spring flowers.
- Email repositories and perhaps several types of content management systems (Documentum, FileNet, OpenText, etc.).
In order to govern this information, these sources all need to be accessed, their contents analyzed, and the results made visible from a single interface. In most environments, this single view of data across multiple sources or repositories is impossible to achieve and leads to incomplete collection and identification, let alone governance of data.
3 --Data Analysis Must be Virtualized.
The strategy cannot omit critical sources of information or assume that one single archive will be built (thus doubling the aggregate size of the information involved in the strategy). It is important to have an email archive, but it is incorrect to assume that all the sources of relevant information across an enterprise will be re-committed to an archive for purposes of a governance strategy. The governance of information must be undertaken from a system that can understand data where it lives.
4 --Search is Not Enough.
The logical first response by IT people is to specify an enterprise search platform and use that as the basis for the corporate governance strategy. The problem with this is that governance transcends the mere identification of information. This approach is fairly inflexible and misses the other aspects of a governance strategy: insight and control or management of content. A governance strategy must include identification of relevant information, insight into the information that was identified (how does the information relate to other content within the enterprise, regardless of location?) and control or management of the information. The first is merely search; the second is accomplished with search combined with analytics.
5 --Automation is the Necessary Ingredient.
The above aspect of the problem highlights that governance requires technology to aid the process by automating the identification of similar content. A self-classification capability is the key to making data governance possible by making relevant data visible. Automation has been the missing ingredient that has kept true governance from being possible. A machine-learning platform that can guide the human reviewer to content with similar characteristics is the key to solving the problems that surface when attempting to implement a strategy of governing information. Human beings must make governance decisions but often don’t know where to start; automated learning techniques give them the place where they should start the process.
6 --Scale, Extensibility and Ease of Deployment.
In the modern governance era, solutions will include an ability to extend almost infinitely across larger and larger data sets with little or no provisioning of storage and server capacity being necessary. These solutions will also have a portable indexing capability that can be expanded as the data within the enterprise expands.
To date, there have been appliances that provide some insight into the content being accumulated for a specific purpose or project, but there has not been a scalable governance platform that can aggregate a view of all the data in place that is relevant. In order to allow IT professionals to govern enterprise information, governance architecture must be extensible across commodity hardware, fit into the virtual server environments of modern data centers and deploy easily.
7 --You Need to Go Global.
Having an additional capability to move an entire index and analyze it without having to remove the data from a particular country is a key component of a governance strategy. Emerging technologies will allow an index, not the data itself, to be moved to a remote location where it can be analyzed without forcing the data to be removed from the local country. These kinds of features are part of a total governance strategy that would be ideal in a global information environment. Data is global and therefore, solutions must be global in scope and local in use and analytics.
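As a rough illustration of what "moving the index, not the data" can mean, the Python sketch below builds a simple inverted index that maps terms to document identifiers, serializes it, and answers queries from the serialized form alone. The document identifiers, the JSON format, and the function names are assumptions for the example, not a description of any product. Note that even a term index reveals something about the underlying content, so whether shipping one satisfies a given country's rules remains a legal question.

```python
import json
from collections import defaultdict


def build_index(docs: dict) -> dict:
    """Map each lowercased term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term.strip(".,;:!?\"'()")].add(doc_id)
    index.pop("", None)  # drop tokens that were pure punctuation
    return index


def export_index(index: dict) -> str:
    """Serialize the index; document bodies are never included."""
    return json.dumps({term: sorted(ids) for term, ids in index.items()}, sort_keys=True)


def search(serialized: str, *terms: str) -> list:
    """Return ids of documents containing all terms, using only the exported index."""
    index = json.loads(serialized)
    postings = [set(index.get(term.lower(), [])) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical documents that must stay in their country of origin.
local_docs = {
    "de-001": "Contract renewal terms for the Berlin office.",
    "de-002": "Berlin office holiday schedule",
    "de-003": "Contract termination notice",
}
portable = export_index(build_index(local_docs))

# The remote analyst works from `portable` only and gets back identifiers.
print(search(portable, "contract", "berlin"))  # ['de-001']
```

The remote side learns which documents match and can request specific items by identifier, rather than receiving a bulk copy of the repository.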
8 --Don’t Forget Security.
Data governance, particularly for legal matters, is an ongoing process with a definite life-cycle. Over the course of reviewing content pertinent to legal and other regulatory matters, different individuals with different levels of permitted access will be required to view certain documents. Allowing different classes of users to view data with certain characteristics at different times is a key attribute of a governance platform and this must be accounted for in a governance strategy.
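One simple way to picture role-based "views" of a matter is a filter that hides documents carrying labels a role is not permitted to see. The Python sketch below is only an illustration; the role names, the labels, and the `view_for` function are hypothetical and not taken from any product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    doc_id: str
    matter: str
    labels: frozenset = field(default_factory=frozenset)


# Hypothetical policy: for each role, the labels that role may NOT see.
ROLE_DENIED_LABELS = {
    "inside_counsel": frozenset(),
    "expert_witness": frozenset({"privileged"}),
    "opposing_counsel": frozenset({"privileged", "not_produced"}),
}


def view_for(role: str, matter: str, documents: list) -> list:
    """Return the ids of documents in a matter that the role may view."""
    denied = ROLE_DENIED_LABELS[role]  # unknown roles raise KeyError (fail closed)
    return [
        doc.doc_id
        for doc in documents
        if doc.matter == matter and not (doc.labels & denied)
    ]


documents = [
    Document("d1", "matter-42"),
    Document("d2", "matter-42", frozenset({"privileged"})),
    Document("d3", "matter-42", frozenset({"not_produced"})),
    Document("d4", "matter-7"),
]

print(view_for("inside_counsel", "matter-42", documents))    # ['d1', 'd2', 'd3']
print(view_for("expert_witness", "matter-42", documents))    # ['d1', 'd3']
print(view_for("opposing_counsel", "matter-42", documents))  # ['d1']
```

Because the policy is data rather than code, a reviewer's access can change over the life of a matter by editing the role table, without touching the documents themselves.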
Posted by Steve Akers on Thu, Mar 25, 2010 @ 05:29 PM
Today I want to explore the topic of Cloud Storage services. These are of course getting lots of attention due to their flexible and on-demand provisioning model and because they allow the consumer to evade capital expense for storage and minimize the operational expense devoted to storage services.
One has to be careful when deciding on this path however. The total service costs of utilizing third-party storage are not always evident. In general it is easy to consume bulk cloud storage; it is not as easy to have a total "Cloud Storage Service". The ingredient that is missing from cloud storage that would make it into a total service is information governance: the ability to access, identify, analyze and manage data assets. A total service solution giving the consumer of storage services visibility into the amount and types of data stored in the cloud, and secure methods of searching and analyzing the vast quantities of data stored there is not available today.
To demonstrate governance of assets, all management activities undertaken on content must be recorded so that a "chain of custody" can be identified. A total cloud storage service must contain all the components necessary for identifying content objects in storage or other infrastructure, and it must fully document the management activities undertaken on those assets. Recording who accessed a data object, when it was accessed, where it was accessed and that it was moved to a new location are important parts of information governance. Having a capability to identify that content found in one location and moved to another was not altered during the journey is important as well. The ability to determine what was stored in the cloud and search it from an index that lets the consumer access select objects without bulk storage moves is probably the biggest missing ingredient. Solutions for utilizing storage in the cloud do not have these capabilities today and the lack of them will likely inhibit broad general acceptance of the cloud as a storage medium.
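A chain-of-custody record for a move can be sketched in a few lines of Python: hash the object before and after the move, refuse to complete the move if the hashes differ, and log who did what and when. The function name `move_with_custody`, the log fields, and the use of SHA-256 are assumptions for this sketch; a real service would also need tamper-evident log storage.

```python
import hashlib
import os
import shutil
import tempfile
from datetime import datetime, timezone


def sha256_of(path: str) -> str:
    """Hash a file in 1 MiB chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_with_custody(src: str, dst: str, actor: str, log: list) -> dict:
    """Copy, verify the copy is unaltered, remove the source, and log the move."""
    before = sha256_of(src)
    shutil.copy2(src, dst)
    after = sha256_of(dst)
    if before != after:
        os.remove(dst)
        raise IOError(f"content changed while moving {src}")
    os.remove(src)
    entry = {
        "actor": actor,
        "action": "move",
        "source": src,
        "destination": dst,
        "sha256": after,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    log.append(entry)
    return entry


# Demonstration in a throwaway directory with a hypothetical file and user.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "memo.txt")
    target = os.path.join(workdir, "archive_memo.txt")
    with open(source, "wb") as handle:
        handle.write(b"quarterly results draft")
    custody_log = []
    record = move_with_custody(source, target, "jdoe", custody_log)
    moved_ok = os.path.exists(target) and not os.path.exists(source)

print(record["actor"], record["action"], moved_ok)  # jdoe move True
```

Storing the hash in the log entry lets anyone later confirm that the object retrieved from the new location is the same one that left the original location.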
Challenges that must be overcome to utilize cloud storage services:
1. Finding documents that should be moved to and stored in the cloud is a difficult and time-consuming task in itself. The sheer scale of many enterprise and government file systems dwarfs the capability of many discovery systems. Just being able to view the system metadata (to identify documents that have not been modified or accessed frequently) on many large-scale file systems is beyond the practical limit of existing tools. Scale of file systems is not the only aspect of content evaluation that is difficult. Evaluating the content that exists across multiple systems (storage systems or servers) and assessing its value is a difficult and expensive task. Analytic services are necessary for tagging items that are of value and promoting them as candidates for further management activities (de-duplication, encryption, data movement). Managing data after it has been deemed "valuable" is so expensive in many cases that the storage savings available from a cloud deployment cannot be justified.
2. Moving the data to the cloud is a daunting and expensive task. Manually moving content server by server or system by system is not feasible in most cases. In addition to the cost and complexity of the data movement, the load placed on the storage systems and the network of an enterprise or a government entity is nearly an impossible burden to bear during a storage migration. Systems that move and manage data must be able to "throttle" the burden that they place on the storage and network infrastructure. Systems that effect data migration must also be configurable so that the data movement activities can be undertaken at off-hours when they will not impact business operations. Validating that all content destined for cloud storage was successfully received there and having a means of finding it again is beyond the scope of most systems.
3. Identifying content items that contain confidential information that must be secured before they are moved off-site to cloud storage is very difficult. Due to state and federal privacy and security regulations, many data items cannot be sent outside the security infrastructure of an enterprise until they are deemed non-confidential. Scalable regular expression capabilities add another dimension to the complexity of intelligent data migration systems. These capabilities are difficult or impossible to find in many data discovery and migration products. Personnel contemplating the use of data migration systems often don't know how to correctly configure systems to use regular expressions to identify patterns like Social Security Numbers (SSN) or credit card numbers in electronic content. Due to these difficulties many organizations either ignore the risk associated with moving confidential data or choose not to migrate data to the cloud because they don't want to violate privacy laws. A proper governance solution would include an ability to identify such patterns in data with a somewhat automated approach so that manual intervention is minimized. Actions such as redacting data that is found to be confidential are also important facets of a total governance solution.
4. Reducing the content duplication by removing duplicate copies of information is another part of a governance solution that is very hard to accomplish at scale. Having this capability inside a storage cloud and governance service would be a tremendous benefit to consumers.
5. As stated above, having a compact index that can be searched to discover items that should be retrieved from a vast "cloud store" would be truly useful. Other analytic services that aid the consumer by allowing them to group and classify information, or find versions of documents that exist within their storage would be highly desirable. These services genuinely don't exist today within cloud storage services.
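The pattern-matching challenge in item 3 can be sketched with Python's standard `re` module. The patterns below are simplified assumptions, not a complete or authoritative detector: the SSN pattern only recognizes the hyphenated form, and candidate card numbers are filtered with the Luhn checksum to reduce false positives. The function names are hypothetical.

```python
import re

# Hyphenated SSN form, excluding area 000, 666, 9xx, group 00, and serial 0000.
SSN_PATTERN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    digits = [int(char) for char in number if char.isdigit()]
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_confidential(text: str) -> list:
    """Return (kind, matched_text) pairs for SSN-like and card-like values."""
    findings = [("ssn", match.group()) for match in SSN_PATTERN.finditer(text)]
    for match in CARD_PATTERN.finditer(text):
        if luhn_valid(match.group()):
            findings.append(("card", match.group()))
    return findings


def redact(text: str) -> str:
    """Replace every finding with a fixed marker."""
    for _, value in find_confidential(text):
        text = text.replace(value, "[REDACTED]")
    return text


sample = "Employee SSN 123-45-6789, card 4111 1111 1111 1111, order 1234 5678 9012 3456."
print(find_confidential(sample))
# [('ssn', '123-45-6789'), ('card', '4111 1111 1111 1111')]
print(redact(sample))
# Employee SSN [REDACTED], card [REDACTED], order 1234 5678 9012 3456.
```

The order number has the shape of a card number but fails the checksum, so it is left alone. This is the kind of validation step that keeps an automated scan from flooding reviewers with false positives.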
In conclusion, total information governance will make cloud storage more than a flexible "junk drawer". Next time I will discuss other aspects of cloud services and information governance.
Posted by Larry D'Angelo on Wed, Feb 03, 2010 @ 09:35 AM
This February, Digital Reef attended LegalTech New York, the #1 resource for law firms and legal departments to get hands-on practical information for improving their law practice management. Today, Digital Reef CTO and Founder Steve Akers and VP Marketing David Butler check in with Lisa DiMonte from MyLegal.com from the event floor. If you aren't able to attend LegalTech and visit booth #1519, just watch the video to virtually meet the team and hear a little bit about how we're combining smart access to data and the need for processing large amounts of data specifically for Early Case Assessment.
Posted by Steve Akers on Sun, Sep 20, 2009 @ 11:13 AM
I ran across this excellent article by George Crump, the founder of Storage Switzerland and wanted to share it with you --
Making Data an Asset
In the article George echoes a theme that we at Digital Reef hear again and again when we work with customers and potential customers: enterprises (and other organizations) want to understand and unlock the potential inside their ever-expanding stores of data. They really want a "second-generation" set of discovery tools capable of helping IT reach their goal of making data an asset.
IT understands that the first step in turning data into an asset is some form of discovery. Discovery isn't limited to legal discovery, although the primary reason behind many discovery efforts is to satisfy a sudden need to find everything relevant to a legal proceeding. In general though, the IT professional understands that being able to put users in touch with the data they need saves the organization pain. More importantly, it yields strong benefits for everyone and, in some cases, enhances the revenue potential of the enterprise.
The obvious question is, "Why hasn't this already happened and what is needed to make data an asset?"
George points out that companies providing first-generation tools (appliances) have built a business supplying indexing and search products that are appropriate for smaller subsets of data. These companies used an appliance-based approach that had difficulty scaling to handle enterprise-sized data stores. This lack of scalability necessitated moving their offerings toward litigation discovery, where the data was typically divided into smaller pieces manually.
These first-generation tools cannot provide a useful view into enterprise data stores. Further, they require the user to shoulder the administrative burden of managing multiple appliances and handling the results of searches from each of these low-powered devices. In the case of some first-generation solutions, the user is required to load and unload data onto the appliance to derive benefit. What is needed (as George points out) is a single solution that the IT manager can engage with and use to index all of the data across the enterprise.
In addition to being able to handle the volume of data contained within the enterprise from a single system, IT professionals must have a system that preserves unique user views into data while preserving corporate security permissions, such as those carefully imposed via Active Directory and LDAP infrastructure. Any system that supplies access to information must also preserve its security permissions and allow groups of users to collaborate on the data and mark or "tag" it as the needs of their project dictate. This user-view model is not available with current appliance-based review tools or most search products. Furthermore, the system should provide capabilities that automate the discovery and data management processes.
Intelligent mark-up and tagging and user-views of data that has been placed in the "corporate community data store" are important for making data an asset. A second-generation information management solution not only indexes and makes data available, it allows for user-centric views of data and intelligent data mark-up. Policy-based movement of data is another function that second-generation tools provide to the IT professional. This type of function allows the system user to take action on data once it is identified. The system can use classification and analysis algorithms to identify and "group together" data that is similar and that "belongs" together. Such second-generation systems sound like Nirvana in that they remove much manual effort from any task around finding, marking and managing information, or putting data into a form that is meaningful for users (turning it into an asset or "information"). Any IT professional who has used a current generation search tool knows that once the search query is issued, the real work begins. A second-generation device that can mark, move, and manage data is what is required for making data into information (and thus an asset).
The second-generation devices that discover, analyze, and manage data are here now. Digital Reef makes such an extensible software solution that runs on standard server hardware and provides these new capabilities that turn data into information. The first-generation appliances have found their place in the limited realm of "load-and-store" litigation review applications. The second generation, with all of its promise, is here now and will unlock the potential within information. It will be exciting in the months and years ahead.
Posted by Steve Akers on Fri, Sep 18, 2009 @ 11:15 AM
In my last post I cited an article by George Crump, the founder of Storage Switzerland. He peers into the future and predicts that the enterprise will want to make strategic use of its data as well as respond effectively to regulatory requirements.
The question that comes to mind is, "Why don't more enterprise IT managers have capabilities to discover, analyze, and act on their data now?" I would like to address that question in this post.
As George mentioned in his article, the key missing piece is a "next generation" solution for supplying these functions. He mentions scalable processing and indexing capabilities (beyond those in appliance-type solutions) as being necessary to achieving the next set of required functionality, but I will go one step further: the next generation solution has to add analysis and management capabilities to simple indexing and search functionality. The scale of the solution is important, but the combined power of a discovery, analysis, and management platform has not existed until just recently. There are components that can be integrated together with great effort, but until now, there was no single, integrated discovery, analysis, and management platform to enable the services required by the enterprise to make their vast stores of data useful.
In the Beginning
The search and information access companies have done a good job of providing technology to index and make certain data repositories useful. These are often tied to critical enterprise applications or to a revenue-generating aspect of the business, like an e-commerce website. These solutions are not typically implemented across all data sources within the enterprise, however. In many cases, they are implemented to provide access to only certain repositories of data. Extending their use across all data repositories would become prohibitively expensive (due to system configuration, management, and data preparation costs).
These existing technologies have been built and tuned to generate the largest number of relevant results as quickly as possible -- for the data they are able to serve. Once the data within one of these "searchable repositories" is available, it is useful, but the user of the system is left with the task of collecting and managing the data they represent. These types of systems provide resultant "links" or documents. This is the only function that was ever envisioned as being necessary for the operation of the system, so information management capabilities on top of the search functions these systems supply don't exist. A second-generation solution is one that can provide analysis and management functions on top of the basic function of finding results.
Key Capabilities of Second-Generation Systems
In a second-generation system, ease of use -- such as dynamic collection from repositories like departmental SharePoint servers -- without a major logistical effort on the part of IT is a key and useful attribute. Second-generation systems are capable of connecting to "native" repositories and indexing and classifying them quickly (with minimal setup and configuration). Second-generation systems simply attach to the network and pull in various types of data then present it in a useful context for users. Complex configuration and setup of the system or its data is not required.
Another key difference between first-generation and second-generation information access systems is the addition of useful analysis and management applications on top of the basic functions of indexing and searching. These second-generation systems provide classification functions that reveal relative document similarities and semantic labels common to groups of documents. Self-learning algorithms, for identifying the existing relationships among data items found throughout the enterprise, make the data they find useful. This self-learning nature does not require any effort on the part of the IT staff implementing the solutions. With existing technology, great effort can be required to prepare data for navigation by an information access system. Painful configuration of system options can also be required to get the data into the system. With very little administrative effort, second-generation systems provide users with "contextual views" into data that are not available with the current "index and search" appliances or solutions.
Most importantly, a second-generation solution will allow data to be managed. The items in a large data set can be grouped, tagged (intelligently marked up), and moved between storage locations. The second-generation system can discover data that is "similar" (semantically similar) to retention policy documents and tag, de-duplicate, and move the relevant material to a specific SharePoint library with minimal user intervention. Versions (near-duplicate copies) of documents can be identified and tagged by the system and moved as a related collection of material.
The same type of system can identify data that falls under certain policy criteria and move it to a storage system with retention enforcement capabilities (the system can identify, move, and "lock down" data that needs to be secured as "read only" for a certain period of time). Intelligent movement of data with policy lockdown capabilities is a large benefit that saves manual effort and improves the accuracy of records management initiatives.
I want to avoid making this post too much longer, but I wanted to point out some of the differences between existing information access technologies and the next-generation solutions like the one produced by Digital Reef. I feel that it becomes obvious why the old and the new approaches are very different and how the Digital Reef solution is superior.
Posted by Tony Asaro on Tue, Aug 04, 2009 @ 02:08 PM
Steve discusses a number of things including the challenges faced by enterprises, how Digital Reef solves these problems and some customer use cases. There were also two slides that I thought were a great overview of the Digital Reef solution:
Discover
- Automatically identify and index all unstructured data
- Provide tools to find and understand the data:
  - Boolean searches (freeform, fuzzy, metadata, phrase, proximity)
  - Similarity searches using example files
  - Email thread reconstruction
  - Exact and near duplicate identification
  - Pattern expression recognition
- Organize the data using automatic classification
Manage
- Transform files into common file types
- Collect and move data
- Manage data retention policies
Designed for Scale and Security
- Grid-based, distributed architecture provides performance and resiliency
- Multi-tenant, role-based security model
- Easily deployed and maintained
- Indexes and prepares the full content and metadata of up to 10 TB of data in 24 hours with a standard configuration