eDiscovery and Litigation Support


Why we need a Cloud Data Services “Cable Box”

  
  
  

What we need to get wider adoption of cloud data services (either computing or storage services) is a “cloud cable box” that is “data aware”.  We all have a “cable box” that deals with the network that supplies television, data services, etc. to our homes. My thought is that we need this same kind of simple “de-mark” (as the network guys would say) to enable true “Cloud Services”.

With cable, phone or other “telco” services, the “de-mark” is the point in the customer environment where the services are provisioned and controlled for the user environment and where the public network begins. For cloud data services, this de-mark is unspecified and missing. Customers have to supply the de-mark services themselves. This is a huge problem for adoption of cloud services. I explain myself below.

Figure One – Digital De-Mark Concept


As many of you have read in my blog posts previously, I am a big believer in the cloud or a utility model for computing capacity and for storage management services. The allure of such technologies includes:

  1. Ease of provisioning of resources
  2. Rapid access to infrastructure with a minimum of capital expenditure
  3. Instantly expandable services (computing or storage)
  4. Rapid “removal” of services that become obsolete (you are not “stuck” with them if they are no longer useful)

The impediments to enjoying these benefits continue to be:

  1. Lack of privacy for data exposed to public infrastructure in the cloud
  2. Lack of discernible/defensible data security for objects stored in cloud infrastructure
  3. Lack of complete, robust and reliable electronic discovery of items that have been stored in cloud infrastructure and that may subsequently be relevant to an electronic discovery action of some kind (legal discovery, compliance or regulatory audit and review)
  4. The lack of a client-side service component (the cable box) that is available behind the customer firewall to help mitigate these issues and supply other functionality that most people don’t think about needing from the service. This missing component basically would help the customer understand what they have in the first place, what they want to put into the cloud, help them move it there and then help “remind” them of what they put into a particular cloud service. Adding other services like classifying information and encrypting items that require that kind of processing would naturally fit into the service de-mark component.

Fear Uncertainty and Doubt

The reason great uncertainty exists around the use of cloud services is that they are implemented on the “public side” of the interface (Amazon’s storage API is amazing) but they are not specified as a “total solution” on the customer side of the cloud interface. I explain what I mean here:

  1. To consume cloud services one must understand the data that is being placed into the cloud before it is used in a computing environment there or stored in the cloud. Most potential consumers of cloud services don’t know what is in the data they are putting into a cloud service so they don’t know how much to worry about privacy or security of the data they are using in the utility model.
  2. Many consumers don’t have a reasonable way to measure “how much” of a cloud service they are actually consuming. By definition the cloud service is meant to be elastic and easy to use. As a result the user of cloud services “loses track” of what they have put into the cloud and cannot value it or measure the risk around any items residing there.
  3. Once cloud services become commonplace and there is little knowledge of how much data or how many resources are actually in use in the cloud, it is hard to do a compliance audit, regulatory affairs audit or electronic discovery of the data. One particular Fortune 500 company audited “external services” being used by various departments and found that they were using a number of hosted storage services. When they analyzed how much data was there in total, they realized that they had terabytes of data in cloud storage. They realized that pulling most of it down (back inside the corporate firewall) to index and analyze for an electronic discovery action related to a lawsuit would require hundreds of thousands if not millions of dollars in expenditures. This was clearly something that they had not counted on or planned to address.

The bottom line is that there is no common “network interface” that helps identify, move and manage content that is destined for the cloud. If there were a “data aware cable box” (like the one that exists for home television and data services) with certain prescribed functionality, it would help users of cloud services understand the data they are committing to the cloud and how much of it they are managing there. If there were such a “cable box” for cloud data, what functions would it support?

The Cloud Aware Cable Data Box

 

The data-aware Cloud Data Box would perform certain services for the users/consumers of cloud services. I confine my comments to the data functions that are necessary for the cloud and not the “on-ramp” services like those which help install virtual machines in cloud computing environments. The “on-ramp” product that would assist in deploying an application into an elastic computing environment (like Amazon EC2) is a specific software appliance for getting applications deployed. The Digital Data De-mark is a different type of appliance. The primary function of the digital de-mark is to identify data and aid its management as it moves into either a compute or a storage service environment.

I am talking about a “data cable box” that will safely identify data that should (or should not) go into the cloud and then perform certain tasks to help the user/consumer analyze, move and manage the data in the cloud environment. The data aware cable box should exist “in front of” any device enabling applications in the cloud, so that it can screen and analyze the data that will be loaded along with them into the cloud service. Another obvious use of the data aware cable box is to find local (enterprise) data that should be in the cloud and then help it get into cloud storage or other services in a safe, secure manner.

Key Functions of the Data Service “Cable Box”

The functions that are helpful when one is deploying a cloud data service include:

  1. Identifying data that are good candidates for movement to cloud storage services (data that is “important” but that is not accessed frequently would fit this category). This includes a classification step (identifying that a certain message or document is a business record) and then a policy-based rule that further identifies that each record is not “frequently” accessed.
  2. Remembering that the business record exists and its classification so that the policies that are applied to a given record are appropriate for the record’s contents.
  3. Managing the data according to a set of policy rules. For example, the cable box policy rules could state that each business record that is confidential (it contains Social Security numbers or credit card information) must be left inside the corporate firewall or encrypted before it can be sent to a cloud service. Other non-confidential business records must, for example, be de-duplicated and compressed before they are “shipped off” to cloud storage. These policy actions must be based on the deep content knowledge that can only come from analysis of the content within documents.
  4. Deep content inspection must include the ability to access all forms of content. Many textual documents exist within the corporate environment as images (TIFF/JPEG) and they must be processed with Optical Character Recognition (OCR) software before they are “readable” and “indexable”. The ultimate cable data box would just handle this in-line and without human intervention.
  5. The index and other structures within the cable data box must contain a “chain of custody” history that explains where all of the documents were found, what duplicates were found and why they were considered duplicates, and then where they were moved into the cloud service. The index/virtual repository must retain a “handle” that allows the data objects to be retrieved from the cloud service subsequently.
  6. As stated for computing services, the data entering a compute environment must be screened first to indicate that it is “safe” for transport into the cloud environment. If data is to be shipped to a cloud computing environment for application development, the cable box should alert users who might inadvertently be sending sensitive data to the cloud where it may be exposed to outsiders.
  7. For electronic discovery purposes, the cable data box should retain the index and other analytic structures that describe the data that is being sent to the cloud for either computing or storage service reasons. If electronic discovery or legal discovery is ever necessary, this allows only relevant items to be returned to the user from the cloud service. If this index were not retained (within the corporate firewall) then all data from the cloud would have to be returned to the enterprise over a network (at great expense) so that it could be indexed, searched and retained (or not). Since most data is not relevant to a given matter at any one time, it would be very expensive to perform this kind of data transfer to handle legal discovery of large data sets.
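
To make the policy idea in the list above a little more concrete, here is a minimal sketch in Python of how a cable box might decide what to do with a single record. Everything in it is an assumption made for illustration: the `Record` fields, the action names, the 180-day “not frequently accessed” threshold and the two regular expressions are hypothetical, and real content inspection would need far more than pattern matching (validation of card numbers, OCR for images, and so on).

```python
import hashlib
import re
import zlib
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical patterns for illustration only; they will produce false
# positives and negatives on real data.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")


@dataclass
class Record:
    name: str
    content: bytes
    last_accessed: datetime
    is_business_record: bool  # result of an earlier classification step


def is_confidential(content: bytes) -> bool:
    """Flag content that looks like it holds SSNs or card numbers."""
    text = content.decode("utf-8", errors="ignore")
    return bool(SSN_PATTERN.search(text) or CARD_PATTERN.search(text))


def decide(record, now, seen_hashes, cold_after=timedelta(days=180)):
    """Return (action, payload) for one record under the sketched policy."""
    if not record.is_business_record:
        return ("ignore", None)
    if now - record.last_accessed < cold_after:
        return ("keep_local", None)  # still accessed too recently to move
    if is_confidential(record.content):
        # Policy: stays inside the firewall unless it is encrypted first.
        return ("hold_or_encrypt", None)
    digest = hashlib.sha256(record.content).hexdigest()
    if digest in seen_hashes:
        return ("skip_duplicate", None)  # an identical copy was already shipped
    seen_hashes.add(digest)
    return ("ship", zlib.compress(record.content))
```

A caller would keep one `seen_hashes` set per run, call `decide` for each record, and only hand the compressed payload of `"ship"` results to whatever upload step the cloud service requires.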

Figure Two (below) illustrates what I mean by an appliance that aids the movement of data into a cloud service. A big problem with the data services for storage as they currently exist is in knowing what data is not accessed regularly and what would be a good candidate to be moved into such a service. A second function of this product is acting as a “pre-flight check” on any data that is identified for use by an application within the cloud (for computing environments).

 

Figure Two – Intelligent File Management Services (Cable Box)


 

So for many good reasons (more could be cited) the “cable data box” with all kinds of intelligent data awareness capabilities would be a good idea. Let’s hope that they start to exist with the requisite features soon. Customers currently have to perform many of the “pre-flight checks” and data analysis functions manually and then provide metering and accounting of data through the service console or other systems. There is no way to identify what amount of a given service is in use or has been used at any one time.
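
As a rough illustration of how the metering and discovery gaps could be narrowed, the sketch below keeps a small manifest inside the firewall for everything that is sent out. It is only a sketch under stated assumptions: the `Manifest` class, its method names, the service name and the handle strings are all hypothetical, and the “index” is nothing more than a set of lowercased words per document rather than a real search index.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ManifestEntry:
    digest: str        # SHA-256 of the content, for duplicate detection
    source_path: str   # where the item was found inside the enterprise
    service: str       # which cloud service it was sent to
    handle: str        # opaque key used to fetch the object back later
    size_bytes: int
    terms: frozenset   # toy stand-in for a retained full-text index


class Manifest:
    """Local record of what was sent to which cloud service."""

    def __init__(self):
        self.entries = []
        self.custody_log = []  # append-only history of movements

    def record_upload(self, source_path, content, service, handle):
        digest = hashlib.sha256(content).hexdigest()
        words = content.decode("utf-8", errors="ignore").lower().split()
        entry = ManifestEntry(digest, source_path, service, handle,
                              len(content), frozenset(words))
        self.entries.append(entry)
        self.custody_log.append((source_path, service, handle, digest))
        return entry

    def usage_by_service(self):
        """Total bytes recorded per service (the metering view)."""
        totals = defaultdict(int)
        for entry in self.entries:
            totals[entry.service] += entry.size_bytes
        return dict(totals)

    def handles_matching(self, term):
        """Handles of items containing the term (the discovery view)."""
        wanted = term.lower()
        return [e.handle for e in self.entries if wanted in e.terms]
```

With something like this retained locally, a discovery request could be answered by calling `handles_matching` and pulling back only those objects, and `usage_by_service` gives at least a first answer to “how much do we have out there?”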

As Figure Two shows, the de-mark functionality described here is like that supplied by the Digital Reef system. Digital Reef can identify and analyze data and provide useful functions like de-duplication, version identification (near-duplicate analysis) and email analysis before the data is moved to the cloud. With this “cable box”, the operator does not have to know or care about the details behind these very complicated yet valuable data management tasks.

The future is here; the digital de-mark exists. Let’s use the cloud.

 

 

 
