Cloud Services and Information Governance
Posted by Steve Akers on Thu, Mar 25, 2010 @ 05:29 PM
Today I want to explore the topic of Cloud Storage services. These are of course getting lots of attention due to their flexible, on-demand provisioning model and because they allow the consumer to avoid capital expense for storage and minimize the operational expense devoted to storage services.
One has to be careful when deciding on this path, however. The total service costs of utilizing third-party storage are not always evident. In general it is easy to consume bulk cloud storage; it is not as easy to have a total "Cloud Storage Service". The ingredient missing from cloud storage that would make it a total service is information governance: the ability to access, identify, analyze and manage data assets. A total service solution that gives the consumer of storage services visibility into the amount and types of data stored in the cloud, along with secure methods of searching and analyzing the vast quantities of data stored there, is not available today.
To demonstrate governance of assets, all management activities undertaken on content must be recorded so that a "chain of custody" can be established. A total cloud storage service must contain all the components necessary for identifying content objects in storage or other infrastructure, and it must fully document the management activities undertaken on those assets. Recording who accessed a data object, when it was accessed, where it was accessed from, and whether it was moved to a new location are important parts of information governance. The capability to verify that content moved from one location to another was not altered in transit is important as well. The ability to determine what was stored in the cloud, and to search it from an index that lets the consumer retrieve selected objects without bulk storage moves, is probably the biggest missing ingredient. Solutions for utilizing storage in the cloud do not have these capabilities today, and the lack of them will likely inhibit broad general acceptance of the cloud as a storage medium.
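As a rough illustration of the chain-of-custody idea, here is a minimal Python sketch. It is not taken from any product; the names `CustodyLog`, `content_digest`, the field names and the sample timestamps are all hypothetical. It assumes a SHA-256 digest is an acceptable fingerprint for a content object, and it links each log entry to the previous one so that later tampering with the log can be detected.

```python
import hashlib
import json
from dataclasses import dataclass, field


def content_digest(data: bytes) -> str:
    """Return a SHA-256 fingerprint of an object's bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class CustodyLog:
    """Append-only activity log (hypothetical design).

    Each entry stores the hash of the previous entry, so editing or
    removing an earlier entry breaks the chain.
    """
    entries: list = field(default_factory=list)

    @staticmethod
    def _hash(entry: dict) -> str:
        # Hash every field except the entry's own hash.
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def record(self, who, action, object_id, location, when, digest):
        """Record who did what to which object, where, and when."""
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else ""
        entry = {
            "who": who, "action": action, "object_id": object_id,
            "location": location, "when": when, "digest": digest,
            "prev_hash": prev_hash,
        }
        entry["entry_hash"] = self._hash(entry)
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Check that no entry was altered and the chain is unbroken."""
        prev = ""
        for e in self.entries:
            if e["prev_hash"] != prev or e["entry_hash"] != self._hash(e):
                return False
            prev = e["entry_hash"]
        return True


# Example: record a read, then a move, and confirm the content is unchanged.
log = CustodyLog()
original = b"quarterly report"
log.record("alice", "read", "doc-1", "fileserver-a",
           "2010-03-25T17:29:00Z", content_digest(original))
received = original  # stands in for the bytes read back from the new location
log.record("mover", "move", "doc-1", "cloud-store",
           "2010-03-25T18:00:00Z", content_digest(received))
unaltered = log.entries[0]["digest"] == log.entries[1]["digest"]
```

Comparing the digest recorded before the move with the digest computed at the destination is what shows the object was not altered during the journey; `verify()` only protects the log itself.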
Challenges that must be overcome to utilize cloud storage services:
1. Finding documents that should be moved to and stored in the cloud is a difficult and time-consuming task in itself. The sheer scale of many enterprise and government file systems dwarfs the capability of many discovery systems. Just being able to view the system metadata (to identify documents that have not been modified or accessed recently) on many large-scale file systems is beyond the practical limit of existing tools. Scale of file systems is not the only aspect of content evaluation that is difficult. Evaluating the content that exists across multiple systems (storage systems or servers) and assessing its value is a difficult and expensive task. Analytic services are necessary for tagging items that are of value and promoting them as candidates for further management activities (de-duplication, encryption, data movement). Managing data after it has been deemed "valuable" is so expensive in many cases that it outweighs the storage savings available from a cloud deployment.
2. Moving the data to the cloud is a daunting and expensive task. Manually moving content server by server or system by system is not feasible in most cases. In addition to the cost and complexity of the data movement, the load placed on the storage systems and the network of an enterprise or a government entity is a nearly impossible burden to bear during a storage migration. Systems that move and manage data must be able to "throttle" the burden that they place on the storage and network infrastructure. Systems that perform data migration must also be configurable so that the data movement activities can be undertaken during off-hours, when they will not impact business operations. Validating that all content destined for cloud storage was successfully received there, and having a means of finding it again, is beyond the scope of most systems.
3. Identifying content items that contain confidential information that must be secured before they are moved off-site to cloud storage is very difficult. Due to state and federal privacy and security regulations, many data items cannot be sent outside the security infrastructure of an enterprise until they are deemed non-confidential. Scalable regular expression capabilities add another dimension to the complexity of intelligent data migration systems. These capabilities are difficult or impossible to find in many data discovery and migration products. Personnel contemplating the use of data migration systems often don't know how to correctly configure regular expressions that identify patterns like Social Security Numbers (SSN) or credit card numbers in electronic content. Due to these difficulties, many organizations either ignore the risk associated with moving confidential data or choose not to migrate data to the cloud because they don't want to violate privacy laws. A proper governance solution would include the ability to identify such patterns in data with a largely automated approach so that manual intervention is minimized. Actions such as redacting data that is found to be confidential are also important facets of a total governance solution.
4. Removing duplicate copies of information is another part of a governance solution that is very hard to accomplish at scale. Having this capability inside a storage cloud and governance service would be a tremendous benefit to consumers.
5. As stated above, having a compact index that can be searched to discover items that should be retrieved from a vast "cloud store" would be truly useful. Other analytic services that aid consumers by allowing them to group and classify information, or find versions of documents that exist within their storage, would be highly desirable. These services simply don't exist today within cloud storage services.
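To make challenge 3 more concrete, here is a minimal Python sketch of pattern-based detection and redaction. The patterns and function names (`SSN_RE`, `CARD_RE`, `find_confidential`, `redact`) are my own illustrative assumptions, not a product's configuration. The SSN pattern only matches the common `NNN-NN-NNNN` layout, and candidate card numbers are filtered with the standard Luhn checksum to cut down on false positives. A real deployment would need far more care than this.

```python
import re

# Assumed formats: SSNs written as NNN-NN-NNNN; card numbers of 13 to 19
# digits, optionally separated by single spaces or hyphens.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_confidential(text: str) -> list:
    """Return (kind, matched_text) pairs for SSN-like and card-like strings."""
    hits = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    hits += [("card", m.group()) for m in CARD_RE.finditer(text)
             if luhn_valid(m.group())]
    return hits


def redact(text: str) -> str:
    """Replace detected SSNs and Luhn-valid card numbers with placeholders."""
    text = SSN_RE.sub("[REDACTED-SSN]", text)
    return CARD_RE.sub(
        lambda m: "[REDACTED-CARD]" if luhn_valid(m.group()) else m.group(),
        text,
    )


sample = "SSN 123-45-6789, card 4111 1111 1111 1111."
hits = find_confidential(sample)
clean = redact(sample)
```

A document that produces any hits could be held back from migration or redacted first, which is the "somewhat automated" approach described above.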
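For challenge 4, one common way to find exact duplicate files is to group them by a content hash. The sketch below is an assumption about how such a step might look on a local file tree, not a description of any cloud service; `file_digest` and `find_duplicates` are hypothetical names. It compares file sizes first so that only files with a matching size are read and hashed.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> list:
    """Return groups of paths under `root` whose contents are identical."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symbolic links and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:  # a file with a unique size cannot be a duplicate
            for p in paths:
                by_digest[file_digest(p)].append(p)

    return [sorted(ps) for ps in by_digest.values() if len(ps) > 1]
```

Each returned group is a set of byte-identical files; a governance service could keep one copy and record the others as references. This only catches exact duplicates, not near-duplicates or different versions of a document.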
In conclusion, total information governance will make cloud storage more than a flexible "junk drawer". Next time I will discuss other aspects of cloud services and information governance.