Mr. Wes Schooley, US TRANSCOM J6, asks: "Can industry share some examples, including high level architecture details, regarding successful implementation of cloud computing-based 'Data-as-a-Service'?"
Notes for submitters: If referencing a Government example, please do not mention the organization or system name on the blog submission. We'd be happy to facilitate follow-up conversations between Government organizations and the submitter for more specifics.
Alternate question: Please share considerations and/or architectural approaches Federal IT leaders should examine for providing data services with a cloud based model.
Director of Platform Research
Data services are evolving from a commodity, priced on simple measures of data volume and transfer rate, toward a more differentiated market with a lengthening list of measures of quality – including accuracy, timeliness, integration/conversion flexibility and disaster preparedness. Data services are thus an excellent example of the need to think of the cloud as something more than a relocation of familiar infrastructure. The value is in the services that are added, at least as much as in the core costs that are (one hopes) substantially reduced.
One example of this transition is incorporation of the Jigsaw "crowdsourced" database into salesforce.com cloud applications such as sales force automation. When a salesperson touches a contact or company record, real-time triggers can highlight possible conflicts between that record and the latest available information. It's clear that similar mechanisms of real-time update and discrepancy detection could have great value in many Federal task domains, and that these should become part of the basis for data service selection.
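The real-time discrepancy detection described above can be sketched as a simple field-by-field comparison between a locally held record and the latest crowdsourced reference copy. The field names and record shapes below are purely illustrative assumptions, not salesforce.com's actual trigger API:

```python
# Hypothetical sketch: flag conflicts between a local contact record and the
# latest crowdsourced reference data. Field names are illustrative only.

def find_discrepancies(local_record: dict, reference_record: dict) -> dict:
    """Return the fields whose local value conflicts with the reference copy."""
    conflicts = {}
    for field, ref_value in reference_record.items():
        local_value = local_record.get(field)
        if local_value is not None and local_value != ref_value:
            conflicts[field] = {"local": local_value, "reference": ref_value}
    return conflicts

local = {"name": "J. Smith", "title": "Director", "phone": "555-0100"}
reference = {"name": "J. Smith", "title": "VP, Operations", "phone": "555-0100"}

print(find_discrepancies(local, reference))
# {'title': {'local': 'Director', 'reference': 'VP, Operations'}}
```

In a production service, a check like this would run as a trigger whenever a record is touched, highlighting the conflicting fields to the user in real time.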
One can't buy a hard drive that comes with built-in data quality maintenance: such a feature is only conceivable in a cloud-based service, and buyers will benefit by expanding familiar frames of reference for data storage procurement to think in these new terms.
Agencies will doubtless be concerned about the issue of data residency, both in terms of physical location and operational control of data storage devices. It's essential to understand the flexibility achievable with cloud integration points and partner services. Agencies will increasingly come to appreciate the rigor and cost-effectiveness of cloud service providers' protections, but may also choose to take advantage of selective masking capabilities (including encryption) for cloud-resident data – or even to partition their applications, keeping the most sensitive data fields in local storage while associating them with cloud-based records only by means of anonymous identifiers (e.g., "Case Number").
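The partitioning approach above can be illustrated with a minimal sketch: sensitive fields stay in local storage, the cloud holds only non-sensitive fields, and the two halves are joined by an anonymous identifier. The stores, field names, and record layout are assumptions made for illustration:

```python
import uuid

# Hypothetical sketch of the partitioning pattern: sensitive fields remain
# on-premise; the cloud record carries only non-sensitive fields, linked by
# an anonymous identifier (the "Case Number" of the text).

local_store = {}   # stands in for an on-premise database
cloud_store = {}   # stands in for a cloud data service

SENSITIVE_FIELDS = {"ssn", "medical_notes"}  # illustrative field names

def store_record(record: dict) -> str:
    case_number = str(uuid.uuid4())  # anonymous join key; reveals nothing by itself
    local_store[case_number] = {k: v for k, v in record.items() if k in SENSITIVE_FIELDS}
    cloud_store[case_number] = {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
    return case_number

def load_record(case_number: str) -> dict:
    # Reassemble the full record by joining on the anonymous identifier.
    return {**cloud_store[case_number], **local_store[case_number]}

case = store_record({"status": "open", "region": "east", "ssn": "xxx-xx-xxxx"})
assert "ssn" not in cloud_store[case]           # sensitive field never leaves premises
assert load_record(case)["status"] == "open"    # full record is still reconstructable
```

The key design property is that a compromise of the cloud copy alone exposes no sensitive fields and no identifying link back to them.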
To be sure, the fundamental challenges of volume and throughput continue to grow. One analytics company recently posted a job description headed "Petabyte Platform Engineer"; Amazon Web Services invites cloud customers with large data volumes to deliver physical storage devices directly to an AWS data center for faster transfer of volumes ranging from 100 GBytes (82 days on a T1 line) to several TBytes (many days on even a T3 connection). At the same time, however, Federal agencies with critical data dependencies will do well to consider the next-generation services of cloud-focused systems partners such as Model Metrics, Informatica and Cast Iron Systems – the acquisition of the latter by IBM underscoring the growing importance of services, layered onto core capabilities, in creating compelling cloud options.

An IT buyer's conversation with a data services provider will still begin, most likely, with questions of gross capacity and speed, but that's just the introduction to a discussion of more interesting value-adds. Consideration of cloud services should focus on the need to be met, not on the hardware attributes to be virtualized.
Chief Technology Officer
There are many specific examples of implementing cloud-based Data-as-a-Service, and I would be happy to follow up with specific customer references in person. Here, I will walk through a couple of examples where we have implemented such services to support our customers, both in and out of DoD.
A specific DoD customer was interested in moving 24K users of Exchange and SharePoint into cloud services, but was challenged by the distributed nature of the environment, the need for continuous operations in multiple locations, a frequently moving user base, and the need to rapidly scale to 45K users on demand. This was becoming more and more difficult, as the demands for flexibility and resiliency posed a cost hurdle the customer was not able to overcome. By implementing a 100% virtualized cloud environment, we were able to provide the flexibility needed to support their requirements. The next challenge was how to make data available wherever the customer traveled, transparently to the user. We addressed this by implementing object-based Cloud Optimized Storage to present data to users and distributed applications using simple SOAP and REST protocols. By presenting the data as objects to the user and applications, data mobility became a by-product of implementing a cloud-based Data-as-a-Service architecture, and existing scalability and replication challenges within the SharePoint environment were eliminated.
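The object interface at the heart of that design can be sketched in a few lines: clients address data as named objects through simple verbs (put/get/delete) rather than as blocks or files. The in-memory class below only illustrates the semantics; a real deployment would speak REST or SOAP to the storage service, and the key and metadata names are illustrative:

```python
# A minimal sketch of object-based storage semantics. This in-process class
# stands in for a Cloud Optimized Storage service reached over REST/SOAP.

class ObjectStore:
    def __init__(self):
        self._objects = {}

    def put(self, key, data, metadata=None):
        # Objects carry their own metadata, letting the storage layer make
        # placement and replication decisions without involving the application.
        self._objects[key] = {"data": data, "metadata": metadata or {}}

    def get(self, key):
        return self._objects[key]["data"]

    def delete(self, key):
        del self._objects[key]

store = ObjectStore()
store.put("site-a/report.docx", b"report contents", {"owner": "user1", "replicas": "3"})
assert store.get("site-a/report.docx") == b"report contents"
```

Because every object is globally addressable by name, a traveling user (or a distributed application) retrieves the same object from any location, which is how data mobility falls out of the architecture rather than being bolted on.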
A similar example in the commercial space arose when a customer needed to support five geographically distributed data stores exceeding three petabytes of information, with data changed frequently by many millions of users. The need to avoid a centralized data management application, to support continuously changing I/O requirements, and to do so in a cost-effective, centrally managed infrastructure led us to build a Data-as-a-Service architecture based on user-defined policy management of information. This approach takes many decisions typically made in the application and pushes them down the technology stack, where data can be managed by policy at the storage layer. It dramatically reduced operations and maintenance costs, because O&M personnel were no longer required at each site to perform data management and replication tasks. This is an example of how automation is beginning to, and will eventually, replace many of the lower-level IT functions currently performed by hand, freeing the IT specialist's time to focus on how IT supports the business.
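Policy management at the storage layer can be sketched as a set of declarative rules evaluated against each object's metadata, with the matching rule dictating placement and replication. The policy conditions, site names, and actions below are invented for illustration, not the customer's actual policy engine:

```python
# Hypothetical sketch of user-defined policy management at the storage layer:
# operators declare policies; the storage tier evaluates them per object,
# so no application (and no per-site operator) makes placement decisions.

POLICIES = [
    # (condition on object metadata, action taken by the storage layer)
    (lambda m: m.get("classification") == "critical",
     {"replicas": 3, "sites": ["east", "west", "central"]}),
    (lambda m: m.get("age_days", 0) > 365,
     {"replicas": 1, "sites": ["archive"]}),
]
DEFAULT_ACTION = {"replicas": 2, "sites": ["east", "west"]}

def placement_for(metadata):
    """Return the first matching policy's action, or the default placement."""
    for condition, action in POLICIES:
        if condition(metadata):
            return action
    return DEFAULT_ACTION

assert placement_for({"classification": "critical"})["replicas"] == 3
assert placement_for({"age_days": 400})["sites"] == ["archive"]
assert placement_for({})["replicas"] == 2
```

The operational saving comes from the fact that changing a replication or retention posture across all five sites means editing one policy list, not dispatching O&M staff to each location.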
Many organizations can reap significant benefit from the features associated with cloud computing, such as location-independent access to information and the ability to access data services with up-times guaranteed by service-level agreements (SLAs). Additionally, many large cloud-based offerings can provide extensive "on demand" scalability that can help an organization increase its data service usage without a large, planned capital investment in storage hardware and infrastructure.
A cloud-based data service approach can be used for a variety of purposes in Government, ranging from highly secure private cloud purposes to very visible public capabilities supporting the White House's Open Government initiative. For example, the NASA Open Government Plan describes the NASA Nebula cloud computing environment as "an open-source cloud computing platform" which provides "an easier way for NASA scientists and researchers to share large, complex data sets with external partners and the public." USAspending.gov 2.0, a website for government budget information, is hosted on the Nebula cloud.
While the potential benefits are numerous, Federal IT leadership should consider the business case for data services. For private cloud deployments, there will be an investment in hardware and software, and costs for implementing, integrating and testing data services. For community and public deployments, there can be significant savings in capital investment; however, leadership should ensure that the cost/benefit analysis includes the costs for applications that have to be ported to take advantage of data services. For example, some public database cloud computing offerings can only be accessed with proprietary application programming interfaces (APIs). Usage of these APIs can require significant porting, integration and testing of legacy applications.
Network latency and throughput should be considered in determining the approach to using a cloud-based data service. A degraded network could slow the upload and retrieval of information, and in the event of a network failure, access could be totally severed. Therefore, data services that are placed in off-premise, cloud-based environments will not be available for "disconnected operations."
Security solutions should be employed (e.g., encryption), and SLAs should be codified with providers to meet the organization's needs for protection of data at rest and in motion. "While data outsourcing relieves the owners of the burden of local data storage and maintenance, it also eliminates their physical control of storage dependability and security, which traditionally has been expected by both enterprises and individuals with high service-level requirements," write Wang et al. in IEEE. Similarly, as Gartner has noted, Federal leadership should examine the location of data (e.g., CONUS) and the controls for segregating it.
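One simple building block of such protection, recovering a measure of control over "storage dependability" without trusting the provider, is a keyed integrity tag computed before upload and verified on retrieval. The sketch below uses only standard-library primitives; the key and payload are illustrative, and confidentiality would additionally require encrypting the payload before upload (omitted here for brevity):

```python
import hashlib
import hmac

# Hypothetical sketch: the data owner keeps a secret key on-premise and tags
# each object before upload. Any tampering with the cloud-resident copy is
# detectable when the object is retrieved and re-verified.

SECRET_KEY = b"agency-held key, never sent to the provider"  # illustrative

def tag(data: bytes) -> str:
    """Compute a keyed integrity tag for the data."""
    return hmac.new(SECRET_KEY, data, hashlib.sha256).hexdigest()

def verify(data: bytes, expected_tag: str) -> bool:
    """Constant-time check that retrieved data matches its stored tag."""
    return hmac.compare_digest(tag(data), expected_tag)

payload = b"record contents"
stored_tag = tag(payload)            # kept locally, e.g., alongside a case number
assert verify(payload, stored_tag)
assert not verify(b"tampered copy", stored_tag)
```

Schemes in the research literature (including the Wang et al. line of work quoted above) extend this idea to audit large cloud-resident data sets without downloading them, but the local-tag pattern conveys the core principle: verification keys stay with the owner.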
While there are many benefits and considerations for cloud-based data services, as Peter Coffee writes, the conversation will likely start with a discussion of "capacity and speed" before moving to the more interesting value-adds. It's this value coupled with meeting requirements that should be compared against the costs and considerations for each organization to determine their return on investment (ROI), benefits, and trade-offs.
If you are from a U.S. government agency or DoD organization and would like to pose a question for this forum, let us know.
"Ahead in the Clouds" is a public forum to provide federal government agencies with meaningful answers to common cloud computing questions, drawing from leading thinkers in the field. Each month we pose a new question, then post both summary and detailed responses.