Commercial Cloud Strategy: A Supplement to Enable World-Class Research?

With rapid advancements in technology, our ability to collate and interpret data has increased tremendously. Modern research, across all fields of study, requires analysis of the data collected to garner new insights and develop solutions. In Canada, we refer to the processing of this data using supercomputers and their supporting infrastructure as “advanced research computing” or “ARC.” The key components of the ARC ecosystem are skilled workers, physical infrastructure (hardware and software), and big data. Despite Canada investing $100M in ARC hardware over the past 2 years, Ontario is able to meet only approximately 40% of researchers’ demand for compute resources on publicly funded platforms.

With an organizational vision of “driving advanced computing to accelerate research and enhance competitiveness in the global marketplace, for a more prosperous Ontario,” Compute Ontario sought to rectify this access challenge by exploring alternative approaches to supplement existing ARC infrastructure. Commercial cloud computing has emerged as a viable option to address access demand. Cloud computing is the delivery of computing services over the internet through a network of remote servers that can store, manage, and process data, accelerating research and innovation while providing economies of scale and flexibility. A consumer can rent cycles from commercial vendors such as Amazon, Google, and IBM. The Top500 supercomputers list for June 2019 featured an Amazon “cluster” that cost just $5000 and ranked #136, ahead of platforms that cost millions of dollars to own. Academic groups that need ARC access have begun purchasing it from commercial cloud providers in order to complete their research.

Commercial cloud platforms provide agile, reliable, and easy-to-scale infrastructure, which is constantly evolving to meet emerging needs in various fields of research, such as genomics, cybersecurity, artificial intelligence, and more. The scale of compute and storage resources on these platforms is significantly larger than what existing public ARC infrastructure provides. Compute Ontario is developing a cloud strategy to advance the use of commercial cloud in Ontario. In a pilot project currently underway, Compute Ontario, CanDIG, and SciNet are exploring the feasibility and potential role of commercial cloud providers in the academic ARC space while investigating the associated opportunities and challenges. This project involves assessing the ongoing costs and capabilities of commercial clouds for scientific applications, considering both cost-performance and time-to-solution metrics.

Opportunities

Our analyses indicate that commercial cloud platforms have several potential benefits, with the following being most important to researchers in Ontario:

  • Flexible hardware: commercial cloud platforms provide access to multiple types of resources and services, which can be configured to carry out complex computational work.
  • Newest hardware: cloud vendor platforms are constantly being upgraded to the newest and most powerful infrastructure available. In comparison to Canada’s investment of $100M in ARC infrastructure over 2 years, cloud providers have an average combined infrastructure spending of over $200M per day. This allows researchers to carry out their work using the latest resources to enable best-in-class research.
  • Skills development: experience building and using cloud solutions provides attractive, transferable skills that can help develop more skilled workers.
  • Industry engagement: engaging with cloud vendors on research projects is an easier and quicker process than it is within the typical academic infrastructure.
  • Maximize skilled workers: with rapid access to required resources and a more straightforward process, those with data science skills can focus their energies on other value-add activities that will provide benefits to the research community, such as monitoring and maintaining hardware, among others.

Challenges

Despite the many potential opportunities cloud computing provides, there are a few glaring challenges that the ARC ecosystem needs to address before looking towards it to supplement resources. Here are some of the key concerns:

  • Cost: most experts agree that commercial clouds cost more than academic systems in our existing ARC infrastructure. Expert user support for cloud platforms drives up this cost significantly, and it is challenging to determine the exact value of the benefits provided due to insufficient data.
  • Support: governing bodies and policymakers will have to significantly change the current support model for ARC researchers to accommodate commercial cloud infrastructure.
  • Shifting researchers: it is difficult to determine the consequences of moving thousands of researchers to a new ARC environment, including the learning curve that results.

 

How can we address this?

Compute Ontario’s strategic priorities include advancing the use of ARC and its availability to researchers in Ontario, and acting as a credible voice in policymaking and key strategies to accelerate research in Ontario and across Canada. We have developed a technology strategy that focusses on three main areas of Canada’s ARC ecosystem: compute power and access, network performance and security, and storage and backups. In 2018, we commissioned Hyperion to conduct a study on Ontario and Canada’s ARC ecosystems. This study indicated that Canada’s spending on ARC as a percentage of its GDP is the second-lowest of all G8 countries. This lack of funding has resulted in inadequate access to research infrastructure, and Hyperion recommended providing researchers with access to commercial cloud resources to bridge this gap.

We have yet to determine whether the commercial cloud is superior to the current ARC infrastructure, but its potential benefits warrant more in-depth consideration. We launched the Compute Ontario High-Performance Computing (HPC) Cloud Forecast project to examine the feasibility of adopting cloud infrastructure and to determine appropriate performance benchmarks. The project involves running quarterly tests on three commercial providers, namely Amazon, Microsoft, and Google, and on a publicly funded platform, Niagara at the University of Toronto. Through these tests, we want to determine ongoing costs and capabilities and provide up-to-date comparisons for all major public clouds.

Although there are many reasons to be optimistic about the use of commercial cloud platforms to accelerate research, it is important to take note of the significant and capital-intensive challenges that come with this approach. By analyzing the results of our pilot project, we hope to determine the best way forward for a cloud strategy that benefits the ARC ecosystem and researchers in Ontario.