
Cloud-hosting Solution

This study set out to investigate the feasibility of hosting a digital repository system in a managed way in a private cloud environment. The system that was developed was functional, usable and had an acceptable level of system performance. Thus, a private cloud is indeed a feasible option for such software systems. Current interest in containerisation of software tools, through mechanisms such as Docker, are highly compatible with cloud management systems, as they each address a different aspect of the ease of management equation. - Investigating the Feasibility of Digital Repositories in Private Clouds

Although their findings were graded more on user experience than on technical requirements, the study does show some promise for hosting a repository in the cloud. A couple of Islandora 8 instances are currently running in the cloud. Asking around in the community turned up no larger repositories (more than 5TB) currently running in the cloud. There are several options for cloud hosting, each with its own pricing.

  1. Several vendors will host a solution for a fee.

  2. The stack can be run on a cloud hosting provider such as Amazon Web Services (AWS). The demo is running on EC2 and Lightsail (an Amazon cloud solution).

    • Price modeling on any AWS solution is complicated and varies greatly by size and frequency of use. The task of summarizing AWS option pricing could be its own project. “In 2020, AWS comprised more than 212 services including computing, storage, networking, database, analytics, application services, deployment, management, mobile, developer tools, and tools for the Internet of Things.” [link].

aws-refarch-drupal-v20170713

  1. A few options specifically beneficial to cloud hosting include load balancing across a cluster of VMs (multiple identical servers) and a simple snapshot process for backups. These can also be achieved with on-premises hardware, but they would require some expensive investments. Cloud hosting also makes owning and housing hardware unnecessary. Note that some functions can be automated in either approach, such as scheduled backups, automated application deployment, and system monitoring. Cloud hosting does not eliminate system and application maintenance, even when it is fully automated, and most cloud solutions become prohibitively expensive when scaling to high numbers of CPU cores or large amounts of RAM.

  2. A few vendors, such as Discovery Garden and Born Digital, offer several options: migration consultation, migrating content from Islandora 7 to 8, developing an Islandora 8 solution that meets the institution’s needs, Islandora-as-a-service, and several others.

Other Hidden Costs

Other potential “gotchas” that could be inflating your AWS bill include unused services started in AWS OpsWorks, unhealthy instances, and fees for excessive API calls. There are also numerous indirect costs associated with AWS and other cloud solutions in the form of performance, reliability, and cybersecurity problems. Misconfigured AWS servers were at fault for the recent data breaches at business associates of Verizon, the Republican National Committee, and private security firm TigerSwan. In February, numerous large websites were knocked offline due to an error by an employee at AWS, and the tech community recently expressed grave concerns about widespread chaos if AWS were to have another, larger failure, particularly since so many financial institutions rely on it.

The Cloud Isn’t Always Cheaper

Despite sticker shock and concerns about these hidden and indirect costs, many organizations continue to grumble and pay their AWS bill due to the misconception that cloud computing is always cheaper and more efficient than purchasing their own IT infrastructure. This is a myth. In many cases, an organization’s monthly AWS bill alone costs more than an in-house solution would. If your organization processes large amounts of data, it would probably be more cost-effective to purchase and maintain your own infrastructure.

Amazon suggests S3 Glacier as the ideal option if low storage cost is paramount and you do not require millisecond access to your data. Expedited retrievals from Glacier take 1 to 5 minutes; standard retrievals take 3 to 5 hours. Glacier comes with a free retrieval tier of 10GB per month. Amazon suggests using S3 storage if you need low latency or frequent access to your data, for example from an image viewer or web application. According to the AWS S3 Calculator, storing 10TB of data costs $2,700 a year, not including accessing or modifying the data (those are additional costs). Discovery Garden currently uses a combination of these to lower the cost, but could not say by how much or what the drawbacks were.
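
To make the storage figure easier to scale to other repository sizes, the following is a rough sketch in Python that backs out the per-GB monthly rate implied by the $2,700 a year quoted for 10TB and applies it to a different size. It assumes decimal units (1TB = 1,000GB) and a flat rate with no tiering, and it ignores request, retrieval, and transfer charges; the function names are illustrative only, and the AWS calculator remains the authority for real quotes.

```python
# Rough S3 storage estimate derived from the figure quoted in the text.
# Assumptions (not from AWS): decimal terabytes, a flat per-GB rate,
# and no request, retrieval, or data-transfer charges.

QUOTED_ANNUAL_COST_USD = 2700.0  # figure quoted for 10TB per year
QUOTED_SIZE_TB = 10.0
GB_PER_TB = 1000  # assumption: 1TB = 1,000GB
MONTHS_PER_YEAR = 12


def implied_rate_per_gb_month(annual_cost_usd=QUOTED_ANNUAL_COST_USD,
                              size_tb=QUOTED_SIZE_TB):
    """Back out the per-GB monthly rate implied by an annual quote."""
    return annual_cost_usd / MONTHS_PER_YEAR / (size_tb * GB_PER_TB)


def estimate_annual_storage_cost(size_tb, rate_per_gb_month=None):
    """Estimate yearly storage-only cost for a repository of size_tb."""
    if rate_per_gb_month is None:
        rate_per_gb_month = implied_rate_per_gb_month()
    return size_tb * GB_PER_TB * rate_per_gb_month * MONTHS_PER_YEAR


if __name__ == "__main__":
    rate = implied_rate_per_gb_month()
    print(f"Implied rate: ${rate:.4f} per GB per month")
    for size in (5, 10, 20):
        cost = estimate_annual_storage_cost(size)
        print(f"{size}TB: about ${cost:,.0f} per year (storage only)")
```

Under these assumptions the quoted figure works out to roughly $0.0225 per GB per month, so a 5TB repository would come to about $1,350 a year for storage alone. Access and modification costs come on top of this and depend heavily on usage.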

It’s not always necessary or beneficial to abandon the cloud completely. Many organizations would greatly benefit from a hybrid approach, where they use their own infrastructure for certain tasks and utilize cloud solutions when they need additional capacity.