The Two Waves Toward Cutting Costs In The Cloud: Part One.
Drawing an overall cost comparison between cloud computing and on-premise datacentres requires weighing many factors before any black-and-white conclusion can be reached. Migrating to the cloud reduces on-premise expenses across hardware and maintenance, but these savings can be eaten up when computing resource consumption accelerates. The bottom line is that reducing spend whilst maintaining optimal performance is a high-priority goal for many development teams and businesses.
Godel’s DevOps engineers have recently driven savings of up to 50% for clients’ cloud-based systems following three approaches. None of this required application code changes.
Tactic 1: Tagging
When developers have broad permissions to spin up new resources in cloud development environments, it is easy for resources to be set up and promptly forgotten about. As previously highlighted, this can incur creeping costs over time. In larger development teams it is very possible, common even, for this issue to add up quickly and quietly.
One Godel client, a major online fashion retailer, uses a large content management system (CMS) which allows its front-end development team to build pages for its eCommerce site. To do this, each developer needs their own VM which can replicate a production environment for the site, so depending on the development in each environment, some of these VMs can use massive volumes of compute resources.
In cases such as this, where dozens of VMs are in play at once, keeping track can get out of hand. To add to the issue, the setup in this scenario was spread across separate AWS accounts for development, testing and production, so billing was not initially accessible in a single view. Godel first set to work tagging each VM (as well as other resources where applicable) with the developer or team responsible for it. Tags such as “environment”, “purpose” and “owner” provide an easy view of each AWS resource’s details, enabling teams to differentiate between servers and resource groups. This immediately connected resource consumption to specific people, teams and roles, so it could be addressed by the client. The monthly cost view could then be broken down by resources, teams and environments, and areas for cost savings could be easily identified.
Additionally, the separate AWS accounts were joined to a single AWS organization owned by the client, so that standard security policies are enforced consistently across all accounts. Account expenses are also consolidated into a master AWS account and, together with tagging, can be grouped for further analysis and cost planning.
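The breakdown by tag described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the client's actual tooling: the resource records, IDs, tag values and costs are all hypothetical, and in practice the input would come from a billing export instead of a hard-coded list.

```python
from collections import defaultdict

# Hypothetical resource records; every ID, tag value and cost is made up.
RESOURCES = [
    {"id": "vm-01", "monthly_cost": 210.0,
     "tags": {"owner": "team-frontend", "environment": "development"}},
    {"id": "vm-02", "monthly_cost": 95.5,
     "tags": {"owner": "team-frontend", "environment": "testing"}},
    {"id": "vm-03", "monthly_cost": 400.0,
     "tags": {"owner": "team-platform", "environment": "production"}},
    {"id": "vm-04", "monthly_cost": 60.0, "tags": {}},  # forgotten, untagged
]


def cost_by_tag(resources, tag_key, untagged_label="(untagged)"):
    """Sum monthly cost per value of the given tag key.

    Resources without the tag are collected under `untagged_label`, which
    makes forgotten resources visible instead of silently dropping them.
    """
    totals = defaultdict(float)
    for resource in resources:
        value = resource["tags"].get(tag_key, untagged_label)
        totals[value] += resource["monthly_cost"]
    return dict(totals)


if __name__ == "__main__":
    for owner, total in sorted(cost_by_tag(RESOURCES, "owner").items()):
        print(f"{owner}: {total:.2f}")
```

Grouping the same records by "environment" instead of "owner" gives the per-environment view; the untagged bucket is usually the first place to look for resources nobody remembers creating.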
Tactic 2: Scheduling
The client referenced above uses around 30 virtual machines (VMs) at any given time as development environments to test the implementation of new software features. They had recently migrated to AWS, and as such their set-up was brand new. As a fashion retailer in a competitive market, they are constantly building, testing and releasing new code to their website, so availability of these environments was key.
Most expenses in the cloud come from consumption of computing resources. In many cases, environments will be left running day, night, on weekends and bank holidays. Like leaving a light switch on when you leave the house, it’s easy to do this, but the costs quickly add up – especially across dozens of VMs.
Cloud scheduling is a solution to this. Godel first agreed an appropriate daily schedule with the client for given VMs to run, say 9AM to 6PM and not on weekends. Then, using AWS’ Scheduler tool (AWS SSM), the Godel team defined a start and stop process which automatically switches VMs on and off according to the defined schedule. The Godel team also modified the base version of the AWS scheduler so that it works with VMs assembled in autoscaling groups.
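The scheduling rule itself is simple enough to express directly. The sketch below only shows the decision logic for the example schedule mentioned above (weekdays, 9AM to 6PM); it is not the AWS tooling, the function names are hypothetical, and it assumes naive local timestamps and ignores bank holidays.

```python
from datetime import datetime, time

# Example schedule from the text: 9AM to 6PM, weekdays only.
WORKDAY_START = time(9, 0)
WORKDAY_END = time(18, 0)


def should_be_running(now, start=WORKDAY_START, end=WORKDAY_END):
    """Return True if `now` falls on Monday to Friday within [start, end)."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and start <= now.time() < end


def desired_action(now, currently_running):
    """Decide what a periodic scheduler run should do with one VM."""
    wanted = should_be_running(now)
    if wanted and not currently_running:
        return "start"
    if not wanted and currently_running:
        return "stop"
    return "none"


if __name__ == "__main__":
    print(desired_action(datetime.now(), currently_running=True))
```

A scheduler built this way would evaluate `desired_action` for each VM every few minutes and call the cloud provider's start or stop operation accordingly.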
Godel set up a similar solution for another client in the insurance sector, this time in the Azure cloud. This client operates a hybrid approach, with some infrastructure in their office, some in a datacentre and some in Azure. Prior to COVID-19 the Godel team had created a recovery account in Azure which regularly backs up their most business-critical VMs hosted in the on-premises datacentres. The Azure recovery account monitors these servers, and if they happen to go down a recovery process starts automatically: backup VMs spin up in Azure to immediately reproduce the failed on-premise infrastructure.
As well as reducing costs on large infrastructure units like VMs or databases, the Godel team pays attention to reducing costs on smaller items. For example, the team has built environments in Azure that consist of a series of VMs connected to a single virtual network and Load Balancer (instead of multiple). This is protected by one Network Security Group, which provides flexible access policies and essential security features whilst improving cost efficiency by removing redundant infrastructure elements.
The team has implemented a solution which, like AWS SSM, automatically starts and stops VMs against a set schedule. In this case the chosen VMs are off by default. If, for example, a tester would like to run an automation framework, they can start a pipeline which triggers the VM to start so the automation framework can carry out its scenario, and once this is complete another webhook triggers the VM to shut down. The VM also stops automatically at 9PM every day as a secondary failsafe.
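The "off by default" pattern boils down to one guarantee: whatever happens to the job, the VM is stopped afterwards. The sketch below shows that guarantee as an in-process Python analogy, not the pipeline and webhook setup described above. The `start_vm`, `stop_vm` and `job` callables are hypothetical placeholders for whatever actually starts the VM, stops it and runs the test scenario.

```python
from contextlib import contextmanager


@contextmanager
def vm_session(start_vm, stop_vm):
    """Start the VM, hand control to the caller, and always stop it after."""
    start_vm()
    try:
        yield
    finally:
        stop_vm()  # runs even if the job raises an exception


def run_job_on_demand(start_vm, stop_vm, job):
    """Run `job` on a VM that is only switched on for the duration of the job."""
    with vm_session(start_vm, stop_vm):
        return job()


if __name__ == "__main__":
    # Placeholder callables that only print; real ones would call the cloud API.
    outcome = run_job_on_demand(
        start_vm=lambda: print("starting VM"),
        stop_vm=lambda: print("stopping VM"),
        job=lambda: "scenario finished",
    )
    print(outcome)
```

The daily 9PM stop mentioned above remains useful as a second line of defence, since a pipeline that crashes or is cancelled may never reach its own shutdown step.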
Tactic 3: Reserved Instances
Another retail client of Godel’s found their biggest cloud cost was from their production setup, hosted in AWS. This environment consisted of heavy-usage VMs that were necessary to operate their eCommerce site. By default, VMs in AWS are paid for “on-demand”: charged by use per hour for Windows EC2 instances and per second for Linux instances. Since this client’s production environment was so demanding, these costs were skyrocketing.
AWS (Azure, too) offers another payment option that Godel identified as right for the client. It is a prepayment option for resources, where a fixed-term contract for the usage of VMs is paid in a lump sum. The major advantage is that, over the full term, a significant percentage saving can be achieved in comparison to on-demand billing, because the provider applies a discount to each hour of use. In this client’s case, the projected saving is 30%.
This solution isn’t applicable to every situation, because resources that aren’t used during the contract will still be paid for. However, for stable, high-usage environments, it is an ideal option.
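The trade-off can be made concrete with some simple arithmetic. The sketch below assumes a flat discount applied to the on-demand hourly rate and a reservation that is paid for the whole term regardless of use. The 30% discount mirrors the projected figure above, while the hourly rate and term length are hypothetical.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def on_demand_cost(hourly_rate, hours_used):
    """On-demand billing: pay only for the hours actually used."""
    return hourly_rate * hours_used


def reserved_cost(hourly_rate, discount, term_hours):
    """Reserved billing: discounted rate, paid for every hour of the term."""
    return hourly_rate * (1 - discount) * term_hours


def break_even_utilisation(discount):
    """Fraction of the term a VM must run before reserving becomes cheaper."""
    return 1 - discount


if __name__ == "__main__":
    rate = 0.50      # hypothetical on-demand price per hour
    discount = 0.30  # projected saving quoted in the text
    for utilisation in (1.0, 0.5):
        used = HOURS_PER_YEAR * utilisation
        print(
            f"utilisation {utilisation:.0%}: "
            f"on-demand {on_demand_cost(rate, used):.2f}, "
            f"reserved {reserved_cost(rate, discount, HOURS_PER_YEAR):.2f}"
        )
    print(f"break-even utilisation: {break_even_utilisation(discount):.0%}")
```

Under these assumptions a VM has to run for at least 70% of the term before the reservation pays off, which is why the option suits an always-on production environment far better than development VMs that are switched off outside working hours.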
THE SECOND WAVE TO CUTTING COSTS: APPLICATION CODE CHANGES
The methods detailed above can be implemented by experienced platform engineers with (relative) ease to reduce costs in the cloud. Further savings can be unlocked when applications and workloads are modified with some coding effort. This includes consuming PaaS resources as opposed to VMs, open-source alternatives to licensed software, etc. In the second part of this article series, we explore how this can be achieved.