Data Transformation: What’s the new technical landscape?
Data transformation is the process of converting data from one format or structure into another. Doing so can bring many benefits to your business, including scalability, cost savings and faster delivery. Godel’s Head of Data, Siarhei Oshyn, explains the new digital landscape and his vision for the new architecture.
The traditional approach
There are two approaches. The first is the traditional approach, which we also call “old school”. Here, traditional technologies such as Oracle Database or Microsoft SQL Server are used. Oracle Exadata, for example, is a combined hardware and software solution that can process huge amounts of data in seconds. However, because the infrastructure is on-premises, it is hard to scale and to distribute resources effectively, and monitoring and logging are difficult.
In my opinion, the classic approach has a lot of disadvantages, because when you buy hardware or servers, you have no easy way to scale them. As a result, the more your database grows, the more it costs overall. For example, suppose your workload is initially quite small and you predict it will increase by 20-30% next year, but it actually increases by much more. Your only options are to buy more hardware as the workload grows, or to buy enough capacity for the peak from the start. Either way, you will be paying a lot of money.
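To make the forecasting problem concrete, here is a minimal Python sketch of on-premises capacity planning. Every figure in it (server capacity, current load and the 80% actual growth) is a hypothetical assumption chosen only to illustrate the arithmetic, not data from a real project.

```python
import math

# All figures below are hypothetical and chosen only to illustrate the arithmetic.
UNITS_PER_SERVER = 10  # workload units one on-premises server can handle
CURRENT_LOAD = 100     # today's workload, in the same units


def servers_needed(load: int) -> int:
    """Servers required for a load; a partly used server still has to be bought whole."""
    return math.ceil(load / UNITS_PER_SERVER)


def load_after_growth(load: int, growth_pct: int) -> int:
    """Workload after a percentage increase (integer maths avoids float rounding)."""
    return load * (100 + growth_pct) // 100


# Hardware is bought for the top of the 20-30% forecast...
planned = servers_needed(load_after_growth(CURRENT_LOAD, 30))
# ...but the workload actually grows by 80% (hypothetical).
required = servers_needed(load_after_growth(CURRENT_LOAD, 80))

print(f"planned: {planned}, required: {required}, shortfall: {required - planned}")
```

With these numbers, hardware sized for 30% growth leaves a shortfall of five servers, which then have to be bought at short notice.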
The new architecture
The introduction of cloud technologies has made the classic approach outdated, because the cloud allows you to scale your solution.
Benefits of the new architecture:
- Scalability, making it easy to grow the business
- Distributed architecture with low coupling between components
- Cost optimisation through flexible pricing plans
- Streaming approach and Lambda architecture
- Fail-fast approach and reduced time to market
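Lambda architecture splits processing into a batch layer that periodically recomputes views over all historical data, a speed layer that covers events arriving since the last batch run, and a serving layer that merges the two at query time. The pure-Python sketch below illustrates that idea with in-memory event counts. It is not Godel’s implementation, and the function and class names are hypothetical; in production, each layer would be a separate distributed service.

```python
from collections import Counter
from typing import Iterable


def batch_view(master_dataset: Iterable[str]) -> Counter:
    """Batch layer: recompute the whole view from all historical events."""
    return Counter(master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events received since the last batch run."""

    def __init__(self) -> None:
        self.recent: Counter = Counter()

    def ingest(self, event: str) -> None:
        self.recent[event] += 1

    def reset(self) -> None:
        # Called once a batch run has absorbed these events into the batch view.
        self.recent.clear()


def query(batch: Counter, speed: SpeedLayer, key: str) -> int:
    """Serving layer: merge the complete-but-stale batch view with the fresh speed view."""
    return batch[key] + speed.recent[key]


# Hypothetical usage: historical events are in the batch view, new ones stream in.
master_dataset = ["page_view", "click", "page_view"]
batch = batch_view(master_dataset)
speed = SpeedLayer()
for event in ["page_view", "purchase"]:
    master_dataset.append(event)  # every event is kept in the master dataset
    speed.ingest(event)           # and is counted immediately by the speed layer

print(query(batch, speed, "page_view"))  # 3: 2 from the batch view + 1 from the speed layer

# The next batch run recomputes from everything, so the speed view can be discarded.
batch = batch_view(master_dataset)
speed.reset()
print(query(batch, speed, "page_view"))  # still 3
```

The design choice is that the batch layer stays simple and correct by always recomputing from the full dataset, while the speed layer only has to be accurate for the short window before the next batch run.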
In my experience, cloud technologies are better for scalability and are considerably faster and cheaper. You can start with a small application and grow it as the revenue comes in. In the new architecture, cloud technologies let us combine many technologies at once. We can use a different service for each volume of data, which makes the solution scalable and cost-effective: for small to medium amounts of data, you only need one tool.
Tool diversification also provides more flexibility, for example through open-source technology. With old-school tools such as Talend, you must buy licences for each developer and user. Open-source tools such as Apache Airflow or Terraform avoid vendor lock-in and are backed by a much stronger community. You just need an AWS, Azure or GCP subscription to access a complete toolkit.
Amazon Web Services provides Amazon QuickSight, where you only buy a licence for the users, so there’s no need to buy additional BI tools for your analytics. Over the last two years, cloud-based data engineering has also made machine learning toolkits much more accessible. Say you need to build a data science solution from scratch: you can use these services, just providing your data and some parameters, which makes delivery much quicker in the long run.
How has COVID affected data?
When the COVID pandemic hit, the volume of data being ingested fell by 50% compared with 2019. During COVID, server usage plummeted for some businesses. Those running on-premises still had to pay for their full capacity, because there was no dynamic scaling, and even assessing the infrastructure in order to downscale is much harder on-premises. Cloud technologies give you flexibility in an emergency. During COVID, they allowed companies to protect their business, save money and use it elsewhere, for example to support their employees.
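A simplified Python sketch of this cost argument, assuming hypothetical figures: 100 capacity units provisioned, a flat price of 50 per unit per month, and usage that halves. Using the same unit price for both models is a simplifying assumption; real on-premises costs are largely paid up front, and cloud pricing has many more dimensions.

```python
# Hypothetical figures, chosen only to illustrate the arithmetic.
PROVISIONED_UNITS = 100  # capacity bought on-premises, paid for whether used or not
PRICE_PER_UNIT = 50      # simplifying assumption: same monthly price per unit in both models


def monthly_cost(used_units: int, elastic: bool) -> int:
    """Fixed capacity bills everything provisioned; elastic capacity bills only what is used."""
    billed_units = used_units if elastic else PROVISIONED_UNITS
    return billed_units * PRICE_PER_UNIT


usage_before = 100
usage_after = usage_before * 50 // 100  # a 50% drop in usage

for label, used in [("before", usage_before), ("after", usage_after)]:
    fixed_cost = monthly_cost(used, elastic=False)
    elastic_cost = monthly_cost(used, elastic=True)
    print(f"{label}: on-premises {fixed_cost}, cloud {elastic_cost}")
```

After the drop, the fixed-capacity bill stays at 5000 while the elastic one halves to 2500; the difference is the saving that dynamic scaling makes possible.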
Migrating a Godel client’s database
We are currently working with a UK PLC to migrate data and data pipelines from its large Oracle databases to AWS, which brings significant cost benefits. We also proposed a cloud solution to a telecommunications client that was already on AWS, which allowed us to provide consultancy and build a proof of concept. Because we didn’t need to wait for infrastructure, the proof of concept took a matter of weeks, with no complex configuration required.
These examples show Godel’s hands-on experience of working with cloud technologies and our vision of the solution architecture.
Our data division runs AWS and Azure labs. Organising them with cloud technologies is much easier than buying and building on-premises servers, which allows teams to learn and develop much faster and keeps engineers prepared for future projects. We are pleased to say that Godel saw seven people certified in Q2 2021 alone.