The Secret Code To Successful Data Anonymisation

One of the most important components of a user-facing application is the data it is fed. Wherever personally identifiable information or other sensitive assets are handled, data security is paramount. Not even the most vigilant technology leader can predict when a breach will occur: if an employee leaves their laptop on a train and someone with bad intentions picks it up, for instance, sensitive data could easily be stolen.

In the European Union, the GDPR defines the roles of the “data controller” (the entity that decides what is done with personal data) and the “data processor” (any entity that processes that data on behalf of the controller). Software development often involves many stakeholders, and the more cooks in the kitchen, the greater the risk of a data leak or other mishap. That is why data anonymisation is part and parcel of Godel’s partnerships with its clients.

What is data anonymisation?

Data anonymisation is a form of information sanitisation carried out to protect privacy. It is the process of encrypting or removing personally identifiable and other sensitive information (names, postcodes, medical conditions, salaries, IP addresses and so on) from data sets, so that the people the data describes remain anonymous. Effective anonymisation makes it practically impossible for anyone, malicious or otherwise, to trace information back to the individual it describes.
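At its simplest, removing personally identifiable information means stripping the identifying fields from each record before it is shared. The Python sketch below is purely illustrative: the field names and the `PII_FIELDS` set are hypothetical, and a real data set would need a field list agreed for its own schema.

```python
from typing import Any, Dict

# Hypothetical set of fields treated as personally identifiable.
PII_FIELDS = {"name", "postcode", "ip_address"}


def remove_pii(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record without any field listed in PII_FIELDS."""
    return {field: value for field, value in record.items() if field not in PII_FIELDS}


if __name__ == "__main__":
    customer = {
        "name": "Jane Doe",
        "postcode": "M1 1AA",
        "ip_address": "203.0.113.7",
        "plan": "premium",
    }
    print(remove_pii(customer))  # {'plan': 'premium'}
```

Removal alone may not always be enough: a combination of the remaining fields can sometimes still single a person out.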

Why is data anonymisation important?

Imagine that Godel is working with a client to develop a system that processes hundreds of transactions per second from a user base of one million people, each with a profile containing highly sensitive personal data. To test the system in realistic conditions, the Godel development team must be able to feed it data that looks and behaves exactly like real user data. If the anonymised data used during development and testing does not replicate real data sets closely enough, the system could behave erratically once it starts processing real transactions. That is why Godel carries out data anonymisation in the following three steps.

1) Scoping with the client

First, the Godel team and the client will discuss together how the user data should be anonymised. The stakeholders involved will agree the functional scope of work: for example, how many “refreshes” of the anonymised data the development cycle will need, and how often they should happen.

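If unsuitable anonymisation techniques are chosen, the anonymised data may turn out to be unusable by the system. It is therefore essential to understand the context of the data, too, so that the most appropriate methods can be applied: for instance, whether a customer identifier must be replaced consistently in every table that references it, so that the relationships between records still hold.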

The team will also dig into architectural details, such as which areas of the estate the data touches and the database requirements and schema, to build an in-depth view of the system at hand. The outcome of this conversation is a clear picture of which data is to be anonymised, how it will be done and how long it will take.

2) Design and analysis

The data anonymisation process starts with choosing which approaches to take. There are many options, and for different types of information, one approach may be better than another.


Examples of anonymisation techniques:

Data Masking: Where original data is “masked” with new values, but the original type of data (e.g. addresses) is retained. SQL Server, for example, offers dynamic data masking, which hides data from unauthorised users at query time, according to a centrally defined masking policy, without changing the data actually stored.

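Randomisation: Where original values are replaced with random ones, or shuffled between records, so that the values still look realistic but no longer belong to the right person. Godel teams normally write scripts to do this, but tools such as RedGate Data Masker are also available.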

Encryption: Where data is run through an algorithm that scrambles it into a “code” that can only be deciphered with a decryption key. Microsoft’s Cloud Discovery data anonymisation, for example, replaces the usernames in logs uploaded to its security portal with encrypted usernames, so that user identities are not shown in plain text.

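Data Hashing: Where data is turned into an irreversible code by a one-way algorithm such as SHA-2. It is similar to encryption, except that no key exists that can reverse a hash. Because values with few possibilities, such as postcodes or phone numbers, can still be recovered by hashing every candidate and comparing the results, hashing is usually combined with a secret key (as in HMAC).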
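The masking, randomisation and hashing techniques above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not Godel’s scripts or any tool’s API; the function names, the example email address and the key are hypothetical. Encryption is left out because it needs a third-party cryptography library. Hashing is shown as a keyed hash (HMAC-SHA-256, a SHA-2 variant); note that while the key exists, its output is strictly a pseudonym rather than fully anonymous data.

```python
import hashlib
import hmac
import random
from typing import List, Optional


def mask_email(email: str) -> str:
    """Data masking: hide the local part but keep the value shaped like an email."""
    local, sep, domain = email.partition("@")
    if not sep:
        return "*" * len(email)  # not an email address: mask everything
    # Keep the first character and replace the rest of the local part with '*'.
    return local[:1] + "*" * max(len(local) - 1, 0) + "@" + domain


def shuffle_column(values: List, seed: Optional[int] = None) -> List:
    """Randomisation: shuffle values between records.

    The overall distribution (e.g. of salaries) is preserved, but each value
    is no longer tied to the record it came from.
    """
    rng = random.Random(seed)
    shuffled = list(values)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled


def keyed_hash(value: str, key: bytes) -> str:
    """Hashing: one-way HMAC-SHA-256 digest of a value under a secret key.

    The same input always gives the same output, so hashed IDs can still be
    joined across tables, but without the key nobody can test guesses.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = b"hypothetical-key-load-from-a-secrets-store"
    print(mask_email("jane.doe@example.com"))  # j*******@example.com
    print(sorted(shuffle_column([28000, 35000, 41000])) == [28000, 35000, 41000])  # True
    print(keyed_hash("jane.doe@example.com", key) == keyed_hash("jane.doe@example.com", key))  # True
```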

3) Implementation

Finally, it’s time to put the chosen techniques to work by writing scripts, testing them and running them against the target database. These scripts can then be included in new or existing CI/CD or data migration pipelines for future use.
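A script of this kind might look like the following sketch, which uses Python’s built-in `sqlite3` module and an in-memory database as a stand-in for the real target database. The `users` table, its columns, the key and the `example.invalid` replacement domain are all hypothetical. The script registers a keyed-hash function with SQLite, rewrites the identifying columns in a single transaction, and includes a simple check that no original values survive, the kind of test that could run in a pipeline after each refresh.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical placeholder: in a real pipeline, load the key from a secrets store.
SECRET_KEY = b"hypothetical-key"


def pseudonym(value):
    """Deterministic 12-character pseudonym for a value (None stays None)."""
    if value is None:
        return None
    digest = hmac.new(SECRET_KEY, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def anonymise_users(conn: sqlite3.Connection) -> None:
    """Overwrite the identifying columns of the hypothetical users table in place."""
    # Make the Python function callable from SQL as pseudonym(x).
    conn.create_function("pseudonym", 1, pseudonym)
    with conn:  # commits on success, rolls back if the statement fails
        conn.execute(
            "UPDATE users SET "
            "name = 'User ' || pseudonym(name), "
            "email = pseudonym(email) || '@example.invalid'"
        )


def no_originals_left(conn: sqlite3.Connection, column: str, originals: set) -> bool:
    """Test step: True if none of the original values remain in the column."""
    # The column name comes from our own code, never from user input.
    rows = conn.execute(f"SELECT {column} FROM users").fetchall()
    return not any(row[0] in originals for row in rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users (name, email) VALUES (?, ?)",
        [("Jane Doe", "jane@example.com"), ("John Smith", "john@example.com")],
    )
    anonymise_users(conn)
    print(no_originals_left(conn, "email", {"jane@example.com", "john@example.com"}))  # True
    print(conn.execute("SELECT id, name, email FROM users").fetchall())
```

Because the pseudonym is deterministic, the same value is replaced identically wherever it appears, which keeps joins between tables working. Truncating the digest to 12 characters is an arbitrary choice that trades readability against a small chance of collisions.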

At this stage especially, the importance of thorough scoping in step 1 cannot be overstated. Teams do not want to find themselves unprepared when it comes to anonymising data sets. For example, if a new type of user data needs to be introduced to the application, having a predefined, tested approach in place will allow it to be anonymised without headaches or risk.
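One way to make that preparation enforceable is a pipeline check that compares the live schema against the anonymisation plan agreed during scoping, and fails when a column has no rule. The sketch below is an assumption about how this could be done, not a description of Godel’s tooling: the plan format, the technique names and the `ci_check` helper are hypothetical, and SQLite stands in for the real database.

```python
import sqlite3

# Hypothetical plan from scoping: table -> column -> technique to apply.
ANONYMISATION_PLAN = {
    "users": {
        "id": "keep",
        "name": "hash",
        "email": "mask",
        "salary": "shuffle",
    },
}


def unplanned_columns(conn: sqlite3.Connection, plan: dict) -> list:
    """Return 'table.column' for every column that has no anonymisation rule."""
    missing = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        for column in conn.execute(f'PRAGMA table_info("{table}")'):
            if column[1] not in plan.get(table, {}):
                missing.append(f"{table}.{column[1]}")
    return missing


def ci_check(conn: sqlite3.Connection, plan: dict) -> int:
    """Return a process exit code: 0 if every column is covered, 1 otherwise."""
    missing = unplanned_columns(conn, plan)
    for name in missing:
        print(f"No anonymisation rule for {name}")
    return 1 if missing else 0


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # A new 'phone' column has been added since the plan was agreed.
    conn.execute(
        "CREATE TABLE users (id INTEGER, name TEXT, email TEXT, salary INTEGER, phone TEXT)"
    )
    print("exit code:", ci_check(conn, ANONYMISATION_PLAN))
```

In a real pipeline, the return value would be passed to `sys.exit()`, so that a non-zero code stops the build until someone decides how the new column should be anonymised.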


A Guide to Basic Data Anonymisation Techniques

Dynamic Data Masking in SQL

Cloud Discovery Data Anonymisation