For data anonymization, choose GEDIS Studio


Anonymization with Data Masking and Data Scrambling

Why Data Quality Matters

Data masking is one of the most popular approaches to live data anonymization. By replacing sensitive data with fake data, you can disclose your production data outside your organization.

However, an ineffective data masking process may produce data that is anonymized but of poor quality, and therefore useless as a substitute for the real sensitive data.

The major pitfall of any anonymization process is to focus on masking sensitive data and lose sight of the primary goal: obtaining quality data for your tests.

Therefore, your anonymization process should start by targeting the data best suited to your tests, then identify the sensitive data to mask within that subset, and finally choose a way to build replacement data that both ensures anonymization and keeps the data appropriate for your tests!

A typical pitfall of inappropriate, trivial anonymization is to apply it without considering the structure of the anonymized data. For example, very sensitive data such as NIS numbers, bank account numbers, credit card numbers, phone numbers, and mobile device codes all have a standardized structure that the substitution data you use MUST respect.
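As an illustration of respecting such a structure, here is a minimal Python sketch for credit card numbers. It assumes a masking rule that keeps the length and the first six digits (the issuer prefix) and recomputes the final Luhn check digit, so the substitute still passes the checksum validation that real card numbers satisfy. The function names are hypothetical, and this is not GEDIS Studio's own algorithm.

```python
import random


def luhn_check_digit(payload: str) -> str:
    """Return the Luhn check digit for a string of digits (check digit excluded)."""
    total = 0
    # Walk from the rightmost payload digit: it sits next to the future check
    # digit, so it is the first one to be doubled.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def is_luhn_valid(number: str) -> bool:
    """True if the last digit of `number` is its correct Luhn check digit."""
    return luhn_check_digit(number[:-1]) == number[-1]


def mask_card_number(card: str, rng: random.Random, keep_prefix: int = 6) -> str:
    """Hypothetical masking rule: keep the length and the first `keep_prefix`
    digits, randomize the rest, and recompute the Luhn check digit."""
    digits = card.replace(" ", "").replace("-", "")
    random_len = len(digits) - keep_prefix - 1
    body = "".join(rng.choice("0123456789") for _ in range(random_len))
    payload = digits[:keep_prefix] + body
    return payload + luhn_check_digit(payload)


rng = random.Random(42)
fake = mask_card_number("4111 1111 1111 1111", rng)
print(fake, is_luhn_valid(fake))  # the second value printed is always True
```

The same pattern (keep the format, regenerate the free part, recompute the checksum) would apply to other structured identifiers, each with its own checksum rule.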

This is why anonymization is actually both a matter of generating appropriate substitution data AND of applying that masking data consistently.
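Applying substitution data consistently means that the same original value gets the same substitute wherever it appears, so that joins between tables still work. One common way to do this, sketched below under our own assumptions (a hypothetical name pool and secret key, and keyed hashing with HMAC-SHA256 from the Python standard library), is to derive the choice of substitute from a keyed hash of the original value. It is not necessarily how GEDIS Studio does it.

```python
import hashlib
import hmac

# Hypothetical pool of realistic substitution values.
FIRST_NAMES = ["Alice", "Bruno", "Chloe", "David", "Emma", "Farid", "Gina", "Hugo"]


def consistent_substitute(value: str, pool: list, secret: bytes) -> str:
    """Pick a substitute for `value` from `pool` using a keyed hash.

    For a given secret, the same value always yields the same substitute,
    so repeated occurrences (e.g. across tables) stay consistent."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(pool)
    return pool[index]


SECRET = b"hypothetical-masking-key"  # keep out of the masked dataset

customers = [{"id": 1, "first_name": "John"}, {"id": 2, "first_name": "Mary"}]
orders = [{"customer_first_name": "John", "amount": 120}]

for row in customers:
    row["first_name"] = consistent_substitute(row["first_name"], FIRST_NAMES, SECRET)
for row in orders:
    row["customer_first_name"] = consistent_substitute(
        row["customer_first_name"], FIRST_NAMES, SECRET
    )

# "John" received the same substitute in both tables.
assert customers[0]["first_name"] == orders[0]["customer_first_name"]
```

Because several originals can map to the same pool entry, this approach is one-way and may merge distinct values; a larger pool reduces, but does not eliminate, such collisions.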

When it comes to data quality and realism, GEDIS Studio is the best tool on the market. It is not just another database replication tool: it is built for producing replacement data.

Data anonymization cannot be reduced to just cleaning sensitive data: the resulting data must be as good as the original data.

Anonymization with Data Encryption and Scrambling

Data encryption anonymizes data by replacing selected sensitive data with encrypted data. Depending on the type of encryption you use, the process can be reversed, for example with the encryption key.

GEDIS Studio provides various ways of encoding or encrypting data, including:

  • Replacing the data with its CRC code (similar to a hash code)
  • Encoding the data with the Blowfish algorithm, given a secret key
  • Encoding the data with the Twofish algorithm, given a secret key
  • Encoding the data (even binary data) in Base64 (no secret key!)
  • Scrambling the data by applying various swaps, insertions, and character replacements
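For the two keyless methods in the list, the Python standard library is enough to show what the output looks like. The sketch below uses CRC-32 (`zlib.crc32`) and Base64 (`base64.b64encode`); the exact CRC variant GEDIS Studio uses is not stated here, so CRC-32 is an assumption. Blowfish and Twofish are not in the standard library and would require a vetted third-party cryptography package, so they are not shown.

```python
import base64
import zlib


def crc_code(value: str) -> str:
    """Replace a value with its CRC-32 code, written as 8 hex digits (no key)."""
    return format(zlib.crc32(value.encode("utf-8")), "08x")


def base64_encode(data: bytes) -> str:
    """Base64-encode raw bytes; works for binary data too (no key)."""
    return base64.b64encode(data).decode("ascii")


def base64_decode(encoded: str) -> bytes:
    """Reverse base64_encode: anyone can do this, no secret is needed."""
    return base64.b64decode(encoded)


print(crc_code("John"))           # 8 hex digits, same output on every run
print(base64_encode(b"John"))     # Sm9obg==
print(base64_decode("Sm9obg=="))  # b'John'
```

Note that Base64 is an encoding rather than encryption: anyone can decode it, which is why no secret key is involved. CRC-32 is not cryptographic either, so short values such as first names could be recovered by trying candidate values and comparing their CRC codes.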

With GEDIS Studio, each encrypted field (such as a column of a CSV input file to anonymize) may have its own secret key or share a common secret. The reversible encoders also support decoding: using the same secret key, you can get back the original sensitive data (a CRC code, like a hash code, cannot be decoded).
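To illustrate per-column secrets and decoding with the same key, here is a toy, reversible keyed scrambler in Python. It only performs position swaps (a permutation derived from the secret key and the column name); it does not reproduce GEDIS Studio's insertions and character replacements, and it is far weaker than Blowfish or Twofish, since the original characters remain visible in a different order. The column names and keys are hypothetical.

```python
import hashlib
import random


def _permutation(key: str, column: str, length: int) -> list:
    """Derive a deterministic shuffle of positions 0..length-1 from the
    secret key and the column name (so columns sharing a key still differ)."""
    seed = hashlib.sha256(f"{key}|{column}|{length}".encode("utf-8")).digest()
    positions = list(range(length))
    random.Random(seed).shuffle(positions)
    return positions


def scramble(value: str, key: str, column: str) -> str:
    """Reorder the characters of `value` with the key-derived permutation."""
    perm = _permutation(key, column, len(value))
    return "".join(value[p] for p in perm)


def unscramble(scrambled: str, key: str, column: str) -> str:
    """Put every character back in place using the same key and column."""
    perm = _permutation(key, column, len(scrambled))
    original = [""] * len(scrambled)
    for i, p in enumerate(perm):
        original[p] = scrambled[i]
    return "".join(original)


# Hypothetical per-column secrets; two columns could also share one secret.
COLUMN_KEYS = {"last_name": "secret-A", "city": "secret-B"}

row = {"last_name": "Dupont", "city": "Lyon"}
masked = {col: scramble(v, COLUMN_KEYS[col], col) for col, v in row.items()}
restored = {col: unscramble(v, COLUMN_KEYS[col], col) for col, v in masked.items()}
assert restored == row
```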

Being able to decode encrypted data is valuable: for example, the data owner can decode the records returned by users of the encrypted data to help diagnose the defects they detected.

To learn more about the properties of Blowfish encryption and how relevant it can be for your data anonymization, have a look at the Wikipedia page: Blowfish. The Twofish algorithm is covered on Wikipedia as well: Twofish.

Advantages of data encryption: data encryption is easy to put in place and provides good anonymization, since it is practically impossible to recover the original data without the encryption key. When the goal is simply to avoid disclosing data, encryption is a fast and efficient way to proceed.

Drawbacks of data encryption: data encryption transforms data into unreadable data. For example, encrypting a name like "John" yields something like "geyFSre4". This is satisfactory from an anonymization point of view, but the resulting data may be unsuitable for test or training environments.

Have a look at this video demonstrating how GEDIS Studio uses data encryption for anonymization purposes.

Anonymization with Data Masking

Masking data with generated substitution values is the best way of anonymizing production data for testing purposes. Because your testers specify the quality requirements for the test data produced from the anonymized data, you can ensure that the result is both anonymized and fit for purpose.

GEDIS Studio is dedicated to this purpose: let our video convince you to produce substitution values!

Reversible Data Anonymization

Anonymization reversibility is crucial if you want to be able to retrieve the original data from the masked data. GEDIS Studio provides various ways to build encoding dictionaries on the fly or to reuse existing ones. With this mechanism, you can ensure that the same source value is always replaced by the same substitution value, whatever that substitution value is.
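The dictionary principle can be sketched in Python with a hypothetical `EncodingDictionary` class: each new source value takes the next unused candidate, every later occurrence reuses it, and the inverse mapping supports de-anonymization. An existing (one-to-one) dictionary can be passed in to be reused. This is an illustration of the principle under those assumptions, not GEDIS Studio's implementation.

```python
import random


class EncodingDictionary:
    """Hypothetical reversible masking dictionary built on the fly."""

    def __init__(self, candidates, seed=0, existing=None):
        pool = list(dict.fromkeys(candidates))  # drop duplicate candidates
        random.Random(seed).shuffle(pool)
        # Reuse an existing (one-to-one) dictionary if one is supplied.
        self._forward = dict(existing or {})
        self._backward = {sub: src for src, sub in self._forward.items()}
        # Only candidates not already used as substitutes remain available.
        self._free = [c for c in pool if c not in self._backward]

    def encode(self, value):
        """Return the substitute for `value`, creating it on first sight."""
        if value not in self._forward:
            if not self._free:
                raise ValueError("no substitution values left")
            substitute = self._free.pop()
            self._forward[value] = substitute
            self._backward[substitute] = value
        return self._forward[value]

    def decode(self, substitute):
        """De-anonymize: return the original value behind `substitute`."""
        return self._backward[substitute]


names = EncodingDictionary(["Alice", "Bruno", "Chloe", "David"], seed=7)
masked = [names.encode(v) for v in ["John", "Mary", "John"]]
assert masked[0] == masked[2]  # same source value, same substitute
assert [names.decode(m) for m in masked] == ["John", "Mary", "John"]
```

Since the dictionary maps substitutes back to the original values, it should be protected as carefully as the production data itself.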

See how the GEDIS Studio encoding/decoding mechanism is ideally suited to data de-anonymization.

Do you really need anonymization?

If you have a considerable amount of private data in your databases, you will need to produce substitution values for a large subset of your databases. In this case, why not directly generate data from scratch instead of extracting production data and masking it?
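As a rough idea of what generating from scratch means, the Python sketch below builds synthetic customer records from a seeded random generator, so no production data is involved at all. The value lists, field names, and date range are hypothetical placeholders; a real generator would also enforce the structural rules (checksums, formats, referential integrity) that apply to your data.

```python
import random
from datetime import date, timedelta

# Hypothetical value lists; a real generator would use much larger ones.
FIRST_NAMES = ["Alice", "Bruno", "Chloe", "David", "Emma", "Farid"]
LAST_NAMES = ["Martin", "Bernard", "Durand", "Petit", "Moreau"]
CITIES = ["Paris", "Lyon", "Lille", "Nantes", "Toulouse"]


def generate_customers(n: int, seed: int = 0) -> list:
    """Build n synthetic customer records; the same seed gives the same data."""
    rng = random.Random(seed)
    customers = []
    for customer_id in range(1, n + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        # Birth dates spread over roughly 50 years starting in 1950.
        birth = date(1950, 1, 1) + timedelta(days=rng.randrange(50 * 365))
        customers.append({
            "id": customer_id,
            "first_name": first,
            "last_name": last,
            # The id suffix keeps e-mail addresses unique.
            "email": f"{first}.{last}{customer_id}@example.com".lower(),
            "birth_date": birth.isoformat(),
            "city": rng.choice(CITIES),
        })
    return customers


for customer in generate_customers(3, seed=1):
    print(customer)
```

Seeding the generator makes the dataset reproducible, which helps when a test failure has to be replayed on exactly the same data.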

GEDIS Studio makes it easy to generate data from scratch. Think about it...