# Data Correlation

Data correlation expresses the constraints between values of different fields in one or more records. To ensure the realism of generated test data, various types of constraints on one or multiple fields and records should be considered.

## Why Do You Need Test Data Realism?

What is at stake is the realism of your test data and the quality of your test campaign. Using real data is the obvious way to obtain realistic data. But even if you have access to production data, be aware that **real data are not test data**: you still have to rework them to obtain anonymized data focused on your test cases.

Alternatively, you can use synthetic test data. Building test data with a data generator requires you to characterize the realism the data must have, then to find a generator able to express this realism and produce the data.

The realism constraints may be driven by the application itself, which can require that injected values satisfy specific business rules. They may also come from the test case objectives and from the way you decide whether your test scripts pass or fail with the generated data.

- Test data realism is required to allow data injection into the tested application.
- Test data realism is required to obtain test data focused on the test cases.

## What Is Test Data Realism About?

So the first issue is to express the level of realism required for your test data. The trap is that, when considering test data realism, you may be tempted to achieve a level of realism that matches real-world data.

Going this way is very difficult and, most of the time, useless. What you actually need is test data, that is, just data for your tests.

That means the constraints you must take into account are not those that describe the data in a production environment, but the minimal set of constraints required for your tests and for injecting the data into your test environment.

## What Is Data Correlation?

There are various levels of data correlation, depending on the number of fields involved in the correlation and on the way the constraints between the values of these fields are expressed.

Testers often avoid data generators because these tools usually make it hard to express correlation constraints, and realism constraints more generally, quickly and efficiently.

By decomposing the problem of data correlation into levels of complexity, and providing for each level a pointer to its implementation with GEDIS Studio, we will demonstrate that this test data generator provides advanced features for producing effective test data.

## Data Generation with Inner Record Correlation

Inner record correlation relates to correlation between the values of two or more fields in the same record. In that case, the values taken by a subset of fields must be selected together, and the rule that constrains the valid combinations of values applies to all the generated records.

A trivial correlation rule between fields GEN-001 and GEN-002 could take the form: *If field GEN-001 is equal to 'A', then field GEN-002 must be equal to 'B'*. Stating such a rule for every valid pair of values of GEN-001 and GEN-002 yields a complete correlation constraint table for the two fields.
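The constraint table described above can be sketched in a few lines of Python. This is only an illustration of the idea, not GEDIS Studio's implementation: the table contents, the `generate_record` helper, and the fixed seed are assumptions made for the example.

```python
import random

# Hypothetical constraint table: each valid GEN-001 value is paired with
# the GEN-002 value it requires (the pairs are illustrative only).
CORRELATION_TABLE = {"A": "B", "C": "D", "E": "F"}


def generate_record(rng):
    """Pick GEN-001 at random, then read GEN-002 from the constraint table."""
    gen_001 = rng.choice(sorted(CORRELATION_TABLE))
    return {"GEN-001": gen_001, "GEN-002": CORRELATION_TABLE[gen_001]}


rng = random.Random(0)  # fixed seed so the run is reproducible
records = [generate_record(rng) for _ in range(5)]
for record in records:
    print(record)
```

Because GEN-002 is always derived from the value already chosen for GEN-001, every generated record satisfies the table by construction.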

This simple correlation is necessary for building realistic test data, but not every data generator provides an elegant implementation of it. In the remainder of this explanation, we will go beyond this trivial example and show that advanced correlation and realism require more work, and that only GEDIS Studio provides tools for implementing those mechanisms.

A more complicated, yet still feasible, constraint could be expressed as *If field GEN-001 is equal to 'A', then field GEN-002 must be chosen from the list { 1, 3, 400, 402 }*.

Again, we can imagine a list of values for field GEN-001 and, for each of them, a different way to produce the correlated field GEN-002.

For this more complicated correlation rule, GEDIS Studio uses the Selector Generator, applied to the second field (GEN-002) according to the values taken by the first field (GEN-001).
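The selection mechanism can be pictured as a mapping from each GEN-001 value to the generator used for GEN-002. The Python sketch below only illustrates that idea; it does not use or reproduce the GEDIS Studio Selector Generator, and the `SELECTOR` mapping, the generator for value 'B', and the `generate_record` helper are assumptions.

```python
import random

ALLOWED_FOR_A = [1, 3, 400, 402]

# Hypothetical selector: each GEN-001 value selects the way GEN-002 is produced.
SELECTOR = {
    "A": lambda rng: rng.choice(ALLOWED_FOR_A),  # pick from a fixed list
    "B": lambda rng: rng.randint(1000, 1999),    # assumed rule: integer range
}


def generate_record(rng):
    """Generate GEN-001 first, then GEN-002 with the generator it selects."""
    gen_001 = rng.choice(sorted(SELECTOR))
    return {"GEN-001": gen_001, "GEN-002": SELECTOR[gen_001](rng)}


rng = random.Random(1)  # fixed seed so the run is reproducible
records = [generate_record(rng) for _ in range(10)]
for record in records:
    print(record)
```

Adding a new GEN-001 value then only requires a new entry in the mapping, which keeps the correlation rule in one place.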

There are still more complicated correlation constraints, which can be expressed as *If the value of GEN-001 satisfies condition Cond_1, then the value taken by GEN-002 should be generated by rule Rule_1*. Here again, we can list the pairs (Cond_i, Rule_i) that together express the correlation between GEN-001 and GEN-002.
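A list of (Cond_i, Rule_i) pairs maps naturally onto an ordered list of predicate and generator functions, where the first matching condition decides which rule produces GEN-002. The sketch below is a minimal Python illustration under assumed conditions and rules; none of the names or thresholds come from GEDIS Studio.

```python
import random

# Hypothetical (Cond_i, Rule_i) pairs, evaluated in order.
# Each condition tests the GEN-001 value; each rule produces a GEN-002 value.
RULES = [
    (lambda v: v < 10, lambda rng: 0),
    (lambda v: v % 2 == 0, lambda rng: rng.choice([1, 3, 400, 402])),
    (lambda v: True, lambda rng: rng.randint(1000, 1999)),  # fallback rule
]


def generate_gen_002(gen_001, rng):
    """Apply the rule paired with the first condition that GEN-001 satisfies."""
    for condition, rule in RULES:
        if condition(gen_001):
            return rule(rng)
    raise ValueError(f"no rule matches GEN-001={gen_001!r}")


def generate_record(rng):
    gen_001 = rng.randint(0, 99)
    return {"GEN-001": gen_001, "GEN-002": generate_gen_002(gen_001, rng)}


rng = random.Random(42)  # fixed seed so the run is reproducible
records = [generate_record(rng) for _ in range(20)]
```

Evaluating the pairs in order makes overlapping conditions unambiguous, and the final catch-all pair guarantees that every GEN-001 value receives a GEN-002 value.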

## Data Generation with Outer Record Correlation

Outer record correlation relates to correlation between the values taken by one or more fields in consecutive records. In that case, the fields considered in the consecutive records may or may not be the same.
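As a rough illustration, a generator can keep the previous record in memory and derive the next one from it. The Python sketch below assumes two made-up rules: GEN-001 never decreases from one record to the next (same field in both records), and GEN-002 copies the GEN-001 value of the previous record (different fields). It is not taken from GEDIS Studio.

```python
import random


def generate_records(count, rng):
    """Generate records whose values depend on the preceding record."""
    records = []
    previous = None
    for _ in range(count):
        if previous is None:
            # First record: nothing to correlate with yet.
            gen_001 = rng.randint(1, 10)
            gen_002 = 0
        else:
            # Same field across records: GEN-001 never decreases.
            gen_001 = previous["GEN-001"] + rng.randint(0, 5)
            # Different fields across records: GEN-002 repeats the
            # GEN-001 value of the previous record.
            gen_002 = previous["GEN-001"]
        previous = {"GEN-001": gen_001, "GEN-002": gen_002}
        records.append(previous)
    return records


rng = random.Random(7)  # fixed seed so the run is reproducible
records = generate_records(6, rng)
for record in records:
    print(record)
```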