Customer Data Anonymisation in the Finance Sector

25 Feb

Banking, financial services, and insurance companies are continuously seeking ways to enhance their operations. One key aspect is the responsible and secure handling of customer data. This blog post delves into customer data anonymisation and its impact on the finance sector.

Understanding the Importance of Customer Data Anonymisation

Customer data is the lifeblood of the finance sector, enabling personalised services and informed decision-making. However, the increasing emphasis on data privacy and regulatory compliance, such as the General Data Protection Regulation (GDPR), necessitates a strategic approach to data handling.

Customer data anonymisation is the process of protecting private or sensitive information by removing or encrypting the details in a dataset that could be used to identify a person. Names, addresses, contact details, social security numbers and financial information all count as personally identifiable information (PII), i.e. data that should be anonymised.

Techniques for Customer Data Anonymisation

There are many techniques for customer data anonymisation. Data masking, pseudonymisation and data redaction are the most commonly used. Let’s look at each in more detail.

Data Masking

Data masking means replacing the original data with fictitious but realistic values. This technique is often used to anonymise production data for use in development and testing. Anonymised production data gives testers data that is realistic in both quality and quantity, which is crucial for testing against a system whose state is as close to the production environment as possible.
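To illustrate data masking, here is a minimal sketch. The TestDataMasker class and its small pool of fake names are hypothetical. Each real name is mapped deterministically to a fictitious one, so the same customer receives the same fake name everywhere in the copied test data.

```java
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper that replaces real customer names with fictitious but
 * realistic ones, e.g. when copying production data to a test environment.
 */
final class TestDataMasker {
    // Illustrative pools of fake names; a real tool would use much larger sets.
    private static final List<String> FAKE_FIRST_NAMES =
            List.of("Aino", "Liam", "Sofia", "Oskar", "Emma", "Noah");
    private static final List<String> FAKE_LAST_NAMES =
            List.of("Virtanen", "Smith", "Korhonen", "Jensen", "Nieminen", "Berg");

    private TestDataMasker() {
    }

    /** Picks a fake value deterministically: the same input always maps to the same output. */
    private static String pick(List<String> pool, String original) {
        return pool.get(Math.floorMod(original.hashCode(), pool.size()));
    }

    static String maskFirstName(String firstName) {
        return pick(FAKE_FIRST_NAMES, firstName);
    }

    static String maskLastName(String lastName) {
        return pick(FAKE_LAST_NAMES, lastName);
    }

    /** Builds a syntactically valid but fictitious e-mail address from the masked names. */
    static String maskEmail(String firstName, String lastName) {
        String local = maskFirstName(firstName) + "." + maskLastName(lastName);
        return local.toLowerCase(Locale.ROOT) + "@example.com";
    }
}
```

Because String.hashCode() is not secret, a production masking tool would more likely use a keyed hash together with a much larger pool of names.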

Pseudonymisation

Pseudonymisation is a technique for replacing personal identifiers with placeholder values in order to protect the original data. For example, the name of a person could be replaced with the placeholder value Person1. The original name must be stored in a lookup table together with the placeholder value in order to resolve the identity for analysis or debugging purposes. The lookup table content should be protected from unauthorised access, and preferably, the data should be encrypted.
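A minimal sketch of pseudonymisation with a lookup table could look like the following. The Pseudonymiser class is hypothetical, and the lookup table is just an in-memory map; as noted above, a real lookup table would need to be access-controlled and preferably encrypted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical pseudonymiser: each distinct name gets a placeholder such as
 * "Person1", and the mapping is kept so that authorised users can resolve it later.
 */
final class Pseudonymiser {
    private final Map<String, String> nameToPseudonym = new HashMap<>();
    private final Map<String, String> pseudonymToName = new HashMap<>();
    private int counter = 0;

    /** Returns the existing placeholder for the name, or creates and records a new one. */
    synchronized String pseudonymise(String name) {
        return nameToPseudonym.computeIfAbsent(name, n -> {
            String pseudonym = "Person" + (++counter);
            pseudonymToName.put(pseudonym, n);
            return pseudonym;
        });
    }

    /** Resolves a placeholder back to the original name, e.g. for debugging. */
    synchronized Optional<String> resolve(String pseudonym) {
        return Optional.ofNullable(pseudonymToName.get(pseudonym));
    }
}
```

Unlike redaction, this process is reversible by design, which is why the lookup table itself becomes sensitive data.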

Data Redacting

Data redaction means removing sensitive data or replacing it with data that hides the original. Redaction is a one-way process: the original data cannot be recovered from the redacted data. For example, the letters and numbers in the social security number 010150-113X could be replaced with asterisks: ******-****. Let’s focus on this technique for the rest of this blog post.
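The redaction example above can be sketched in a few lines. The Redactor class is a hypothetical illustration: it replaces every ASCII letter and digit with an asterisk and keeps separators such as the hyphen.

```java
/**
 * Hypothetical redaction helper: replaces every letter and digit with '*',
 * keeping separators such as '-'. The original value cannot be recovered.
 */
final class Redactor {
    private Redactor() {
    }

    static String redact(String value) {
        // \p{Alnum} matches ASCII letters and digits
        return value.replaceAll("\\p{Alnum}", "*");
    }
}
```

For example, Redactor.redact("010150-113X") returns "******-****". Keeping the separators reveals the format and length of the value; returning a fixed string instead hides those as well.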

Example: Data Redacting in Know Your Customer (KYC) Forms

The best way to protect sensitive customer data in application logs is not to log it in the first place. To achieve this, logging best practices should be established and followed, and the log output should be reviewed during code or peer reviews. Automated tests can also be implemented to verify that the application does not log sensitive data.
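As a sketch of such an automated test, the hypothetical SensitiveDataDetector below scans captured log output for values that look like sensitive data. The two patterns are assumptions chosen to match the examples in this post (a Finnish-style personal identity code and a 16-digit card number) and would need to be adapted to the data an application actually handles.

```java
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical test helper that detects values resembling sensitive data
 * in captured log output.
 */
final class SensitiveDataDetector {
    private static final List<Pattern> SENSITIVE_PATTERNS = List.of(
            // DDMMYY + century sign (-, + or A) + three digits + check character
            Pattern.compile("\\b\\d{6}[-+A]\\d{3}[0-9A-Z]\\b"),
            // 16 digits, optionally separated by single spaces or dashes
            Pattern.compile("\\b(?:\\d[ -]?){15}\\d\\b"));

    private SensitiveDataDetector() {
    }

    /** Returns true if the log output contains a match for any sensitive-data pattern. */
    static boolean containsSensitiveData(String logOutput) {
        return SENSITIVE_PATTERNS.stream().anyMatch(p -> p.matcher(logOutput).find());
    }
}
```

In a unit test, the log output produced while exercising the application would be captured and the test would assert that containsSensitiveData(...) returns false.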

In some situations, customer information must be logged, for example the user input of Know Your Customer (KYC) forms or customer detail updates. In these situations, the following steps must be taken:

Identify the sensitive data to be masked. This includes the PII and financial data that can be used to identify the customer, such as social security number, credit card number, email address, mobile number, etc.

Decide on a strategy for masking the data. One option is to mask the data completely, in which case it cannot be used for anything meaningful during an audit or problem analysis. Another option is to mask the data partially so that, for example, the logs can still be searched using the partially masked information.
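The two strategies can be compared with a small sketch. The MaskingStrategies class is hypothetical, and the number of characters left visible is an assumption that should follow the organisation’s own masking policy.

```java
/**
 * Hypothetical helpers contrasting full and partial masking.
 */
final class MaskingStrategies {
    private MaskingStrategies() {
    }

    /** Full masking: returns a fixed mask, so even the length of the value is hidden. */
    static String maskFully(String value) {
        return "****";
    }

    /** Partial masking: keeps only the last {@code visible} characters. */
    static String maskKeepingLast(String value, int visible) {
        if (value.length() <= visible) {
            return "*".repeat(value.length());
        }
        int hidden = value.length() - visible;
        return "*".repeat(hidden) + value.substring(hidden);
    }

    /** Partial masking of an e-mail address: keeps the first character and the domain. */
    static String maskEmail(String email) {
        int at = email.indexOf('@');
        if (at <= 1) {
            return maskFully(email);
        }
        return email.charAt(0) + "*".repeat(at - 1) + email.substring(at);
    }
}
```

A partially masked mobile number such as "*********4567" still lets a support engineer find the relevant log entries, while a fully masked value only shows that a number was present.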

One approach to masking the data is to process all log statements before they are sent to a log stream, such as a file or a centralised logging system. This can be done with adapters that intercept the message content and detect sensitive values through pattern matching with regular expressions. For this strategy to work, the data in the log messages must follow a uniform structure, for example by prefixing each value with an identifier that tells the logging adapter what kind of data follows, e.g. “ssn=121212-123A” or “creditCardNr=5555 1234 1234 1234”.
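A minimal sketch of such a processor is shown below. LogMessageMasker is a hypothetical class, and the two rules assume the “ssn=” and “creditCardNr=” prefixes from the examples above. How the processor is wired into a specific logging framework is deliberately left out.

```java
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical log message processor, intended to run before a message is
 * written to a file or sent to a centralised logging system.
 */
final class LogMessageMasker {
    private static final List<Pattern> RULES = List.of(
            // ssn=<value>: the value runs until a comma, closing bracket or whitespace
            Pattern.compile("\\b(ssn=)[^,\\]\\s]+"),
            // creditCardNr=<value>: digits, optionally separated by spaces
            Pattern.compile("\\b(creditCardNr=)[\\d ]*\\d"));

    private LogMessageMasker() {
    }

    static String mask(String message) {
        String result = message;
        for (Pattern rule : RULES) {
            // Keep the key (group 1) and replace the value with a fixed mask
            result = rule.matcher(result).replaceAll("$1****");
        }
        return result;
    }
}
```

For example, mask("Customer update: ssn=121212-123A, creditCardNr=5555 1234 1234 1234") returns "Customer update: ssn=****, creditCardNr=****". Values that do not follow the agreed prefix convention pass through unchanged, which is why this approach complements, rather than replaces, not logging sensitive data at all.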

In object-oriented programming languages, it is common to pass an object to the logger and let the object’s string representation output its attributes. In Java, this string representation comes from the object’s toString() method. Let’s say we have a User class that looks like this:

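public class User {
	private String firstName;
	private String lastName;
	private String ssn;
	private String creditCardNr;

	public User(String firstName, String lastName, String ssn, String creditCardNr) {
		this.firstName = firstName;
		this.lastName = lastName;
		this.ssn = ssn;
		this.creditCardNr = creditCardNr;
	}
	// Setters and getters are omitted

	@Override
	public String toString() {
		return "User: [firstName=" + firstName + ", lastName=" + lastName + ", ssn=" + ssn + ", creditCardNr=" + creditCardNr + "]";
	}
}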

If an instance of this class is passed to the logger like so:

User myUser = new User("John", "Doe", "121212-123A", "5555 1234 1234 1234");
logger.info("User=" + myUser);

This produces the following log entry:

INFO: User=User: [firstName=John, lastName=Doe, ssn=121212-123A, creditCardNr=5555 1234 1234 1234]

As seen above, the customer’s social security number (ssn) and credit card number are logged in clear text when the created user is logged. To fix this, the User class should be modified either to mask the ssn and credit card number or to omit them from the toString() method altogether. An even better alternative is to introduce dedicated data types for the ssn and credit card number instead of declaring them as strings. The new data types can then mask their values in their respective toString() methods:

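public class SSN {
	private final String ssnValue;

	public SSN(String ssnValue) {
		this.ssnValue = ssnValue;
	}
	// Getters are omitted

	@Override
	public String toString() {
		return "SSN: [" + maskSSN() + "]";
	}

	private String maskSSN() {
		// Implement more elegant masking if needed
		return "******-****";
	}
}

public class CreditCardNumber {
    private final String creditCardNumber;

    public CreditCardNumber(String creditCardNumber) {
        this.creditCardNumber = creditCardNumber;
    }

    @Override
    public String toString() {
        return "CreditCardNumber: [" + maskCreditCardNumber() + "]";
    }

    private String maskCreditCardNumber() {
        // Remove whitespace so that "5555 1234 1234 1234" becomes 16 contiguous digits
        String digits = creditCardNumber.replaceAll("\\s", "");
        if (!digits.matches("\\d{16}")) {
            // Unexpected format: mask everything rather than risk leaking the number
            return "****";
        }
        // Keep the first six and the last two digits
        return digits.replaceAll("(\\d{6})\\d{8}(\\d{2})", "$1 ****** $2");
    }
}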

Now, if we change the types for the ssn and credit card numbers in the User class:

public class User {
    private String firstName;
    private String lastName;
    private SSN ssn;
    private CreditCardNumber creditCardNumber;
…
}

And log the user again:

User myUser = new User("John", "Doe", "121212-123A", "5555 1234 1234 1234");
logger.info("User=" + myUser);

The logged output will be:

INFO: User=User: [firstName=John, lastName=Doe, ssn=SSN: [******-****], creditCardNr=CreditCardNumber: [555512 ****** 34]]

Trust is a Currency in the Financial Services Industry

In recent years, there have been many cases in which unauthorised parties have gained access to personal data and used it for blackmail. Protecting customers’ personal information from identity theft has therefore become increasingly important at every level.

By redacting customer information in application log files, we close off one of the ways unauthorised parties can steal personal information. More broadly, customer data anonymisation makes it possible to unlock the full potential of the data while respecting privacy regulations.

Ensuring that the data logged by the application is adequately masked or redacted is critical, especially when dealing with customer information and Know Your Customer (KYC) form data. The logging code should be peer-reviewed, and the log output should be tested to ensure that no unwanted personal information is written to it. This must also cover debug logs, as debug logging can be enabled in production either on purpose or by accident.

As technology and data handling practices evolve, maintaining a proactive stance towards customer data anonymisation will remain paramount, fortifying the foundation of a secure and resilient financial ecosystem.

Mathias Hannus

Mathias is a digital development veteran in the finance sector. Before joining Vuono Group, he worked at companies such as Nordic Financial Solutions, Nordnet, and EQ Bank.