Using data virtualization, organizations can create a logical layer that allows business consumers to access data without knowing its format or where it resides. They can then quickly design reports and analytics for a wide range of use cases.
This reduces the cost and complexity of integrating the new data sources introduced by initiatives such as cloud-first adoption, application modernization, and big data. It also enables developers to build applications that draw on holistic enterprise information.
Test Data Management
The QA team needs test data management
to create realistic testing environments. This ensures that when the software
goes live, it will perform well across all devices and user types. However, the
physical movement of this data is time-consuming and costly.
Data masking is one approach to this challenge. It obfuscates sensitive information from the source database before copying it into the test environment, providing realistic values for testing without exposing the original, vulnerable data.
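As a rough illustration, here is a minimal Python sketch of this kind of static masking; the record layout and masking rules are purely illustrative and not tied to any particular database or tool:

import random
import string

# Illustrative source records; in practice these would be read from the source database.
source_rows = [
    {"customer_id": 101, "name": "Alice Jones", "email": "alice@example.com", "ssn": "123-45-6789"},
    {"customer_id": 102, "name": "Bob Smith",   "email": "bob@example.com",   "ssn": "987-65-4321"},
]

def mask_row(row):
    """Return a copy of the row with sensitive fields replaced by realistic fakes."""
    masked = dict(row)
    masked["name"] = "Test User " + str(row["customer_id"])           # readable placeholder
    masked["email"] = f"user{row['customer_id']}@test.invalid"        # syntactically valid email
    masked["ssn"] = "".join(random.choices(string.digits, k=3)) + "-00-0000"  # fake but well-formed
    return masked

test_rows = [mask_row(r) for r in source_rows]  # this masked copy is what gets loaded into the test environment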
This approach can improve logical data warehouse (LDW) functionality and reduce the need for manual processing and specialized hardware. However, it is not ideal for applications that require dynamically masked data, because producing masked copies is slow and ties up dedicated resources. In these cases, a more modern test data management (TDM) solution is needed.
The Denodo Platform offers out-of-the-box TDM functionality
that can mask data consistently and efficiently while also maintaining
referential integrity across multiple, heterogeneous sources. It can also be
integrated with external data transformation and data quality tools via APIs
for greater flexibility.
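The referential-integrity requirement is often met with deterministic (consistent) masking. The following is only a conceptual Python sketch of that idea, not a description of Denodo's implementation: a keyed hash maps the same value to the same token in every source, so joins across systems still line up.

import hmac
import hashlib

MASKING_KEY = b"rotate-me-and-store-securely"  # illustrative secret; manage via a vault in practice

def consistent_token(value: str) -> str:
    """Deterministically map a value to an opaque token using a keyed hash (HMAC-SHA256).

    The same input always yields the same token, so keys masked in one source
    still join correctly with keys masked in another source.
    """
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "cust_" + digest[:12]

# Records from two heterogeneous sources that share customer_id as a join key.
crm_row     = {"customer_id": "101", "segment": "gold"}
billing_row = {"customer_id": "101", "balance": 42.50}

crm_row["customer_id"] = consistent_token(crm_row["customer_id"])
billing_row["customer_id"] = consistent_token(billing_row["customer_id"])

assert crm_row["customer_id"] == billing_row["customer_id"]  # referential integrity preserved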
Masked Data
To keep up with a faster release cadence, development and test environments need access to real data. But this data isn't always available, and refreshing the test environment with new data takes a long time. Consequently, teams often work with stale data, which can lead to mistakes and defects that escape into production.
Masking creates a copy of the original data that looks similar but is fake, making it unidentifiable to someone trying to reverse-engineer sensitive information. This can include replacing numeric values in PII with artificial ones, generating random numbers, or substituting specific heights in a medical database with ranges of values.
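For example, a minimal Python sketch of these substitution and generalization rules might look like this (the record and field names are invented for illustration):

import random

patient = {"patient_id": 7, "name": "Carol King", "phone": "555-0147", "height_cm": 172}

masked = {
    "patient_id": patient["patient_id"],                # non-sensitive key kept as-is
    "name": "Patient-%04d" % patient["patient_id"],     # substitute an artificial value
    "phone": "555-" + str(random.randint(1000, 9999)),  # replace digits with random ones
    # Generalize an exact height into a 10 cm range so it no longer identifies anyone.
    "height_cm": f"{(patient['height_cm'] // 10) * 10}-{(patient['height_cm'] // 10) * 10 + 9}",
}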
Another type of masking, called obfuscation or data redaction, removes data values based on user permissions, returning values that a given user is not permitted to see as "null" or deleting them entirely. Other types of masking, such as generalization and averaging, replace real values with averages or other statistically representative values.
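A minimal sketch of the permission-based redaction described above, assuming a hypothetical role-to-column mapping (the roles and columns below are made up for illustration):

# Hypothetical permission model: each role lists the columns it may see in clear text.
ROLE_VISIBLE_COLUMNS = {
    "analyst": {"order_id", "amount", "region"},
    "support": {"order_id", "amount", "region", "email"},
}

def redact(row: dict, role: str) -> dict:
    """Return the row with any column the role may not see replaced by None ("null")."""
    allowed = ROLE_VISIBLE_COLUMNS.get(role, set())
    return {col: (val if col in allowed else None) for col, val in row.items()}

order = {"order_id": 555, "amount": 99.0, "region": "EMEA", "email": "dana@example.com"}
print(redact(order, "analyst"))  # {'order_id': 555, 'amount': 99.0, 'region': 'EMEA', 'email': None}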
Data Sanitization
Data masking keeps sensitive PII private by replacing specific values with fictitious ones, such as generalized ("zoomed out") heights in a medical data set. It is a useful tool for reducing the risk of re-identification, helping businesses stay compliant and avoid data breaches.
However, masking on its own isn't as secure as encrypting the data or moving it to a private environment, because poorly masked values can sometimes be reverse-engineered to re-identify individuals. Fortunately, newer techniques combine masking and encryption to reduce the risk of re-identification while still allowing developers and testers to access a data set.
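One way such a combination can work, sketched here in Python under the assumption that the third-party cryptography package is installed, is to hand testers only a generalized value while keeping the exact value encrypted for authorized users. This is an illustrative pattern, not a description of any specific product:

# pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # held only by the data owner, never by testers
cipher = Fernet(key)

record = {"patient_id": 7, "height_cm": 172}

protected = {
    "patient_id": record["patient_id"],
    "height_cm": "170-179",  # masked/generalized value that testers work with
    # Exact value is recoverable only by whoever holds the key.
    "_height_cm_enc": cipher.encrypt(str(record["height_cm"]).encode()),
}

# An authorized owner can later restore the exact value:
original = int(cipher.decrypt(protected["_height_cm_enc"]).decode())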
These technologies include data generalization and averaging
that replace specific identifiers with statistical outputs that cannot be used
to re-identify people. By creating a virtual view of the data without
replicating it, these technologies remove the need for physical data movement,
eliminating the potential for data leaks or other problems associated with
moving sensitive data across systems. They also deliver on-demand refreshes for developers by leveraging the efficiency of the VDP engine to copy only changed blocks.
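As a simple illustration of the averaging mentioned above, the Python sketch below replaces each individual salary with its department's mean, so no row reveals a specific person's figure while aggregates stay usable; the data and column names are invented:

from statistics import mean
from collections import defaultdict

rows = [
    {"dept": "eng",   "salary": 98000},
    {"dept": "eng",   "salary": 105000},
    {"dept": "sales", "salary": 72000},
    {"dept": "sales", "salary": 68000},
]

# Group salaries by department and compute each group's mean.
by_dept = defaultdict(list)
for r in rows:
    by_dept[r["dept"]].append(r["salary"])
dept_avg = {d: mean(v) for d, v in by_dept.items()}

# Averaging: each individual value is replaced by its group's statistical output.
anonymized = [{"dept": r["dept"], "salary": dept_avg[r["dept"]]} for r in rows]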
Reverse Engineering
Data masking creates a characteristically accurate but fictitious version of real data that hackers can't reverse-engineer or use to identify individuals. This helps minimize risk to the underlying sensitive data while keeping it available to the organization for day-to-day operations and analytics.
Data virtualization offers a lower-cost alternative to ETL
for accessing enterprise data and enables more agile BI and analytics systems.
It delivers data in real time and eliminates the need for intermediate servers or storage 'depots' where the data is consolidated.
To begin implementing data virtualization, you need to make
a list of all the systems and applications that produce information. Determine
their management demands and connectivity requirements to ensure they’re
accessible through the virtual layer. You also need to define the logical views
that you want to build. Hevo has a comprehensive suite of tools to help you do
just that. Sign up for a 14-day free trial to experience Hevo for yourself.
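To make that inventory step concrete, here is a rough, platform-neutral Python sketch of the kind of source list and logical view definitions it might produce; the field names are illustrative and not tied to Hevo, Denodo, or any other product:

# Inventory of systems that produce information, with their connectivity requirements.
source_systems = [
    {"name": "crm_postgres",   "kind": "relational",   "connector": "postgresql", "refresh": "on-demand"},
    {"name": "billing_api",    "kind": "rest_api",     "connector": "https",      "refresh": "hourly"},
    {"name": "clickstream_s3", "kind": "object_store", "connector": "s3",         "refresh": "daily"},
]

# Logical views to expose through the virtual layer, described as the sources
# they combine and the columns business consumers should see.
logical_views = {
    "customer_360": {
        "sources": ["crm_postgres", "billing_api"],
        "join_on": "customer_id",
        "columns": ["customer_id", "segment", "lifetime_value", "open_invoices"],
    },
}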