Using data virtualization, organizations can create a logical layer that allows business consumers to access data without knowing its format or where it resides. They can then quickly design reports and analytics for a wide range of use cases.
This reduces the cost and complexity of integrating the new data sources introduced by initiatives such as cloud-first adoption, application modernization, and big data. It also enables developers to build applications that draw on holistic enterprise information.
Test Data Management
The QA team needs test data management
to create realistic testing environments. This ensures that when the software
goes live, it will perform well across all devices and user types. However, the
physical movement of this data is time-consuming and costly.
Data masking is one approach to this challenge. It obfuscates sensitive information from the source database before copying it into the test environment, providing realistic values for testing without exposing the original, vulnerable data.
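As a rough illustration, here is a minimal Python sketch of this kind of static masking; the record layout and masking rules are purely illustrative and not tied to any particular database or tool:

import random
import string

# Illustrative source records; in practice these would be read from the source database.
source_rows = [
    {"customer_id": 101, "name": "Alice Jones", "email": "alice@example.com", "ssn": "123-45-6789"},
    {"customer_id": 102, "name": "Bob Smith",   "email": "bob@example.com",   "ssn": "987-65-4321"},
]

def mask_row(row):
    """Return a copy of the row with sensitive fields replaced by realistic fakes."""
    masked = dict(row)
    masked["name"] = "Test User " + str(row["customer_id"])           # readable placeholder
    masked["email"] = f"user{row['customer_id']}@test.invalid"        # syntactically valid email
    masked["ssn"] = "".join(random.choices(string.digits, k=3)) + "-00-0000"  # fake but well-formed
    return masked

test_rows = [mask_row(r) for r in source_rows]  # this masked copy is what gets loaded into the test environment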
This approach can improve logical data warehouse (LDW) functionality and reduce the need for manual processing and specialized hardware. However, it is not ideal for applications that require dynamically masked data, because producing masked copies is slow and ties up dedicated resources. In these cases, a more modern test data management (TDM) solution is needed.
The Denodo Platform offers out-of-the-box TDM functionality
that can mask data consistently and efficiently while also maintaining
referential integrity across multiple, heterogeneous sources. It can also be
integrated with external data transformation and data quality tools via APIs
for greater flexibility.
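The referential-integrity requirement is often met with deterministic (consistent) masking. The following is only a conceptual Python sketch of that idea, not a description of Denodo's implementation: a keyed hash maps the same value to the same token in every source, so joins across systems still line up.

import hmac
import hashlib

MASKING_KEY = b"rotate-me-and-store-securely"  # illustrative secret; manage via a vault in practice

def consistent_token(value: str) -> str:
    """Deterministically map a value to an opaque token using a keyed hash (HMAC-SHA256).

    The same input always yields the same token, so keys masked in one source
    still join correctly with keys masked in another source.
    """
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "cust_" + digest[:12]

# Records from two heterogeneous sources that share customer_id as a join key.
crm_row     = {"customer_id": "101", "segment": "gold"}
billing_row = {"customer_id": "101", "balance": 42.50}

crm_row["customer_id"] = consistent_token(crm_row["customer_id"])
billing_row["customer_id"] = consistent_token(billing_row["customer_id"])

assert crm_row["customer_id"] == billing_row["customer_id"]  # referential integrity preserved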
Masked Data
To keep up with a faster release cadence, development and test environments need access to real data. But this data isn't always available, and refreshing the test environment with new data takes a long time. Consequently, teams often work with stale data, which can lead to mistakes and defects that escape into production.
Masking creates a copy of the original data that looks similar but is fake, making it unidentifiable to someone trying to reverse-engineer sensitive information. This can include replacing numeric values in PII with artificial ones, generating random numbers, or substituting specific heights in a medical database with ranges of values.
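For example, a minimal Python sketch of these substitution and generalization rules might look like this (the record and field names are invented for illustration):

import random

patient = {"patient_id": 7, "name": "Carol King", "phone": "555-0147", "height_cm": 172}

masked = {
    "patient_id": patient["patient_id"],                # non-sensitive key kept as-is
    "name": "Patient-%04d" % patient["patient_id"],     # substitute an artificial value
    "phone": "555-" + str(random.randint(1000, 9999)),  # replace digits with random ones
    # Generalize an exact height into a 10 cm range so it no longer identifies anyone.
    "height_cm": f"{(patient['height_cm'] // 10) * 10}-{(patient['height_cm'] // 10) * 10 + 9}",
}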
Another type of masking, called obfuscation or data redaction, removes data values based on user permissions, returning values that a given user is not permitted to see as "null" or deleting them entirely. Other types of masking, such as generalization and averaging, replace real values with averages or other statistically representative values.
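A minimal sketch of the permission-based redaction described above, assuming a hypothetical role-to-column mapping (the roles and columns below are made up for illustration):

# Hypothetical permission model: each role lists the columns it may see in clear text.
ROLE_VISIBLE_COLUMNS = {
    "analyst": {"order_id", "amount", "region"},
    "support": {"order_id", "amount", "region", "email"},
}

def redact(row: dict, role: str) -> dict:
    """Return the row with any column the role may not see replaced by None ("null")."""
    allowed = ROLE_VISIBLE_COLUMNS.get(role, set())
    return {col: (val if col in allowed else None) for col, val in row.items()}

order = {"order_id": 555, "amount": 99.0, "region": "EMEA", "email": "dana@example.com"}
print(redact(order, "analyst"))  # {'order_id': 555, 'amount': 99.0, 'region': 'EMEA', 'email': None}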
Data Sanitization
Data masking keeps sensitive PII private by replacing specific values with fictitious ones, such as generalized ("zoomed out") heights in a medical data set. It is a useful tool for reducing the risk of re-identification, helping businesses stay compliant and avoid data breaches.
However, masking on its own isn't as secure as encrypting the data or moving it to a private environment, because poorly masked values can sometimes be reverse-engineered to re-identify individuals. Fortunately, newer techniques combine masking and encryption to reduce the risk of re-identification while still allowing developers and testers to access a data set.
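One way such a combination can work, sketched here in Python under the assumption that the third-party cryptography package is installed, is to hand testers only a generalized value while keeping the exact value encrypted for authorized users. This is an illustrative pattern, not a description of any specific product:

# pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # held only by the data owner, never by testers
cipher = Fernet(key)

record = {"patient_id": 7, "height_cm": 172}

protected = {
    "patient_id": record["patient_id"],
    "height_cm": "170-179",  # masked/generalized value that testers work with
    # Exact value is recoverable only by whoever holds the key.
    "_height_cm_enc": cipher.encrypt(str(record["height_cm"]).encode()),
}

# An authorized owner can later restore the exact value:
original = int(cipher.decrypt(protected["_height_cm_enc"]).decode())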
These technologies include data generalization and averaging
that replace specific identifiers with statistical outputs that cannot be used
to re-identify people. By creating a virtual view of the data without
replicating it, these technologies remove the need for physical data movement,
eliminating the potential for data leaks or other problems associated with
moving sensitive data across systems. They also deliver on-demand refreshes for developers by leveraging the efficiency of the VDP engine to copy only changed blocks.
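As a simple illustration of the averaging mentioned above, the Python sketch below replaces each individual salary with its department's mean, so no row reveals a specific person's figure while aggregates stay usable; the data and column names are invented:

from statistics import mean
from collections import defaultdict

rows = [
    {"dept": "eng",   "salary": 98000},
    {"dept": "eng",   "salary": 105000},
    {"dept": "sales", "salary": 72000},
    {"dept": "sales", "salary": 68000},
]

# Group salaries by department and compute each group's mean.
by_dept = defaultdict(list)
for r in rows:
    by_dept[r["dept"]].append(r["salary"])
dept_avg = {d: mean(v) for d, v in by_dept.items()}

# Averaging: each individual value is replaced by its group's statistical output.
anonymized = [{"dept": r["dept"], "salary": dept_avg[r["dept"]]} for r in rows]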
Reverse Engineering
Data masking creates a characteristically accurate but fictitious version of real data that hackers can't reverse-engineer or use to identify individuals. This helps minimize risk to the underlying sensitive data while keeping it available to the organization for day-to-day operations and analytics.
Data virtualization offers a lower-cost alternative to ETL
for accessing enterprise data and enables more agile BI and analytics systems.
It delivers data in real time and eliminates the need for intermediate servers or storage 'depots' where the data is consolidated.
To begin implementing data virtualization, you need to make
a list of all the systems and applications that produce information. Determine
their management demands and connectivity requirements to ensure they’re
accessible through the virtual layer. You also need to define the logical views
that you want to build. Hevo has a comprehensive suite of tools to help you do
just that. Sign up for a 14-day free trial to experience Hevo for yourself.
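To make that inventory step concrete, here is a rough, platform-neutral Python sketch of the kind of source list and logical view definitions it might produce; the field names are illustrative and not tied to Hevo, Denodo, or any other product:

# Inventory of systems that produce information, with their connectivity requirements.
source_systems = [
    {"name": "crm_postgres",   "kind": "relational",   "connector": "postgresql", "refresh": "on-demand"},
    {"name": "billing_api",    "kind": "rest_api",     "connector": "https",      "refresh": "hourly"},
    {"name": "clickstream_s3", "kind": "object_store", "connector": "s3",         "refresh": "daily"},
]

# Logical views to expose through the virtual layer, described as the sources
# they combine and the columns business consumers should see.
logical_views = {
    "customer_360": {
        "sources": ["crm_postgres", "billing_api"],
        "join_on": "customer_id",
        "columns": ["customer_id", "segment", "lifetime_value", "open_invoices"],
    },
}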