Data is power in the realm of HR. It fuels insightful analysis, strategic planning, and informed decision-making. However, with this wealth of sensitive data comes great responsibility - ensuring its security. Given the increasing complexity of HR analytics tasks and the expanding scope of data privacy laws, protecting HR data from unauthorized access has never been more critical. This is where advanced data management tools like Microsoft Power BI and Apache Superset step in, armed with robust features like Row-Level Security (RLS).
In this article, we delve into data security within HR analytics, examining how we can leverage RLS in both Power BI and Apache Superset to mitigate potential data breaches. With their RLS capabilities, we will explore how these tools can provide granular access control, ensuring that sensitive data is only accessible to authorized individuals, thus protecting employee information, maintaining compliance with data protection regulations, and fortifying trust within the organization. Please accompany me as we embark on this rigorous exploration of enhanced data security within People analytics.
The HR department is the custodian of sensitive employee data, from personal identifiers like Social Security numbers to performance appraisals, salary details, and health records.
Given this wealth of personal and confidential information, HR has legal and ethical obligations to maintain the security of this data.
Legal responsibilities stem from an ever-evolving regulatory landscape with stringent data privacy laws such as the General Data Protection Regulation (GDPR) in the EU and the California Consumer Privacy Act (CCPA) in the US. These regulations demand strict adherence to data privacy norms and inflict heavy penalties for non-compliance.
On the ethical front, employees trust their employers to handle their personal information responsibly. Breaches of this trust can lead to severe consequences, including reputational damage and loss of employee morale and engagement. Moreover, data security is also a matter of respect for individual privacy rights, underlining the ethical obligation to protect data from unauthorized access or misuse.
People analytics involves utilizing employee data to gain insights that guide strategic decision-making. While this data-driven approach significantly improves HR processes, it also heightens the risk of data breaches.
The consequences can be severe if unauthorized individuals access confidential HR data. Employee identities can be stolen, leading to financial loss and credit damage for the individuals affected. Leaked salary or performance data could lead to internal conflict and loss of trust within the organization.
Moreover, breaches can have legal ramifications, with heavy fines imposed by regulatory bodies for failing to protect employee data. Companies might also face lawsuits from affected employees or experience a decline in their stock value.
The reputational damage from a data breach can be immense and long-lasting. Companies that experience breaches can lose the trust of employees, customers, and shareholders, damaging their brand and making it difficult to attract and retain talent.
Given the sensitivity of HR data and the potential risks of data breaches, there is a critical need for measures like RLS in People analytics. These measures allow for granular control over data access, ensuring that only authorized individuals can access specific data, thereby significantly reducing the risk of data breaches.
Row-Level Security (RLS) is a method for restricting data access at the row level within your database. It offers a way to implement data access control by limiting the data visibility to specific users based on their roles. In other words, RLS enables you to control "who sees what" when accessing databases or data sets.
The basic principle of RLS is quite simple. When users connect to a database or open a report, the system checks their identity and role. Based on the predefined security rules, the system determines which rows of data the user is permitted to see. The filtering is transparent to the user: they only see the data they can access and might be unaware that other data is being filtered out.
RLS is pivotal in data analytics, particularly in preventing data breaches. It has several key advantages that make it an essential tool in modern data-driven environments.
In summary, RLS is a powerful tool for preventing data breaches in data analytics. It offers robust and flexible data access controls, enhances data security, and helps organizations comply with regulatory requirements. Implementing RLS is more crucial than ever in an era of growing cyber threats and increasingly strict data privacy laws.
Power BI is a suite of business analytics tools developed by Microsoft that allows you to analyze data and share insights. It provides a 360-degree view of the most critical metrics in one place, updated in real time, and available on all your devices. Power BI lets you connect to hundreds of data sources, simplify data preparation, and drive ad hoc analysis. It offers robust data visualization tools to turn unrelated data sources into coherent, visually immersive, and interactive insights. Whether it's HR analytics or sales reporting, Power BI is a go-to tool for many organizations.
Implementing RLS in Power BI involves a few key steps. Here's a basic walkthrough:
The application of RLS in Power BI for People analytics can be demonstrated through several use cases:
Implementing RLS in Power BI provides a robust security mechanism that ensures sensitive HR data is only visible to authorized users, thus helping to prevent potential data breaches.
Let's break it down with a specific example.
organizational structure,
training,
reviews.
The company has team leads and individual contributors. In the company culture, by default, the team lead sees everything that happens with his subordinates in HR processes without exceptions.
A permissions dictionary allows each user to show/hide data of a specific department.
There are four roles:
Thus, we need to consider two sources for the role model:
organizational structure (to provide team leads with access to subordinate data)
permissions (to show data for supervised departments as part of the function)
Let's build a data scheme and write the necessary DAX.
= Table.AddColumn(#"change_type", "Chain1", each Text.Combine({"SampleCompany",
if [Unit] is null then null else "->",
[Unit],
if [Department] is null then null else "->",
[Department]
}))
This will allow us to dynamically create chains of departments and use them for the role model.
Create a new exception table using DAX
exception = CROSSJOIN('permissions',DISTINCT('employee_data'[Chain1]))
The meaning of this procedure is to join the tables employee_data and permissions
Create an RLS column in the exception table using the DAX formula
RLS = if(CONTAINSSTRING([Chain1],[Chain]),[ShowDpt],blank())
Create an eRLS column in the exception table using the DAX formula
eRLS=
var mail = 'exception'[Email]
var cep = 'exception'[Chain1]
var rls = 'exception'[RLS]
return
if(CALCULATE(count('exception'[RLS]),filter('exception','exception'[email]=mail&&'exception'[Chain1]=cep&&'exception'[RLS]="no"))>0 ,"no",rls)
The idea behind this expression is to combine information about accessible departments from the organizational structure (employee_data table) and the permissions dictionary (permissions table)
Create a structured table with a list of unique departments. In Power Query Editor->New Source->Blank->Go to Advanced Editor and past code:
let
Source = Table.Combine({employee_data}),
#"Drop columns" = Table.SelectColumns(Source,{"Unit", "Department","Chain1"}),
#"Distinct" = Table.Distinct(#"Drop columns")
in
#"Distinct"
Open menu Modeling->Manage roles
Create a new role (for example named rls)
For the employee_data table (it will be our cornerstone one), write the following DAX:
//get user login from system
var usr = userprincipalname()
//create table with chains and isHead
var xtable = addcolumns( selectcolumns(
CALCULATETABLE('employee_data',filter('employee_data','employee_data'[email]=usr)
),
"Chain_feature",[Chain1],"Head_feature",[isHead]
),
"xxx",if(containsstring('employee_data'[Chain1],[Chain_feature])=true(),1,0)
)
var headFeature = SUMX(xtable,[Head_feature])
var xfactor = SUMX(xtable,[xxx])
//subordinates if head
var x = if(headFeature =0, false(), if (xfactor=0, false(), true()))
//add exceptions
var exc = contains('permissions', 'permissions'[email],usr)
var head_Search = calculate(firstnonblank('exception'[eRLS],1),filter('exception', 'exception' [Chain1]= 'employee_data'[Chain1]&&'exception'[email]=usr))
return
//result
if(exc=true(),if(head_Search="yes",true(),if(x=true(),true(),false())),if(x=true(),true(),false()))
We have loaded the data and implemented the union of access dictionaries. it remains to link the main table employee_data with secondary data on training and review. This is done by organizing relationships
structure -> employee_data (Cardinality - ManyToOne, Cross Filter direction - Single)
employee_data -> performance_review_data (Cardinality - ManyToOne, Cross Filter direction - Single)
employee_data -> training_data (Cardinality - ManyToOne, Cross Filter direction - Single)
It's time to test the performance of our RLS model.
Widget with logged person created with measure:
lgn = userprincipalname()
We got what we expected.
Apache Superset is an open-source, enterprise-ready business intelligence (BI) web application. It provides an intuitive interface for visualizing datasets and crafting interactive dashboards. Its standout features include a rich set of data visualizations, a robust SQL editor, and a flexible, high-granularity security model. Moreover, Superset's ability to connect with various data sources, from traditional SQL databases to web APIs and machine learning servers, makes it a versatile tool for data analytics tasks.
Apache Superset, being an open-source tool, has community-driven support for features. It didn't have the specific Row-Level Security feature built-in like Power BI. However, access control can be set up using the existing security features, mainly data source access control and permission management.
Superset's security model allows you to manage permissions and roles to restrict access to specific datasets or components. Here is a high-level guide to setting up access control:
While this might not be as granular as true Row-Level Security “out of the box”, this can be implemented using SQL, which will be discussed below.
By effectively utilizing Apache Superset's existing security features, you can create a secure environment for your HR analytics, mitigating potential risks and ensuring that sensitive data is accessed only by authorized personnel. However, organizations should stay tuned to the latest developments in Superset, as the community could introduce RLS or similar features in future releases.
Let's apply a similar principle of building an RLS from the PowerBI example dataset (get the dashboard user login from the system, get the node of the organizational structure available to him, and combine the organizational structure and the exception dictionary) on Superset infrastructure. To do this, write the following query:
WITH
--get user login from system
'{{ current_username() }}' as user_name,
--add permissions
rls_levels as
(SELECT
login,
Unit, Department
FROM HR.permissions
WHERE login = user_name and show_or_hide = 1),
--add org structure
manager_levels as (SELECT
user_name,
Unit, Department
FROM HR.employee_data
WHERE login = user_name
and is_manager = 1),
--union two dictionaries
included as (
SELECT groupArray(values)
FROM ( SELECT CONCAT('SampleCompany',
ifNull(Unit, ''),
ifNull(Department, '')) as values
FROM ( SELECT * FROM rls_levels
union all
SELECT * FROM manager_levels))
),
--add exceptions
exceptions as (
SELECT groupArray(values)
FROM ( SELECT CONCAT('SampleCompany',
ifNull(Unit, ''),
ifNull(Department, '')) as values
FROM HR.permissions
WHERE login = user_name and show_or_hide = 0
)
)
--employee_data
SELECT * FROM
(SELECT
category, email, Name, Unit, Department, isHead, Age, Experience
FROM HR.employee_data) AS t
--filter
WHERE
--filter all departments with access
multiSearchAny(
--needle
CONCAT('SampleCompany',ifNull(t.Unit,''),
ifNull(t.Department,'')),
--haystack
(SELECT * FROM included)
) > 0
AND
--filter out all departments without access
multiSearchAny(
--needle
CONCAT('SampleCompany',ifNull(t.Unit,''),
ifNull(t.Department,'')),
--haystack
(SELECT * FROM exceptions)
) == 0
Now you can build any superset widgets, similar to Power BI.
Microsoft Power BI provides an explicit Row-Level Security (RLS) feature that allows you to define roles and rules at the row level in your data model. You can set up dynamic rules based on user attributes and even test these rules within Power BI Desktop. After publishing your report to the Power BI service, you can assign users to the roles. Power BI's RLS is transparent to the end-user and works across various data visualization components, maintaining consistent data security.
On the other hand, Apache Superset does not offer an explicit RLS feature. However, it does provide robust security features that can be used to achieve a certain level of data access control. With Superset, you can create roles, assign permissions (like data source access, view permissions, etc.) to these roles, and then assign users to the roles.
The choice between Power BI and Apache Superset for securing HR analytics might depend on several factors:
In conclusion, both tools offer robust features for data security, with Power BI providing explicit row-level security and Apache Superset offering flexible role-based security controls. The choice between the two depends on your organization's needs and constraints.
While RLS is an essential tool in your data security toolkit, additional measures can further safeguard your data:
By combining RLS with these additional measures, you can create a robust data security strategy that significantly reduces the risk of a data breach in your HR analytics.
Row-Level Security (RLS) has emerged as a powerful mechanism to secure sensitive data in HR analytics, addressing potential data breaches effectively. By implementing RLS, organizations can limit data visibility to specific users based on their roles, ensuring that individuals only see the data they are authorized to see. This article has explored how RLS can be implemented in powerful business intelligence tools like Microsoft's Power BI and Apache Superset, demonstrating its vital role in maintaining data security and compliance.
Data security in the HR field is of paramount importance. HR data is often sensitive and personal, with ethical and legal obligations around how it is stored, processed, and accessed. Tools like Power BI and Apache Superset, through features like RLS and robust access controls, provide a means to uphold these obligations. They enable HR departments to derive insightful analytics while maintaining a high data security standard.
In an era where data breaches are increasingly common and costly, employing security measures such as RLS is not just beneficial. It's essential. Implementing such measures can help safeguard your organization against potential data breaches.
Remember, achieving effective HR analytics doesn't mean compromising data security. On the contrary, with careful planning and the right tools, these two goals can and should go hand in hand. We encourage you to consider implementing RLS in your organizations, fostering a secure and effective environment for your HR analytics. Secure data translates to trust, the foundation of every successful organization.