Mitigating Data Breaches in HR Analytics with RLS in MS Power BI and Apache Superset

Written by antonlukyanov | Published 2023/06/09
Tech Story Tags: data-breaches | data-breach-prevention | apache-superset | analytics | power-bi | people-analytics | row-level-security | how-to

TLDRData is power in the realm of HR. It fuels insightful analysis, strategic planning, and informed decision-making. However, with this wealth of sensitive data comes great responsibility - ensuring its security. Given the increasing complexity of HR analytics tasks and the expanding scope of data privacy laws, protecting HR data from unauthorized access has never been more critical. This is where advanced data management tools like Microsoft Power BI and Apache Superset step in, armed with robust features like Row-Level Security (RLS).via the TL;DR App

Data is power in the realm of HR. It fuels insightful analysis, strategic planning, and informed decision-making. However, with this wealth of sensitive data comes great responsibility - ensuring its security. Given the increasing complexity of HR analytics tasks and the expanding scope of data privacy laws, protecting HR data from unauthorized access has never been more critical. This is where advanced data management tools like Microsoft Power BI and Apache Superset step in, armed with robust features like Row-Level Security (RLS).

In this article, we delve into data security within HR analytics, examining how we can leverage RLS in both Power BI and Apache Superset to mitigate potential data breaches. With their RLS capabilities, we will explore how these tools can provide granular access control, ensuring that sensitive data is only accessible to authorized individuals, thus protecting employee information, maintaining compliance with data protection regulations, and fortifying trust within the organization. Please accompany me as we embark on this rigorous exploration of enhanced data security within People analytics.

Understanding the Need for RLS in People Analytics

The HR department is the custodian of sensitive employee data, from personal identifiers like Social Security numbers to performance appraisals, salary details, and health records.

Given this wealth of personal and confidential information, HR has legal and ethical obligations to maintain the security of this data.

Legal responsibilities stem from an ever-evolving regulatory landscape with stringent data privacy laws such as the General Data Protection Regulation (GDPR) in the EU and the California Consumer Privacy Act (CCPA) in the US. These regulations demand strict adherence to data privacy norms and inflict heavy penalties for non-compliance.

On the ethical front, employees trust their employers to handle their personal information responsibly. Breaches of this trust can lead to severe consequences, including reputational damage and loss of employee morale and engagement. Moreover, data security is also a matter of respect for individual privacy rights, underlining the ethical obligation to protect data from unauthorized access or misuse.

People analytics involves utilizing employee data to gain insights that guide strategic decision-making. While this data-driven approach significantly improves HR processes, it also heightens the risk of data breaches.

The consequences can be severe if unauthorized individuals access confidential HR data. Employee identities can be stolen, leading to financial loss and credit damage for the individuals affected. Leaked salary or performance data could lead to internal conflict and loss of trust within the organization.

Moreover, breaches can have legal ramifications, with heavy fines imposed by regulatory bodies for failing to protect employee data. Companies might also face lawsuits from affected employees or experience a decline in their stock value.

The reputational damage from a data breach can be immense and long-lasting. Companies that experience breaches can lose the trust of employees, customers, and shareholders, damaging their brand and making it difficult to attract and retain talent.

Given the sensitivity of HR data and the potential risks of data breaches, there is a critical need for measures like RLS in People analytics. These measures allow for granular control over data access, ensuring that only authorized individuals can access specific data, thereby significantly reducing the risk of data breaches.

Deep Dive into RLS

Row-Level Security (RLS) is a method for restricting data access at the row level within your database. It offers a way to implement data access control by limiting the data visibility to specific users based on their roles. In other words, RLS enables you to control "who sees what" when accessing databases or data sets.

The basic principle of RLS is quite simple. When users connect to a database or open a report, the system checks their identity and role. Based on the predefined security rules, the system determines which rows of data the user is permitted to see. The filtering is transparent to the user: they only see the data they can access and might be unaware that other data is being filtered out.

RLS is pivotal in data analytics, particularly in preventing data breaches. It has several key advantages that make it an essential tool in modern data-driven environments.

  1. Improved Data Security: The most significant benefit of RLS is enhanced data security. By limiting data access to authorized users, RLS significantly reduces the risk of data leakage or misuse.
  2. Compliance with Regulations: RLS helps organizations comply with various data privacy laws and regulations. It enables them to demonstrate that they have stringent data access controls in place, thus meeting the requirements of regulations like GDPR or CCPA.
  3. Granular Access Control: RLS provides the ability to define access on a granular level, not just at the database or table level. This means you can customize data access based on specific user roles or even on an individual user basis.
  4. Scalability and Flexibility: RLS is a scalable and flexible solution. It allows for creating complex security models that can adapt as the organization grows or its security needs change.

In summary, RLS is a powerful tool for preventing data breaches in data analytics. It offers robust and flexible data access controls, enhances data security, and helps organizations comply with regulatory requirements. Implementing RLS is more crucial than ever in an era of growing cyber threats and increasingly strict data privacy laws.

Implementing RLS in MS Power BI

Power BI is a suite of business analytics tools developed by Microsoft that allows you to analyze data and share insights. It provides a 360-degree view of the most critical metrics in one place, updated in real time, and available on all your devices. Power BI lets you connect to hundreds of data sources, simplify data preparation, and drive ad hoc analysis. It offers robust data visualization tools to turn unrelated data sources into coherent, visually immersive, and interactive insights. Whether it's HR analytics or sales reporting, Power BI is a go-to tool for many organizations.

Implementing RLS in Power BI involves a few key steps. Here's a basic walkthrough:

  1. Define Roles: Within Power BI Desktop, you can define roles under the Modeling tab by selecting 'Manage Roles'. You would typically define roles corresponding to different user types accessing your report (e.g., HR Manager, Team lead, HR Analyst).
  2. Set up Role Filters: Once roles are defined, you can set up row-level filters for each role. This involves specifying which rows of data in your tables can be viewed by each role. This is typically based on a logical expression.
  3. Test Your Roles: Power BI Desktop allows you to check the effectiveness of your RLS settings by letting you view the data as if you were a user assigned a specific role.
  4. Publish and Assign Roles: After publishing your Power BI report to the Power BI service, you can assign users to the earlier roles you defined in the Power BI service.
  5. Set Up Object-level Security (Optional): For more advanced security requirements, you can set up Object-level Security (OLS) in Power BI to control access to tables and columns.

The application of RLS in Power BI for People analytics can be demonstrated through several use cases:

  1. Employee Data Access: Let's assume that the HR Manager should be able to see all employee records, whereas an HR Analyst should only be able to see records of non-managerial staff. RLS can be used to set up these roles and access rules, ensuring that when an HR Analyst accesses the report, they see only the data they are allowed to see.
  2. Salary Data: Salary data is sensitive and should not be accessible to everyone. With RLS, you can restrict access to salary data only to specific roles, such as an HR Manager, keeping it hidden from other users.
  3. Performance Evaluation: If performance data needs to be shared with department heads, but you want to ensure that each head can only see data related to their department, RLS can facilitate this. By setting a role-based filter on the department field in your data, you can ensure that users see only data relevant to their role when they access the report.

Implementing RLS in Power BI provides a robust security mechanism that ensures sensitive HR data is only visible to authorized users, thus helping to prevent potential data breaches.

Let's break it down with a specific example.

  1. The company has digitized processes and data about
  • organizational structure,

  • training,

  • reviews.

  1. The company has team leads and individual contributors. In the company culture, by default, the team lead sees everything that happens with his subordinates in HR processes without exceptions.

  2. A permissions dictionary allows each user to show/hide data of a specific department.

  3. There are four roles:

Thus, we need to consider two sources for the role model:

  • organizational structure (to provide team leads with access to subordinate data)

  • permissions (to show data for supervised departments as part of the function)

Let's build a data scheme and write the necessary DAX.

  1. Prepare 4 data tables:
  • employee_data (login, Chain1, isHead, Name, Unit, Department, Age, Experience)

  • permissions (login, Chain, ShowDpt, where Chain - full path to the department, ShowDpt - show/hide department data)

  • training_data (login, Training title, Training type)

  • performance_review_data (login, Last Performance Review)

  1. Load all four tables into PowerBI. After that step:
  • In Power Query Editor, we add the Chain1 column to the employee_data table using the formula
= Table.AddColumn(#"change_type", "Chain1", each Text.Combine({"SampleCompany",
if [Unit] is null then null else "->",
[Unit],
if [Department] is null then null else "->",
[Department]
}))

This will allow us to dynamically create chains of departments and use them for the role model.

  • Create a new exception table using DAX

    exception = CROSSJOIN('permissions',DISTINCT('employee_data'[Chain1]))

The meaning of this procedure is to join the tables employee_data and permissions

  • Create an RLS column in the exception table using the DAX formula

    RLS = if(CONTAINSSTRING([Chain1],[Chain]),[ShowDpt],blank())

  • Create an eRLS column in the exception table using the DAX formula

    eRLS=
    var mail = 'exception'[Email]
    var cep = 'exception'[Chain1]
    var rls = 'exception'[RLS]
    return
    if(CALCULATE(count('exception'[RLS]),filter('exception','exception'[email]=mail&&'exception'[Chain1]=cep&&'exception'[RLS]="no"))>0 ,"no",rls)
    

The idea behind this expression is to combine information about accessible departments from the organizational structure (employee_data table) and the permissions dictionary (permissions table)

  • Create a structured table with a list of unique departments. In Power Query Editor->New Source->Blank->Go to Advanced Editor and past code:

    let
        Source = Table.Combine({employee_data}),
        #"Drop columns" = Table.SelectColumns(Source,{"Unit", "Department","Chain1"}),
        #"Distinct" = Table.Distinct(#"Drop columns")
    in
        #"Distinct"
    

  1. It is necessary to write rules for roles.
  • Open menu Modeling->Manage roles

  • Create a new role (for example named rls)

  • For the employee_data table (it will be our cornerstone one), write the following DAX:

    //get user login from system
    var usr = userprincipalname()
    
    //create table with chains and isHead
    var xtable = 	addcolumns( selectcolumns(
    				CALCULATETABLE('employee_data',filter('employee_data','employee_data'[email]=usr)
    						),
    				"Chain_feature",[Chain1],"Head_feature",[isHead]
    					),	
    "xxx",if(containsstring('employee_data'[Chain1],[Chain_feature])=true(),1,0)
    			)
    
    
    var headFeature = SUMX(xtable,[Head_feature])
    var xfactor = SUMX(xtable,[xxx])
    
    //subordinates if head
    var x = if(headFeature =0, false(), if (xfactor=0, false(), true()))
    
    
    //add exceptions
    var exc = contains('permissions', 'permissions'[email],usr)
    
    var head_Search = calculate(firstnonblank('exception'[eRLS],1),filter('exception', 'exception' [Chain1]= 'employee_data'[Chain1]&&'exception'[email]=usr))
    return
    
    //result
    if(exc=true(),if(head_Search="yes",true(),if(x=true(),true(),false())),if(x=true(),true(),false()))
    

  • We have loaded the data and implemented the union of access dictionaries. it remains to link the main table employee_data with secondary data on training and review. This is done by organizing relationships

  • structure -> employee_data (Cardinality - ManyToOne, Cross Filter direction - Single)

  • employee_data -> performance_review_data (Cardinality - ManyToOne, Cross Filter direction - Single)

  • employee_data -> training_data (Cardinality - ManyToOne, Cross Filter direction - Single)

It's time to test the performance of our RLS model.

  1. Raw data

  1. Head of HR Unit [email protected] (sees all units of the company)

Widget with logged person created with measure:

lgn = userprincipalname()  

  1. Head of Training department [email protected] (sees all units of the company, except HR)

  1. Head of Finance Unit [email protected] (sees data of his own unit and the Sales department under his supervision)

  1. Head of Treasury [email protected] (sees data only of her own department)

We got what we expected.

Using RLS in Apache Superset

Apache Superset is an open-source, enterprise-ready business intelligence (BI) web application. It provides an intuitive interface for visualizing datasets and crafting interactive dashboards. Its standout features include a rich set of data visualizations, a robust SQL editor, and a flexible, high-granularity security model. Moreover, Superset's ability to connect with various data sources, from traditional SQL databases to web APIs and machine learning servers, makes it a versatile tool for data analytics tasks.

Apache Superset, being an open-source tool, has community-driven support for features. It didn't have the specific Row-Level Security feature built-in like Power BI. However, access control can be set up using the existing security features, mainly data source access control and permission management.

Superset's security model allows you to manage permissions and roles to restrict access to specific datasets or components. Here is a high-level guide to setting up access control:

  1. Define Roles: Define different roles in the Superset, each with distinct levels of permissions. For example, you can have roles like 'HR_Manager' or 'HR_Analyst'.
  2. Assign Permissions: For each role, assign relevant permissions, like accessing specific datasets, executing SQL queries, or creating charts and dashboards.
  3. Assign Roles to Users: Assign the defined roles to individual users based on their job functions and data access needs.

While this might not be as granular as true Row-Level Security “out of the box”, this can be implemented using SQL, which will be discussed below.

  1. Restricting Data Source Access: For instance, if you have an 'Employee Performance' dataset that only certain HR personnel should access, you can assign a role only this group has and give that role permission to access the specific dataset.
  2. Dashboard Permissions: Suppose you have an 'Employee Compensation' dashboard that HR Managers should only view. You can configure a role (e.g., 'HR_Manager') with access to this dashboard and assign it only to the HR Managers.

By effectively utilizing Apache Superset's existing security features, you can create a secure environment for your HR analytics, mitigating potential risks and ensuring that sensitive data is accessed only by authorized personnel. However, organizations should stay tuned to the latest developments in Superset, as the community could introduce RLS or similar features in future releases.

Let's apply a similar principle of building an RLS from the PowerBI example dataset (get the dashboard user login from the system, get the node of the organizational structure available to him, and combine the organizational structure and the exception dictionary) on Superset infrastructure. To do this, write the following query:

WITH 
--get user login from system
'{{ current_username() }}' as user_name,

--add permissions
rls_levels as
(SELECT
       login,
       Unit,       Department
       FROM HR.permissions
       WHERE login = user_name and show_or_hide = 1),

--add org structure
manager_levels as     (SELECT
       user_name,
       Unit,       Department
       FROM HR.employee_data
      WHERE login = user_name 
              and is_manager = 1),

--union two dictionaries
included as (
    SELECT groupArray(values)
    FROM ( SELECT CONCAT('SampleCompany',
      ifNull(Unit, ''),
      ifNull(Department, '')) as values

      FROM ( SELECT * FROM rls_levels
                    union all
                    SELECT * FROM manager_levels))
      ),

--add exceptions
exceptions as (
  SELECT groupArray(values)
    FROM ( SELECT CONCAT('SampleCompany',
      ifNull(Unit, ''),
      ifNull(Department, '')) as values
      FROM HR.permissions
      WHERE login = user_name and show_or_hide = 0
)
    )


--employee_data
SELECT * FROM
    (SELECT
        category, email, Name, Unit, Department, isHead, Age, Experience
        
    FROM HR.employee_data) AS t

--filter
WHERE
    --filter all departments with access
    multiSearchAny(
	  --needle
       CONCAT('SampleCompany',ifNull(t.Unit,''),
       ifNull(t.Department,'')),
       --haystack
       (SELECT * FROM included)
    ) > 0
    AND
    --filter out  all departments without access
    multiSearchAny(
       --needle
       CONCAT('SampleCompany',ifNull(t.Unit,''),
       ifNull(t.Department,'')),
       --haystack
       (SELECT * FROM exceptions)
    ) == 0

Now you can build any superset widgets, similar to Power BI.

Comparing RLS in Power BI and Apache Superset

Microsoft Power BI provides an explicit Row-Level Security (RLS) feature that allows you to define roles and rules at the row level in your data model. You can set up dynamic rules based on user attributes and even test these rules within Power BI Desktop. After publishing your report to the Power BI service, you can assign users to the roles. Power BI's RLS is transparent to the end-user and works across various data visualization components, maintaining consistent data security.

On the other hand, Apache Superset does not offer an explicit RLS feature. However, it does provide robust security features that can be used to achieve a certain level of data access control. With Superset, you can create roles, assign permissions (like data source access, view permissions, etc.) to these roles, and then assign users to the roles.

The choice between Power BI and Apache Superset for securing HR analytics might depend on several factors:

  1. Level of Access Control Needed: The same.
  2. Budget and Resource Constraints: Power BI is a paid tool, with pricing varying based on the level of service required. On the other hand, Apache Superset is an open-source tool that can be used for free. However, Superset may require technical expertise to set up and manage effectively.
  3. Integration with Other Systems: Power BI, being a Microsoft product, integrates well with other Microsoft products such as Azure, Excel, and SharePoint. Apache Superset, on the other hand, is more flexible in terms of data sources it can connect with, supporting a wide range of SQL-speaking data sources.

In conclusion, both tools offer robust features for data security, with Power BI providing explicit row-level security and Apache Superset offering flexible role-based security controls. The choice between the two depends on your organization's needs and constraints.

Tips for Ensuring Data Security in HR Analytics

  1. Role Designation: Clearly define and assign roles in the RLS setup. These roles should correspond to job functions or departments within the organization, with each role having access only to the required data.
  2. Least Privilege Principle: Adhere to the principle of least privilege. This means that users should only be granted the minimal levels of access they need to perform their jobs effectively.
  3. Dynamic Data Masking: Consider implementing dynamic data masking (DDM) and RLS whenever possible. DDM is a feature that hides sensitive data in the result set of a query over designated database fields while the data remains unchanged in the database.
  4. Regular Audits and Updates: Review and update your RLS rules to ensure they remain relevant as your organizational structure or data requirements change.

While RLS is an essential tool in your data security toolkit, additional measures can further safeguard your data:

  1. Encryption: Ensure that data at rest and in transit is encrypted. This means encrypting the data when it's being stored, as well as when it's being transferred over a network.
  2. Regular Backups: Regularly back up your data to recover it in case of accidental loss or a data breach.
  3. Access Controls: Implement strong user authentication and access controls. This could mean using multi-factor authentication or requiring strong, complex passwords.
  4. Monitoring and Alerts: Set up systems to monitor access to your data and alert you to any unusual activity or attempted breaches.
  5. Employee Training: Regularly train employees on data security best practices and the importance of protecting sensitive data. Many data breaches can be traced back to human error, so this step is crucial.
  6. Data Governance Policy: Establish a clear policy that sets out how data should be handled, who has access to what, and what procedures should be followed in case of a data breach.

By combining RLS with these additional measures, you can create a robust data security strategy that significantly reduces the risk of a data breach in your HR analytics.

Conclusion

Row-Level Security (RLS) has emerged as a powerful mechanism to secure sensitive data in HR analytics, addressing potential data breaches effectively. By implementing RLS, organizations can limit data visibility to specific users based on their roles, ensuring that individuals only see the data they are authorized to see. This article has explored how RLS can be implemented in powerful business intelligence tools like Microsoft's Power BI and Apache Superset, demonstrating its vital role in maintaining data security and compliance.

Data security in the HR field is of paramount importance. HR data is often sensitive and personal, with ethical and legal obligations around how it is stored, processed, and accessed. Tools like Power BI and Apache Superset, through features like RLS and robust access controls, provide a means to uphold these obligations. They enable HR departments to derive insightful analytics while maintaining a high data security standard.

In an era where data breaches are increasingly common and costly, employing security measures such as RLS is not just beneficial. It's essential. Implementing such measures can help safeguard your organization against potential data breaches.

Remember, achieving effective HR analytics doesn't mean compromising data security. On the contrary, with careful planning and the right tools, these two goals can and should go hand in hand. We encourage you to consider implementing RLS in your organizations, fostering a secure and effective environment for your HR analytics. Secure data translates to trust, the foundation of every successful organization.


Written by antonlukyanov | Data analytics leader with 10+ years of experience in ML, DS, People Analytics, and Product Management.
Published by HackerNoon on 2023/06/09