Too Long; Didn't Read
An important requirement for data privacy and protection is finding and cataloging the tables and columns in a data warehouse that contain personally identifiable information (PII) or protected health information (PHI). Open source data catalogs like [Datahub] and [Amundsen] enable cataloging of information in data warehouses. This post describes two strategies for scanning and detecting PII and introduces [PIICatcher], an open source application that can be used to scan data warehouses for PII. Examples of PII include Social Security numbers, email addresses, phone numbers, login IDs, social media posts, digital images, and geolocation data.