When it comes to Jira/Confluence Data Center applications deployed on AWS infrastructure, it's straightforward these days to export their databases as Apache Parquet files and their attachments folders to S3 buckets. Once there, Amazon Macie could be used to scan this exported Jira/Confluence data for PII.
Have you done this? Can you comment on your experience? For example, was it challenging to match findings back to the Jira/Confluence apps? Are the exported Parquet files in a format that enables PII discovery?
I can easily imagine that it would detect PII, but I'm not sure it would be easy to trace a finding back to its original location.
Jira and Confluence databases are highly normalized across dozens of tables. When you export to Parquet, you lose the relational context. For example, PII found in what was the "jiraissue" or "customfieldvalue" table needs to be joined back to project keys and issue IDs before anyone can act on the finding.
You could, of course, maintain this mapping yourself, but it requires ongoing investment. AFAIK there are apps that scan your Jira and Confluence data for PII, and I would expect that they can precisely point out the original location. (I don't have actual experience with any.)
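To make the join-back problem concrete, here is a minimal sketch in Python of the kind of lookup you'd have to maintain. It assumes Jira's standard schema column names (jiraissue.ID, jiraissue.issuenum, project.pkey, customfieldvalue.ISSUE) and an invented Macie-style record index into the exported Parquet file; the sample rows are made up for illustration:

```python
# Hypothetical illustration: resolving a finding on an exported
# customfieldvalue Parquet file back to a Jira issue key.
# Column names follow Jira's usual schema; the data is invented.

# Rows as they might appear after export (row order preserved).
customfieldvalue = [
    {"ID": 10100, "ISSUE": 10001, "STRINGVALUE": "jane.doe@example.com"},
    {"ID": 10101, "ISSUE": 10002, "STRINGVALUE": "n/a"},
]
jiraissue = [
    {"ID": 10001, "issuenum": 42, "PROJECT": 10000},
    {"ID": 10002, "issuenum": 43, "PROJECT": 10000},
]
project = [{"ID": 10000, "pkey": "HR"}]

def issue_key_for_record(record_index: int) -> str:
    """Map a record index in the exported file to a Jira issue key
    by rejoining customfieldvalue -> jiraissue -> project."""
    row = customfieldvalue[record_index]
    issue = next(i for i in jiraissue if i["ID"] == row["ISSUE"])
    proj = next(p for p in project if p["ID"] == issue["PROJECT"])
    return f"{proj['pkey']}-{issue['issuenum']}"

# A finding on row 0 of the export resolves to issue HR-42.
print(issue_key_for_record(0))  # HR-42
```

In a real pipeline you'd also have to keep this mapping in sync with the export, since row positions and IDs are only meaningful for the snapshot they came from.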