Is it possible to remove information from Page History versions via editing the database?

pam_hite_wbd_com
Contributor
January 6, 2021

Our CSO is requiring us to remove plain-text passwords from our Confluence pages and from all historical versions of those pages. We do not want to delete the versions and lose all the other historical information, so we are wondering whether the version history is accessible via the database. If so, can we remove the password information from each version by editing the database directly?

Thanks. 

2 answers

1 accepted

0 votes
Answer accepted
Nic Brough -Adaptavist-
Rising Star
January 6, 2021

You can do it in the database, but it's likely to be slow, painful and a huge amount of work.

You'll need some way to search content to identify where a password has been put in. That's always going to be clunky, because there's no clean and simple way to do it that doesn't come down to "read each page and its history, looking for suspect lines".

However you do that, you'll need to end up with a list of pages and historical versions that contain the unwanted text.
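As a rough illustration of building that list, here is a minimal Python sketch. It assumes you have already exported every version's body by some means and that you already know the exact password strings. The `(page_id, version_number, body)` tuple layout and the function name are hypothetical, not anything Confluence provides.

```python
from typing import Iterable, List, Tuple

# Hypothetical layout for one exported version: (page_id, version_number, body)
Version = Tuple[int, int, str]


def find_versions_with_secrets(
    versions: Iterable[Version], secrets: List[str]
) -> List[Tuple[int, int, List[str]]]:
    """Return (page_id, version_number, matched_secrets) for each version
    whose body contains at least one of the known secret strings."""
    hits = []
    for page_id, version_number, body in versions:
        # Plain substring match; only exact, known strings are found
        found = [s for s in secrets if s in body]
        if found:
            hits.append((page_id, version_number, found))
    return hits


if __name__ == "__main__":
    sample = [
        (1, 1, "login: admin / hunter2"),
        (1, 2, "login: see the vault"),
        (2, 1, "key=s3cret"),
    ]
    for hit in find_versions_with_secrets(sample, ["hunter2", "s3cret"]):
        print(hit)
```

This only catches exact matches in the text as stored, so a password that was written with markup in the middle of it, or escaped differently, would be missed.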

Then you'll need to identify each row in the content table that you want to amend, stop Confluence, go into the massive block of densely encoded XML and edit out the snippet of text you want to remove (without breaking the XML structure) for each entry, and then restart Confluence and re-index it before letting people back in.
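To give a feel for the "edit out the snippet without breaking the XML" part, here is a small Python sketch that works on a body string you have already pulled out. It assumes the password is known exactly and does not span any tags. The function name and the `[REDACTED]` placeholder are made up for illustration. It replaces both the XML-escaped form (for example `&amp;` for `&`) and the raw form, since the raw form can appear inside CDATA sections.

```python
from typing import List
from xml.sax.saxutils import escape


def redact_body(body: str, secrets: List[str], placeholder: str = "[REDACTED]") -> str:
    """Replace each known secret in an XML body with a placeholder.

    The placeholder contains no markup characters, so swapping it in for
    text content should not change the tag structure.
    """
    for secret in secrets:
        escaped = escape(secret)  # '&' -> '&amp;', '<' -> '&lt;', '>' -> '&gt;'
        body = body.replace(escaped, placeholder)
        if escaped != secret:
            # The raw form can still occur, e.g. inside a CDATA section
            body = body.replace(secret, placeholder)
    return body


if __name__ == "__main__":
    body = (
        "<p>pw: p&amp;ss</p>"
        "<ac:plain-text-body><![CDATA[p&ss]]></ac:plain-text-body>"
    )
    print(redact_body(body, ["p&ss"]))
```

A blind string replace like this is only safe when the secret cannot also match part of a tag name, attribute, or entity (a password of `amp` would mangle every `&amp;`), so every changed body would still need checking before it is written back anywhere.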

This is not a minor undertaking, and SQL is probably the worst way to do it.  While "find" is a problem no matter what you do, I'd strongly recommend telling CSO that the only practical option is to simply destroy the history.  It's then up to them to decide if they want you to do many days of find and delete, or many months of find and edit.  

Or, better, get an automation or scripting tool that runs on the front end. If you can identify the exact text, something like ScriptRunner for Confluence could do a lot of the work for you.

0 votes
Radek Dostál
Rising Star
January 6, 2021

Everything is in the database as far as content goes, so yes, you would be able to do it. The question is how you intend to do that if you do not know the passwords up front (i.e. how do you identify what is or isn't a password)?
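If the passwords are not known up front, one crude way to narrow the search is a keyword heuristic. The Python sketch below flags lines where a word like "password" or "pwd" is followed by `:` or `=` and some value. The pattern and the function name are assumptions for illustration only. It will miss passwords written any other way and can flag things that are not passwords, so the output is a list for a human to review, not a definitive answer.

```python
import re
from typing import List, Tuple

# Heuristic: a password-like keyword, then ':' or '=', then a non-space value
SUSPECT = re.compile(r"(?i)\b(pass(?:word|wd)?|pwd)\b\s*[:=]\s*(\S+)")


def suspect_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) for each line that looks like it sets a password."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if SUSPECT.search(line):
            hits.append((number, line))
    return hits


if __name__ == "__main__":
    page_text = (
        "Server: db01\n"
        "Password: hunter2\n"
        "The password policy is strict.\n"
        "pwd=abc123"
    )
    for number, line in suspect_lines(page_text):
        print(number, line)
```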

pam_hite_wbd_com
Contributor
January 8, 2021

The passwords are actually in the historical versions of the pages, in plain text, so someone would have to gather all the passwords to be removed and the versions they appear in.
