Problem
Our Confluence documentation contains sensitive information. I would like to generate a report that links to each affected page and gives the line number where the sensitive information appears.
Question
Is this possible using Rovo?
Hi @Kid Bachan ,
Welcome to the Community!
This (Data Leak Prevention, or DLP) is one of my favorite topics, so let me jump in with a longish answer. Quick disclaimer: the company I work for, Polymetis Apps, has some experience in this, as we have a couple of apps dedicated to DLP on Jira & Confluence.
So, first of all, you can simply ask Rovo to give you a list of pages with sensitive data. While this should give you some results, you'll want to be much more focused. As @Nikola Perisic said, you should tell Rovo what to look for. Which brings me to the biggest problem in DLP, and in search more generally: the tradeoff between recall and precision.
Basically, you want the results of your search for sensitive data to be complete, but also correct. In this tradeoff, recall is the share of all the sensitive data that your search actually finds, and precision is the share of your results that really are sensitive. Every search you do, whether with Rovo, a dedicated DLP app, or just the search bar, makes this tradeoff somehow. Broadening a search to catch more usually lowers precision, and tightening it to raise precision usually means missing more.
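To make the tradeoff concrete, here is a minimal sketch using the standard definitions of precision and recall. The page names and the two example searches are made up for illustration; they are not output from Rovo or any DLP app.

```python
def precision_recall(flagged, truly_sensitive):
    """Return (precision, recall) for a set of flagged pages.

    precision = share of flagged pages that really are sensitive
    recall    = share of sensitive pages that were flagged
    """
    flagged, truly_sensitive = set(flagged), set(truly_sensitive)
    true_positives = len(flagged & truly_sensitive)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(truly_sensitive) if truly_sensitive else 0.0
    return precision, recall


# Hypothetical ground truth: four pages really contain sensitive data.
truth = {"page-1", "page-2", "page-3", "page-4"}

# A narrow search flags only two pages, both correct.
narrow = {"page-1", "page-2"}

# A broad search flags all four, plus four harmless pages.
broad = truth | {"page-5", "page-6", "page-7", "page-8"}

print(precision_recall(narrow, truth))  # (1.0, 0.5)
print(precision_recall(broad, truth))   # (0.5, 1.0)
```

The narrow search is always right but misses half the pages; the broad search finds everything but half of what it reports is noise.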
Some data types are easier to find than others. Email addresses, for example, follow a pattern that can easily be searched for. Passwords, on the other hand, are extremely hard to find, as almost anything could be a password. Sometimes data types overlap: URLs that use basic authentication can look like email addresses, for example https://admin:password@marketplace.atlassian.com/rest
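The overlap is easy to reproduce. The sketch below uses two simplified regular expressions that I am assuming purely for illustration (real DLP patterns are more careful). A naive email pattern picks up part of the basic-auth URL, so one way around it is to check the more specific pattern first and skip email hits that fall inside it.

```python
import re

# Simplified, illustrative patterns; not taken from any particular DLP product.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
BASIC_AUTH_URL = re.compile(
    r"[a-z][a-z0-9+.-]*://[^\s/:@]+:[^\s/@]+@[^\s/]+", re.IGNORECASE
)


def classify(text):
    """Label findings, preferring the more specific basic-auth pattern."""
    findings = []
    covered = []  # character spans already claimed by a basic-auth URL
    for m in BASIC_AUTH_URL.finditer(text):
        findings.append(("basic_auth_url", m.group()))
        covered.append(m.span())
    for m in EMAIL.finditer(text):
        inside_url = any(s <= m.start() and m.end() <= e for s, e in covered)
        if not inside_url:
            findings.append(("email", m.group()))
    return findings


url = "https://admin:password@marketplace.atlassian.com/rest"

# On its own, the email pattern misreads the credentials as an address.
print(EMAIL.search(url).group())  # password@marketplace.atlassian.com

print(classify("Mail ops@example.com or use " + url))
```

With the ordering in `classify`, the URL is reported once as a basic-auth URL and only `ops@example.com` is reported as an email address.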
Rovo and other LLMs are very good at a specific way of looking at your Confluence pages. They can and will detect sensitive data by context. We have found that they would often find and flag sensitive data in a sentence like this: "You can use admin and secret345 to log in."
However, this flexibility comes with a downside. While Rovo and modern LLMs are pretty reliable, they are not mathematically reliable – meaning that they sometimes simply misfire and misjudge. Speaking in the terms of recall and precision, they have great recall, but their precision is often a bit off. (By how much is debatable, and does vary a lot.)
On the other end of the spectrum are pattern-based DLP apps. These often use regular expressions to find patterns of sensitive data. They will pretty much always find the email address and other data that follows a predictable pattern, but can struggle with info hidden in natural language.
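As a rough idea of what the pattern-based side looks like, here is a minimal sketch that produces the kind of report asked for in the question: page link, line number, and what was found. Everything in it is an assumption for illustration: the two patterns are simplified, and `SAMPLE_PAGES` is a hard-coded stand-in for page text you would have to export or fetch from Confluence yourself.

```python
import re

# Simplified, illustrative patterns; a real scan would use many more.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "password_assignment": re.compile(r"\bpassword\s*[:=]\s*\S+", re.IGNORECASE),
}

# Hypothetical page links and plain-text bodies.
SAMPLE_PAGES = {
    "https://example.atlassian.net/wiki/pages/1": (
        "Onboarding notes\n"
        "Contact jane@example.com for access\n"
        "password = hunter2"
    ),
    "https://example.atlassian.net/wiki/pages/2": (
        "You can use admin and secret345 to log in."
    ),
}


def scan_pages(pages):
    """Return one report row per pattern match, with page link and line number."""
    report = []
    for url, body in pages.items():
        for line_no, line in enumerate(body.splitlines(), start=1):
            for label, pattern in PATTERNS.items():
                for m in pattern.finditer(line):
                    report.append(
                        {"page": url, "line": line_no, "type": label, "match": m.group()}
                    )
    return report


for row in scan_pages(SAMPLE_PAGES):
    print(f'{row["page"]} line {row["line"]}: {row["type"]} -> {row["match"]}')
```

The first page yields two rows (lines 2 and 3). The second page yields nothing, even though it plainly contains credentials, which is exactly the natural-language gap described above.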
You will have already guessed what I am going to recommend: Use a combination of both approaches if you can, to make sure that you find as much, if not all, of the sensitive data you care about.
Of course, telling Rovo to scan your pages is a great first step, but if you want to dive deeper, may I recommend taking a look at our app PII Protection and DLP for Confluence? It not only lets you run scans on all of your existing data, even during the trial period, but also comes with a pre-configured, dedicated Rovo agent for DLP scanning that you can use directly and that plays well with Automation for Confluence.
In any case, hope all of the above helps. If you have any questions or comments, let me know!
Best regards,
Oliver from Polymetis Apps
Thanks for the detailed answer. I feel like you understand my problem, but of course you do, because you already have an app for that :) I will take a look at your app.
Welcome @Kid Bachan
You need to tell Rovo which terms count as sensitive, for example password or security. Based on that, Rovo will give you links to the matching pages.
Thanks, let me do some more research on prompts and how to craft a good one.
No deep research needed.
List all of the documentation that contains sensitive information such as passwords, API keys, and so forth.
This shouldn't be abused in any way.