Find Excel XLS or CSV file attachments within Jira and confluence- to check for PI.

andrew.edwards May 18, 2023

Hi, our organisation wants to find and remove any PI - personal information uploaded into Jira or confluence.  This is hard, as anyone can load anything in descriptive fields.  The typical method we have seen our staff do, is to attach a CSV or XLS file to a Jira task.  I'm looking for a method for finding all issue types with a CSV or XLS file attached.  Any extra APPs will have to allow data residency to be stored in Australia.  I got excited about 'scriptrunner' but their data residency options are limited to EU or US locations.  Would appreciate any suggestions.  thank you

2 answers

1 accepted

2 votes
Answer accepted
Ken McClean
Rising Star
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
May 19, 2023

For anyone who finds this and is curious, here's how we'd do it with ScriptRunner:



def maxResults = 100
def startAt = 0
def attachmentArr = []
def stillPaginating = true

//Define the global variables

while (stillPaginating == true){
   
    //We need to paginate through the results
    //We're iterating by 100 results at a time
   
    def getIssueKeys = get("/rest/api/3/search?jql=project%20is%20not%20EMPTY&maxResults=${maxResults}&startAt=${startAt}")
        .header('Content-Type', 'application/json')
        .asObject(Map)
    //We start by returning all of the issues in the instance with a JQL search
   
    getIssueKeys.body.issues.each{ issueKey ->
    //Next we're iterating through each result (issue) that was returned
   
            def issueObject = get("/rest/api/3/issue/${issueKey.key}/")
            .header('Content-Type', 'application/json')
            .asObject(Map)
            //We query the system for more information about the issue in question
           
                issueObject.body.fields.attachment.each { attachmentID ->
                //Specifically, we're after the ID of any attachment on the issue
               
                         def attachment = get("/rest/api/3/attachment/${attachmentID.id}")
                        .header('Content-Type', 'application/json')
                        .asObject(Map)
                        //Once we have the ID of the attachment, we can use that ID to get more information about the attachment
                       
                            if(attachment.body.filename.toString.contains(".csv") || attachment.body.filename.toString.contains(".xlsx")){
                           attachmentArr.add("Issue ${issueKey.key} has an attachment: ${attachment.body.filename}")
                            }
                                 
                }//End attachment-each loop
        }//End getIssueKeys loop
   
            if(getIssueKeys.body.total < maxResults){
            stillPaginating = false;
        }
        //Finally, we check to see if we're still paginating
        //If the total number of results is less than the total number of maximum possible results,
        //We must be at the end of the line and can stop paginating by terminating the loop
       
}

return attachmentArr
//Return the results
1 vote
Ken McClean
Rising Star
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
May 18, 2023

Hi Andrew:

Is this jira server or jira cloud? I can write you a Python script do it, if you're just looking for a report.

- Ken 

andrew.edwards May 18, 2023

We are using jira cloud.  Yes we are just looking for a report, then will do a manual review of XLS and CSV attachments for PI

Ken McClean
Rising Star
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
May 19, 2023

Here's a Python script that will do it.

Please note: the steps required to generate the authorization token:

1.  Go to https://id.atlassian.com/manage-profile/security/api-tokens and generate a token
2. Go to https://www.base64encode.net/ and encode it in this style:
YourLoginEmail@domain.com:APIToken


You can read more about that process here:
https://developer.atlassian.com/cloud/jira/platform/basic-auth-for-rest-apis/


Here is the script:


 

import requests

max_results = 100
start_at = 0
attachment_arr = []
still_paginating = True

yourDomain = "<domain>"

headers = {
    'Authorization': 'Basic <base-64 encoded string>',
    'Content-Type': 'application/json',
    'Accept': 'application/json',
}


# Define the global variables

while still_paginating:
    # We need to paginate through the results
    # We're iterating by 100 results at a time
   
    response = requests.get(f"https://{yourDomain}.atlassian.net/rest/api/3/search?jql=project%20is%20not%20EMPTY&maxResults={max_results}&startAt={start_at}",
                            headers=headers)
                           
   #print(response.content)
    issue_keys = response.json().get("issues")
    # We start by returning all of the issues in the instance with a JQL search
   
    for issue in issue_keys:
        # Next, we're iterating through each result (issue) that was returned
       
        issue_key = issue.get("key")
        issue_response = requests.get(f"https://{yourDomain}.atlassian.net/rest/api/3/issue/{issue_key}",
                                      headers=headers)
                                     
        #print(issue_response.content)
        issue_data = issue_response.json()
        # We query the system for more information about the issue in question
       
        attachments = issue_data.get("fields", {}).get("attachment", [])
        for attachment in attachments:
            # Specifically, we're after the ID of any attachment on the issue
           
            attachment_id = attachment.get("id")
            attachment_response = requests.get(f"https://{yourDomain}.atlassian.net/rest/api/3/attachment/{attachment_id}",
                                               headers=headers)
                                               
            #print(attachment_response.content)
            attachment_data = attachment_response.json()
            # Once we have the ID of the attachment, we can use that ID to get more information about the attachment
           
            filename = attachment_data.get("filename")
            if filename and (".csv" in filename or ".xlsx" in filename):
                attachment_arr.append(f"Issue {issue_key} has an attachment: {filename}")
   
    if len(issue_keys) < max_results:
        still_paginating = False
    # Finally, we check to see if we're still paginating
    # If the total number of results is less than the total number of maximum possible results,
    # we must be at the end of the line and can stop paginating by terminating the loop

print(attachment_arr)
# Print the results
andrew.edwards May 21, 2023

Thank you Ken, much appreciated, I will give it a run.

Suggest an answer

Log in or Sign up to answer
DEPLOYMENT TYPE
CLOUD
PRODUCT PLAN
PREMIUM
PERMISSIONS LEVEL
Site Admin
TAGS
AUG Leaders

Atlassian Community Events