Jira / Confluence / Bitbucket: what kind of personal data is logged (GDPR)

Jens Kisters //SeibertSolutions
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
May 29, 2018

Hello fellow Atlassian users,

our data security officer had one more question while we were finalizing our GDPR compliance measures, and I am struggling to find an answer:

Which personal data is logged by Jira / Confluence / Bitbucket in any "technical logs"?

I studied Atlassian's GDPR guide, and it names the obvious locations (issue fields, user fields, comments, worklogs etc.).

But I know that, for example, in Jira

  • the atlassian-jira.log often contains user names
  • the audit log shows what an administrator did
  • it is logged when a user logs in

and I'm pretty sure there is more data like this stored in Jira.

Is there a comprehensive list of these kinds of logs somewhere?

Or is this not relevant from the GDPR point of view?

Thanks in advance

Jens

3 answers

1 accepted

1 vote
Answer accepted
Jens Kisters //SeibertSolutions
Rising Star
June 7, 2018

To answer my own question after some great insights from @Alexander Kueken:

I am thinking of stating in our GDPR documentation that Jira logs every action a user performs and the IP address it came from,

and also how long the logs are kept.

Even if this is only partially true, I guess we would be on the safe(r) side this way.

1 vote
Alexander Kueken June 6, 2018

Hi Jens,

logging has always been a difficult and contentious topic, even before GDPR. Back then there was already a conflict between works councils, which did not want user information written to the log files because it could be misused for performance measurement, and the compliance, audit, IT security and operations departments.

In many of my projects, the solution was to agree on a few measures that all parties involved could accept:

  • Reducing the number of people who can access the log files to a minimum
  • Configuring a very short log rotation (keeping the logs for less than 7 days)
  • Not including the log files in the backups
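
The short retention from the second point could be enforced with a small scheduled script if the logging configuration alone does not give you age-based cleanup. The following is only a sketch under assumptions: the function names and the `*.log*` file pattern are made up for illustration, and the log directory is whatever your installation actually uses.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def expired_logs(log_dir, max_age_days=7, now=None):
    """Return log files in log_dir whose last modification is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        path
        for path in Path(log_dir).glob("*.log*")  # assumed pattern, e.g. atlassian-jira.log.3
        if path.is_file() and path.stat().st_mtime < cutoff
    )


def purge_expired_logs(log_dir, max_age_days=7):
    """Delete the expired log files and return the paths that were removed."""
    removed = expired_logs(log_dir, max_age_days)
    for path in removed:
        path.unlink()
    return removed
```

Calling `expired_logs(...)` first acts as a dry run, so you can review the list before `purge_expired_logs(...)` actually deletes anything. The active log file is left alone as long as it has been written to within the retention period.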

Perhaps the same approach works with the GDPR. As long as you have a valid reason for it, it would be OK to write PII to the log files. For example, the access.log contains user information, which can be an important instrument for detecting intruders or misuse.

As far as I know, there is no comprehensive list of the cases in which Atlassian writes PII to the log. To make it more difficult, it can depend on the logging configuration. For example, in the standard configuration the "Mapped Diagnostic Context" is activated. This context contains the username and leads to the log entry in your example. See

https://developer.atlassian.com/server/confluence/logging-guidelines/

By changing the log4j settings, you can configure whether and how the content of the context is printed to the log files. It gets more difficult when PII is part of the log message itself, because the message is generated internally. There is no easy way to remove those entries other than deactivating logging altogether. On top of the base systems, all plugins/apps write to the same log file, which makes it even more difficult.

Whatever your solution looks like, there is one important fact to keep in mind in case you need technical support from Atlassian or a plugin vendor. In most cases, Atlassian or the vendor will ask you to generate the support ZIP and send it with your support request. The support ZIP contains a copy of the existing log files, so you would be transferring your users' PII to Atlassian or the vendor.
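
If you want to reduce what leaves your network, one option is to scrub copies of the log files before you attach them. This is only a rough sketch, not a feature of the support ZIP: the `redact` function and the placeholders are my own invention, it only catches IPv4 addresses and the usernames you pass in, and PII inside free-text messages can still slip through.

```python
import re

# Matches dotted IPv4 addresses such as 203.0.113.7 (no range validation).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(line, usernames):
    """Replace IPv4 addresses and the given usernames in one log line."""
    line = IPV4.sub("<ip>", line)
    # Longest names first, so "j.kisters" is replaced before a shorter "j.kist".
    for name in sorted(usernames, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        line = re.sub(pattern, "<user>", line)
    return line


sample = (
    "2018-05-29 17:15:57,701 http-nio-8080-exec-8077 WARN j.kisters "
    "1935x1119412x1 191yscl 203.0.113.7,127.0.0.1 /browse/ISS-123"
)
print(redact(sample, {"j.kisters"}))
```

The list of usernames would have to come from your user directory, and the scrubbed logs may be less useful to the support engineer, so it is worth asking what they actually need first.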

I know all this is not a concrete answer to your questions, but perhaps it helps you on your journey to GDPR compliance.

Regards,
Alexander

Jens Kisters //SeibertSolutions
Rising Star
June 7, 2018

Alexander, thank you very much for taking so much time, I really appreciate it.

This definitely helps us.

1 vote
Mirek
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
May 29, 2018

There is a complete GDPR guide for all Atlassian products.

https://confluence.atlassian.com/gdpr/server-and-data-center-gdpr-support-guides-949245592.html

Nothing more to say :)

Jens Kisters //SeibertSolutions
Rising Star
May 29, 2018

Mirek, thank you for your reply,

I did check out the guide, and what I am trying to find out is:

Why does the guide not mention technical logs?

Mirek
Community Leader
May 29, 2018

Everyone understands the GDPR regulations differently, so you need to focus on the articles. Nothing in the JIRA logs clearly identifies you as you by name, email or phone number, and that is the personal data which GDPR protects.

Jens Kisters //SeibertSolutions
Rising Star
May 29, 2018

hmmm....

To me this excerpt from our atlassian-jira.log identifies me pretty clearly:

2018-05-29 17:15:57,701 http-nio-8080-exec-8077 WARN j.kisters 1935x1119412x1 191yscl 89.246.xx.xx,127.0.0.1 /browse/ISS-123
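
Reading that line field by field makes the point clearer. Here is a small sketch that pulls out the parts that relate to a person. It assumes the layout is timestamp, thread, level, username, request id, session id, IP chain and URL; the field names are my own labels, not official ones, and other log lines may well look different.

```python
import re

# Assumed layout of the excerpt above; real log lines may differ.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<thread>\S+) (?P<level>[A-Z]+) (?P<username>\S+) "
    r"(?P<request_id>\S+) (?P<session_id>\S+) (?P<ip_chain>\S+) (?P<url>\S+)"
)


def personal_fields(line):
    """Return the fields that relate to a person, or None if the line does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    return {
        "timestamp": match["timestamp"],
        "username": match["username"],
        "ip_addresses": match["ip_chain"].split(","),
        "url": match["url"],
    }


excerpt = (
    "2018-05-29 17:15:57,701 http-nio-8080-exec-8077 WARN j.kisters "
    "1935x1119412x1 191yscl 89.246.xx.xx,127.0.0.1 /browse/ISS-123"
)
print(personal_fields(excerpt))
```

So a single line tells you who did something, when, from which address, and which issue they looked at.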

Mirek
Community Leader
June 7, 2018

Not quite. It does not say "Jens Kisters, who lives at this address and has this email address", etc. It just says j.kisters, which might be Joe Kisters, Jurgen Kisters or Jeremy Kristerson. It does not CLEARLY identify you as you. If you gave me this log, I could not directly say that it is you. That is the main point of the GDPR regulations. A username is not personal data. Overall it is up to the admin of JIRA or AD (LDAP), wherever this information is stored, and that is where GDPR compliance should be checked. Your login might be "j.kist", "jk" or "abc12141", but this is not personal data like your social security number or the address where you live. That information is not stored in JIRA, and you do not need to provide it when you create an account.

Jens Kisters //SeibertSolutions
Rising Star
June 7, 2018

This won't help in enough cases. Even if the username does not contain a link to the identity, as long as the user has not been anonymized or deleted you can find out who the username belongs to, and thus find out what he has been doing and that he was somewhere he had access to the system.

Mirek
Community Leader
June 7, 2018

Hmm... How can you easily find out who this username belongs to? :) There is no PERSONAL data stored in JIRA. If it is stored anywhere, it is in AD (LDAP), and the administrator of that system is responsible for the personal data and is subject to the GDPR regulations. Let me underline a very important point again: by default, no personal data is stored in the logs that can 100% clearly identify you as you. Don't assume that the ability to look something up in another system counts; by that logic you could not even say your name in public, since someone could find you on the Internet or in some other system. GDPR applies to places where the information is stored, like a database where you enter your personal data and accept that it is stored under some policy. If you create an account in JIRA, you do not need to fill in your address, social security number or anything else. Your company stores this in AD, for example, and if you want your data removed, it will be removed there. Even if someone then finds your username in the logs and tries to find information about you, he will not succeed.

Alexander Kueken June 7, 2018

What you two are discussing right now is the difference between direct and indirect personal data. Login names, display names, screen names and nicknames count as indirect personal data, so GDPR applies to them, sadly.

Also, it is not correct that Jira does not store personal data. First of all, not every instance uses an AD or LDAP server; some use Jira's internal user directory. And even if you use AD/LDAP, information is synced between it and Jira.

With the login name, e.g. from the log files, you can determine the internal user identifier, which is either the login name itself or another key. The user identifier may have no meaning on its own, but once it is used in a registry (e.g. LDAP/AD/user directory) to identify a person, the key becomes personal data in all contexts.

And from this perspective, Jira stores a lot of personal data. Just look at:

https://confluence.atlassian.com/adminjiraserver071/monitor-a-user-s-activity-802592329.html

Mirek
Community Leader
June 7, 2018

@Alexander Kueken Thank you for jumping in, however you did not quite catch my point. If you are sure that a username or nickname counts, please provide an article that clearly defines it as personal information so we can refer to it. I did not find anything like that. Maybe I missed it. Thanks!

For me, GDPR only applies to the system where the data is actually stored, not to some other system where you can search for it because you know a username. If you use AD, the data is stored there and the AD system admin is responsible for keeping it safe. If you only use JIRA's local user management, you do not need to provide any personal data when you create an account. Atlassian itself does not need your personal data to create an account.

Being able to look up something you already know in other systems is not the same as holding a database full of personal information and being the data administrator of a specific system.

Alexander Kueken June 7, 2018

@Mirek Sure, the main definition comes from the GDPR itself, see Article 4(1) and Recital 26.

Recital 26 is especially interesting in combination with Article 4(5). Even if you regard the username or user identifier as pseudonymisation, it should still be considered personal data.

The problem is, the definitions in the GDPR are very broad and open, and there is a lot of legal uncertainty. For that reason, most organizations refer to the PII and SPI definitions of NIST: "Guide to Protecting the Confidentiality of Personally Identifiable Information (PII)"

Overall I would agree with you if Jira only accessed the AD or LDAP online. But instead Jira "caches" user information such as usernames and e-mail addresses. And "cache" is misleading in this context, as Jira synchronizes the user information from the AD/LDAP and persists it in its own database:

https://confluence.atlassian.com/adminjiraserver/synchronizing-data-from-external-directories-938847064.html

But, from my point of view, you would be right if you did not store any direct personal data, like the full name or the e-mail address, in the internal or external directory. And yes, you are correct, you are not forced to provide this information in order to use Jira. At least if you are OK with certain functionality not working, e.g. e-mail notifications or a lot of add-ons.

Overall, do not get me wrong: from my point of view all this usage of personal data in Jira is legal. The GDPR is about minimizing the personal data you store and process, or Datensparsamkeit as we say here. It is not about getting rid of all personal data. And a lot of people seem to forget that the GDPR exists alongside other laws and regulations, not above them.

That also applies to Jens' original question about the log files. They are needed for multiple reasons. The GDPR just requires you to document the existence of this data, how you process it and what it is used for, and to have processes in place to delete the data if needed, to provide users with their data if they ask for it, and so on.

Mirek
Community Leader
June 7, 2018

Exactly! "The problem is, the definitions in the GDPR are very broad and open, and there is a lot of legal uncertainty". We both agree on this. That is why I mentioned that everyone understands the GDPR regulations differently. It is the same with other laws and regulations. If there is a problem, you meet in court :)

Pseudonymization is the separation of data from direct identifiers so that linkage to an identity is not possible without additional information that is held separately. From JIRA it is hard to simply click on a username and get information about a specific person from AD, since AD is protected, and if you do not even know where it is located you cannot get this information directly.
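
To make the idea concrete, here is a minimal sketch of pseudonymizing a username with a keyed hash, where the key is the "additional information held separately". This is an assumption about how one could do it, not something JIRA offers out of the box; the function name and the `user-` prefix are made up.

```python
import hashlib
import hmac


def pseudonymize(username, secret_key):
    """Map a username to a stable pseudonym that cannot be reversed without the key."""
    digest = hmac.new(secret_key, username.encode("utf-8"), hashlib.sha256)
    # Keep only a short prefix of the hex digest to stay readable in logs.
    return "user-" + digest.hexdigest()[:12]


# The key must be stored apart from the logs (hypothetical value).
key = b"keep-this-somewhere-else"
print(pseudonymize("j.kisters", key))
```

The same username always gives the same pseudonym, so the logs stay useful for troubleshooting. Whoever holds the key can still recompute the mapping from a list of known usernames, which is why the Article 4(5) argument above treats such data as personal data.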

On the other hand, the recital also says "This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes." You simply say it is for informational or statistical purposes (as logs might be) and you are done.

Either way, GDPR has one big advantage: people have started discussing how to protect data. Data privacy has become more crucial than before. People provide data everywhere without even thinking about how it is used, and are later shocked that someone used it in a way they did not want, or shared it with others without approval.

Thank you for a good discussion! :)
