Postgres regex to find issue keys in description or comments

Enrique Cadalso June 14, 2021

Hi,

We need a regex compatible with Postgres regex syntax (https://www.postgresql.org/docs/current/functions-matching.html)

The two regexes we have found are for Java and they don't work in a postgres SQL.

1. Atlassian Java regex to match an issue key at https://confluence.atlassian.com/stashkb/integrating-with-custom-jira-issue-key-313460921.html

((?<!([A-Z]{1,10})-?)[A-Z]+-\d+)

2. The community improved regex at https://community.atlassian.com/t5/Bitbucket-questions/Regex-pattern-to-match-JIRA-issue-key/qaq-p/233319

((?<!([A-Za-z]{1,10})-?)[A-Z]+-\d+)


We want it to run an update query using Regexp_Replace to change the project key in issue key mentions in the descriptions and the comments.

 

The best we have so far is this, but it fails to replace mentions like KEY-123/KEY-456, and KEY-123(something else)

 Regexp_Replace(jiraissue.description, '(\s|^|\(|\"|\.|\,)KEY(-[0-9]+)([\)|\"|\.|\,]*)(?!\S)', '\1NEWKEY\2\3', 'g') 

 


Thanks,

1 answer

2 votes
Andy Heinzer
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
June 17, 2021

Hi Enrique,

I understand you are looking to create a regex that will match Jira issue keys.  However the answer here (at least for Jira Server and Data Center) could be different between systems.  This is because Jira server will allow a system admin to determine what is the correct patter for a Project Key to follow.  If you're a system admin, you can login to Jira and go to System > General configuration and then click the Advanced Settings button.

Scroll down to the entry called jira.projectkey.pattern. This value will determine what are valid values for the project key (the part before the -1234 that we see in an issuekey such as ABC-1234).  This value is actually in a regex format to begin with.  By default it has a value of

([A-Z][A-Z0-9]+)

We can use that as a base to start to figure this out.  I really like the website of https://regexr.com/ to help me figure out what my regex is going to match against using some realtime highlighting.  Using this tool I played with this for a bit and found what I think the correct expression to match a default issuekey.  Try this:

([A-Z][A-Z0-9]+-[1-9][0-9]*)

 

Ok, so the reason for this is that the first character has to be a letter (and capital letter at that), but then the project key can have either another letter or a number (but we need at least two characters for the project key), and then we can repeat the number of characters there. Next we look for the hyphen (-), and need to match at least one digit and the first digit will never be a leading zero in Jira. 

So that should do it.  However, you could try to refine this even further if need be.  I think this is optional in most cases, but just in case.  There is another possible value we need to lookup, that is back on the general settings page, it's called "Maximum project key size".  Default value is 10, but this could be bigger or smaller in different environments. 

In which case, a more precise regex could be

([A-Z][A-Z0-9]{1,9}-[1-9][0-9]*)

Try that and let me know if you have any problems with that.  I haven't used the postgresql regex lookup, so perhaps the syntax is a bit different there.  But I'm also not sure why you are looking to change issue keys on the database level.  When a project key is renamed, or issues in Jira are moved to a new project, I would expect that Jira would automatically attempt to leave a redirect for the location/issuekey of the new issue. At least this is true when the issue is still on the same Jira instance, which is to say if this is issue data migrated from a different Jira site then perhaps that redirect would not be expected to exist. 

I hope this helps.

Andy

Enrique Cadalso June 18, 2021

Hi Andy,

Thanks for your regex. It is an improvement over the one I have. It is able to match cases like

  • case TEXT1-123/TEXT2-456/
  • case (TEXT1-123)
  • case TEXT1-123.

 

However it also matches for other cases we do not want, such as

  • case OTHERTEXT-TEXT1-123456789-MORETEXT, string
  • test case TEXT1-123456789-MORETEXT, string

 

I tried adding \m and \M to your regex but it still matches those cases. I have used a simpler regex (because I know the project key) with mixed results.

 

Regexp_Replace(description, '(\mOLDKEY)(-[0-9]+\M)', 'NEWKEY\2', 'g')

 

But it also matches the cases where the issue key is between dashes. I think the reason is the dash character is not considered as word start or a word end (\n and \M in Postgres syntax.

The reason we are using SQL is that a bunch of projects was imported into an existing instance, but some project keys had to be changed in the source because they exist in the target. That is, source OLDKEY = target EXISTINGKEY, so OLDKEY was changed to NEWKEY in the source. When importing the project that Jira redirection is lost.

Even if we manage to configure the redirection in the target instance from OLDKEY to NEWKEY (playing with the projectkey table) that would not be correct, because that redirection makes sense only in the scope of the imported projects. There are valid OLDKEY (=EXISTINGKEY) issue mentions in the projects previously in the target that are not supposed to be redirected.

An alternative to SQL would be to use a script (ScriptRunner) or a custom addon with some code using the Java API. But we do not want to change the updated date (which rules out the REST API) and I don't like to use anything other than IssueService with IssueInputParameters when updating issues, which would also change the updated date. Other procedures require to programatically reindex the updated issues and it is a data center instance, so I don't want to mess with the index and inter-node replication. So the SQL seems a more direct and safer approach here.

 

I have the queries but not a good regex for all cases.

 

UPDATE
ONLY jiraissue i
SET
description = Regexp_Replace(i.description, '(\mOLDKEY)(-[0-9]+\M)', 'NEWKEY\2', 'g')
FROM
project p
WHERE
i.project = p.id
AND p.pkey IN ( IMPORTED PROJECT KEYS )
AND i.description ~ '\mOLDKEY-[0-9]+\M';

UPDATE
ONLY jiraaction c
SET
actionbody = Regexp_Replace(c.actionbody, '(\mOLDKEY)(-[0-9]+\M)', 'NEWKEY\2', 'g')
FROM
jiraissue i, project p
WHERE
c.issueid = i.id
AND i.project = p.id
AND p.pkey IN ( IMPORTED PROJECT KEYS )
AND c.actionbody ~ '\mOLDKEY-[0-9]+\M';

 


Thanks,

 

Enrique

Andy Heinzer
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
June 21, 2021

Hi Enrique,

Thanks for the detailed explanation.  It really helps me better understand what you are trying to do here.  In this case, I think we can tweak my previous regex and use something like this:

\b([A-Z][A-Z0-9]{1,9}-[1-9][0-9]*)\b(?![-_])

I added the \b as a means of using a word boundary, although that isn't what excludes those unwanted previous matches of

  • case OTHERTEXT-TEXT1-123456789-MORETEXT, string
  • test case TEXT1-123456789-MORETEXT, string

Instead, I also added a negative lookhead with the (?!) and then have a group of characters there of _ and -.  That would exclude those two examples you posted.  And I have tested this against the regexr.com site. 

But I have not tried this against posgresql, so I'm not certain if there is perhaps some other differences between how they might be implementing regex there.

Try that and let me know the results.

Andy

Suggest an answer

Log in or Sign up to answer