How do I search Confluence for an external URL?

Mandy Grover March 13, 2018

I need to find all the instances of an external URL in our wiki. Is there any way to do this through an advanced search?

7 answers

1 accepted

2 votes
Answer accepted
Miranda Rawson
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
March 13, 2018

Hi Mandy,

There are some limitations with the Confluence search.  We had an existing feature request to extend the search capabilities, but this has since been closed.  Your best bet will be to use the workaround described here: https://confluence.atlassian.com/confkb/how-to-perform-a-confluence-site-search-for-keywords-and-links-through-the-database-830284252.html.  This involves using the Confluence database instead.

Kind regards,

Miranda Rawson

 

Atlassian Sales July 12, 2018

The link above is not working.

AnnWorley
Atlassian Team
July 12, 2018

Here is an updated link: How to perform a Confluence site search for keywords and links through the database

In case there is an issue accessing it, here is an excerpt:

For example, an external URL may have been inserted into numerous pages, but over time those URLs may turn into dead links because the subdomain or URL path has changed.

Solution
To search for this content, run the SQL query below on your Confluence database. Replace <INSERT_KEYWORD_HERE> with your keyword. The % symbol represents a wildcard search.
The SQL results return the content type and title, along with the space details, including the SPACESTATUS (either CURRENT or ARCHIVED). If a space is archived, it won't be searchable in Confluence's user interface.

select c.CONTENTTYPE,c.TITLE, s.SPACENAME, s.SPACEKEY, s.SPACETYPE, s.SPACESTATUS 
from content c join spaces s on c.SPACEID=s.SPACEID
where CONTENTID in
(select CONTENTID from bodycontent where BODY like '%<INSERT_KEYWORD_HERE>%')
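If you run this query from a script rather than a SQL console, it may be safer to bind the keyword as a parameter and escape the LIKE wildcards (% and _), since an underscore in a URL would otherwise match any single character. The sketch below is only an illustration: it uses Python's built-in sqlite3 with an in-memory stand-in database holding just the columns named in the query and made-up rows, and the helper names (escape_like, find_pages, make_demo_db) are hypothetical. A real Confluence database would need its own driver, which may use a different placeholder style than "?".

```python
import sqlite3

# The query from the excerpt above, with the keyword passed as a bound parameter.
QUERY = """
select c.CONTENTTYPE, c.TITLE, s.SPACENAME, s.SPACEKEY, s.SPACETYPE, s.SPACESTATUS
from content c join spaces s on c.SPACEID = s.SPACEID
where c.CONTENTID in
  (select CONTENTID from bodycontent where BODY like ? escape '\\')
"""


def escape_like(keyword):
    """Escape LIKE wildcards so that % and _ in the keyword match literally."""
    for ch in ("\\", "%", "_"):
        keyword = keyword.replace(ch, "\\" + ch)
    return keyword


def find_pages(conn, keyword):
    """Return the rows for all content whose body contains the keyword."""
    pattern = "%" + escape_like(keyword) + "%"
    return conn.execute(QUERY, (pattern,)).fetchall()


def make_demo_db():
    """Build an in-memory stand-in (assumption: only the columns the query names)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table spaces (SPACEID integer, SPACENAME text, SPACEKEY text,
                             SPACETYPE text, SPACESTATUS text);
        create table content (CONTENTID integer, CONTENTTYPE text, TITLE text,
                              SPACEID integer);
        create table bodycontent (CONTENTID integer, BODY text);
        insert into spaces values (1, 'Docs', 'DOC', 'global', 'CURRENT');
        insert into content values (10, 'PAGE', 'Old links', 1);
        insert into content values (11, 'PAGE', 'Other page', 1);
        insert into bodycontent values
            (10, '<a href="https://old.example.com/my_page">x</a>');
        insert into bodycontent values
            (11, '<a href="https://old.example.com/myXpage">x</a>');
    """)
    return conn


if __name__ == "__main__":
    # The escaped underscore means only the page with the literal "my_page" matches.
    for row in find_pages(make_demo_db(), "old.example.com/my_page"):
        print(row)
```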
15 votes
Davin Studer
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
March 13, 2018

A little-known feature of Confluence search is that it can also do regular expression searches. Try this for your search ...

/.*{your url here}.*/

Now, you will have to format the URL to escape any regular expression reserved characters. The most common one is the period, but if you have any of the characters below in the URL, you need to replace them as shown:

\ -> \\
. -> \.
( -> \(
) -> \)
[ -> \[
^ -> \^
$ -> \$
| -> \|
* -> \*
+ -> \+
? -> \?
{ -> \{

So if your URL was https://www.google.com/stuff+things you would search using this syntax ...

/.*https://www\.google\.com/stuff\+things.*/
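Escaping by hand is easy to get wrong, so here is a minimal sketch that applies the replacement table above and wraps the result in the /.* ... .*/ form. The function names are hypothetical, and the reserved set is only the twelve characters listed in the post; other Confluence or Lucene versions may treat more characters as reserved. Note that the {your url here} placeholder has to be replaced braces and all: { is itself on the reserved list, so leaving it in unescaped may be what triggers an "integer expected" error like the one reported further down.

```python
# Reserved characters taken from the replacement table in the post above
# (assumption: this list is complete for your Confluence version).
RESERVED = "\\.()[^$|*+?{"


def escape_for_regex_search(url):
    """Prefix every reserved character in the URL with a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in url)


def regex_query(url):
    """Wrap the escaped URL in the /.* ... .*/ search syntax."""
    return "/.*" + escape_for_regex_search(url) + ".*/"


if __name__ == "__main__":
    print(regex_query("https://www.google.com/stuff+things"))
```

For the example URL above, this reproduces the search string shown in the post.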
Mandy Grover March 13, 2018

@Davin Studer

This is really interesting (and I didn't know about regex in Confluence at all). But would this show me instances of the URL only if the full URL appeared on the page, or would it also show me if the URL occurred in the source?

Paul Mansfield
I'm New Here
Those new to the Atlassian Community have posted less than three times. Give them a warm welcome!
December 14, 2018

I tried that and got a system error and a HUGE stack trace; I am self-hosted, running Confluence 6.9.3 on CentOS 6.

 

System Error

Cause

java.lang.IllegalArgumentException: integer expected at position 3
    at org.apache.lucene.util.automaton.RegExp.parseRepeatExp(RegExp.java:896)

Stack Trace:

java.lang.IllegalArgumentException: integer expected at position 3
    at org.apache.lucene.util.automaton.RegExp.parseRepeatExp(RegExp.java:896)
    at org.apache.lucene.util.automaton.RegExp.parseConcatExp(RegExp.java:880)

... etc ....

    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1539)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1495)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:748)


 

Norman Cates December 13, 2022

I also am trying to search for a URL in our Confluence site. 

We are migrating a Twiki to Confluence.

The Twiki URLs are of the form:

https://twiki.company.com

I have tried this search which works:

/.*twiki.*/

This finds those URLs, but also every other instance of Twiki.

This search does not work:

/.*twiki\.company.*/

That doesn't make sense, since I've escaped the full stop.

Any ideas?

4 votes
Russ Baker
I'm New Here
March 11, 2020

You guys understand that not being able to find a basic text string in your pages is a pretty major failing of a data repository in this day and age, right?  Sure, there are workarounds.  Sure, you can use a regex.  But critically, when I type a string I know is in there and can't find it, I start asking some very fundamental questions about this product.  Strongly recommend you get the search tools up to a point that they meet basic expectations of a search tool.  This has been one of my biggest issues with Confluence, and the kind of thing that would lead us to consider alternatives.

K.C. Murphy
I'm New Here
May 14, 2020

Even SharePoint has a way to dredge the entire contents and Find and Replace.  This is 1980s technology, and it is shocking that it is not possible in this tool.

4 votes
easifyandroid January 25, 2019

For example, if the external site URL to search for is https://demo.site.com/....., this worked in Confluence for me:

http*demo*site*com*

 

Regards,

Ankit

GJP August 9, 2019

This solution works and is simple in application.

Amy Russell July 22, 2022

I can't thank you enough for this solution! This worked great!

Norman Cates December 13, 2022

Does this search still work for anyone?

We are migrating a Twiki to Confluence.

The Twiki URLs are of the form:

https://twiki.company.com

I have tried this, which does not work:

*twiki*

There are instances of twiki in the pages being searched.

Any ideas?

GJP December 14, 2022

Try, for example, https*twiki*company*com or any combination of words that make up your URL. The more words you use, the more accurate your results will be.

Richard Cross January 15, 2024

Just to extend/expand upon the previous response, take the full URL you want to search and substitute any special characters with an asterisk (*).

So

https://mysite/content/fruit.html?search=apples

would become:

https*mysite*content*fruit*html*search*apples

Pro Tip:  Go directly to the advanced search page (at <confluence site>/dosearchsite.action) and put some restrictions on the search before running.
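The substitution described above is mechanical, so it can be scripted. A minimal sketch, assuming that "special characters" means anything other than ASCII letters and digits, and that a run of several such characters (for example "://") collapses into a single asterisk, as in the example above. The function name is hypothetical. Any leading asterisk is stripped, because a wildcard at the start of a search term is reported elsewhere in this thread to fail silently.

```python
import re


def wildcard_query(url):
    """Turn a URL into a Confluence wildcard search string.

    Every run of characters other than ASCII letters and digits becomes
    one asterisk; a leading asterisk is dropped.
    """
    query = re.sub(r"[^A-Za-z0-9]+", "*", url)
    return query.lstrip("*")


if __name__ == "__main__":
    print(wildcard_query("https://mysite/content/fruit.html?search=apples"))
```

For the URL in the example above, this yields https*mysite*content*fruit*html*search*apples.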

Natasha Noble-Hunter
I'm New Here
February 13, 2024

Thanks for this @Richard Cross 

I tried doing this but it's not working. I have pages that I know for certain use my URL, but they are not being returned in the search. It returns only 14 results and I know there are at least 20, so I cannot rely on the results at all. I wonder if it excludes Space home pages from the results?

I am looking for any instance where a page includes part of the old Microsoft Stream link (https://web.microsoftstream.com/video/....)

I understand that * (asterisk) is required at the end as a wildcard because each URL will have a different path after 'video/'

I have tried regex search and your method with no real luck. Any ideas?

 

https*web*microsoftstream*com*video* - (14 results)

https*web*microsoftstream*com*video  - (1 result)

"https*web*microsoftstream*com*video*" - (no results)

"https://web.microsoftstream.com/video*" - (14 results)

"https://web.microsoftstream.com/video/" - (14 results)

"https*web*microsoftstream*com*video/*" - (no results)

https://web.microsoftstream.com/video/ - (any page with 'video')

https*web*microsoftstream*com*video/* - (any page with 'web', 'video' and 'stream')

/.*https://web.microsoftstream.com/video.*/ - (any page with word 'video' and 'stream' in)

/.*https://web\.microsoftstream\.com/video.*/  - (any page with the word 'video' in)

/.*https://web\.microsoftstream\.com/video*.*/ - (any page with the word 'video' in)

1 vote
André de Carvalho
I'm New Here
December 15, 2021

It's just a showstopper to have a wiki where you cannot find all occurrences of the links you want to update. How are you supposed to keep links updated in your system?

Again this is just a result of the childish "ephemeral chat" perspective and not a serious, professional system to produce and maintain knowledge.

 

The response "We had an existing feature request to extend the search capabilities, but this has since been closed." is just unacceptable and rude. I know you don't care; you don't need to say it to my face...

0 votes
Michi S.
I'm New Here
March 21, 2023

Export to Word is, in my opinion, the easiest way to find the links. Important: check both of the URL checkboxes on the content tab (see the attached screenshot).

Afterwards, you can find all the links with a simple text search in the resulting docx file.

0 votes
Michelle Rau good
Rising Star
October 1, 2020

If you have access to the Reporting add-on, you can create a report to find all these links. I wrote about doing that here:  https://community.atlassian.com/t5/Confluence-articles/Finding-and-fixing-broken-links-with-Reporting-for-mere-mortals/ba-p/1334589 

robert_seeger June 17, 2022

Ironically, the link to a recipe for fixing broken links was itself broken by some trailing characters ;-)

Searching for the article turned up the working link: https://community.atlassian.com/t5/Confluence-articles/Finding-and-fixing-broken-links-with-Reporting-for-mere-mortals/ba-p/1334589 

Norman Cates December 13, 2022

Could someone PLEASE expand on how you created a report, what Service Rocket is and how to DO all these things?

This article assumes so many bits of knowledge I have no idea where to start. 

What Macros are required? And / or what addons are required?

I am pretty new to Confluence, and landed here because I need to try to find URLs in a site we are migrating. 

Links to information are totally fine.

robert_seeger December 14, 2022

Service Rocket is the producer of the Confluence plugin Reporting for Confluence.

It is very powerful but not easy to use for a newbie or non-developer - it is basically "programming by macro design" - and quite expensive.

The article shows you all the macros required and how to nest them for this specific use case, but this requires the plugin. The plugin documentation contains plenty of other use cases with "recipes".

Norman Cates December 14, 2022

Thank you. 

Since Atlassian search should work to be able to find things like URLs in a trivial manner, I will keep bugging them about actually getting Search to work.

A plugin as described above is far too much overkill for something that should work anyway.

Norman Cates January 23, 2023

As a follow-up to this, I found a section in the Confluence search help that basically says you can't use an * or ? at the start of a search term. The search will just fail silently.

Hence my attempts at *twiki*company*com were just failing silently.

@GJP suggested https*twiki*company*com which does indeed work.

But the basic takeaway is that an * or ? at the start of a search term will silently fail with Confluence's search implementation.
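Since the failure is silent, it can help to check a search string before pasting it into Confluence. A small sketch, assuming search terms are separated by whitespace; the function name is hypothetical, and the rule it checks is only the one described in this post.

```python
def terms_with_leading_wildcard(query: str) -> list[str]:
    """Return the whitespace-separated terms that start with * or ?."""
    return [term for term in query.split() if term[0] in "*?"]


if __name__ == "__main__":
    for query in ("*twiki*company*com", "https*twiki*company*com"):
        bad = terms_with_leading_wildcard(query)
        status = "leading wildcard in " + ", ".join(bad) if bad else "ok"
        print(query, "->", status)
```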
