How do I search Confluence for an external URL?

I need to find all the instances of an external URL in our wiki. Is there any way to do this through an advanced search?

7 answers

1 accepted

2 votes

Answer accepted

Hi Mandy,

There are some limitations with the Confluence search. We had an existing feature request to extend the search capabilities, but this has since been closed. Your best bet will be to use the workaround described here: https://confluence.atlassian.com/confkb/how-to-perform-a-confluence-site-search-for-keywords-and-links-through-the-database-830284252.html. This involves using the Confluence database instead.

Kind regards,

Miranda Rawson

Comment

The link above is not working.

Comment

Here is an updated link: How to perform a Confluence site search for keywords and links through the database

In case there is an issue accessing it, here is an excerpt:

For example, an external URL may have been inserted into numerous pages, but over time those URLs might turn into dead links because of changes to the subdomain or URL path.

Solution
To search for these contents, run the SQL query below on your Confluence database. Replace the <INSERT_KEYWORD_HERE> with your keyword. The % symbol represents a wildcard search.
The SQL results will return the content type and title, along with the space details, including the SPACESTATUS (either CURRENT or ARCHIVED). If a space is archived, it won't be searchable in Confluence's user interface.

-- List every piece of content whose body contains the keyword, with its space details.
select c.CONTENTTYPE, c.TITLE, s.SPACENAME, s.SPACEKEY, s.SPACETYPE, s.SPACESTATUS
from content c join spaces s on c.SPACEID = s.SPACEID
where c.CONTENTID in
(select CONTENTID from bodycontent where BODY like '%<INSERT_KEYWORD_HERE>%')
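
To see what the query above returns without touching a real Confluence database, here is a minimal sketch that runs the same join against a toy in-memory SQLite database. The table and column names come from the excerpt. The sample rows, the reduced schema, the `ORDER BY` and the helper name `find_content_containing` are assumptions made for illustration only. A real Confluence instance runs on a different database engine and has many more columns.

```python
import sqlite3

# Toy stand-in for the three tables named in the query above.
# Only the columns the query touches are created (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE spaces (SPACEID INTEGER PRIMARY KEY, SPACENAME TEXT, SPACEKEY TEXT,
                     SPACETYPE TEXT, SPACESTATUS TEXT);
CREATE TABLE content (CONTENTID INTEGER PRIMARY KEY, CONTENTTYPE TEXT, TITLE TEXT,
                      SPACEID INTEGER);
CREATE TABLE bodycontent (BODYCONTENTID INTEGER PRIMARY KEY, CONTENTID INTEGER, BODY TEXT);
INSERT INTO spaces VALUES (1, 'Engineering', 'ENG', 'global', 'CURRENT');
INSERT INTO spaces VALUES (2, 'Old Docs', 'OLD', 'global', 'ARCHIVED');
INSERT INTO content VALUES (10, 'PAGE', 'Build guide', 1);
INSERT INTO content VALUES (11, 'PAGE', 'Legacy notes', 2);
INSERT INTO content VALUES (12, 'PAGE', 'Unrelated', 1);
INSERT INTO bodycontent VALUES (100, 10, '<a href="https://twiki.company.com/Build">build</a>');
INSERT INTO bodycontent VALUES (101, 11, 'See https://twiki.company.com/Old');
INSERT INTO bodycontent VALUES (102, 12, 'No links here');
""")


def find_content_containing(conn, keyword):
    """Run the query from the excerpt with the keyword bound as a parameter."""
    sql = """
        SELECT c.CONTENTTYPE, c.TITLE, s.SPACENAME, s.SPACEKEY, s.SPACETYPE, s.SPACESTATUS
        FROM content c JOIN spaces s ON c.SPACEID = s.SPACEID
        WHERE c.CONTENTID IN (SELECT CONTENTID FROM bodycontent WHERE BODY LIKE ?)
        ORDER BY c.TITLE
    """
    # The surrounding % signs give the same "contains" match as in the excerpt.
    return conn.execute(sql, (f"%{keyword}%",)).fetchall()


rows = find_content_containing(conn, "twiki.company.com")
for row in rows:
    print(row)
```

Note that `%` and `_` inside the keyword also act as wildcards under `LIKE`, so a keyword containing an underscore may match more than expected.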

Comment

15 votes

A little-known feature of Confluence search is that it can also do regular expression searches. Try doing this for your search ...

/.*{your url here}.*/

Now, you will have to escape any reserved regular expression characters in the URL. The most common one is the period, but if the URL contains any of the characters below, you need to replace them as shown:

\    ->    \\
.    ->    \.
(    ->    \(
)    ->    \)
[    ->    \[
^    ->    \^
$    ->    \$
|    ->    \|
*    ->    \*
+    ->    \+
?    ->    \?
{    ->    \{

So if your URL was https://www.google.com/stuff+things, you would search using this syntax ...

/.*https://www\.google\.com/stuff\+things.*/
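
If you have more than a couple of URLs to convert, the escaping can be scripted. This is a minimal sketch that applies only the replacements listed in the table above and then wraps the result in `/.*` and `.*/`. The function name `to_confluence_regex` is made up for this example, and whether Confluence accepts the resulting search string may depend on your version.

```python
# Characters taken from the replacement table above; nothing else is escaped.
RESERVED = set("\\.()[^$|*+?{")


def to_confluence_regex(url):
    """Backslash-escape the listed characters and wrap the URL as /.*<url>.*/"""
    escaped = "".join("\\" + ch if ch in RESERVED else ch for ch in url)
    return "/.*" + escaped + ".*/"


print(to_confluence_regex("https://www.google.com/stuff+things"))
```

For the example URL this prints the same search string as shown above.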

Comment

@Davin Studer

This is really interesting (and I didn't know about regex in Confluence at all). But would this show me instances where the full URL appears on the page, or where the URL occurs in the source?

Comment

I tried that and I got a system error and a HUGE stack trace; am self-hosted running Confluence 6.9.3 on CentOS6.

System Error

Cause

java.lang.IllegalArgumentException: integer expected at position 3
at org.apache.lucene.util.automaton.RegExp.parseRepeatExp(RegExp.java:896)

Stack Trace:[hide]

java.lang.IllegalArgumentException: integer expected at position 3
 at org.apache.lucene.util.automaton.RegExp.parseRepeatExp(RegExp.java:896)
 at org.apache.lucene.util.automaton.RegExp.parseConcatExp(RegExp.java:880)

... etc ....

    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1539)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1495)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:748)

Comment

I also am trying to search for a URL in our Confluence site.

We are migrating a Twiki to Confluence.

The Twiki URLs are of the form:

https://twiki.company.com

I have tried this search which works:

/.*twiki.*/

Which will find those URLs, but also every other instance of Twiki.

This search does not work:

/.*twiki\.company.*/

Which doesn't make sense. I've escaped the full stop.

Any ideas?

Comment

5 votes

You guys understand that not being able to find a basic text string in your pages is a pretty major failing of a data repository in this day and age, right? Sure, there are workarounds. Sure, you can use a regex. But critically, when I type a string I know is in there and can't find it, I start asking some very fundamental questions about this product. Strongly recommend you get the search tools up to a point that they meet basic expectations of a search tool. This has been one of my biggest issues with Confluence, and the kind of thing that would lead us to consider alternatives.

Comment

Even Sharepoint has a way to dredge the entire contents and Find and Replace. This is 1980's technology, and it is shocking that it is not possible in this tool.

Comment

4 votes

For example, if the external site URL to search for is https://demo.site.com/....., this worked in Confluence for me:

http*demo*site*com*

Regards,

Ankit

Comment

This solution works and is simple in application.

Comment

I can't thank you enough for this solution! This worked great!

Comment

Does this search still work for anyone?

We are migrating a Twiki to Confluence.

The Twiki URLs are of the form:

https://twiki.company.com

I have tried this which does not work.

*twiki*

There are instances of twiki in the pages being searched.

Any ideas?

Comment

Try, for example, https*twiki*company*com or any combination of words that make up your url. The more words you use, the more accurate results you will get.

Comment

Just to extend/expand upon the previous response, take the full URL you want to search and substitute any special characters with an asterisk (*).

https://mysite/content/fruit.html?search=apples

would become:

https*mysite*content*fruit*html*search*apples
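
The substitution can be sketched in a few lines of Python. This assumes "special characters" means anything that is not a letter or digit, and it collapses each run of them into a single asterisk. The function name `url_to_wildcard_query` is made up for this example.

```python
import re


def url_to_wildcard_query(url):
    """Replace each run of non-alphanumeric characters with a single '*'."""
    query = re.sub(r"[^A-Za-z0-9]+", "*", url)
    # Drop any leading '*': a comment further down this thread reports that
    # Confluence search fails silently when a term starts with a wildcard.
    return query.lstrip("*")


print(url_to_wildcard_query("https://mysite/content/fruit.html?search=apples"))
```

For the URL above this prints the same search term as shown. A URL that ends in a slash, such as https://web.microsoftstream.com/video/, comes out with a trailing asterisk.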

Pro Tip: Go directly to the advanced search page (at <confluence site>/dosearchsite.action) and put some restrictions on the search before running.

Comment

Thanks for this @Richard Cross

I tried doing this but it's not working. I have pages I know are 100% using my url but are not being returned in the search. It is returning only 14 results and I know there are at least 20, so I cannot rely on the results at all. I wonder if it excludes Space home pages in the results?

I am looking for any instance where a page includes part of the old microsoft stream link (https://web.microsoftstream.com/video/....)

I understand that * (asterisk) is required at the end as a wildcard because each URL will have a different path after 'video/'

I have tried regex search and your method with no real luck. Any ideas?

https*web*microsoftstream*com*video* - (14 results)

https*web*microsoftstream*com*video - (1 result)

"https*web*microsoftstream*com*video*" - (no results)

"https://web.microsoftstream.com/video*" - (14 results)

"https://web.microsoftstream.com/video/" - (14 results)

"https*web*microsoftstream*com*video/*" - (no results)

https://web.microsoftstream.com/video/ - (any page with 'video')

https*web*microsoftstream*com*video/* - (any page with 'web', 'video' and 'stream')

/.*https://web.microsoftstream.com/video.*/ - (any page with word 'video' and 'stream' in)

/.*https://web\.microsoftstream\.com/video.*/ - (any page with the word 'video' in)

/.*https://web\.microsoftstream\.com/video*.*/ - (any page with the word 'video' in)

Comment

1 vote

It's simply a showstopper to have a wiki where you cannot find all occurrences of the links you want to update. How are you supposed to keep links updated in your system?

Again, this is just a result of the childish "ephemeral chat" perspective rather than that of a serious, professional system for producing and maintaining knowledge.

The response "We had an existing feature request to extend the search capabilities, but this has since been closed." is just unacceptable and rude. I know you don't care; you don't need to say it to my face...

Comment

0 votes

Export to Word is, in my opinion, the easiest way to find the links. Important: check both of the URL checkboxes on the content tab (see attached screenshot).

Afterwards, you can find all the links with a simple text search in the resulting docx file.

Comment

0 votes

If you have access to the Reporting add-on, you can create a report to find all these links. I wrote about doing that here: https://community.atlassian.com/t5/Confluence-articles/Finding-and-fixing-broken-links-with-Reporting-for-mere-mortals/ba-p/1334589

Comment

Ironically, the link to a recipe to fix broken links was broken by some trailing characters ;-)

Searching for it found it https://community.atlassian.com/t5/Confluence-articles/Finding-and-fixing-broken-links-with-Reporting-for-mere-mortals/ba-p/1334589

Comment

Could someone PLEASE expand on how you created a report, what Service Rocket is and how to DO all these things?

This article assumes so many bits of knowledge I have no idea where to start.

What Macros are required? And / or what addons are required?

I am pretty new to Confluence, and landed here because I need to try to find URLs in a site we are migrating.

Links to information are totally fine.

Comment

Service Rocket is the producer of the Confluence plugin Reporting for Confluence.

It is very powerful but not easy to use for a newbie or non-developer - it is basically "programming by macro design" - and quite expensive.

The article shows you all the macros required and how to nest them for this specific use case, but this requires the plugin. The plugin documentation contains plenty of other use cases with "recipes".

Comment

Thank you.

Since Atlassian search should be able to find things like URLs trivially, I will keep bugging them about actually getting search to work.

A plugin as described above is overkill for something that should work anyway.

Comment

As a follow up to this, I found a section in the help for Confluence search that says basically, you can't use an * or ? at the start of a search term. The search will just fail silently.

Hence my attempts at *twiki*company*com were just failing silently.

@GJP suggested https*twiki*company*com which does indeed work.

But the basic takeaway is that an * or ? at the start of a search term will silently fail with Confluence's search implementation.
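
Since the failure is silent, it can help to check a search string before pasting it in. This is a minimal sketch that flags any term starting with * or ?. It assumes terms are separated by whitespace, and the function name `terms_with_leading_wildcard` is made up for this example.

```python
def terms_with_leading_wildcard(query):
    """Return the whitespace-separated terms that begin with '*' or '?'."""
    return [term for term in query.split() if term[0] in "*?"]


for candidate in ("*twiki*company*com", "https*twiki*company*com"):
    bad = terms_with_leading_wildcard(candidate)
    print(candidate, "-> leading wildcard in:", bad if bad else "none")
```

An empty list means no term starts with a wildcard. It does not guarantee that the search finds every page.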
