It's not the same without you

Join the community to find out what other Atlassian users are discussing, debating and creating.

Atlassian Community Hero Image Collage

How do I search Confluence for an external URL?

I need to find all the instances of an external URL in our wiki. Is there any way to do this through an advanced search?

4 answers

1 accepted

1 vote
Answer accepted

Hi Mandy,

There are some limitations with the Confluence search.  We had an existing feature request to extend the search capabilities, but this has since been closed.  Your best bet will be to use the workaround described here: https://confluence.atlassian.com/confkb/how-to-perform-a-confluence-site-search-for-keywords-and-links-through-the-database-830284252.html.  This involves using the Confluence database instead.

Kind regards,

Miranda Rawson

 

The link above is not working.

AnnWorley Atlassian Team Jul 12, 2018

Here is an updated link: How to perform a Confluence site search for keywords and links through the database

In case there is an issue accessing it, here is an excerpt:

For an example, an external URL was inserted to numerous pages but as time goes by, those URL might point to a dead link as there might be some changes in the subdomain/URL path.

Solution
To search for these contents, run the SQL query below on your Confluence database. Replace the <INSERT_KEYWORD_HERE> with your keyword. The % symbol represents a wildcard search.
The SQL results will return the content type, title along with the space details including the spacestatus (either CURRENT or ARCHIVED). If a space is Archived, it won't be searchable in Confluence's User Interface.

select c.CONTENTTYPE,c.TITLE, s.SPACENAME, s.SPACEKEY, s.SPACETYPE, s.SPACESTATUS 
from content c join spaces s on c.SPACEID=s.SPACEID
where CONTENTID in
(select CONTENTID from bodycontent where BODY like '%<INSERT_KEYWORD_HERE>%')
Like Michael_Kortrey likes this
6 votes
Davin_Studer Community Leader Mar 13, 2018

There is a little known feature of Confluence search in that is can also do regular expression searches. Try doing this for your search ...

/.*{your url here}.*/

Now, you will have to format the url to escape out any regular expression reserved characters. The biggest one would be periods. but if you have any of these characters in the url you would need to change them. See below for replacements

\    ->    \\
. -> \.
( -> \(
) -> \)
[ -> \[
^    ->    \^
$    ->    \$
|    ->    \|
*    ->    \*
+    ->    \+
?    ->    \?
{    ->    \{

So if you url was https://www.google.com/stuff+things you would search using this syntax ...

/.*https://www\.google\.com/stuff\+things.*/

@Davin_Studer

This is really interesting (and I didn't know about regex in Confluence at all). But would this show me the instances of URL if the full URL appeared on the page OR would this show me if the URL occurred in the source?

I tried that and I got a system error and a HUGE stack trace; am self-hosted running Confluence 6.9.3 on CentOS6.

 

logo System Error

Cause

java.lang.IllegalArgumentException: integer expected at position 3
    at org.apache.lucene.util.automaton.RegExp.parseRepeatExp(RegExp.java:896)

Stack Trace:[hide]

java.lang.IllegalArgumentException: integer expected at position 3
 at org.apache.lucene.util.automaton.RegExp.parseRepeatExp(RegExp.java:896)
 at org.apache.lucene.util.automaton.RegExp.parseConcatExp(RegExp.java:880)

... etc ....

    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1539)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1495)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:748)


 

For example if external site url to search is https://demo.site.com/..... . This worked in confluence for me : 

http*demo*site*com*

 

Regards,

Ankit

This solution works and is simple in application.

You guys understand that not being able to find a basic text string in your pages is a pretty major failing of a data repository in this day and age, right?  Sure, there are workarounds.  Sure, you can use a regex.  But critically, when I type a string I know is in there and can't find it, I start asking some very fundamental questions about this product.  Strongly recommend you get the search tools up to a point that they meet basic expectations of a search tool.  This has been one of my biggest issues with Confluence, and the kind of thing that would lead us to consider alternatives.

Even Sharepoint has a way to dredge the entire contents and Find and Replace.  This is 1980's technology, and it is shocking that it is not possible in this tool.

Suggest an answer

Log in or Sign up to answer
TAGS
Community showcase
Posted in Confluence

What project did you transition or start on Confluence with the shift to remote work?

It’s been great to hear from fellow users over the last few weeks about the best tips and fun moments you’ve had working on Confluence since the transition to working remote. I’d love to keep the c...

353 views 11 11
Join discussion

Community Events

Connect with like-minded Atlassian users at free events near you!

Find an event

Connect with like-minded Atlassian users at free events near you!

Unfortunately there are no Community Events near you at the moment.

Host an event

You're one step closer to meeting fellow Atlassian users at your local event. Learn more about Community Events

Events near you