
SQL query to search the contents of all attachments, including file history revisions, for a string

ilakkuvaselvi manoharan June 16, 2018

I need a SQL query to search the contents of all attachments, including file history revisions, for a string.

I am using a SQL Server database. I need the query to return the names of the files that contain a specific string. I want to search the attachments on all pages and blog posts, across all versions.

 

 

2 answers

0 votes
Vijay Sv November 8, 2019

Use something like this if you're looking for content attachments.

SELECT 'All Content Attachments', count(*) FROM [dbo].[CONTENT] WHERE CONTENTTYPE = 'ATTACHMENT';

0 votes
Nic Brough -Adaptavist-
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
June 16, 2018

You can't, for two reasons.

First, some attachments are not directly readable. Think of zip or PDF files: you need software to open and read them, so SQL would not be able to find plain text in them.

Second, attachments are not held in the database (unless you're on an old, unsupported version of Confluence where you've chosen to enable that). So SQL can't find them, because they aren't where it can look.

ilakkuvaselvi manoharan June 16, 2018

Thank you, Nic! I appreciate your help. If I want to search the contents of files/attachments in Confluence, can you please suggest the best way to do it?

ilakkuvaselvi manoharan June 16, 2018

Do you think getting each attachment file through the REST API, using the attachment ID, and then searching it for a specific string is a good idea?

1. I get the attachment.

2. Then I use a script to search for a specific string in the attachment.

But I need to do this for all the attachments in all spaces and blogs. Does this sound like a good idea, or do you have another solution?
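For anyone wanting to try the two steps above, here is a minimal Python sketch for a single page. It assumes the Confluence REST endpoint `/rest/api/content/{id}/child/attachment` and the `_links.download` field in its response; `base_url`, `page_id` and `auth_header` are placeholders, and the function names are made up for illustration. It does not follow pagination of the listing, and it only searches raw bytes, so it is meaningful for plain-text attachments only.

```python
import json
import urllib.request


def fetch(url: str, auth_header: str) -> bytes:
    """GET a URL and return the raw response body."""
    request = urllib.request.Request(url, headers={"Authorization": auth_header})
    with urllib.request.urlopen(request) as response:
        return response.read()


def contains_text(data: bytes, needle: str) -> bool:
    """Case-insensitive search of raw bytes (ASCII case folding only)."""
    return needle.lower().encode("utf-8") in data.lower()


def matching_attachments(base_url, page_id, needle, auth_header, get=fetch):
    """Return titles of attachments on one page whose raw bytes contain needle.

    `get` is injectable so the logic can be exercised without a server.
    """
    # Step 1: list the attachments of the page (first page of results only).
    listing_url = f"{base_url}/rest/api/content/{page_id}/child/attachment"
    listing = json.loads(get(listing_url, auth_header))
    matches = []
    for attachment in listing.get("results", []):
        # Assumption: the download link is relative to base_url.
        download_url = base_url + attachment["_links"]["download"]
        # Step 2: download the file and search its bytes.
        if contains_text(get(download_url, auth_header), needle):
            matches.append(attachment["title"])
    return matches
```

Covering every space and blog would mean looping this over every page ID, which is a lot of downloads on a large instance.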

Nic Brough -Adaptavist-
Community Leader
June 16, 2018

That is a good approach, but you will still need to think about how you "open" (i.e. read) attachments after you have downloaded them.

I should have said before, though: Confluence can index the contents of attachments, as long as they are in a format it understands. So you might want to consider using the built-in search. That can also be done over REST, with CQL.
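As a rough illustration of the CQL route, the sketch below only builds the search URL. It assumes the `/rest/api/content/search` endpoint with a `cql` query parameter and the CQL clause `type = attachment AND text ~ "..."`; whether `text` matches inside a given attachment depends on what the instance has indexed. The function name and `base_url` are hypothetical.

```python
from urllib.parse import urlencode


def build_cql_search_url(base_url: str, phrase: str, limit: int = 25) -> str:
    """Build a content-search URL that looks for `phrase` in attachments."""
    # Escape backslashes and double quotes so the phrase stays one CQL string.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    cql = f'type = attachment AND text ~ "{escaped}"'
    # urlencode percent-encodes the CQL so it is safe in a query string.
    query = urlencode({"cql": cql, "limit": limit})
    return f"{base_url}/rest/api/content/search?{query}"
```

The resulting URL can be requested with any authenticated HTTP client; the matching attachments come back in the `results` array of the JSON response.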

ilakkuvaselvi manoharan June 19, 2018

Thank you very much, Nic, for your support.

I am wondering whether I could get the "extracted_text" of the attachment files through a SQL query?

Please see the following link for what I mean by extracted_text:

https://confluence.atlassian.com/doc/hierarchical-file-system-attachment-storage-704578486.html

Extracted text files

When a text based file is uploaded in Confluence (for example Word, PowerPoint, etc), its text is extracted and indexed so that people can search for the content of a file, not just the filename. We store the extracted text so that when that file needs to be reindexed, we don't need to re-extract the content of the file.

The extracted text file will be named with the version number, for example 2.extracted_text, and stored alongside the file versions themselves (within level 8 in the explanation above).  We only keep the extracted text for the latest version, not earlier versions of a file. 
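Since the passage above describes the extracted text as files stored next to the attachment versions, it would be read from disk rather than through SQL. A minimal Python sketch, assuming read access to the attachments directory on the server and that the files are UTF-8 text (both are assumptions; `attachments_dir` and the function name are placeholders):

```python
from pathlib import Path


def find_in_extracted_text(attachments_dir, needle):
    """Return paths of *.extracted_text files that contain needle."""
    needle_lower = needle.lower()
    hits = []
    # Walk the whole directory tree; the files sit next to the version files.
    for path in sorted(Path(attachments_dir).rglob("*.extracted_text")):
        # errors="replace" keeps the scan going if a file is not valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle_lower in text.lower():
            hits.append(path)
    return hits
```

The paths returned identify files on disk, not attachment titles, so mapping a hit back to a filename is a separate step. As the quoted text says, only the latest version of each file has extracted text, so earlier versions are not covered.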

Regards,

Ilakk

Nic Brough -Adaptavist-
Community Leader
June 19, 2018

That will only work for files that Confluence has converted, i.e. those it was able to extract text from.
