I need a SQL query to search the contents of all attachments, including file history revisions, for a string.
I am using a SQL Server database. I need the query to return the filename of every attachment that contains a specific string. I want to search attachments on all pages and blogs, across all versions.
You can't, and there are two reasons.
First, some attachments are not directly readable. Imagine ZIP or PDF files: you need software to open and read them, so SQL would not be able to find plain text in them.
Second, attachments are not held in the database (unless you're on an old unsupported version of Confluence where you've chosen to enable that). So SQL can't find them, as they won't be where it can look.
Do you think getting the attachment file through REST API using attachment id and searching for a specific string is a good idea?
1. I get the attachment:
2. Then I need to use a script to search for a specific string in the attachment.
But I need to do this for all the attachments in all spaces and blogs. Do you think this sounds like a good idea or do you have another solution for this?
That is a good approach, but you will still need to think about how you "open" (i.e. read) attachments after you have downloaded them.
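As a rough illustration of the "opening" problem, here is a minimal Python sketch that checks already-downloaded attachment bytes for a string. The function name `contains_text` is hypothetical. It assumes the attachment is either plain text or a ZIP-based container (which includes .docx and .xlsx); PDFs and other binary formats would need a dedicated parser and are not handled here.

```python
import io
import zipfile


def contains_text(data: bytes, needle: str) -> bool:
    """Return True if needle occurs in the raw bytes or inside any ZIP member.

    Assumption: matching is a case-insensitive byte search (ASCII letters only).
    Text in Office files can be split across XML tags, so this may miss matches.
    """
    target = needle.lower().encode("utf-8")

    # Plain-text attachments: search the raw bytes directly.
    if target in data.lower():
        return True

    # ZIP-based attachments: decompress each member and search it.
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            for name in archive.namelist():
                if target in archive.read(name).lower():
                    return True
    return False
```

You would call this once per downloaded attachment and record the filename whenever it returns True.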
I should have said before, though, that Confluence can index the contents of attachments, as long as they are in a format it understands. So you might want to consider using the built-in search. That can also be done over REST, with CQL.
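A minimal Python sketch of what such a CQL request could look like. It only builds the URL for the `/rest/api/content/search` endpoint; the helper name, the example base URL, and the `limit`/`start` paging values are assumptions, and you should confirm against your own Confluence version that the `text` field matches indexed attachment contents.

```python
from urllib.parse import urlencode


def build_attachment_search_url(base_url: str, term: str,
                                limit: int = 25, start: int = 0) -> str:
    """Build a CQL search URL restricted to attachments (hypothetical helper)."""
    # Escape backslashes and double quotes so the term stays inside the CQL string.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    cql = f'type = attachment AND text ~ "{escaped}"'
    query = urlencode({"cql": cql, "limit": limit, "start": start})
    return f"{base_url.rstrip('/')}/rest/api/content/search?{query}"


if __name__ == "__main__":
    # Example base URL is a placeholder, not a real instance.
    print(build_attachment_search_url("https://confluence.example.com", "invoice"))
```

Sending a GET request to that URL with your credentials, and increasing `start` until no more results come back, should list the matching attachments. Note that the index normally covers current content, so this is unlikely to reach old attachment versions.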
Thank you very much Nic for your support.
I am wondering if I could get the "extracted_text" of the attachment files through a SQL query?
Please refer to the following link for my reference to extracted_text:
When a text based file is uploaded in Confluence (for example Word, PowerPoint, etc), its text is extracted and indexed so that people can search for the content of a file, not just the filename. We store the extracted text so that when that file needs to be reindexed, we don't need to re-extract the content of the file.
The extracted text file will be named with the version number, for example 2.extracted_text, and stored alongside the file versions themselves (within level 8 in the explanation above). We only keep the extracted text for the latest version, not earlier versions of a file.
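Since these files live on disk next to the attachments rather than in the database, one option is to scan the attachments directory instead of using SQL. A minimal Python sketch follows, assuming you have file-system access to the Confluence home directory; the function name and the root path you pass in are hypothetical, and only the `.extracted_text` suffix comes from the description above.

```python
import os


def find_extracted_text_matches(attachments_root: str, needle: str) -> list[str]:
    """Return paths of *.extracted_text files under the root that contain needle."""
    needle_lower = needle.lower()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(attachments_root):
        for filename in filenames:
            # Only look at the extracted-text companions, e.g. "2.extracted_text".
            if not filename.endswith(".extracted_text"):
                continue
            path = os.path.join(dirpath, filename)
            # Assumption: the extracted text is readable as UTF-8.
            with open(path, encoding="utf-8", errors="replace") as handle:
                if needle_lower in handle.read().lower():
                    matches.append(path)
    return sorted(matches)
```

Two caveats: extracted text is kept only for the latest version, so this will not cover file history, and the results are storage paths rather than original filenames, so mapping them back to attachment names is a separate step.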