How to get Apache Tika / POI to work in Confluence plugin

Philipp Steinwender
Rising Star
April 9, 2013

Hello!

I'm trying to use Apache Tika in a Confluence plugin I'm developing.

The problem seems to be dependency issues, as Tika has a lot of dependencies.

I also don't know much about OSGi or how Confluence handles it.

When using the tika-bundle dependency, I get this exception:

ClassNotFoundException: org.apache.xmlbeans.XmlException

When using the tika-parsers dependency instead, the following error occurs:

ClassNotFoundException: org.w3c.dom.Node

Has anybody here already included Tika in a plugin?
How can I locate the problem and fix it?

Please help, I'm stuck.

Confluence versions 4.3 and 5.1

2 answers

1 accepted

0 votes
Answer accepted
Philipp Steinwender
April 11, 2013

My issues were related to OSGi package imports. The plugin loads classes through factories at runtime, so Confluence could not automatically detect the required package imports at build time, which caused the errors at runtime.

As the intention was to extract the textual content of MS Office files the way the OfficeConnector does, here is how I got it to work:

One further issue is that a Confluence plugin has to use a certain Xerces version, as mentioned in this talk: http://www.atlassian.com/company/about/events/summit/2010/presentations/under-the-hood/plugins2-and-osgi-gotchas.jsp. As POI (or Tika) uses Xerces, it was necessary to find a POI version that matches the Xerces version of Confluence.

By looking at the atlassian-plugin.xml and pom.xml of the OfficeConnector plugin, I found out which dependencies work:

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.5-FINAL</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>3.5-FINAL</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.5-FINAL</version>
    <exclusions>
        <exclusion>
            <groupId>stax</groupId>
            <artifactId>stax-api</artifactId>
        </exclusion>
        <exclusion>
            <groupId>xml-apis</groupId>
            <artifactId>xml-apis</artifactId>
        </exclusion>
    </exclusions>
</dependency>

With these dependencies, I can extract text from PowerPoint, Excel and Word files in the current formats.

The next step was to manually write the OSGi Import-Package section in the atlassian-plugin.xml.

In fact, I merged the generated entries, which you can find in the OSGi browser in Confluence's admin section, with the entries from the OfficeConnector's atlassian-plugin.xml.

And that worked.
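
A minimal sketch of what such a manual Import-Package section can look like, assuming the `bundle-instructions` element under `plugin-info` in atlassian-plugin.xml. The package names below are illustrative assumptions only, not the actual list used in this answer; the real entries should come from the OSGi browser and the OfficeConnector's atlassian-plugin.xml, as described above.

```xml
<!-- Fragment of atlassian-plugin.xml; the other plugin-info children are omitted. -->
<plugin-info>
    <bundle-instructions>
        <!-- Assumed example packages: XML APIs that the host provides and that
             are only reached through factories at runtime, so the automatic
             scan at build time does not see them. -->
        <Import-Package>
            org.w3c.dom,
            org.xml.sax,
            org.xml.sax.helpers,
            javax.xml.parsers,
            *;resolution:=optional
        </Import-Package>
    </bundle-instructions>
</plugin-info>
```

The trailing `*;resolution:=optional` entry is meant to keep the automatically detected imports, while the explicit entries add packages that the scan would otherwise miss.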

Sameer V November 8, 2017

Thanks for sharing the solution Philipp!

So, you got Tika working for parsing Office files using the above dependencies? I know it's been a long time, but if you remember, could you please share some more details?

I'm trying to integrate with Tika as well, any info you provide would be really helpful!

https://community.developer.atlassian.com/t/confluence-plugin-apache-tika-parser-issue-emptyparser/10847

0 votes
Mizan
April 9, 2013

You will need to add the dependency to the pom.xml of your plugin; refer to http://www.sonatype.com/books/mvnex-book/reference/customizing-sect-add-depend.html

You might first need to install Tika in your local Maven repository.

Philipp Steinwender
April 9, 2013

Do I understand you correctly that I should add the Tika dependency to the pom.xml?
I've done that.

The issue might be that Tika has a lot of transitive dependencies that are already shipped with Confluence, but in different versions.

I can compile the plugin, and atlas-run starts as well.

But when Tika gets called, it cannot find these classes.
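
One way to approach conflicting transitive dependencies is to exclude them in the pom.xml, following the exclusion pattern shown for poi-ooxml in the accepted answer. The sketch below applies that pattern to tika-parsers as an assumption; it is not a confirmed fix for Tika. The `${tika.version}` property is a hypothetical placeholder, and whether these artifacts actually appear in your dependency tree should be checked first with `mvn dependency:tree`.

```xml
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <!-- Hypothetical property; define it in the <properties> section of your pom.xml. -->
    <version>${tika.version}</version>
    <exclusions>
        <!-- Same exclusions as used for poi-ooxml in the accepted answer;
             assumed here, not verified against Tika's dependency tree. -->
        <exclusion>
            <groupId>stax</groupId>
            <artifactId>stax-api</artifactId>
        </exclusion>
        <exclusion>
            <groupId>xml-apis</groupId>
            <artifactId>xml-apis</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```

Excluded packages are then no longer bundled with the plugin, so they have to be imported from the host through OSGi instead.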

Mizan
April 9, 2013

The exceptions are caused by the missing classes org.apache.xmlbeans.XmlException and org.w3c.dom.Node.

Maybe you will need to add dependencies for these classes.
