How to get Apache Tika / POI to work in Confluence plugin

Hello!

I'm trying to use Apache Tika in a Confluence plugin I'm developing.

The problem seems to be dependency issues, as Tika has a lot of them.

I also don't know much about OSGi or how Confluence handles it.

When using the tika-bundle dependency, I get this exception:

ClassNotFoundException: org.apache.xmlbeans.XmlException

When using the dependency tika-parsers the following error occurs:

ClassNotFoundException: org.w3c.dom.Node

Has anybody here already included tika in a plugin?
How can I locate the problem and fix it?

Please help, I'm stuck.

Confluence versions 4.3 and 5.1

2 answers

0 votes

Answer accepted

My issues were related to OSGi package imports. The plugin loads classes through factories at runtime, so Confluence could not detect the required package imports automatically at build time, which caused the errors at runtime.

Since my goal was to extract the text content of MS Office files the way the OfficeConnector does, here is how I got it to work:

One further issue is that a Confluence plugin has to use a certain Xerces version, as mentioned in this talk: http://www.atlassian.com/company/about/events/summit/2010/presentations/under-the-hood/plugins2-and-osgi-gotchas.jsp. As POI (or Tika) uses Xerces, it was necessary to find a POI version that matches the Xerces version of Confluence.
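To locate which classes are affected, a small probe like the one below can help. This is only a sketch: the class name `ClassVisibilityCheck` and the helper `isVisible` are made up for illustration, and the two class names are simply the ones from the exceptions in the question. Inside a plugin you would pass the class loader of one of your own plugin classes, because that is the loader OSGi restricts to the imported packages.

```java
public class ClassVisibilityCheck {

    /** Returns true if the named class can be loaded through the given class loader. */
    public static boolean isVisible(String className, ClassLoader loader) {
        try {
            // initialize=false: only resolve the class, do not run its static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not visible from this loader (for a bundle: neither embedded nor imported)
            return false;
        }
    }

    public static void main(String[] args) {
        // In a plugin, use the loader of a plugin class instead of this demo class.
        ClassLoader loader = ClassVisibilityCheck.class.getClassLoader();
        String[] names = {
            "org.apache.xmlbeans.XmlException",
            "org.w3c.dom.Node"
        };
        for (String name : names) {
            System.out.println(name + " visible: " + isVisible(name, loader));
        }
    }
}
```

Note that results differ by environment: on a plain JVM `org.w3c.dom.Node` is typically found in the JDK, while inside an OSGi bundle it is usually only visible if the package is listed in Import-Package. That difference is why the plugin compiles but fails at runtime.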

By looking at the atlassian-plugin.xml and pom.xml of the OfficeConnector plugin, I found out which dependencies work:

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.5-FINAL</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>3.5-FINAL</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.5-FINAL</version>
    <exclusions>
        <exclusion>
            <groupId>stax</groupId>
            <artifactId>stax-api</artifactId>
        </exclusion>
        <exclusion>
            <groupId>xml-apis</groupId>
            <artifactId>xml-apis</artifactId>
        </exclusion>
    </exclusions>
</dependency>

With these dependencies, I can extract text from PowerPoint, Excel and Word files in the current formats.

The next step was to manually write the OSGi Import-Package section in the atlassian-plugin.xml.

In fact, I merged the generated entries, which you can find in the OSGi browser in Confluence's admin section, with the entries from the OfficeConnector's atlassian-plugin.xml.

And that worked.
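The merge step can be done by hand, but with long headers it is easy to duplicate a package. Below is a rough sketch of a helper for it. `ImportPackageMerger` and its methods are hypothetical names, the header values in `main` are illustrative and not the real OfficeConnector entries, and the code assumes each clause names a single package followed by optional `;` attributes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ImportPackageMerger {

    /** Splits a header on commas that are not inside double quotes (e.g. version ranges). */
    public static List<String> splitClauses(String header) {
        List<String> clauses = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : header.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            if (c == ',' && !inQuotes) {
                addClause(clauses, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addClause(clauses, current);
        return clauses;
    }

    private static void addClause(List<String> clauses, StringBuilder text) {
        String clause = text.toString().trim();
        if (!clause.isEmpty()) {
            clauses.add(clause);
        }
    }

    /** Package name of a clause: everything before the first ';'. */
    static String packageName(String clause) {
        int semicolon = clause.indexOf(';');
        return (semicolon < 0 ? clause : clause.substring(0, semicolon)).trim();
    }

    /** Merges two headers; if both name the same package, the clause from 'preferred' wins. */
    public static String merge(String preferred, String other) {
        Map<String, String> byPackage = new LinkedHashMap<>();
        for (String clause : splitClauses(preferred)) {
            byPackage.putIfAbsent(packageName(clause), clause);
        }
        for (String clause : splitClauses(other)) {
            byPackage.putIfAbsent(packageName(clause), clause);
        }
        return String.join(",", byPackage.values());
    }

    public static void main(String[] args) {
        // Illustrative values only; paste the real headers from the OSGi browser and the plugin descriptor.
        String handWritten = "org.apache.xmlbeans.*;resolution:=optional,org.w3c.dom";
        String generated = "org.w3c.dom;version=\"[1.0,2.0)\",org.xml.sax";
        System.out.println(merge(handWritten, generated));
    }
}
```

Which side should win on a conflict is a judgment call. Here the first argument takes precedence, so pass whichever list you trust more as `preferred`.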

Thanks for sharing the solution Philipp!

So, you got Tika working for parsing Office files using the above dependencies? I know it's been a long time, but if you remember, could you please share some more details?

I'm trying to integrate with Tika as well, any info you provide would be really helpful!

https://community.developer.atlassian.com/t/confluence-plugin-apache-tika-parser-issue-emptyparser/10847

0 votes

You will need to add the dependency to the pom.xml of your plugin; see http://www.sonatype.com/books/mvnex-book/reference/customizing-sect-add-depend.html

You might first need to install Tika in your local Maven repository.

Do I understand you correctly that I should add the Tika dependency to the pom.xml?
I've done that.

The issue might be that Tika has a lot of transitive dependencies that are already shipped with Confluence, but in different versions.

I can compile the plugin, and atlas-run starts too.

But when Tika gets called, it cannot find these classes.

The exceptions are caused by org.apache.xmlbeans.XmlException and org.w3c.dom.Node.

Maybe you will need to add dependencies for these classes.
