Junk HTML develops over time in Confluence pages

Laura Schneider
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
August 18, 2020

Is anyone else seeing this problem? Over time, I see a lot of junk HTML inserted into my Confluence pages.

I will work on a page, then come back to it and work on it further (applying the styles from the menu or just typing directly in Confluence). Over time, the HTML source (as viewed in the Confluence editor) collects "junk". For example: a non-breaking space gets inserted at random, part of a paragraph changes to a different color, the font changes to a different style (bold, italic), letter spacing changes, and so on. Rather than simple HTML, complex formatting gets inserted seemingly at random. The only way I could see this "junk" entering Confluence is if someone were copying text from a pre-formatted source into the page, and I've already ruled that out: as the sole editor of a page, I can close it, re-open it moments or days later, and see new junk in the HTML that I know I didn't enter (and there were no other editors according to the page history).

Are you seeing this problem?

Examples of simple HTML tags that got corrupted over time: (corrupted parts underlined)

  • <p><span style="color: rgb(0,0,0);">blah blah blah...
  • <h2><span style="letter-spacing: -0.008em;">Features</span></h2>
  • <p>Refer to the chapter&nbsp;<em>blah blah blah...
  • <p class="BodyText1">Blah blah blah&nbsp;blah blah blah...
  • <h6><strong><span style="color: rgb(94,108,132);">Blah blah blah...</span></strong></h6>
  • <p><span style="letter-spacing: 0.0px;">Blah blah blah...
  • <h2><span style="font-size: 20.0px;letter-spacing: -0.008em;">Blah blah blah...
  • Blah blah blah...<strong style="letter-spacing: 0.0px;">blah blah blah</strong>
  • Blah blah blah <em>blah </em>blah blah blah...
  • <li><span style="letter-spacing: 0.0px;">Blah blah blah

This junk, when exported to Word, creates an even bigger mess of styles that I have to manually correct and map to a single style (for example, Heading 1 or a body style).

Atlassian's position is that the only solution is to buy a third-party extension (not in my budget) that will fix the junk output (according to the sales pitches of the extension manufacturers...).

1 answer

2 votes
Bill Bailey
August 18, 2020

I have been a Confluence user for probably 8 years now. And the only time I have seen this type of thing is from people copying and pasting from Word or from another HTML view (even of a Confluence page).

Have you seen this with a different browser? Just trying to think of what could be affecting the entry of formatted text. Browser extension? TinyMCE extension?

My fix for this is the free version of the source editor and Regex to go through and cleanse a page.
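A rough sketch of what such a regex pass might look like in Python, run on HTML copied out of the source editor. The function name `cleanse` and both patterns are assumptions for illustration, not the exact patterns used here; they target two kinds of junk from the question (inline `style` attributes and stray `&nbsp;` entities).

```python
import re

# Hypothetical patterns for illustration; adjust them to the junk you actually see.
STYLE_ATTR = re.compile(r'\s+style="[^"]*"', re.IGNORECASE)  # inline style="..." attributes
NBSP = re.compile(r"&nbsp;")                                 # non-breaking space entities


def cleanse(html: str) -> str:
    """Drop inline style attributes and turn &nbsp; into a plain space."""
    html = STYLE_ATTR.sub("", html)
    html = NBSP.sub(" ", html)
    return html


sample = 'Blah blah blah...<strong style="letter-spacing: 0.0px;">blah blah blah</strong>'
print(cleanse(sample))
# Blah blah blah...<strong>blah blah blah</strong>
```

Note that this replaces every `&nbsp;`, including any that were put there on purpose, so it is worth checking the result before pasting it back into the page.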

Laura Schneider
August 18, 2020

Hm. That's a thought... I can't swear nobody's been copying from one Confluence page to another... We know better than to copy from Word or the Internet or anything, but I suppose I haven't specifically prohibited copying from one page to another within Confluence. I'll look into that, thanks!

It's definitely not browser- (or platform-) specific; I'm seeing this on hundreds of pages. Regex-fixing each page <shudder>... I suppose that'll be my best solution. Ouch.

Thanks, Bill! I'll check into this.

Bill Bailey
August 18, 2020

Well, maybe after you educate users and then clean up pages, it will stop. I have a long page of various RegEx patterns I use to clean out crap, for example, span tags:

</?span.*?>
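For anyone scripting this instead of using the editor's find-and-replace, here is a minimal Python sketch of that span pattern. `LOOSE` is the pattern as posted; `STRICT` and the helper name `strip_spans` are assumptions added for illustration. The stricter variant only differs in two edge cases: it leaves alone tags whose names merely start with "span", and it still matches when the tag's attributes wrap onto a second line.

```python
import re

LOOSE = re.compile(r"</?span.*?>")       # the pattern as posted above
STRICT = re.compile(r"</?span\b[^>]*>")  # assumed tighter variant, not from the thread


def strip_spans(html: str, pattern: re.Pattern = STRICT) -> str:
    """Remove opening and closing span tags but keep the text inside them."""
    return pattern.sub("", html)


heading = '<h2><span style="letter-spacing: -0.008em;">Features</span></h2>'
print(strip_spans(heading, LOOSE))   # <h2>Features</h2>
print(strip_spans(heading, STRICT))  # <h2>Features</h2>

# A tag broken across two lines: "." does not match a newline by default,
# so LOOSE skips the opening tag while STRICT still removes it.
wrapped = '<span\nstyle="color: rgb(0,0,0);">text</span>'
print(repr(strip_spans(wrapped, LOOSE)))   # '<span\nstyle="color: rgb(0,0,0);">text'
print(repr(strip_spans(wrapped, STRICT)))  # 'text'
```

Either pattern removes every span on the page, so any span that carries formatting you want to keep will be lost too.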

Have fun!

Roman Katzer
I'm New Here
Those new to the Atlassian Community have posted less than three times. Give them a warm welcome!
January 20, 2021

Hi Bill and Laura,

Could you share your regexes and additional wisdom?

We're in a similar spot (hundreds of requirements copy/pasted from Excel, Word, PDF and HTML, including tables - plus text first colored red, then black ("Black must be the default!?")). You can't imagine the mess...

Thanks!

Laura Schneider
January 20, 2021

Hi Roman,

I do feel your pain. Unfortunately, I have no additional wisdom to add beyond what Bill Bailey said: no copying-pasting or you'll get junk. It's an epic fail on the part of Atlassian, because training developers to not copy-paste is not a solution to bad software design on the part of Atlassian. Developers are not technical writers and have no idea what a style or stylesheet is or how it should be used. And they shouldn't need to know this.

Unfortunately, our respective organizations are using a tool not meant for requirements or tech docs development/storage/output. It is for collaboration on ideas, taking meeting notes, etc. Here is their page describing how it could be used (https://www.atlassian.com/software/confluence), but it overstates the usefulness when it comes to collaboration: what good is collaborating on creating information if it cannot be output to common tools like Word? And cannot take input from any outside sources without creating stylistic chaos? <shrug>

This is a failure to use the tool as it was intended: taking meeting notes and basic blogging with no intention of output or formatting consistency. Sorry I don't have better news for you.

Best of luck,

Laura
