Is anyone else seeing this problem? Over time, I see a lot of junk HTML inserted into my Confluence pages.
I will work on a page, then come back to it and work on it further (applying the styles from the menu or just typing directly in Confluence). Over time, the HTML source (as viewed in the Confluence editor) collects "junk". For example, a randomly inserted non-breaking space. Part of a paragraph will change to a different color, the font will change to a different style (bold, italic), letter spacing will change, etc. Rather than simple HTML, complex formatting gets inserted seemingly at random. The only way I could see this "junk" entering Confluence is if someone were copying text from a pre-formatted source into the page, which I've already ruled out: as the sole editor of a page I can close and re-open it moments or days later and see new junk in the HTML that I know I didn't enter (and there were no other editors according to the page history).
Are you seeing this problem?
Examples of simple HTML tags that got corrupted over time: (corrupted parts underlined)
This junk, when exported to Word, creates an even bigger mess of styles that I have to manually correct and map to a single (example) Heading 1 style or body style.
Atlassian's position is that the only solution is to buy a third-party extension (not in my budget) that will fix the junk output (according to the sales pitches of the extension manufacturers...).
I have been a Confluence user for probably 8 years now. And the only time I have seen this type of thing is from people copying and pasting from Word or from another HTML view (even of a Confluence page).
Have you seen this with a different browser? Just trying to think of what could be affecting the entry of formatted text. Browser extension? TinyMCE extension?
My fix for this is the free version of the source editor and Regex to go through and cleanse a page.
Hm. That's a thought... I can't swear nobody's been copying from one Confluence page to another... We know better than to copy from Word or the Internet or anything, but I suppose I haven't specifically prohibited copying from one page to another within Confluence. I'll look into that, thanks!
It's definitely not browser (or platform)-specific, I'm seeing this on hundreds of pages. Regex fixing each page <shudder> I suppose that'll be my best solution. Ouch.
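With hundreds of pages, it may help to triage before cleaning anything. The following is only a rough sketch, assuming you can get each page's source as a plain string (for example, pasted from the source editor). The `junk_report` helper and the pattern names are hypothetical, made up for illustration; it just counts the kinds of junk described earlier in the thread (span tags, inline styles, non-breaking spaces).

```python
import re

# Hypothetical patterns for the kinds of junk described in this thread.
# These are assumptions for illustration, not an official Confluence list.
JUNK_PATTERNS = {
    "span tags": re.compile(r"<span\b[^>]*>", re.IGNORECASE),
    "inline styles": re.compile(r"\sstyle\s*=\s*\"[^\"]*\"", re.IGNORECASE),
    "non-breaking spaces": re.compile(r"&nbsp;|\u00a0"),
}

def junk_report(html: str) -> dict[str, int]:
    """Count how often each junk pattern occurs in one page's source."""
    return {name: len(pattern.findall(html)) for name, pattern in JUNK_PATTERNS.items()}

# Made-up page source, only to show the shape of the report.
page = '<h1><span style="letter-spacing: 0.0px;">Title</span></h1><p>Body&nbsp;text</p>'
print(junk_report(page))
```

Pages with the highest counts would be the first candidates for a cleanup pass.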
Thanks, Bill! I'll check into this.
Well maybe after you educate users and then clean up pages, it will stop. I have a long page of various RegEx patterns I use to clean out crap, for example, span tags:
</?span.*?>
Have fun!
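For anyone who wants to try the span pattern outside the source editor, here is a minimal Python sketch. `SPAN_TAG` is the pattern quoted above; `SPAN_TAG_STRICT` and the `strip_span_tags` helper are assumptions added for illustration, not part of the original post. The stricter variant avoids matching other tag names that merely start with "span" and also handles tags whose attributes wrap across lines.

```python
import re

# Pattern quoted in the post above: matches opening and closing span tags.
SPAN_TAG = re.compile(r"</?span.*?>")

# Stricter variant (an assumption, not from the thread): \b stops it from
# matching tag names that only begin with "span", and [^>]* also matches
# line breaks inside the tag, which "." does not by default.
SPAN_TAG_STRICT = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)

def strip_span_tags(html: str, pattern: re.Pattern = SPAN_TAG_STRICT) -> str:
    """Remove span tags but keep the text they wrapped."""
    return pattern.sub("", html)

# Made-up sample resembling the junk described in this thread.
sample = '<p>Some <span style="color: rgb(0,0,0); letter-spacing: 0.0px;">plain</span> text</p>'
print(strip_span_tags(sample))
```

Note that this removes every span, including any that carry formatting you actually want, so it is worth checking a page's preview before saving.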
Hi Bill and Laura,
could you share your regexes and additional wisdom?
We're in a similar spot (hundreds of requirements copy/pasted from Excel, Word, PDF and HTML, including tables - plus text first colored red, then black ("Black must be the default!?")). You can't imagine the mess...
Thanks!
Hi Roman,
I do feel your pain. Unfortunately, I have no additional wisdom to add beyond what Bill Bailey said: no copying-pasting or you'll get junk. It's an epic fail on the part of Atlassian, because training developers to not copy-paste is not a solution to bad software design on the part of Atlassian. Developers are not technical writers and have no idea what a style or stylesheet is or how it should be used. And they shouldn't need to know this.
Unfortunately, our respective organizations are using a tool not meant for requirements or tech docs development/storage/output. It is for collaboration on ideas, taking meeting notes, etc. Here is their page describing how it could be used (https://www.atlassian.com/software/confluence), but it overstates the usefulness when it comes to collaboration: what good is collaborating on creating information if it cannot be output to common tools like Word? And cannot take input from any outside sources without creating stylistic chaos? <shrug>
This is a failure to use the tool as it was intended: taking meeting notes and basic blogging with no intention of output or formatting consistency. Sorry I don't have better news for you.
Best of luck,
Laura