
How to decrease disk usage (and find interesting things) while investigating the filesystem?

Hi, awesome community! 

I hope you are doing well. 

 

In this article, I'd like to share how I use a small utility called fdupes.

The home page of that project is located on GitHub.

Let's define the existing use case:

1. We have a huge directory: {jira_home}/data/attachments/ or {confluence_home}/attachments/. (For Bamboo and Bitbucket this approach will not work properly.)

In my use case these are ~750 GB and ~180 GB respectively.

All those instances are on-premises.

2. We need to analyze the existing disk usage for duplicates and, where possible, replace them with links.
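To get a baseline before any deduplication, a quick size check works; a minimal sketch, using the same home placeholders as above:

# Baseline sizes of the attachment directories (adjust paths to your homes)
du -sh {jira_home}/data/attachments/
du -sh {confluence_home}/attachments/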

 

So, without any more stalling, here we go.

1. Install fdupes if it is not already on your system.

On RHEL/CentOS-based and Fedora-based systems:

yum install fdupes
dnf install fdupes   # on Fedora 22 onwards

On Debian-based systems:

sudo apt-get install fdupes

or 

sudo aptitude install fdupes

On macOS:

brew install fdupes

 

2. The next step is to change to the directory that contains attachments/ (e.g. {jira_home}/data/) and run:

fdupes --recurse --size --summarize ./attachments/
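If you also want the full list of duplicate sets for later analysis (not just the summary), you can redirect the plain listing to a file; a minimal sketch:

# Keep the full duplicate listing (drop --summarize to get per-set output)
fdupes --recurse --size ./attachments/ > /tmp/fdupes-report.txt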

3. You can see progress like this:

[screenshot: fdupes scan progress]

 

4. And finally, I got this result:

[screenshot: fdupes summary result]
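For reference, the --summarize output is a one-line total of this shape (illustrative numbers only, not my actual result):

1348 duplicate files (in 512 sets), occupying 9.3 gigabytes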

 

5. Only after the analysis do I suggest replacing duplicates with hard links (the --hardlinks option); I totally recommend trying it on test files before doing it in production, as in the sketch below. Sometimes it is better to do that per Jira project key directory.
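Here is a minimal sketch of how I would rehearse that on a throwaway copy first (the project directory name 10000 is just an example; note that the hard-linking option is not available in every build of fdupes, and in some distributions it only exists in the jdupes fork as -L/--linkhard):

# Rehearse on a copy of a single project key directory, never on live data
cp -a ./attachments/10000 /tmp/attachments-test
fdupes --recurse --hardlinks /tmp/attachments-test/
# Verify: duplicates should now share an inode (first column) with a link count > 1
ls -liR /tmp/attachments-test/ | head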

 

Conclusion: 

Doing this analysis, I found the causes that generate most of the duplicates: Exocet, Clone Plus, a custom post-function to clone issues, and problems with mail handlers. And of course, it can also point to workflow bottlenecks, duplicated issues, and incorrectly parsed emails.
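A small loop like the sketch below (assuming the standard one-directory-per-project-key attachments layout) helps narrow down which projects generate the most duplicates:

# Per-project duplicate summary, to spot the worst offenders
cd {jira_home}/data/attachments/
for project in */ ; do
    echo "== ${project} =="
    fdupes --recurse --summarize "${project}"
done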

And please, don't go at it like a fanatic, because disk space is about the cheapest thing there is nowadays. (I hope) ;)

Also, I suggest reading: Hierarchical File System Attachment Storage and Jira attachments structure.

 

I will be happy if community members post their own statistics here.

Maybe disk usage deduplication will even get implemented out of the box ;)

 

P.S.: For Windows, please have a look at the jdupes utility.

P.P.S.: If you want to do this for all files, please have a look at kvdo, because it works at a lower level. I hope this article will be interesting for you: VDO, the new Linux compression layer.
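For the curious, the basic kvdo flow on RHEL/CentOS looks roughly like this (a sketch only; the device name and sizes are placeholders, so check the Red Hat documentation before running anything):

# Create a VDO volume on a spare block device (dedup + compression below the filesystem)
vdo create --name=vdo_attachments --device=/dev/sdX --vdoLogicalSize=1T
mkfs.xfs -K /dev/mapper/vdo_attachments
mount /dev/mapper/vdo_attachments /mnt/attachments
# Check the space savings
vdostats --human-readable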

P.P.P.S.: If you are using NetApp ;), it is out-of-the-box functionality.

 

Cheers,

Gonchik Tsymzhitov

4 comments

Interesting tool, thanks for the article

What are the advantages of using the tool? As you say, storage is cheap.

How much space did you recover? 1%, 5%?

I don't advise duplicating attachments in Jira with plugins, workflows or anything. It just confuses people and goes against the DRY principle.

The first time, I decreased usage by 17% of ~340 GB on an old Jira instance, before migrating it into the main one.

Then I started to investigate deeply why so many duplicate files get generated, because end users can't upload that many duplicates in a short time. I suspect it's only automation ;)

 

About cheap: yes, nowadays ±100 GB is OK. Just check your Docker, .m2, Gradle, and npm dependency directories.
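A quick sketch with typical default cache locations (adjust paths to your machines):

# Typical dependency caches that quietly eat disk space
du -sh ~/.m2 ~/.gradle ~/.npm /var/lib/docker 2>/dev/null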

Gonchik,

I would rather go for deduplication using a SAN system that supports this function transparently. Assume the same file is attached to different pages in Confluence, and the users aren't aware the file exists more than once because they don't see the other page or space. Then this file, or the page containing it, is deleted by someone: what happens? Is the file still available on the other pages with your solution?

It makes much more sense with Jira on closed, older issues. If we have issues that need to be documented after closing, we move them to a Confluence page and also move the attachments of the issue. Simple principle: Jira is process, Confluence is documentation...

For our part, we introduced an automatic attachment purger for older versions of attachments in Confluence. The most up-to-date file is never purged.

Jan-Peter, 

The functionality of a SAN system is good. But I've found that a lot of companies don't have that function.

Therefore I highlighted kvdo (https://www.redhat.com/en/blog/look-vdo-new-linux-compression-layer), e.g. for those who use Proxmox (you can read the stats here: https://forum.proxmox.com/threads/virtual-data-optimizer-vdo.42838/).

 

Also, sometimes checking those stats helps find some interesting automation. It was very interesting for me :)

 

Could you post your stats in this thread, please?

About cleaning the older versions of attachments: thanks for that notice, I will share my solution around that :)
