
How to decrease disk usage (and find interesting things) during a filesystem investigation

Hi, awesome community! 

I hope you are doing well. 

 

In this article, I'd like to share how I use a small utility called fdupes.

The project's home page is on GitHub.

Let's define the use case:

1. We have a huge directory: {jira_home}/data/attachments/ or {confluence_home}/attachments. (For Bamboo and Bitbucket this approach will not work properly.)

In my use case these are ~750 GB and ~180 GB respectively.

All those instances are on-premises.

We need to analyze the existing disk usage for duplicates and, where possible, replace them with symlinks or hard links.
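To get a baseline before touching anything, it is worth measuring the current size first (using the same placeholder paths as above):

du -sh {jira_home}/data/attachments/ {confluence_home}/attachments/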

 

So, without further ado, here we go.

1. Install fdupes if it is not already on your system.

On RHEL/CentOS-based and Fedora-based systems:

yum install fdupes
dnf install fdupes [On Fedora 22 onwards]

Debian-based:

sudo apt-get install fdupes

or 

sudo aptitude install fdupes

On macOS (with Homebrew):

brew install fdupes

 

2. Next, go to the directory that contains the attachments folder and run:

fdupes --recurse --size --summarize ./attachments/

3. While it runs, fdupes shows a progress indicator (a percentage counter).

 

4. And finally you get the result: a summary of the duplicates found and the space they occupy.
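With --summarize, fdupes prints a single summary line at the end; the figures below are made up purely for illustration:

12345 duplicate files (in 6789 sets), occupying 21473.2 megabytes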

 

5. Only after analyzing the results do I suggest replacing duplicates with hard links, and I totally recommend trying it on test files before doing it in production. Sometimes it is better to do this per Jira project key directory; see the sketch below.
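A minimal sketch of such a replacement, assuming attachment file names contain no whitespace (true for Jira's numeric attachment IDs) and using a hypothetical PROJECTKEY subdirectory; --sameline makes fdupes print each duplicate set on a single line:

# DANGER: try this on a copy of the data first.
fdupes --recurse --sameline ./attachments/PROJECTKEY/ | while read -r keep rest; do
    for dup in $rest; do
        ln -f "$keep" "$dup"    # replace each duplicate with a hard link to the kept file
    done
done

Note that hard links only work within a single filesystem, which is normally the case for one attachments directory.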

 

Conclusion: 

Doing this analysis, I found causes such as Exocet, Clone Plus, custom post-functions that clone issues, and misbehaving mail handlers, which generate most of the duplicates. It can also surface workflow bottlenecks, duplicated issues, and incorrectly parsed emails.

And please, don't go about it like a fanatic, because disk space is just about the cheapest thing nowadays. (I hope) ;)

Also, I suggest reading: Hierarchical File System Attachment Storage and Jira attachments structure.

 

I will be happy if community members post their own statistics here.

Maybe disk usage deduplication will even get implemented one day ;)

 

P.S. For Windows, please have a look at the jdupes utility.
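jdupes is a fork of fdupes and, as far as I know, keeps the same basic scan options, so the equivalent check would presumably look like this (the path is only an example):

jdupes --recurse --summarize C:\jira_home\data\attachments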

P.P.S. If you want to deduplicate all files, please have a look at kvdo, because it works at a lower (block device) level. I hope this article will be interesting for you: VDO, the new Linux compression layer.
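For illustration only, setting up a VDO volume on a RHEL-style system looks roughly like this (device name and logical size are placeholders; check the Red Hat documentation for your version):

vdo create --name=vdo_attachments --device=/dev/sdb --vdoLogicalSize=2T
mkfs.xfs -K /dev/mapper/vdo_attachments    # -K skips block discards at mkfs time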

P.P.P.S. If you are using NetApp ;), deduplication is out-of-the-box functionality.

 

Cheers,

Gonchik Tsymzhitov

4 comments

Interesting tool, thanks for the article

What are the advantages of using the tool? As you say, storage is cheap.

How much space did you recover? 1%, 5%?

I don't advise duplicating attachments in Jira with plugins, workflows or anything. It just confuses people and goes against the DRY principle

The first time, I reclaimed 17% of ~340 GB on an old Jira instance before migrating it into the main one.

Then I started to investigate in depth why so many duplicate files get generated, because end users can't upload that many duplicates in a short time; I suspect it's automation ;)

 

About cheap: yes, nowadays ±100 GB is fine. Just check your Docker, .m2, Gradle, and npm dependency directories.
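For instance, a quick check of the usual suspects (the paths are common defaults and may differ on your machine):

du -sh ~/.m2 ~/.gradle ~/.npm /var/lib/docker 2>/dev/null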

Gonchik,

I would rather go for deduplication using a SAN system that supports this function transparently. Assume the same file is attached to different pages in Confluence & the users aren't aware the file exists more than once because they don't see the page or space. Then this file or page containing the file is deleted by someone: What happens? Is the file still available on the other pages with your solution?

It makes much more sense with Jira on closed, older issues. If we have issues that need to be documented after closing, we move them to a Confluence page & also move the attachments of the issue. Simple principle: Jira is Process, Confluence is Documentation...

We introduced an automatic attachment purger for older versions of attachments in Confluence. The most up-to-date file is never purged.

Jan-Peter, 

The functionality of a SAN system is good, but I've noticed that a lot of companies don't have that function.

Therefore I highlighted kvdo (https://www.redhat.com/en/blog/look-vdo-new-linux-compression-layer). E.g., those who use Proxmox can read these stats: https://forum.proxmox.com/threads/virtual-data-optimizer-vdo.42838/

 

Also, sometimes checking those stats helps to find some interesting automation opportunities. It was very interesting for me :)

 

Could you post your stats in this thread, please?

About cleaning older versions of attachments: thanks for the tip, I will share my solution for that :)
