Data Cleansing with Data Manager: Seeking Insights and Experiences

Mariana Silveira Sales
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
January 17, 2025

Hello Atlassian Community,

Regarding Data Manager Cleansing Rules:

How did you learn to effectively apply Cleansing Rules within Data Manager? Was it through trial and error, or do you have a set of best practices you follow?

I've found managing this aspect a bit challenging. My impression is that we need prior knowledge of the data's specifics, because I could not find an easy way to analyze data within the Data Manager user interface. So I would need to know in advance what needs cleaning before bringing the data into Data Manager.

Does this make sense to you? I would love to hear your experiences with data cleansing in Data Manager.

What worked, what didn’t, and how did you refine your approach?

Looking forward to an enriching discussion and learning from all of you! Thank you for sharing your knowledge!

3 comments

Shawn Stevens
Contributor
January 17, 2025

Thank you for starting this conversation. We are just starting our journey into Data Manager so I will be intently watching this to get any tips, tricks, or pitfalls. Looking forward to the conversation. 

Hun Chan
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
January 17, 2025

Hi Shawn,

I suggest that you refer to my response to Mariana. Thanks.

Hun Chan
Atlassian Team
January 17, 2025

Hi Mariana,

We follow one best practice when dealing with cleansing rules: there are two default cleansing rules that we ALWAYS use for any data source imported into Data Manager (ADM). These are 1: Exclude Null or Empty Primary Key (PK) and 2: Remove Primary Key (PK) Duplicates, as depicted in the screen grab below.

The premise in ADM is to analyse data that is active and relevant. Data that is irrelevant should be removed. What is inactive or irrelevant data?

For example, when analysing compute-based objects such as desktops or servers, inactive or irrelevant data may consist of:

  • Devices deemed to be stale (devices that have not been inventoried, logged into, or scanned for more than 90 days)
  • Devices that are decommissioned
  • Devices that are retired
  • Devices that are ignored
  • Devices that are non-operational
  • Virtual machines that are powered off
  • Devices that are disabled
  • Devices that are not classed as computers (servers or desktops), such as printers, monitors, tablets, keyboards, etc.
  • Devices that have NULL values in important attributes such as inventory date attributes 

The above examples need to be either excluded or filtered from the data source. This is where we use cleansing rules to remove the irrelevant records.
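To make the criteria above concrete, here is a minimal sketch of the same filtering logic in plain Python. It is not Data Manager's API: the field names (`last_inventory_date`, `status`, `device_type`), the status labels, and the helper names are all hypothetical, and the 90-day threshold is simply the figure mentioned above.

```python
from datetime import date, timedelta

# Hypothetical threshold and labels; adjust to whatever your data source uses.
STALE_AFTER = timedelta(days=90)
INACTIVE_STATUSES = {
    "decommissioned", "retired", "ignored",
    "non-operational", "powered off", "disabled",
}
COMPUTER_TYPES = {"server", "desktop"}


def is_relevant(record, today):
    """Return True if a device record looks active and relevant."""
    last_seen = record.get("last_inventory_date")
    if last_seen is None:                      # NULL in an important attribute
        return False
    if today - last_seen > STALE_AFTER:        # stale: not seen for > 90 days
        return False
    if (record.get("status") or "").lower() in INACTIVE_STATUSES:
        return False
    # Anything that is not a server or desktop (printers, monitors, ...) is dropped.
    return (record.get("device_type") or "").lower() in COMPUTER_TYPES


def keep_relevant(records, today):
    """Filter a list of device records down to the relevant ones."""
    return [r for r in records if is_relevant(r, today)]
```

In Data Manager itself each of these conditions would be expressed as one or more cleansing rules rather than as code; the sketch only shows what those rules are meant to achieve.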

Another best practice: any cleansing rule that we introduce is always executed between the two default rules. Exclude Null or Empty Primary Key (PK) is always the first cleansing rule to be executed, and Remove Primary Key (PK) Duplicates is always the last.
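The ordering can be pictured with a small Python sketch. This is only an illustration of the sequence described here, not how Data Manager is implemented: the function names are hypothetical, and keeping the first record per primary key is an assumption (the product may choose the surviving duplicate differently).

```python
def exclude_null_or_empty_pk(records, pk):
    """First rule: drop records whose primary key is missing or empty."""
    return [r for r in records if r.get(pk) not in (None, "")]


def remove_pk_duplicates(records, pk):
    """Last rule: keep one record per primary key (here: the first one seen)."""
    seen = set()
    unique = []
    for r in records:
        if r[pk] not in seen:
            seen.add(r[pk])
            unique.append(r)
    return unique


def run_cleansing(records, pk, custom_rules=()):
    """Run the default rules first and last, with custom rules in between."""
    records = exclude_null_or_empty_pk(records, pk)
    for rule in custom_rules:          # each rule takes and returns a list
        records = rule(records)
    return remove_pk_duplicates(records, pk)


def drop_retired(records):
    """Example custom rule (hypothetical): remove retired devices."""
    return [r for r in records if r["status"] != "retired"]


staged = [
    {"id": "A", "status": "retired"},
    {"id": "A", "status": "active"},
    {"id": None, "status": "active"},
    {"id": "B", "status": "active"},
]
cleansed = run_cleansing(staged, "id", [drop_retired])
# cleansed == [{"id": "A", "status": "active"}, {"id": "B", "status": "active"}]
```

One possible benefit of this order, at least in the sketch: the retired copy of device A is filtered out before de-duplication, so the active copy is the one that survives.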

Screenshot 2025-01-18 at 1.53.07 PM.png

You also mentioned that you need prior knowledge about the data's specifics because you could not find an easy way to analyse data within the Data Manager user interface, so you would need to know in advance what to clean in your data before bringing it into Data Manager. Here is my response to that comment.

Once the data source has been established in ADM, you can view its staged data, as depicted in the screen grab below. In this example, we can look into the staged data of the SCCM data source by clicking on the "..." button of the data source and selecting "View Staged Data".

Screenshot 2025-01-18 at 2.25.53 PM.png

This brings up the data from the SCCM source that has already been transferred into ADM as staged data by the corresponding job that you set up in Adapters (refer to the screen grab below).

Here you can analyse all the attributes from the SCCM source. For example, you may want to introduce a cleansing rule that filters out the value "hun" from the Manufacturer attribute.

So you will:

  1. Go to the cleanse and import screen
  2. Click the "..." button for the SCCM data source
  3. Select Cleansing rules
  4. Select the appropriate cleansing rule (e.g. filter record equal specific value)
  5. Add the cleansing rule
  6. Move the new cleansing rule so that it sits between Exclude Null or Empty Primary Key (PK) and Remove Primary Key (PK) Duplicates
  7. Configure the cleansing rule: select an appropriate reason to describe the cleansing rule, select the column name (e.g. Manufacturer), and select the column value (e.g. hun)
  8. Save the cleansing rules
  9. Cleanse the SCCM data source

Screenshot 2025-01-18 at 2.38.15 PM.png
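For anyone who prefers to inspect an export of the data outside the UI first, the same idea can be sketched in plain Python: count the values in a column to spot what needs cleansing, then drop the records that match. The sample records and function names are made up for illustration, and the exact, case-sensitive match is an assumption about how a "filter record equal specific value" rule behaves.

```python
from collections import Counter


def profile_column(records, column):
    """Count how often each value appears in a column."""
    return Counter(r.get(column) for r in records)


def filter_records_equal(records, column, value):
    """Drop records whose column exactly equals the given value."""
    return [r for r in records if r.get(column) != value]


# Made-up sample resembling staged SCCM data.
staged = [
    {"Name": "PC-001", "Manufacturer": "Dell"},
    {"Name": "PC-002", "Manufacturer": "hun"},
    {"Name": "PC-003", "Manufacturer": "Dell"},
]

counts = profile_column(staged, "Manufacturer")
# counts["Dell"] == 2 and counts["hun"] == 1, so "hun" stands out as an oddity.

cleansed = filter_records_equal(staged, "Manufacturer", "hun")
# Only PC-001 and PC-003 remain.
```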

I hope this helps clarify any ambiguities around the use of cleansing rules.

Shawn Stevens
Contributor
January 21, 2025

Hun Chan, 

Thanks for the explanation and details. We are super excited to start using Assets Data Manager, we just haven't done it yet.

Thank you!

Shawn Stevens

Simon Carter
January 23, 2025

We get this error and are unsure where to go to fix it:

 

Failed
00:00:00.264 Some of the settings in the cleansing rules are incomplete. Please review the cleansing rules.
Hun Chan
Atlassian Team
January 26, 2025

Hi Simon,

Can you please review the cleansing rules and ensure that each one has its column name and the corresponding column (string) value defined and filled in where appropriate, like the example below?

Regards,

Hun 

Screenshot 2025-01-27 at 11.12.03 AM.png
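As a rough illustration of that check, the sketch below walks a list of rule settings and reports any rule that needs a column but has a blank column name or value. The dictionary layout, including the `needs_column` flag, is purely hypothetical and does not reflect how Data Manager stores its rules; in practice the review is done on the cleansing rules screen.

```python
def _is_blank(value):
    """True for None or for strings that are empty or only whitespace."""
    return value is None or str(value).strip() == ""


def incomplete_rules(rules):
    """Return (position, rule name, missing fields) for each incomplete rule."""
    problems = []
    for position, rule in enumerate(rules, start=1):
        if not rule.get("needs_column"):
            continue  # e.g. the default PK rules take no column or value
        missing = [
            field for field in ("column_name", "column_value")
            if _is_blank(rule.get(field))
        ]
        if missing:
            problems.append((position, rule.get("name", "<unnamed>"), missing))
    return problems


rules = [
    {"name": "Exclude Null or Empty Primary Key (PK)", "needs_column": False},
    {"name": "Filter record equal specific value", "needs_column": True,
     "column_name": "Manufacturer", "column_value": ""},
    {"name": "Remove Primary Key (PK) Duplicates", "needs_column": False},
]

problems = incomplete_rules(rules)
# problems == [(2, "Filter record equal specific value", ["column_value"])]
```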
