Hello Atlassian Community,
Regarding Data Manager Cleansing Rules:
How did you learn to effectively apply Cleansing Rules within Data Manager? Was it through trial and error, or do you have a set of best practices you follow?
I've found this aspect a bit challenging to manage. My impression is that we need prior knowledge of the data's specifics, because I could not find an easy way to analyze data within Data Manager's user interface. In other words, I would need to know in advance what needs cleaning in my data before bringing it into Data Manager.
Does this make sense to you? I would love to hear your experiences with data cleansing in Data Manager.
What worked, what didn’t, and how did you refine your approach?
Looking forward to an enriching discussion and learning from all of you! Thank you for sharing your knowledge!
Hi Mariana,
There is a best practice we follow when dealing with Cleansing Rules: there are two default cleansing rules we ALWAYS use for any data source that is imported into Data Manager (ADM). These two default cleansing rules are 1: Exclude Null or Empty Primary Key (PK) and 2: Remove Primary Key (PK) Duplicates, as depicted in the screen grab below.
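If it helps to picture what those two defaults do, here is a minimal sketch in plain Python (illustrative only, not ADM internals), assuming each staged record is a dict and SerialNumber is the primary key:

```python
records = [
    {"SerialNumber": "SN-001", "Manufacturer": "Dell"},
    {"SerialNumber": "",       "Manufacturer": "HP"},    # empty PK -> excluded
    {"SerialNumber": "SN-001", "Manufacturer": "Dell"},  # duplicate PK -> removed
]

# Rule 1: Exclude Null or Empty Primary Key (PK)
records = [r for r in records if r.get("SerialNumber")]

# Rule 2: Remove Primary Key (PK) Duplicates
# (keeping the first occurrence here; ADM's actual tie-breaking may differ)
seen, deduped = set(), []
for r in records:
    if r["SerialNumber"] not in seen:
        seen.add(r["SerialNumber"])
        deduped.append(r)

print(deduped)  # only the first SN-001 record survives
```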
The premise in ADM is to analyse data that is active and relevant. Data that is irrelevant should be removed. What is inactive or irrelevant data?
For example, when dealing with or analysing compute-based objects such as desktops or servers, inactive or irrelevant records need to be either excluded or filtered from the data source. This is where we use cleansing rules to remove the irrelevant records.
Another best practice: any cleansing rule that we introduce is always executed between the two default rules, i.e. Exclude Null or Empty Primary Key (PK) is always the first cleansing rule to be executed and Remove Primary Key (PK) Duplicates is always the last.
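To make the ordering concrete, here is a conceptual sketch (again plain Python, with made-up function names rather than ADM's API) of rules applied in sequence:

```python
def exclude_null_or_empty_pk(records):   # always first
    return [r for r in records if r.get("SerialNumber")]

def my_custom_rule(records):             # custom rules sit in the middle
    # hypothetical filter: drop retired machines; substitute your own logic
    return [r for r in records if r.get("Status") != "Retired"]

def remove_pk_duplicates(records):       # always last
    seen, kept = set(), []
    for r in records:
        if r["SerialNumber"] not in seen:
            seen.add(r["SerialNumber"])
            kept.append(r)
    return kept

def run_cleansing(records):
    for rule in (exclude_null_or_empty_pk, my_custom_rule, remove_pk_duplicates):
        records = rule(records)
    return records
```

The Status/Retired filter is purely a hypothetical example of a custom rule; the point is only the position it occupies in the sequence.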
You also mentioned that you need prior knowledge of the data's specifics because you could not find an easy way to analyse data within Data Manager's user interface, and so you would need to know in advance what to clean in your data before bringing it to Data Manager. Here is my response to that comment.
Once the data source has been established in ADM, you can view its staged data as depicted in the screen grab below. In the example below, we can look into the staged data of the SCCM data source by clicking the "..." button on the data source and selecting "View Staged Data".
This brings up the data from the SCCM source that has already been transferred as staged data into ADM by the corresponding job that you set up in Adapters - refer to the screen grab below.
Here, you'll be able to analyse all the attributes from the SCCM source, and you may want to introduce a cleansing rule that filters out the value "hun" from the Manufacturer attribute.
So you will add that cleansing rule, position it between the two default rules, save the rules, and run the cleansing.
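As a rough sketch of what such an exclude-by-column-value rule amounts to (plain Python, not ADM code, using the Manufacturer value from the example above; yours will differ):

```python
UNWANTED_MANUFACTURERS = {"hun"}  # example value from the staged data above

def exclude_by_manufacturer(records):
    # drop any record whose Manufacturer matches an unwanted value
    return [r for r in records
            if r.get("Manufacturer") not in UNWANTED_MANUFACTURERS]
```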
I hope this helps clarify any ambiguities around the usage of cleansing rules.
Hun Chan,
Thanks for the explanation and details. We are super excited to start using Assets Data Manager, we just haven't done it yet.
Thank you!
Shawn Stevens
Hi @Hun Chan , thanks for the detailed reply. I am experiencing the error "Duplicate values found in the secondary key column" when I run my cleansing rules. I have done some research but cannot seem to locate the issue. Do you have any experience with this issue?
Thank you!
We get this error and are unsure of where to go to fix it:
Failed | 00:00:00.264 | Some of the settings in the cleansing rules are incomplete. Please review the cleansing rules. |
Hi Simon,
Can you please review and ensure that, for every cleansing rule, the column name and the corresponding column (string) value, where appropriate, have been defined and filled in like the example below.
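For anyone wondering what "incomplete" means in that error: a value-based rule with a blank column name or string value cannot run. A toy illustration (not ADM's actual validation; the rule dicts are hypothetical):

```python
# Hypothetical rule definitions; a value-based rule needs both fields filled in.
rules = [
    {"type": "Exclude by Column Value", "column": "Manufacturer", "value": "hun"},
    {"type": "Exclude by Column Value", "column": "", "value": ""},  # incomplete
]

incomplete = [r for r in rules if not r["column"] or not r["value"]]
if incomplete:
    print("Some of the settings in the cleansing rules are incomplete.")
```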
Regards,
Hun
Hi @Paul Daly , you will need to add the cleansing rule called "remove duplicates by column" to the data source that is giving you the error when you run cleansing. I've attached a screen grab of the rule below.
In this "remove duplicates by column" rule, select a reason (e.g. Force Clean by Column). If the reason does not exist, go to Data Manager -> Settings -> Reasons and add the reason Force Clean by Column.
Then select a column as depicted below. This is your secondary key. What have you defined/assigned as your secondary key? The example screenshot below shows SerialNumber; however, yours may be different. You should be able to validate your secondary key in your mappings.
Please ensure that this remove duplicates by column rule is placed just before the Remove PK Duplicates - Forced Cleaned cleansing rule. Save the rules and run the cleansing. This should resolve the error.
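In case it helps to see the intent, here is a small sketch (plain Python, not ADM code) of a remove-duplicates-by-column step that tags removed records with a cleansing reason, assuming SerialNumber is the secondary key:

```python
def remove_duplicates_by_column(records, column="SerialNumber",
                                reason="Force Clean by Column"):
    seen, kept, removed = set(), [], []
    for r in records:
        key = r.get(column)
        if key in seen:
            # record why this row was cleansed, mirroring the "reason" setting
            removed.append({**r, "cleansing_reason": reason})
        else:
            seen.add(key)
            kept.append(r)
    return kept, removed
```

Which duplicate is kept (the first seen, here) is an assumption for illustration; validate the behaviour against your own data.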