Hello Atlassian Community,
Regarding Data Manager Cleansing Rules:
How did you learn to effectively apply Cleansing Rules within Data Manager? Was it through trial and error, or do you have a set of best practices you follow?
I've found this aspect a bit challenging to manage. My impression is that we need prior knowledge of the data's specifics, because I could not find an easy way to analyze data within Data Manager's user interface. In other words, I would need to know in advance what needs cleaning in my data before bringing it into Data Manager.
Does this make sense to you? I would love to hear your experiences with data cleansing in Data Manager.
What worked, what didn’t, and how did you refine your approach?
Looking forward to an enriching discussion and learning from all of you! Thank you for sharing your knowledge!
Hi Mariana,
There is a best practice we follow when dealing with Cleansing Rules: there are two default cleansing rules we ALWAYS use for any data source that is imported into Data Manager (ADM). These two default cleansing rules are 1: Exclude Null or Empty Primary Key (PK) and 2: Remove Primary Key (PK) Duplicates, as depicted in the screen grab below.
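If it helps to picture what those two defaults do, here is a minimal sketch in plain Python (illustrative only, not ADM internals), assuming each staged record is a dict and SerialNumber is the primary key:

```python
records = [
    {"SerialNumber": "SN-001", "Manufacturer": "Dell"},
    {"SerialNumber": "",       "Manufacturer": "HP"},    # empty PK -> excluded
    {"SerialNumber": "SN-001", "Manufacturer": "Dell"},  # duplicate PK -> removed
]

# Rule 1: Exclude Null or Empty Primary Key (PK)
records = [r for r in records if r.get("SerialNumber")]

# Rule 2: Remove Primary Key (PK) Duplicates
# (keeping the first occurrence here; ADM's actual tie-breaking may differ)
seen, deduped = set(), []
for r in records:
    if r["SerialNumber"] not in seen:
        seen.add(r["SerialNumber"])
        deduped.append(r)

print(deduped)  # only the first SN-001 record survives
```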
The premise in ADM is to analyse data that is active and relevant. Data that is irrelevant should be removed. What is inactive or irrelevant data?
For example, when dealing with or analysing compute-based objects such as desktops or servers, inactive or irrelevant records need to be either excluded or filtered from the data source. This is where we use cleansing rules to remove the irrelevant records.
Another best practice: any cleansing rule that we introduce is always executed between the two default rules, i.e. Exclude Null or Empty Primary Key (PK) is always the first cleansing rule to be executed and Remove Primary Key (PK) Duplicates is always the last.
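To make the ordering concrete, here is a conceptual sketch (again plain Python, with made-up function names rather than ADM's API) of rules applied in sequence:

```python
def exclude_null_or_empty_pk(records):   # always first
    return [r for r in records if r.get("SerialNumber")]

def my_custom_rule(records):             # custom rules sit in the middle
    # hypothetical filter: drop retired machines; substitute your own logic
    return [r for r in records if r.get("Status") != "Retired"]

def remove_pk_duplicates(records):       # always last
    seen, kept = set(), []
    for r in records:
        if r["SerialNumber"] not in seen:
            seen.add(r["SerialNumber"])
            kept.append(r)
    return kept

def run_cleansing(records):
    for rule in (exclude_null_or_empty_pk, my_custom_rule, remove_pk_duplicates):
        records = rule(records)
    return records
```

The Status/Retired filter is purely a hypothetical example of a custom rule; the point is only the position it occupies in the sequence.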
You also mentioned that you need prior knowledge of the data's specifics because you could not find an easy way to analyse data within Data Manager's user interface, and so you would need to know in advance what to clean in your data before bringing it to Data Manager. Here is my response to that comment.
Once the data source has been established in ADM, you can view its staged data as depicted in the screen grab below. In the example below, we can look into the staged data of the SCCM data source by clicking the "..." button on the data source and selecting "View Staged Data".
This brings up the data from the SCCM source that has already been transferred as staged data into ADM by the corresponding job that you set up in Adapters - refer to the screen grab below.
Here, you'll be able to analyse all the attributes from the SCCM source, and you may want to introduce a cleansing rule that filters out the value "hun" from the Manufacturer attribute.
So you will add that cleansing rule, position it between the two default rules, save the rules, and run the cleansing.
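As a rough sketch of what such an exclude-by-column-value rule amounts to (plain Python, not ADM code, using the Manufacturer value from the example above; yours will differ):

```python
UNWANTED_MANUFACTURERS = {"hun"}  # example value from the staged data above

def exclude_by_manufacturer(records):
    # drop any record whose Manufacturer matches an unwanted value
    return [r for r in records
            if r.get("Manufacturer") not in UNWANTED_MANUFACTURERS]
```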
I hope this helps clarify any ambiguities around the usage of cleansing rules.
Hun Chan,
Thanks for the explanation and details. We are super excited to start using Assets Data Manager, we just haven't done it yet.
Thank you!
Shawn Stevens
Hi @Hun Chan , thanks for the detailed reply. I am experiencing the error "Duplicate values found in the secondary key column" when I run my cleansing rules. I have done some research but cannot seem to locate the issue. Do you have any experience with this issue?
Thank you!
We get this error and are unsure of where to go to fix it:
Failed | 00:00:00.264 | Some of the settings in the cleansing rules are incomplete. Please review the cleansing rules. |
Hi Simon,
Can you please review and ensure that, for every cleansing rule, the column name and the corresponding column (string) value, where appropriate, have been defined and filled in like the example below.
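For anyone wondering what "incomplete" means in that error: a value-based rule with a blank column name or string value cannot run. A toy illustration (not ADM's actual validation; the rule dicts are hypothetical):

```python
# Hypothetical rule definitions; a value-based rule needs both fields filled in.
rules = [
    {"type": "Exclude by Column Value", "column": "Manufacturer", "value": "hun"},
    {"type": "Exclude by Column Value", "column": "", "value": ""},  # incomplete
]

incomplete = [r for r in rules if not r["column"] or not r["value"]]
if incomplete:
    print("Some of the settings in the cleansing rules are incomplete.")
```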
Regards,
Hun
Hi @Paul Daly , you will need to add the cleansing rule called "remove duplicates by column" to the data source that is giving you the error when you run cleansing. I've attached a screen grab of the rule below.
In this "remove duplicates by column" rule, select a reason (e.g. Force Clean by Column). If the reason does not exist, go to Data Manager -> Settings -> Reasons and add the reason Force Clean by Column.
Then select a column as depicted below. This is your secondary key. What have you defined/assigned as your secondary key? The example screenshot below shows SerialNumber; however, yours may be different. You should be able to validate your secondary key in your mappings.
Please ensure that this remove duplicates by column rule is placed just before the Remove PK Duplicates - Forced Cleaned cleansing rule. Save the rules and run the cleansing. This should resolve the error.
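In case it helps to see the intent, here is a small sketch (plain Python, not ADM code) of a remove-duplicates-by-column step that tags removed records with a cleansing reason, assuming SerialNumber is the secondary key:

```python
def remove_duplicates_by_column(records, column="SerialNumber",
                                reason="Force Clean by Column"):
    seen, kept, removed = set(), [], []
    for r in records:
        key = r.get(column)
        if key in seen:
            # record why this row was cleansed, mirroring the "reason" setting
            removed.append({**r, "cleansing_reason": reason})
        else:
            seen.add(key)
            kept.append(r)
    return kept, removed
```

Which duplicate is kept (the first seen, here) is an assumption for illustration; validate the behaviour against your own data.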