Problem with "Remove Duplicates" in "Get & Transform" (Excel 2016)
The problem is with "Remove Duplicates" in "Get & Transform" in Excel 2016. It also persists with with "Power Query" in Excel 2013.
The data source is a csv file data.csv
The processed Excel file is Duplicates_not_removed.xlsx
Using Excel 2016, I've used "Get & Transform" to carry out the following steps:
. Loaded data from file data.csv. There are four text columns - Col_A to Col_D
. Changed all column types to "Text"
. Added a custom column containing the length of data in Col_D using Text.Transform()
. Filtered the new column to keep only text with length 2
. Removed the newly added column
. Merged columns Col_A to Col_D to get a single column Merged_for_primary
. Selected the new single column and clicked on "Remove Duplicates"
. "Close and Load" data to Sheet1
. Using "Conditional Formatting" in Excel, I've highlighted duplicate items in the single column in Sheet1
. As can be seen from the attached Excel file Duplicates_not_removed.xlsx, the column contains duplicate items despite running "Remove Duplicates" from "Get & Transform"
I've checked with "Power Query" from Excel 2013 and the problem persists there also.
"Remove Duplicates" from "Get & Transform" or "Power Query" did not do its job but Excel can easily detect duplicates. What seems to be the problem?
I'd be grateful for any help.
The problem is with "Remove Duplicates" in "Get & Transform" in Excel 2016. It also persists with with "Power Query" in Excel 2013.
The data source is a csv file data.csv
The processed Excel file is Duplicates_not_removed.xlsx
Using Excel 2016, I've used "Get & Transform" to carry out the following steps:
. Loaded data from file data.csv. There are four text columns - Col_A to Col_D
. Changed all column types to "Text"
. Added a custom column containing the length of data in Col_D using Text.Transform()
. Filtered the new column to keep only text with length 2
. Removed the newly added column
. Merged columns Col_A to Col_D to get a single column Merged_for_primary
. Selected the new single column and clicked on "Remove Duplicates"
. "Close and Load" data to Sheet1
. Using "Conditional Formatting" in Excel, I've highlighted duplicate items in the single column in Sheet1
. As can be seen from the attached Excel file Duplicates_not_removed.xlsx, the column contains duplicate items despite running "Remove Duplicates" from "Get & Transform"
I've checked with "Power Query" from Excel 2013 and the problem persists there also.
"Remove Duplicates" from "Get & Transform" or "Power Query" did not do its job but Excel can easily detect duplicates. What seems to be the problem?
I'd be grateful for any help.