Remove Duplicate Cell on Second and Third Row

kariaheart

New member
Joined
Oct 22, 2011
Messages
10
Reaction score
0
Points
0
Hi,

I have a file where an order may have one, two or three deliveries against an order and a solution that I am after in Power Query is to have an output file whereby the repeated order amounts are removed.

Have attached an example and would be grateful if one of you Ninja's would solve this for me.

Am able to do this in Excel using sumproduct function (lot of time and freeze with large data) but desire a Power Query solution.

Many thanks.
 

Attachments

  • File to remove duplicate ordered qty.xlsx
    47.5 KB · Views: 16
This does it step by step

Code:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    index.Add = Table.AddIndexColumn(Source, "Index", 0, 1),
    previousitem.get = Table.ReplaceErrorValues(Table.AddColumn(index.Add, "Previous Item", each index.Add{[Index]-1}[Item]), {{"Previous Item", 0}}),
    ordered.Sparse = Table.AddColumn(previousitem.get, "Custom", each if [Item] <> [Previous Item] then [Ordered] else ""),
    redundantcols.Remove = Table.RemoveColumns(ordered.Sparse,{"Previous Item", "Ordered", "Index"}),   
    orderedname.Reinsert = Table.RenameColumns(redundantcols.Remove,{{"Custom", "Ordered"}}),
    columns.Reorder = Table.ReorderColumns(orderedname.Reinsert,{"Item", "Ordered", "SaleQty", "Customer", "DocRef", "DocDate", "Order#", "OrderDate"})
in
    columns.Reorder
 
Last edited:
I tend to dive straight into M, but if you want to do it via the GUI, these are the steps to follow.

Load the table into PQ in the normal way
Then in Power Query, follow these steps
  • add an index column, it doesn't matter if you start with 0 or 1 - Add Column>General>Index column
  • add a custom column, I rename column to Previous Item - Add Column>General>Custom Column, and add the formula #"Added Index"{[Index]-1}[Item] (assuming that the previous step was names #"Added Index")
  • replace the error in row 1 - select column [Previous Item] and Transform>Any Column>Replace Values>Replace Errors and set to 0 (this is so that the [Item] and [Previous Item] do not match for the next step)
  • add a custom column to get your desired [Ordered] values, in this case I do not change the default name as it gets changed later - Add Column>General>Custom Column, dd the formula if [Item]<>[Previous Item] then [Ordered] else ""
  • delete the redundant helper columns and the original [Ordered] column - select [Ordered], [Index] and [Previous Item] columns, and Home>Manage Columns?Remove Columns (or just select them and hit the Delete key)
  • rename the new ordered values column - select the [Custom] column, Transform>Any Column>Rename and change to Ordered
  • move the new ordered values column back to the original ordered values column position - select the [Ordered] columns, and drag it over to between the [Item] and [SaleQty] columns

This does mean of course that you get the horrible PQ naming convention applied to your steps.
 
This is my version... :smile:

let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1),
#"Inserted Merged Column" = Table.AddColumn(#"Added Index", "Merged", each Text.Combine({Text.From([Item]), "-", Text.From([Ordered])}), type text),
#"Added Custom" = Table.AddColumn(#"Inserted Merged Column", "New Ordered", each try if #"Inserted Merged Column"[Merged]{[Index]-1} = [Merged] then null else [Ordered] otherwise [Ordered]),
#"Reordered Columns" = Table.ReorderColumns(#"Added Custom",{"Item", "Ordered", "New Ordered", "SaleQty", "Customer", "DocRef", "DocDate", "Order#", "OrderDate", "Index", "Merged"}),
#"Removed Columns" = Table.RemoveColumns(#"Reordered Columns",{"Ordered", "Index", "Merged"})
in
#"Removed Columns"
 
Hey there,

There are two ways to deal with this particular issue. The first is what I call the "Index-1" method, which is what Bob & Rudi have illustrated for you. The other is what I call the "Merge" method, which involves adding Index columns, then merging the tables together to perform conditional logic.

The difference between the two is all around performance. When you run the "Index-1" method against small data sets, it's doesn't really matter which you choose. But when I expand your data set to 7200 rows (which isn't large), the time differences are massive.

Here's a table that shows the differences based some tests I just did:
Row CountIndex-1 MethodMerge Method
9~0.23 seconds~0.10 seconds
500~2.02 seconds~0.20 seconds
7200~193 seconds~0.50 seconds

I don't know your tolerance for latency, but 2 seconds is about the limit of mine.

Here's how you set up the Merge method:
  • Connect to your data as normal
  • Sort your rows into the correct descending order structure
  • Add an Index column from 0
  • Add an Index column from 1
  • Go to Home --> Merge Queries and choose to merge to the query you are in. Pick to merge Index (top) against Index.1 (bottom) using the default join (Left Outer)
  • Rename the new column "Prior"
  • Expand new column to show Item, Ordered and SaleQty (and leave the option to preface the query name checked)
  • Add a Conditional Column for your new OrderQty using the following logic (where square braces indicated columns, not text):
    • If [Item] equals null then [Ordered]
    • Else If [Item] does not equal [Prior.Item] then [Ordered]
    • Else If [Ordered] equals [Prior.Ordered] then null
    • Otherwise [Ordered]
  • Add another conditional column using the following logic
    • If [Item] equals null then [SaleQty]
    • Else If [Item] does not equal [Prior.Item] then [SaleQty]
    • Else If [SaleQty] equals [Prior.SaleQty] then null
    • Otherwise [SaleQty]
  • Remove [Ordered],[SaleQty],[Prior.Item],[Prior.SaleQty]
  • Reorder the columns as needed
  • Set your data types

M code for this:
Code:
let    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Sorted Rows" = Table.Sort(Source,{{"Order#", Order.Ascending}, {"Item", Order.Ascending}, {"DocDate", Order.Ascending}}),
    #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1),
    #"Added Index1" = Table.AddIndexColumn(#"Added Index", "Index.1", 1, 1),
    #"Merged Queries" = Table.NestedJoin(#"Added Index1",{"Index"},#"Added Index1",{"Index.1"},"Added Index1",JoinKind.LeftOuter),
    #"Renamed Columns" = Table.RenameColumns(#"Merged Queries",{{"Added Index1", "Prior"}}),
    #"Removed Columns" = Table.RemoveColumns(#"Renamed Columns",{"Index", "Index.1"}),
    #"Expanded Prior" = Table.ExpandTableColumn(#"Removed Columns", "Prior", {"Item", "Ordered", "SaleQty"}, {"Prior.Item", "Prior.Ordered", "Prior.SaleQty"}),
    #"Added Conditional Column" = Table.AddColumn(#"Expanded Prior", "OrderQty", each if [Item] = null then [Ordered] else if [Item] <> [Prior.Item] then [Ordered] else if [Ordered] = [Prior.Ordered] then null else [Ordered] ),
    #"Added Conditional Column1" = Table.AddColumn(#"Added Conditional Column", "SalesQty", each if [Item] = null then [SaleQty] else if [Item] <> [Prior.Item] then [SaleQty] else if [SaleQty] = [Prior.SaleQty] then null else [SaleQty] ),
    #"Removed Columns1" = Table.RemoveColumns(#"Added Conditional Column1",{"Ordered", "SaleQty", "Prior.Item", "Prior.Ordered", "Prior.SaleQty"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns1",{"Item", "OrderQty", "SalesQty", "Customer", "DocRef", "DocDate", "Order#", "OrderDate"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Reordered Columns",{{"Item", Int64.Type}, {"OrderQty", type number}, {"SalesQty", type number}, {"Customer", type text}, {"DocRef", Int64.Type}, {"DocDate", type date}, {"Order#", Int64.Type}, {"OrderDate", type date}})
in
    #"Changed Type"

Sample workbook attached
 

Attachments

  • File to remove duplicate ordered qty.xlsx
    25.5 KB · Views: 17
Dear Bob, Rudi and Ken

I thank you so much for the solutions that you have given.

This is my lucky day and wishing you all a nice day/night.

Thanks again.
 
Back
Top