Sorting duplicates into groupings

Sabytiger

New member
Joined
Jan 23, 2017
Messages
5
Reaction score
0
Points
0
I am somewhat new the amazingness that is excel formulas, I know some of the basics but when it comes to more complicated formulas I’m at a loss. I was wondering if someone could point me the right direction for what function I would need or what formula would work for this particular problem. I’m not even sure if this is possible.

I have 4 columns of data, I want something that can look at a row, see what numbers are in that rows columns, then find any other rows that match that columns numbers and group them together. Then put a value to each separate grouping. Then here where it gets difficult, while searching if a column has one of the original number the formula was searching for, but then also finds a different value in another column of the row that has one of the numbers the formula was searching for, it would then also go find that new number and assign it to the same grouping as the first and second row.
I am horrible at explaining things so I have included an example of what I want done.

if row A has 2 in column 1, 403 in column 2, and 270 in column 3 I want any row with any of those numbers in the same columns to be the same number as row A is assigned.
so, if row B only had 403 in column 2, i would want it to have the same number as row A
if row C had 403 in column 2 and 275 in column 3, i would want it to have the same number as row A
if row D had 275 in column 3, i would want it to have the same number as row A
But if Row E had a 403 in column 1 I don’t want it the same number as row A
In the end row A, B, C, and D be in group one with value 1 assigned to it and row E would be in group two and have value of 2 assigned to it.

I have attached an example of the kind of data im working with
 

Attachments

  • example.xlsx
    52 KB · Views: 31
This kind of automation exercise is a sort of nightmare trying to get a full definition of the rules, and its even worse when the client isn't good at explaining :) :)

So... A few questions to help me understand the logic:

1. Looking at your example, you seem to be applying a different rule when a cell is empty, eg groups 17 and 18 have the same configuration, so why aren't all 4 entries coded as group 17 ?
2. From the top, the data does seem to be in a logical sequence apart from a few exceptions, which obviously helps to pick new groups. Can this be relied upon, or might there be (eg) more group 1s lower down the list?
3. What does the use of background colouring signify? (rows 432 and others below)
4. Why do you change to an alphanumeric group code after group 228, and why does this have gaps in it (eg nothing between A24 and A31)
 
Last edited:
This kind of automation exercise is a sort of nightmare trying to get a full definition of the rules, and its even worse when the client isn't good at explaining :) :)

So... A few questions to help me understand the logic:

1. Looking at your example, you seem to be applying a different rule when a cell is empty, eg groups 17 and 18 have the same configuration, so why aren't all 4 entries coded as group 17 ?
2. From the top, the data does seem to be in a logical sequence apart from a few exceptions, which obviously helps to pick new groups. Can this be relied upon, or might there be (eg) more group 1s lower down the list?
3. What does the use of background colouring signify? (rows 432 and others below)
4. Why do you change to an alphanumeric group code after group 228, and why does this have gaps in it (eg nothing between A24 and A31)

1. This is a very stripped down version of a much bigger spreadsheet which I cant share. So, what I did was I used a formula to look through my data and found duplicate information, then place a number to mark them as pairs and then the number was to be placed in the column of where the duplicate information was found at, so that’s why there some blank cells, as this point the only grouping ive done is the pairing of duplicates.
Example
John Doe 000-000-0002 box 2 fort something
John Doe 000-000-0002 Box 4 fort something
Jane Doez 000-000-0003 Box 2 fort something
Jane Does 000-000-0003 Box 4 fort something
Jack Does 000-000-0005 Box 6 fort something
This data would result in
exampleNamePhoneAddress
John Doe111
John Doe112
John Doez321
Jane Does422
Jack Does4
when it comes to columns the 1 in name column and the 1 in phone or address are not related, so each pair of ones in different columns are their own separate thing.
2. not sure what you mean with this one.
3. the backgrounds mean nothing at this point they were used for sorting, I honestly forgot to take them out.
4. the reason why the values switch from numbers to letters is when the duplicate wasn’t 100% able to be confirmed I used a letter to show that it was a possible duplicate and not confirmed. The reason why they’re gaps in the A values is because this is a list of confirmed duplicates with their possible counter parts included (so if a row had 2 in column 1 and a34 in column 2, I went and found the other a34 in column 2 and included it in this list as well)
i
i hope i answered all your questions, my apologies if i didnt.

Oh also i just sorted via largest and smallest that might also be why you might think its grouped.
 
Im sorry, but based on the limited information your able to provide, I don't think that I understand sufficiently to be able to help you.

Good luck in finding the right solution.
 
Im sorry, but based on the limited information your able to provide, I don't think that I understand sufficiently to be able to help you.

Good luck in finding the right solution.

thank you for trying
 
Example
John Doe 000-000-0002 box 2 fort something
John Doe 000-000-0002 Box 4 fort something
Jane Doez 000-000-0003 Box 2 fort something
Jane Does 000-000-0003 Box 4 fort something
Jack Does 000-000-0005 Box 6 fort something
This data would result in
exampleNamePhoneAddress
John Doe111
John Doe112
John Doez321
Jane Does422
Jack Does4
See if can help to you attached example file with helper columns
 

Attachments

  • sabytiger-navic7284.xlsx
    180.5 KB · Views: 14
See if can help to you attached example file with helper columns

that really doesn’t help as it doesn’t find all the values that would be needed to be found. I Have updated the example to show what I want the formula to do.
if you sort the duplicate section and look at V1 it is a very good example of what i want from this formula.
Here is the process I used to make/mark v1. I started with the name column. i found any 1 in the name column and marked them with v1. I then checked the rows that had 1 in the name column to see if there were any other values in the other columns and found 584 in the address column. I then filtered the address column to find any 584’s in that column, which I then also marked with v1. Then I repeated my checking process, I looked at the rows that all had 584 in the address column and checked if these rows had any other values in the other columns, I found that some of the rows contained the values 276, 279 and 295 in the phone column, which I then filtered the phone column to show any rows with those values and marked them with v1 as well. I then repeated my check once again, I saw that the rows that had the values 276,279 and 295 in the phone column did not contain other values in the other columns that were not already marked with the v1, which means I could move onto the next grouping (v2).
this is basically the same process i went through to make the rest of the groupings as well. I want a formula that would do this whole process for me because I know that this would take a lot of hours doing it by hand and could still have a lot of human error.
 

Attachments

  • examples.xlsx
    52 KB · Views: 15
Look at column E of sheet1 of the attached.
It might be close to what you want.
Filter it for groups by using Text filters| Contains… and then look for the likes of v171, or v2, including the comma each time, to find the groups.

There's a half finished macro in there too which if you want to try it will need column E data clearing first, then click the Button 2 on the sheet1.
It uses Advanced Filter iteratively.

If it's going in the right direction, I'll tweak and tidy it up.

To try and do this with formulae… well, don't ask me to!

Bed for me now.
 

Attachments

  • ExcelGuru7284examples.xlsm
    72.8 KB · Views: 24
Look at column E of sheet1 of the attached.
It might be close to what you want.
Filter it for groups by using Text filters| Contains… and then look for the likes of v171, or v2, including the comma each time, to find the groups.

There's a half finished macro in there too which if you want to try it will need column E data clearing first, then click the Button 2 on the sheet1.
It uses Advanced Filter iteratively.

If it's going in the right direction, I'll tweak and tidy it up.

To try and do this with formulae… well, don't ask me to!

Bed for me now.
This is getting pretty close, and I appreciate all the effort everyone is putting in, thank you!
There only two things with this one that makes it miss the mark just slightly. Some of the rows have multiple groups when I need it to be just one, and I would like every row to have a grouping, other than that its awesome!
 
Last edited by a moderator:
Look at column G of sheet1 of the attached.
It might be closer to what you want.
Filter it for groups in the usual way (no need for Text Filters| Contains…).
You didn't have any rows at all that were unique, but if they exist they will all be marked Unique rather than being given a group number.

There's a three-quarter finished macro in there too which if you want to try, click the Button 2 on the sheet1. No need to clear the Duplicate column first.
All it needs is a table called Table2 with the headers name, phone, Email, Address, Duplicate anywhere on Sheet1, and a Sheet2 with the headers name, phone, Email, Address in cells B1:E1

Getting every row to have a grouping will take a little longer… Currently all rows with a non-blank in the name column will be grouped, together with some rows with blanks in the name column that matched in other column(s). All rows unassigned to a group will have blanks in the name column.
 

Attachments

  • ExcelGuru7284examples02.xlsm
    73.7 KB · Views: 22
Last edited:
This is getting pretty close, I would like every row to have a grouping

Excuse me for re-entering the discussion, but Ive been following this interesting thread. In the sample data, there are a lot of completely empty rows in the Table, and these (probably) dont need a group allocated. How about
those where the name cell is blank, and one or more of the other three is not blank? I think Im right in saying that the routine is currently "driven" by the populated "name" cells.
 
Back
Top