Problem of duplicate records in databases arises when a table doesn’t have a primary or unique key. Nevertheless, its easy to find duplicate records in the table by using “Group By” and “Having” clauses of SQL query.
Group by clause is used to group data based upon one or more columns. In our simple example here, we’ll find the duplicate rows on names and contact numbers. First let’s build a small table which has three columns. Not to mention that the data is fake and is meant just for the sake of example:
Id | Name | ContactNumber |
1 | John | 12345678 |
2 | Allan | 112233445 |
3 | Harry | 87654321 |
4 | Allan | 112233445 |
5 | Tom | 12324543 |
6 | Bob | 989876761 |
7 | John | 12345678 |
SELECT name, COUNT(name) FROM contacts GROUP BY name, contactnumber;
The result will be like
name | Count(name) |
John | 2 |
Allan | 2 |
Harry | 1 |
Tom | 1 |
Bob | 1 |
This is good to see but not exactly what we want. We need to filter out the duplicate rows on name and contact number columns, so we add Having clause to the query and the final sql statement will become
SELECT name, COUNT(name) as occurrence FROM contacts GROUP BY name, contactnumber having occurrence > 1;
Now the result will be like
name | occurence |
John | 2 |
Allan | 2 |
So here John and Allan are the duplicate rows found in the table.