If you are dealing with duplicate rows in a SQL server database, and you’re not sure how to find them, don’t panic. I had to deal with a similar situation recently.
Two weeks ago, I was working on a project where I had to import some data from a legacy system into a whole new table. After the data had been loaded and a report was generated with that data, we’ve noticed there were some duplicate entries. Therefore, I had to quickly figure out how to put together a quick query to find duplicate rows and removed them from the table.
In this post I will show simple steps to take in case you are facing the same issue. In order to get this post, I am going to create a new table and populate it with some seed data.
Create Database
DROP TABLE IF EXISTS Students; CREATE TABLE Students ( Id INT IDENTITY(1, 1), First_Name nvarchar(100), Last_Name nvarchar(100), Phone nvarchar(12), PRIMARY KEY(id) );
Populate the table with data
Now that we have our table added, I will proceed to populate it with some data for our demonstration.
INSERT INTO dbo.Students (first_name,last_name,phone) VALUES ('Mary','Allan','407-455-1414'), ('John','Doe','407-445-5414'), ('Chirac','Paul','407-455-1814'), ('John','Doe','407-445-5414'), ('Mela','Jil','407-787-6767'), ('John','Doe','407-445-5414'), ('Mary','Allan','407-455-1414'), ('John','Doe','407-445-5414'), ('Mary','Allan','407-455-1414'), ('Chris','Tai','407-555-7777'), ('Albert','Allan','321-455-9977'), ('Jean','Doe','407-445-5414');
Ensure the data was inserted properly
Okay, we have the data we need in the table, let’s run a quick query to see if it match our criteria, in order words have some duplicates. Let’s run a simple select statement to retrieve the full data-set in the the table.
SELECT Id, First_Name, Last_Name,Phone FROM dbo.students ORDER BY First_Name, Last_Name desc;
Find duplicate rows in the table
As a matter of fact, there are many ways to find and return the duplicates. I ended up using group by to get my result, If you end up using some other method, please share it in the comment below.
SELECT First_Name, Last_Name, Phone, COUNT(*) Occurrences FROM dbo.students GROUP BY First_Name, Last_Name, Phone HAVING COUNT(*) > 1;
Last, below you can see the result of the duplicated rows found and the number of time.