Delete Duplicate Entries In A Sql Server Table

Couple days ago I wrote a quick post demonstrating how to find duplicate records in a SQL server table. A colleague of mine read the post and asked me to show how to delete the duplicates  from the table. Therefore, I thought it would make sense I put this post together on how to delete duplicate entries in a table.

What does it take to remove duplicates from a table?

  • First, we need to validate the data to see the duplicates we are deleting.
    WITH student_cte AS (
        SELECT 
            First_Name, 
    		Last_Name,
    		Phone, 
            ROW_NUMBER() OVER(
                PARTITION BY 
                    First_Name, 
                    Last_Name, 
                    Phone
                ORDER BY 
                    First_Name, 
                    Last_Name, 
                    Phone
            ) rn
         FROM dbo.Students
    )
    select * from student_cte;
    
  • The picture below shows the records that are duplicates with the row count.
  • Now, in order to delete these record, I am going to show the best way to handle using that same CTE query above with a slight modification. I have deleted any instance with rn > 1.
    WITH student_cte AS (
        SELECT 
            First_Name, 
    		Last_Name,
    		Phone, 
            ROW_NUMBER() OVER(
                PARTITION BY 
                    First_Name, 
                    Last_Name, 
                    Phone
                ORDER BY 
                    First_Name, 
                    Last_Name, 
                    Phone
            ) rn
         FROM dbo.Students
    )
    DELETE FROM student_cte WHERE rn >1;
    
  • After running the query above, the duplicated records are now gone.

In this post, I showed how you can get rid of duplicate rows from a table in SQL Server. I hope after reading this article you will be able to use these tips. If you have questions or feedback, please share it with me. I’d like to know what you’re thinking. Please post your feedback, question, or comments about this article.

Find Duplicate Entries In A Sql Server Table Using CTE

In my last post, I showed how to find duplicate entries in a sql server table using group by which was pretty simple to pull off. I thought it would be a good idea to find the list of all the duplicate rows using CTE for those that are fan of it. We will be using the same table from our previous post on finding duplicate rows which you can read and follow.

What is CTE

A CTE which is short for Common Table Expression, is a temporary result set that you can reference within another query statement. They are used to simplify large queries.

Find duplicate rows CTE

   WITH student_cte AS(
        SELECT  First_Name, 
                Last_Name,
                Phone,  
                COUNT(*) Occurrences
        FROM dbo.students
        GROUP BY
                First_Name, 
                Last_Name,
	        Phone
       HAVING COUNT(*) > 1
  )
  SELECT	st.Id,
		st.First_Name, 
		st.Last_Name,
		st.Phone  		
 FROM dbo.students st
 INNER JOIN student_cte 
  ON student_cte.First_Name = st.First_Name
  AND student_cte.Last_Name = st.Last_Name;

Find duplicate entries result

Find duplicate entries using cte as query

Quick Way To Find Duplicate Rows In A Sql Server Table

If you are dealing with duplicate rows in a SQL server database, and you’re not sure how to find them, don’t panic. I had to deal with a similar situation recently.
Two weeks ago, I was working on a project where I had to import some data from a legacy system into a whole new table. After the data had been loaded and a report was generated with that data, we’ve noticed there were some duplicate entries. Therefore, I had to quickly figure out how to put together a quick query to find duplicate rows and removed them from the table.

In this post I will show simple steps to take in case you are facing the same issue. In order to get this post, I am going to create a new table and populate it with some seed data.

Create Database

DROP TABLE IF EXISTS Students;
CREATE TABLE Students (
    Id INT IDENTITY(1, 1), 
    First_Name  nvarchar(100), 
    Last_Name  nvarchar(100), 
    Phone nvarchar(12),
    PRIMARY KEY(id)
);

Populate the table with data

Now that we have our table added, I will proceed to populate it with some data for our demonstration.

   INSERT INTO dbo.Students
        (first_name,last_name,phone) 
   VALUES
        ('Mary','Allan','407-455-1414'),
	('John','Doe','407-445-5414'),
        ('Chirac','Paul','407-455-1814'),
	('John','Doe','407-445-5414'),
        ('Mela','Jil','407-787-6767'),
	('John','Doe','407-445-5414'),
        ('Mary','Allan','407-455-1414'),
	('John','Doe','407-445-5414'),
        ('Mary','Allan','407-455-1414'),
	('Chris','Tai','407-555-7777'),
	('Albert','Allan','321-455-9977'),
	('Jean','Doe','407-445-5414');

Ensure the data was inserted properly

Okay, we have the data we need in the table, let’s run a quick query to see if it match our criteria, in order words have some duplicates. Let’s run a simple select statement to retrieve the full data-set in the the table.

   SELECT Id, First_Name, Last_Name,Phone 
   FROM dbo.students
   ORDER BY First_Name, Last_Name desc;

Query result to find duplicate rows in table

Find duplicate rows in the table

As a matter of fact, there are many ways to find and return the duplicates. I ended up using group by to get my result, If you end up using some other method, please share it in the comment below.

 SELECT  First_Name, 
          Last_Name,
          Phone,  
          COUNT(*) Occurrences
  FROM dbo.students
  GROUP BY
    First_Name, 
    Last_Name,
    Phone
  HAVING 
    COUNT(*) > 1;


Last, below you can see the result of the duplicated rows found and the number of time.
Find duplicate rows in sql server table