(learn how to delete duplicate rows in MySQL by using the DELETE JOIN statement or an immediate table)
How to Delete Duplicate Rows in MySQL
Deleting duplicate rows from a table is a common task to clean up data and ensure data integrity. Here, we will explore three different methods to delete duplicate rows in MySQL, with clear explanations for each step.
Sample Data Setup
First, let's create a sample table and populate it with data for demonstration.
sql
Copy code
DROP TABLE IF EXISTS contacts;
CREATE TABLE contacts (
id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(255) NOT NULL
);
INSERT INTO contacts (first_name, last_name, email)
VALUES
('Carine', 'Schmitt', 'carine.schmitt@verizon.net'),
('Jean', 'King', 'jean.king@me.com'),
('Peter', 'Ferguson', 'peter.ferguson@google.com'),
('Janine', 'Labrune', 'janine.labrune@aol.com'),
('Jonas', 'Bergulfsen', 'jonas.bergulfsen@mac.com'),
('Janine', 'Labrune', 'janine.labrune@aol.com'),
('Susan', 'Nelson', 'susan.nelson@comcast.net'),
('Zbyszek', 'Piestrzeniewicz', 'zbyszek.piestrzeniewicz@att.net'),
('Roland', 'Keitel', 'roland.keitel@yahoo.com'),
('Jean', 'King', 'jean.king@me.com'),
('Susan', 'Nelson', 'susan.nelson@comcast.net'),
('Roland', 'Keitel', 'roland.keitel@yahoo.com');
View Sample Data
Check the data using this query:
sql
Copy code
SELECT * FROM contacts ORDER BY email;
Identify Duplicate Rows
To find duplicate rows, group by the column that has duplicates (e.g., email
) and count how many times each value appears:
sql
Copy code
SELECT email, COUNT(email)
FROM contacts
GROUP BY email
HAVING COUNT(email) > 1;
This query shows which emails are duplicated and how many times they occur.
Methods to Delete Duplicate Rows
1. Using DELETE with JOIN
This approach compares the table with itself. It deletes duplicate rows but keeps the row with the highest id
(latest entry):
sql
Copy code
DELETE t1
FROM contacts t1
INNER JOIN contacts t2
WHERE
t1.id < t2.id AND
t1.email = t2.email;
Explanation:
t1
andt2
are aliases for the samecontacts
table.- The
INNER JOIN
pairs rows with the sameemail
. - The
WHERE
clause keeps the row with the higherid
by deleting rows wheret1.id < t2.id
.
To keep the row with the lowest id
, reverse the comparison:
sql
Copy code
DELETE t1
FROM contacts t1
INNER JOIN contacts t2
WHERE
t1.id > t2.id AND
t1.email = t2.email;
2. Using an Intermediate Table
This method involves creating a temporary table to store distinct rows, then replacing the original table.
Steps:
- Create a temporary table with the same structure as the original:
- Insert distinct rows into the temporary table:
GROUP BY email
ensures that only one row per email is retained.- Drop the original table and rename the temporary table:
sql
Copy code
CREATE TABLE contacts_temp LIKE contacts;
sql
Copy code
INSERT INTO contacts_temp
SELECT *
FROM contacts
GROUP BY email;
Explanation:
sql
Copy code
DROP TABLE contacts;
ALTER TABLE contacts_temp RENAME TO contacts;
3. Using the ROW_NUMBER() Function
This approach assigns a unique rank to each row with duplicate values using the ROW_NUMBER()
function (available in MySQL 8.0+).
Step 1: View Duplicate Rows Use ROW_NUMBER()
to assign a rank to each duplicate group, ordered by the column with duplicates (e.g., email
):
sql
Copy code
SELECT
id,
email,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY id
) AS row_num
FROM contacts;
Explanation:
PARTITION BY email
: Groups rows by theemail
column.ORDER BY id
: Assigns a rank based on theid
.
Step 2: Find IDs of Duplicate Rows Filter rows where row_num > 1
(duplicates):
sql
Copy code
SELECT id
FROM (
SELECT
id,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY id
) AS row_num
FROM contacts
) t
WHERE row_num > 1;
Step 3: Delete Duplicate Rows Use the IDs identified above in a DELETE
statement:
sql
Copy code
DELETE FROM contacts
WHERE id IN (
SELECT id
FROM (
SELECT
id,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY id
) AS row_num
FROM contacts
) t
WHERE row_num > 1
);
Summary
Here’s a quick comparison of the methods:
Method | Advantages | Best For |
DELETE with JOIN | Fast and simple for small tables | Keeping the highest/lowest id |
Intermediate Table | Works in all MySQL versions | Large datasets or simpler logic |
ROW_NUMBER() Function | Advanced and flexible | MySQL 8.0+ with custom ranking |
Use these methods based on your needs, ensuring that your data is clean and duplicates are removed effectively.
4o