I have a huge database that contains writer names.
There are multiple records in my database but I don't know which rows are duplicate.
How can I delete duplicate rows without knowing the value?
delete from tbl where writer_id in (select writer_id from (select * from tbl) t where exists (select 1 from (select * from tbl) x where x.writer_name = t.writer_name and t.writer_id < x.writer_id));
See demo: http://sqlfiddle.com/#!2/845ca3/1/0
This keeps the first row for each
writer_name, in order of
The EXISTS subquery will run for every row, however. You could also try:
delete t from tbl t left join ( select writer_name, min(writer_id) as writer_id from tbl group by writer_name ) x on t .writer_name = x.writer_name and x.writer_id = t .writer_id where x.writer_name is null;
If there are no foreign key constraints on the table you could also use
create table as select to create a new table without the duplicate entries, drop the old table, and rename the new table to that of the old table's name, getting what you want in the end. (This would not be the way to go if this table has foreign keys though)
That would look like this:
create table tbl2 as (select distinct writer_name from tbl); drop table tbl; alter table tbl2 add column writer_id int not null auto_increment first, add primary key (writer_id); rename table tbl2 to tbl;