How do I delete all rows that are not the most recent N for each user in MySQL?

advertisements

Assuming I have a table foo where I have something like this:

id, user_id, timestamp, some_value

What I want to do is remove all rows that aren't the newest N per user.

The deletion itself could be handled by a:

DELETE FROM foo WHERE id NOT IN (...)

so you could rephrase the problem into: How do I get the newest N(there might be less) rows for each user. This means if I have U users I may end up with N*U rows so LIMIT wont really work.


MySQL does not support reading from a table with SELECT and performing an UPDATE/INSERT/DELETE on the same table in the same query. So doing what you want in one statement is going to be tricky.

I would do it in two stages: first, query the newest $N records per user, and store them in a temporary table:

CREATE TEMPORARY TABLE foo_top_n
  SELECT f1.id
  FROM foo f1 LEFT OUTER JOIN foo f2
    ON (f1.user_id = f2.user_id AND f1.id < f2.id)
  GROUP BY f1.id
  HAVING COUNT(*) < $N;

Next, use the multi-table DELETE syntax and join foo to the temporary table, deleting where no match is found:

DELETE f1 FROM foo f1 LEFT OUTER JOIN foo_top_n f2 USING (id)
WHERE f2.id IS NULL;