How do I write a MySQL query that will limit the results of an attachment or tables and also counts the number of elements in the attached table (s)?

advertisements

How would you write a MySQL query that will limit the results of a joined table (or sub select if that works better) and also counts the number of items in the joined table or tables?

For instance, let's say you had three tables: projects, tasks and comments, where a project has 0 or more tasks and a task has 0 or more comments. How would you limit the number of tasks returned per project to 3 and also return the total number of tasks per project and comments per task?

Here's what I imagine the result set look like:

project_id, project_title, task_id, task_title, num_tasks, num_comments
------------------------------------------------------------------------
1, Project1, 1, Task1, 4, 3
1, Project1, 2, Task2, 4, 0
1, Project1, 3, Task3, 4, 9
2, Project2, 10, Task10, 20, 0
2, Project2, 11, Task11, 20, 0
2, Project2, 12, Task12, 20, 2
3, Project3, 20, Task20, 17, 5
3, Project3, 21, Task21, 17, 1
3, Project3, 22, Task22, 17, 2

Where 'Project1', 'Project2', etc just represent a project's title and 'Task1', 'Task2', etc represent a task's title.

Ultimately, (after parsing through the results of the query) I'd like to be able to display something like this:

 Project1 (4 tasks)
     Task1 (3 comments)
     Task2 (0 comments)
     Task3 (9 comments)
 Project2 (20 tasks)
     Task10 (0 comments)
     Task11 (0 comments)
     Task12 (2 comments)
 Project3 (17 tasks)
     Task20 (5 comments)
     Task21 (1 comments)
     Task22 (2 comments)

I'm guessing this has to be done with sub selects (which is fine), but I can't seem to figure out how to accomplish this with just using joins and I don't quite have a good enough handle on sub selects to do something like this.


Honestly, I'd do this in multiple queries, to avoid the correlated subqueries.

But here you go:

SELECT p.project_id, p.project_title,
    t1.task_id, t1.task_title,
    (SELECT COUNT(*) FROM tasks t
       WHERE t.project_id = p.project_id) AS num_tasks,
    COALESCE((SELECT COUNT(*) FROM comments c
       WHERE c.task_id = t1.task_id), 0) AS num_comments
FROM projects p
JOIN tasks t1 ON (p.project_id = t1.project_id)
LEFT OUTER JOIN tasks t2
  ON (p.project_id = t2.project_id AND t1.task_id > t2.task_id)
GROUP BY t1.task_id
HAVING COUNT(*) < 3;


Consider that correlated subqueries like those above (num_tasks and num_comments) must execute many times -- once for each row of t1.

You can get the results by running these queries separately and combining the results in your application code:

SELECT p.project_id, p.project_title,
    t1.task_id, t1.task_title
FROM projects p
JOIN tasks t1 ON (p.project_id = t1.project_id)
LEFT OUTER JOIN tasks t2
  ON (p.project_id = t2.project_id AND t1.task_id > t2.task_id)
GROUP BY t1.task_id
HAVING COUNT(*) < 3;

SELECT task_id, COUNT(*) AS num_comments
FROM comments
WHERE task_id IN (...list of task_id values from first query...)
GROUP BY task_id;

SELECT project_id, COUNT(*) AS num_tasks
FROM tasks
GROUP BY project_id;

Even running three separate queries like this might be faster overall than running the more complex query that gets all the results together. I say might because it depends on how much data we're talking about. To be sure, you'd have to test both solutions using your own database.


Re your followup question, I'd do this in a subquery:

SELECT p.project_id, p.project_title,
    t1.task_id, t1.task_title
FROM (SELECT * FROM projects ORDER BY last_updated DESC LIMIT 5) p
. . .

Note this is not a correlated subquery; the RDBMS only has to do the subquery once.

I used DESC because I assume you want the most recent projects.