How would you write a MySQL query that will limit the results of a joined table (or sub select if that works better) and also counts the number of items in the joined table or tables?
For instance, let's say you had three tables: projects, tasks and comments, where a project has 0 or more tasks and a task has 0 or more comments. How would you limit the number of tasks returned per project to 3 and also return the total number of tasks per project and comments per task?
Here's what I imagine the result set look like:
project_id, project_title, task_id, task_title, num_tasks, num_comments ------------------------------------------------------------------------ 1, Project1, 1, Task1, 4, 3 1, Project1, 2, Task2, 4, 0 1, Project1, 3, Task3, 4, 9 2, Project2, 10, Task10, 20, 0 2, Project2, 11, Task11, 20, 0 2, Project2, 12, Task12, 20, 2 3, Project3, 20, Task20, 17, 5 3, Project3, 21, Task21, 17, 1 3, Project3, 22, Task22, 17, 2
Where 'Project1', 'Project2', etc just represent a project's title and 'Task1', 'Task2', etc represent a task's title.
Ultimately, (after parsing through the results of the query) I'd like to be able to display something like this:
Project1 (4 tasks) Task1 (3 comments) Task2 (0 comments) Task3 (9 comments) Project2 (20 tasks) Task10 (0 comments) Task11 (0 comments) Task12 (2 comments) Project3 (17 tasks) Task20 (5 comments) Task21 (1 comments) Task22 (2 comments)
I'm guessing this has to be done with sub selects (which is fine), but I can't seem to figure out how to accomplish this with just using joins and I don't quite have a good enough handle on sub selects to do something like this.
Honestly, I'd do this in multiple queries, to avoid the correlated subqueries.
But here you go:
SELECT p.project_id, p.project_title, t1.task_id, t1.task_title, (SELECT COUNT(*) FROM tasks t WHERE t.project_id = p.project_id) AS num_tasks, COALESCE((SELECT COUNT(*) FROM comments c WHERE c.task_id = t1.task_id), 0) AS num_comments FROM projects p JOIN tasks t1 ON (p.project_id = t1.project_id) LEFT OUTER JOIN tasks t2 ON (p.project_id = t2.project_id AND t1.task_id > t2.task_id) GROUP BY t1.task_id HAVING COUNT(*) < 3;
Consider that correlated subqueries like those above (
num_comments) must execute many times -- once for each row of
You can get the results by running these queries separately and combining the results in your application code:
SELECT p.project_id, p.project_title, t1.task_id, t1.task_title FROM projects p JOIN tasks t1 ON (p.project_id = t1.project_id) LEFT OUTER JOIN tasks t2 ON (p.project_id = t2.project_id AND t1.task_id > t2.task_id) GROUP BY t1.task_id HAVING COUNT(*) < 3; SELECT task_id, COUNT(*) AS num_comments FROM comments WHERE task_id IN (...list of task_id values from first query...) GROUP BY task_id; SELECT project_id, COUNT(*) AS num_tasks FROM tasks GROUP BY project_id;
Even running three separate queries like this might be faster overall than running the more complex query that gets all the results together. I say might because it depends on how much data we're talking about. To be sure, you'd have to test both solutions using your own database.
Re your followup question, I'd do this in a subquery:
SELECT p.project_id, p.project_title, t1.task_id, t1.task_title FROM (SELECT * FROM projects ORDER BY last_updated DESC LIMIT 5) p . . .
Note this is not a correlated subquery; the RDBMS only has to do the subquery once.
DESC because I assume you want the most recent projects.