Increased performance unexpectedly after adding JOIN and ORDER BY on request


I have the following People table:

| Id | FirstName | Children |
|  1 |      mark |        4 |
|  2 |      paul |        0 |
|  3 |      mike |        3 |

Note I have an non-unique index in FirstName and another one in Children.

I need to get the top 10000 first names and the children amount of every person that has any children. So I decided to go for this solution:

SELECT firstName, children FROM people
WHERE children > 0
ORDER BY children DESC
LIMIT 0, 10000

The thing is that it takes 4 seconds to return the results from a table with 2.6 million records. This is the explain:

| ID | SELECT_TYPE | TABLE  | TYPE  | POSSIBLE_KEYS | KEY      | KEY_LEN | REF    |       ROWS | EXTRA       |
|  1 |      SIMPLE | people | range | children      | children |       4 | (null) |    2677610 | Using where |

As I see it, the range is telling me the index is being scanned and compared to a value (in this case this is the children > 0). I'd say that should be fast enough. Then, my guess is that after getting all those matching index elements, the DBMS fetches the firstName from the table by internally joining the values in the index with the ones in the table.

If I translate the previous paragraph into SQL I would get something like this:

SELECT firstName, children FROM people
    SELECT id FROM people
    WHERE children > 0
    ORDER BY children DESC
    LIMIT 0, 10000
) s
ON =
ORDER BY children DESC

The explain for the previous SQL statement is:

| ID | SELECT_TYPE | TABLE      | TYPE   | POSSIBLE_KEYS | KEY      | KEY_LEN | REF    |    ROWS | EXTRA                           |
|  1 |     PRIMARY | <derived2> | ALL    | (null)        | (null)   |  (null) | (null) |   10000 | Using temporary; Using filesort |
|  1 |     PRIMARY | p          | eq_ref | PRIMARY       | PRIMARY  |       4 |   |       1 |                                 |
|  2 |     DERIVED | people     | range  | children      | children |       4 | (null) | 2687462 | Using where; Using index        |

To my surprise, this query performs a few times faster than the first one. However, the more I increment the LIMIT X the bigger this difference becomes (EG: For LIMIT 1000000, 10000 the second query is still under 1 second and the first one exceeds 20 seconds). This leads me to the following questions:

  1. In what way is MySQL handling the first query different from the second?
  2. Is there any way to hint MySQL to execute the first query the way it executes the second?
  3. Is it fair to say that the lesson learned from this is that whenever I want to fetch a value that is not part of the index being used, the double order by and join is the right way to go?

Additional notes:

  • SQLFiddle (if it makes any difference)
  • Note I'm running the queries with SQL_NO_CACHE
  • MySQL version: 5.5.37

I'm pretty sure that you can fix the performance of the first query by having an index on children, firstname. This is a covering index for the query, so it should eliminate accesses to the data pages.

The first execution plan says that the index is being used for the where. The limit is applied last, so it seems that it is getting the firstname for all the rows before the limit is applied. That seems weird, but it is consistent with the performance you are seeing.

In the second version, 10000 ids are being read. Assuming they are primary keys, then the data page look ups should be quite fast -- and controlled explicitly by the limit. This may suggest why this version is faster, although it seems like a bit of a mystery. Mostly, though, I would expect the index on children, firstname to improve the first version of the query.