Why does SQL Server use an index scan instead of a Seek + RID lookup?

advertisements

I have a table with approx. 135M rows:

CREATE TABLE [LargeTable]
(
    [ID] UNIQUEIDENTIFIER NOT NULL,
    [ChildID] UNIQUEIDENTIFIER NOT NULL,
    [ChildType] INT NOT NULL
)

It has a non-clustered index with no included columns:

CREATE NONCLUSTERED INDEX [LargeTable_ChildID_IX]
  ON [LargeTable]
(
    [ChildID] ASC
)

(It is clustered on ID).

I wish to join this against a temporary table which contains a few thousand rows:

CREATE TABLE #temp
(
    ChildID         UNIQUEIDENTIFIER PRIMARY KEY,
    ChildType       INT
)

...add #temp data...

SELECT lt.ChildID, lt.ChildType
    FROM #temp t
    INNER  JOIN [LargeTable] lt
        ON lt.[ChildID] = t.[ChildID]

However the query plan includes an index scan on the large table:

If I change the index to include extra columns:

CREATE NONCLUSTERED INDEX [LargeTable_ChildID_IX] ON [LargeTable]
(
    [ChildID] ASC
)
INCLUDE [ChildType]

Then the query plan changes to something more sensible:

So my question is: Why can't SQL Server still use an index seek in the first scenario, but with a RID lookup to get from the non-clustered index to the table data? Surely that would be more efficient than an index scan on such a large table?


The first query plan actually makes a lot of sense. Remember that SQL Server never reads records, it reads pages. In your table, a page contains many records, since those records are so small.

With the original index, if the second query plan would be used, after finding all the RID's in the index, and reading index pages to do so, pages in the clustered index need to be read to read the ChildType column. In a worst case scenario, that is an entire page for each record it needs to read. As there are many records per page, that might boil down to reading a large percentage of the pages in the clustered index.

SQL server guessed, based on statistics, that simply scanning the pages in the clustered index would require less page reads in total, because it then avoids reading the pages in the non-clustered index.

What matters here is the number of rows in the temp table compared to the number of pages in the large table. Assuming a random distribution of ChildID in the large table, as soon as the number of rows in the temp table approaches or supersedes the number of pages in the large table, SQL server will have to read virtually every page in the large table anyway.