Why does SQL Server perform a cluster analysis when the IN clause has a subquery?


If I search for users like so:

FROM userprofile
WHERE userid IN (1, 2, 3)

The execution plan shows that UserProfile is using a Clustered Index Seek

If I change the IN clause to use a subquery:

FROM userprofile
WHERE userid IN (
    SELECT DISTINCT senders.UserId
    FROM messages m
    JOIN UserMessages recipients ON recipients.MessageId = m.MessageId
    JOIN UserMessages senders ON senders.MessageId = m.MessageId
    WHERE recipients.TypeId = 2
        AND recipients.UserId = 1
        AND senders.UserId <> 1
        AND senders.TypeId = 1

The execution plan shows the subquery is using a Clustered Index Seek but the UserProfile outer query is using a Clustered Index Scan.

How can I write this so both inner and outer queries are using Seeks?

A set of seeks is only cheaper than a full scan if the rowcount is low. SQL Server is pretty conservative so if there is a chance many records will be found, it prefers to scan. In your example, it's pretty clear that userId in (1,2,3) will not return many rows. But with the subquery, SQL Server probably can't tell.

You can force a seek with:

from userprofile with (forceseek)