More efficient alternative to the subquery

I've got many pairs of "actual" and "history" companion tables (so to speak). The history tables have a structure like this:

values|  date_deal | type_deal | num (autoinc)
value1| 01.01.2012 |         i |   1
value1| 02.01.2012 |         u |   2
value2| 02.01.2012 |         i |   3
value2| 03.01.2012 |         u |   4
value1| 04.01.2012 |         d |   5
value2| 05.01.2012 |         u |   6
value2| 08.01.2012 |         u |   7

If I insert (or update, or delete) a record in an "actual" table, a trigger puts the affected record into the "history" table with date_deal = GETDATE(), type_deal = 'i'|'u'|'d' (from the insert, update and delete triggers respectively), and num as a unique autoincrement value.
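The trigger mechanics described above could be sketched roughly like this (illustrative only: t_actual, t_table1 and the [values] column are placeholder names, and num is assumed to be an IDENTITY column; the insert and update triggers would be analogous, writing 'i' and 'u'):

```sql
-- Sketch of the delete-side history trigger; names are hypothetical.
CREATE TRIGGER trg_t_actual_delete ON t_actual
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO t_table1 ([values], date_deal, type_deal)
    SELECT  d.[values], GETDATE(), 'd'   -- num fills itself in (IDENTITY)
    FROM    deleted d;
END
```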

So the question is: how do I get the last record for each distinct value that was valid on a certain date, excluding from the final result any records with type_deal = 'd' (since such a record had been deleted from the actual table by that time and we don't want anything associated with it)?

The way I do it most of the time:

SELECT  *
FROM    t_table1 t1
WHERE   t1.num = (  SELECT  MAX(num)
                    FROM    t_table1 t2
                    WHERE   t2.[values] = t1.[values]
                        AND t2.[date_deal] < @dt)
    AND t1.[type_deal] <> 'D'
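For example, on the sample rows above with @dt set to 06.01.2012 (dd.mm.yyyy), the query behaves like this:

```sql
DECLARE @dt datetime = CONVERT(datetime, '06.01.2012', 104);

-- value1: rows 1, 2 and 5 have date_deal < @dt, so MAX(num) = 5,
--         but that row has type_deal = 'd' -> value1 is excluded
-- value2: rows 3, 4 and 6 qualify (row 7 is dated 08.01.2012), so MAX(num) = 6,
--         and type_deal = 'u' -> that row is returned
```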

But that sometimes runs very slowly. I'm looking for a more efficient alternative. Please help.
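One thing worth checking before rewriting the query at all (my assumption, not something stated in the thread): the correlated MAX(num) subquery can only be fast if a supporting index lets it seek per value. A sketch, with illustrative names:

```sql
-- Assumption: no such index exists yet; adjust names to the real table.
-- Lets the subquery seek on ([values], date_deal) and read num and
-- type_deal without a key lookup.
CREATE INDEX IX_t_table1_values_date
    ON t_table1 ([values], date_deal)
    INCLUDE (num, type_deal);
```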


So, an update. Thanks for the replies, friends.

I've done some testing on both the test and the production servers. To put these different approaches in the same league, I decided that each query should return all fields from the source table.

The test server has just under 200K records, and there I also had the luxury of running DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS. The production server has over 2.3M records and no option of dropping the buffers or plan cache, since.. well.. it is in use by real users. So the cache was dropped only once, and I took the results right after that.

Here are the actual queries and the time they took on both servers:

Original:

DECLARE @dt datetime = CONVERT(datetime, '01.08.2013', 104)

SELECT  *
FROM    [CLIENTS_HISTORY].[dbo].[Clients_all_h] c
WHERE   c.num = (   SELECT  MAX(num)
                    FROM    [CLIENTS_HISTORY].[dbo].[Clients_all_h] c2
                    WHERE   c2.[AccountSys] = c.[AccountSys]
                        AND date_deal <= @dt)
    AND c.type_deal <> 'D'

61 sec @ 2'316'890 rec on the production server, 4 sec @ 191'533 rec on the test server

Rahul's:

SELECT  *
FROM    [CLIENTS_HISTORY].[dbo].[Clients_all_h] c
GROUP BY [all_fields]
HAVING  c.num = (   SELECT  MAX(num)
                    FROM    [CLIENTS_HISTORY].[dbo].[Clients_all_h] c2
                    WHERE   c2.[AccountSys] = c.[AccountSys]
                        AND date_deal <= @dt)
    AND c.type_deal <> 'D'

62 sec @ 2'316'890 rec on the production server, 4 sec @ 191'533 rec on the test server. Almost equal.

George's (with some major changes):

SELECT * FROM
(
SELECT  *,
        ROW_NUMBER() OVER(PARTITION BY accountsys ORDER BY num desc) AS aa
FROM    [CLIENTS_HISTORY].[dbo].[Clients_all_h] c
WHERE   c.date_deal < @dt) as a
WHERE   aa=1
    AND type_deal <> 'D'

76 sec @ 2'316'890 rec on the production server, 5 sec @ 191'533 rec on the test server

So far the original and Rahul's are the fastest, and George's is not as fast.
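For completeness, here is one more standard pattern for this kind of "latest row per group" problem that was not timed above (a sketch only, reusing the same table and @dt; whether it beats the correlated subquery depends heavily on the available indexes):

```sql
-- Untested alternative: TOP (1) per key via CROSS APPLY.
SELECT  c.*
FROM    (SELECT DISTINCT AccountSys
         FROM   [CLIENTS_HISTORY].[dbo].[Clients_all_h]) k
CROSS APPLY (SELECT TOP (1) *
             FROM   [CLIENTS_HISTORY].[dbo].[Clients_all_h] c2
             WHERE  c2.AccountSys = k.AccountSys
                AND c2.date_deal <= @dt
             ORDER BY c2.num DESC) c
WHERE   c.type_deal <> 'D'
```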


Try using a GROUP BY .. HAVING clause:

SELECT  *
    FROM    t_table1 t1
    GROUP BY [column_names]
    HAVING   t1.num = (  SELECT  MAX(num)
                        FROM    t_table1 t2
                        WHERE   t2.[values] = t1.[values]
                            AND t2.[date_deal] < @dt)
        AND t1.[type_deal] <> 'D'