Is there a way to simplify the union query by using multiple tables, with reference to the main table (Performance Optimization)

advertisements

I wonder is there any way to improve query, speed up / simplify while using multiple tables UNION clause with reference to only one main table.

following is the scenario, here table reference is my main table and geo_loc1,geo_loc2,geo_loc2 are other tables, which will be joined based on key1 and key2, these two keys exists in all table.

SELECT
    reference.*,
    geo_loc1.bathy,
    geo_loc1.gravity,
    geo_loc1.magnet,
    'data1' as type
FROM
    reference,geo_loc1
WHERE
    reference.latitude between -30 and -10 AND
    reference.longitude between 10 and 50 AND
    DATE( reference.st_date ) BETWEEN '2000-07-05' AND '2011-11-10'  AND
    reference.key1 = geo_loc1.key1 AND
    reference.key2 = geo_loc1.key2
UNION
SELECT
    reference.*,
    geo_loc2.bathy,
    Null as gravity,
    geo_loc2.magnet,
    'data2' as type
FROM
    reference,geo_loc2
WHERE
    reference.latitude between -30 and -10 AND
    reference.longitude between 10 and 50 AND
    DATE( reference.st_date ) BETWEEN '2000-07-05' AND '2011-11-10'  AND
    reference.key1 = geo_loc2.key1 AND
    reference.key2 = geo_loc2.key2
UNION
SELECT
    reference.*,
    Null as bathy,
    Null as gravity,
    geo_loc3.magnet,
    'data3' as type
FROM
    reference,geo_loc3
WHERE
    reference.latitude between -30 and -10 AND
    reference.longitude between 10 and 50 AND
    DATE( reference.st_date ) BETWEEN '2000-07-05' AND '2011-11-10'  AND
    reference.key1 = geo_loc3.key1 AND
    reference.key2 = geo_loc3.key2

Whether this query can be simplified or improved ?


For performance I can see two things you might change.

1) Change UNION to UNION ALL. UNION will check the results for duplicates which adds overhead you don't need.

2) Consider changing DATE( reference.st_date ) BETWEEN '2000-07-05' AND '2011-11-10' to not use the date function.

If reference.st_date were a proper date column and you didn't need to do the type conversion then the DB could use an index based on date to do a quick lookup. By doing the type conversion the DB will now have to do a scan to do the conversion for each row.

You say the index is on key1,key2,st_date,latitude,longitude. By using the date function you're probably only using the first two columns of that index. I would either change st_date to a date or change the values you're comparing it to so the conversion isn't required.

Also consider using a proper INNER JOIN rather than adding join conditions as WHERE clauses. It's much more readable using the proper syntax and less error prone.

Example using UNION ALL and INNER JOIN should look something like this. Note this is untested obviously but you should get the idea.
Edit Removed use of date() function which should speed it up further

SELECT
    reference.*,
    geo_loc1.bathy,
    geo_loc1.gravity,
    geo_loc1.magnet,
    'data1' as type
FROM
    reference
INNER JOIN geo_loc1 ON
    reference.key1 = geo_loc1.key1 AND
    reference.key2 = geo_loc1.key2
WHERE
    reference.latitude between -30 and -10 AND
    reference.longitude between 10 and 50 AND
    reference.st_date BETWEEN '2000-07-05 00:00:00' AND '2011-11-10 23:59:59'
UNION ALL
SELECT
    reference.*,
    geo_loc2.bathy,
    Null as gravity,
    geo_loc2.magnet,
    'data2' as type
FROM
    reference
INNER JOIN geo_loc2 ON
    reference.key1 = geo_loc2.key1 AND
    reference.key2 = geo_loc2.key2
WHERE
    reference.latitude between -30 and -10 AND
    reference.longitude between 10 and 50 AND
    reference.st_date BETWEEN '2000-07-05 00:00:00' AND '2011-11-10 23:59:59'
UNION ALL
SELECT
    reference.*,
    Null as bathy,
    Null as gravity,
    geo_loc3.magnet,
    'data3' as type
FROM
    reference
INNER JOIN geo_loc3 ON
    reference.key1 = geo_loc3.key1 AND
    reference.key2 = geo_loc3.key2
WHERE
    reference.latitude between -30 and -10 AND
    reference.longitude between 10 and 50 AND
    reference.st_date BETWEEN '2000-07-05 00:00:00' AND '2011-11-10 23:59:59'