The database records historical price changes, how do you calculate the price at any given time?

advertisements

I am working with historical price data of many different products.

I have a table products that lists all products with their name, description etc., identifying them with an uuid. Then there is another table history that stores every price change ever made for every product. The prices may (usually do) change quite a few times per day.

Now I want to calculate what each product's price was at a specific point in time, say 14th March 2015 at 12 noon. How can I do this in SQL?

I am able to do this for one product:

SELECT product_id, price, date
FROM history
WHERE product_id = 'aa6d9976-e9ae-4478-486e-097e86c1e5fe'
AND (date-'2015-03-14 12:00:00+02') < interval '1 second'
ORDER BY diff DESC LIMIT 1

  ->   aa6d9976-e9ae-4478-486e-097e86c1e5fe    109     2015-03-14 11:55:00+01

But I would like ALL products in one query in one set. My idea was to get all the products and LEFT JOIN that table with the history, selecting the appropriate price for each, but I fail at the latter:

SELECT products.product_id, name, price, date
FROM products
  LEFT JOIN history ON products.product_id = history.product_id
WHERE date "is the greatest value that is still somewhat smaller than" '2015-03-14 12:00:00+01'

How do you write properly what I tried to express in the quotes?

I use PostgreSQL (although I have mainly worked with MySQL before). The tables are about 15 thousand (products) and 50 million (history) rows respectively.


Some sample data if you like some:

PRODUCTS

product_id                              name

aa6d9976-e9ae-4478-486e-097e86c1e5fe    One
8da97d50-540e-4fdb-d032-7f443a9869a0    Two
b51654ea-6190-4ed2-5e23-7075ffd3b472    Three

HISTORY

id  product_id                              price   date

1   aa6d9976-e9ae-4478-486e-097e86c1e5fe    100     2015-03-14 09:30:00+01
2   aa6d9976-e9ae-4478-486e-097e86c1e5fe    110     2015-03-14 10:48:00+01
3   b51654ea-6190-4ed2-5e23-7075ffd3b472    9       2015-03-14 11:01:00+01
4   8da97d50-540e-4fdb-d032-7f443a9869a0    49      2015-03-14 11:27:00+01
5   aa6d9976-e9ae-4478-486e-097e86c1e5fe    109     2015-03-14 11:55:00+01
6   b51654ea-6190-4ed2-5e23-7075ffd3b472    8       2015-03-14 13:59:00+01
7   aa6d9976-e9ae-4478-486e-097e86c1e5fe    110     2015-03-14 16:10:00+01
8   8da97d50-540e-4fdb-d032-7f443a9869a0    48      2015-03-14 19:34:00+01
9   8da97d50-540e-4fdb-d032-7f443a9869a0    49      2015-03-14 23:30:00+01
10  aa6d9976-e9ae-4478-486e-097e86c1e5fe    103     2015-03-14 23:33:00+01

DESIRED OUTPUT

id                                      name    price   date

aa6d9976-e9ae-4478-486e-097e86c1e5fe    One     109     2015-03-14 11:55:00+01
8da97d50-540e-4fdb-d032-7f443a9869a0    Two     49      2015-03-14 11:27:00+01
b51654ea-6190-4ed2-5e23-7075ffd3b472    Three   9       2015-03-14 11:01:00+01


First you write a query to find the maximum date that is less than the date you are querying for each product. That would look something like this:

    select product_id, MAX(date) date
    from history
    where date < '3/14/2015 12:00:00'
    group by product_id

Then you can join that sub-query with your products and history tables to get the results you want:

select products.*, history.price, history.date
from products
left join
    (
    select product_id, MAX(date) date
    from history
    where date < '3/14/2015 12:00:00'
    group by product_id
    ) PriceDates
on products.product_id = PriceDates.product_id
join history
on PriceDates.product_id = history.product_id
    and PriceDates.date = history.date