When I add two columns
w together the results of the rows give
NULL if one or both summand are
NULL. This makes sense. But if I sum over the column with
SUM(v) there will be a numeric result even there are
NULLs. Is this due to practical convience? Or is there a theoretical justification?
Here is an example to clarify (MySQL and sqlite):
CREATE TABLE x0 ( id INTEGER , v DOUBLE , w DOUBLE ); INSERT INTO x0 VALUES (1, 1, 1) , (2, 1, 1) , (3, NULL, 1) , (4, 1, NULL) , (5, NULL, NULL) ; -- NULL if summand is NULL SELECT v+w FROM x0 ; -- v+w -- 2 -- 2 -- \N -- \N -- \N -- NULL if summand is NULL SELECT id, v+w, SUM(v+w) FROM x0 GROUP BY id ; -- id v+w SUM(v+w) -- 1 2 2 -- 2 2 2 -- 3 \N \N -- 4 \N \N -- 5 \N \N -- There is numeric results even if some summands are NULL SELECT SUM(v), SUM(w) FROM x0 ; -- SUM(v) SUM(w) -- 3 3
ISO/IEC 9075:1992 Database Language SQL says in section 6.5 (<set function specification>), General Rules, 1) b):
let TX be the single-column table that is the result of applying the <value expression> to each row of T and eliminating null values.
And that's all it says. So aggregate functions must ignore NULLs, but, officially, there is no justification.
This behaviour goes back to the first SQL implementations (IBM System R, Oracle V2), where the designers probably thought this would be a good idea. And later, nobody wanted to break backwards compatibility.
In this aspect, as in others, SQL's handling of NULL values is inconsistent, and sometimes not portable.