This question already has an answer here:

- How dangerous is it to compare floating point values? 9 answers
- What is the most effective way for float and double comparison? 24 answers

I have two integer vectors of nearly 1000 size, and what I am going to do is to check whether the sum of the square integer for these two vectors is the same or not. So I write the following codes:

```
std::vector<int> array1;
std::vector<int> array2;
... // initialize array1 and array2, and in the experiment all elements
// in the two vectors are the same but the sequence of elements may be different.
// For example: array1={1001, 2002, 3003, ....}
// array2={2002, 3003, 1001, ....}
assert(array1.size() == array2.size());
float sum_array1 = 0;
float sum_array2 = 0;
for(int i=0; i<array1.size(); i++)
sum_array1 +=array1[i]*array1[i];
for(int i=0; i<array2.size(); i++)
sum_array2 +=array2[i]*array2[i];
```

I expect that `sum_array1`

should be equal to `sum_array2`

, but in fact in my application I found they were different `sum_array1 = 1.2868639e+009`

while `sum_array2 = 1.2868655e+009`

. What I have done next is to change the type of `sum_array1`

and `sum_array2`

to double type as the following codes show:

```
double sum_array1 = 0;
double sum_array2 = 0;
for(int i=0; i<array1.size(); i++)
sum_array1 +=array1[i]*array1[i];
for(int i=0; i<array2.size(); i++)
sum_array2 +=array2[i]*array2[i];
```

This time `sum_array1`

is equal to `sum_array2`

`sum_array1=sum_array2=1286862225.0000000`

. My question is why it could happen. Thanks.

Floating point values have a finite size, and can therefore only represent real values with a finite precision. This leads to rounding errors when you need more precision than they can store.

In particular, when adding a small number (such as those you're summing) to a much larger number (such as your accumulator), the loss of precision can be quite large compared with the small number, giving a significant error; and the errors will be different depending on the order.

Typically, `float`

has 24 bits of precision, corresponding to about 7 decimal places. Your accumulator requires 10 decimal places (around 30 bits), so you will experience this loss of precision. Typically, `double`

has 53 bits (about 16 decimal places), so your result can be represented exactly.

A 64-bit integer may be the best option here, since all the inputs are integers. Using an integer avoids loss of precision, but introduces a danger of overflow if the inputs are too many or too large.

To minimise the error if you can't use a wide enough accumulator, you could sort the input so that the smallest values are accumulated first; or you could use more complicated methods such as Kahan summation.