Computing Variance, Standard Deviation, and Other Statistical FunctionsProblemYou want to compute one or more of the common statistics such as variance, standard deviation, skew, and kurtosis of a sequence of numbers. SolutionYou can use the accumulate function from the <numeric> header to compute many meaningful statistical functions beyond simply the sum by passing custom function objects. Example 11-9 shows how to compute several important statistical functions, using accumulate. Example 11-9. Statistical functions
#include <numeric>
#include <cmath>
#include <algorithm>
#include <functional>
#include <vector>
#include <iostream>
using namespace std;
template<int N, class T>
T nthPower(T x) {
T ret = x;
for (int i=1; i < N; ++i) {
ret *= x;
}
return ret;
}
template<class T, int N>
struct SumDiffNthPower {
SumDiffNthPower(T x) : mean_(x) { };
T operator( )(T sum, T current) {
return sum + nthPower<N>(current - mean_);
}
T mean_;
};
template<class T, int N, class Iter_T>
T nthMoment(Iter_T first, Iter_T last, T mean) {
size_t cnt = distance(first, last);
return accumulate(first, last, T( ), SumDiffNthPower<T, N>(mean)) / cnt;
}
template<class T, class Iter_T>
T computeVariance(Iter_T first, Iter_T last, T mean) {
return nthMoment<T, 2>(first, last, mean);
}
template<class T, class Iter_T>
T computeStdDev(Iter_T first, Iter_T last, T mean) {
return sqrt(computeVariance(first, last, mean));
}
template<class T, class Iter_T>
T computeSkew(Iter_T begin, Iter_T end, T mean) {
T m3 = nthMoment<T, 3>(begin, end, mean);
T m2 = nthMoment<T, 2>(begin, end, mean);
return m3 / (m2 * sqrt(m2));
}
template<class T, class Iter_T>
T computeKurtosisExcess(Iter_T begin, Iter_T end, T mean) {
T m4 = nthMoment<T, 4>(begin, end, mean);
T m2 = nthMoment<T, 2>(begin, end, mean);
return m4 / (m2 * m2) - 3;
}
template<class T, class Iter_T>
void computeStats(Iter_T first, Iter_T last, T& sum, T& mean,
T& var, T& std_dev, T& skew, T& kurt)
{
size_t cnt = distance(first, last);
sum = accumulate(first, last, T( ));
mean = sum / cnt;
var = computeVariance(first, last, mean);
std_dev = sqrt(var);
skew = computeSkew(first, last, mean);
kurt = computeKurtosisExcess(first, last, mean);
}
int main( )
{
vector<int> v;
v.push_back(2);
v.push_back(4);
v.push_back(8);
v.push_back(10);
v.push_back(99);
v.push_back(1);
double sum, mean, var, dev, skew, kurt;
computeStats(v.begin( ), v.end( ), sum, mean, var, dev, skew, kurt);
cout << "count = " << v.size( ) << "\n";
cout << "sum = " << sum << "\n";
cout << "mean = " << mean << "\n";
cout << "variance = " << var << "\n";
cout << "standard deviation = " << dev << "\n";
cout << "skew = " << skew << "\n";
cout << "kurtosis excess = " << kurt << "\n";
cout << endl;
}
The program in Example 11-9 produces the following output: count = 6 sum = 124 mean = 20.6667 variance = 1237.22 standard deviation = 35.1742 skew = 1.75664 kurtosis excess = 1.14171 DiscussionSome of the most important statistical functions (e.g., variance, standard deviation, skew, and kurtosis) are defined in terms of standardized sample moments about the mean. The precise definitions of statistical functions vary somewhat from text to text. This text uses the unbiased definitions of the statistical functions shown in Table 11-1. Table 11-1. Definitions of statistical functions
The simplest way to code the statistical functions is to define them all in terms of moments. Since there are several different moments used, each one accepting a constant integer value, I pass the constant as a template parameter. This allows the compiler to potentially generate more efficient code because the integer is known at compile time. The moment function is defined using the mathematical summation operator . Whenever you think of the summation operation you should think of the accumulate function from the <numeric> header. The accumulate function has two forms: one accumulates using operator+ and the other uses an accumulator functor that you need to provide. Your accumulator functor will accept two values: the accumulated value so far, and the value at a specific position in the sequence. Example 11-10 illustrates how accumulate works by showing how the user supplied functor is called repeatedly for each element in a series. Example 11-10. Sample implementation of accumulate
template<class Iter_T, class Value_T, class BinOp_T>
Iter_T accumulate(Iter_T begin, Iter_T end, Value_T value, BinOp_T op) {
while (begin != end) {
value = op(value, *begin++)
}
return value;
}
|
