# Norms

The $$l_1$$, $$l_2$$ and $$l_{oo}$$ norms of a vector $$x$$ are given by

 $x _1 = \sum_{i=1}^n x_i$ $x _2 = \sqrt{\sum_{i=1}^n x_{i}^2}$ $x _{oo} = max_i x_i$

In imperative programming languages it is straightforward to calculate these with traditional for-loops. For instance, in python:

However, they all essentially apply well-known operations - maps and folds - on a range of values. In Haskell, they can be expressed very succinctly:

In C++, algorithms such as std::accumulate and std::transform are meant to apply such operations. However, there is no simple way of composing those. This can be achieved with Boost.Range, which offers all STL algorithms and adaptors to chain them:

where Vector is a hypothetical class modelling a Single Pass Range of double.

## Performance

The question arising naturally is whether this abstraction incurs extra overhead; or whether the Boost library and the STL implementation getting called under the hood invoke some magic that will win the day. In the following snippet, three routines are applied to a std::vector<long long>; a naive loop accessing the elements by index, a C++11 range-based for and the algorithm composition from Boost range:

#include <random>
#include <algorithm>
#include <boost/timer/timer.hpp>
#include <cmath>
#include <vector>
#include <boost/range/numeric.hpp>

using Int = long long;

Int simple_one_norm(const std::vector<Int>& v){
Int sum = 0;
std::size_t n = v.size();
for (std::size_t i=0; i<n; ++i)
sum += std::abs(v[i]);
return sum;
}

Int elegant_one_norm(const std::vector<Int>& v){
Int sum = 0;
for (auto i : v)
sum += std::abs(i);
return sum;
}

Int range_one_norm(const std::vector<Int>& v){
using boost::accumulate;
auto abs = (Int(*)(Int))&std::abs;

return accumulate(v | transformed(abs), static_cast<Int>(0));
}

int main(){
Int n = 1000000;

//fill the vector with values generated at runtime
//so that the compiler cannot precompute any results
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<Int> dis(-n, n);
std::vector<Int> v(n);
std::generate(v.begin(), v.end(), std::bind(dis, gen));

Int result_1, result_2, result_3;

{
boost::timer::auto_cpu_timer timer;
result_1 = simple_one_norm(v);
}
{
boost::timer::auto_cpu_timer timer;
result_2 = elegant_one_norm(v);
}
{
boost::timer::auto_cpu_timer timer;
result_3 = range_one_norm(v);
}
std::cout << result_1 << '\n' << result_2
<< '\n' << result_3 << '\n';
}


Compiling the snippet with GCC 4.8 on 64-bit machine and the O2 flag, the following python script is then used to calculate an average for the wall time of each:

import subprocess

simple_one_norm = 0
elegant_one_norm = 0
range_one_norm = 0

n = 1000

for i in range(n):
raw_output = subprocess.check_output('./a.out')
processed_output = raw_output.decode().split('\n')
simple_one_norm += float(processed_output[0].split()[0][:-1])
elegant_one_norm += float(processed_output[1].split()[0][:-1])
range_one_norm += float(processed_output[2].split()[0][:-1])
print(simple_one_norm/n, elegant_one_norm/n, range_one_norm/n, sep='\n')


resulting in:

0.0020477590000000045
0.0016447669999999982
0.0016487220000000018


Accessing the elements by index performs the worst, whilst the other two routines come out roughly equal.