The new C++ standard, called C++11, is finally here.
It enriches both the language and its standard library, bringing features that many users have been waiting for, like lambdas, the “auto” type specifier, and so on. But I’m not going to talk about these here; there are plenty of good references on the net.
A few days ago, after watching Bartosz Milewski’s excellent tutorial on C++11 concurrency, I started playing with the language additions, trying to mimic the behavior of some OpenMP directives. (Again, if you want to learn more about OpenMP, surf the internet. You can start from the official about page.)
I’ll show you some of these experiments. Of course we aren’t going to fully implement even a single directive, but maybe we can learn something about the new standard along the way. Readers’ comments and suggestions are welcome!
testing the code
C++11 is a young standard, and compilers don’t yet support it fully. I’ve used GCC 4.7 to compile the code below, but you can try your favorite C++ compiler. Here’s a nice table summarizing support for the new features in various popular compilers: C++0xCompilerSupport.
To enable support for the new features in g++, add the switch -std=c++0x; to compile OpenMP code, also add -fopenmp.
parallel
With OpenMP, a programmer can introduce parallelism adding compiler directives and using its library functions. It uses a fork-join execution model. The simplest way to enable parallel execution of a region of code is via the parallel
directive. Here’s a very simple example:
#include <iostream>
#include <omp.h>

using namespace std;

int main()
{
    #pragma omp parallel num_threads(4)
    {
        cout << "I'm thread number " << omp_get_thread_num() << endl;
    }
    cout << "This code is executed by one thread\n";
    return 0;
}
The example is self-explanatory: the code block after the OpenMP "parallel" pragma is executed by four threads.
At the end of the region there's an implicit barrier, so the last cout is executed only when all the threads have left the parallel region.
Copy this code into a file, compile it, run it and look at the output (e.g. g++ -std=c++0x -fopenmp para1.c -o para1; ./para1).
Let's see how to emulate this behavior using C++'s std::thread.
threads in C++11
To start a new thread in C++11, we just need to create a std::thread object. The simplest (and most useless!) example I can imagine is this:
#include <iostream>
#include <thread>

using namespace std;

void hello()
{
    cout << "Hello from a thread\n";
}

int main()
{
    thread aThread(&hello);
    aThread.join();
    return 0;
}
We can avoid passing a function pointer and make the whole thing nicer using a lambda:
(From now on I'm going to omit some of the includes and other repeated code for brevity. It should be easy to add the missing parts.)
int main()
{
    thread aThread([]() {
        cout << "Hello from a thread\n";
    });
    aThread.join();
    return 0;
}
let's use the threads
Ok, we can use threads to execute some work in parallel. Let's write a trivial thread pool class for the purpose.
using namespace std;

typedef void (*task) ();

class thread_pool
{
private:
    vector<thread> the_pool;

public:
    thread_pool(unsigned int num_threads, task tbd)
    {
        for(unsigned int i = 0; i < num_threads; ++i)
        {
            the_pool.push_back(thread(tbd));
        }
    }

    void join()
    {
        for_each(the_pool.begin(), the_pool.end(),
                 [] (thread& t) { t.join(); });
    }

    void nowait()
    {
        for_each(the_pool.begin(), the_pool.end(),
                 [] (thread& t) { t.detach(); });
    }
};
It's just a wrapper over a vector of threads, with a few methods we'll find useful later.
We can use it this way:
thread_pool pool(4, []() {
    cout << "Here I am: " << this_thread::get_id() << endl;
});

cout << "I can do other things before waiting for them to finish!" << endl;
pool.join();
Put these lines in a main and run this example. As you can see:
- Four threads are started; each of them executes the code in the lambda (and says "Here I am")
- The main thread can get some other work done before joining them
syntactic sugar
With a bit of syntactic sugar, we can make the code resemble the OpenMP version more closely. A couple of macros will help us:
// class thread_pool omitted...

#define parallel_do_(N) thread_pool (N, []()
#define parallel_end ).join();

int main()
{
    parallel_do_(4)
    {
        cout << "Here I am: " << this_thread::get_id() << endl;
    }
    parallel_end
    return 0;
}
a bit more
As I said at the beginning, we are not trying to emulate the parallel construct fully: it does a lot more and has a lot of clauses that control its behavior. However, we can easily add support for a couple of nice things:
- Let the system choose an appropriate number of threads
- Avoid the implicit barrier at the end of the parallel region
Omit the number of threads
When you don't specify the num_threads clause, OpenMP figures out by itself the number of threads to start, based on the available hardware resources (and a lot of other things!). We can achieve a similar result by using thread::hardware_concurrency() as a default value for num_threads.
Don't wait at the barrier
The nowait clause instructs OpenMP not to generate a barrier at the end of the parallel region. We can emulate this by detaching the threads in the pool instead of joining them.
The following listing shows the new code and a sample of use.
// class thread_pool omitted...

#define parallel_do_(N) thread_pool (N, []()
#define parallel_do parallel_do_(thread::hardware_concurrency())
#define parallel_end ).join();
#define parallel_end_nowait ).nowait();

int main()
{
    parallel_do_(4)
    {
        cout << "Here I am: " << this_thread::get_id() << endl;
    }
    parallel_end_nowait

    cout << "[MASTER] I can do other things while they complete...\n";

    // With the default number of threads
    parallel_do
    {
        cout << "Let's count ourselves. I'm " << this_thread::get_id() << endl;
    }
    parallel_end

    cout << "[MASTER] Goodbye.\n";
    return 0;
}
We'll stop here for today. I hope you enjoyed the read.
Share your thoughts in the comments.
Tags: c++11, OpenMP, parallel programming, threads