Example 1

Consider the following example of a parallel program that sums the elements of an array:

// File: example1.cpp

#include <stdio.h>
#include <omp.h>

int main() {
    int reps = 10000;
    long int A[reps];
    long int sum = 0;

    for (long int i = 0; i < reps; i++) {
        A[i] = i;
    }

    #pragma omp parallel shared(A, sum, reps)
    {
        #pragma omp single
        {
            // only one thread has to do this
            omp_set_num_threads(4);
            printf("Number of threads = %d\n", omp_get_num_threads());
        }

        #pragma omp for schedule(static,5)
        for (long int i = 0; i < reps; i++) {
            sum += A[i];
        }
    }  // end of parallel region

    printf("sum = %ld\n", sum);
    return 0;
}

Download: example1.cpp

Download this program. Then, compile and run it a few times. On GL, you need to use the -fopenmp flag to compile:

g++ -fopenmp example1.cpp

What happens when you run this program a few times? Does it give the same answer each time? (Hint: no.) The reason is that the shared variable sum is updated concurrently by multiple threads, and some of the updates can be lost.

This is a common situation, and OpenMP provides the reduction clause to solve it.