Simon Moll
Introduction
LLVM-VE-RV 1.5.1 is a new development release of the LLVM-VE compiler for SX-Aurora. This release offers the following key features:
- Vector code generation and the ability to leverage LLVM’s vectorizers for automatic vectorization.
- Early support for VH-to-VE OpenMP target offloading for C programs, based on sotoc by RWTH Aachen.
- Outer-loop vectorization with the Region Vectorizer.
- LLVM-VE-RV also supports the VEL intrinsics of LLVM-VE (compile with -mvelintrin).
Loop Vectorization with LLVM
This is the first time that we have automatic loop vectorization in LLVM working for SX-Aurora. Since this is automatic, all you have to do is compile your code using:
clang(++) --target=ve-linux -O3 [..]
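As an illustration of the kind of loop that automatic vectorizers typically handle, here is a minimal sketch of a counted loop with unit-stride accesses. The function name vec_add is hypothetical and not part of the release; whether the loop is actually vectorized depends on the compiler's own analysis and cost model.

```c
#include <stddef.h>

/* Hypothetical example kernel: element-wise addition of two arrays.
   'restrict' promises the compiler that the arrays do not overlap,
   which removes one common obstacle to vectorization. */
void vec_add(float *restrict out, const float *restrict a,
             const float *restrict b, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

No source annotations are needed for this case; the loop is plain C and compiles with the command shown above.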
Region Vectorizer for LLVM
This release is also bundled with the Region Vectorizer (RV) for outer-loop vectorization.
The Region Vectorizer is able to vectorize some loops that LLVM cannot handle.
In order to apply RV to a loop, annotate the loop with #pragma omp simd and use rvclang(++) instead of clang(++) to load the Region Vectorizer plugin into the compiler:
rvclang -fopenmp-simd --target=ve-linux -O3 <prog.c> -o <prog.o>
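To make the annotation concrete, here is a minimal sketch of an outer loop marked with #pragma omp simd. The function row_sums and its parameters are hypothetical and chosen only for illustration. The pragma sits on the loop over rows, so each vector lane would process one row while the inner loop over columns stays sequential within a lane. Whether RV vectorizes a given loop still depends on the loop itself.

```c
/* Hypothetical example: sum each row of a row-major matrix.
   The pragma requests vectorization of the OUTER loop (over rows);
   the inner loop over columns runs inside each iteration. */
void row_sums(const double *m, double *sums, int rows, int cols) {
    #pragma omp simd
    for (int r = 0; r < rows; ++r) {
        double acc = 0.0;
        for (int c = 0; c < cols; ++c) {
            acc += m[r * cols + c];
        }
        sums[r] = acc;
    }
}
```

Without an OpenMP flag the pragma is ignored and the function still computes the same result, so the annotation does not change the meaning of the code.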
Preview feature: OpenMP target offloading (sotoc)
This release comes with a preview feature for OpenMP target offloading from the VH to the VE. The underlying method (sotoc) has been developed by RWTH Aachen and leverages VEOffload to perform the actual offloading.
To offload a parallel loop to the VE, use OpenMP pragmas as shown here for a saxpy kernel:
#pragma omp target \
    map(to : x[0:m]) \
    map(tofrom : y[0:m])
#pragma omp parallel for
for (int i = 0; i < m; ++i) {
  y[i] = a * x[i] + y[i];
}
Compile the code with the following flags:
clang -fopenmp -fopenmp-targets=aurora-nec-veort -O3 <your_code.c> -o
This offloading prototype uses the ncc compiler to compile the VE kernels for offloading. To use clang with the VE backend instead, also pass the option -fopenmp-nec-compiler=clang|rvclang to compile the VE code with clang or rvclang.
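For reference, the saxpy fragment can be wrapped in a complete function, sketched below. The name saxpy_offload and its signature are assumptions made for this example, not part of the release. The map clauses copy x to the device and copy y in both directions, as in the fragment.

```c
/* Hypothetical wrapper around the saxpy kernel.
   map(to: ...)     copies x to the device before the region runs.
   map(tofrom: ...) copies y to the device and back afterwards. */
void saxpy_offload(float a, const float *x, float *y, int m) {
    #pragma omp target \
        map(to : x[0:m]) \
        map(tofrom : y[0:m])
    #pragma omp parallel for
    for (int i = 0; i < m; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

When compiled without OpenMP support, the pragmas are ignored and the loop runs on the host, which is a convenient way to check results before offloading.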
VEL Intrinsics
LLVM-VE-RV also supports VEL Intrinsics for low-level vector programming.
Add the compiler option -mvelintrin to enable VEL Intrinsics. If you don’t, any VEL intrinsics that involve vector mask registers will lead to a compiler crash.
Object files compiled with and without the -mvelintrin flag are compatible.
Installation
Pre-built RPMs are available from GitHub.
Building from source
To build the llvm-ve-rv compiler from source, I recommend the automatic build scripts in llvm-dev. Clone llvm-dev with:
git clone -b hpce/release_1.5.1 gh:sx-aurora-dev/llvm-dev.git
Clone all source repositories
. llvm-dev/clone.sh https://github.com/sx-aurora-dev hpce/release_1.5.1
and build llvm-ve-rv 1.5.1 using
bash llvm-dev/build-and-install.sh