CUDA RNAfold

~~The version of RNAfold to run on nVidia graphics cards and Tesla high performance computing parallel hardware will be available shortly.~~ At present CUDA support for RNAfold is only included in ViennaRNA-2.3.0cuda.

Contact me or Ronny for the advanced version.

Technical report

Notes on compiling ViennaRNA Package v2.3.0cuda with CUDA 12.1 and GCC 10.2.1

nvcc options seem to have changed. Try setting nVidia compute level 5.0 only.

After downloading and unpacking ViennaRNA-2.3.0cuda.tar.gz, you need something like:
`./configure NVCC_PATH=/path/to/cuda/bin NVCC_SAMPLES=/path/to/cuda/samples CUDA_SMS=50 --enable-sse`

`CUDA_SMS=` is required, e.g. `CUDA_SMS=50` for a card with CUDA compute capability 5.0, such as the NVIDIA GeForce GTX 745.
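A minimal sketch for deriving the `CUDA_SMS` value from the card itself. It assumes a driver recent enough for `nvidia-smi` to understand the `compute_cap` query field (older drivers do not have it, in which case look the card up in NVIDIA's compute capability list instead). The function names are hypothetical helpers, not part of ViennaRNA.

```shell
# Hypothetical helper: turn a compute capability such as "5.0" into the
# two-digit form used by CUDA_SMS (e.g. "50"), dropping dots and spaces.
cuda_sms_from_cap() {
    printf '%s\n' "$1" | tr -d '. '
}

# Hypothetical helper: ask the driver for the first GPU's compute capability.
# Assumption: nvidia-smi supports the compute_cap query field.
gpu_compute_cap() {
    nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1
}

# Usage (not run here):
#   ./configure NVCC_PATH=/path/to/cuda/bin NVCC_SAMPLES=/path/to/cuda/samples \
#       CUDA_SMS="$(cuda_sms_from_cap "$(gpu_compute_cap)")" --enable-sse
```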

`--enable-sse` is needed if you want to include support for Intel AVX512 vector instructions.

make fails with ``ld: globals.o:globals.h:92: multiple definition of `GAV'; baum.o:globals.h:92: first defined here``

Someone has already fixed src/Kinfold/globals.* in later releases of the package.

One fix might be to replace src/Kinfold/globals.c and src/Kinfold/globals.h with the versions from 2.5.1. You then have to avoid picking up rect_flag, which did not exist in 2.3.0 (e.g. by commenting out the one line in the 2.5.1 code which sets it).

Alternatively, add `extern` before each of `GlobVars GSV;`, `GlobVars GAV;` and `GlobVars GTV;` in src/Kinfold/globals.h.
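The `extern` edit can be scripted with sed, roughly as below. This is a sketch only: it assumes each of the three lines stands on its own in src/Kinfold/globals.h, written exactly as above, and the function name is hypothetical. Background: GCC 10 changed its default to `-fno-common`, so a variable defined (rather than declared) in a header included by several .c files now clashes at link time, which is why this error appears with GCC 10.2.1. If the linker then reports undefined references to GSV, GAV or GTV, globals.c will also need one plain definition of each, as in the 2.5.1 files.

```shell
# Hypothetical helper: prefix the three GlobVars lines in a Kinfold header
# with "extern", turning the definitions into declarations.
# Re-running it is harmless: lines already starting "extern" do not match.
add_extern_to_globals() {
    header=${1:-src/Kinfold/globals.h}
    # Keep a backup copy next to the original (e.g. globals.h.orig).
    cp "$header" "$header.orig" || return 1
    sed 's/^\([[:space:]]*\)GlobVars \(G[SAT]V\);/\1extern GlobVars \2;/' \
        "$header.orig" > "$header"
}

# Usage, from the top of the unpacked ViennaRNA-2.3.0cuda directory:
#   add_extern_to_globals src/Kinfold/globals.h
```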

Delete the *.o files before running make again. (Unfortunately the makefiles here appear to assume make is only run once, so it may not recompile all the modified files it would normally be expected to.)
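A small sketch of the clean-up step, assuming it is run from the top of the unpacked source tree; `remove_objects` is a hypothetical helper name. It deletes only regular files ending in `.o`, leaving sources and headers alone.

```shell
# Hypothetical helper: remove every *.o file below a directory (default: .)
# so that the next make has to recompile them.
remove_objects() {
    find "${1:-.}" -type f -name '*.o' -exec rm -f {} +
}

# Usage (not run here):
#   remove_objects . && make V=1
```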

For debugging, `make V=1` may help.

For a long molecule, a GeForce GTX 745 is 2.9 times faster than a 3.6 GHz CPU

On one molecule, 11446 nucleotides long, RNAfold 2.3.0cuda using a single nVidia GeForce GTX 745 graphics card was about 2.92 times faster than RNAfold 2.5.1 running on the CPU of the same Intel i7-4790 3.60 GHz desktop.

Older versions of CUDA

ViennaRNA-2.3.0cuda configure error message: `Could not find nvcc`

Work around

Make sure nvcc is in your PATH, e.g. `export PATH=$PATH:/usr/local/cuda/bin`

ViennaRNA-2.3.0cuda make error message:
`NVCC modular_decomposition.o nvcc fatal : Unsupported gpu architecture 'compute_61'`

Work around

Make sure you are using a version of the CUDA nvcc compiler (e.g. CUDA 9.1) which supports nVidia GPU compute level 6.1.

Is your GPU at compute level 6.1?

If you cannot upgrade your version of CUDA, it may be possible to compile for an older compute level and run the CUDA version of RNAfold with only a marginal loss of efficiency.

Use `make V=1`

Use it to confirm the nvcc command line used by make.
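A sketch for pulling the target GPU architectures out of a verbose build log, so you can see which compute level make is asking nvcc for. It assumes the architectures appear in the usual `compute_NN` / `sm_NN` form (as in the `compute_61` error above); `gpu_archs_in_log` is a hypothetical helper name.

```shell
# Hypothetical helper: list the distinct GPU architectures named on the
# nvcc command lines captured in a build log.
gpu_archs_in_log() {
    grep -o -E '(compute|sm)_[0-9]+' "$1" | sort -u
}

# Usage (not run here):
#   make V=1 2>&1 | tee build.log
#   gpu_archs_in_log build.log
```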

gcc `mfe_cuda.c warning: passing argument discards const qualifier from pointer target type`

Work around

These warnings are harmless. The new code uses const and passes such data to older code which does not. The older code does not modify the data, but the compiler does not know this and hence issues many warnings when compiling mfe_cuda.c.

It appears that on newer versions of gcc such warnings can be suppressed with `-Wno-discarded-qualifiers`. The gcc command line option `-Wdiscarded-qualifiers` is relatively new, but in older versions (e.g. 4.8.5) the `-w` option will suppress all warnings.
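One way to pick the right flag automatically is sketched below. It assumes `-Wno-discarded-qualifiers` is understood from GCC 5 onwards and falls back to `-w` for older releases. The version test matters because gcc silently accepts unknown `-Wno-...` options, so simply trying the flag does not reveal whether it works. `quiet_qualifier_flag` is a hypothetical helper name. Passing CFLAGS to configure replaces autoconf's default `-g -O2`, which is why the usage line repeats them.

```shell
# Hypothetical helper: print a flag that silences the discarded-qualifiers
# warnings. Takes a GCC major version, or asks gcc if none is given.
# Assumption: -Wno-discarded-qualifiers exists in GCC 5 and later.
quiet_qualifier_flag() {
    major=${1:-$(gcc -dumpversion | cut -d. -f1)}
    if [ "$major" -ge 5 ]; then
        printf '%s\n' "-Wno-discarded-qualifiers"
    else
        printf '%s\n' "-w"
    fi
}

# Usage (not run here):
#   ./configure ... CFLAGS="-g -O2 $(quiet_qualifier_flag)"
```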

`NVCC modular_decomposition.o /usr/include/stdc-predef.h fatal error: cuda_runtime.h: No such file or directory`

This problem can arise when an old version of CUDA nvcc is used with a newer command line (e.g. CUDA 7.0 versus 9.1).

Work around

Ensure CUDA 9.1 is used throughout compilation.

`NVLD RNAfold /usr/bin/crt/link.stub fatal error: host_defines.h: No such file or directory`

It appears this can arise when a different version of CUDA nvcc is used to link the RNAfold executable than was used to compile it (e.g. CUDA 7.0 versus 9.1).

Work around

Ensure CUDA 9.1 is used both to compile and to link.
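A quick sanity check that the nvcc picked up from PATH is the one from the toolkit you gave configure. This is a sketch: `nvcc_matches` is a hypothetical helper, the paths in the usage comment are examples only, and the comparison is purely textual, so a symlink such as /usr/local/cuda pointing at /usr/local/cuda-9.1 will be reported as a mismatch.

```shell
# Hypothetical helper: succeed only if the nvcc found on PATH lives in the
# given CUDA bin directory (the value passed as NVCC_PATH to configure).
nvcc_matches() {
    want_dir=$1
    found=$(command -v nvcc) || { echo "nvcc not on PATH" >&2; return 1; }
    [ "$(dirname "$found")" = "$want_dir" ]
}

# Usage (paths are examples only, not run here):
#   export PATH=/usr/local/cuda-9.1/bin:$PATH
#   nvcc_matches /usr/local/cuda-9.1/bin && \
#       ./configure NVCC_PATH=/usr/local/cuda-9.1/bin ...
```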

`modular_decomposition.cu init_gpu(1, 60) cudaMalloc d_energy_min 244 returned error CUDA driver version is insufficient for CUDA runtime version (code 35) line(236)`

It appears this is a known compatibility problem between nVidia driver 384.98 and CUDA 9.1 (and, I guess, every driver older than 384.98; for example, it also appears with nVidia driver version 384.59).

Use `nvidia-smi` to find which driver you have installed and `nvcc -V` to find the CUDA compiler's version.
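The two version numbers can be extracted as sketched below. The sed pattern assumes `nvcc -V` prints a line of the form `Cuda compilation tools, release 9.1, V9.1.85`; `nvcc_release` is a hypothetical helper name. Compare the results against the minimum driver versions NVIDIA lists in the CUDA Toolkit release notes.

```shell
# Hypothetical helper: read "nvcc -V" output on stdin and print the release
# number (e.g. "9.1" from "Cuda compilation tools, release 9.1, V9.1.85").
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Usage (not run here):
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
#   nvcc -V | nvcc_release
```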

Work around

Recompile on the host (e.g. with CUDA 8.0), rather than simply copying the executable. Alternatively you might upgrade the nVidia driver.

RNAfold 2.3.5 and SSE

ViennaRNA Package version 2.3.5 (April 2017) is the first version to support SSE. However, SSE is not enabled by default.

To get the speed advantages of SSE instructions you will need:

- An x86 processor that supports SSE. Any CPU less than five years old is probably going to include SSE. To check, on Unix look for the sse4_1 flag in the special system file /proc/cpuinfo, or look at Wikipedia.
- A C compiler that supports SSE, e.g. almost any recent version of GCC (GCC 4.3 onwards), which supports the `-msse4.1` command line flag.

When you install ViennaRNA-2.3.5.tar.gz, use `./configure --enable-sse` before you run make.
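The /proc/cpuinfo check can be scripted roughly as follows. It assumes the Linux cpuinfo layout, where CPU features are listed on lines starting with `flags`; `has_cpu_flag` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if a CPU feature flag (e.g. sse4_1) is listed
# on a "flags" line of a cpuinfo-style file (default: /proc/cpuinfo).
has_cpu_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    grep -E '^flags[[:space:]]*:' "$file" | grep -q -w -- "$flag"
}

# Usage (not run here):
#   has_cpu_flag sse4_1 && ./configure --enable-sse && make
```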

Error message:
`CCLD RNAfold /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/smmintrin.h: In function 'vrna_E_ml_stems_fast': error: '__builtin_ia32_pminsd128' needs isa option -m32 -msse4.1 return (__m128i) __builtin_ia32_pminsd128 ((__v4si)__X, (__v4si)__Y); lto-wrapper: gcc returned 1 exit status`

Work around?

Make sure you are using a version of the C compiler which supports SSE. This problem seems to go away when using GCC release 6.2.1.

Error message:
`libtool: link: gcc -fno-strict-aliasing -flto -ffat-lto-objects -fopenmp -g -O2 -fno-strict-aliasing -flto -o RNALfold RNALfold_cmdl.o RNALfold.o ../../src/ViennaRNA/.libs/libRNA_conv.a -lstdc++ -lm -fopenmp lto1: fatal error: bytecode stream generated with LTO version 2.2 instead of the expected 5.1`

This may be because libRNA_conv.a is not compatible with the GCC link time optimiser, e.g. because it contains object files produced by an old or a different version of the GCC compiler.
Following tewinget's suggestion, you might try something like this to list the GCC versions recorded in the archive:

`strings libRNA_conv.a | grep 'GCC:' | sort -u`
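If `strings` is not installed, a grep-only variant along these lines may do the same job. It assumes the compiler identification strings stored in the object files look like `GCC: (GNU) 4.8.5 ...` and are followed by a NUL or other control byte; `gcc_versions_in` is a hypothetical helper name. More than one line of output, or a version different from `gcc --version`, points to the mixed-compiler problem described above.

```shell
# Hypothetical helper: print the distinct "GCC: ..." identification strings
# embedded in an archive or object file. -a makes grep treat binary as text;
# each match stops at the first control character (e.g. the NUL terminator).
gcc_versions_in() {
    grep -a -o 'GCC: [^[:cntrl:]]*' "$1" | sort -u
}

# Usage (not run here):
#   gcc_versions_in src/ViennaRNA/.libs/libRNA_conv.a
#   gcc --version | head -n 1
```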

Work around

Ensure you are using the same version of GCC for both gcc and g++.

It is also possible to disable gcc's link time optimiser with the gcc link time switch `-fno-lto` (MartinJames).

Error message:
`cannot find -lstdc++ collect2: error: ld returned 1 exit status`

Work around

Ensure you have both the C and C++ compilers installed.

Ps

The SSE code in RNAfold 2.3.5 is derived from: Improving SSE Parallel Code with Grow and Graft Genetic Programming, William B. Langdon and Ronny Lorenz. In GI 2017, pp1537-1538, 15-19 July, Berlin.

W.B.Langdon 9 July 2017 (last update 4 May 2023)
