Trixi-GPU

  1. About Trixi-GPU
  2. TrixiCUDA.jl
    1. Recent Update
  3. Acceleration Overview
    1. CPU and GPU Data Flow
    2. ODE and PDE Acceleration
    3. Semidiscretization on GPU
  4. News
  5. Acknowledgments

About Trixi-GPU

Trixi-GPU offers GPU acceleration for solving hyperbolic PDEs. The core solvers are based on the Trixi-Framework, with acceleration powered by JuliaGPU. NVIDIA CUDA currently serves as the primary, still experimental, backend; support for other GPU types is planned for the future.

TrixiCUDA.jl

TrixiCUDA.jl offers CUDA acceleration for solving hyperbolic PDEs on NVIDIA GPUs. It is our top-priority solution for achieving high speedups in PDE solves.
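
For orientation, the snippet below sketches how a 1D linear advection solve might look with TrixiCUDA.jl. It loosely follows the package's examples; names such as `SemidiscretizationHyperbolicGPU` and `semidiscretizeGPU` may differ between versions, so please check the repository for the current API.

```julia
using Trixi, TrixiCUDA
using OrdinaryDiffEq

# Standard Trixi.jl problem setup: 1D linear advection on a tree mesh
advection_velocity = 1.0
equations = LinearScalarAdvectionEquation1D(advection_velocity)
mesh = TreeMesh(-1.0, 1.0, initial_refinement_level = 4,
                n_cells_max = 30_000)
solver = DGSEM(polydeg = 3, surface_flux = flux_lax_friedrichs)

# GPU-side semidiscretization and ODE problem (names as in the package
# examples; verify against the current TrixiCUDA.jl documentation)
semi = SemidiscretizationHyperbolicGPU(mesh, equations,
                                       initial_condition_convergence_test,
                                       solver)
ode = semidiscretizeGPU(semi, (0.0, 1.0))

# The ODE is then integrated with a standard time integrator
sol = solve(ode, BS3(); save_everystep = false)
```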

Recent Update

Update on Mar 19, 2025:

Update on Jan 28, 2025:

Update on Dec 31, 2024:

Archived Update

Acceleration Overview

CPU and GPU Data Flow

Minimizing large and frequent data transfers between the CPU and GPU is crucial for accelerating large programs. To reduce transfer overhead, initial values and large parameters are initialized and kept on the GPU throughout the solving process. Additionally, GPU-specific interfaces and custom kernels are implemented to speed up data initialization and processing.
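
As a minimal sketch of this idea (with illustrative names, not TrixiCUDA.jl internals), the snippet below allocates the grid and initial values directly on the GPU via CUDA.jl, so that solution data never round-trips through host memory:

```julia
using CUDA

# Allocate the grid and initial condition directly on the device
nx = 1024
x  = CuArray(range(0f0, 1f0, length = nx))  # grid lives on the GPU
u0 = sin.(2f0 * pi .* x)                    # broadcast compiles to a GPU kernel

# Subsequent array operations stay on the device; only small scalars
# (norms, time-step sizes, ...) are copied back to the host
unorm = sqrt(sum(abs2, u0) / nx)
```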

ODE and PDE Acceleration

The overall GPU acceleration relies on two parts: (1) ODE acceleration, based on GPU array operations natively accelerated through CUDA.jl, and (2) PDE-specific acceleration, focused on the semidiscretization and implemented with custom kernels. Custom kernels allow specialized optimizations, such as tuning memory access patterns and algorithms, to achieve further speedup.
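
The snippet below illustrates both layers in a hedged, hypothetical form: a broadcast that CUDA.jl turns into a GPU kernel automatically, and a hand-written kernel whose memory accesses can be tuned explicitly. The kernel is an illustrative finite-difference example, not one of TrixiCUDA.jl's actual kernels.

```julia
using CUDA

# (1) ODE layer: plain broadcasts on CuArrays run on the GPU natively,
# e.g. one forward-Euler step
euler_step!(u, du, dt) = (u .+= dt .* du)

# (2) PDE layer: a custom kernel, where memory access patterns and the
# algorithm itself can be tuned by hand (illustrative central difference)
function central_diff_kernel!(du, u, invdx)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if 1 < i < length(u)
        @inbounds du[i] = 0.5f0 * invdx * (u[i + 1] - u[i - 1])
    end
    return nothing
end

u  = CUDA.rand(Float32, 1024)
du = CUDA.zeros(Float32, 1024)
@cuda threads = 256 blocks = cld(length(u), 256) central_diff_kernel!(du, u, 1024f0)
euler_step!(u, du, 1f-3)
```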

Semidiscretization on GPU

Semidiscretization is a key target for acceleration because of its potential for full parallelization and the weak data dependencies between some of its components. Running it on the GPU with pipelined streams is therefore an effective approach to achieving high speedup. However, some data dependencies force certain computations to remain sequential, which limits how intensively the pipeline can be filled.
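
The sketch below illustrates the idea with CUDA.jl's task-based streams, using illustrative buffers rather than TrixiCUDA.jl's actual functions: each Julia task runs on its own CUDA stream, so independent kernels can overlap on the device, while dependent steps must wait for both streams to finish.

```julia
using CUDA

volume  = CUDA.zeros(Float32, 1 << 20)
surface = CUDA.zeros(Float32, 1 << 20)

# In CUDA.jl, each Julia task gets its own CUDA stream, so the two
# independent updates below can execute concurrently on the GPU
@sync begin
    @async begin                   # stream A: e.g. volume terms
        volume .= volume .+ 1f0
        CUDA.synchronize()
    end
    @async begin                   # stream B: e.g. surface terms
        surface .= surface .+ 2f0
        CUDA.synchronize()
    end
end

# A dependent step must wait for both streams, which is exactly what
# limits deeper pipelining
volume .+= surface
```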

News

There is a new project focused on implementing adaptive mesh refinement (AMR) with CUDA dynamic parallelism for the upcoming Google Summer of Code 2025. Please reach out if you are interested.

Acknowledgments

Thanks to our developers, maintainers, and outside contributors for their work on this project and community. Also, special thanks to Prof. Hendrik Ranocha, Prof. Jesse Chan, and Prof. Michael Schlottke-Lakemper for advising this project.

© Trixi-GPU developers. Powered by Franklin.jl and the Julia programming language.