RenderFormer CUDA Engine

Standalone CUDA Inference for Neural Rendering (SIGGRAPH 2025)

RenderFormer CUDA Engine is a high-performance C++20/CUDA inference engine for RenderFormer (SIGGRAPH 2025), a transformer-based neural renderer that turns triangle-mesh scenes into photorealistic images with full global illumination effects.


Performance

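Benchmarked on an RTX 3080 Ti with the renderformer-v1.1-swin-large model (483M parameters):

| Measurement     | PyTorch fp16 | CUDA Engine (512×512) | CUDA Engine (320×320) |
|-----------------|--------------|-----------------------|-----------------------|
| Per-view render | ~1.5 s       | 78 ms (8.3× faster)   | 34 ms (~30 FPS)       |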
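Per-view latency and frame rate are related by FPS = 1000 / latency_ms, so 34 ms per view corresponds to roughly 29.4 FPS. The sketch below shows one way per-view numbers like these could be measured. `mean_latency_ms` and `fps_from_latency_ms` are hypothetical helpers, not part of the engine, and the sketch assumes the render callable blocks until the GPU has finished the frame.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: mean wall-clock latency (ms) of `render_view` after warm-up.
// For GPU work, `render_view` must block until the frame is finished
// (e.g. by synchronizing its stream); otherwise only the launch cost is timed.
double mean_latency_ms(const std::function<void()>& render_view,
                       int warmup_iters, int timed_iters) {
    assert(warmup_iters >= 0 && timed_iters > 0);
    // Warm-up runs exclude one-time costs (allocation, graph capture, caches).
    for (int i = 0; i < warmup_iters; ++i) render_view();
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < timed_iters; ++i) render_view();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / timed_iters;
}

// Frames per second achievable at a given per-view latency in milliseconds.
double fps_from_latency_ms(double latency_ms) {
    assert(latency_ms > 0.0);
    return 1000.0 / latency_ms;
}
```

With these helpers, `fps_from_latency_ms(34.0)` gives about 29.4, which matches the "~30 FPS" figure for the 320×320 setting.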

Key Features

Zero Python dependencies: loads safetensors and HDF5 files directly in C++
cuDNN fused SDPA: tensor-core flash attention for cross-attention
CUDA Graph capture: kernels are captured once into a 732-node graph and replayed, removing per-kernel CPU launch overhead
fp16 inference: mixed-precision encoder and fully fp16 decoder
KV cache: cross-attention keys and values (K, V) are computed once and reused across all views
Interactive viewer: GLFW/OpenGL window with a trackball camera and asynchronous, double-buffered rendering
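Loading weights without Python is possible because the safetensors format is simple: an 8-byte little-endian header length N, then N bytes of JSON metadata (dtype, shape and `data_offsets` per tensor), then the raw tensor bytes. The sketch below only splits an in-memory file into those two regions; `SafetensorsView` and `split_safetensors` are hypothetical names, and JSON parsing and the engine's HDF5 loading are not shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Parsed view of a safetensors file held in memory.
struct SafetensorsView {
    std::string header_json;     // JSON metadata: dtype, shape, data_offsets per tensor
    std::size_t data_start = 0;  // byte offset where the tensor data region begins
};

// Sketch: split a safetensors blob into its JSON header and data region.
// Tensor data_offsets in the JSON are relative to data_start.
SafetensorsView split_safetensors(const std::vector<std::uint8_t>& blob) {
    if (blob.size() < 8) throw std::runtime_error("file too small");
    std::uint64_t n = 0;
    for (int i = 7; i >= 0; --i) n = (n << 8) | blob[i];  // decode little-endian u64
    if (n > blob.size() - 8) throw std::runtime_error("header length exceeds file size");
    SafetensorsView view;
    view.header_json.assign(reinterpret_cast<const char*>(blob.data()) + 8,
                            static_cast<std::size_t>(n));
    view.data_start = 8 + static_cast<std::size_t>(n);
    return view;
}
```

In practice the file would be memory-mapped rather than copied into a vector, so tensors can be uploaded to the GPU straight from the mapped data region.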
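The SDPA and KV-cache features fit together: in cross-attention, K and V come only from the scene tokens, while the queries come from each view, so K and V can be projected once and reused for every camera. The CPU reference below computes softmax(Q Kᵀ / √d) V for a single head; it is a hedged illustration of the idea, not the engine's cuDNN path, and `CrossAttnKVCache` with its fields is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Mat = std::vector<std::vector<float>>;  // row-major: one row per token

// out[i][j] = sum_k a[i][k] * b[k][j]
Mat matmul(const Mat& a, const Mat& b) {
    assert(!a.empty() && !b.empty() && a[0].size() == b.size());
    Mat out(a.size(), std::vector<float>(b[0].size(), 0.0f));
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t k = 0; k < b.size(); ++k)
            for (std::size_t j = 0; j < b[0].size(); ++j)
                out[i][j] += a[i][k] * b[k][j];
    return out;
}

// Reference scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
Mat sdpa(const Mat& q, const Mat& k, const Mat& v) {
    assert(!q.empty() && !k.empty() && k.size() == v.size() && q[0].size() == k[0].size());
    const float scale = 1.0f / std::sqrt(static_cast<float>(q[0].size()));
    Mat out(q.size(), std::vector<float>(v[0].size(), 0.0f));
    for (std::size_t i = 0; i < q.size(); ++i) {
        std::vector<float> s(k.size());
        float mx = -std::numeric_limits<float>::infinity();
        for (std::size_t j = 0; j < k.size(); ++j) {
            float dot = 0.0f;
            for (std::size_t c = 0; c < q[i].size(); ++c) dot += q[i][c] * k[j][c];
            s[j] = dot * scale;
            mx = std::max(mx, s[j]);
        }
        float sum = 0.0f;  // subtract the max before exp for numerical stability
        for (float& x : s) { x = std::exp(x - mx); sum += x; }
        for (std::size_t j = 0; j < k.size(); ++j)
            for (std::size_t c = 0; c < v[j].size(); ++c)
                out[i][c] += (s[j] / sum) * v[j][c];
    }
    return out;
}

// Hypothetical cache: K and V depend only on the scene tokens,
// so they are projected once and reused for every view's queries.
struct CrossAttnKVCache {
    Mat k, v;
    int projections = 0;  // number of times K/V were computed
    void build(const Mat& scene_tokens, const Mat& w_k, const Mat& w_v) {
        k = matmul(scene_tokens, w_k);
        v = matmul(scene_tokens, w_v);
        ++projections;
    }
    Mat attend(const Mat& view_queries) const { return sdpa(view_queries, k, v); }
};
```

Rendering N views then costs one K/V projection plus N attention calls, instead of N projections.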
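Asynchronous, double-buffered rendering keeps the viewer responsive: a render thread fills a back buffer while the display thread keeps showing the last completed frame, and the two are swapped only when a frame is done. The sketch below assumes CPU-side RGBA buffers guarded by a mutex; `FrameDoubleBuffer` is a hypothetical class, and the engine's viewer may well keep frames on the GPU instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical double buffer shared by a render thread (producer)
// and the display thread (consumer).
class FrameDoubleBuffer {
public:
    explicit FrameDoubleBuffer(std::size_t bytes) : front_(bytes), back_(bytes) {
        assert(bytes > 0);
    }

    // Render thread only: fill the back buffer without taking the lock,
    // since the display thread never touches it.
    std::vector<std::uint8_t>& back() { return back_; }

    // Render thread: publish the finished frame.
    void publish() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::swap(front_, back_);  // O(1): vectors exchange their storage
        ++frame_id_;
    }

    // Display thread: copy out the latest complete frame (e.g. before a
    // glTexSubImage2D upload) and return its frame number.
    std::uint64_t read_latest(std::vector<std::uint8_t>& dst) const {
        std::lock_guard<std::mutex> lock(mutex_);
        dst = front_;
        return frame_id_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::uint8_t> front_, back_;
    std::uint64_t frame_id_ = 0;
};
```

Because the lock is held only for the swap and the copy, a slow frame never blocks camera input; the display thread simply redraws the previous frame until `publish()` delivers a new one.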
