Projects
RenderFormer CUDA Engine
Standalone CUDA Inference for Neural Rendering (SIGGRAPH 2025)
RenderFormer CUDA Engine is a high-performance C++20/CUDA inference engine for RenderFormer (SIGGRAPH 2025), a transformer-based neural renderer that converts triangle meshes with global illumination into photorealistic images.

Benchmarked on RTX 3080 Ti, model renderformer-v1.1-swin-large (483M params):
| |
PyTorch fp16 |
CUDA Engine (512×512) |
CUDA Engine (320×320) |
| Per-view render |
~1.5s |
78ms (8.3× faster) |
34ms (30 FPS) |
Key Features
- Zero Python dependencies — loads safetensors + HDF5 directly in C++
- cuDNN fused SDPA — tensor-core flash attention for cross-attention
- CUDA Graph capture — 732-node graph replay with zero CPU launch overhead
- Full fp16 inference — mixed-precision encoder + full fp16 decoder
- KV cache — cross-attention K,V computed once, reused across all views
- Interactive viewer — GLFW/OpenGL with trackball camera, async double-buffered rendering
Links