Show HN: Chisel – Profile GPU Kernels Without a GPU (Nvidia and AMD)

github.com

3 points by technoabsurdist 16 hours ago

We built Chisel so you can profile GPU kernels without owning the hardware. You run chisel profile kernel.cu and get full Nsight Systems, Nsight Compute, or rocprofv3 reports without needing a local GPU.

It spins up remote H100, L40S, or MI300X machines (via DigitalOcean for now; we plan to add more backends soon), runs your code, and returns detailed traces (kernel timings, memory transfers, API calls, etc.). Everything is CLI-based and designed for iterative development: profiling takes ~1–2 minutes per run.

For example:

    # Profile a PyTorch training script on H100 with Nsight Systems
    chisel profile --nsys train.py

    # Profile a HIP kernel on MI300X with system trace
    chisel profile --rocprofv3="--sys-trace" matrix_add.cpp

Repo: https://github.com/Herdora/chisel
PyPI: pip install chisel-cli

Would love feedback, especially from anyone building custom kernels, ML layers, or low-level GPU ops.