# Learn GPU Programming in Your Browser
Sarah Pan, Austin Huang
2024-09-13

- [TL;DR](#tldr)
- [Introducing WebGPU Puzzles](#introducing-webgpu-puzzles)
- [About WebGPU](#about-webgpu)
- [Why We Built WebGPU Puzzles](#why-we-built-webgpu-puzzles)
- [WebGPU Programming Basics](#webgpu-programming-basics)
- [Building WebGPU Puzzles](#building-webgpu-puzzles)

## TL;DR

WebGPU has [arrived](https://developer.chrome.com/blog/webgpu-release),
opening a direct pipeline from the web browser to your local GPU. We’ve
built WebGPU Puzzles to help you try it and explore the possibilities.
It’s a simple, interactive way to learn GPU programming using nothing
but your browser:

[gpupuzzles.answer.ai](https://gpupuzzles.answer.ai)

We challenge you to do the puzzles, then share your ideas about the
possibilities with us!

## Introducing WebGPU Puzzles

WebGPU Puzzles is a web incarnation of [Sasha
Rush](https://rush-nlp.com/)’s [GPU
Puzzles](https://github.com/srush/GPU-Puzzles) - a series of small, fun,
self-contained coding challenges for learning GPU programming. The
original GPU Puzzles was written for Numba/CUDA to be run on a remote
server with a dedicated GPU device.

Using WebGPU, [gpupuzzles.answer.ai](https://gpupuzzles.answer.ai) lets
you write code directly in the browser and have the app execute and
check results automatically. GPU computation runs entirely locally in
your browser[1], whether you have a humble integrated GPU that comes
with your laptop, or a high-end device GPU installed on an expensive
workstation.

<img src="images/gpupuzzles.gif" style="width:88.0%"
data-fig-align="center" />

## About WebGPU

WebGPU is a low-level, high-performance API for web browsers that works
with most modern GPUs (and can be repurposed for native applications
[outside the
browser](https://www.answer.ai/posts/2024-07-11--gpu-cpp.html) too). By
connecting the browser to local GPU compute, WebGPU effectively turns
the web into one massive distributed GPU cluster with every connected
personal device as a compute node.

This means web applications can now bring GPU compute online simply by
serving a web page. If we think of neural networks as a new building
block for running learned computation on GPUs, WebGPU provides a
complementary building block: frictionless contribution and
coordination of local GPU compute over the web.

WebGPU is still new. It was [rolled out to Chrome in fall 2023 for macOS
and Windows](https://developer.chrome.com/blog/webgpu-release), while
other browsers (Firefox, Safari) and platforms (Linux) are in the
process of [enabling their
implementations](https://github.com/gpuweb/gpuweb/wiki/Implementation-Status).

## Why We Built WebGPU Puzzles

Why did we build this app?

Our first goal is to help beginners write GPU code without worrying
about low-level, vendor-specific tech stacks. As generative AI is
becoming a core building block of computation, the ability to write and
reason about code running on GPUs is increasingly important. We want to
make GPU programming as simple and accessible as regular programming. In
fact, accessibility was one of the reasons we chose to make a web app,
allowing those with little prior GPU programming experience (including
one of the contributors to this project!) to learn the ropes.

Second, for those of you who already know GPU programming through CUDA,
we also wanted to show how the techniques you are already familiar with
can be operationalized in web applications through WebGPU.

Finally, we wanted to demonstrate new possibilities now that the web
browser can directly access local GPU compute. You can think of this
WebGPU Puzzles app as an initial experiment in this new category of
web-based GPU compute applications.

## WebGPU Programming Basics

Although GPU programming is a deep and extensive topic, here we’ll
provide a brief introduction to help newcomers understand the basic
concepts and get started with WebGPU Puzzles. If you are already
familiar with CUDA, you can probably skim this section.

At its core, dispatching a GPU computation differs from running
everyday CPU code because the GPU performs its computations
*independently*, on a separate device from the CPU. You should think of
a GPU invocation less as a function call and more as an asynchronous
remote procedure call to an independent, high-throughput computing
device.

The tradeoff for this high throughput is that the GPU is not as
flexible as the CPU. For this reason, code that runs on the GPU is
written in a language called WebGPU Shading Language (WGSL). WGSL is a
small domain-specific language designed to expose the capabilities and
limitations of these simpler computation units, mirroring the limited
computations expressible on most GPUs. Because WGSL programs map
shallowly onto GPU hardware, the WGSL you write gives you fine-grained
control of GPU execution.
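
To give a feel for the language, here is a minimal WGSL compute kernel
that doubles every element of a buffer. This is an illustrative sketch,
not code from the puzzles; the binding layout, names, and workgroup
size of 64 are arbitrary choices that must match the host-side setup:

```wgsl
// Input and output buffers, bound by the host application.
@group(0) @binding(0) var<storage, read> data_in: array<f32>;
@group(0) @binding(1) var<storage, read_write> data_out: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
  let i = gid.x;
  // Guard against threads past the end of the buffer.
  if (i < arrayLength(&data_in)) {
    data_out[i] = 2.0 * data_in[i];
  }
}
```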

Furthermore, GPUs have only a limited ability to coordinate locally,
with restricted capabilities for resource sharing and synchronization
within local units. Threads are the individual processing units: each
one runs your WGSL code independently with a unique thread ID. Groups
of threads that can share resources through shared memory and
synchronize with each other are called workgroups in WebGPU (analogous
to blocks in CUDA).
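
As a hedged sketch of workgroup coordination, the kernel below has each
workgroup of 64 threads stage values in shared (workgroup) memory,
synchronize with `workgroupBarrier()`, and then have thread 0 sum the
tile. The buffer names and the workgroup size are illustrative
assumptions:

```wgsl
// Shared memory visible to all threads in one workgroup.
var<workgroup> tile: array<f32, 64>;

@group(0) @binding(0) var<storage, read> data_in: array<f32>;
@group(0) @binding(1) var<storage, read_write> partial_sums: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid: vec3<u32>,
        @builtin(local_invocation_id) lid: vec3<u32>,
        @builtin(workgroup_id) wid: vec3<u32>) {
  tile[lid.x] = data_in[gid.x];
  workgroupBarrier(); // wait until all 64 threads have written tile
  if (lid.x == 0u) {
    var sum = 0.0;
    for (var i = 0u; i < 64u; i = i + 1u) {
      sum = sum + tile[i];
    }
    partial_sums[wid.x] = sum; // one partial sum per workgroup
  }
}
```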

A compute kernel operation specifies the number of threads in a
workgroup and the total number of workgroups in a dispatch. Both are
specified as three-dimensional vectors, with the dimensions named x, y,
and z. This 3D spatial organization is a holdover from the graphics
origins of the GPU. For compute kernel operations we can map the x, y,
and z dimensions to whatever is useful for our computation; if we don't
need all three dimensions, we can set the unused ones to a size of 1.
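
For example, a kernel working over a 2D matrix can use the x and y
dimensions for columns and rows and leave z at 1. In this sketch,
`WIDTH` and `HEIGHT` are placeholder constants, and the host would
dispatch roughly `(WIDTH/8, HEIGHT/8, 1)` workgroups of 8x8x1 threads:

```wgsl
const WIDTH: u32 = 256u;  // illustrative matrix dimensions
const HEIGHT: u32 = 256u;

@group(0) @binding(0) var<storage, read_write> matrix: array<f32>;

@compute @workgroup_size(8, 8, 1)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
  if (gid.x < WIDTH && gid.y < HEIGHT) {
    let idx = gid.y * WIDTH + gid.x; // row-major flattening
    matrix[idx] = matrix[idx] + 1.0;
  }
}
```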

For those that are familiar with CUDA, here’s a mapping of CUDA
terminology to WebGPU (if you are not familiar with CUDA, you can skip
this table):

<table>
<colgroup>
<col style="width: 33%" />
<col style="width: 33%" />
<col style="width: 33%" />
</colgroup>
<thead>
<tr>
<th>CUDA</th>
<th>WebGPU</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>Device Code</td>
<td>WGSL Code</td>
<td>The code executed on the GPU for parallel computation.</td>
</tr>
<tr>
<td>Thread</td>
<td>Invocation</td>
<td>The smallest unit of execution within a block/workgroup.</td>
</tr>
<tr>
<td>Block</td>
<td>Workgroup</td>
<td>A collection of threads/invocations that execute concurrently and
can share memory, organized as a 1D, 2D, or 3D grid.</td>
</tr>
<tr>
<td>Grid</td>
<td>Dispatch</td>
<td>A collection of blocks/workgroups that execute the kernel/WGSL
code, also organized as a 1D, 2D, or 3D grid.</td>
</tr>
</tbody>
</table>

In WebGPU Puzzles, each puzzle has test cases that fix the number of
threads in a workgroup and the number of workgroups in a dispatch. Your
task is to fill in the WGSL code executed by each thread so that it
carries out the computation specified by the test case and passes the
tests validating your solution.

## Building WebGPU Puzzles

WebGPU Puzzles has been a fun experiment in combining
[FastHTML](https://fastht.ml/) and [gpu.cpp](https://gpucpp.answer.ai/)
to target the browser.

Give [gpupuzzles.answer.ai](https://gpupuzzles.answer.ai) a try! You can
join us on [Discord](https://discord.gg/zmJVhXsC7f) and let us know how
it goes!

[1] Browser WebGPU support is required. Chrome on Mac and Windows
should work automatically; Safari and Linux users will need to enable
WebGPU in their browser settings.
