### Fast Perlin Noise

#### by Ben Weston

I’m a POV-Ray fanatic, using it for everything from art to drawing graphs. Those years of using POV have given me a lot of skills with procedurally generated textures & shapes that I seldom get a chance to use in my day job. So, I figured I’d carry those abilities over into my hobby programming projects.

The first step, of course, is implementing Perlin noise.

I started by reading this splendid paper on “Simplex Noise”, Perlin’s faster version of his classic noise. But while reading it, it occurred to me that it might be possible to do Perlin noise using a single texture lookup, rather than the 8 samples it normally uses (for 3D noise), or the 4 samples simplex uses.

*Note that I haven’t done much research into efficient implementations of Perlin noise, so it’s quite possible I’ve rediscovered someone else’s technique. If someone’s seen this before please let me know in the comments, and I’ll give credit.*

## Maths:

The heart of the perlin noise algorithm is based on a grid, with a random gradient assigned to each gridpoint. These gradients are extrapolated to the position of the sample point, and blended together to create a smooth random noise value. For simplicity’s sake I’m only going to write the maths for one of the dimensions:

```
output = dot(grad0, uvw)*(1 - f(u)) + dot(grad1, uvw - <1,0,0>)*f(u)
```

Where uvw is the position within the grid square. If we have unit grid spacing and an input position xyz, then uvw = frac(xyz). grad0 and grad1 are the random gradient vectors for the points at u=0 and u=1. We extrapolate each gradient so that it has a value of 0 at its own grid point, by taking the dot product of the gradient with the position relative to that grid point. That detail is important to the trickery later on…

Now, look at the f(u) terms. To people who write shaders, this is obviously a lerp(). In fact, in 3D, it’s a trilinear interpolation, using <f(u),f(v),f(w)> to blend between the gradients of the surrounding 8 grid points.

So, the question I asked myself was: *“Can I make the texture hardware do that trilinear blend for me?”* The problem is the uvw term inside the dots, inside the lerp. We have to pull it out if we’re going to get a value into a texture.

First, we need to know a useful property of dots and lerps – they both distribute over addition, i.e.:

```
dot(a+b, c) = dot(a,c) + dot(b,c)
lerp(a+b, c+d, e) = lerp(a,c,e) + lerp(b,d,e)
```

And, because a lerp is just a multiply and add, we can do this:

```
lerp(dot(a,b), dot(a,c), d) = dot(a, lerp(b,c,d))
```

I shan’t waste space here with the proofs; I’d guess if you’re following all of this you can probably prove these things for yourself.
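The identities are also easy to sanity-check numerically. A quick sketch in Python, with NumPy vectors standing in for shader vectors and a hand-rolled `lerp`:

```python
import numpy as np

def lerp(a, b, t):
    # standard linear interpolation: a*(1-t) + b*t
    return a * (1 - t) + b * t

rng = np.random.default_rng(0)
a, b, c, d = (rng.standard_normal(3) for _ in range(4))
t = 0.37

# dot distributes over addition
assert np.isclose(np.dot(a + b, c), np.dot(a, c) + np.dot(b, c))

# lerp distributes over addition
assert np.allclose(lerp(a + b, c + d, t), lerp(a, c, t) + lerp(b, d, t))

# a lerp of dot products equals a dot with a lerped vector
assert np.isclose(lerp(np.dot(a, b), np.dot(a, c), t),
                  np.dot(a, lerp(b, c, t)))
```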

So we can rearrange our formula like so, substituting xyz-p0 for uvw, where p0 is the position of the next lowest grid point, i.e. p0 = floor(xyz):

```
output = lerp( dot(grad0, xyz-p0), dot(grad1, xyz-p1), f(u) )
output = lerp( dot(grad0, xyz) - dot(grad0, p0), dot(grad1, xyz) - dot(grad1, p1), f(u) )
output = lerp( dot(grad0, xyz), dot(grad1, xyz), f(u) ) - lerp( dot(grad0, p0), dot(grad1, p1), f(u) )
output = dot( xyz, lerp(grad0, grad1, f(u)) ) - lerp( dot(grad0, p0), dot(grad1, p1), f(u) )
```

And, of course, p0 & p1 are just the position of the grid points within the array, which we obviously know when we create the array. Meaning we’re only trying to lerp constants, which means we can put those constants in a texture!
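The rearrangement can likewise be verified numerically for one dimension. A sketch in Python, with a scalar `t` standing in for the fade value f(u):

```python
import numpy as np

def lerp(a, b, t):
    return a * (1 - t) + b * t

rng = np.random.default_rng(1)
xyz = rng.uniform(0, 10, 3)
p0 = np.floor(xyz)              # lower grid corner
p1 = p0 + np.array([1, 0, 0])   # next corner along u
g0, g1 = rng.standard_normal(3), rng.standard_normal(3)
t = 0.42                        # stands in for f(u)

# the original formulation: extrapolate each gradient, then blend
original = lerp(np.dot(g0, xyz - p0), np.dot(g1, xyz - p1), t)

# the rearranged formulation: blend constants, then one dot with xyz
rearranged = (np.dot(xyz, lerp(g0, g1, t))
              - lerp(np.dot(g0, p0), np.dot(g1, p1), t))

assert np.isclose(original, rearranged)
```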

## Implementation:

### Create the texture

Gradient value – Perlin suggests picking from 12 vectors pointing at the edges of a cube. i.e. <-1,1,0>, <1,0,1>, etc. The integer values mean our dot(grad,p) values won’t need a floating point storage format. (Perlin originally suggested these values to save on maths operations, since the dot product just becomes adds and subtracts. On modern vector processors that doesn’t really matter.)

Gradient dot position – I’m using an RGBA8 signed normalized texture, with this value stored in the alpha channel, so my texture can’t be bigger than 64 texels wide: dot(<1,1,0>,<63,63,63>) = 126, so the [-126,126] range just fits in a signed byte.
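Putting those two values together, building the texture might look something like this (a Python/NumPy sketch; `make_noise_texture` is my own illustrative name, and I keep everything in floats rather than packing into an actual RGBA8 format):

```python
import numpy as np

# Perlin's 12 gradient directions: the edge midpoints of a cube.
EDGE_GRADS = np.array([
    [1, 1, 0], [-1, 1, 0], [1, -1, 0], [-1, -1, 0],
    [1, 0, 1], [-1, 0, 1], [1, 0, -1], [-1, 0, -1],
    [0, 1, 1], [0, -1, 1], [0, 1, -1], [0, -1, -1],
])

def make_noise_texture(n=64, seed=0):
    """Build an n^3 table: rgb = random gradient, a = dot(gradient, gridpoint)."""
    rng = np.random.default_rng(seed)
    # pick one of the 12 edge gradients per texel
    grads = EDGE_GRADS[rng.integers(0, 12, size=(n, n, n))]
    # grid-point coordinates (x, y, z) for every texel
    p = np.stack(np.meshgrid(np.arange(n), np.arange(n), np.arange(n),
                             indexing="ij"), axis=-1)
    # precomputed dot(grad, p); stays within [-2*(n-1), 2*(n-1)],
    # i.e. [-126, 126] for n = 64, so it fits a signed byte
    alpha = np.sum(grads * p, axis=-1)
    return grads.astype(np.float32), alpha.astype(np.float32)
```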

### The Fast Perlin Noise Shader

Here it is, the heart of the algorithm. It’s quite simple:

```
uvw = frac(xyz);
p0 = floor(xyz);
f = 6*(uvw^5) - 15*(uvw^4) + 10*(uvw^3);
p = p0 + f;
sample = readTexture(p);
return dot(xyz, sample.xyz) - sample.w;
```
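A software sketch of the same lookup in Python, where `trilinear` emulates the GPU’s filtered fetch on an (n, n, n, 4) float array laid out as in the previous section (ignoring real samplers’ half-texel addressing):

```python
import numpy as np

def fade(t):
    # Perlin's quintic: 6t^5 - 15t^4 + 10t^3
    return t * t * t * (t * (t * 6 - 15) + 10)

def trilinear(tex, p):
    # Emulate a trilinear fetch of a 4-channel 3D texture at position p.
    i = np.floor(p).astype(int)
    f = p - i
    out = np.zeros(4)
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                w = ((f[0] if dx else 1 - f[0]) *
                     (f[1] if dy else 1 - f[1]) *
                     (f[2] if dz else 1 - f[2]))
                out += w * tex[i[0] + dx, i[1] + dy, i[2] + dz]
    return out

def fast_noise(tex, xyz):
    # The shader body above: one (emulated) fetch plus one dot product.
    p0 = np.floor(xyz)
    uvw = xyz - p0
    p = p0 + fade(uvw)
    sample = trilinear(tex, p)
    return np.dot(xyz, sample[:3]) - sample[3]
```

Because the trilinear weights at `p0 + fade(uvw)` are exactly the fade values, this single fetch reproduces the full 8-corner gradient blend.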

### Tiling the pattern

There’s still one big problem with this optimised trickery: when the texture wraps, the stored values jump from dot(grad63,p63) to dot(grad0,p0), and the gradients break.

I scratched my head for a while, but couldn’t come up with any simpler solution than just manually tiling, by making the last value the same as the first, on every row, column, etc. Then make sure you wrap the input values at 63.
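One way to read that fix, sketched in Python (`make_tileable` is my own hypothetical name): duplicate the first gradient plane along each axis, then recompute the stored dot products, since the alpha value at the seam has to use the seam’s own grid position, not a straight copy of texel 0’s alpha.

```python
import numpy as np

def make_tileable(grads):
    """Given an (n, n, n, 3) gradient array, make the last plane along each
    axis equal the first, then rebuild alpha = dot(grad, gridpoint) so the
    interpolated pattern wraps seamlessly. Inputs must then be wrapped into
    [0, n-1) before sampling."""
    g = grads.copy()
    g[-1] = g[0]
    g[:, -1] = g[:, 0]
    g[:, :, -1] = g[:, :, 0]
    n = g.shape[0]
    p = np.stack(np.meshgrid(np.arange(n), np.arange(n), np.arange(n),
                             indexing="ij"), axis=-1)
    alpha = np.sum(g * p, axis=-1)  # dot(grad, gridpoint) per texel
    return np.concatenate([g, alpha[..., None]], axis=-1)
```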

## Results:

I also wrote a conventional Perlin implementation, so I could do a comparison. Other than minor precision issues (probably caused by the texture interpolators) there was no difference.

So… yeah. Perlin noise is now fast enough that I can start building a world with it… *Wooh!*

This may be just the thing for tiling:

http://www.gamedev.net/blog/33/entry-2138456-seamless-noise/

Have you checked the GLSL Ashima noise?

It’s basically a set of 2D, 3D, etc. simplex noise functions.

Here’s the code: github.com/ashima/webgl-noise

I haven’t looked at the code, but it says it’s just an implementation of the paper I mentioned at the start. I reckon my technique is simpler.

Have you compared it to Ken’s own GPU version (http://http.developer.nvidia.com/GPUGems/gpugems_ch05.html)? It uses 2 texture lookups but fewer arithmetic operations.

If I’m understanding correctly he’s just using a higher resolution texture to look-up the result of the noise, rather than actually doing the noise computation. This is certainly good enough for most applications, though I think I can get a higher quality result from a smaller source texture, which may be more suited to my purposes.

I’m using his method with a 32x32x32 RG texture. It’s the fastest noise generator I’ve found yet, although I haven’t compared it to yours. And it doesn’t have the repetition/wrapping problems that you describe in your solution.

Could be useful on mobile GPUs, which have historically struggled with full Perlin noise. Do you think the small source texture is a little limiting? E.g. when using the noise at higher frequencies it may visibly repeat without extra work.

Well it’s a 3D texture, so you won’t see repetition unless your surface is aligned to an axis of the texture. Also I’m probably going to rotate the texture for each octave, so multi-fractals should never repeat.

I’m hoping I can get good results with even smaller textures, so I can keep the texture in the cache.

Sorry, missed that it was a 3d texture – not so useful for mobile as-is then since OpenGL ES2 doesn’t support 3D textures out-of-the-box. Still, nice one for desktop GPUs.

This is a great and original idea. Unfortunately, I don’t think it can work very well. The precalculated dot(p0,grad0) value grows pretty large even with a 63×63×63 texture, and the precision loss at the edge becomes huge and unavoidable.

By setting the origin of the grid at the centre of the texture rather than at the (0,0,0) corner you can gain 1 bit of precision, but that’s probably all. I haven’t seen your code, but I suppose you came across the same issue; I wonder what your workaround was?

You are correct. It works well for the scale of things shown in those screenshots, but I’ve been creating a landscape and this technique isn’t accurate enough for the larger octaves of noise. So I’ve been mixing conventional Perlin noise (using only the first 3 components from the array) with my fast noise for the smaller details.

I found a few improvements to the precision. Like you said, offsetting the range gains you 1 bit. Using higher-precision texture formats can also improve the quality of the interpolation a lot, but there seems (on my hardware, at least) to be a hard limit on the precision of the uvw values used in the interpolation, so no matter what you do there will always be a finite number of discontinuous steps between values.

But perhaps future hardware will be more precise…

I have found the built-in texture interpolation to have disappointingly low precision on all current-generation GPUs. For many of my experiments into procedural shading where a texture lookup is used as part of the function, I have had to re-implement the interpolation in explicit shader code. This increases the precision considerably, but slows the computation down, because you circumvent the optimized low-precision trilinear interpolation and replace it with true floating-point arithmetic operations.