So here’s a mistake I made with the Metal API.
Suppose you have a loop where you’re constructing multiple encoders, one encoder per iteration.
And you need to pass a parameter (say, an integer) into each encoder.
So you write the following:
id<MTLBuffer> buffer = [self.device newBufferWithLength:sizeof(uint16_t) options:MTLResourceOptionCPUCacheModeDefault];
for (uint16_t i = 0; i < 5; ++i) {
    // commandBuffer is the id<MTLCommandBuffer> being encoded into (created elsewhere)
    id<MTLComputeCommandEncoder> compute = [commandBuffer computeCommandEncoder];
    ... blah blah blah ...
    // Overwrites the same two bytes on every pass through the loop
    memmove(buffer.contents, &i, sizeof(i));
    [compute setBuffer:buffer offset:0 atIndex:MyKernelIndex];
    ... blah blah blah ...
    [compute dispatchThread...];
    [compute endEncoding];
}
When I ran this, I discovered that all five invocations of the kernel saw the two-byte value at MyKernelIndex set to 4, the last value of i in the loop.
Why?
Because the same MTLBuffer is reused across all five invocations, and because the Metal code isn’t executed until after the entire command buffer is committed, the last value written into the buffer is the value that every invocation sees.
But if this is replaced with:
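for (uint16_t i = 0; i < 5; ++i) {
    // commandBuffer is the id<MTLCommandBuffer> being encoded into (created elsewhere)
    id<MTLComputeCommandEncoder> compute = [commandBuffer computeCommandEncoder];
    ... blah blah blah ...
    // Copies the current bytes of i into the encoder at encoding time
    [compute setBytes:&i length:sizeof(i) atIndex:MyKernelIndex];
    ... blah blah blah ...
    [compute dispatchThread...];
    [compute endEncoding];
}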
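Each invocation gets a unique value for i, because setBytes copies the bytes at encoding time instead of pointing at memory that is read later.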
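If it helps to see the difference without a GPU in the way, here is a rough model in plain C. It does not call Metal at all: Command, CommandList, record_by_reference, record_by_copy, commit and run_model are names made up for this sketch, and the "kernel" just reads its parameter. The only assumption it encodes is the one described above: recording a command does no work, and the parameter is read only when everything is committed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_COMMANDS 8

/* One recorded "dispatch". Nothing executes at record time. */
typedef struct {
    const uint16_t *shared;  /* setBuffer-style: remembers WHERE the data lives */
    uint16_t inline_copy;    /* setBytes-style: snapshot taken at record time */
    int use_copy;            /* nonzero if this command carries its own copy */
} Command;

typedef struct {
    Command commands[MAX_COMMANDS];
    int count;
} CommandList;

/* Hypothetical stand-in for setBuffer: stores only a pointer. */
static void record_by_reference(CommandList *list, const uint16_t *shared) {
    assert(list->count < MAX_COMMANDS);
    Command c = { shared, 0, 0 };
    list->commands[list->count++] = c;
}

/* Hypothetical stand-in for setBytes: copies the value right now. */
static void record_by_copy(CommandList *list, const uint16_t *value) {
    assert(list->count < MAX_COMMANDS);
    Command c = { NULL, *value, 1 };
    list->commands[list->count++] = c;
}

/* "Commit": only now does each recorded command read its parameter. */
static void commit(const CommandList *list, uint16_t *seen) {
    for (int k = 0; k < list->count; ++k) {
        const Command *c = &list->commands[k];
        seen[k] = c->use_copy ? c->inline_copy : *c->shared;
    }
}

/* Records five commands each way, then commits both lists.
   by_ref and by_copy must each have room for five values. */
static void run_model(uint16_t *by_ref, uint16_t *by_copy) {
    uint16_t shared = 0;  /* plays the role of buffer.contents */
    CommandList ref_list = { .count = 0 };
    CommandList copy_list = { .count = 0 };

    for (uint16_t i = 0; i < 5; ++i) {
        memmove(&shared, &i, sizeof(i));          /* overwrite the shared slot */
        record_by_reference(&ref_list, &shared);  /* the buggy pattern */
        record_by_copy(&copy_list, &i);           /* the fixed pattern */
    }

    commit(&ref_list, by_ref);
    commit(&copy_list, by_copy);
}
```

Calling run_model with two five-element arrays leaves by_ref holding 4 in every slot and by_copy holding 0 through 4, which mirrors what the two loops above did on the GPU.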
Just something to watch out for in a deferred execution model.