Cycles casts a pointer from ShaderDataTinyStorage to ShaderData. By default, however, these structs had different alignments: the former was 1-byte aligned, the latter 16-byte aligned. This caused undefined behavior on at least the CUDA platform. Forcing both structs to use the same alignment fixes this.
CUDA toolkits newer than 10.1 run into this because of a compiler optimization.
See also kernel_path.h:
  ShaderDataTinyStorage emission_sd_storage;
  ShaderData *emission_sd = AS_SHADER_DATA(&emission_sd_storage);
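
For illustration, here is a minimal sketch of the alignment requirement described above. It assumes hypothetical stand-in types (Data, TinyStorage) and a hypothetical helper (is_aligned_for_data); these are not the actual Cycles definitions, and the real fix may spell the alignment differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the larger, 16-byte aligned struct.
struct alignas(16) Data {
  float P[4];
  int flags;
};

// Hypothetical stand-in for the raw storage struct. Without alignas(16),
// a struct holding only a char array would be 1-byte aligned, so a
// pointer to it might not be suitably aligned for Data.
struct alignas(16) TinyStorage {
  char pad[sizeof(Data)];
};

// Compile-time checks: the storage is at least as strictly aligned
// and at least as large as the type it is reinterpreted as.
static_assert(alignof(TinyStorage) >= alignof(Data),
              "storage must be aligned for Data");
static_assert(sizeof(TinyStorage) >= sizeof(Data),
              "storage must be large enough for Data");

// Runtime check that an address satisfies Data's alignment.
bool is_aligned_for_data(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p) % alignof(Data) == 0;
}
```

With the alignas on the storage struct, any TinyStorage object on the stack satisfies the alignment of Data, so the static_asserts would catch a regression at compile time rather than leaving it to the optimizer.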