Strategy for unit testing compute kernels generated from LLVM backend #540
Comments
Copying @georgemitenkov's comments from #533 (comment) : ---- start ---- @pramodk Regarding testing, I had one idea:
This is not an actual IR check, but it suits integration-test purposes. For example, something like this:
I am currently using this to verify the vectorised code. ---- end ---- |
I was thinking in a similar direction! Before writing up the details of what I had in mind, let me clarify a few questions regarding your proposal:
Considering the above questions, I was thinking of the following:
As shown above, one just has to take care of the alignment/padding aspect, i.e. we have to pin pointers or non-pointer variables at particular offsets.
Does this make sense? The reason I am thinking of the above approach is that 1) we don't know the type of [...] Implementing the above wouldn't be complicated: allocate a memory block and set up pointers at particular offsets, taking alignment into account. But if you think it would be even easier with the LLVM API, feel free to propose that! cc: @iomaganaris Edit: maybe I can provide pseudocode later today, which might help explain my text. |
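To make the offset/alignment point concrete, here is a minimal sketch of how member offsets could be computed once the number and kind of members are known. `MemberLayout`, `align_up` and `compute_offsets` are hypothetical names introduced only for illustration (they are not part of NMODL or LLVM), and the sketch assumes the usual C struct layout rule where every member starts at a multiple of its own alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one struct member: its size and alignment in bytes.
struct MemberLayout {
    std::size_t size;
    std::size_t align;
};

// Round `offset` up to the next multiple of `align` (which must be a power of two).
inline std::size_t align_up(std::size_t offset, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

// Compute the byte offset of every member, in declaration order.
// The padded total size of the struct is written to `total_size`.
inline std::vector<std::size_t> compute_offsets(const std::vector<MemberLayout>& members,
                                                std::size_t& total_size) {
    std::vector<std::size_t> offsets;
    std::size_t offset = 0;
    std::size_t max_align = 1;
    for (const auto& m : members) {
        offset = align_up(offset, m.align);  // insert padding if needed
        offsets.push_back(offset);
        offset += m.size;
        if (m.align > max_align) {
            max_align = m.align;
        }
    }
    // The struct size is padded to a multiple of the strictest member alignment.
    total_size = align_up(offset, max_align);
    return offsets;
}
```

For example, with 8-byte pointers and the member order `double*`, `double*`, `int`, `double`, this yields offsets 0, 8, 16 and 24: the `int` is followed by 4 bytes of padding so that the `double` stays 8-byte aligned.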
Here is very abstract code for the above logic:
SCENARIO("compute kernel test", "[llvm][runner]") {
    GIVEN("mod file") {
        std::string nmodl_text = R"(
            NEURON {
                SUFFIX hh
                USEION na READ ena WRITE ina
                USEION k READ ek WRITE ik
                NONSPECIFIC_CURRENT il
                RANGE gnabar, gkbar, gl, el, gna, gk
                RANGE minf, hinf, ninf, mtau, htau, ntau
                THREADSAFE : assigned GLOBALs will be per thread
            }
            ...
            DERIVATIVE states {
                m' = (minf-m)/mtau
                h' = (hinf-h)/htau
                n' = (ninf-n)/ntau
            }
        )";
        NmodlDriver driver;
        const auto& ast = driver.parse_string(nmodl_text);
        ...
        codegen::CodegenLLVMHelperVisitor v(.....);
        v.visit_program(*ast);
        ...
        // we now retrieve information about how many double*, int*, double and int members are in the structure
        auto& some_instance_struct_info = v.get_some_useful_instance_struct_info();
        ...
        // here we allocate instance struct objects with the same seed, hence data1, data2 and data3 hold the same values
        // `allocate_and_initialize_instruct_struct` will allocate the base struct and set up pointers to the actual data
        // note the data is just `void*`, which can be type cast to the actual type inside the JIT runner
        void* data1 = allocate_and_initialize_instruct_struct(some_instance_struct_info, SEED1, NUM_NODE_COUNT);
        void* data2 = allocate_and_initialize_instruct_struct(some_instance_struct_info, SEED1, NUM_NODE_COUNT);
        void* data3 = allocate_and_initialize_instruct_struct(some_instance_struct_info, SEED1, NUM_NODE_COUNT);
        ...
        // based on the backends, we can now run kernels with different backends / vector widths
        // (pseudocode: `name=value` only labels the argument)
        Runner your_runner1(m, data1, vector_width=1);
        Runner your_runner2(m, data2, vector_width=4);
        Runner your_runner3(m, data3, gpu=true);
        ...
        // compare the results or print them if required
        compare_data_with_some_condition(data1, data2);
        compare_data_with_some_condition(data1, data3);
        ...
        // cleanup
        deallocate_instruct_struct(data1);
        deallocate_instruct_struct(data2);
        deallocate_instruct_struct(data3);
    }
}
|
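The "same seed, hence same data" and "compare with some condition" steps above could look roughly like the following sketch. `make_seeded_data` and `all_close` are hypothetical helpers, not existing NMODL functions, and the absolute-tolerance comparison is just one possible condition (vectorised and scalar kernels may differ in the last bits).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Fill a buffer with pseudo-random values. The same seed always produces the
// same sequence, so two buffers created with one seed start out identical.
inline std::vector<double> make_seeded_data(unsigned seed, std::size_t count) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    std::vector<double> data(count);
    for (auto& x : data) {
        x = dist(gen);
    }
    return data;
}

// Compare two result buffers element-wise with an absolute tolerance.
inline bool all_close(const std::vector<double>& a, const std::vector<double>& b, double tol) {
    assert(tol >= 0.0);
    if (a.size() != b.size()) {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::fabs(a[i] - b[i]) > tol) {
            return false;
        }
    }
    return true;
}
```

In a test, each runner would get its own copy created from the same seed, and `all_close` would be applied to the outputs after the kernels have run.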
Right, I see! We indeed do not know what the InstanceStruct would be until we parse the AST. This seems a logical approach to me, and I think it is good because of the uniformity across all backends. I will look more into this as well. Also:
|
After some more thinking, I had another idea that I think helps (not really specific to anything :) ). I also think I understand this approach better now, so we can have a sync later on the weekend/Monday to discuss more. We define a C++ class for all inputs:
struct InstanceInfo {
    basePtr;
    offsetsPtr;
    sizesPtr;
    num_elems;
};
Then:
// We will call these functions in our LLVM wrapper file
extern "C" void _interface_init_struct(InstanceInfo* info) {
    // C code to set up the fields;
}
extern "C" void _interface_print_struct(InstanceInfo* info) {
    // C code to print struct;
}
The only thing left is to define a conversion to our struct. We can define:
// here info contains the data; the type is taken from the AST or the LLVM-generated kernel code.
llvm::Value* infoToStruct(InstanceInfo* info, llvm::Type* instanceType) {
    // code that produces instructions to transform Info to the struct we need: basically
    // iterate over `num_elems` and get the members with the size/offset calculation.
}
Overall, we generate the LLVM module following these steps:
|
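As a rough illustration of the `InstanceInfo` idea, here is a sketch with concrete field types filled in. The field types (`char*` base, `std::size_t` offset and size arrays) and the helpers `member_ptr`, `store_member` and `load_member` are assumptions made for this example only; the comment above leaves the types open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Assumed field types: one raw memory block plus per-member offsets and sizes.
struct InstanceInfo {
    char* base;                  // start of the raw memory block
    const std::size_t* offsets;  // byte offset of each member inside the block
    const std::size_t* sizes;    // size in bytes of each member
    std::size_t num_elems;       // number of members
};

// Address of the i-th member inside the block.
inline void* member_ptr(const InstanceInfo& info, std::size_t i) {
    assert(i < info.num_elems);
    return info.base + info.offsets[i];
}

// Copy a value into the i-th member; memcpy avoids type-punning problems.
inline void store_member(const InstanceInfo& info, std::size_t i, const void* value) {
    std::memcpy(member_ptr(info, i), value, info.sizes[i]);
}

// Copy the i-th member out of the block.
inline void load_member(const InstanceInfo& info, std::size_t i, void* out) {
    std::memcpy(out, member_ptr(info, i), info.sizes[i]);
}
```

An init or print function in the wrapper could then simply loop over `num_elems` and use these helpers, without knowing the generated struct type at compile time.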
Yes, I was thinking of LLVM backends with AVX2, AVX512 or ARM NEON (the same data structure could also be used for testing non-LLVM-based backends, but that would need additional work for the runners).
That's correct. Just a note: one needs to be a bit careful about the size of the struct due to padding/alignment. We have a simple struct with double*, int*, double and int members, so it's not that complicated, but it is something to keep in mind.
char* instance = (char*) allocate(/* whatever size is required for pointers + extra data + padding/alignment */);
// 1st member data
*(double**) (instance + 0) = (double*) allocate(sizeof(double) * node_count);
// 2nd member data (offset 8, considering pointer size)
*(double**) (instance + 8) = (double*) allocate(sizeof(double) * node_count);
// ... similar offset calculation for the rest of the data members and their data allocation
// double and int variables are stored directly as values
*(double*) (instance + X) = 0.025;  // dt
|
Yeah, I think a discussion would be helpful. I was thinking about the padding/alignment aspects in order to avoid this transformation, i.e. if you create a memory block with the right pointers, then you can directly typecast the pointer to instanceType. Maybe a discussion would clarify things! |
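A minimal sketch of the "directly typecast" idea, under stated assumptions: `ExampleInstance` is a hypothetical stand-in for the generated instance type (its member names are made up), and the raw block is filled purely by byte offset before being viewed through the struct type. In the real setting the offsets would come from the generated layout rather than from `offsetof`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the generated instance struct.
struct ExampleInstance {
    double* voltage;
    int* node_index;
    double dt;
    int node_count;
};

// Build the instance as an untyped memory block, writing each member at its offset.
inline void* build_raw_instance(int node_count, double dt) {
    assert(node_count > 0);
    const std::size_t n = static_cast<std::size_t>(node_count);
    char* raw = static_cast<char*>(std::calloc(1, sizeof(ExampleInstance)));
    double* voltage = static_cast<double*>(std::calloc(n, sizeof(double)));
    int* node_index = static_cast<int*>(std::calloc(n, sizeof(int)));
    assert(raw != nullptr && voltage != nullptr && node_index != nullptr);
    for (int i = 0; i < node_count; ++i) {
        voltage[i] = -65.0;
        node_index[i] = i;
    }
    // Pointers are stored as values at the pointer members' offsets.
    std::memcpy(raw + offsetof(ExampleInstance, voltage), &voltage, sizeof(voltage));
    std::memcpy(raw + offsetof(ExampleInstance, node_index), &node_index, sizeof(node_index));
    // Scalars are stored directly as values.
    std::memcpy(raw + offsetof(ExampleInstance, dt), &dt, sizeof(dt));
    std::memcpy(raw + offsetof(ExampleInstance, node_count), &node_count, sizeof(node_count));
    return raw;
}

// Release the member arrays and the block itself.
inline void free_raw_instance(void* raw) {
    auto* inst = static_cast<ExampleInstance*>(raw);
    std::free(inst->voltage);
    std::free(inst->node_index);
    std::free(raw);
}
```

If the offsets match the layout the kernel expects, a `static_cast<ExampleInstance*>(raw)` is all the runner needs, and no conversion step is required.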
I see, thank you for the example! |
Assume a sample mod file like this:
The struct generated for holding all data looks like this:
And the generated compute function looks like:
This compute kernel is generated in memory and translated to LLVM IR. Our goal is to:
What needs to happen?
INSTANCE_STRUCT instance
nrn_state_hh with the INSTANCE_STRUCT parameter
As kernels and INSTANCE_STRUCT are generated dynamically, how do we do such testing?