
Generate compute kernels with correct instance variable #531

Merged
merged 2 commits
Mar 7, 2021

Conversation

pramodk
Contributor

@pramodk pramodk commented Feb 28, 2021

  • Improvements to codegen helper
    • instance structure now contains all global variables
    • instance structure now contains index variables for ions
    • nrn_state kernel now has all variables converted to instance
    • InstanceVarHelper added to query a variable and its location
  • nmodl_to_json helper added in main.cpp
  • added --vector-width CLI option
  • Add argument to nrn_state_hh
  • WIP progress :: add comments as TODOs to support LLVM IR generation
  • : fix cmake-format
  • : fix clang-format
  • : fix tests

Using:

make -j && ./bin/nmodl hh.mod llvm --ir --vector-width 8 passes --json-ast

We now get:

INSTANCE_STRUCT {
    DOUBLE *gnabar
    DOUBLE *gkbar
    DOUBLE *gl
    DOUBLE *el
    DOUBLE *gna
    DOUBLE *gk
    DOUBLE *il
    DOUBLE *minf
    DOUBLE *hinf
    DOUBLE *ninf
    DOUBLE *mtau
    DOUBLE *htau
    DOUBLE *ntau
    DOUBLE *m
    DOUBLE *h
    DOUBLE *n
    DOUBLE *Dm
    DOUBLE *Dh
    DOUBLE *Dn
    DOUBLE *ena
    DOUBLE *ek
    DOUBLE *ina
    DOUBLE *ik
    DOUBLE *v_unused
    DOUBLE *g_unused
    DOUBLE *ion_ena
    DOUBLE *ion_ina
    DOUBLE *ion_dinadv
    DOUBLE *ion_ek
    DOUBLE *ion_ik
    DOUBLE *ion_dikdv
    INTEGER *ion_ena_index
    INTEGER *ion_ina_index
    INTEGER *ion_dinadv_index
    INTEGER *ion_ek_index
    INTEGER *ion_ik_index
    INTEGER *ion_dikdv_index
    DOUBLE *voltage
    INTEGER *node_index
    DOUBLE t
    DOUBLE dt
    DOUBLE celsius
    INTEGER secondorder
    INTEGER node_count
}
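The INSTANCE_STRUCT above is a struct-of-arrays (SoA) layout: one pointer per parameter, state, and ion variable, plus scalar globals at the end. A trimmed C sketch of the corresponding layout (only a subset of the fields is shown, and the `hh_instance` name is illustrative):

```c
/* Trimmed C sketch of INSTANCE_STRUCT above: struct-of-arrays layout.
 * Field names match the dump; only a subset is reproduced here. */
typedef struct {
    double *gnabar, *gkbar;    /* parameters, one value per instance */
    double *m, *h, *n;         /* state variables */
    double *ion_ena;           /* ion data, read via ion_ena_index */
    int    *ion_ena_index;
    double *voltage;           /* read via node_index */
    int    *node_index;
    double  t, dt, celsius;    /* scalar globals, not per-instance arrays */
    int     secondorder, node_count;
} hh_instance;
```

Per-instance data are parallel arrays indexed by `id`, while `t`, `dt`, `celsius`, `secondorder`, and `node_count` are plain scalars at the tail of the struct.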

VOID nrn_state_hh(INSTANCE_STRUCT *mech){
    INTEGER id
    for(id = 0; id<mech->node_count; id = id+8) {
        INTEGER node_id, ena_id, ek_id
        DOUBLE v
        node_id = mech->node_index[id]
        ena_id = mech->ion_ena_index[id]
        ek_id = mech->ion_ek_index[id]
        v = mech->voltage[node_id]
        mech->ena[id] = mech->ion_ena[ena_id]
        mech->ek[id] = mech->ion_ek[ek_id]
        mech->m[id] = mech->m[id]+(1.0-exp(mech->dt*((((-1.0)))/mech->mtau[id])))*(-(((mech->minf[id]))/mech->mtau[id])/((((-1.0)))/mech->mtau[id])-mech->m[id])
        mech->h[id] = mech->h[id]+(1.0-exp(mech->dt*((((-1.0)))/mech->htau[id])))*(-(((mech->hinf[id]))/mech->htau[id])/((((-1.0)))/mech->htau[id])-mech->h[id])
        mech->n[id] = mech->n[id]+(1.0-exp(mech->dt*((((-1.0)))/mech->ntau[id])))*(-(((mech->ninf[id]))/mech->ntau[id])/((((-1.0)))/mech->ntau[id])-mech->n[id])
    }
}

Note that running this branch results in

libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: Error: expecting a type in CodegenVarType node
Abort trap: 6

This is because various fixes in LLVM IR generation are pending.

@georgemitenkov: Could you look at the various TODOs mentioned in commit 2da2bc4? I will make some more improvements to this PR, but it gives you an idea of the current structure, and you can see if you would like to change other things.

@georgemitenkov
Collaborator

Thanks for doing this, I will address TODOs later today!

The PR looks good. The current error is due to the i32* and f32*/f64* types that did not exist before. I will fix that.

@pramodk pramodk force-pushed the pramodk/llvm-instance-type branch from 8b1d186 to b9b2061 on March 5, 2021 18:58
@pramodk pramodk marked this pull request as ready for review March 6, 2021 22:50
@pramodk pramodk changed the title from "(WIP) Generate compute kernels with correct instance variable" to "Generate compute kernels with correct instance variable" on Mar 6, 2021
@pramodk
Contributor Author

pramodk commented Mar 7, 2021

@georgemitenkov : Quickly checking simple mod file:

NEURON {
    SUFFIX hh
    USEION na READ ena WRITE ina
    USEION k READ ek WRITE ik
    RANGE minf, mtau
}

STATE {
    m
}

ASSIGNED {
    v (mV)
    ena (mV)
    ek (mV)
    ina (mA/cm2)
    ik (mA/cm2)
    il (mA/cm2)
    minf
    mtau (ms)
}

BREAKPOINT {
    SOLVE states METHOD cnexp
}

DERIVATIVE states {
     m' =  (minf-m)/mtau
}

Generating serial IR and compiling with llc works fine. But if I try the vector form with:

./bin/nmodl hh.mod llvm --ir --vector-width 8

Storing the generated output in hh.ll and then trying to compile with llc gives:

$ llc  hh.ll
/usr/local/opt/llvm/bin/llc: error: /usr/local/opt/llvm/bin/llc: hh.ll:36:9: error: stored value and pointer type do not match
  store <8 x i32> %13, i32* %node_id, align 32
        ^

The type needs to be changed?

 for.body:                                         ; preds = %for.cond
   %node_id = alloca i32, align 4
   %ena_id = alloca i32, align 4
   %ek_id = alloca i32, align 4
   ...
   %12 = getelementptr inbounds <8 x i32>, <8 x i32>* %11, i64 %9
   %13 = load <8 x i32>, <8 x i32>* %12, align 32
   store <8 x i32> %13, i32* %node_id, align 32

@georgemitenkov
Collaborator

@pramodk Yes, this is not fully supported yet. Loading node_id leads to indirect indexing, and hence possibly scatters/gathers. So for now I only generate code for the direct [id] accesses found in the compute part.

The first allocas need to be vector, but that's a one-liner :) I will add it together with more advanced (indirect) vectorisation support (working on that now).

@georgemitenkov
Collaborator

georgemitenkov commented Mar 7, 2021

Basically, we have this:

VOID nrn_state_hh(INSTANCE_STRUCT *mech){
    INTEGER id
    for(id = 0; id<mech->node_count; id = id+8) {

        // This section is needed for indirect indexing, so also NOT supported
        INTEGER node_id, ena_id, ek_id
        DOUBLE v
        node_id = mech->node_index[id]
        ena_id = mech->ion_ena_index[id]
        ek_id = mech->ion_ek_index[id]
        // Indirect indexing, NOT supported
        v = mech->voltage[node_id]
        mech->ena[id] = mech->ion_ena[ena_id]
        mech->ek[id] = mech->ion_ek[ek_id]

        // Direct indexing, supported
        mech->m[id] = mech->m[id]+(1.0-exp(mech->dt*((((-1.0)))/mech->mtau[id])))*(-(((mech->minf[id]))/mech->mtau[id])/((((-1.0)))/mech->mtau[id])-mech->m[id])
        mech->h[id] = mech->h[id]+(1.0-exp(mech->dt*((((-1.0)))/mech->htau[id])))*(-(((mech->hinf[id]))/mech->htau[id])/((((-1.0)))/mech->htau[id])-mech->h[id])
        mech->n[id] = mech->n[id]+(1.0-exp(mech->dt*((((-1.0)))/mech->ntau[id])))*(-(((mech->ninf[id]))/mech->ntau[id])/((((-1.0)))/mech->ntau[id])-mech->n[id])
    }
}

And I am working on the first 2 unsupported cases 👍
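The distinction can be sketched in plain C (hypothetical helper names, not the generated code): `mech->m[id]` touches a contiguous run of addresses, while `mech->voltage[node_id]` reads one data-dependent address per lane.

```c
enum { W = 4 };  /* vector width; matches --vector-width 4 */

/* Direct indexing: lanes id..id+W-1 touch contiguous addresses,
 * so the backend can use a single vector load/store. */
static void load_direct(const double *m, int id, double out[W]) {
    for (int i = 0; i < W; ++i)
        out[i] = m[id + i];
}

/* Indirect indexing: each lane reads a different, data-dependent address,
 * so the backend must emit a gather (or scalarise the access). */
static void load_indirect(const double *voltage, const int *node_index,
                          int id, double out[W]) {
    for (int i = 0; i < W; ++i)
        out[i] = voltage[node_index[id + i]];
}
```

The direct case maps to an ordinary `<W x double>` load; the indirect case needs per-lane addresses, which is exactly the gather/scatter support mentioned above.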

@pramodk
Contributor Author

pramodk commented Mar 7, 2021

Ah yes! I remember now 👍

Do you plan to fix this in this PR or in a separate PR? If you think we should merge the working serial version in this PR, then that's fine!

And just so you know, in case it's useful: I was quickly testing by avoiding indirect indexing, i.e. with the following diff:

→ git diff
diff --git a/src/codegen/llvm/codegen_llvm_helper_visitor.cpp b/src/codegen/llvm/codegen_llvm_helper_visitor.cpp
index cb92ade..ffffe8b 100644
--- a/src/codegen/llvm/codegen_llvm_helper_visitor.cpp
+++ b/src/codegen/llvm/codegen_llvm_helper_visitor.cpp
@@ -240,6 +240,7 @@ void CodegenLLVMHelperVisitor::ion_read_statements(BlockType type,
                                                    std::vector<std::string>& double_variables,
                                                    ast::StatementVector& index_statements,
                                                    ast::StatementVector& body_statements) {
+    return;
     /// create read ion and corresponding index statements
     auto create_read_statements = [&](std::pair<std::string, std::string> variable_names) {
         // variable in current mechanism instance
@@ -305,6 +306,7 @@ void CodegenLLVMHelperVisitor::ion_write_statements(BlockType type,
                                                     std::vector<std::string>& double_variables,
                                                     ast::StatementVector& index_statements,
                                                     ast::StatementVector& body_statements) {
+    return;
     /// create write ion and corresponding index statements
     auto create_write_statements = [&](std::string ion_varname, std::string op, std::string rhs) {
         // index for writing ion variable
@@ -505,9 +507,9 @@ void CodegenLLVMHelperVisitor::visit_nrn_state_block(ast::NrnStateBlock& node) {
         std::vector<std::string> double_variables{"v"};

         /// access node index and corresponding voltage
-        loop_index_statements.push_back(
-            visitor::create_statement("node_id = node_index[{}]"_format(INDUCTION_VAR)));
-        loop_body_statements.push_back(visitor::create_statement("v = voltage[node_id]"));
+        //loop_index_statements.push_back(
+        //    visitor::create_statement("node_id = node_index[{}]"_format(INDUCTION_VAR)));
+        //loop_body_statements.push_back(visitor::create_statement("v = voltage[node_id]"));

         /// read ion variables
         ion_read_statements(BlockType::State,

and then generated kernel is:

VOID nrn_state_hh(INSTANCE_STRUCT *mech){
    INTEGER id
    for(id = 0; id<mech->node_count; id = id+8) {
        INTEGER node_id
        DOUBLE v
        mech->m[id] = mech->m[id]+(1.0-exp(mech->dt*((((-1.0)))/mech->mtau[id])))*(-(((mech->minf[id]))/mech->mtau[id])/((((-1.0)))/mech->mtau[id])-mech->m[id])
    }
}

and this gives:

→ llc hh.ll
llc: error: llc: hh.ll:43:14: error: invalid operand type for instruction
  %22 = sdiv <8 x double> <double -1.000000e+00, double -1.000000e+00, double -1.000000e+00, double -1.000000e+00, double -1.000000e+00, double -1.000000e+00, double -1.000000e+00, double -1.000000e+00>, %21
             ^

@georgemitenkov
Collaborator

georgemitenkov commented Mar 7, 2021

Do you plan to fix this in PR or separate PR? If you think we should merge working serial version in this PR then thats fine !

I think merging serial should be ok? just keep this branch for vector code.

And just for you to know in case it's useful : I was quickly testing with avoiding indirect indexing i.e. with following diff:

Hm, that is weird, since all types actually match. I will look at this.

@pramodk
Contributor Author

pramodk commented Mar 7, 2021

Hm, that is weird, since all types actually match. I will look at this.

Because sdiv is for integer types and the operands are doubles, it should be fdiv?

@pramodk
Contributor Author

pramodk commented Mar 7, 2021

I think merging serial should be ok? just keep this branch for vector code.

Just to be sure: if I merge this branch, everything will be in the llvm branch. So what do you mean by "keep this branch for vector code"? (On merge, the PR branch is also automatically deleted, but I can re-push.)

@georgemitenkov
Collaborator

Hm, that is weird, since all types actually match. I will look at this.

Because sdiv is for integer types and the operands are doubles, it should be fdiv?

Yes, just fixed that! The problem was that isDoubleTy() was used.
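For reference, sdiv is defined only for integer (and integer-vector) operands, while fdiv is the floating-point counterpart; in C terms (helper names here are illustrative):

```c
/* C's '/' lowers to different LLVM instructions depending on operand types:
 *   int    / int    -> sdiv  (signed integer division, truncates toward zero)
 *   double / double -> fdiv  (IEEE-754 floating-point division)
 * Emitting sdiv with <8 x double> operands is ill-typed IR, which is what
 * the llc verifier reported above. */
static int    int_div(int a, int b)         { return a / b; }  /* sdiv */
static double float_div(double a, double b) { return a / b; }  /* fdiv */
```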

@georgemitenkov
Collaborator

I think merging serial should be ok? just keep this branch for vector code.

Just to be sure: if I merge this branch, everything will be in the llvm branch. So what do you mean by "keep this branch for vector code"? (On merge, the PR branch is also automatically deleted, but I can re-push.)

I meant to merge this branch but not delete it? But I think it's fine to create a new one. It might even be easier, since the work is split into smaller chunks/PRs.

I will push a fix, and then it is ready for merging!

@georgemitenkov
Collaborator

georgemitenkov commented Mar 7, 2021

Btw, here is the vectorised code in NMODL, LLVM and assembly obtained with:

./bin/nmodl hh.mod llvm --ir --vector-width 4
llc <llvm_file> -o -

NMODL kernel:

VOID nrn_state_hh(INSTANCE_STRUCT *mech){
    INTEGER id
    for(id = 0; id<mech->node_count; id = id+4) {
        INTEGER node_id
        DOUBLE v
        mech->m[id] = mech->m[id]+(1.0-exp(mech->dt*((((-1.0)))/mech->mtau[id])))*(-(((mech->minf[id]))/mech->mtau[id])/((((-1.0)))/mech->mtau[id])-mech->m[id])
        mech->h[id] = mech->h[id]+(1.0-exp(mech->dt*((((-1.0)))/mech->htau[id])))*(-(((mech->hinf[id]))/mech->htau[id])/((((-1.0)))/mech->htau[id])-mech->h[id])
        mech->n[id] = mech->n[id]+(1.0-exp(mech->dt*((((-1.0)))/mech->ntau[id])))*(-(((mech->ninf[id]))/mech->ntau[id])/((((-1.0)))/mech->ntau[id])-mech->n[id])
    }
}

Generated LLVM (pretty long):

; ModuleID = 'hh'
source_filename = "hh"

%hh__instance_var__type = type { double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, i32*, i32*, i32*, i32*, i32*, i32*, double*, i32*, double, double, double, i32, i32 }

define void @nrn_state_hh(%hh__instance_var__type* %mech1) {
  %mech = alloca %hh__instance_var__type*, align 8
  store %hh__instance_var__type* %mech1, %hh__instance_var__type** %mech, align 8
  %id = alloca i32, align 4
  store i32 0, i32* %id, align 4
  %__vec_id = alloca <4 x i32>, align 16
  store <4 x i32> <i32 0, i32 1, i32 2, i32 3>, <4 x i32>* %__vec_id, align 16
  br label %for.cond

for.cond:                                         ; preds = %for.inc, %0
  %1 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %2 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %1, i32 0, i32 43
  %3 = load i32, i32* %2, align 4
  %4 = load i32, i32* %id, align 4
  %5 = icmp slt i32 %4, %3
  br i1 %5, label %for.body, label %for.exit

for.body:                                         ; preds = %for.cond
  %node_id = alloca i32, align 4
  %v = alloca double, align 8
  %6 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %7 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %6, i32 0, i32 13
  %8 = load i32, i32* %id, align 4
  %9 = sext i32 %8 to i64
  %10 = load double*, double** %7, align 8
  %11 = bitcast double* %10 to <4 x double>*
  %12 = getelementptr inbounds <4 x double>, <4 x double>* %11, i64 %9
  %13 = load <4 x double>, <4 x double>* %12, align 32
  %14 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %15 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %14, i32 0, i32 10
  %16 = load i32, i32* %id, align 4
  %17 = sext i32 %16 to i64
  %18 = load double*, double** %15, align 8
  %19 = bitcast double* %18 to <4 x double>*
  %20 = getelementptr inbounds <4 x double>, <4 x double>* %19, i64 %17
  %21 = load <4 x double>, <4 x double>* %20, align 32
  %22 = fdiv <4 x double> <double -1.000000e+00, double -1.000000e+00, double -1.000000e+00, double -1.000000e+00>, %21
  %23 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %24 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %23, i32 0, i32 10
  %25 = load i32, i32* %id, align 4
  %26 = sext i32 %25 to i64
  %27 = load double*, double** %24, align 8
  %28 = bitcast double* %27 to <4 x double>*
  %29 = getelementptr inbounds <4 x double>, <4 x double>* %28, i64 %26
  %30 = load <4 x double>, <4 x double>* %29, align 32
  %31 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %32 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %31, i32 0, i32 7
  %33 = load i32, i32* %id, align 4
  %34 = sext i32 %33 to i64
  %35 = load double*, double** %32, align 8
  %36 = bitcast double* %35 to <4 x double>*
  %37 = getelementptr inbounds <4 x double>, <4 x double>* %36, i64 %34
  %38 = load <4 x double>, <4 x double>* %37, align 32
  %39 = fdiv <4 x double> %38, %30
  %40 = fneg <4 x double> %39
  %41 = fdiv <4 x double> %40, %22
  %42 = fsub <4 x double> %41, %13
  %43 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %44 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %43, i32 0, i32 10
  %45 = load i32, i32* %id, align 4
  %46 = sext i32 %45 to i64
  %47 = load double*, double** %44, align 8
  %48 = bitcast double* %47 to <4 x double>*
  %49 = getelementptr inbounds <4 x double>, <4 x double>* %48, i64 %46
  %50 = load <4 x double>, <4 x double>* %49, align 32
  %51 = fdiv <4 x double> <double -1.000000e+00, double -1.000000e+00, double -1.000000e+00, double -1.000000e+00>, %50
  %52 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %53 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %52, i32 0, i32 40
  %54 = load double, double* %53, align 8
  %.splatinsert = insertelement <4 x double> undef, double %54, i32 0
  %.splat = shufflevector <4 x double> %.splatinsert, <4 x double> undef, <4 x i32> zeroinitializer
  %55 = fmul <4 x double> %.splat, %51
  %56 = call <4 x double> @llvm.exp.v4f64(<4 x double> %55)
  %57 = fsub <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>, %56
  %58 = fmul <4 x double> %57, %42
  %59 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %60 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %59, i32 0, i32 13
  %61 = load i32, i32* %id, align 4
  %62 = sext i32 %61 to i64
  %63 = load double*, double** %60, align 8
  %64 = bitcast double* %63 to <4 x double>*
  %65 = getelementptr inbounds <4 x double>, <4 x double>* %64, i64 %62
  %66 = load <4 x double>, <4 x double>* %65, align 32
  %67 = fadd <4 x double> %66, %58
  %68 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %69 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %68, i32 0, i32 13
  %70 = load i32, i32* %id, align 4
  %71 = sext i32 %70 to i64
  %72 = load double*, double** %69, align 8
  %73 = bitcast double* %72 to <4 x double>*
  %74 = getelementptr inbounds <4 x double>, <4 x double>* %73, i64 %71
  store <4 x double> %67, <4 x double>* %74, align 32
  %75 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %76 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %75, i32 0, i32 14
  %77 = load i32, i32* %id, align 4
  %78 = sext i32 %77 to i64
  %79 = load double*, double** %76, align 8
  %80 = bitcast double* %79 to <4 x double>*
  %81 = getelementptr inbounds <4 x double>, <4 x double>* %80, i64 %78
  %82 = load <4 x double>, <4 x double>* %81, align 32
  %83 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %84 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %83, i32 0, i32 11
  %85 = load i32, i32* %id, align 4
  %86 = sext i32 %85 to i64
  %87 = load double*, double** %84, align 8
  %88 = bitcast double* %87 to <4 x double>*
  %89 = getelementptr inbounds <4 x double>, <4 x double>* %88, i64 %86
  %90 = load <4 x double>, <4 x double>* %89, align 32
  %91 = fdiv <4 x double> <double -1.000000e+00, double -1.000000e+00, double -1.000000e+00, double -1.000000e+00>, %90
  %92 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %93 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %92, i32 0, i32 11
  %94 = load i32, i32* %id, align 4
  %95 = sext i32 %94 to i64
  %96 = load double*, double** %93, align 8
  %97 = bitcast double* %96 to <4 x double>*
  %98 = getelementptr inbounds <4 x double>, <4 x double>* %97, i64 %95
  %99 = load <4 x double>, <4 x double>* %98, align 32
  %100 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %101 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %100, i32 0, i32 8
  %102 = load i32, i32* %id, align 4
  %103 = sext i32 %102 to i64
  %104 = load double*, double** %101, align 8
  %105 = bitcast double* %104 to <4 x double>*
  %106 = getelementptr inbounds <4 x double>, <4 x double>* %105, i64 %103
  %107 = load <4 x double>, <4 x double>* %106, align 32
  %108 = fdiv <4 x double> %107, %99
  %109 = fneg <4 x double> %108
  %110 = fdiv <4 x double> %109, %91
  %111 = fsub <4 x double> %110, %82
  %112 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %113 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %112, i32 0, i32 11
  %114 = load i32, i32* %id, align 4
  %115 = sext i32 %114 to i64
  %116 = load double*, double** %113, align 8
  %117 = bitcast double* %116 to <4 x double>*
  %118 = getelementptr inbounds <4 x double>, <4 x double>* %117, i64 %115
  %119 = load <4 x double>, <4 x double>* %118, align 32
  %120 = fdiv <4 x double> <double -1.000000e+00, double -1.000000e+00, double -1.000000e+00, double -1.000000e+00>, %119
  %121 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %122 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %121, i32 0, i32 40
  %123 = load double, double* %122, align 8
  %.splatinsert2 = insertelement <4 x double> undef, double %123, i32 0
  %.splat3 = shufflevector <4 x double> %.splatinsert2, <4 x double> undef, <4 x i32> zeroinitializer
  %124 = fmul <4 x double> %.splat3, %120
  %125 = call <4 x double> @llvm.exp.v4f64(<4 x double> %124)
  %126 = fsub <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>, %125
  %127 = fmul <4 x double> %126, %111
  %128 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %129 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %128, i32 0, i32 14
  %130 = load i32, i32* %id, align 4
  %131 = sext i32 %130 to i64
  %132 = load double*, double** %129, align 8
  %133 = bitcast double* %132 to <4 x double>*
  %134 = getelementptr inbounds <4 x double>, <4 x double>* %133, i64 %131
  %135 = load <4 x double>, <4 x double>* %134, align 32
  %136 = fadd <4 x double> %135, %127
  %137 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %138 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %137, i32 0, i32 14
  %139 = load i32, i32* %id, align 4
  %140 = sext i32 %139 to i64
  %141 = load double*, double** %138, align 8
  %142 = bitcast double* %141 to <4 x double>*
  %143 = getelementptr inbounds <4 x double>, <4 x double>* %142, i64 %140
  store <4 x double> %136, <4 x double>* %143, align 32
  %144 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %145 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %144, i32 0, i32 15
  %146 = load i32, i32* %id, align 4
  %147 = sext i32 %146 to i64
  %148 = load double*, double** %145, align 8
  %149 = bitcast double* %148 to <4 x double>*
  %150 = getelementptr inbounds <4 x double>, <4 x double>* %149, i64 %147
  %151 = load <4 x double>, <4 x double>* %150, align 32
  %152 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %153 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %152, i32 0, i32 12
  %154 = load i32, i32* %id, align 4
  %155 = sext i32 %154 to i64
  %156 = load double*, double** %153, align 8
  %157 = bitcast double* %156 to <4 x double>*
  %158 = getelementptr inbounds <4 x double>, <4 x double>* %157, i64 %155
  %159 = load <4 x double>, <4 x double>* %158, align 32
  %160 = fdiv <4 x double> <double -1.000000e+00, double -1.000000e+00, double -1.000000e+00, double -1.000000e+00>, %159
  %161 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %162 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %161, i32 0, i32 12
  %163 = load i32, i32* %id, align 4
  %164 = sext i32 %163 to i64
  %165 = load double*, double** %162, align 8
  %166 = bitcast double* %165 to <4 x double>*
  %167 = getelementptr inbounds <4 x double>, <4 x double>* %166, i64 %164
  %168 = load <4 x double>, <4 x double>* %167, align 32
  %169 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %170 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %169, i32 0, i32 9
  %171 = load i32, i32* %id, align 4
  %172 = sext i32 %171 to i64
  %173 = load double*, double** %170, align 8
  %174 = bitcast double* %173 to <4 x double>*
  %175 = getelementptr inbounds <4 x double>, <4 x double>* %174, i64 %172
  %176 = load <4 x double>, <4 x double>* %175, align 32
  %177 = fdiv <4 x double> %176, %168
  %178 = fneg <4 x double> %177
  %179 = fdiv <4 x double> %178, %160
  %180 = fsub <4 x double> %179, %151
  %181 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %182 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %181, i32 0, i32 12
  %183 = load i32, i32* %id, align 4
  %184 = sext i32 %183 to i64
  %185 = load double*, double** %182, align 8
  %186 = bitcast double* %185 to <4 x double>*
  %187 = getelementptr inbounds <4 x double>, <4 x double>* %186, i64 %184
  %188 = load <4 x double>, <4 x double>* %187, align 32
  %189 = fdiv <4 x double> <double -1.000000e+00, double -1.000000e+00, double -1.000000e+00, double -1.000000e+00>, %188
  %190 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %191 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %190, i32 0, i32 40
  %192 = load double, double* %191, align 8
  %.splatinsert4 = insertelement <4 x double> undef, double %192, i32 0
  %.splat5 = shufflevector <4 x double> %.splatinsert4, <4 x double> undef, <4 x i32> zeroinitializer
  %193 = fmul <4 x double> %.splat5, %189
  %194 = call <4 x double> @llvm.exp.v4f64(<4 x double> %193)
  %195 = fsub <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>, %194
  %196 = fmul <4 x double> %195, %180
  %197 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %198 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %197, i32 0, i32 15
  %199 = load i32, i32* %id, align 4
  %200 = sext i32 %199 to i64
  %201 = load double*, double** %198, align 8
  %202 = bitcast double* %201 to <4 x double>*
  %203 = getelementptr inbounds <4 x double>, <4 x double>* %202, i64 %200
  %204 = load <4 x double>, <4 x double>* %203, align 32
  %205 = fadd <4 x double> %204, %196
  %206 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %207 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %206, i32 0, i32 15
  %208 = load i32, i32* %id, align 4
  %209 = sext i32 %208 to i64
  %210 = load double*, double** %207, align 8
  %211 = bitcast double* %210 to <4 x double>*
  %212 = getelementptr inbounds <4 x double>, <4 x double>* %211, i64 %209
  store <4 x double> %205, <4 x double>* %212, align 32
  br label %for.inc

for.inc:                                          ; preds = %for.body
  %213 = load i32, i32* %id, align 4
  %214 = add i32 %213, 4
  store i32 %214, i32* %id, align 4
  %215 = load <4 x i32>, <4 x i32>* %__vec_id, align 16
  %216 = add <4 x i32> %215, <i32 4, i32 4, i32 4, i32 4>
  store <4 x i32> %216, <4 x i32>* %__vec_id, align 16
  br label %for.cond

for.exit:                                         ; preds = %for.cond
  ret void
}

; Function Attrs: nounwind readnone speculatable willreturn
declare <4 x double> @llvm.exp.v4f64(<4 x double>) #0

attributes #0 = { nounwind readnone speculatable willreturn }

And assembly:

	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 10, 14
	.section	__TEXT,__literal16,16byte_literals
	.p2align	4                               ## -- Begin function nrn_state_hh
LCPI0_0:
	.long	0                               ## 0x0
	.long	1                               ## 0x1
	.long	2                               ## 0x2
	.long	3                               ## 0x3
LCPI0_1:
	.quad	0xbff0000000000000              ## double -1
	.quad	0xbff0000000000000              ## double -1
LCPI0_2:
	.quad	0x8000000000000000              ## double -0
	.quad	0x8000000000000000              ## double -0
LCPI0_3:
	.quad	0x3ff0000000000000              ## double 1
	.quad	0x3ff0000000000000              ## double 1
LCPI0_4:
	.long	4                               ## 0x4
	.long	4                               ## 0x4
	.long	4                               ## 0x4
	.long	4                               ## 0x4
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_nrn_state_hh
	.p2align	4, 0x90
_nrn_state_hh:                          ## @nrn_state_hh
	.cfi_startproc
## %bb.0:
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	pushq	%r14
	pushq	%rbx
	subq	$160, %rsp
	.cfi_offset %rbx, -32
	.cfi_offset %r14, -24
	movq	%rdi, -72(%rbp)
	movl	$0, -20(%rbp)
	movdqa	LCPI0_0(%rip), %xmm0            ## xmm0 = [0,1,2,3]
	.p2align	4, 0x90
LBB0_1:                                 ## %for.cond
                                        ## =>This Inner Loop Header: Depth=1
	movdqa	%xmm0, -176(%rbp)
	movq	-72(%rbp), %rax
	movl	-20(%rbp), %ecx
	cmpl	340(%rax), %ecx
	jge	LBB0_3
## %bb.2:                               ## %for.body
                                        ##   in Loop: Header=BB0_1 Depth=1
	movq	%rsp, %rax
	addq	$-16, %rax
	movq	%rax, %rsp
	movq	%rsp, %rax
	addq	$-16, %rax
	movq	%rax, %rsp
	movq	-72(%rbp), %rax
	movslq	-20(%rbp), %rbx
	movq	104(%rax), %r14
	shlq	$5, %rbx
	movapd	(%r14,%rbx), %xmm4
	movapd	%xmm4, -144(%rbp)               ## 16-byte Spill
	movapd	16(%r14,%rbx), %xmm3
	movapd	%xmm3, -160(%rbp)               ## 16-byte Spill
	movq	56(%rax), %rcx
	movq	80(%rax), %rdx
	movapd	(%rdx,%rbx), %xmm0
	movapd	16(%rdx,%rbx), %xmm1
	movapd	LCPI0_1(%rip), %xmm2            ## xmm2 = [-1.0E+0,-1.0E+0]
	movapd	%xmm2, %xmm5
	divpd	%xmm0, %xmm5
	movapd	%xmm5, %xmm7
	divpd	%xmm1, %xmm2
	movapd	(%rcx,%rbx), %xmm6
	movapd	16(%rcx,%rbx), %xmm5
	divpd	%xmm1, %xmm5
	divpd	%xmm0, %xmm6
	movapd	LCPI0_2(%rip), %xmm0            ## xmm0 = [-0.0E+0,-0.0E+0]
	xorpd	%xmm0, %xmm6
	xorpd	%xmm0, %xmm5
	divpd	%xmm2, %xmm5
	divpd	%xmm7, %xmm6
	subpd	%xmm4, %xmm6
	movapd	%xmm6, -112(%rbp)               ## 16-byte Spill
	subpd	%xmm3, %xmm5
	movapd	%xmm5, -128(%rbp)               ## 16-byte Spill
	movsd	320(%rax), %xmm0                ## xmm0 = mem[0],zero
	unpcklpd	%xmm0, %xmm0                    ## xmm0 = xmm0[0,0]
	mulpd	%xmm0, %xmm7
	movapd	%xmm7, -96(%rbp)                ## 16-byte Spill
	mulpd	%xmm2, %xmm0
	movapd	%xmm0, -48(%rbp)                ## 16-byte Spill
	callq	_exp
	movapd	%xmm0, -64(%rbp)                ## 16-byte Spill
	movaps	-48(%rbp), %xmm0                ## 16-byte Reload
	movhlps	%xmm0, %xmm0                    ## xmm0 = xmm0[1,1]
	callq	_exp
	movaps	-64(%rbp), %xmm1                ## 16-byte Reload
	movlhps	%xmm0, %xmm1                    ## xmm1 = xmm1[0],xmm0[0]
	movaps	%xmm1, -64(%rbp)                ## 16-byte Spill
	movaps	-96(%rbp), %xmm0                ## 16-byte Reload
	callq	_exp
	movaps	%xmm0, -48(%rbp)                ## 16-byte Spill
	movapd	-96(%rbp), %xmm0                ## 16-byte Reload
	unpckhpd	%xmm0, %xmm0                    ## xmm0 = xmm0[1,1]
	callq	_exp
	movapd	-48(%rbp), %xmm2                ## 16-byte Reload
	unpcklpd	%xmm0, %xmm2                    ## xmm2 = xmm2[0],xmm0[0]
	movapd	LCPI0_3(%rip), %xmm1            ## xmm1 = [1.0E+0,1.0E+0]
	movapd	%xmm1, %xmm0
	subpd	%xmm2, %xmm0
	mulpd	-112(%rbp), %xmm0               ## 16-byte Folded Reload
	subpd	-64(%rbp), %xmm1                ## 16-byte Folded Reload
	mulpd	-128(%rbp), %xmm1               ## 16-byte Folded Reload
	addpd	-144(%rbp), %xmm0               ## 16-byte Folded Reload
	addpd	-160(%rbp), %xmm1               ## 16-byte Folded Reload
	movapd	%xmm1, 16(%r14,%rbx)
	movapd	%xmm0, (%r14,%rbx)
	movq	-72(%rbp), %rax
	movslq	-20(%rbp), %rbx
	movq	112(%rax), %r14
	shlq	$5, %rbx
	movapd	(%r14,%rbx), %xmm4
	movapd	%xmm4, -160(%rbp)               ## 16-byte Spill
	movapd	16(%r14,%rbx), %xmm5
	movapd	%xmm5, -144(%rbp)               ## 16-byte Spill
	movq	64(%rax), %rcx
	movq	88(%rax), %rdx
	movapd	(%rdx,%rbx), %xmm0
	movapd	16(%rdx,%rbx), %xmm1
	movapd	LCPI0_1(%rip), %xmm2            ## xmm2 = [-1.0E+0,-1.0E+0]
	movapd	%xmm2, %xmm3
	divpd	%xmm1, %xmm3
	movapd	%xmm3, %xmm7
	divpd	%xmm0, %xmm2
	movapd	16(%rcx,%rbx), %xmm6
	movapd	(%rcx,%rbx), %xmm3
	divpd	%xmm0, %xmm3
	divpd	%xmm1, %xmm6
	movapd	LCPI0_2(%rip), %xmm0            ## xmm0 = [-0.0E+0,-0.0E+0]
	xorpd	%xmm0, %xmm6
	xorpd	%xmm0, %xmm3
	divpd	%xmm2, %xmm3
	divpd	%xmm7, %xmm6
	subpd	%xmm5, %xmm6
	movapd	%xmm6, -112(%rbp)               ## 16-byte Spill
	subpd	%xmm4, %xmm3
	movapd	%xmm3, -128(%rbp)               ## 16-byte Spill
	movsd	320(%rax), %xmm0                ## xmm0 = mem[0],zero
	unpcklpd	%xmm0, %xmm0                    ## xmm0 = xmm0[0,0]
	mulpd	%xmm0, %xmm7
	movapd	%xmm7, -96(%rbp)                ## 16-byte Spill
	mulpd	%xmm2, %xmm0
	movapd	%xmm0, -48(%rbp)                ## 16-byte Spill
	callq	_exp
	movapd	%xmm0, -64(%rbp)                ## 16-byte Spill
	movaps	-48(%rbp), %xmm0                ## 16-byte Reload
	movhlps	%xmm0, %xmm0                    ## xmm0 = xmm0[1,1]
	callq	_exp
	movaps	-64(%rbp), %xmm1                ## 16-byte Reload
	movlhps	%xmm0, %xmm1                    ## xmm1 = xmm1[0],xmm0[0]
	movaps	%xmm1, -64(%rbp)                ## 16-byte Spill
	movaps	-96(%rbp), %xmm0                ## 16-byte Reload
	callq	_exp
	movaps	%xmm0, -48(%rbp)                ## 16-byte Spill
	movapd	-96(%rbp), %xmm0                ## 16-byte Reload
	unpckhpd	%xmm0, %xmm0                    ## xmm0 = xmm0[1,1]
	callq	_exp
	movapd	-48(%rbp), %xmm3                ## 16-byte Reload
	unpcklpd	%xmm0, %xmm3                    ## xmm3 = xmm3[0],xmm0[0]
	movapd	LCPI0_3(%rip), %xmm1            ## xmm1 = [1.0E+0,1.0E+0]
	movapd	%xmm1, %xmm0
	subpd	%xmm3, %xmm0
	mulpd	-112(%rbp), %xmm0               ## 16-byte Folded Reload
	subpd	-64(%rbp), %xmm1                ## 16-byte Folded Reload
	mulpd	-128(%rbp), %xmm1               ## 16-byte Folded Reload
	addpd	-144(%rbp), %xmm0               ## 16-byte Folded Reload
	addpd	-160(%rbp), %xmm1               ## 16-byte Folded Reload
	movapd	%xmm1, (%r14,%rbx)
	movapd	%xmm0, 16(%r14,%rbx)
	movq	-72(%rbp), %rax
	movslq	-20(%rbp), %rbx
	movq	120(%rax), %r14
	shlq	$5, %rbx
	movapd	16(%r14,%rbx), %xmm3
	movapd	%xmm3, -160(%rbp)               ## 16-byte Spill
	movapd	(%r14,%rbx), %xmm4
	movapd	%xmm4, -144(%rbp)               ## 16-byte Spill
	movq	72(%rax), %rcx
	movq	96(%rax), %rdx
	movapd	16(%rdx,%rbx), %xmm0
	movapd	(%rdx,%rbx), %xmm1
	movapd	LCPI0_1(%rip), %xmm2            ## xmm2 = [-1.0E+0,-1.0E+0]
	movapd	%xmm2, %xmm5
	divpd	%xmm1, %xmm5
	movapd	%xmm5, %xmm7
	divpd	%xmm0, %xmm2
	movapd	(%rcx,%rbx), %xmm6
	movapd	16(%rcx,%rbx), %xmm5
	divpd	%xmm0, %xmm5
	divpd	%xmm1, %xmm6
	movapd	LCPI0_2(%rip), %xmm0            ## xmm0 = [-0.0E+0,-0.0E+0]
	xorpd	%xmm0, %xmm6
	xorpd	%xmm0, %xmm5
	divpd	%xmm2, %xmm5
	divpd	%xmm7, %xmm6
	subpd	%xmm4, %xmm6
	movapd	%xmm6, -112(%rbp)               ## 16-byte Spill
	subpd	%xmm3, %xmm5
	movapd	%xmm5, -128(%rbp)               ## 16-byte Spill
	movsd	320(%rax), %xmm0                ## xmm0 = mem[0],zero
	unpcklpd	%xmm0, %xmm0                    ## xmm0 = xmm0[0,0]
	mulpd	%xmm0, %xmm7
	movapd	%xmm7, -96(%rbp)                ## 16-byte Spill
	mulpd	%xmm2, %xmm0
	movapd	%xmm0, -48(%rbp)                ## 16-byte Spill
	callq	_exp
	movapd	%xmm0, -64(%rbp)                ## 16-byte Spill
	movaps	-48(%rbp), %xmm0                ## 16-byte Reload
	movhlps	%xmm0, %xmm0                    ## xmm0 = xmm0[1,1]
	callq	_exp
	movaps	-64(%rbp), %xmm1                ## 16-byte Reload
	movlhps	%xmm0, %xmm1                    ## xmm1 = xmm1[0],xmm0[0]
	movaps	%xmm1, -64(%rbp)                ## 16-byte Spill
	movaps	-96(%rbp), %xmm0                ## 16-byte Reload
	callq	_exp
	movaps	%xmm0, -48(%rbp)                ## 16-byte Spill
	movapd	-96(%rbp), %xmm0                ## 16-byte Reload
	unpckhpd	%xmm0, %xmm0                    ## xmm0 = xmm0[1,1]
	callq	_exp
	movapd	-48(%rbp), %xmm2                ## 16-byte Reload
	unpcklpd	%xmm0, %xmm2                    ## xmm2 = xmm2[0],xmm0[0]
	movapd	LCPI0_3(%rip), %xmm1            ## xmm1 = [1.0E+0,1.0E+0]
	movapd	%xmm1, %xmm0
	subpd	%xmm2, %xmm0
	mulpd	-112(%rbp), %xmm0               ## 16-byte Folded Reload
	subpd	-64(%rbp), %xmm1                ## 16-byte Folded Reload
	mulpd	-128(%rbp), %xmm1               ## 16-byte Folded Reload
	addpd	-144(%rbp), %xmm0               ## 16-byte Folded Reload
	addpd	-160(%rbp), %xmm1               ## 16-byte Folded Reload
	movapd	%xmm1, 16(%r14,%rbx)
	movapd	%xmm0, (%r14,%rbx)
	addl	$4, -20(%rbp)
	movdqa	-176(%rbp), %xmm0
	paddd	LCPI0_4(%rip), %xmm0
	jmp	LBB0_1
LBB0_3:                                 ## %for.exit
	leaq	-16(%rbp), %rsp
	popq	%rbx
	popq	%r14
	popq	%rbp
	retq
	.cfi_endproc
                                        ## -- End function
.subsections_via_symbols

pramodk commented Mar 7, 2021

Without looking into the details, this looks reasonable: if I use -mcpu=broadwell or -mcpu=skylake-avx512, then llc generates AVX2 or AVX-512 registers respectively. I used:

./bin/nmodl hh.mod llvm --ir --vector-width 8

# and then

llc -o - -O3  -march=x86-64 -mcpu=core2 hh.ll 
llc -o - -O3  -march=x86-64 -mcpu=broadwell hh.ll
llc -o - -O3  -march=x86-64 -mcpu=skylake-avx512 hh.ll


pramodk force-pushed the pramodk/llvm-instance-type branch 2 times, most recently from c7c83fd to cc44ae5 on March 7, 2021 18:50
pramodk and others added 2 commits March 7, 2021 19:52
 - instance structure now contains all global variables
 - instance structure now contains index variables for ions
 - nrn_state kernel now has all variables converted to instance
 - InstanceVarHelper added to query variable and its location
* Support for codegen variable with type
* Add nmodl_to_json helper in main.cpp
* Added --vector-width CLI option
* Add instance struct argument to nrn_state_hh
* Add comments as TODOs to support LLVM IR generation

Note that this commit and the next commit (Part II) are required to
make LLVM IR code generation work. Vector IR generation works
except for indirect indexing. See comment in #531.
  - remove undefined visit_codegen_instance_var
  - Improved member creation for instance struct
  - Instance struct type generation for kernel arguments
  - Proper integration of instance struct
  - Added scalar code generation for the kernel
  - Removed instance test since it is not created explicitly anymore
  - Fixed ordering for precision and width in LLVM Visitor
  - Added vector induction variable
  - Vectorised code for compute with direct loads fully functional
  - Instance naming fixed
  - (LLVM IR) Fixed compute vector code generation types
  - Refactoring: improve conversion of double to int for the loop expressions
pramodk force-pushed the pramodk/llvm-instance-type branch from cc44ae5 to 770ac80 on March 7, 2021 18:53
pramodk commented Mar 7, 2021

I have squashed this PR into two separate commits that show 1) the changes in the LLVM helper visitor and 2) the changes in LLVM IR generation. I will rebase & merge both commits to keep the history.

How to test this PR / commit?

Take a sample mod file hh.mod:

NEURON {
    SUFFIX hh
    NONSPECIFIC_CURRENT il
    RANGE minf, mtau, gl, el
}

STATE {
    m
}

ASSIGNED {
    v (mV)
    minf
    mtau (ms)
}

BREAKPOINT {
    SOLVE states METHOD cnexp
    il = gl*(v - el)
}

DERIVATIVE states {
     m =  (minf-m)/mtau
     : m' =  (minf-m)/mtau       : uncomment this line if you need the ODE solution involving exp
}

Run serial code generation:

./bin/nmodl hh.mod llvm --ir  > tmp.ll
sed -n '/ModuleID/,$p' tmp.ll > hh.ll       # only extract LLVM IR module from stdout

And now convert to assembly using LLC:

→ llc -o - -O3  -march=x86-64 -mcpu=skylake-avx512 hh.ll
	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 10, 15
	.globl	_nrn_state_hh                   ## -- Begin function nrn_state_hh
	.p2align	4, 0x90
_nrn_state_hh:                          ## @nrn_state_hh
	.cfi_startproc
## %bb.0:
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	subq	$16, %rsp
	movq	%rdi, -16(%rbp)
	movl	$0, -4(%rbp)
	movq	-16(%rbp), %rax
	movl	-4(%rbp), %ecx
	cmpl	92(%rax), %ecx
	jge	LBB0_3
	.p2align	4, 0x90
LBB0_2:                                 ## %for.body
                                        ## =>This Inner Loop Header: Depth=1
	movq	%rsp, %rax
	leaq	-16(%rax), %rsp
	movq	%rsp, %rcx
	leaq	-16(%rcx), %rsp
	movq	-16(%rbp), %rdx
	movslq	-4(%rbp), %rsi
	movq	56(%rdx), %rdx
	movslq	(%rdx,%rsi,4), %rdx
	movl	%edx, -16(%rax)
	movq	-16(%rbp), %rax
	movq	48(%rax), %rax
	vmovsd	(%rax,%rdx,8), %xmm0            ## xmm0 = mem[0],zero
	vmovsd	%xmm0, -16(%rcx)
	movq	-16(%rbp), %rax
	movslq	-4(%rbp), %rcx
	movq	(%rax), %rdx
	movq	8(%rax), %rsi
	movq	16(%rax), %rax
	vmovsd	(%rdx,%rcx,8), %xmm0            ## xmm0 = mem[0],zero
	vsubsd	(%rax,%rcx,8), %xmm0, %xmm0
	vdivsd	(%rsi,%rcx,8), %xmm0, %xmm0
	vmovsd	%xmm0, (%rax,%rcx,8)
	incl	-4(%rbp)
	movq	-16(%rbp), %rax
	movl	-4(%rbp), %ecx
	cmpl	92(%rax), %ecx
	jl	LBB0_2
LBB0_3:                                 ## %for.exit
	movq	%rbp, %rsp
	popq	%rbp
	retq
	.cfi_endproc
                                        ## -- End function
.subsections_via_symbols
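For orientation, here is a hand-written C sketch of the scalar loop that the nrn_state_hh assembly above implements. The struct and field names are hypothetical stand-ins for the generated hh__instance_var__type, not the exact layout:

```c
#include <stddef.h>

/* Hypothetical mirror of the generated instance struct; names are illustrative. */
typedef struct {
    double *minf, *mtau, *m;  /* mechanism range/state arrays */
    double *voltage;          /* per-node voltage */
    int    *node_index;       /* indirection from instance to node */
    int     node_count;
} hh_instance;

/* Scalar equivalent of the generated nrn_state_hh kernel. */
void nrn_state_hh_scalar(hh_instance *mech) {
    for (int id = 0; id < mech->node_count; ++id) {
        int node_id = mech->node_index[id];  /* indirect node lookup */
        double v = mech->voltage[node_id];   /* voltage load (unused by this reduced model) */
        (void) v;
        mech->m[id] = (mech->minf[id] - mech->m[id]) / mech->mtau[id];
    }
}
```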
Testing Vector / SIMD IR

The current PR supports vector IR generation but does not yet support scatter/gather instructions. So we have to apply the following patch for now:

→ git diff
diff --git a/src/codegen/llvm/codegen_llvm_helper_visitor.cpp b/src/codegen/llvm/codegen_llvm_helper_visitor.cpp
index c34ae2c..d264832 100644
--- a/src/codegen/llvm/codegen_llvm_helper_visitor.cpp
+++ b/src/codegen/llvm/codegen_llvm_helper_visitor.cpp
@@ -522,9 +522,9 @@ void CodegenLLVMHelperVisitor::visit_nrn_state_block(ast::NrnStateBlock& node) {
         std::vector<std::string> double_variables{"v"};

         /// access node index and corresponding voltage
-        loop_index_statements.push_back(
-            visitor::create_statement("node_id = node_index[{}]"_format(INDUCTION_VAR)));
-        loop_body_statements.push_back(visitor::create_statement("v = voltage[node_id]"));
+        //loop_index_statements.push_back(
+        //    visitor::create_statement("node_id = node_index[{}]"_format(INDUCTION_VAR)));
+        //loop_body_statements.push_back(visitor::create_statement("v = voltage[node_id]"));

         /// read ion variables
         ion_read_statements(BlockType::State,

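The reason those node_index lines must be commented out: a direct access like voltage[id + lane] vectorizes to one contiguous load, whereas voltage[node_index[id + lane]] reads scattered addresses and would need a gather (e.g. LLVM's llvm.masked.gather intrinsic), which the PR does not emit yet. A small C illustration of the two access patterns (helper names are hypothetical):

```c
enum { WIDTH = 4 };  /* illustrative vector width */

/* Direct access: lanes are contiguous, so this maps to one vector load. */
void load_direct(const double *voltage, int id, double lanes[WIDTH]) {
    for (int lane = 0; lane < WIDTH; ++lane)
        lanes[lane] = voltage[id + lane];
}

/* Indirect access: lanes come from scattered addresses, so a vectorized
   version needs a gather instruction per vector. */
void load_indirect(const double *voltage, const int *node_index,
                   int id, double lanes[WIDTH]) {
    for (int lane = 0; lane < WIDTH; ++lane)
        lanes[lane] = voltage[node_index[id + lane]];
}
```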
Now we can generate vector IR as:

make -j && ./bin/nmodl hh.mod llvm --ir --vector-width 8 > tmp.ll
sed -n '/ModuleID/,$p' tmp.ll > hh.ll

Now hh.ll will have vector IR:

→ cat hh.ll
; ModuleID = 'hh'
source_filename = "hh"

%hh__instance_var__type = type { double*, double*, double*, double*, double*, double*, double*, i32*, double, double, double, i32, i32 }

.....

for.body:                                         ; preds = %for.cond
  %node_id = alloca i32, align 4
  %v = alloca double, align 8
  %6 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %7 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %6, i32 0, i32 1
  %8 = load i32, i32* %id, align 4
  %9 = sext i32 %8 to i64
  %10 = load double*, double** %7, align 8
  %11 = bitcast double* %10 to <8 x double>*
  %12 = getelementptr inbounds <8 x double>, <8 x double>* %11, i64 %9
  %13 = load <8 x double>, <8 x double>* %12, align 64
  %14 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %15 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %14, i32 0, i32 2
  %16 = load i32, i32* %id, align 4
  %17 = sext i32 %16 to i64
  %18 = load double*, double** %15, align 8
  %19 = bitcast double* %18 to <8 x double>*
  %20 = getelementptr inbounds <8 x double>, <8 x double>* %19, i64 %17
  %21 = load <8 x double>, <8 x double>* %20, align 64
  %22 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %23 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %22, i32 0, i32 0
  %24 = load i32, i32* %id, align 4
  %25 = sext i32 %24 to i64
  %26 = load double*, double** %23, align 8
  %27 = bitcast double* %26 to <8 x double>*
  %28 = getelementptr inbounds <8 x double>, <8 x double>* %27, i64 %25
  %29 = load <8 x double>, <8 x double>* %28, align 64
  %30 = fsub <8 x double> %29, %21
  %31 = fdiv <8 x double> %30, %13
  %32 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
  %33 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %32, i32 0, i32 2
  %34 = load i32, i32* %id, align 4
  %35 = sext i32 %34 to i64
  %36 = load double*, double** %33, align 8
  %37 = bitcast double* %36 to <8 x double>*
  %38 = getelementptr inbounds <8 x double>, <8 x double>* %37, i64 %35
  store <8 x double> %31, <8 x double>* %38, align 64
  br label %for.inc

And converting to assembly will give:

→ llc -o - -O3  -march=x86-64 -mcpu=skylake-avx512 hh.ll
....
_nrn_state_hh:                          ## @nrn_state_hh
	.cfi_startproc
## %bb.0:
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	pushq	%rbx
	andq	$-32, %rsp
	subq	$96, %rsp
	movq	%rsp, %rbx
	.cfi_offset %rbx, -24
	movq	%rdi, 24(%rbx)
	movl	$0, 20(%rbx)
	vmovaps	LCPI0_0(%rip), %ymm0            ## ymm0 = [0,1,2,3,4,5,6,7]
	vmovaps	%ymm0, 32(%rbx)
	vpbroadcastd	LCPI0_1(%rip), %ymm0    ## ymm0 = [8,8,8,8,8,8,8,8]
	movq	24(%rbx), %rax
	movl	20(%rbx), %ecx
	cmpl	92(%rax), %ecx
	jge	LBB0_3
	.p2align	4, 0x90
LBB0_2:                                 ## %for.body
                                        ## =>This Inner Loop Header: Depth=1
	movq	%rsp, %rax
	addq	$-16, %rax
	movq	%rax, %rsp
	movq	%rsp, %rax
	addq	$-16, %rax
	movq	%rax, %rsp
	movq	24(%rbx), %rax
	movslq	20(%rbx), %rcx
	movq	(%rax), %rdx
	movq	8(%rax), %rsi
	shlq	$6, %rcx
	movq	16(%rax), %rax
	vmovapd	(%rdx,%rcx), %zmm1
	vsubpd	(%rax,%rcx), %zmm1, %zmm1
	vdivpd	(%rsi,%rcx), %zmm1, %zmm1
	vmovapd	%zmm1, (%rax,%rcx)
	addl	$8, 20(%rbx)
	vpaddd	32(%rbx), %ymm0, %ymm1
	vmovdqa	%ymm1, 32(%rbx)
	movq	24(%rbx), %rax
	movl	20(%rbx), %ecx
	cmpl	92(%rax), %ecx
	jl	LBB0_2
LBB0_3:                                 ## %for.exit
	leaq	-8(%rbp), %rsp
	popq	%rbx
	popq	%rbp
	vzeroupper
	retq
	.cfi_endproc
                                        ## -- End function
.subsections_via_symbols
....

cc: @iomaganaris @castigli

pramodk merged commit 0740b98 into llvm on Mar 7, 2021
pramodk added a commit that referenced this pull request Mar 7, 2021
pramodk deleted the pramodk/llvm-instance-type branch March 7, 2021 20:09
pramodk restored the pramodk/llvm-instance-type branch March 7, 2021 20:10
pramodk commented Mar 7, 2021

@georgemitenkov: I have restored pramodk/llvm-instance-type but it is the same as the llvm branch now.

pramodk added a commit that referenced this pull request May 8, 2021
pramodk added a commit that referenced this pull request Mar 8, 2022
iomaganaris pushed a commit that referenced this pull request May 10, 2022
iomaganaris pushed a commit that referenced this pull request May 12, 2022
iomaganaris pushed a commit that referenced this pull request Sep 15, 2022
iomaganaris pushed a commit that referenced this pull request Sep 15, 2022
1uc deleted the pramodk/llvm-instance-type branch July 12, 2024 14:04