Generate compute kernels with correct instance variable #531
Conversation
Thanks for doing this, I will address the TODOs later today! The PR looks good. The current error is due to the i32* and f32/64* types that did not exist before. I will fix that.
Force-pushed from 8b1d186 to b9b2061.
@georgemitenkov: Quickly checking a simple mod file:

NEURON {
SUFFIX hh
USEION na READ ena WRITE ina
USEION k READ ek WRITE ik
RANGE minf, mtau
}
STATE {
m
}
ASSIGNED {
v (mV)
ena (mV)
ek (mV)
ina (mA/cm2)
ik (mA/cm2)
il (mA/cm2)
minf
mtau (ms)
}
BREAKPOINT {
SOLVE states METHOD cnexp
}
DERIVATIVE states {
m' = (minf-m)/mtau
}

Generating serial IR and compiling with …
Storing generated output in …
The type needs to be changed?
@pramodk Yes, this is not fully supported yet. The first allocas need to be vector, but that's a one-liner :) I will add it together with more advanced (indirect) vectorisation support (working on that now).
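For reference, a minimal hypothetical sketch of that one-liner: the scalar temporaries in the loop body would be widened to vectors (here width 4; the wrapper function is only for illustration and is not the PR's actual code):

define void @allocas_sketch() {
  ; widened temporaries, replacing "alloca i32" / "alloca double"
  %node_id = alloca <4 x i32>, align 16
  %v = alloca <4 x double>, align 32
  ret void
}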
Basically, we have this:

VOID nrn_state_hh(INSTANCE_STRUCT *mech){
INTEGER id
for(id = 0; id<mech->node_count; id = id+8) {
// This section is needed for indirect indexing, so also NOT supported
INTEGER node_id, ena_id, ek_id
DOUBLE v
node_id = mech->node_index[id]
ena_id = mech->ion_ena_index[id]
ek_id = mech->ion_ek_index[id]
// Indirect indexing, NOT supported
v = mech->voltage[node_id]
mech->ena[id] = mech->ion_ena[ena_id]
mech->ek[id] = mech->ion_ek[ek_id]
// Direct indexing, supported
mech->m[id] = mech->m[id]+(1.0-exp(mech->dt*((((-1.0)))/mech->mtau[id])))*(-(((mech->minf[id]))/mech->mtau[id])/((((-1.0)))/mech->mtau[id])-mech->m[id])
mech->h[id] = mech->h[id]+(1.0-exp(mech->dt*((((-1.0)))/mech->htau[id])))*(-(((mech->hinf[id]))/mech->htau[id])/((((-1.0)))/mech->htau[id])-mech->h[id])
mech->n[id] = mech->n[id]+(1.0-exp(mech->dt*((((-1.0)))/mech->ntau[id])))*(-(((mech->ninf[id]))/mech->ntau[id])/((((-1.0)))/mech->ntau[id])-mech->n[id])
}
}

And I am working on the first 2 unsupported cases 👍
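As a side note, lowering the unsupported indirect reads above (e.g. v = mech->voltage[node_id]) to SIMD would typically use a gather. A minimal hypothetical LLVM IR sketch with llvm.masked.gather (width 4, all-true mask; the function and parameter names are illustrative, not the PR's implementation):

define <4 x double> @gather_voltage(double* %voltage, i32* %node_index, i32 %id) {
  ; load four consecutive node ids: node_index[id .. id+3]
  %1 = sext i32 %id to i64
  %2 = getelementptr inbounds i32, i32* %node_index, i64 %1
  %3 = bitcast i32* %2 to <4 x i32>*
  %ids = load <4 x i32>, <4 x i32>* %3, align 4
  ; build four lane pointers &voltage[node_id] and gather through them
  %idx = sext <4 x i32> %ids to <4 x i64>
  %ptrs = getelementptr inbounds double, double* %voltage, <4 x i64> %idx
  %v = call <4 x double> @llvm.masked.gather.v4f64.v4p0f64(<4 x double*> %ptrs, i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x double> undef)
  ret <4 x double> %v
}

declare <4 x double> @llvm.masked.gather.v4f64.v4p0f64(<4 x double*>, i32 immarg, <4 x i1>, <4 x double>)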
Ah yes! I remember now 👍 Do you plan to fix this in this PR or in a separate PR? If you think we should merge the working serial version in this PR then that's fine! And just so you know, in case it's useful: I was quickly testing by avoiding indirect indexing, i.e. with the following diff:

→ git diff
diff --git a/src/codegen/llvm/codegen_llvm_helper_visitor.cpp b/src/codegen/llvm/codegen_llvm_helper_visitor.cpp
index cb92ade..ffffe8b 100644
--- a/src/codegen/llvm/codegen_llvm_helper_visitor.cpp
+++ b/src/codegen/llvm/codegen_llvm_helper_visitor.cpp
@@ -240,6 +240,7 @@ void CodegenLLVMHelperVisitor::ion_read_statements(BlockType type,
std::vector<std::string>& double_variables,
ast::StatementVector& index_statements,
ast::StatementVector& body_statements) {
+ return;
/// create read ion and corresponding index statements
auto create_read_statements = [&](std::pair<std::string, std::string> variable_names) {
// variable in current mechanism instance
@@ -305,6 +306,7 @@ void CodegenLLVMHelperVisitor::ion_write_statements(BlockType type,
std::vector<std::string>& double_variables,
ast::StatementVector& index_statements,
ast::StatementVector& body_statements) {
+ return;
/// create write ion and corresponding index statements
auto create_write_statements = [&](std::string ion_varname, std::string op, std::string rhs) {
// index for writing ion variable
@@ -505,9 +507,9 @@ void CodegenLLVMHelperVisitor::visit_nrn_state_block(ast::NrnStateBlock& node) {
std::vector<std::string> double_variables{"v"};
/// access node index and corresponding voltage
- loop_index_statements.push_back(
- visitor::create_statement("node_id = node_index[{}]"_format(INDUCTION_VAR)));
- loop_body_statements.push_back(visitor::create_statement("v = voltage[node_id]"));
+ //loop_index_statements.push_back(
+ // visitor::create_statement("node_id = node_index[{}]"_format(INDUCTION_VAR)));
+ //loop_body_statements.push_back(visitor::create_statement("v = voltage[node_id]"));
/// read ion variables
ion_read_statements(BlockType::State,

and then the generated kernel is:

VOID nrn_state_hh(INSTANCE_STRUCT *mech){
INTEGER id
for(id = 0; id<mech->node_count; id = id+8) {
INTEGER node_id
DOUBLE v
mech->m[id] = mech->m[id]+(1.0-exp(mech->dt*((((-1.0)))/mech->mtau[id])))*(-(((mech->minf[id]))/mech->mtau[id])/((((-1.0)))/mech->mtau[id])-mech->m[id])
}
}

and this gives: …
I think merging the serial version should be OK? Just keep this branch for the vector code.
Hm, that is weird, since all types actually match. I will look at this.
because …
Just to be sure: if I merge this branch, everything will be in …
Yes, just fixed that! The problem was that …
I meant to merge this branch but not delete it? But I think it's fine to create a new one. It might even be easier, so that the work is split into smaller chunks/PRs. I will push a fix, and then it is ready for merging!
Btw, here is the vectorised code in NMODL, LLVM and assembly, obtained with:

./bin/nmodl hh.mod llvm --ir --vector-width 4
llc <llvm_file> -o -

NMODL kernel:

VOID nrn_state_hh(INSTANCE_STRUCT *mech){
INTEGER id
for(id = 0; id<mech->node_count; id = id+4) {
INTEGER node_id
DOUBLE v
mech->m[id] = mech->m[id]+(1.0-exp(mech->dt*((((-1.0)))/mech->mtau[id])))*(-(((mech->minf[id]))/mech->mtau[id])/((((-1.0)))/mech->mtau[id])-mech->m[id])
mech->h[id] = mech->h[id]+(1.0-exp(mech->dt*((((-1.0)))/mech->htau[id])))*(-(((mech->hinf[id]))/mech->htau[id])/((((-1.0)))/mech->htau[id])-mech->h[id])
mech->n[id] = mech->n[id]+(1.0-exp(mech->dt*((((-1.0)))/mech->ntau[id])))*(-(((mech->ninf[id]))/mech->ntau[id])/((((-1.0)))/mech->ntau[id])-mech->n[id])
}
}

Generated LLVM (pretty long):

; ModuleID = 'hh'
source_filename = "hh"
%hh__instance_var__type = type { double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, i32*, i32*, i32*, i32*, i32*, i32*, double*, i32*, double, double, double, i32, i32 }
define void @nrn_state_hh(%hh__instance_var__type* %mech1) {
%mech = alloca %hh__instance_var__type*, align 8
store %hh__instance_var__type* %mech1, %hh__instance_var__type** %mech, align 8
%id = alloca i32, align 4
store i32 0, i32* %id, align 4
%__vec_id = alloca <4 x i32>, align 16
store <4 x i32> <i32 0, i32 1, i32 2, i32 3>, <4 x i32>* %__vec_id, align 16
br label %for.cond
for.cond: ; preds = %for.inc, %0
%1 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%2 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %1, i32 0, i32 43
%3 = load i32, i32* %2, align 4
%4 = load i32, i32* %id, align 4
%5 = icmp slt i32 %4, %3
br i1 %5, label %for.body, label %for.exit
for.body: ; preds = %for.cond
%node_id = alloca i32, align 4
%v = alloca double, align 8
%6 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%7 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %6, i32 0, i32 13
%8 = load i32, i32* %id, align 4
%9 = sext i32 %8 to i64
%10 = load double*, double** %7, align 8
%11 = bitcast double* %10 to <4 x double>*
%12 = getelementptr inbounds <4 x double>, <4 x double>* %11, i64 %9
%13 = load <4 x double>, <4 x double>* %12, align 32
%14 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%15 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %14, i32 0, i32 10
%16 = load i32, i32* %id, align 4
%17 = sext i32 %16 to i64
%18 = load double*, double** %15, align 8
%19 = bitcast double* %18 to <4 x double>*
%20 = getelementptr inbounds <4 x double>, <4 x double>* %19, i64 %17
%21 = load <4 x double>, <4 x double>* %20, align 32
%22 = fdiv <4 x double> <double -1.000000e+00, double -1.000000e+00, double -1.000000e+00, double -1.000000e+00>, %21
%23 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%24 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %23, i32 0, i32 10
%25 = load i32, i32* %id, align 4
%26 = sext i32 %25 to i64
%27 = load double*, double** %24, align 8
%28 = bitcast double* %27 to <4 x double>*
%29 = getelementptr inbounds <4 x double>, <4 x double>* %28, i64 %26
%30 = load <4 x double>, <4 x double>* %29, align 32
%31 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%32 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %31, i32 0, i32 7
%33 = load i32, i32* %id, align 4
%34 = sext i32 %33 to i64
%35 = load double*, double** %32, align 8
%36 = bitcast double* %35 to <4 x double>*
%37 = getelementptr inbounds <4 x double>, <4 x double>* %36, i64 %34
%38 = load <4 x double>, <4 x double>* %37, align 32
%39 = fdiv <4 x double> %38, %30
%40 = fneg <4 x double> %39
%41 = fdiv <4 x double> %40, %22
%42 = fsub <4 x double> %41, %13
%43 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%44 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %43, i32 0, i32 10
%45 = load i32, i32* %id, align 4
%46 = sext i32 %45 to i64
%47 = load double*, double** %44, align 8
%48 = bitcast double* %47 to <4 x double>*
%49 = getelementptr inbounds <4 x double>, <4 x double>* %48, i64 %46
%50 = load <4 x double>, <4 x double>* %49, align 32
%51 = fdiv <4 x double> <double -1.000000e+00, double -1.000000e+00, double -1.000000e+00, double -1.000000e+00>, %50
%52 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%53 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %52, i32 0, i32 40
%54 = load double, double* %53, align 8
%.splatinsert = insertelement <4 x double> undef, double %54, i32 0
%.splat = shufflevector <4 x double> %.splatinsert, <4 x double> undef, <4 x i32> zeroinitializer
%55 = fmul <4 x double> %.splat, %51
%56 = call <4 x double> @llvm.exp.v4f64(<4 x double> %55)
%57 = fsub <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>, %56
%58 = fmul <4 x double> %57, %42
%59 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%60 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %59, i32 0, i32 13
%61 = load i32, i32* %id, align 4
%62 = sext i32 %61 to i64
%63 = load double*, double** %60, align 8
%64 = bitcast double* %63 to <4 x double>*
%65 = getelementptr inbounds <4 x double>, <4 x double>* %64, i64 %62
%66 = load <4 x double>, <4 x double>* %65, align 32
%67 = fadd <4 x double> %66, %58
%68 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%69 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %68, i32 0, i32 13
%70 = load i32, i32* %id, align 4
%71 = sext i32 %70 to i64
%72 = load double*, double** %69, align 8
%73 = bitcast double* %72 to <4 x double>*
%74 = getelementptr inbounds <4 x double>, <4 x double>* %73, i64 %71
store <4 x double> %67, <4 x double>* %74, align 32
%75 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%76 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %75, i32 0, i32 14
%77 = load i32, i32* %id, align 4
%78 = sext i32 %77 to i64
%79 = load double*, double** %76, align 8
%80 = bitcast double* %79 to <4 x double>*
%81 = getelementptr inbounds <4 x double>, <4 x double>* %80, i64 %78
%82 = load <4 x double>, <4 x double>* %81, align 32
%83 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%84 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %83, i32 0, i32 11
%85 = load i32, i32* %id, align 4
%86 = sext i32 %85 to i64
%87 = load double*, double** %84, align 8
%88 = bitcast double* %87 to <4 x double>*
%89 = getelementptr inbounds <4 x double>, <4 x double>* %88, i64 %86
%90 = load <4 x double>, <4 x double>* %89, align 32
%91 = fdiv <4 x double> <double -1.000000e+00, double -1.000000e+00, double -1.000000e+00, double -1.000000e+00>, %90
%92 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%93 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %92, i32 0, i32 11
%94 = load i32, i32* %id, align 4
%95 = sext i32 %94 to i64
%96 = load double*, double** %93, align 8
%97 = bitcast double* %96 to <4 x double>*
%98 = getelementptr inbounds <4 x double>, <4 x double>* %97, i64 %95
%99 = load <4 x double>, <4 x double>* %98, align 32
%100 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%101 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %100, i32 0, i32 8
%102 = load i32, i32* %id, align 4
%103 = sext i32 %102 to i64
%104 = load double*, double** %101, align 8
%105 = bitcast double* %104 to <4 x double>*
%106 = getelementptr inbounds <4 x double>, <4 x double>* %105, i64 %103
%107 = load <4 x double>, <4 x double>* %106, align 32
%108 = fdiv <4 x double> %107, %99
%109 = fneg <4 x double> %108
%110 = fdiv <4 x double> %109, %91
%111 = fsub <4 x double> %110, %82
%112 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%113 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %112, i32 0, i32 11
%114 = load i32, i32* %id, align 4
%115 = sext i32 %114 to i64
%116 = load double*, double** %113, align 8
%117 = bitcast double* %116 to <4 x double>*
%118 = getelementptr inbounds <4 x double>, <4 x double>* %117, i64 %115
%119 = load <4 x double>, <4 x double>* %118, align 32
%120 = fdiv <4 x double> <double -1.000000e+00, double -1.000000e+00, double -1.000000e+00, double -1.000000e+00>, %119
%121 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%122 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %121, i32 0, i32 40
%123 = load double, double* %122, align 8
%.splatinsert2 = insertelement <4 x double> undef, double %123, i32 0
%.splat3 = shufflevector <4 x double> %.splatinsert2, <4 x double> undef, <4 x i32> zeroinitializer
%124 = fmul <4 x double> %.splat3, %120
%125 = call <4 x double> @llvm.exp.v4f64(<4 x double> %124)
%126 = fsub <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>, %125
%127 = fmul <4 x double> %126, %111
%128 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%129 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %128, i32 0, i32 14
%130 = load i32, i32* %id, align 4
%131 = sext i32 %130 to i64
%132 = load double*, double** %129, align 8
%133 = bitcast double* %132 to <4 x double>*
%134 = getelementptr inbounds <4 x double>, <4 x double>* %133, i64 %131
%135 = load <4 x double>, <4 x double>* %134, align 32
%136 = fadd <4 x double> %135, %127
%137 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%138 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %137, i32 0, i32 14
%139 = load i32, i32* %id, align 4
%140 = sext i32 %139 to i64
%141 = load double*, double** %138, align 8
%142 = bitcast double* %141 to <4 x double>*
%143 = getelementptr inbounds <4 x double>, <4 x double>* %142, i64 %140
store <4 x double> %136, <4 x double>* %143, align 32
%144 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%145 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %144, i32 0, i32 15
%146 = load i32, i32* %id, align 4
%147 = sext i32 %146 to i64
%148 = load double*, double** %145, align 8
%149 = bitcast double* %148 to <4 x double>*
%150 = getelementptr inbounds <4 x double>, <4 x double>* %149, i64 %147
%151 = load <4 x double>, <4 x double>* %150, align 32
%152 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%153 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %152, i32 0, i32 12
%154 = load i32, i32* %id, align 4
%155 = sext i32 %154 to i64
%156 = load double*, double** %153, align 8
%157 = bitcast double* %156 to <4 x double>*
%158 = getelementptr inbounds <4 x double>, <4 x double>* %157, i64 %155
%159 = load <4 x double>, <4 x double>* %158, align 32
%160 = fdiv <4 x double> <double -1.000000e+00, double -1.000000e+00, double -1.000000e+00, double -1.000000e+00>, %159
%161 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%162 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %161, i32 0, i32 12
%163 = load i32, i32* %id, align 4
%164 = sext i32 %163 to i64
%165 = load double*, double** %162, align 8
%166 = bitcast double* %165 to <4 x double>*
%167 = getelementptr inbounds <4 x double>, <4 x double>* %166, i64 %164
%168 = load <4 x double>, <4 x double>* %167, align 32
%169 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%170 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %169, i32 0, i32 9
%171 = load i32, i32* %id, align 4
%172 = sext i32 %171 to i64
%173 = load double*, double** %170, align 8
%174 = bitcast double* %173 to <4 x double>*
%175 = getelementptr inbounds <4 x double>, <4 x double>* %174, i64 %172
%176 = load <4 x double>, <4 x double>* %175, align 32
%177 = fdiv <4 x double> %176, %168
%178 = fneg <4 x double> %177
%179 = fdiv <4 x double> %178, %160
%180 = fsub <4 x double> %179, %151
%181 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%182 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %181, i32 0, i32 12
%183 = load i32, i32* %id, align 4
%184 = sext i32 %183 to i64
%185 = load double*, double** %182, align 8
%186 = bitcast double* %185 to <4 x double>*
%187 = getelementptr inbounds <4 x double>, <4 x double>* %186, i64 %184
%188 = load <4 x double>, <4 x double>* %187, align 32
%189 = fdiv <4 x double> <double -1.000000e+00, double -1.000000e+00, double -1.000000e+00, double -1.000000e+00>, %188
%190 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%191 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %190, i32 0, i32 40
%192 = load double, double* %191, align 8
%.splatinsert4 = insertelement <4 x double> undef, double %192, i32 0
%.splat5 = shufflevector <4 x double> %.splatinsert4, <4 x double> undef, <4 x i32> zeroinitializer
%193 = fmul <4 x double> %.splat5, %189
%194 = call <4 x double> @llvm.exp.v4f64(<4 x double> %193)
%195 = fsub <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>, %194
%196 = fmul <4 x double> %195, %180
%197 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%198 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %197, i32 0, i32 15
%199 = load i32, i32* %id, align 4
%200 = sext i32 %199 to i64
%201 = load double*, double** %198, align 8
%202 = bitcast double* %201 to <4 x double>*
%203 = getelementptr inbounds <4 x double>, <4 x double>* %202, i64 %200
%204 = load <4 x double>, <4 x double>* %203, align 32
%205 = fadd <4 x double> %204, %196
%206 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%207 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %206, i32 0, i32 15
%208 = load i32, i32* %id, align 4
%209 = sext i32 %208 to i64
%210 = load double*, double** %207, align 8
%211 = bitcast double* %210 to <4 x double>*
%212 = getelementptr inbounds <4 x double>, <4 x double>* %211, i64 %209
store <4 x double> %205, <4 x double>* %212, align 32
br label %for.inc
for.inc: ; preds = %for.body
%213 = load i32, i32* %id, align 4
%214 = add i32 %213, 4
store i32 %214, i32* %id, align 4
%215 = load <4 x i32>, <4 x i32>* %__vec_id, align 16
%216 = add <4 x i32> %215, <i32 4, i32 4, i32 4, i32 4>
store <4 x i32> %216, <4 x i32>* %__vec_id, align 16
br label %for.cond
for.exit: ; preds = %for.cond
ret void
}
; Function Attrs: nounwind readnone speculatable willreturn
declare <4 x double> @llvm.exp.v4f64(<4 x double>) #0
attributes #0 = { nounwind readnone speculatable willreturn }

And assembly:

.section __TEXT,__text,regular,pure_instructions
.build_version macos, 10, 14
.section __TEXT,__literal16,16byte_literals
.p2align 4 ## -- Begin function nrn_state_hh
LCPI0_0:
.long 0 ## 0x0
.long 1 ## 0x1
.long 2 ## 0x2
.long 3 ## 0x3
LCPI0_1:
.quad 0xbff0000000000000 ## double -1
.quad 0xbff0000000000000 ## double -1
LCPI0_2:
.quad 0x8000000000000000 ## double -0
.quad 0x8000000000000000 ## double -0
LCPI0_3:
.quad 0x3ff0000000000000 ## double 1
.quad 0x3ff0000000000000 ## double 1
LCPI0_4:
.long 4 ## 0x4
.long 4 ## 0x4
.long 4 ## 0x4
.long 4 ## 0x4
.section __TEXT,__text,regular,pure_instructions
.globl _nrn_state_hh
.p2align 4, 0x90
_nrn_state_hh: ## @nrn_state_hh
.cfi_startproc
## %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
pushq %r14
pushq %rbx
subq $160, %rsp
.cfi_offset %rbx, -32
.cfi_offset %r14, -24
movq %rdi, -72(%rbp)
movl $0, -20(%rbp)
movdqa LCPI0_0(%rip), %xmm0 ## xmm0 = [0,1,2,3]
.p2align 4, 0x90
LBB0_1: ## %for.cond
## =>This Inner Loop Header: Depth=1
movdqa %xmm0, -176(%rbp)
movq -72(%rbp), %rax
movl -20(%rbp), %ecx
cmpl 340(%rax), %ecx
jge LBB0_3
## %bb.2: ## %for.body
## in Loop: Header=BB0_1 Depth=1
movq %rsp, %rax
addq $-16, %rax
movq %rax, %rsp
movq %rsp, %rax
addq $-16, %rax
movq %rax, %rsp
movq -72(%rbp), %rax
movslq -20(%rbp), %rbx
movq 104(%rax), %r14
shlq $5, %rbx
movapd (%r14,%rbx), %xmm4
movapd %xmm4, -144(%rbp) ## 16-byte Spill
movapd 16(%r14,%rbx), %xmm3
movapd %xmm3, -160(%rbp) ## 16-byte Spill
movq 56(%rax), %rcx
movq 80(%rax), %rdx
movapd (%rdx,%rbx), %xmm0
movapd 16(%rdx,%rbx), %xmm1
movapd LCPI0_1(%rip), %xmm2 ## xmm2 = [-1.0E+0,-1.0E+0]
movapd %xmm2, %xmm5
divpd %xmm0, %xmm5
movapd %xmm5, %xmm7
divpd %xmm1, %xmm2
movapd (%rcx,%rbx), %xmm6
movapd 16(%rcx,%rbx), %xmm5
divpd %xmm1, %xmm5
divpd %xmm0, %xmm6
movapd LCPI0_2(%rip), %xmm0 ## xmm0 = [-0.0E+0,-0.0E+0]
xorpd %xmm0, %xmm6
xorpd %xmm0, %xmm5
divpd %xmm2, %xmm5
divpd %xmm7, %xmm6
subpd %xmm4, %xmm6
movapd %xmm6, -112(%rbp) ## 16-byte Spill
subpd %xmm3, %xmm5
movapd %xmm5, -128(%rbp) ## 16-byte Spill
movsd 320(%rax), %xmm0 ## xmm0 = mem[0],zero
unpcklpd %xmm0, %xmm0 ## xmm0 = xmm0[0,0]
mulpd %xmm0, %xmm7
movapd %xmm7, -96(%rbp) ## 16-byte Spill
mulpd %xmm2, %xmm0
movapd %xmm0, -48(%rbp) ## 16-byte Spill
callq _exp
movapd %xmm0, -64(%rbp) ## 16-byte Spill
movaps -48(%rbp), %xmm0 ## 16-byte Reload
movhlps %xmm0, %xmm0 ## xmm0 = xmm0[1,1]
callq _exp
movaps -64(%rbp), %xmm1 ## 16-byte Reload
movlhps %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[0]
movaps %xmm1, -64(%rbp) ## 16-byte Spill
movaps -96(%rbp), %xmm0 ## 16-byte Reload
callq _exp
movaps %xmm0, -48(%rbp) ## 16-byte Spill
movapd -96(%rbp), %xmm0 ## 16-byte Reload
unpckhpd %xmm0, %xmm0 ## xmm0 = xmm0[1,1]
callq _exp
movapd -48(%rbp), %xmm2 ## 16-byte Reload
unpcklpd %xmm0, %xmm2 ## xmm2 = xmm2[0],xmm0[0]
movapd LCPI0_3(%rip), %xmm1 ## xmm1 = [1.0E+0,1.0E+0]
movapd %xmm1, %xmm0
subpd %xmm2, %xmm0
mulpd -112(%rbp), %xmm0 ## 16-byte Folded Reload
subpd -64(%rbp), %xmm1 ## 16-byte Folded Reload
mulpd -128(%rbp), %xmm1 ## 16-byte Folded Reload
addpd -144(%rbp), %xmm0 ## 16-byte Folded Reload
addpd -160(%rbp), %xmm1 ## 16-byte Folded Reload
movapd %xmm1, 16(%r14,%rbx)
movapd %xmm0, (%r14,%rbx)
movq -72(%rbp), %rax
movslq -20(%rbp), %rbx
movq 112(%rax), %r14
shlq $5, %rbx
movapd (%r14,%rbx), %xmm4
movapd %xmm4, -160(%rbp) ## 16-byte Spill
movapd 16(%r14,%rbx), %xmm5
movapd %xmm5, -144(%rbp) ## 16-byte Spill
movq 64(%rax), %rcx
movq 88(%rax), %rdx
movapd (%rdx,%rbx), %xmm0
movapd 16(%rdx,%rbx), %xmm1
movapd LCPI0_1(%rip), %xmm2 ## xmm2 = [-1.0E+0,-1.0E+0]
movapd %xmm2, %xmm3
divpd %xmm1, %xmm3
movapd %xmm3, %xmm7
divpd %xmm0, %xmm2
movapd 16(%rcx,%rbx), %xmm6
movapd (%rcx,%rbx), %xmm3
divpd %xmm0, %xmm3
divpd %xmm1, %xmm6
movapd LCPI0_2(%rip), %xmm0 ## xmm0 = [-0.0E+0,-0.0E+0]
xorpd %xmm0, %xmm6
xorpd %xmm0, %xmm3
divpd %xmm2, %xmm3
divpd %xmm7, %xmm6
subpd %xmm5, %xmm6
movapd %xmm6, -112(%rbp) ## 16-byte Spill
subpd %xmm4, %xmm3
movapd %xmm3, -128(%rbp) ## 16-byte Spill
movsd 320(%rax), %xmm0 ## xmm0 = mem[0],zero
unpcklpd %xmm0, %xmm0 ## xmm0 = xmm0[0,0]
mulpd %xmm0, %xmm7
movapd %xmm7, -96(%rbp) ## 16-byte Spill
mulpd %xmm2, %xmm0
movapd %xmm0, -48(%rbp) ## 16-byte Spill
callq _exp
movapd %xmm0, -64(%rbp) ## 16-byte Spill
movaps -48(%rbp), %xmm0 ## 16-byte Reload
movhlps %xmm0, %xmm0 ## xmm0 = xmm0[1,1]
callq _exp
movaps -64(%rbp), %xmm1 ## 16-byte Reload
movlhps %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[0]
movaps %xmm1, -64(%rbp) ## 16-byte Spill
movaps -96(%rbp), %xmm0 ## 16-byte Reload
callq _exp
movaps %xmm0, -48(%rbp) ## 16-byte Spill
movapd -96(%rbp), %xmm0 ## 16-byte Reload
unpckhpd %xmm0, %xmm0 ## xmm0 = xmm0[1,1]
callq _exp
movapd -48(%rbp), %xmm3 ## 16-byte Reload
unpcklpd %xmm0, %xmm3 ## xmm3 = xmm3[0],xmm0[0]
movapd LCPI0_3(%rip), %xmm1 ## xmm1 = [1.0E+0,1.0E+0]
movapd %xmm1, %xmm0
subpd %xmm3, %xmm0
mulpd -112(%rbp), %xmm0 ## 16-byte Folded Reload
subpd -64(%rbp), %xmm1 ## 16-byte Folded Reload
mulpd -128(%rbp), %xmm1 ## 16-byte Folded Reload
addpd -144(%rbp), %xmm0 ## 16-byte Folded Reload
addpd -160(%rbp), %xmm1 ## 16-byte Folded Reload
movapd %xmm1, (%r14,%rbx)
movapd %xmm0, 16(%r14,%rbx)
movq -72(%rbp), %rax
movslq -20(%rbp), %rbx
movq 120(%rax), %r14
shlq $5, %rbx
movapd 16(%r14,%rbx), %xmm3
movapd %xmm3, -160(%rbp) ## 16-byte Spill
movapd (%r14,%rbx), %xmm4
movapd %xmm4, -144(%rbp) ## 16-byte Spill
movq 72(%rax), %rcx
movq 96(%rax), %rdx
movapd 16(%rdx,%rbx), %xmm0
movapd (%rdx,%rbx), %xmm1
movapd LCPI0_1(%rip), %xmm2 ## xmm2 = [-1.0E+0,-1.0E+0]
movapd %xmm2, %xmm5
divpd %xmm1, %xmm5
movapd %xmm5, %xmm7
divpd %xmm0, %xmm2
movapd (%rcx,%rbx), %xmm6
movapd 16(%rcx,%rbx), %xmm5
divpd %xmm0, %xmm5
divpd %xmm1, %xmm6
movapd LCPI0_2(%rip), %xmm0 ## xmm0 = [-0.0E+0,-0.0E+0]
xorpd %xmm0, %xmm6
xorpd %xmm0, %xmm5
divpd %xmm2, %xmm5
divpd %xmm7, %xmm6
subpd %xmm4, %xmm6
movapd %xmm6, -112(%rbp) ## 16-byte Spill
subpd %xmm3, %xmm5
movapd %xmm5, -128(%rbp) ## 16-byte Spill
movsd 320(%rax), %xmm0 ## xmm0 = mem[0],zero
unpcklpd %xmm0, %xmm0 ## xmm0 = xmm0[0,0]
mulpd %xmm0, %xmm7
movapd %xmm7, -96(%rbp) ## 16-byte Spill
mulpd %xmm2, %xmm0
movapd %xmm0, -48(%rbp) ## 16-byte Spill
callq _exp
movapd %xmm0, -64(%rbp) ## 16-byte Spill
movaps -48(%rbp), %xmm0 ## 16-byte Reload
movhlps %xmm0, %xmm0 ## xmm0 = xmm0[1,1]
callq _exp
movaps -64(%rbp), %xmm1 ## 16-byte Reload
movlhps %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[0]
movaps %xmm1, -64(%rbp) ## 16-byte Spill
movaps -96(%rbp), %xmm0 ## 16-byte Reload
callq _exp
movaps %xmm0, -48(%rbp) ## 16-byte Spill
movapd -96(%rbp), %xmm0 ## 16-byte Reload
unpckhpd %xmm0, %xmm0 ## xmm0 = xmm0[1,1]
callq _exp
movapd -48(%rbp), %xmm2 ## 16-byte Reload
unpcklpd %xmm0, %xmm2 ## xmm2 = xmm2[0],xmm0[0]
movapd LCPI0_3(%rip), %xmm1 ## xmm1 = [1.0E+0,1.0E+0]
movapd %xmm1, %xmm0
subpd %xmm2, %xmm0
mulpd -112(%rbp), %xmm0 ## 16-byte Folded Reload
subpd -64(%rbp), %xmm1 ## 16-byte Folded Reload
mulpd -128(%rbp), %xmm1 ## 16-byte Folded Reload
addpd -144(%rbp), %xmm0 ## 16-byte Folded Reload
addpd -160(%rbp), %xmm1 ## 16-byte Folded Reload
movapd %xmm1, 16(%r14,%rbx)
movapd %xmm0, (%r14,%rbx)
addl $4, -20(%rbp)
movdqa -176(%rbp), %xmm0
paddd LCPI0_4(%rip), %xmm0
jmp LBB0_1
LBB0_3: ## %for.exit
leaq -16(%rbp), %rsp
popq %rbx
popq %r14
popq %rbp
retq
.cfi_endproc
## -- End function
.subsections_via_symbols
Without looking into the details, this is looking reasonable - if I use …
Force-pushed from c7c83fd to cc44ae5.
- instance structure now contains all global variables
- instance structure now contains index variables for ions
- nrn_state kernel now has all variables converted to instance
- InstanceVarHelper added to query a variable and its location
* Support for codegen variable with type
* Add nmodl_to_json helper in main.cpp
* Added --vector-width CLI option
* Add instance struct argument to nrn_state_hh
* Add comments as TODOs to support LLVM IR generation

Note that this commit and the next commit (Part II) are required to make LLVM IR code generation work. Vector IR generation is working except for indirect indexes. See comment in #531.
- remove undefined visit_codegen_instance_var
- Improved member creation for instance struct
- Instance struct type generation for kernel arguments
- Proper integration of instance struct
- Added scalar code generation for the kernel
- Removed instance test since it is not created explicitly anymore
- Fixed ordering for precision and width in LLVM Visitor
- Added vector induction variable
- Vectorised code for compute with direct loads fully functional
- Instance naming fixed
- (LLVM IR) Fixed compute vector code generation types
- refactoring: improve conversion of double to int for the loop expressions
Force-pushed from cc44ae5 to 770ac80.
I have squashed this PR into two separate commits that show 1) changes in the LLVM helper visitor and 2) changes in LLVM IR generation. I will rebase & merge both commits to keep the history.

How to test this PR / commit?

Take a sample mod file:

NEURON {
SUFFIX hh
NONSPECIFIC_CURRENT il
RANGE minf, mtau, gl, el
}
STATE {
m
}
ASSIGNED {
v (mV)
minf
mtau (ms)
}
BREAKPOINT {
SOLVE states METHOD cnexp
il = gl*(v - el)
}
DERIVATIVE states {
m = (minf-m)/mtau
: m' = (minf-m)/mtau : you can uncomment this line if you need an ODE solution involving exp
}

Run serial code generation:

./bin/nmodl hh.mod llvm --ir > tmp.ll
sed -n '/ModuleID/,$p' tmp.ll > hh.ll # only extract LLVM IR module from stdout

And now convert to assembly using llc:

→ llc -o - -O3 -march=x86-64 -mcpu=skylake-avx512 hh.ll
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 10, 15
.globl _nrn_state_hh ## -- Begin function nrn_state_hh
.p2align 4, 0x90
_nrn_state_hh: ## @nrn_state_hh
.cfi_startproc
## %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $16, %rsp
movq %rdi, -16(%rbp)
movl $0, -4(%rbp)
movq -16(%rbp), %rax
movl -4(%rbp), %ecx
cmpl 92(%rax), %ecx
jge LBB0_3
.p2align 4, 0x90
LBB0_2: ## %for.body
## =>This Inner Loop Header: Depth=1
movq %rsp, %rax
leaq -16(%rax), %rsp
movq %rsp, %rcx
leaq -16(%rcx), %rsp
movq -16(%rbp), %rdx
movslq -4(%rbp), %rsi
movq 56(%rdx), %rdx
movslq (%rdx,%rsi,4), %rdx
movl %edx, -16(%rax)
movq -16(%rbp), %rax
movq 48(%rax), %rax
vmovsd (%rax,%rdx,8), %xmm0 ## xmm0 = mem[0],zero
vmovsd %xmm0, -16(%rcx)
movq -16(%rbp), %rax
movslq -4(%rbp), %rcx
movq (%rax), %rdx
movq 8(%rax), %rsi
movq 16(%rax), %rax
vmovsd (%rdx,%rcx,8), %xmm0 ## xmm0 = mem[0],zero
vsubsd (%rax,%rcx,8), %xmm0, %xmm0
vdivsd (%rsi,%rcx,8), %xmm0, %xmm0
vmovsd %xmm0, (%rax,%rcx,8)
incl -4(%rbp)
movq -16(%rbp), %rax
movl -4(%rbp), %ecx
cmpl 92(%rax), %ecx
jl LBB0_2
LBB0_3: ## %for.exit
movq %rbp, %rsp
popq %rbp
retq
.cfi_endproc
## -- End function
.subsections_via_symbols

Testing Vector / SIMD IR

The current PR supports vector IR generation but doesn't have support for scatter/gather instructions (yet), so we have to apply the following patch for now:

→ git diff
diff --git a/src/codegen/llvm/codegen_llvm_helper_visitor.cpp b/src/codegen/llvm/codegen_llvm_helper_visitor.cpp
index c34ae2c..d264832 100644
--- a/src/codegen/llvm/codegen_llvm_helper_visitor.cpp
+++ b/src/codegen/llvm/codegen_llvm_helper_visitor.cpp
@@ -522,9 +522,9 @@ void CodegenLLVMHelperVisitor::visit_nrn_state_block(ast::NrnStateBlock& node) {
std::vector<std::string> double_variables{"v"};
/// access node index and corresponding voltage
- loop_index_statements.push_back(
- visitor::create_statement("node_id = node_index[{}]"_format(INDUCTION_VAR)));
- loop_body_statements.push_back(visitor::create_statement("v = voltage[node_id]"));
+ //loop_index_statements.push_back(
+ // visitor::create_statement("node_id = node_index[{}]"_format(INDUCTION_VAR)));
+ //loop_body_statements.push_back(visitor::create_statement("v = voltage[node_id]"));
/// read ion variables
ion_read_statements(BlockType::State,

Now we can generate the vector IR as:

make -j && ./bin/nmodl hh.mod llvm --ir --vector-width 8 > tmp.ll
sed -n '/ModuleID/,$p' tmp.ll > hh.ll

Now:

→ cat hh.ll
; ModuleID = 'hh'
source_filename = "hh"
%hh__instance_var__type = type { double*, double*, double*, double*, double*, double*, double*, i32*, double, double, double, i32, i32 }
.....
for.body: ; preds = %for.cond
%node_id = alloca i32, align 4
%v = alloca double, align 8
%6 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%7 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %6, i32 0, i32 1
%8 = load i32, i32* %id, align 4
%9 = sext i32 %8 to i64
%10 = load double*, double** %7, align 8
%11 = bitcast double* %10 to <8 x double>*
%12 = getelementptr inbounds <8 x double>, <8 x double>* %11, i64 %9
%13 = load <8 x double>, <8 x double>* %12, align 64
%14 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%15 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %14, i32 0, i32 2
%16 = load i32, i32* %id, align 4
%17 = sext i32 %16 to i64
%18 = load double*, double** %15, align 8
%19 = bitcast double* %18 to <8 x double>*
%20 = getelementptr inbounds <8 x double>, <8 x double>* %19, i64 %17
%21 = load <8 x double>, <8 x double>* %20, align 64
%22 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%23 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %22, i32 0, i32 0
%24 = load i32, i32* %id, align 4
%25 = sext i32 %24 to i64
%26 = load double*, double** %23, align 8
%27 = bitcast double* %26 to <8 x double>*
%28 = getelementptr inbounds <8 x double>, <8 x double>* %27, i64 %25
%29 = load <8 x double>, <8 x double>* %28, align 64
%30 = fsub <8 x double> %29, %21
%31 = fdiv <8 x double> %30, %13
%32 = load %hh__instance_var__type*, %hh__instance_var__type** %mech, align 8
%33 = getelementptr inbounds %hh__instance_var__type, %hh__instance_var__type* %32, i32 0, i32 2
%34 = load i32, i32* %id, align 4
%35 = sext i32 %34 to i64
%36 = load double*, double** %33, align 8
%37 = bitcast double* %36 to <8 x double>*
%38 = getelementptr inbounds <8 x double>, <8 x double>* %37, i64 %35
store <8 x double> %31, <8 x double>* %38, align 64
br label %for.inc

And converting to assembly will give:

→ llc -o - -O3 -march=x86-64 -mcpu=skylake-avx512 hh.ll
....
_nrn_state_hh: ## @nrn_state_hh
.cfi_startproc
## %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
pushq %rbx
andq $-32, %rsp
subq $96, %rsp
movq %rsp, %rbx
.cfi_offset %rbx, -24
movq %rdi, 24(%rbx)
movl $0, 20(%rbx)
vmovaps LCPI0_0(%rip), %ymm0 ## ymm0 = [0,1,2,3,4,5,6,7]
vmovaps %ymm0, 32(%rbx)
vpbroadcastd LCPI0_1(%rip), %ymm0 ## ymm0 = [8,8,8,8,8,8,8,8]
movq 24(%rbx), %rax
movl 20(%rbx), %ecx
cmpl 92(%rax), %ecx
jge LBB0_3
.p2align 4, 0x90
LBB0_2: ## %for.body
## =>This Inner Loop Header: Depth=1
movq %rsp, %rax
addq $-16, %rax
movq %rax, %rsp
movq %rsp, %rax
addq $-16, %rax
movq %rax, %rsp
movq 24(%rbx), %rax
movslq 20(%rbx), %rcx
movq (%rax), %rdx
movq 8(%rax), %rsi
shlq $6, %rcx
movq 16(%rax), %rax
vmovapd (%rdx,%rcx), %zmm1
vsubpd (%rax,%rcx), %zmm1, %zmm1
vdivpd (%rsi,%rcx), %zmm1, %zmm1
vmovapd %zmm1, (%rax,%rcx)
addl $8, 20(%rbx)
vpaddd 32(%rbx), %ymm0, %ymm1
vmovdqa %ymm1, 32(%rbx)
movq 24(%rbx), %rax
movl 20(%rbx), %ecx
cmpl 92(%rax), %ecx
jl LBB0_2
LBB0_3: ## %for.exit
leaq -8(%rbp), %rsp
popq %rbx
popq %rbp
vzeroupper
retq
.cfi_endproc
## -- End function
.subsections_via_symbols
....
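For completeness, the missing scatter/gather support mentioned above would map onto LLVM's masked intrinsics. The write direction (e.g. storing back to mech->ion_ina[ina_id]) could look roughly like this hypothetical sketch (width 4, all-true mask; the names ion_ina/ina_ids are illustrative, not the PR's code):

define void @scatter_ion(double* %ion_ina, <4 x i32> %ina_ids, <4 x double> %vals) {
  ; build four lane pointers &ion_ina[ina_id] and scatter the values
  %idx = sext <4 x i32> %ina_ids to <4 x i64>
  %ptrs = getelementptr inbounds double, double* %ion_ina, <4 x i64> %idx
  call void @llvm.masked.scatter.v4f64.v4p0f64(<4 x double> %vals, <4 x double*> %ptrs, i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)
  ret void
}

declare void @llvm.masked.scatter.v4f64.v4p0f64(<4 x double>, <4 x double*>, i32 immarg, <4 x i1>)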
@georgemitenkov: I have restored …
Using: …
We now get: …
Note that running this branch results in: …
This is because various fixes in LLVM IR generation are pending.
@georgemitenkov: Could you look at the various TODOs mentioned in this 2da2bc4 commit? I will do some improvements to this PR, but you can get an idea of the current structure and see if you would like to change other things.