devdocs/expr.md (+3 −3)
@@ -12,7 +12,7 @@
 Such functions require special options.
 see https://github.com/grafana/metrictank/issues/926#issuecomment-559596384
 
-## implement our copy-o-write approach when dealing with modifying series
+## implement our copy-on-write approach when dealing with modifying series
 
 See section 'Considerations around Series changes and reuse and why we chose copy-on-write' below.
@@ -27,7 +27,7 @@ example: an averageSeries() of 3 series:
 * will create an output series value.
 * it will use a new datapoints slice, retrieved from the pool, because the points will be different. It will also allocate a new meta section and tags map, because those too differ from the input series.
 * won't put the 3 inputs back in the pool or cache, because whoever allocated the input series is responsible for doing that. We should not add the same arrays to the pool multiple times.
-* It will however store the newly created series into the cache such that that during plan cleanup time, the series' datapoints slice will be moved back to the pool.
+* It will however store the newly created series into the cache such that during plan cleanup time, the series' datapoints slice will be moved back to the pool.
 
 # Considerations around Series changes and reuse and why we chose copy-on-write.
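As a concrete illustration of the bullets above, here is a minimal copy-on-write sketch. `Point`, `Series`, `pointPool` and `dataMap` are hypothetical stand-ins, not metrictank's actual types or API:

```go
// Minimal sketch of the copy-on-write pattern described above.
// All names here are illustrative, not metrictank's real code.
package sketch

import "sync"

type Point struct {
	Val float64
	Ts  uint32
}

type Series struct {
	Target     string
	Datapoints []Point
	Tags       map[string]string
}

var pointPool = sync.Pool{
	New: func() interface{} { return make([]Point, 0, 1024) },
}

// averageSeries reads its inputs but never mutates them (copy-on-write).
// dataMap collects every series this function allocates, so that plan
// cleanup can return each backing slice to the pool exactly once.
func averageSeries(in []Series, dataMap *[]Series) Series {
	points := pointPool.Get().([]Point)[:0] // fresh slice from the pool
	for i := range in[0].Datapoints {       // assumes inputs share interval and length
		sum := 0.0
		for _, s := range in {
			sum += s.Datapoints[i].Val
		}
		points = append(points, Point{Val: sum / float64(len(in)), Ts: in[0].Datapoints[i].Ts})
	}
	out := Series{
		Target:     "averageSeries(...)",
		Datapoints: points,                                       // new slice
		Tags:       map[string]string{"aggregatedBy": "average"}, // new map
	}
	// register only the output; the inputs belong to whoever allocated them
	*dataMap = append(*dataMap, out)
	return out
}
```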
@@ -72,7 +72,7 @@ for now we assume that multi-steps in a row is not that common, and COW seems mo
 
 This leaves the problem of effectively managing allocations and using a sync.Pool.
-Note that the expr library can be called by different clients. At this point only Metrictank uses it, but we intend this lirbrary to be easily embeddable in other programs.
+Note that the expr library can be called by different clients. At this point only Metrictank uses it, but we intend this library to be easily embeddable in other programs.
 It's up to the client to instantiate the pool, and set up the default allocation to return point slices of the desired point capacity.
 The client can then of course use this pool to store series, which it then feeds to expr.
 The expr library does the rest: it manages the series/point slices and gets new ones as a basis for the COW.
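What that client-side setup could look like, as a hedged sketch (`NewPointSlicePool` and the `Point` type are hypothetical helpers, not expr's real API):

```go
// Illustrative client-side setup: the embedding program owns the pool
// and picks the default capacity of the point slices it hands out.
package sketch

import "sync"

type Point struct {
	Val float64
	Ts  uint32
}

// NewPointSlicePool returns a pool whose default allocation is a point
// slice with whatever capacity the client expects a typical series to need.
func NewPointSlicePool(defaultCap int) *sync.Pool {
	return &sync.Pool{
		New: func() interface{} { return make([]Point, 0, defaultCap) },
	}
}

func example() {
	pool := NewPointSlicePool(2000) // e.g. sized for typical render windows
	buf := pool.Get().([]Point)[:0]
	// fill buf with fetched datapoints, build a series, hand it to expr;
	// expr (via plan cleanup) eventually returns slices to the pool.
	pool.Put(buf[:0])
}
```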
docs/render-path.md (+13 −7)
@@ -113,7 +113,7 @@ First, let's look at some definitions.
 Certain functions will return output series in an interval different from the input interval.
 For example summarize() and smartSummarize(). We refer to these as IA-functions below.
 In principle we can predict what the output interval will be during the plan phase, because we can parse the function arguments.
-However, for simplicty, we don't implement this and treat all IA functions as functions that may change the interval of series in unpredicatable ways.
+However, for simplicity, we don't implement this and treat all IA functions as functions that may change the interval of series in unpredictable ways.
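To illustrate why the output interval is predictable in principle, here is a sketch of plan-phase prediction for a summarize-style argument. This is purely hypothetical, since, as stated above, this is deliberately not implemented:

```go
// Purely illustrative: for a summarize(series, "1h") call, the output
// interval follows directly from the parsed argument, whatever the
// input interval is.
package sketch

import "time"

// predictedInterval returns the output interval (in seconds) an
// IA-function like summarize would produce, derived from its argument.
// (Graphite-style strings such as "1hour" would need their own parser;
// Go duration syntax keeps the sketch small.)
func predictedInterval(bucket string) (uint32, error) {
	d, err := time.ParseDuration(bucket)
	if err != nil {
		return 0, err
	}
	return uint32(d / time.Second), nil // e.g. "1h" -> 3600
}
```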
 
 ### Transparent aggregation
@@ -133,9 +133,16 @@ Generally, if series have different intervals, they can keep those and we return
 However, when data will be used together (e.g. aggregating multiple series together, or certain functions like divideSeries, asPercent, etc) they will need to have the same interval.
 An aggregation can be opaque or transparent as defined above.
 
-Pre-normalizing is when we can safely - during planning - set up normalization to happen right after fetching (or better: set up the fetch parameters such that normalizing is not needed) and wen we know the normalization won't affect anything else.
-This is the case when series go from fetching to transparent aggregation, possibly with some processing functions - except opaque aggregation(s) or IA-function(s) - in between, and
-with asPercent in a certain mode (where it has to normalize all inputs), but not with divideSeries where it applies the same divisor to multiple dividend inputs, for example.
+Pre-normalizing is when we can safely - during planning - set up normalization to happen right after fetching (or better: set up the fetch parameters such that normalizing is not needed) and when we know the normalization won't affect anything else.
+
+This is the case when series go from fetching to a processing function like:
+* a transparent aggregation
+* asPercent in a certain mode (where it has to normalize all inputs)
+
+possibly with some processing functions in between the fetching and the above function, except opaque aggregation(s) or IA-function(s).
+
+Some functions also have to normalize (some of) their inputs, yet cannot have their inputs pre-normalized. For example,
+divideSeries, because it applies the same divisor to multiple distinct dividend inputs (of possibly different intervals).
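Since normalization is central to this section, here is a hedged sketch of what it involves: consolidating series onto the LCM of their intervals. Types and names are illustrative, not metrictank's actual code:

```go
// Illustrative sketch of interval normalization: bring series onto a
// common interval (the LCM of their intervals) by consolidating groups
// of consecutive points.
package sketch

type Point struct {
	Val float64
	Ts  uint32
}

func gcd(a, b uint32) uint32 {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// lcm gives the smallest interval both series can be consolidated to.
func lcm(a, b uint32) uint32 { return a / gcd(a, b) * b }

// normalize consolidates points so the series' interval goes from
// `from` to `to` (with to a multiple of from), averaging each group.
func normalize(points []Point, from, to uint32) []Point {
	group := int(to / from)
	out := make([]Point, 0, len(points)/group+1)
	for i := 0; i < len(points); i += group {
		end := i + group
		if end > len(points) {
			end = len(points) // a real implementation pads with null points here
		}
		sum := 0.0
		for _, p := range points[i:end] {
			sum += p.Val
		}
		out = append(out, Point{Val: sum / float64(end-i), Ts: points[end-1].Ts})
	}
	return out
}
```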
 
 For example if we have these schemas:
 ```
@@ -152,13 +159,12 @@ Likewise, if the query is `groupByNode(group(A,B), 2, callback='sum')` we cannot
 
 Benefits of this optimization:
 1) less work spent consolidating at runtime, less data to fetch
-2) it assures data will be fetched in a pre-canonical way. If we don't set up normalization for fetching, data may not be pre-canonical, such that
+2) it assures data will be fetched in a pre-canonical way. If we don't set up normalization for fetching, data may not be pre-canonical, which means we may have to add null points to normalize it to canonical data, lowering the accuracy of the first or last point.
 3) pre-normalized data reduces a request's chance of breaching max-points-per-req-soft and thus makes it less likely that other data that should be high-resolution gets fetched in a coarser way.
 when it eventually needs to be normalized at runtime, points at the beginning or end of the series may be less accurate.
 
 Downsides of this optimization:
-1) if you already have the raw data cached, and the rollup data is not cached yet, it may result in a slower query. But this is an edge case
-2) uses slightly more of the chunk cache.
+1) if you already have the raw data cached, and the rollup data is not cached yet, it may result in a slower query, and you'd use slightly more chunk cache after the fetch. But this is an edge case.
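To make the accuracy caveat in benefit 2 concrete, a small worked example with invented values:

```go
// Invented numbers, for illustration only: why padding a pre-canonical
// series with null points lowers the accuracy of the first (or last) point.
package sketch

// avg averages the real points a consolidation bucket contains; a padded
// null contributes nothing, so short buckets average fewer real values.
func avg(vals []float64) float64 {
	sum := 0.0
	for _, v := range vals {
		sum += v
	}
	return sum / float64(len(vals))
}

// Canonical fetch at a 10s interval, consolidated to 30s: the first
// bucket holds three real points  -> avg([]float64{1, 2, 3}) == 2.
// Pre-canonical fetch missing that bucket's first point: only two real
// points remain -> avg([]float64{2, 3}) == 2.5, a shifted first point.
```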
0 commit comments