Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix panic: interface conversion: tsm1.Value is *tsm1.FloatValue, not *tsm1.StringValue #7533

Merged
merged 3 commits into from
Oct 28, 2016

Conversation

jwilder
Copy link
Contributor

@jwilder jwilder commented Oct 27, 2016

Required for all non-trivial PRs
  • Rebased/mergable
  • Tests pass

This fixes a panic when snapshotting the cache. If concurrent writes to the same shard occur, it's possible for different types to be added to the cache for the same series. The way the measurementFields map on the shard is updated is racy in this scenario which would normally prevent this from occurring. When this occurs, the snapshot compaction panics because it can't encode different types in the same series.

To prevent this, we have the cache return an error a different type is added to existing values in the cache.

This also fixes a race that was introduced in #7480 where the FileStore.Stats slice is read by the compactor without a lock when the FileStore is updating it.

Fixes #7498

cc @e-dard

@jwilder jwilder added this to the 1.1.0 milestone Oct 27, 2016
Copy link
Contributor

@e-dard e-dard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@@ -219,7 +240,10 @@ func (c *Cache) Write(key string, values []Value) error {
return ErrCacheMemoryExceeded
}

c.write(key, values)
if err := c.write(key, values); err != nil {
c.mu.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we update c.stats.WriteErr or maybe add a new stat for this unusual condition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Just discovered that the existing OK and Err stats are tracked, but not actually recorded.

@@ -261,7 +261,7 @@ func (f *FileStore) Add(files ...TSMFile) {
for _, file := range files {
atomic.AddInt64(&f.stats.DiskBytes, int64(file.Size()))
}
f.lastFileStats = f.lastFileStats[:0] // Will need to be recalculated on next call to Stats.
f.lastFileStats = nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, I guess we were trying to be too clever for our own good making that change 😄

The file store stats slice is re-used which causes the race below:

WARNING: DATA RACE
Write at 0x00c42007e140 by goroutine 43:
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Stats()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/file_store.go:511 +0x22e
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*DefaultPlanner).findGenerations()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:461 +0x6f
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*DefaultPlanner).PlanLevel()

Previous read at 0x00c42007e140 by goroutine 40:
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*DefaultPlanner).findGenerations()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:463 +0x13d
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*DefaultPlanner).PlanOptimize()
…*tsm1.StringValue

If concurrent writes to the same shard occur, it's possible for different types to
be added to the cache for the same series.  The way the measurementFields map on the
shard is updated is racy in this scenario which would normally prevent this from occurring.
When this occurs, the snapshot compaction panics because it can't encode different types
in the same series.

To prevent this, we have the cache return an error a different type is added to existing
values in the cache.

Fixes #7498
Adds a new dropped stat as well as fixes OK and error stats not
actually get collected and stored.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

panic: interface conversion: tsm1.Value is *tsm1.FloatValue, not *tsm1.StringValue
2 participants