
InfluxDB doesn't start after the disk ran out of space, with error: Caused by: proto: illegal tag 0 #637

Closed
wladekb opened this issue Jun 12, 2014 · 4 comments

Comments

@wladekb

wladekb commented Jun 12, 2014

The partition where InfluxDB stores data ran out of space. After extending the partition, the following error occurs whenever I try to start InfluxDB.

[2014/06/12 11:27:16 UTC] [INFO] (main.setupLogging:53) Redirectoring logging to /opt/influxdb/shared/log.txt
[2014/06/12 11:27:16 UTC] [INFO] (main.main:121) Starting Influx Server 0.7.2 bound to 0.0.0.0...
[2014/06/12 11:27:16 UTC] [INFO] (server.NewServer:38) Opening database at /opt/influxdb/shared/data/db
[2014/06/12 11:28:53 UTC] [INFO] (main.setupLogging:53) Redirectoring logging to /opt/influxdb/shared/log.txt
[2014/06/12 11:28:53 UTC] [INFO] (main.main:121) Starting Influx Server 0.7.2 bound to 0.0.0.0...
[2014/06/12 11:28:53 UTC] [INFO] (server.NewServer:38) Opening database at /opt/influxdb/shared/data/db
[2014/06/12 11:31:16 UTC] [INFO] (main.setupLogging:53) Redirectoring logging to /opt/influxdb/shared/log.txt
[2014/06/12 11:31:16 UTC] [INFO] (main.main:121) Starting Influx Server 0.7.2 bound to 0.0.0.0...
[2014/06/12 11:31:16 UTC] [INFO] (server.NewServer:38) Opening database at /opt/influxdb/shared/data/db
[2014/06/12 11:31:16 UTC] [INFO] (wal.NewWAL:40) Opening wal in /opt/influxdb/shared/data/wal
[2014/06/12 11:31:16 UTC] [INFO] (wal.(*WAL).openLog:370) Opening log file /opt/influxdb/shared/data/wal/log.2980001
[2014/06/12 11:31:16 UTC] [INFO] (wal.(*WAL).openLog:384) Opening index file /opt/influxdb/shared/data/wal/index.2980001
[2014/06/12 11:31:16 UTC] [INFO] (api/http.(*HttpServer).EnableSsl:62) Ssl will be disabled since the ssl port or certificate path weren't set
[2014/06/12 11:31:16 UTC] [INFO] (coordinator.(*RaftServer).Serve:513) Initializing Raft HTTP server
[2014/06/12 11:31:16 UTC] [INFO] (coordinator.(*RaftServer).Serve:524) Raft Server Listening at 0.0.0.0:8090
[2014/06/12 11:31:16 UTC] [INFO] (coordinator.(*RaftServer).startRaft:353) Initializing Raft Server: http://graph-s3:8090
[2014/06/12 11:31:16 UTC] [INFO] (cluster.(*ClusterConfiguration).Recovery:574) Recovering the cluster configuration
[2014/06/12 11:31:16 UTC] [INFO] (cluster.(*ClusterConfiguration).Recovery:592) Checking whether e2f7636 is the local server e2f7636
[2014/06/12 11:31:16 UTC] [INFO] (datastore.(*LevelDbShardDatastore).GetOrCreateShard:120) DATASTORE: opening or creating shard /opt/influxdb/shared/data/db/shard_db/00003
[2014/06/12 11:31:16 UTC] [INFO] (datastore.(*LevelDbShardDatastore).GetOrCreateShard:120) DATASTORE: opening or creating shard /opt/influxdb/shared/data/db/shard_db/00002
[2014/06/12 11:31:16 UTC] [INFO] (datastore.(*LevelDbShardDatastore).GetOrCreateShard:120) DATASTORE: opening or creating shard /opt/influxdb/shared/data/db/shard_db/00001
[2014/06/12 11:31:16 UTC] [INFO] (coordinator.(*RaftServer).startRaft:374) Recovered from log
[2014/06/12 11:31:16 UTC] [INFO] (server.(*Server).ListenAndServe:88) Waiting for local server to be added
[2014/06/12 11:31:16 UTC] [INFO] (wal.(*WAL).SetServerId:109) Setting server id to 1 and recovering
[2014/06/12 11:31:16 UTC] [INFO] (wal.(*WAL).recover:441) Checking /opt/influxdb/shared/data/wal/log.2980001, last: 104409725, size: 108051359
[2014/06/12 11:31:16 UTC] [INFO] (wal.(*log).replayFromFileLocation:227) replaying from file location 104409725
[2014/06/12 11:31:16 UTC] [EROR] (wal.sendOrStop:291) Error in replay: Replay error. Stacktrace:
goroutine 16 [running]:
common.NewErrorWithStacktrace(0x7f3665760e80, 0xc2104852d0, 0xc2104852e0, 0x1, 0x1, ...)
        /home/vagrant/influxdb/src/common/error_with_stacktrace.go:22 +0x7f
wal.newErrorReplayRequest(0x7f3665760e80, 0xc2104852d0, 0x42f52)
        /home/vagrant/influxdb/src/wal/replay_request.go:19 +0xc0
wal.(*log).replayFromFileLocation(0xc2100d8360, 0xc2100c7170, 0xc2100f2600, 0xc2100eddc0, 0xc2100ef480)
        /home/vagrant/influxdb/src/wal/log.go:276 +0x54d
wal.func·002()
        /home/vagrant/influxdb/src/wal/log.go:167 +0x25b
created by wal.(*log).dupAndReplayFromOffset
        /home/vagrant/influxdb/src/wal/log.go:168 +0x16c

Caused by: proto: illegal tag 0
root@graph-s3:/opt/influxdb/shared# ps aux | grep influxdb
root     20314  0.0  0.0  10604   924 pts/0    S+   11:36   0:00 grep --color=auto influxdb
root@graph-s3:/opt/influxdb/shared# /etc/init.d/influxdb start
Starting the process influxdb [ OK ]
influxdb process was started [ OK ]
root@graph-s3:/opt/influxdb/shared# ps aux | grep influxdb
root     20412  0.0  0.0  10604   920 pts/0    S+   11:36   0:00 grep --color=auto influxdb
root@graph-s3:/opt/influxdb/shared# ../current/influxdb -v
InfluxDB v0.7.2 (git: 63c2be7) (leveldb: 1.15)
root@graph-s3:/opt/influxdb/shared# ls -l data/wal
total 105528
-rw-r--r-- 1 influxdb influxdb       289 Jun 11 18:02 bookmark
-rw-r--r-- 1 influxdb influxdb       234 Jun 11 18:02 index.2980001
-rw-r--r-- 1 influxdb influxdb 108051359 Jun 12 11:36 log.2980001
root@graph-s3:/opt/influxdb/shared# cat /etc/issue
Ubuntu 12.04.2 LTS \n \l
root@graph-s3:/opt/influxdb/shared# uname -a
Linux graph-s3 3.5.0-36-generic #57~precise1-Ubuntu SMP Thu Jun 20 18:21:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
root@graph-s3:/opt/influxdb/shared# mount
[...]
/dev/sda5 on /data type ext4 (rw)

I was running v0.6.5 when the disk ran out of space. I tried to start the old version with no luck, then upgraded to v0.7.2 (the latest as of now), still with no luck.

I'm going to zip the WAL directory and post a link here shortly. If you need any other files, please let me know.

@wladekb
Author

wladekb commented Jun 12, 2014

https://www.dropbox.com/s/wa19eycyp9fjkjb/issue-637-wal-dir.zip

@pauldix
Member

pauldix commented Jun 12, 2014

If you delete the WAL directory and start up, you should be good. That may cause a little data loss, depending on how many writes were buffered in the WAL before everything died; likely it's small.
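
For reference, on the setup shown above this is roughly the sequence (the wal.corrupt name is just an example; moving the directory aside instead of deleting it keeps a copy in case it's needed, and this assumes the init script supports stop):

/etc/init.d/influxdb stop
mv /opt/influxdb/shared/data/wal /opt/influxdb/shared/data/wal.corrupt
mkdir /opt/influxdb/shared/data/wal                 # precaution in case the server doesn't recreate it
chown influxdb:influxdb /opt/influxdb/shared/data/wal
/etc/init.d/influxdb start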


@wladekb
Author

wladekb commented Jun 13, 2014

The solution sorted the problem out.

It would be great if I didn't have to fix the problem manually like that, though. Do you plan any improvement for this? Please let me know if you need more info about the issue.

@pauldix
Member

pauldix commented Jun 13, 2014

I suppose we could try to catch that error. The bigger problem is what to do when that happens. You saw that because you had a corrupt WAL file resulting from filling your disk. The only real solution is to toss whatever is left in that WAL file.
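
For what it's worth, "illegal tag 0" is what the protobuf decoder reports when it hits zero bytes, which is what a partially written file on a full disk tends to leave behind. "Toss whatever is left" could look something like the sketch below: a generic Go illustration (not InfluxDB's actual WAL format or recovery code, and the record framing is assumed) of replaying length-prefixed records and truncating the log at the first entry that no longer parses, instead of refusing to start.

// Sketch only: not InfluxDB's actual WAL format or recovery code. It shows the
// general shape of "toss whatever is left": replay length-prefixed records and
// truncate the log at the first one that can no longer be read or decoded.
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

// replayAndTruncate feeds each intact record to apply, then truncates the file
// at the end of the last good record so the next startup sees a clean log.
func replayAndTruncate(path string, apply func(rec []byte) error) error {
	f, err := os.OpenFile(path, os.O_RDWR, 0644)
	if err != nil {
		return err
	}
	defer f.Close()

	var offset int64 // end of the last good record
	for {
		var size uint32
		if err := binary.Read(f, binary.LittleEndian, &size); err != nil {
			if err == io.EOF {
				return nil // clean end of log, nothing to discard
			}
			break // short or garbled length header
		}
		if size == 0 {
			break // zero-filled tail, the usual leftover of a full disk
		}
		rec := make([]byte, size)
		if _, err := io.ReadFull(f, rec); err != nil {
			break // truncated record body
		}
		if err := apply(rec); err != nil {
			break // undecodable record, e.g. "proto: illegal tag 0"
		}
		offset += int64(4) + int64(size)
	}
	fmt.Printf("discarding corrupt WAL tail after offset %d\n", offset)
	return f.Truncate(offset)
}

func main() {
	err := replayAndTruncate("log.2980001", func(rec []byte) error {
		return nil // a real replay would unmarshal the protobuf request here
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "recovery failed:", err)
	}
}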

