
InfluxDB doesn't start after the disk ran out of space, with error: Caused by: proto: illegal tag 0 #637

Closed
wladekb opened this issue Jun 12, 2014 · 4 comments

Comments

@wladekb

wladekb commented Jun 12, 2014

The partition where InfluxDB stores data ran out of space. After extending the partition, the following error occurs whenever I try to start InfluxDB.

[2014/06/12 11:27:16 UTC] [INFO] (main.setupLogging:53) Redirectoring logging to /opt/influxdb/shared/log.txt
[2014/06/12 11:27:16 UTC] [INFO] (main.main:121) Starting Influx Server 0.7.2 bound to 0.0.0.0...
[2014/06/12 11:27:16 UTC] [INFO] (server.NewServer:38) Opening database at /opt/influxdb/shared/data/db
[2014/06/12 11:28:53 UTC] [INFO] (main.setupLogging:53) Redirectoring logging to /opt/influxdb/shared/log.txt
[2014/06/12 11:28:53 UTC] [INFO] (main.main:121) Starting Influx Server 0.7.2 bound to 0.0.0.0...
[2014/06/12 11:28:53 UTC] [INFO] (server.NewServer:38) Opening database at /opt/influxdb/shared/data/db
[2014/06/12 11:31:16 UTC] [INFO] (main.setupLogging:53) Redirectoring logging to /opt/influxdb/shared/log.txt
[2014/06/12 11:31:16 UTC] [INFO] (main.main:121) Starting Influx Server 0.7.2 bound to 0.0.0.0...
[2014/06/12 11:31:16 UTC] [INFO] (server.NewServer:38) Opening database at /opt/influxdb/shared/data/db
[2014/06/12 11:31:16 UTC] [INFO] (wal.NewWAL:40) Opening wal in /opt/influxdb/shared/data/wal
[2014/06/12 11:31:16 UTC] [INFO] (wal.(*WAL).openLog:370) Opening log file /opt/influxdb/shared/data/wal/log.2980001
[2014/06/12 11:31:16 UTC] [INFO] (wal.(*WAL).openLog:384) Opening index file /opt/influxdb/shared/data/wal/index.2980001
[2014/06/12 11:31:16 UTC] [INFO] (api/http.(*HttpServer).EnableSsl:62) Ssl will be disabled since the ssl port or certificate path weren't set
[2014/06/12 11:31:16 UTC] [INFO] (coordinator.(*RaftServer).Serve:513) Initializing Raft HTTP server
[2014/06/12 11:31:16 UTC] [INFO] (coordinator.(*RaftServer).Serve:524) Raft Server Listening at 0.0.0.0:8090
[2014/06/12 11:31:16 UTC] [INFO] (coordinator.(*RaftServer).startRaft:353) Initializing Raft Server: http://graph-s3:8090
[2014/06/12 11:31:16 UTC] [INFO] (cluster.(*ClusterConfiguration).Recovery:574) Recovering the cluster configuration
[2014/06/12 11:31:16 UTC] [INFO] (cluster.(*ClusterConfiguration).Recovery:592) Checking whether e2f7636 is the local server e2f7636
[2014/06/12 11:31:16 UTC] [INFO] (datastore.(*LevelDbShardDatastore).GetOrCreateShard:120) DATASTORE: opening or creating shard /opt/influxdb/shared/data/db/shard_db/00003
[2014/06/12 11:31:16 UTC] [INFO] (datastore.(*LevelDbShardDatastore).GetOrCreateShard:120) DATASTORE: opening or creating shard /opt/influxdb/shared/data/db/shard_db/00002
[2014/06/12 11:31:16 UTC] [INFO] (datastore.(*LevelDbShardDatastore).GetOrCreateShard:120) DATASTORE: opening or creating shard /opt/influxdb/shared/data/db/shard_db/00001
[2014/06/12 11:31:16 UTC] [INFO] (coordinator.(*RaftServer).startRaft:374) Recovered from log
[2014/06/12 11:31:16 UTC] [INFO] (server.(*Server).ListenAndServe:88) Waiting for local server to be added
[2014/06/12 11:31:16 UTC] [INFO] (wal.(*WAL).SetServerId:109) Setting server id to 1 and recovering
[2014/06/12 11:31:16 UTC] [INFO] (wal.(*WAL).recover:441) Checking /opt/influxdb/shared/data/wal/log.2980001, last: 104409725, size: 108051359
[2014/06/12 11:31:16 UTC] [INFO] (wal.(*log).replayFromFileLocation:227) replaying from file location 104409725
[2014/06/12 11:31:16 UTC] [EROR] (wal.sendOrStop:291) Error in replay: Replay error. Stacktrace:
goroutine 16 [running]:
common.NewErrorWithStacktrace(0x7f3665760e80, 0xc2104852d0, 0xc2104852e0, 0x1, 0x1, ...)
        /home/vagrant/influxdb/src/common/error_with_stacktrace.go:22 +0x7f
wal.newErrorReplayRequest(0x7f3665760e80, 0xc2104852d0, 0x42f52)
        /home/vagrant/influxdb/src/wal/replay_request.go:19 +0xc0
wal.(*log).replayFromFileLocation(0xc2100d8360, 0xc2100c7170, 0xc2100f2600, 0xc2100eddc0, 0xc2100ef480)
        /home/vagrant/influxdb/src/wal/log.go:276 +0x54d
wal.func·002()
        /home/vagrant/influxdb/src/wal/log.go:167 +0x25b
created by wal.(*log).dupAndReplayFromOffset
        /home/vagrant/influxdb/src/wal/log.go:168 +0x16c

Caused by: proto: illegal tag 0
root@graph-s3:/opt/influxdb/shared# ps aux | grep influxdb
root     20314  0.0  0.0  10604   924 pts/0    S+   11:36   0:00 grep --color=auto influxdb
root@graph-s3:/opt/influxdb/shared# /etc/init.d/influxdb start
Starting the process influxdb [ OK ]
influxdb process was started [ OK ]
root@graph-s3:/opt/influxdb/shared# ps aux | grep influxdb
root     20412  0.0  0.0  10604   920 pts/0    S+   11:36   0:00 grep --color=auto influxdb
root@graph-s3:/opt/influxdb/shared# ../current/influxdb -v
InfluxDB v0.7.2 (git: 63c2be7) (leveldb: 1.15)
root@graph-s3:/opt/influxdb/shared# ls -l data/wal
total 105528
-rw-r--r-- 1 influxdb influxdb       289 Jun 11 18:02 bookmark
-rw-r--r-- 1 influxdb influxdb       234 Jun 11 18:02 index.2980001
-rw-r--r-- 1 influxdb influxdb 108051359 Jun 12 11:36 log.2980001
root@graph-s3:/opt/influxdb/shared# cat /etc/issue
Ubuntu 12.04.2 LTS \n \l
root@graph-s3:/opt/influxdb/shared# uname -a
Linux graph-s3 3.5.0-36-generic #57~precise1-Ubuntu SMP Thu Jun 20 18:21:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
root@graph-s3:/opt/influxdb/shared# mount
[...]
/dev/sda5 on /data type ext4 (rw)

I was running v0.6.5 when the disk ran out of space. I tried to start the old version with no luck, then upgraded to v0.7.2 (the latest as of now), still with no luck.

I'm going to zip the WAL directory and post a link here shortly. If you need any other files, please let me know.

@wladekb
Author

wladekb commented Jun 12, 2014

https://www.dropbox.com/s/wa19eycyp9fjkjb/issue-637-wal-dir.zip

@pauldix
Member

pauldix commented Jun 12, 2014

If you delete the WAL directory and start up, you should be good. That may cause a little data loss, depending on how many writes were buffered in the WAL before everything died; likely it's small.
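
For reference, on the setup shown above this is roughly the sequence (the wal.corrupt name is just an example; moving the directory aside instead of deleting it keeps a copy in case it's needed, and this assumes the init script supports stop):

/etc/init.d/influxdb stop
mv /opt/influxdb/shared/data/wal /opt/influxdb/shared/data/wal.corrupt
mkdir /opt/influxdb/shared/data/wal                 # precaution in case the server doesn't recreate it
chown influxdb:influxdb /opt/influxdb/shared/data/wal
/etc/init.d/influxdb start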


@wladekb
Author

wladekb commented Jun 13, 2014

The solution sorted the problem out.

It would be great if I didn't have to fix the problem manually like that, though. Do you plan any improvement for this? Please let me know if you need more info about the issue.

@pauldix
Member

pauldix commented Jun 13, 2014

I suppose we could try to catch that error. The bigger problem is what to do when that happens. You saw that because you had a corrupt WAL file resulting from filling your disk. The only real solution is to toss whatever is left in that WAL file.
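
For what it's worth, "illegal tag 0" is what the protobuf decoder reports when it hits zero bytes, which is what a partially written file on a full disk tends to leave behind. "Toss whatever is left" could look something like the sketch below: a generic Go illustration (not InfluxDB's actual WAL format or recovery code, and the record framing is assumed) of replaying length-prefixed records and truncating the log at the first entry that no longer parses, instead of refusing to start.

// Sketch only: not InfluxDB's actual WAL format or recovery code. It shows the
// general shape of "toss whatever is left": replay length-prefixed records and
// truncate the log at the first one that can no longer be read or decoded.
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

// replayAndTruncate feeds each intact record to apply, then truncates the file
// at the end of the last good record so the next startup sees a clean log.
func replayAndTruncate(path string, apply func(rec []byte) error) error {
	f, err := os.OpenFile(path, os.O_RDWR, 0644)
	if err != nil {
		return err
	}
	defer f.Close()

	var offset int64 // end of the last good record
	for {
		var size uint32
		if err := binary.Read(f, binary.LittleEndian, &size); err != nil {
			if err == io.EOF {
				return nil // clean end of log, nothing to discard
			}
			break // short or garbled length header
		}
		if size == 0 {
			break // zero-filled tail, the usual leftover of a full disk
		}
		rec := make([]byte, size)
		if _, err := io.ReadFull(f, rec); err != nil {
			break // truncated record body
		}
		if err := apply(rec); err != nil {
			break // undecodable record, e.g. "proto: illegal tag 0"
		}
		offset += int64(4) + int64(size)
	}
	fmt.Printf("discarding corrupt WAL tail after offset %d\n", offset)
	return f.Truncate(offset)
}

func main() {
	err := replayAndTruncate("log.2980001", func(rec []byte) error {
		return nil // a real replay would unmarshal the protobuf request here
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "recovery failed:", err)
	}
}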

