0.10 beta - influx_tsm conversion from bz1 #5406

Closed
discoduck2x opened this issue Jan 20, 2016 · 11 comments

@discoduck2x

~200 GB of database across 20 shards; the conversion has been running for 5 hours now without high CPU usage. Is there any way to see progress? And more importantly, what happens if I Ctrl-C? Corruption?

@toddboom
Contributor

@discoduck2x Has the conversion tool output any information yet? It should give you some updates as it progresses through the shards.

@discoduck2x
Author

@toddboom, thanks for the reply.
I had two databases. One was only ~300 MB, and its shards were converted in 4-5 minutes with corresponding output from the tool; however, there has been no output since then.
I can see that the folders in /databasename/default/shardnumber.tsm are growing, slowly, so I guess it is progressing, just very, very slowly. I used the -parallel parameter, but I don't know what that implies, since it doesn't seem to do anything more than occupy one CPU core (out of 8) at ~10-20%. Memory is at 64% of 24 GB, and the disk system is internal 15k RPM SAS in RAID 1+0. The system doesn't seem bogged down.

I guess I will let it simmer for another 5-10 hours and see what happens... A warning should go out to anyone trying to convert 200+ GB of data, though.

@toddboom
Contributor

Approximately how big are the new folders? Depending on the data, I'd expect the final result to be in the 10-20GB range.

Also, we did most of our testing on SSD, so it's possible that it's I/O bound on the disk array, but I'd expect you to still get decent performance. Keep me posted as it progresses.

@discoduck2x
Author

200-300 MB per folder. Will let you know more tomorrow. Thanks.

@joelegasse
Contributor

I did a little digging, and I think I might have figured out why your conversion is crawling along. You enabled the -parallel flag, which sounds great, but the current influx binaries are built with Go 1.4.x, which defaults GOMAXPROCS to 1.

This means that the conversion tool creates 20 goroutines (to handle your 20 shards), but only one goroutine will be running at any given time. That would explain your low CPU usage. If you enabled backups (rather, if you didn't disable them), you should be able to kill the process, move the directories back, and start again with GOMAXPROCS set higher. You might still have an issue in that those 20 goroutines will all be fighting over the GOMAXPROCS active slots, all consuming memory in the process.

I'm going to play around with the parallelism in the conversion tool, as well as add some extra logging for the case where a TSM file hits the max size (although it doesn't sound like that's what you're seeing here).

@toddboom How does having influx_tsm limit itself to GOMAXPROCS simultaneous shards when -parallel is enabled sound?
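
A minimal sketch of that idea in Go, capping concurrent shard conversions with a buffered-channel semaphore sized to GOMAXPROCS; convertShard and the shard list here are hypothetical stand-ins, not the actual influx_tsm code:

    package main

    import (
        "log"
        "runtime"
        "sync"
    )

    // convertShard stands in for the per-shard bz1 -> tsm conversion work.
    func convertShard(path string) error {
        // ... convert one shard ...
        return nil
    }

    func main() {
        // Hypothetical shard directories, analogous to the 20 shards above.
        shards := []string{"507", "509", "510"}

        // Buffered channel used as a counting semaphore: at most GOMAXPROCS
        // conversions run at the same time, even when every shard gets its
        // own goroutine.
        sem := make(chan struct{}, runtime.GOMAXPROCS(0))

        var wg sync.WaitGroup
        for _, s := range shards {
            wg.Add(1)
            go func(shard string) {
                defer wg.Done()
                sem <- struct{}{}        // acquire a slot
                defer func() { <-sem }() // release it when done
                if err := convertShard(shard); err != nil {
                    log.Printf("shard %s: %v", shard, err)
                }
            }(s)
        }
        wg.Wait()
    }

The semaphore only bounds how many conversions are in flight at once; each shard still gets its own goroutine, so memory for idle goroutines stays small.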

@discoduck2x
Author

@joelegasse @toddboom

I canceled it this morning; it had done 2 shards by then, and one of them took 9 h. I reverted to the backup and am running now without -parallel, but performance looks the same: low CPU usage. Do I have to set GOMAXPROCS somehow, as an env variable or somewhere else? (Sorry, I don't know enough about this.)
It's working on a 12 GB bz1 shard file now, and currently the .tsm folder is at 900 MB...

I'm going to add iostat to the host to see how queue lengths etc. look, but it feels like something else.
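
A small sketch of what that setting looks like from the Go side, assuming standard runtime behavior: the runtime reads the GOMAXPROCS environment variable at process startup, and runtime.GOMAXPROCS(0) reports the current value without changing it.

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        // With Go 1.4 binaries the default is 1; launching a process as e.g.
        //   GOMAXPROCS=8 influx_tsm -parallel <data dir>
        // (an illustrative invocation, not a command taken from this thread)
        // raises the limit.
        fmt.Println("GOMAXPROCS =", runtime.GOMAXPROCS(0))
        fmt.Println("NumCPU     =", runtime.NumCPU())
    }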

@discoduck2x
Author

before: -rw-r--r-- 1 root root 12G Jan 21 07:37 510
Conversion of /var/lib/influxdb/data/mydb/default/510 successful (2h32m10.702839909s)
after: 1.5G ./default/510
So, since I have about 18 shards left, ranging between 9 and 15 GB in size, I'm estimating this to be done in about 36 hours at the earliest. Any help before then will be appreciated.
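
A rough sanity check of that estimate, assuming the per-GB rate of the shard above stays constant (an extrapolation, not a figure reported by the tool):

    12 GB shard                -> 2h32m, i.e. roughly 12-13 minutes per GB
    18 shards of 9-15 GB each  -> roughly 160-270 GB still to convert
    160-270 GB at ~12.7 min/GB -> roughly 34-57 hours of serial conversion

So ~36 hours is indeed the optimistic end of the range.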

@discoduck2x
Author

@toddboom
@joelegasse

My test environment, which has been running the 0.10 beta, started to behave strangely today, so I tried to revert this production host: I removed the 0.10 beta and reinstalled the stable 0.9.6.1, but I cannot get it to load my database.bak stuff. I am copying the files to
[data]
dir = "/var/lib/influxdb/data"
which is specified in /etc/influxdb/influxdb.conf.

When it has started up and I go into the CLI, "show databases" only shows _internal. If I do "use mydb" I CAN then do "show series", but if I do "show measurements" it says "database not found"...

Please help!

@joelegasse
Contributor

@discoduck2x Did you remove the .bak extension from the backed-up folders? Did you make sure influxd was stopped when copying the files around?

The .bak folders are just copies of the directories and files that were there, and they shouldn't have been modified in any way. Removing the partially-converted folders and copying the .bak folders back to their original names should be sufficient to restore the previous state.

@toddboom Can you think of any other steps that might need to be taken when downgrading and restoring the backed-up data?

@discoduck2x
Author

@joelegasse, yes.

I tried again to "upgrade", this time using the nightly build, which seems a bit faster: all shards finished in about 9 hours.

However, after I start influx the log gives me some info that it's reading this and that, but I still can't see the database in the CLI or access it via Grafana. Here's the log output as I start it:

2016/01/22 23:34:22 InfluxDB starting, version 0.10.0-nightly-72c6a51, branch master, commit 72c6a51, built 2016-01-11T05:00:46+0000
2016/01/22 23:34:22 Go version go1.5.2, GOMAXPROCS set to 8
2016/01/22 23:34:23 Using configuration at: /etc/influxdb/influxdb.conf
[metastore] 2016/01/22 23:34:23 Using data dir: /var/lib/influxdb/meta
[metastore] 2016/01/22 23:34:23 Skipping cluster join: already member of cluster: nodeId=1 raftEnabled=true peers=[localhost:8088]
[metastore] 2016/01/22 23:34:23 Node at localhost:8088 [Follower]
[metastore] 2016/01/22 23:34:24 Node at localhost:8088 [Leader]. peers=[localhost:8088]
[metastore] 2016/01/22 23:34:24 spun up monitoring for 1
[store] 2016/01/22 23:34:24 Using data dir: /var/lib/influxdb/data
[metastore] 2016/01/22 23:34:24 Updated node id=1 hostname=localhost:8088
[tsm1wal] 2016/01/22 23:34:24 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:24 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/507
[filestore]2016/01/22 23:34:24 /var/lib/influxdb/data/mydb/default/507/000000001-000000001.tsm (#0) opened in 1.417411ms
[tsm1wal] 2016/01/22 23:34:24 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:24 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/509
[filestore]2016/01/22 23:34:24 /var/lib/influxdb/data/mydb/default/509/000000001-000000001.tsm (#0) opened in 1.457071ms
[tsm1wal] 2016/01/22 23:34:24 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:24 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/510
[filestore]2016/01/22 23:34:24 /var/lib/influxdb/data/mydb/default/510/000000001-000000001.tsm (#0) opened in 1.600667ms
[tsm1wal] 2016/01/22 23:34:24 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:24 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/511
[filestore]2016/01/22 23:34:24 /var/lib/influxdb/data/mydb/default/511/000000001-000000001.tsm (#0) opened in 1.495351ms
[tsm1wal] 2016/01/22 23:34:24 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:24 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/512
[filestore]2016/01/22 23:34:24 /var/lib/influxdb/data/mydb/default/512/000000001-000000001.tsm (#0) opened in 1.55079ms
[tsm1wal] 2016/01/22 23:34:24 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:24 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/515
[filestore]2016/01/22 23:34:24 /var/lib/influxdb/data/mydb/default/515/000000001-000000001.tsm (#0) opened in 1.552111ms
[tsm1wal] 2016/01/22 23:34:25 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:25 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/532
[filestore]2016/01/22 23:34:25 /var/lib/influxdb/data/mydb/default/532/000000001-000000001.tsm (#0) opened in 2.012052ms
[tsm1wal] 2016/01/22 23:34:25 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:25 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/556
[filestore]2016/01/22 23:34:25 /var/lib/influxdb/data/mydb/default/556/000000001-000000001.tsm (#0) opened in 2.160766ms
[tsm1wal] 2016/01/22 23:34:25 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:25 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/557
[filestore]2016/01/22 23:34:25 /var/lib/influxdb/data/mydb/default/557/000000001-000000001.tsm (#0) opened in 2.003232ms
[tsm1wal] 2016/01/22 23:34:25 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:25 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/561
[filestore]2016/01/22 23:34:25 /var/lib/influxdb/data/mydb/default/561/000000001-000000001.tsm (#0) opened in 2.146533ms
[tsm1wal] 2016/01/22 23:34:25 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:25 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/563
[filestore]2016/01/22 23:34:25 /var/lib/influxdb/data/mydb/default/563/000000001-000000001.tsm (#0) opened in 1.842797ms
[tsm1wal] 2016/01/22 23:34:26 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:26 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/564
[filestore]2016/01/22 23:34:26 /var/lib/influxdb/data/mydb/default/564/000000001-000000001.tsm (#0) opened in 1.827628ms
[tsm1wal] 2016/01/22 23:34:26 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:26 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/567
[filestore]2016/01/22 23:34:26 /var/lib/influxdb/data/mydb/default/567/000000001-000000001.tsm (#0) opened in 2.141969ms
[tsm1wal] 2016/01/22 23:34:26 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:26 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/573
[filestore]2016/01/22 23:34:26 /var/lib/influxdb/data/mydb/default/573/000000001-000000001.tsm (#0) opened in 1.822479ms
[tsm1wal] 2016/01/22 23:34:26 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:26 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/574
[filestore]2016/01/22 23:34:26 /var/lib/influxdb/data/mydb/default/574/000000001-000000001.tsm (#0) opened in 2.102393ms
[tsm1wal] 2016/01/22 23:34:26 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:26 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/578
[filestore]2016/01/22 23:34:26 /var/lib/influxdb/data/mydb/default/578/000000001-000000001.tsm (#0) opened in 2.195543ms
[tsm1wal] 2016/01/22 23:34:27 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:27 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/580
[filestore]2016/01/22 23:34:27 /var/lib/influxdb/data/mydb/default/580/000000001-000000001.tsm (#0) opened in 2.143866ms
[tsm1wal] 2016/01/22 23:34:27 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:27 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/582
[filestore]2016/01/22 23:34:27 /var/lib/influxdb/data/mydb/default/582/000000001-000000001.tsm (#0) opened in 2.164687ms
[tsm1wal] 2016/01/22 23:34:27 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:27 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/585
[filestore]2016/01/22 23:34:27 /var/lib/influxdb/data/mydb/default/585/000000001-000000003.tsm (#2) opened in 181.52µs
[filestore]2016/01/22 23:34:27 /var/lib/influxdb/data/mydb/default/585/000000001-000000001.tsm (#0) opened in 1.197134ms
[filestore]2016/01/22 23:34:27 /var/lib/influxdb/data/mydb/default/585/000000001-000000002.tsm (#1) opened in 1.278982ms
[tsm1wal] 2016/01/22 23:34:27 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:27 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/602
[filestore]2016/01/22 23:34:27 /var/lib/influxdb/data/mydb/default/602/000000001-000000003.tsm (#2) opened in 344.935µs
[filestore]2016/01/22 23:34:27 /var/lib/influxdb/data/mydb/default/602/000000001-000000001.tsm (#0) opened in 1.228954ms
[filestore]2016/01/22 23:34:27 /var/lib/influxdb/data/mydb/default/602/000000001-000000002.tsm (#1) opened in 1.654521ms
[tsm1wal] 2016/01/22 23:34:28 tsm1 WAL starting with 10485760 segment size
[tsm1wal] 2016/01/22 23:34:28 tsm1 WAL writing to /var/lib/influxdb/wal/mydb/default/604
[filestore]2016/01/22 23:34:28 /var/lib/influxdb/data/mydb/default/604/000000001-000000002.tsm (#1) opened in 346.728µs
[filestore]2016/01/22 23:34:28 /var/lib/influxdb/data/mydb/default/604/000000001-000000001.tsm (#0) opened in 1.923337ms
[handoff] 2016/01/22 23:34:28 Starting hinted handoff service
[monitor] 2016/01/22 23:34:28 'hh' registered for diagnostics monitoring
[handoff] 2016/01/22 23:34:28 Using data dir: /var/lib/influxdb/hh
[subscriber] 2016/01/22 23:34:28 opened service
[monitor] 2016/01/22 23:34:28 Starting monitor system
[monitor] 2016/01/22 23:34:28 'build' registered for diagnostics monitoring
[monitor] 2016/01/22 23:34:28 'runtime' registered for diagnostics monitoring
[monitor] 2016/01/22 23:34:28 'network' registered for diagnostics monitoring
[monitor] 2016/01/22 23:34:28 'system' registered for diagnostics monitoring
[cluster] 2016/01/22 23:34:28 Starting cluster service
[shard-precreation] 2016/01/22 23:34:28 Starting precreation service with check interval of 10m0s, advance period of 30m0s
[snapshot] 2016/01/22 23:34:28 Starting snapshot service
[copier] 2016/01/22 23:34:28 Starting copier service
[admin] 2016/01/22 23:34:28 Starting admin service
[admin] 2016/01/22 23:34:28 Listening on HTTP: [::]:8083
[continuous_querier] 2016/01/22 23:34:28 Starting continuous query service
[httpd] 2016/01/22 23:34:28 Starting HTTP service
[httpd] 2016/01/22 23:34:28 Authentication enabled: false
[httpd] 2016/01/22 23:34:28 Listening on HTTP: [::]:8086
[retention] 2016/01/22 23:34:28 Starting retention policy enforcement service with check interval of 30m0s
[run] 2016/01/22 23:34:28 Listening for signals
[http] 2016/01/22 23:34:34 ::1 - - [22/Jan/2016:23:34:34 +0100] GET /ping HTTP/1.1 204 0 - InfluxDBShell/0.10.0-nightly-72c6a51 4fb3d13e-c158-11e5-8001-000000000000 82.672µs
[query] 2016/01/22 23:34:34 SHOW DIAGNOSTICS FOR registration
[http] 2016/01/22 23:34:34 ::1 - - [22/Jan/2016:23:34:34 +0100] GET /query?db=&epoch=ns&q=SHOW+DIAGNOSTICS+for+%27registration%27 HTTP/1.1 200 40 - InfluxDBShell/0.10.0-nightly-72c6a51 4fb3e57a-c158-11e5-8002-000000000000 1.250904ms
[query] 2016/01/22 23:34:38 SHOW DATABASES
[http] 2016/01/22 23:34:38 ::1 - - [22/Jan/2016:23:34:38 +0100] GET /query?db=&epoch=ns&q=show+databases HTTP/1.1 200 98 - InfluxDBShell/0.10.0-nightly-72c6a51 522339ca-c158-11e5-8003-000000000000 3.443317ms

And here's the CLI:
[root@LON-CAB6-01 604]# systemctl start influxdb
[root@LON-CAB6-01 604]# influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 0.10.0-nightly-72c6a51
InfluxDB shell 0.10.0-nightly-72c6a51

show databases

name: databases

name
_internal

I'm going to wait until Sunday, then just kick it all out and restart with empty database(s) on 0.9.6.1 stable.

@discoduck2x
Author

Gave up... went all the way back to 0.9.4.2.
