
train-model not running #409

Open
fatihgencali opened this issue Aug 24, 2020 · 1 comment

@fatihgencali

Hi,

train-model is not running.
I did not use Docker; I installed it directly.

There is no detailed log for this problem.
I am using Loud ML with Grafana.

loudml train-model --from now-30d --to now prometheus_auto_process_cpu_seconds_total__auto
training(prometheus_auto_process_cpu_seconds_total__auto): 0%| | 0/10 [00:00<?, ?it/s]

loudmld[11689]: ERROR:root:job[c984343e-2260-466d-b4e3-504a5eb2a2be] failed:

buckets:
  - name: prometheus
    type: prometheus
    addr: 0.0.0.0:9090
    retention_policy: autogen

  - name: loudml
    type: influxdb
    addr: 0.0.0.0:8086
    database: loudml
    retention_policy: autogen
    measurement: loudml
    annotation_db: loudmlannotations

storage:
  path: /var/lib/loudml

server:
  listen: 0.0.0.0:8077
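
For reference, a minimal reachability check for the Prometheus bucket above (the addr comes from the config; process_cpu_seconds_total is an assumption based on the model name):

# Instant query against the Prometheus HTTP API; a non-empty "result"
# array means the metric exists and has recent samples.
curl -s 'http://0.0.0.0:9090/api/v1/query?query=process_cpu_seconds_total'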

How can I solve this?

Thanks

@fatihgencali
Author

Hi,

I got past the previous error; TensorFlow must be pinned to version 1.15.
But now the following error occurs. How can I solve this?

"WARNING:root:iteration failed: insufficient training data"


[2020-08-27 14:39:26] "GET /models/prometheus_auto_process_cpu_seconds_total__auto HTTP/1.1" 200 673 0.022457
[2020-08-27 14:39:27] "GET /models/prometheus_auto_process_cpu_seconds_total__auto HTTP/1.1" 200 673 0.032339
[2020-08-27 14:39:35] "GET /models/prometheus_auto_process_cpu_seconds_total__auto HTTP/1.1" 200 673 0.021654
[2020-08-27 14:39:35] "GET /models/prometheus_auto_process_cpu_seconds_total__auto HTTP/1.1" 200 673 0.021841
[2020-08-27 14:39:39] "GET /models/prometheus_auto_iratenode_disk_reads_completed_totalinstancenode30m__auto HTTP/1.1" 404 209 0.015889
INFO:schedule:Running job Every 1 minute do daemon_clear_jobs() (last run: 2020-08-27 14:38:42, next run: 2020-08-27 14:39:42)
[2020-08-27 14:39:43] "GET /models/prometheus_auto_process_cpu_seconds_total__auto HTTP/1.1" 200 673 0.019205
INFO:root:job[17ac0d71-80a8-4f6c-8fba-a887d8bfeb4b] starting, nice=5
[2020-08-27 14:39:43] "POST /models/prometheus_auto_process_cpu_seconds_total__auto/_train?from=Thu%20Aug%2027%202020%2011%3A10%3A30%20GMT%2B0000&to=Thu%20Aug%2027%202020%2011%3A19%3A53%20GMT%2B0000&output_bucket=loudml&save_output_data=true&flag_abnormal_data=true HTTP/1.1" 202 153 0.026708
INFO:root:train(prometheus_auto_process_cpu_seconds_total__auto) range=2020-08-27T11:00:00.000Z-2020-08-27T11:20:00.000Z train_size=0.670000 batch_size=64 epochs=100)
INFO:root:connecting to prometheus on
INFO:root:found 2 time periods
  0%|                                                                                                                        | 0/10 [00:00<?, ?trial/s, best loss=?]
INFO:hyperopt.tpe:build_posterior_wrapper took 0.011455 seconds
INFO:hyperopt.tpe:TPE using 0 trials
WARNING:root:iteration failed: insufficient training data
INFO:hyperopt.tpe:build_posterior_wrapper took 0.010041 seconds
INFO:hyperopt.tpe:TPE using 1/1 trials with best loss inf
WARNING:root:iteration failed: insufficient training data
INFO:hyperopt.tpe:build_posterior_wrapper took 0.013637 seconds
INFO:hyperopt.tpe:TPE using 2/2 trials with best loss inf
WARNING:root:iteration failed: insufficient training data
INFO:hyperopt.tpe:build_posterior_wrapper took 0.010639 seconds
INFO:hyperopt.tpe:TPE using 3/3 trials with best loss inf
WARNING:root:iteration failed: insufficient training data
 40%|████████████████████████████████████████████▊                                                                   | 4/10 [00:00<00:00, 31.63trial/s, best loss=?]
INFO:hyperopt.tpe:build_posterior_wrapper took 0.009456 seconds
INFO:hyperopt.tpe:TPE using 4/4 trials with best loss inf
WARNING:root:iteration failed: insufficient training data
INFO:hyperopt.tpe:build_posterior_wrapper took 0.011114 seconds
INFO:hyperopt.tpe:TPE using 5/5 trials with best loss inf
WARNING:root:iteration failed: insufficient training data
INFO:hyperopt.tpe:build_posterior_wrapper took 0.012389 seconds
INFO:hyperopt.tpe:TPE using 6/6 trials with best loss inf
WARNING:root:iteration failed: insufficient training data
 70%|██████████████████████████████████████████████████████████████████████████████▍                                 | 7/10 [00:00<00:00, 31.09trial/s, best loss=?]
INFO:hyperopt.tpe:build_posterior_wrapper took 0.010510 seconds
INFO:hyperopt.tpe:TPE using 7/7 trials with best loss inf
WARNING:root:iteration failed: insufficient training data
INFO:hyperopt.tpe:build_posterior_wrapper took 0.011602 seconds
INFO:hyperopt.tpe:TPE using 8/8 trials with best loss inf
WARNING:root:iteration failed: insufficient training data
INFO:hyperopt.tpe:build_posterior_wrapper took 0.010743 seconds
INFO:hyperopt.tpe:TPE using 9/9 trials with best loss inf
WARNING:root:iteration failed: insufficient training data
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 29.13trial/s, best loss=?]
ERROR:root:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/loudml-1.6.0-py3.6.egg/loudml/worker.py", line 53, in run
    res = getattr(self, func_name)(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/loudml-1.6.0-py3.6.egg/loudml/worker.py", line 101, in train
    **kwargs
  File "/usr/local/lib/python3.6/site-packages/loudml-1.6.0-py3.6.egg/loudml/donut.py", line 1091, in train
    abnormal=abnormal,
  File "/usr/local/lib/python3.6/site-packages/loudml-1.6.0-py3.6.egg/loudml/donut.py", line 843, in _train_on_dataset
    rstate=fmin_state,
  File "/usr/local/lib/python3.6/site-packages/hyperopt-0.2.4-py3.6.egg/hyperopt/fmin.py", line 482, in fmin
    show_progressbar=show_progressbar,
  File "/usr/local/lib/python3.6/site-packages/hyperopt-0.2.4-py3.6.egg/hyperopt/base.py", line 686, in fmin
    show_progressbar=show_progressbar,
  File "/usr/local/lib/python3.6/site-packages/hyperopt-0.2.4-py3.6.egg/hyperopt/fmin.py", line 516, in fmin
    return trials.argmin
  File "/usr/local/lib/python3.6/site-packages/hyperopt-0.2.4-py3.6.egg/hyperopt/base.py", line 622, in argmin
    best_trial = self.best_trial
  File "/usr/local/lib/python3.6/site-packages/hyperopt-0.2.4-py3.6.egg/hyperopt/base.py", line 613, in best_trial
    raise AllTrialsFailed
hyperopt.exceptions.AllTrialsFailed
ERROR:root:job[17ac0d71-80a8-4f6c-8fba-a887d8bfeb4b] failed:
[2020-08-27 14:39:51] "GET /models/prometheus_auto_process_cpu_seconds_total__auto HTTP/1.1" 200 673 0.018318
[2020-08-27 14:39:51] "GET /models/prometheus_auto_process_cpu_seconds_total__auto HTTP/1.1" 200 673 0.021065
[2020-08-27 14:39:52] "GET /models HTTP/1.1" 200 673 0.026266
[2020-08-27 14:39:52] "GET /jobs HTTP/1.1" 200 432 0.009462
[2020-08-27 14:39:52] "GET /scheduled_jobs HTTP/1.1" 200 110 0.008532
[2020-08-27 14:39:54] "GET /models/prometheus_auto_iratenode_disk_reads_completed_totalinstancenode30m__auto HTTP/1.1" 404 209 0.010784
[2020-08-27 14:40:06] "GET /models/prometheus_auto_process_cpu_seconds_total__auto HTTP/1.1" 200 673 0.027231
[2020-08-27 14:40:06] "GET /models/prometheus_auto_process_cpu_seconds_total__auto HTTP/1.1" 200 673 0.014705
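
My reading of the log above (an assumption, not a confirmed diagnosis): the POST /_train request covers only about nine minutes (11:10:30 to 11:19:53 UTC), which the trainer widens to the 11:00-11:20 range. That leaves very few points per bucket, so every hyperopt trial aborts with "insufficient training data" and the job ends in AllTrialsFailed. One way to count the samples Prometheus actually holds for that window (addr from the config above; the metric name is assumed from the model name):

# Range query over the exact training window; count the entries in
# data.result[].values to see how many samples the trainer could fetch.
curl -s 'http://0.0.0.0:9090/api/v1/query_range' \
  --data-urlencode 'query=process_cpu_seconds_total' \
  --data-urlencode 'start=2020-08-27T11:00:00Z' \
  --data-urlencode 'end=2020-08-27T11:20:00Z' \
  --data-urlencode 'step=15s'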

Regards,
