
train-model not running #409

Open
fatihgencali opened this issue Aug 24, 2020 · 1 comment

@fatihgencali

Hi,

train-model is not running.
I did not use Docker; I installed it directly.

There is no detailed log for this problem.
I am using Loud ML with Grafana.

loudml train-model --from now-30d --to now prometheus_auto_process_cpu_seconds_total__auto
training(prometheus_auto_process_cpu_seconds_total__auto): 0%| | 0/10 [00:00<?, ?it/s]

loudmld[11689]: ERROR:root:job[c984343e-2260-466d-b4e3-504a5eb2a2be] failed:

buckets:
  - name: prometheus
    type: prometheus
    addr: 0.0.0.0:9090
    retention_policy: autogen

  - name: loudml
    type: influxdb
    addr: 0.0.0.0:8086
    database: loudml
    retention_policy: autogen
    measurement: loudml
    annotation_db: loudmlannotations

storage:
  path: /var/lib/loudml

server:
  listen: 0.0.0.0:8077
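
For reference, a minimal reachability check for the Prometheus bucket above (the addr comes from the config; process_cpu_seconds_total is an assumption based on the model name):

# Instant query against the Prometheus HTTP API; a non-empty "result"
# array means the metric exists and has recent samples.
curl -s 'http://0.0.0.0:9090/api/v1/query?query=process_cpu_seconds_total'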

How can I solve this?

Thanks

@fatihgencali
Author

Hi,

I got past the previous error; TensorFlow must be pinned to version 1.15.
But now the following error occurs. How can I solve this?

"WARNING:root:iteration failed: insufficient training data"


[2020-08-27 14:39:26] "GET /models/prometheus_auto_process_cpu_seconds_total__auto HTTP/1.1" 200 673 0.022457
[2020-08-27 14:39:27] "GET /models/prometheus_auto_process_cpu_seconds_total__auto HTTP/1.1" 200 673 0.032339
[2020-08-27 14:39:35] "GET /models/prometheus_auto_process_cpu_seconds_total__auto HTTP/1.1" 200 673 0.021654
[2020-08-27 14:39:35] "GET /models/prometheus_auto_process_cpu_seconds_total__auto HTTP/1.1" 200 673 0.021841
[2020-08-27 14:39:39] "GET /models/prometheus_auto_iratenode_disk_reads_completed_totalinstancenode30m__auto HTTP/1.1" 404 209 0.015889
INFO:schedule:Running job Every 1 minute do daemon_clear_jobs() (last run: 2020-08-27 14:38:42, next run: 2020-08-27 14:39:42)
[2020-08-27 14:39:43] "GET /models/prometheus_auto_process_cpu_seconds_total__auto HTTP/1.1" 200 673 0.019205
INFO:root:job[17ac0d71-80a8-4f6c-8fba-a887d8bfeb4b] starting, nice=5
[2020-08-27 14:39:43] "POST /models/prometheus_auto_process_cpu_seconds_total__auto/_train?from=Thu%20Aug%2027%202020%2011%3A10%3A30%20GMT%2B0000&to=Thu%20Aug%2027%202020%2011%3A19%3A53%20GMT%2B0000&output_bucket=loudml&save_output_data=true&flag_abnormal_data=true HTTP/1.1" 202 153 0.026708
INFO:root:train(prometheus_auto_process_cpu_seconds_total__auto) range=2020-08-27T11:00:00.000Z-2020-08-27T11:20:00.000Z train_size=0.670000 batch_size=64 epochs=100)
INFO:root:connecting to prometheus on
INFO:root:found 2 time periods
  0%|                                                                                                                        | 0/10 [00:00<?, ?trial/s, best loss=?]
INFO:hyperopt.tpe:build_posterior_wrapper took 0.011455 seconds
INFO:hyperopt.tpe:TPE using 0 trials
WARNING:root:iteration failed: insufficient training data
INFO:hyperopt.tpe:build_posterior_wrapper took 0.010041 seconds
INFO:hyperopt.tpe:TPE using 1/1 trials with best loss inf
WARNING:root:iteration failed: insufficient training data
INFO:hyperopt.tpe:build_posterior_wrapper took 0.013637 seconds
INFO:hyperopt.tpe:TPE using 2/2 trials with best loss inf
WARNING:root:iteration failed: insufficient training data
INFO:hyperopt.tpe:build_posterior_wrapper took 0.010639 seconds
INFO:hyperopt.tpe:TPE using 3/3 trials with best loss inf
WARNING:root:iteration failed: insufficient training data
 40%|████████████████████████████████████████████▊                                                                   | 4/10 [00:00<00:00, 31.63trial/s, best loss=?]
INFO:hyperopt.tpe:build_posterior_wrapper took 0.009456 seconds
INFO:hyperopt.tpe:TPE using 4/4 trials with best loss inf
WARNING:root:iteration failed: insufficient training data
INFO:hyperopt.tpe:build_posterior_wrapper took 0.011114 seconds
INFO:hyperopt.tpe:TPE using 5/5 trials with best loss inf
WARNING:root:iteration failed: insufficient training data
INFO:hyperopt.tpe:build_posterior_wrapper took 0.012389 seconds
INFO:hyperopt.tpe:TPE using 6/6 trials with best loss inf
WARNING:root:iteration failed: insufficient training data
 70%|██████████████████████████████████████████████████████████████████████████████▍                                 | 7/10 [00:00<00:00, 31.09trial/s, best loss=?]
INFO:hyperopt.tpe:build_posterior_wrapper took 0.010510 seconds
INFO:hyperopt.tpe:TPE using 7/7 trials with best loss inf
WARNING:root:iteration failed: insufficient training data
INFO:hyperopt.tpe:build_posterior_wrapper took 0.011602 seconds
INFO:hyperopt.tpe:TPE using 8/8 trials with best loss inf
WARNING:root:iteration failed: insufficient training data
INFO:hyperopt.tpe:build_posterior_wrapper took 0.010743 seconds
INFO:hyperopt.tpe:TPE using 9/9 trials with best loss inf
WARNING:root:iteration failed: insufficient training data
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 29.13trial/s, best loss=?]
ERROR:root:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/loudml-1.6.0-py3.6.egg/loudml/worker.py", line 53, in run
    res = getattr(self, func_name)(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/loudml-1.6.0-py3.6.egg/loudml/worker.py", line 101, in train
    **kwargs
  File "/usr/local/lib/python3.6/site-packages/loudml-1.6.0-py3.6.egg/loudml/donut.py", line 1091, in train
    abnormal=abnormal,
  File "/usr/local/lib/python3.6/site-packages/loudml-1.6.0-py3.6.egg/loudml/donut.py", line 843, in _train_on_dataset
    rstate=fmin_state,
  File "/usr/local/lib/python3.6/site-packages/hyperopt-0.2.4-py3.6.egg/hyperopt/fmin.py", line 482, in fmin
    show_progressbar=show_progressbar,
  File "/usr/local/lib/python3.6/site-packages/hyperopt-0.2.4-py3.6.egg/hyperopt/base.py", line 686, in fmin
    show_progressbar=show_progressbar,
  File "/usr/local/lib/python3.6/site-packages/hyperopt-0.2.4-py3.6.egg/hyperopt/fmin.py", line 516, in fmin
    return trials.argmin
  File "/usr/local/lib/python3.6/site-packages/hyperopt-0.2.4-py3.6.egg/hyperopt/base.py", line 622, in argmin
    best_trial = self.best_trial
  File "/usr/local/lib/python3.6/site-packages/hyperopt-0.2.4-py3.6.egg/hyperopt/base.py", line 613, in best_trial
    raise AllTrialsFailed
hyperopt.exceptions.AllTrialsFailed
ERROR:root:job[17ac0d71-80a8-4f6c-8fba-a887d8bfeb4b] failed:
[2020-08-27 14:39:51] "GET /models/prometheus_auto_process_cpu_seconds_total__auto HTTP/1.1" 200 673 0.018318
[2020-08-27 14:39:51] "GET /models/prometheus_auto_process_cpu_seconds_total__auto HTTP/1.1" 200 673 0.021065
[2020-08-27 14:39:52] "GET /models HTTP/1.1" 200 673 0.026266
[2020-08-27 14:39:52] "GET /jobs HTTP/1.1" 200 432 0.009462
[2020-08-27 14:39:52] "GET /scheduled_jobs HTTP/1.1" 200 110 0.008532
[2020-08-27 14:39:54] "GET /models/prometheus_auto_iratenode_disk_reads_completed_totalinstancenode30m__auto HTTP/1.1" 404 209 0.010784
[2020-08-27 14:40:06] "GET /models/prometheus_auto_process_cpu_seconds_total__auto HTTP/1.1" 200 673 0.027231
[2020-08-27 14:40:06] "GET /models/prometheus_auto_process_cpu_seconds_total__auto HTTP/1.1" 200 673 0.014705
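
My reading of the log above (an assumption, not a confirmed diagnosis): the POST /_train request covers only about nine minutes (11:10:30 to 11:19:53 UTC), which the trainer widens to the 11:00-11:20 range. That leaves very few points per bucket, so every hyperopt trial aborts with "insufficient training data" and the job ends in AllTrialsFailed. One way to count the samples Prometheus actually holds for that window (addr from the config above; the metric name is assumed from the model name):

# Range query over the exact training window; count the entries in
# data.result[].values to see how many samples the trainer could fetch.
curl -s 'http://0.0.0.0:9090/api/v1/query_range' \
  --data-urlencode 'query=process_cpu_seconds_total' \
  --data-urlencode 'start=2020-08-27T11:00:00Z' \
  --data-urlencode 'end=2020-08-27T11:20:00Z' \
  --data-urlencode 'step=15s'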

Regards,
