
[BUG] concurrent experiment: database is locked #715

Closed
ultranity opened this issue Sep 29, 2024 · 4 comments
Labels
🐛 bug Something isn't working

Comments

@ultranity

🐛 Bug description

When several experiments are submitted in parallel, SQLite raises a concurrent read/write error:

    swanlab.log(metrics, step)
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/swanlab/data/sdk.py", line 184, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/swanlab/data/sdk.py", line 208, in log
    ll = run.log(data, step)
         ^^^^^^^^^^^^^^^^^^^
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/swanlab/data/run/main.py", line 325, in log
    metric_info = self.__exp.add(key=k, data=v, step=step)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/swanlab/data/run/exp.py", line 76, in add
    self.__operator.on_column_create(column_info)
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/swanlab/data/run/helper.py", line 106, in on_column_create
    return self.__run_all("on_column_create", column_info)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/swanlab/data/run/helper.py", line 54, in __run_all
    return {name: getattr(callback, method)(*args, **kwargs) for name, callback in self.callbacks.items()}
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/swanboard/callback.py", line 87, in on_column_create
    n = Namespace.create(name=namespace, experiment_id=self.exp.id, sort=column_info.sort)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/swanboard/db/models/namespaces.py", line 155, in create
    return super().create(
           ^^^^^^^^^^^^^^^
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/swanboard/db/model.py", line 119, in create
    return super().create(**query)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/peewee.py", line 6741, in create
    inst.save(force_insert=True)
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/swanboard/db/model.py", line 103, in save
    super().save(*args, **kwargs)
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/peewee.py", line 6951, in save
    pk = self.insert(**field_dict).execute()
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/peewee.py", line 2036, in inner
    return method(self, database, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/peewee.py", line 2107, in execute
    return self._execute(database)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/peewee.py", line 2912, in _execute
    return super(Insert, self)._execute(database)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/peewee.py", line 2625, in _execute
    cursor = database.execute(self)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/peewee.py", line 3330, in execute
    return self.execute_sql(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/peewee.py", line 3320, in execute_sql
    with __exception_wrapper__:
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/peewee.py", line 3088, in __exit__
    reraise(new_type, new_type(exc_value, *exc_args), traceback)
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/peewee.py", line 196, in reraise
    raise value.with_traceback(tb)
  File "/public/opt/conda/envs/py312/lib/python3.12/site-packages/peewee.py", line 3322, in execute_sql
    cursor.execute(sql, params or ())
peewee.OperationalError: database is locked

🧑‍💻 Steps to reproduce

Quick reproduction: simulate concurrent runs with a process pool executor:

import json
import os

import swanlab
import tqdm
import yaml
from tqdm.contrib.concurrent import process_map


def load_metrics(folder: str):
    # Load the experiment config attached to this run
    with open(f"{folder}/config.yaml") as f:
        cfg = yaml.load(f, Loader=yaml.FullLoader)
    # Each worker process opens its own run; all runs share ./logs
    swanlab.init(
        project="dinov2",
        logdir="./logs",
        mode="local",
        experiment_name=os.path.basename(folder),
        config=cfg,
    )
    with open(f"{folder}/training_metrics.json") as f:
        # Each line holds one JSON object of metrics; parse it safely instead of eval()
        for line in tqdm.tqdm(f.readlines()):
            metrics = json.loads(line)
            step = metrics.pop("iteration")
            swanlab.log(metrics, step)
    swanlab.finish()


if __name__ == "__main__":
    # One process per experiment folder, so several runs write to the local database at once
    process_map(load_metrics, [f"outputs/{x}" for x in os.listdir("outputs")])

👾 Expected result

Handle concurrent SQLite reads/writes: wait for the lock to be released instead of raising an error.

@ultranity ultranity added the 🐛 bug Something isn't working label Sep 29, 2024
@SAKURA-CAT
Contributor

I've run into this problem as well and am working on a fix!

@SAKURA-CAT SAKURA-CAT self-assigned this Oct 4, 2024
@SAKURA-CAT
Contributor

SAKURA-CAT commented Oct 4, 2024

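The solution is to enable WAL mode when opening the database connection. This increases the size of the database on disk, but the increase is within an acceptable range.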
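
In WAL (write-ahead logging) mode, a writer appends to a separate `-wal` file instead of rewriting the main database, so readers keep working from a snapshot while a writer commits. WAL still permits only one writer at a time, so concurrent writers continue to rely on the busy timeout. The sketch below compares the default rollback journal (`DELETE`) with `WAL` using the standard-library `sqlite3` module and a hypothetical `metric` table: a reader holds an open read transaction while another connection tries to commit. It only illustrates the mechanism; with peewee, WAL is commonly enabled through `SqliteDatabase(..., pragmas={"journal_mode": "wal"})`, and swanboard's actual change may differ.

```python
import os
import sqlite3
import tempfile


def commit_while_reader_open(journal_mode: str) -> str:
    """Commit a write while another connection holds an open read transaction."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        setup = sqlite3.connect(path)
        # journal_mode is a fixed, trusted value here; WAL persists in the file
        setup.execute(f"PRAGMA journal_mode={journal_mode}")
        setup.execute("CREATE TABLE metric (step INTEGER)")  # hypothetical table
        setup.close()

        # timeout=0: report lock conflicts immediately instead of waiting
        reader = sqlite3.connect(path, timeout=0, isolation_level=None)
        writer = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            reader.execute("BEGIN")
            reader.execute("SELECT count(*) FROM metric").fetchall()  # open a read transaction

            writer.execute("BEGIN IMMEDIATE")
            writer.execute("INSERT INTO metric (step) VALUES (1)")
            try:
                writer.execute("COMMIT")
                return "ok"
            except sqlite3.OperationalError as exc:
                writer.execute("ROLLBACK")
                return str(exc)
        finally:
            reader.close()
            writer.close()


print(commit_while_reader_open("DELETE"))  # database is locked
print(commit_while_reader_open("WAL"))  # ok
```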

@SAKURA-CAT
Contributor

@ultranity Hi, this issue occurs in swanboard, a subpackage of swanlab. Could you upgrade it and check whether the problem persists:

pip install swanboard==0.1.4b2

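After installing it you will see a version-mismatch warning, which you can ignore. A new swanlab release will be published in the next few days 😊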

@SAKURA-CAT
Contributor

fixed by #716
