This repository was archived by the owner on Oct 29, 2023. It is now read-only.

Catalog operations can clash with multiple dbdeployer runs #72

Closed
datacharmer opened this issue Mar 16, 2019 · 2 comments

datacharmer commented Mar 16, 2019

When several instances of dbdeployer run in parallel, they occasionally race while updating the catalog file.

dbdeployer handles concurrent operations inside a single process well, and multiple sandbox deployments are handled correctly. The problem occurs when several separate dbdeployer processes try to update the catalog at the same time: one process's update can be overwritten by a concurrent one.
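The overwrite described above is a classic lost update: each process reads the catalog, adds its entry, and writes the whole file back. A minimal sketch of that interleaving, assuming a hypothetical one-entry-per-line text file as a stand-in for the catalog (this is not dbdeployer's actual catalog format or code):

```shell
#!/bin/bash
# Hypothetical stand-in for the catalog: a text file with one entry per line.
catalog=$(mktemp)
echo "first" > "$catalog"

# Two "instances" both read the catalog before either one has written.
snapshot_a=$(cat "$catalog")
snapshot_b=$(cat "$catalog")

# Each instance writes back its own snapshot plus its new entry.
printf '%s\n%s\n' "$snapshot_a" "second" > "$catalog"
printf '%s\n%s\n' "$snapshot_b" "third"  > "$catalog"

# The second write replaced the first one, so "second" is gone.
result=$(cat "$catalog")
rm -f "$catalog"
echo "$result"
```

The steps run sequentially here only to make the interleaving deterministic; with real concurrent processes the same ordering happens by chance, which would explain why the failure is intermittent.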

To Reproduce

#!/bin/bash

dbdeployer deploy multiple 5.0 --sandbox-directory=first --base-port=5100  > /tmp/multi1.txt &
dbdeployer deploy multiple 5.0 --sandbox-directory=second --base-port=5200 > /tmp/multi2.txt &
dbdeployer deploy multiple 5.0 --sandbox-directory=third --base-port=5300 > /tmp/multi3.txt &
dbdeployer deploy single 5.0 > /tmp/single.txt &
dbdeployer deploy replication 5.0 > /tmp/replication.txt &

echo running
wait
echo deployed:

dbdeployer sandboxes --header
echo ""

echo "catalog"
dbdeployer sandboxes --catalog --header


sandboxes=$(dbdeployer sandboxes | wc -l)
catalog_entries=$(dbdeployer sandboxes --catalog | wc -l)

if [ "$sandboxes" == "$catalog_entries" ]
then
    echo "Found $catalog_entries entries in catalog as expected"
    dbdeployer delete all --concurrent --skip-confirm
else
    echo "Found $catalog_entries entries in catalog, but expected $sandboxes"
    exit 1
fi

Expected behavior
The catalog should list the same number of sandboxes as are found on disk.
Occasionally (rarely, but noticeably) the catalog has fewer entries than expected.

Sample successful run:

running
deployed:
            name                  type       version          ports
---------------------------- -------------- --------- ----------------------
 first                    :   multiple       5.0.96    [5101 5102 5103 ]
 msb_5_0_96               :   single         5.0.96    [5096 ]
 rsandbox_5_0_96          :   master-slave   5.0.96    [25697 25698 25699 ]
 second                   :   multiple       5.0.96    [5201 5202 5203 ]
 third                    :   multiple       5.0.96    [5301 5302 5303 ]

catalog
      name         version       type       nodes          ports
----------------- --------- -------------- ------- ----------------------
 third             5.0.96    multiple       3       [5301 5302 5303 ]
 first             5.0.96    multiple       3       [5101 5102 5103 ]
 msb_5_0_96        5.0.96    single         0       [5096 ]
 rsandbox_5_0_96   5.0.96    master-slave   3       [25697 25698 25699 ]
 second            5.0.96    multiple       3       [5201 5202 5203 ]
Found        5 entries in catalog as expected
List of deployed sandboxes:
/Users/gmax/sandboxes/first
/Users/gmax/sandboxes/msb_5_0_96
/Users/gmax/sandboxes/rsandbox_5_0_96
/Users/gmax/sandboxes/second
/Users/gmax/sandboxes/third

Sample failure:

running
deployed:
            name                  type       version          ports
---------------------------- -------------- --------- ----------------------
 first                    :   multiple       5.0.96    [5101 5102 5103 ]
 msb_5_0_96               :   single         5.0.96    [5096 ]
 rsandbox_5_0_96          :   master-slave   5.0.96    [25697 25698 25699 ]
 second                   :   multiple       5.0.96    [5201 5202 5203 ]
 third                    :   multiple       5.0.96    [5301 5302 5303 ]

catalog
  name    version     type     nodes         ports
-------- --------- ---------- ------- -------------------
 second   5.0.96    multiple   3       [5201 5202 5203 ]
 third    5.0.96    multiple   3       [5301 5302 5303 ]
Found        2 entries in catalog, but expected        5

Possible solutions
Introduce a file-based locking mechanism that prevents concurrent dbdeployer instances from overwriting each other's catalog updates.
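As an illustration of the idea only, here is a sketch of file-based locking in bash that uses `mkdir` as the atomic lock primitive. The catalog file, the lock directory, and the `add_entry` helper are all hypothetical names for this example; this is not how dbdeployer implements its lock.

```shell
#!/bin/bash
# Hypothetical stand-ins: a one-entry-per-line catalog and a lock directory.
workdir=$(mktemp -d)
catalog="$workdir/catalog.txt"
lockdir="$workdir/catalog.lock"
: > "$catalog"

# add_entry NAME: add NAME to the catalog with a read-modify-write under a lock.
add_entry() {
    # mkdir is atomic: only one caller can create the directory at a time.
    until mkdir "$lockdir" 2>/dev/null
    do
        sleep 0.1
    done
    # Critical section: read the whole catalog, then rewrite it with the new entry.
    current=$(cat "$catalog")
    {
        if [ -n "$current" ]; then echo "$current"; fi
        echo "$1"
    } > "$catalog"
    # Release the lock so the next waiting writer can proceed.
    rmdir "$lockdir"
}

# Five concurrent writers, each running in its own background subshell.
for name in first second third single replication
do
    add_entry "$name" &
done
wait

entries=$(wc -l < "$catalog" | tr -d ' ')
echo "catalog has $entries entries"
rm -rf "$workdir"
```

Because every writer holds the lock for the whole read-modify-write, no update is lost. A real implementation would also need to handle stale locks left behind by a process that dies inside the critical section, which this sketch does not attempt.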

datacharmer self-assigned this Mar 16, 2019

datacharmer commented May 5, 2019

Here's a better test (requires version 1.30.0 and GNU parallel)

#!/bin/bash

function parallel_catalog {

    versions=$(dbdeployer info version 5.0 all)
    num_versions=$(echo "$versions" | wc -w)
    if [[ $num_versions -lt 2 ]]
    then
        echo "Not enough concurrency"
        echo "Change the definition of \$versions and try again"
        exit 1
    fi
    echo "# $versions"
    parallel --shellquote dbdeployer deploy {1} {2} ::: single multiple replication ::: $versions
    parallel dbdeployer deploy {1} {2} ::: single multiple replication ::: $versions > /tmp/parallel.txt
    exit_code=$?
    if [ "$exit_code" != "0" ]
    then
        cat /tmp/parallel.txt
        return
    fi

    echo deployed:

    dbdeployer sandboxes --header
    echo ""

    echo "catalog"
    dbdeployer sandboxes --catalog --header

    sandboxes=$(dbdeployer sandboxes | wc -l)
    catalog_entries=$(dbdeployer sandboxes --catalog | wc -l)

    if [ "$sandboxes" == "$catalog_entries" ]
    then
        echo "------------------------------------------------------"
        echo "Found $catalog_entries entries in catalog as expected"
        echo "------------------------------------------------------"
        echo ""
        dbdeployer delete all --concurrent --skip-confirm
        exit_code=$?
    else
        echo "###################################################################"
        echo "Found $catalog_entries entries in catalog, but expected $sandboxes"
        echo "###################################################################"
        exit_code=1
    fi
}


iterations=$1
if [ -z "$iterations" ]
then
    iterations=10
fi

for N in $(seq 1 $iterations)
do
    echo ""
    echo "# iteration $N of $iterations"
    echo ""
    parallel_catalog
    if [ "$exit_code" != "0" ]
    then
        exit "$exit_code"
    fi
done

datacharmer added a commit that referenced this issue May 5, 2019
Fix Issue #72: Catalog operations can clash with multiple dbdeployer runs
Replace in-memory mutex with file-based lock (using github.com/nightlyone/lockfile).

Fixed in 1.30.0
