Assign forests over multiple mount points #324

Closed
paul-hoehne opened this issue Mar 16, 2018 · 6 comments

paul-hoehne commented Mar 16, 2018

Let's assume the following situation. On a cluster of 3 hosts, each host has 3 mount points. For example, hosts 1, 2, and 3 (H1, H2, and H3) each have mounts /mldata1, /mldata2, and /mldata3 (M1, M2, and M3), where each mount might be a separate GP2 filesystem on AWS. The forest assignment policy should multiplex the forests across hosts and mounts with the following constraints:

  1. for each host, primary forests should be balanced across the mounts (as much as possible).
  2. replica forests should be balanced across the remaining hosts, such that a failover will spread the load across the cluster.
  3. on each host, the replicas should be balanced across the mount points.

For example, H1 has mounts M1, M2, and M3. With two forests per mount point, M1 should have forests F1 and F2, M2 should have forests F3 and F4, and M3 should have forests F5 and F6. H2 (which has forests F7-F12) would likewise spread its forests over its three mounts, and so on.
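
A minimal Python sketch of the primary numbering described above, assuming forests are numbered host by host and each host's mounts are filled in order, `forests_per_mount` at a time. The helper name `primary_location` is hypothetical and is not part of ml-app-deployer.

```python
def primary_location(n, hosts, mounts, forests_per_mount):
    """Return (host, mount) for primary forest Fn (1-based).

    Hypothetical helper: forests fill a host's mounts in order,
    forests_per_mount at a time, before moving on to the next host.
    """
    if not 1 <= n <= len(hosts) * len(mounts) * forests_per_mount:
        raise ValueError(f"forest number {n} out of range")
    i = n - 1  # zero-based index
    host = hosts[i // (len(mounts) * forests_per_mount)]
    mount = mounts[(i // forests_per_mount) % len(mounts)]
    return host, mount


if __name__ == "__main__":
    hosts, mounts = ["H1", "H2", "H3"], ["M1", "M2", "M3"]
    for n in range(1, 19):
        host, mount = primary_location(n, hosts, mounts, 2)
        print(f"F{n}: {host}, {mount}")
```

With 3 hosts, 3 mounts, and 2 forests per mount, this places F1-F2 on H1/M1, F3-F4 on H1/M2, F5-F6 on H1/M3, and F7-F12 on H2.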

The replica for H1, M1, F1 might be on H2, M2. The replica for H1, M1, F2 might be on H3, M3. The replica for H1, M2, F3 might be on H2, M1. The replica for H1, M2, F4 might be on H3, M1, and so on.
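
The three constraints can be checked mechanically. The following is a hypothetical Python checker, not ml-app-deployer code; `Placement` and `check_constraints` are illustrative names, and "balanced" is assumed to mean that counts differ by at most one.

```python
from collections import Counter
from typing import NamedTuple


class Placement(NamedTuple):
    forest: str       # name of the primary forest this copy belongs to
    is_replica: bool  # False for the primary itself, True for a replica
    host: str
    mount: str


def _balanced(counter, keys):
    """True if the counts for all keys differ by at most one."""
    counts = [counter.get(k, 0) for k in keys]
    return not counts or max(counts) - min(counts) <= 1


def check_constraints(placements, hosts, mounts):
    """Return a list of constraint violations (empty if the layout is fine)."""
    problems = []
    # Assumes every replica's forest also has a primary in `placements`.
    primary_host = {p.forest: p.host for p in placements if not p.is_replica}
    for host in hosts:
        # Constraint 1: this host's primaries are balanced across its mounts.
        prim = Counter(p.mount for p in placements
                       if not p.is_replica and p.host == host)
        if not _balanced(prim, mounts):
            problems.append(f"primaries unbalanced across mounts on {host}")
        # Constraint 2: replicas of this host's primaries sit on the other
        # hosts, spread evenly so a failover of this host shares the load.
        reps = [p for p in placements
                if p.is_replica and primary_host[p.forest] == host]
        if any(p.host == host for p in reps):
            problems.append(f"a replica of a primary on {host} is also on {host}")
        others = [h for h in hosts if h != host]
        if not _balanced(Counter(p.host for p in reps), others):
            problems.append(f"replicas of {host} primaries unbalanced across other hosts")
        # Constraint 3: replicas stored on this host are balanced across its mounts.
        stored = Counter(p.mount for p in placements
                         if p.is_replica and p.host == host)
        if not _balanced(stored, mounts):
            problems.append(f"replicas unbalanced across mounts on {host}")
    return problems
```

Under this reading, naively rotating a single replica one host forward would put every replica of H1's primaries on H2 and fail constraint 2, which is why replicas of one host's primaries need to go to several hosts, as in the example above where H1's replicas land on both H2 and H3.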

rjrudin added this to the 3.7.0 milestone on Mar 17, 2018
rjrudin (Contributor) commented Mar 17, 2018

This sounds fun to work on. Everything will be done in ml-app-deployer, but I'm adding the "help wanted" label in case someone else finds it interesting and would like to do it.

rjrudin (Contributor) commented Apr 10, 2018

@paul-hoehne How would you want to use properties to specify the different data directories? This page lists all the current properties for forest directories: https://github.com/marklogic-community/ml-gradle/wiki/Property-reference#database-and-forest-properties

I think we'd need a double-delimited string here, e.g.:

# A single value that is split into a sequence of paths
mlForestDataDirectory=path1,path2,path3
# A sequence of values that would become a double-delimited "map" of database name to paths
mlDatabaseDataDirectories=database1,path1|path2|path3,database2,path1
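
A sketch of how the two proposed formats could be parsed, written in Python for brevity. The function names are hypothetical; the actual parsing would live in ml-app-deployer, which is Java.

```python
def parse_forest_data_directory(value):
    """Split a comma-delimited value such as 'path1,path2,path3' into paths."""
    return [p.strip() for p in value.split(",") if p.strip()]


def parse_database_data_directories(value):
    """Parse 'db1,p1|p2|p3,db2,p1' into {'db1': ['p1', 'p2', 'p3'], 'db2': ['p1']}.

    Commas separate alternating database names and path groups;
    '|' separates the individual paths within a group.
    """
    tokens = [t.strip() for t in value.split(",")]
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating database,paths pairs")
    result = {}
    for database, paths in zip(tokens[0::2], tokens[1::2]):
        result[database] = [p for p in paths.split("|") if p]
    return result


if __name__ == "__main__":
    print(parse_forest_data_directory("path1,path2,path3"))
    print(parse_database_data_directories("database1,path1|path2|path3,database2,path1"))
```

Because `,` and `|` act as delimiters, a path containing either character could not be expressed in this format.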

rjrudin (Contributor) commented Apr 10, 2018

@paul-hoehne Do you envision that the mount points would be the same on each host?

rjrudin (Contributor) commented Apr 11, 2018

@paul-hoehne Assuming 3 hosts with the 3 mount directories you specified, 2 forests per directory per host (so 18 primary forests), and 2 replicas per primary (so 36 replicas), does this look like the correct list of 54 forests? I think it fits your requirements.

host1:/mldata1:Documents-1
host2:/mldata2:Documents-1-replica-1
host3:/mldata3:Documents-1-replica-2

host1:/mldata1:Documents-2
host2:/mldata2:Documents-2-replica-1
host3:/mldata3:Documents-2-replica-2

host1:/mldata2:Documents-3
host2:/mldata3:Documents-3-replica-1
host3:/mldata1:Documents-3-replica-2

host1:/mldata2:Documents-4
host2:/mldata3:Documents-4-replica-1
host3:/mldata1:Documents-4-replica-2

host1:/mldata3:Documents-5
host2:/mldata1:Documents-5-replica-1
host3:/mldata2:Documents-5-replica-2

host1:/mldata3:Documents-6
host2:/mldata1:Documents-6-replica-1
host3:/mldata2:Documents-6-replica-2

host2:/mldata1:Documents-7
host3:/mldata2:Documents-7-replica-1
host1:/mldata3:Documents-7-replica-2

host2:/mldata1:Documents-8
host3:/mldata2:Documents-8-replica-1
host1:/mldata3:Documents-8-replica-2

host2:/mldata2:Documents-9
host3:/mldata3:Documents-9-replica-1
host1:/mldata1:Documents-9-replica-2

host2:/mldata2:Documents-10
host3:/mldata3:Documents-10-replica-1
host1:/mldata1:Documents-10-replica-2

host2:/mldata3:Documents-11
host3:/mldata1:Documents-11-replica-1
host1:/mldata2:Documents-11-replica-2

host2:/mldata3:Documents-12
host3:/mldata1:Documents-12-replica-1
host1:/mldata2:Documents-12-replica-2

host3:/mldata1:Documents-13
host1:/mldata2:Documents-13-replica-1
host2:/mldata3:Documents-13-replica-2

host3:/mldata1:Documents-14
host1:/mldata2:Documents-14-replica-1
host2:/mldata3:Documents-14-replica-2

host3:/mldata2:Documents-15
host1:/mldata3:Documents-15-replica-1
host2:/mldata1:Documents-15-replica-2

host3:/mldata2:Documents-16
host1:/mldata3:Documents-16-replica-1
host2:/mldata1:Documents-16-replica-2

host3:/mldata3:Documents-17
host1:/mldata1:Documents-17-replica-1
host2:/mldata2:Documents-17-replica-2

host3:/mldata3:Documents-18
host1:/mldata1:Documents-18-replica-1
host2:/mldata2:Documents-18-replica-2
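
The listing above follows a simple rotation: the k-th replica of a primary on host h and mount m is placed on host (h + k) mod H and mount (m + k) mod M. A hypothetical Python sketch that regenerates the listing under that assumption (`assign_forests` is an illustrative name, not ml-app-deployer's API):

```python
def assign_forests(hosts, mounts, forests_per_mount, replicas, prefix="Documents"):
    """Return (host, mount, forest_name) tuples for every primary and replica.

    Primaries are numbered host by host, filling each mount in turn. The k-th
    replica of a primary on hosts[h], mounts[m] is rotated k steps along both
    lists: hosts[(h + k) % len(hosts)], mounts[(m + k) % len(mounts)].
    """
    if replicas >= len(hosts):
        # Otherwise the rotation would wrap a replica back onto its primary's host.
        raise ValueError("need more hosts than replicas per primary")
    result = []
    n = 1
    for h, host in enumerate(hosts):
        for m, mount in enumerate(mounts):
            for _ in range(forests_per_mount):
                name = f"{prefix}-{n}"
                result.append((host, mount, name))
                for k in range(1, replicas + 1):
                    result.append((hosts[(h + k) % len(hosts)],
                                   mounts[(m + k) % len(mounts)],
                                   f"{name}-replica-{k}"))
                n += 1
    return result


if __name__ == "__main__":
    for host, mount, name in assign_forests(
            ["host1", "host2", "host3"], ["/mldata1", "/mldata2", "/mldata3"], 2, 2):
        print(f"{host}:{mount}:{name}")
```

With 3 hosts, 3 mounts, and 2 replicas, every mount on every host ends up with 6 forests (2 primaries and 4 replicas). With unequal host and mount counts, mount balance would need to be re-checked rather than assumed.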

paul-hoehne (Author)

+1

rjrudin self-assigned this on May 23, 2018
rjrudin (Contributor) commented May 30, 2018

This will be supported by the mlDatabaseDataDirectories property, which now supports multiple data directories per host.

rjrudin closed this as completed on May 30, 2018