# MARL Scheduling Environment

## Getting started

To train agents in the domain, run any of the `train*.py` scripts in the `src` folder. The parameters explained below can be customized.

## Environment parameters

| Parameter | Type | Description |
| --- | --- | --- |
| numberOfCores | Int | number of compute cores in the domain |
| numberOfAgents | Int | number of RL agents in the domain |
| collectionLength | Int | number of job slots associated with each agent |
| possibleJobPriorities | [Int] | job priority for each job type; the list index is the job type |
| possibleJobLengths | [Int] | job length for each job type; the list index is the job type |
| fixPricesList | [Int] | fixed price for each job type; the list index is the job type |
| probabilities | [Float] | spawn probability for each job type; the list index is the job type; the probabilities must sum to 1 |
| newJobsPerRoundPerAgent | Int | how many new jobs are generated at the beginning of each round for an agent that has free capacity in its job collection |
| freePrices | Bool | whether free prices are used |
| commercialFreePriceReward | Bool | whether the commercial or the non-commercial reward function is used when free prices are enabled |
| dividedAgents | Bool | whether all agents use the distributed architecture (the parameter name differs because of a later renaming) |
| aggregatedAgents | Bool | whether all agents use the semi-aggregated architecture |
| fullyAggregatedAgents | Bool | whether all agents use the fully aggregated architecture |
| locallySharedParameters | Bool | whether all agents use the distributed architecture with agent-wise parameter sharing |
| globallySharedParameters | Bool | whether all agents use the distributed architecture with global parameter sharing, i.e. all agents share one neural network |
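
As a rough illustration, the environment parameters could be collected as shown below. This is a minimal sketch that simply uses the names from the table above; the actual `train*.py` scripts may define and pass these values differently.

```python
# Hypothetical configuration using the parameter names from the table above.
# The actual train*.py scripts may structure these values differently.

env_config = {
    "numberOfCores": 4,                # compute cores in the domain
    "numberOfAgents": 4,               # RL agents in the domain
    "collectionLength": 3,             # job slots per agent
    "possibleJobPriorities": [3, 10],  # list index = job type
    "possibleJobLengths": [6, 3],      # list index = job type
    "fixPricesList": [2, 7],           # fixed price per job type (used when freePrices is False)
    "probabilities": [0.8, 0.2],       # spawn probability per job type, must sum to 1
    "newJobsPerRoundPerAgent": 1,      # new jobs per round for an agent with free job slots
    "freePrices": False,               # fixed prices in this example
    "commercialFreePriceReward": False,
    "dividedAgents": True,             # all agents use the distributed architecture
    "aggregatedAgents": False,
    "fullyAggregatedAgents": False,
    "locallySharedParameters": False,
    "globallySharedParameters": False,
}

# Sanity checks implied by the parameter descriptions.
assert abs(sum(env_config["probabilities"]) - 1.0) < 1e-9
assert (len(env_config["possibleJobPriorities"])
        == len(env_config["possibleJobLengths"])
        == len(env_config["fixPricesList"])
        == len(env_config["probabilities"]))
```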

## PPO hyperparameters

| Parameter | Type | Description |
| --- | --- | --- |
| IS_PPO | Bool | whether PPO is used |
| LR_ACTOR | Float | initial learning rate of the actor network |
| LR_CRITIC | Float | initial learning rate of the critic network |
| ACCEPTOR_GAMMA | Float | discount factor for future rewards of the acceptor unit |
| OFFER_GAMMA | Float | discount factor for future rewards of the offer unit |
| RAW_K_EPOCHS | Int | determines the number of iterations for which the acceptor and offer units' memories are used to optimize their neural networks |
| ACCEPTOR_K_EPOCHS, OFFER_K_EPOCHS | Int | not specified by the user but derived from RAW_K_EPOCHS |
| CENTRALISATION_SAMPLE | Int | for parameter sharing, the number of randomly selected subunit memories included in a training run |
| EPS_CLIP | Float | value of the PPO clipping parameter |
| UPDATE_STEP | Int | number of time steps after which the neural networks are trained on the transitions experienced during that period |
| NUM_NEURONS | Int | number of neurons per hidden layer of one neural network |
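
For orientation, the sketch below shows where `EPS_CLIP` and a discount factor such as `ACCEPTOR_GAMMA` or `OFFER_GAMMA` enter a standard PPO-clip update. It uses the common textbook formulation with PyTorch and is not copied from this repository; the helper names are made up for illustration.

```python
# Illustrative only: the standard PPO-clip objective and discounted returns,
# showing where EPS_CLIP and a discount factor (ACCEPTOR_GAMMA / OFFER_GAMMA)
# enter the update. Not taken from this repository's code.
import torch

EPS_CLIP = 0.2

def ppo_clip_loss(new_logprobs, old_logprobs, advantages):
    """Clipped surrogate objective (to be minimized) for one batch of transitions."""
    ratios = torch.exp(new_logprobs - old_logprobs)  # pi_new(a|s) / pi_old(a|s)
    surr1 = ratios * advantages
    surr2 = torch.clamp(ratios, 1 - EPS_CLIP, 1 + EPS_CLIP) * advantages
    return -torch.min(surr1, surr2).mean()

def discounted_returns(rewards, gamma):
    """Monte Carlo returns with discount factor gamma (e.g. ACCEPTOR_GAMMA or OFFER_GAMMA)."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.insert(0, running)
    return torch.tensor(returns)
```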

## Parameters of the experiments

The section 'The effect of intra-agent trading' uses the same hyperparameters as the section 'Agent architecture and scheduling performance'.

### Section: Agent architecture and scheduling performance

| Parameter | 2 agents | 4 agents |
| --- | --- | --- |
| freePrices | False | False |
| num_episodes | 6000 | 6000 |
| episodeLength | 100 | 100 |
| numberOfAgents | 2 | 4 |
| numberOfCores | 2 | 4 |
| newJobsPerRoundPerAgent | 1 | 1 |
| collectionLength | 3 | 3 |
| possibleJobPriorities | [3,10] | [3,10] |
| possibleJobLengths | [6,3] | [6,3] |
| fixPricesList | [2,7] | [2,7] |
| probabilities | [0.8,0.2] | [0.8,0.2] |

| Parameter | distributed | semi-aggregated | fully aggregated | distributed + local parameter sharing (2 agents) | distributed + local parameter sharing (4 agents) |
| --- | --- | --- | --- | --- | --- |
| LR_ACTOR | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 |
| LR_CRITIC | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
| ACCEPTOR_GAMMA | 0.8733 | 0.8733 | 0.8733 | 0.8733 | 0.8733 |
| OFFER_GAMMA | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| RAW_K_EPOCHS | 3 | 3 | 3 | 3 | 3 |
| ACCEPTOR_K_EPOCHS | 3 | 3 | 3 | 2 | 1 |
| OFFER_K_EPOCHS | 3 | 3 | 3 | 1 | 1 |
| EPS_CLIP | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| UPDATE_STEP | 200 | 200 | 200 | 200 | 200 |
| NUM_NEURONS | 16 | 32 | 64 | 16 | 16 |
| CENTRALISATION_SAMPLE | / | / | / | 2 | 2 |
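
Taken together, the two tables above specify one training run. Purely as a hypothetical summary (the actual `train*.py` scripts may wire these values together differently), the 2-agent distributed run corresponds roughly to:

```python
# Hypothetical summary of the 2-agent distributed run from the tables above.
# The actual scripts may organize these values differently.

num_episodes = 6000
episodeLength = 100

env_config = {
    "freePrices": False,
    "numberOfAgents": 2,
    "numberOfCores": 2,
    "newJobsPerRoundPerAgent": 1,
    "collectionLength": 3,
    "possibleJobPriorities": [3, 10],
    "possibleJobLengths": [6, 3],
    "fixPricesList": [2, 7],
    "probabilities": [0.8, 0.2],
    "dividedAgents": True,  # distributed architecture
}

ppo_config = {
    "IS_PPO": True,
    "LR_ACTOR": 0.003,
    "LR_CRITIC": 0.01,
    "ACCEPTOR_GAMMA": 0.8733,
    "OFFER_GAMMA": 0.5,
    "RAW_K_EPOCHS": 3,
    "EPS_CLIP": 0.2,
    "UPDATE_STEP": 200,
    "NUM_NEURONS": 16,
}
```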

### Section: Price level and scarcity

| Parameter | 2 cores | 4 cores |
| --- | --- | --- |
| freePrices | True | True |
| num_episodes | 4000 | 4000 |
| episodeLength | 100 | 100 |
| numberOfAgents | 2 | 2 |
| numberOfCores | 2 | 4 |
| newJobsPerRoundPerAgent | 1 | 1 |
| collectionLength | 3 | 3 |
| possibleJobPriorities | [5] | [5] |
| possibleJobLengths | [5] | [5] |
| probabilities | [1] | [1] |

| Parameter | distributed with price setter network |
| --- | --- |
| LR_ACTOR | 0.003 |
| LR_CRITIC | 0.01 |
| ACCEPTOR_GAMMA | 0.95 |
| OFFER_GAMMA | 0.5 |
| RAW_K_EPOCHS | 2 |
| ACCEPTOR_K_EPOCHS | 2 |
| OFFER_K_EPOCHS | 2 |
| EPS_CLIP | 0.2 |
| UPDATE_STEP | 200 |
| NUM_NEURONS | 16 |

### Section: Price level and scheduling

| Parameter | Value |
| --- | --- |
| freePrices | True |
| num_episodes | 4000 |
| episodeLength | 100 |
| numberOfAgents | 2 |
| numberOfCores | 3 |
| newJobsPerRoundPerAgent | 1 |
| collectionLength | 3 |
| possibleJobPriorities | [2,4,8] |
| possibleJobLengths | [5,5,5] |
| probabilities | [(1/3),(1/3),(1/3)] |

| Parameter | distributed with price setter network |
| --- | --- |
| LR_ACTOR | 0.003 |
| LR_CRITIC | 0.01 |
| ACCEPTOR_GAMMA | 0.95 |
| OFFER_GAMMA | 0.5 |
| RAW_K_EPOCHS | 2 |
| ACCEPTOR_K_EPOCHS | 2 |
| OFFER_K_EPOCHS | 2 |
| EPS_CLIP | 0.2 |
| UPDATE_STEP | 200 |
| NUM_NEURONS | 16 |

## Used code templates

The implementation of the reinforcement learning algorithms builds on freely available code templates, which should not go unmentioned: the PPO implementation was originally taken from PPO and adapted to the requirements of the project, and the DQN implementation was likewise based on a public repository template and adapted to the project.

## Loop

The image below gives an overview of the process in the scheduling environment.
