Distributed Coach - Horizontal Scale-Out

Coach supports the horizontal scale-out of rollout workers through the --distributed_coach (or -dc) option. Coach uses three interfaces for horizontal scale-out, which allows for flexibility and for integration with different technologies. These three interfaces are the orchestrator, the memory backend and the data store.

  • Orchestrator - The orchestrator interface provides basic interaction points for orchestration, scheduling and resource management of training and rollout workers in the distributed Coach mode. The interaction points define how Coach should deploy, undeploy and monitor the workers it spawns.

  • Memory Backend - This interface is used as the backing store or stream for the memory abstraction in distributed Coach. The implementation of this module is mainly used for communicating experiences (transitions and episodes) from the rollout workers to the training worker.

  • Data Store - This interface is used as a backing store for the policy checkpoints. It is mainly used to synchronize policy checkpoints from the training worker to the rollout workers.

[Figure: horizontal-scale-out.png]
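
The division of responsibilities between the three interfaces can be illustrated with a minimal Python sketch. Every class and method name below (Orchestrator, MemoryBackend, DataStore, deploy_worker, fetch, load_latest and the in-memory implementations) is hypothetical and chosen for illustration only; it is not Coach's actual API, and a real deployment would back each interface with an external technology rather than in-process objects.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Any, Deque, Dict, List, Optional, Tuple


class Orchestrator(ABC):
    """Hypothetical interface: deploy, undeploy and monitor workers."""

    @abstractmethod
    def deploy_worker(self, name: str) -> None: ...

    @abstractmethod
    def undeploy(self) -> None: ...

    @abstractmethod
    def running_workers(self) -> List[str]: ...


class MemoryBackend(ABC):
    """Hypothetical interface: carry experiences from rollout to training."""

    @abstractmethod
    def store(self, transition: Any) -> None: ...

    @abstractmethod
    def fetch(self, max_items: int) -> List[Any]: ...


class DataStore(ABC):
    """Hypothetical interface: carry policy checkpoints from training to rollout."""

    @abstractmethod
    def save_checkpoint(self, version: int, weights: Any) -> None: ...

    @abstractmethod
    def load_latest(self) -> Optional[Tuple[int, Any]]: ...


class LocalOrchestrator(Orchestrator):
    """Toy orchestrator that only records worker names in a list."""

    def __init__(self) -> None:
        self._workers: List[str] = []

    def deploy_worker(self, name: str) -> None:
        self._workers.append(name)

    def undeploy(self) -> None:
        self._workers.clear()

    def running_workers(self) -> List[str]:
        return list(self._workers)


class QueueMemoryBackend(MemoryBackend):
    """Toy backend: a FIFO queue standing in for an external stream."""

    def __init__(self) -> None:
        self._queue: Deque[Any] = deque()

    def store(self, transition: Any) -> None:
        self._queue.append(transition)

    def fetch(self, max_items: int) -> List[Any]:
        count = min(max_items, len(self._queue))
        return [self._queue.popleft() for _ in range(count)]


class DictDataStore(DataStore):
    """Toy data store: checkpoints kept in a dict keyed by version."""

    def __init__(self) -> None:
        self._checkpoints: Dict[int, Any] = {}

    def save_checkpoint(self, version: int, weights: Any) -> None:
        self._checkpoints[version] = weights

    def load_latest(self) -> Optional[Tuple[int, Any]]:
        if not self._checkpoints:
            return None
        version = max(self._checkpoints)
        return version, self._checkpoints[version]
```

In this sketch, a rollout worker would call store() on the memory backend and load_latest() on the data store, while the training worker would call fetch() and save_checkpoint(). The two data paths run in opposite directions, which is why they are separate interfaces.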

Supported Synchronization Types

Synchronization type refers to the mechanism by which the policy checkpoints are synchronized from the training worker to the rollout workers. For each algorithm, it is specified by using the DistributedCoachSynchronizationType as a part of agent_params.algorithm.distributed_coach_synchronization_type in the preset. In distributed Coach, two types of synchronization modes are supported: SYNC and ASYNC.

  • SYNC - In this type, the trainer waits for all the experiences to be gathered from the distributed rollout workers before training a new policy, and the rollout workers wait for a new policy before gathering experiences. It is suitable for on-policy algorithms.

  • ASYNC - In this type, the trainer doesn’t wait for any set of experiences to be gathered from the distributed rollout workers, and the rollout workers continuously gather experiences, loading new policies whenever they become available. It is suitable for off-policy algorithms.
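
The difference between the two modes can be seen in a toy, single-process simulation. This is only a sketch under stated assumptions: SyncType is a stand-in enum for DistributedCoachSynchronizationType (which in a real preset is assigned to agent_params.algorithm.distributed_coach_synchronization_type), the simulate function and its parameters are hypothetical, and the ASYNC branch assumes, for simplicity, that the trainer publishes a new policy after every single experience.

```python
from enum import Enum
from typing import List, Tuple


class SyncType(Enum):
    """Stand-in for DistributedCoachSynchronizationType (illustration only)."""
    SYNC = "sync"
    ASYNC = "async"


def simulate(sync_type: SyncType,
             num_workers: int = 2,
             steps_per_worker: int = 2,
             rounds: int = 2) -> List[Tuple[int, int]]:
    """Return (worker_id, policy_version) for each gathered experience."""
    policy_version = 0
    log: List[Tuple[int, int]] = []
    for _ in range(rounds):
        if sync_type is SyncType.SYNC:
            # Every worker gathers its experiences with the same policy.
            for worker in range(num_workers):
                for _ in range(steps_per_worker):
                    log.append((worker, policy_version))
            # The trainer waits for the full batch, then trains once.
            policy_version += 1
        else:
            # Workers never wait: each step uses the newest policy available.
            for _ in range(steps_per_worker):
                for worker in range(num_workers):
                    log.append((worker, policy_version))
                    # Toy assumption: the trainer publishes a new policy
                    # after every experience it receives.
                    policy_version += 1
    return log


sync_versions = [version for _, version in simulate(SyncType.SYNC)]
async_versions = [version for _, version in simulate(SyncType.ASYNC)]
```

With SYNC, every experience in a batch carries the same policy version, so the trainer only ever sees data from the policy it is about to update, which is what on-policy algorithms require. With ASYNC, the experiences come from a mix of older policy versions, which off-policy algorithms can tolerate.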