ddpg
class rl.agents.ddpg.DDPGAgent(actor, critic, memory, gamma=0.99, batch_size=32, train_interval=1, memory_interval=1, critic_gradient_clip=100, random_process=None, custom_model_objects=None, warmup_actor_steps=200, warmup_critic_steps=200, invert_gradients=False, gradient_inverter_min=-1.0, gradient_inverter_max=1.0, actor_reset_threshold=0.3, reset_controlers=False, actor_learning_rate=0.001, critic_learning_rate=0.0001, target_critic_update=0.01, target_actor_update=0.01, critic_regularization=0.01, **kwargs)

Deep Deterministic Policy Gradient agent as defined in https://arxiv.org/abs/1509.02971.
Parameters:
- actor (keras.Model) – The actor network
- critic (keras.Model) – The critic network
- env (gym.Env) – The gym environment
- memory (rl.memory.Memory) – The memory object
- gamma (float) – Discount factor
- batch_size (int) – Size of the minibatches
- train_interval (int) – Train only at multiples of this number of steps
- memory_interval (int) – Add experiences to memory only at multiples of this number of steps
- critic_gradient_clip – Delta at which the critic's temporal-difference error is clipped via the Huber loss (see https://github.com/devsisters/DQN-tensorflow/issues/16)
- random_process – The noise process used for exploration
- custom_model_objects – Custom Keras objects needed to clone the actor and critic models
- target_critic_update (float) – Target critic update factor
- target_actor_update (float) – Target actor update factor
- invert_gradients (bool) – Use gradient inverting as defined in https://arxiv.org/abs/1511.04143; see the construction sketch below for setting the inverter bounds
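
Below is a minimal construction sketch. Only the DDPGAgent signature above is taken from the documented interface; the network architectures, the SequentialMemory class, and the OrnsteinUhlenbeckProcess noise are assumptions borrowed from keras-rl conventions and may need adapting to this package.

import gym
from keras.models import Model
from keras.layers import Input, Dense, Concatenate

from rl.agents.ddpg import DDPGAgent
from rl.memory import SequentialMemory          # assumed keras-rl-style memory
from rl.random import OrnsteinUhlenbeckProcess  # assumed keras-rl-style noise

env = gym.make('Pendulum-v0')
nb_obs = env.observation_space.shape[0]
nb_actions = env.action_space.shape[0]

# Actor: maps an observation to a deterministic action in [-1, 1].
obs_in = Input(shape=(nb_obs,))
x = Dense(64, activation='relu')(obs_in)
x = Dense(64, activation='relu')(x)
actor = Model(inputs=obs_in, outputs=Dense(nb_actions, activation='tanh')(x))

# Critic: maps an (observation, action) pair to a scalar Q-value.
act_in = Input(shape=(nb_actions,))
y = Concatenate()([obs_in, act_in])
y = Dense(64, activation='relu')(y)
y = Dense(64, activation='relu')(y)
critic = Model(inputs=[obs_in, act_in], outputs=Dense(1, activation='linear')(y))

agent = DDPGAgent(actor=actor, critic=critic,
                  memory=SequentialMemory(limit=100000, window_length=1),
                  random_process=OrnsteinUhlenbeckProcess(size=nb_actions,
                                                          theta=0.15, sigma=0.3),
                  invert_gradients=True,
                  gradient_inverter_min=float(env.action_space.low[0]),
                  gradient_inverter_max=float(env.action_space.high[0]))

Setting gradient_inverter_min/gradient_inverter_max to the environment's action bounds keeps the inverted gradients consistent with the range of actions the actor can actually emit.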
backward_offline(train_actor=True, train_critic=True)

Offline backward method of the DDPG agent.

Parameters:
- train_actor (bool) – Whether to update the actor network during this step
- train_critic (bool) – Whether to update the critic network during this step
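
A hedged usage sketch for backward_offline, continuing from the agent built above and assuming a replay memory that has already been filled with transitions; the loop structure and step counts are illustrative, not part of the documented API.

# Warm up the critic alone before letting the actor train, mirroring the
# warmup_actor_steps default of 200; memory is assumed to be pre-filled.
for step in range(200):
    agent.backward_offline(train_actor=False, train_critic=True)

# Then update both networks on each offline step.
for step in range(10000):
    agent.backward_offline(train_actor=True, train_critic=True)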