ddpg

class rl.agents.ddpg.DDPGAgent(actor, critic, memory, gamma=0.99, batch_size=32, train_interval=1, memory_interval=1, critic_gradient_clip=100, random_process=None, custom_model_objects=None, warmup_actor_steps=200, warmup_critic_steps=200, invert_gradients=False, gradient_inverter_min=-1.0, gradient_inverter_max=1.0, actor_reset_threshold=0.3, reset_controlers=False, actor_learning_rate=0.001, critic_learning_rate=0.0001, target_critic_update=0.01, target_actor_update=0.01, critic_regularization=0.01, **kwargs)[source]

Deep Deterministic Policy Gradient Agent as defined in https://arxiv.org/abs/1509.02971. A construction sketch follows the parameter list.

Parameters:
  • actor (keras.Model) – The actor network
  • critic (keras.Model) – The critic network
  • memory (rl.memory.Memory) – The memory object
  • gamma (float) – Discount factor
  • batch_size (int) – Size of the minibatches
  • train_interval (int) – Train only at multiples of this number
  • memory_interval (int) – Add experiences to memory only at multiples of this number
  • critic_gradient_clip (float) – Delta at which the critic's error is clipped (via a Huber loss, see https://github.com/devsisters/DQN-tensorflow/issues/16)
  • random_process – The noise used to perform exploration
  • custom_model_objects (dict) – Custom Keras objects needed to rebuild the models (e.g. custom layers)
  • actor_learning_rate (float) – Learning rate of the actor optimizer
  • critic_learning_rate (float) – Learning rate of the critic optimizer
  • target_critic_update (float) – Target critic update factor
  • target_actor_update (float) – Target actor update factor
  • critic_regularization (float) – Weight regularization factor applied to the critic
  • invert_gradients (bool) – Use gradient inverting as defined in https://arxiv.org/abs/1511.04143
  • gradient_inverter_min (float) – Lower action bound used when inverting gradients
  • gradient_inverter_max (float) – Upper action bound used when inverting gradients
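
A minimal construction sketch. The memory and noise classes follow keras-rl conventions (rl.memory.SequentialMemory, rl.random.OrnsteinUhlenbeckProcess) and are assumed to be available in this fork; the actor/critic topologies, and in particular how the critic receives the action, are placeholders:

    from keras.layers import Concatenate, Dense, Input
    from keras.models import Model

    from rl.agents.ddpg import DDPGAgent
    from rl.memory import SequentialMemory            # assumed, as in keras-rl
    from rl.random import OrnsteinUhlenbeckProcess    # assumed, as in keras-rl

    obs_dim, nb_actions = 3, 1

    # Actor: observation -> tanh-bounded action.
    obs_in = Input(shape=(obs_dim,))
    h = Dense(32, activation='relu')(obs_in)
    actor = Model(inputs=obs_in, outputs=Dense(nb_actions, activation='tanh')(h))

    # Critic: (observation, action) -> Q-value.
    action_in = Input(shape=(nb_actions,))
    q = Concatenate()([obs_in, action_in])
    q = Dense(32, activation='relu')(q)
    critic = Model(inputs=[obs_in, action_in], outputs=Dense(1, activation='linear')(q))

    agent = DDPGAgent(
        actor=actor,
        critic=critic,
        memory=SequentialMemory(limit=100000, window_length=1),
        gamma=0.99,
        batch_size=32,
        random_process=OrnsteinUhlenbeckProcess(size=nb_actions, theta=0.15, sigma=0.3),
        invert_gradients=True,            # gradient inverting per arXiv:1511.04143
        gradient_inverter_min=-1.0,       # matches the tanh action bounds
        gradient_inverter_max=1.0,
    )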
backward()[source]

Backward method of the DDPG agent: stores the latest experience in memory and trains the networks, subject to memory_interval and train_interval

backward_offline(train_actor=True, train_critic=True)[source]

Offline backward method of the DDPG agent: trains the networks from experiences already in memory (a usage sketch follows the parameter list)

Parameters:
  • train_actor (bool) – Activate or deactivate training of the actor
  • train_critic (bool) – Activate or deactivate training of the critic
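
For example, to warm up the critic on previously collected experience before training the actor (a sketch; agent is a constructed DDPGAgent with a populated memory):

    # Pre-train the critic alone from the replay memory, then train both.
    for _ in range(500):
        agent.backward_offline(train_actor=False, train_critic=True)
    for _ in range(500):
        agent.backward_offline(train_actor=True, train_critic=True)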
checkpoint()[source]

Save the current actor and critic weights as a checkpoint

load_memory(memory)[source]

Loads the given memory as the replay buffer
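
This is useful for seeding the agent with transitions collected elsewhere (a sketch; pickle is just one way the memory object might have been persisted):

    import pickle

    # Replace the agent's replay buffer with a previously collected one.
    with open('demonstrations.pkl', 'rb') as f:    # hypothetical file
        demo_memory = pickle.load(f)
    agent.load_memory(demo_memory)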

restore_checkpoint(actor=True, critic=True, checkpoint_id=0)[source]

Restore the actor and/or critic weights from a saved checkpoint

Parameters:
  • actor (bool) – Restore the actor weights
  • critic (bool) – Restore the critic weights
  • checkpoint_id (int) – Identifier of the checkpoint to restore
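
checkpoint() and restore_checkpoint() pair up as follows (a sketch; where checkpoints are written and how checkpoint_id maps to them are implementation details):

    agent.checkpoint()  # snapshot the current actor and critic weights

    # ... training continues and, say, the actor diverges ...

    # Roll back only the actor, keeping the current critic.
    agent.restore_checkpoint(actor=True, critic=False, checkpoint_id=0)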

save(name='DDPG')[source]

Save the model as an HDF5 file

Parameters:
  • name (str) – Name used for the saved HDF5 file
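
For instance (the exact file name(s) derived from name are an implementation detail):

    agent.save(name='ddpg_pendulum')  # writes the model to disk as HDF5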

train_actor(batch, sgd_iterations=1, can_reset_actor=False)[source]

Fit the actor network

Parameters:
  • batch – Minibatch of experiences to train on
  • sgd_iterations (int) – Number of gradient steps to perform on the batch
  • can_reset_actor (bool) – Whether the actor weights may be reset (see actor_reset_threshold)
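
A direct call looks like this (a sketch; sampling via memory.sample() follows the keras-rl convention, and the expected batch format may differ in this implementation):

    # Take several gradient steps on the actor for a single sampled minibatch.
    batch = agent.memory.sample(32)    # keras-rl style sampling (assumed)
    agent.train_actor(batch, sgd_iterations=5, can_reset_actor=False)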

train_controllers(train_critic=True, train_actor=True, can_reset_actor=False, hard_update_target_critic=False, hard_update_target_actor=False)[source]

Fit the actor and critic networks and update their targets (a usage sketch follows the parameter list)

Parameters:
  • train_critic (bool) – Whether to fit the critic
  • train_actor (bool) – Whether to fit the actor
  • can_reset_actor (bool) – Whether the actor weights may be reset (see actor_reset_threshold)
  • hard_update_target_critic (bool) – Copy the critic weights into the target critic instead of a soft update
  • hard_update_target_actor (bool) – Copy the actor weights into the target actor instead of a soft update
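
For example, one combined update that also hard-syncs both target networks (a sketch):

    agent.train_controllers(
        train_critic=True,
        train_actor=True,
        can_reset_actor=False,
        hard_update_target_critic=True,   # copy weights instead of the soft target_critic_update
        hard_update_target_actor=True,    # copy weights instead of the soft target_actor_update
    )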
train_critic(batch, sgd_iterations=1)[source]

Fit the critic network

Parameters:
  • batch – Minibatch of experiences to train on
  • sgd_iterations (int) – Number of gradient steps to perform on the batch
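
Analogously to train_actor() (same caveat about the batch format):

    # Fit the critic for a few SGD iterations on one minibatch.
    batch = agent.memory.sample(32)    # keras-rl style sampling (assumed)
    agent.train_critic(batch, sgd_iterations=3)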