ddpg
class rl.agents.ddpg.DDPGAgent(actor, critic, memory, gamma=0.99, batch_size=32, train_interval=1, memory_interval=1, critic_gradient_clip=100, random_process=None, custom_model_objects=None, warmup_actor_steps=200, warmup_critic_steps=200, invert_gradients=False, gradient_inverter_min=-1.0, gradient_inverter_max=1.0, actor_reset_threshold=0.3, reset_controlers=False, actor_learning_rate=0.001, critic_learning_rate=0.0001, target_critic_update=0.01, target_actor_update=0.01, critic_regularization=0.01, **kwargs)

Deep Deterministic Policy Gradient agent as defined in https://arxiv.org/abs/1509.02971.
Parameters:
- actor (keras.Model) – The actor network
- critic (keras.Model) – The critic network
- env (gym.Env) – The gym environment
- memory (rl.memory.Memory) – The memory object
- gamma (float) – Discount factor
- batch_size (int) – Size of the minibatches
- train_interval (int) – Train only at multiples of this number of steps
- memory_interval (int) – Add experiences to memory only at multiples of this number of steps
- critic_gradient_clip – Delta at which the critic's temporal-difference error is clipped via the Huber loss (see https://github.com/devsisters/DQN-tensorflow/issues/16)
- random_process – The noise process used for exploration
- custom_model_objects – Custom Keras objects needed to clone the actor and critic models
- target_critic_update (float) – Target critic update factor
- target_actor_update (float) – Target actor update factor
- invert_gradients (bool) – Use gradient inverting as defined in https://arxiv.org/abs/1511.04143; see the construction sketch below for setting the inverter bounds
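
Below is a minimal construction sketch. Only the DDPGAgent signature above is taken from the documented interface; the network architectures, the SequentialMemory class, and the OrnsteinUhlenbeckProcess noise are assumptions borrowed from keras-rl conventions and may need adapting to this package.

import gym
from keras.models import Model
from keras.layers import Input, Dense, Concatenate

from rl.agents.ddpg import DDPGAgent
from rl.memory import SequentialMemory          # assumed keras-rl-style memory
from rl.random import OrnsteinUhlenbeckProcess  # assumed keras-rl-style noise

env = gym.make('Pendulum-v0')
nb_obs = env.observation_space.shape[0]
nb_actions = env.action_space.shape[0]

# Actor: maps an observation to a deterministic action in [-1, 1].
obs_in = Input(shape=(nb_obs,))
x = Dense(64, activation='relu')(obs_in)
x = Dense(64, activation='relu')(x)
actor = Model(inputs=obs_in, outputs=Dense(nb_actions, activation='tanh')(x))

# Critic: maps an (observation, action) pair to a scalar Q-value.
act_in = Input(shape=(nb_actions,))
y = Concatenate()([obs_in, act_in])
y = Dense(64, activation='relu')(y)
y = Dense(64, activation='relu')(y)
critic = Model(inputs=[obs_in, act_in], outputs=Dense(1, activation='linear')(y))

agent = DDPGAgent(actor=actor, critic=critic,
                  memory=SequentialMemory(limit=100000, window_length=1),
                  random_process=OrnsteinUhlenbeckProcess(size=nb_actions,
                                                          theta=0.15, sigma=0.3),
                  invert_gradients=True,
                  gradient_inverter_min=float(env.action_space.low[0]),
                  gradient_inverter_max=float(env.action_space.high[0]))

Setting gradient_inverter_min/gradient_inverter_max to the environment's action bounds keeps the inverted gradients consistent with the range of actions the actor can actually emit.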
backward_offline(train_actor=True, train_critic=True)

Offline backward method of the DDPG agent.

Parameters:
- train_actor (bool) – Whether to update the actor network during this step
- train_critic (bool) – Whether to update the critic network during this step
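
A hedged usage sketch for backward_offline, continuing from the agent built above and assuming a replay memory that has already been filled with transitions; the loop structure and step counts are illustrative, not part of the documented API.

# Warm up the critic alone before letting the actor train, mirroring the
# warmup_actor_steps default of 200; memory is assumed to be pre-filled.
for step in range(200):
    agent.backward_offline(train_actor=False, train_critic=True)

# Then update both networks on each offline step.
for step in range(10000):
    agent.backward_offline(train_actor=True, train_critic=True)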