Note: the relevant code for this post is in the ddpg-ant repo.

Intro

So a couple of months ago I had an idea to test different types of exploration noise with DDPG. The original DDPG paper suggests using an Ornstein–Uhlenbeck process for exploration. I’d read elsewhere that this wasn’t actually needed and that Gaussian noise would suffice. I wanted to see if using smoother noise helped the critic to learn in a more stable manner. Spoiler alert: it probably doesn’t. Note that this is purely an exploratory effort, not an exhaustive study.

I compared four noise schemes:

Gaussian noise
Ornstein–Uhlenbeck process
Linear segment noise
Smooth segment noise

Linear segment noise: Selects an ordered set of normally distributed points in the action space and then moves from one to the next in constant increments.
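As a rough illustration of that description, here is a minimal NumPy sketch. It is not the code from the repo: the function name, `steps_per_segment`, `scale`, and the action dimension of 8 used in the example call are all assumptions made for the example.

```python
import numpy as np

def linear_segment_noise(n_steps, action_dim, steps_per_segment=10, scale=1.0, rng=None):
    """Piecewise-linear noise: move between Gaussian waypoints in equal increments."""
    rng = np.random.default_rng() if rng is None else rng
    # Segments needed to cover n_steps (ceiling division), plus one closing waypoint.
    n_segments = -(-n_steps // steps_per_segment)
    waypoints = rng.normal(0.0, scale, size=(n_segments + 1, action_dim))
    # Waypoint i is reached at step i * steps_per_segment.
    knot_steps = np.arange(n_segments + 1) * steps_per_segment
    steps = np.arange(n_steps)
    # np.interp is one-dimensional, so interpolate each action dimension separately.
    noise = np.stack(
        [np.interp(steps, knot_steps, waypoints[:, d]) for d in range(action_dim)],
        axis=1,
    )
    return noise, waypoints

# Hypothetical usage: one 300-step episode, assuming an 8-dimensional action space.
noise, waypoints = linear_segment_noise(300, 8, rng=np.random.default_rng(0))
```

Within a segment every step changes the noise by the same amount, so the trajectory has corners at the waypoints and straight lines in between.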

Smooth segment noise: Selects an ordered set of normally distributed points in the action space and uses cubic interpolation to generate a function that moves smoothly between each point.
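A minimal sketch of the smooth variant follows. The post only says "cubic interpolation", so the choice of a Catmull-Rom spline here is an assumption made to keep the example NumPy-only. The function name, `steps_per_segment`, and `scale` are likewise hypothetical and not taken from the repo.

```python
import numpy as np

def smooth_segment_noise(n_steps, action_dim, steps_per_segment=10, scale=1.0, rng=None):
    """Smooth noise: a Catmull-Rom cubic through Gaussian waypoints (assumed scheme)."""
    rng = np.random.default_rng() if rng is None else rng
    n_segments = -(-n_steps // steps_per_segment)  # ceiling division
    waypoints = rng.normal(0.0, scale, size=(n_segments + 1, action_dim))
    # Repeat the end waypoints so every segment has a neighbour on both sides.
    padded = np.vstack([waypoints[:1], waypoints, waypoints[-1:]])
    steps = np.arange(n_steps)
    seg = steps // steps_per_segment                                # segment index per step
    t = ((steps % steps_per_segment) / steps_per_segment)[:, None]  # position in [0, 1)
    # p1 and p2 bound the current segment; p0 and p3 are the outer neighbours.
    p0, p1, p2, p3 = (padded[seg + k] for k in range(4))
    noise = 0.5 * (
        2.0 * p1
        + (p2 - p0) * t
        + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t**2
        + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t**3
    )
    return noise, waypoints

# Hypothetical usage: one 300-step episode, assuming an 8-dimensional action space.
noise, waypoints = smooth_segment_noise(300, 8, rng=np.random.default_rng(0))
```

The curve passes through every waypoint but can overshoot between them, so its range is not bounded by the range of the sampled points.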

Setup:

I only considered the AntBulletEnv-v0 environment in pybullet. I ran two experiments, each involving 100 training runs, with 25 runs allocated to each noise type. Each training run went for 1000 episodes, and each episode lasted 300 steps. The model being trained had two hidden layers of sizes 400 and 300. The model architectures are detailed in code here.
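To make the arithmetic of the setup explicit, here is a small sketch of the run grid for one experiment. The noise labels and the use of the run index as a seed are hypothetical, and the step count assumes no episode terminates early.

```python
from itertools import product

NOISE_TYPES = ["gaussian", "ornstein_uhlenbeck", "linear_segment", "smooth_segment"]
RUNS_PER_NOISE = 25
EPISODES_PER_RUN = 1000
STEPS_PER_EPISODE = 300

# One (noise type, seed) pair per training run; the seed scheme is an assumption.
runs = [
    {"noise": noise, "seed": seed}
    for noise, seed in product(NOISE_TYPES, range(RUNS_PER_NOISE))
]

# Upper bound on environment steps per run, if every episode runs to 300 steps.
steps_per_run = EPISODES_PER_RUN * STEPS_PER_EPISODE

print(len(runs))      # 100
print(steps_per_run)  # 300000
```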

The rest of the ddpg parameters are as follows:

LAYERS_DIMS   : [400, 300]
TAU           : 0.001
SIGMA         : 3.0
THETA         : 4.0
BUFFER_SIZE   : 100000
BATCH_SIZE    : 64
DISCOUNT      : 0.99
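THETA and SIGMA match the usual names for the Ornstein–Uhlenbeck parameters, so assuming that is what they are, a discretised version of the process and the plain Gaussian baseline might be sketched as below. The time step `dt`, the zero mean `mu`, and the function names are assumptions and are not taken from the repo.

```python
import numpy as np

def ou_noise(n_steps, action_dim, theta=4.0, sigma=3.0, dt=0.01, mu=0.0, rng=None):
    """Ornstein-Uhlenbeck noise via an Euler-Maruyama discretisation."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.full(action_dim, mu, dtype=float)
    out = np.empty((n_steps, action_dim))
    for i in range(n_steps):
        # Pull back towards mu, plus a Gaussian kick scaled by sqrt(dt).
        x = x + theta * (mu - x) * dt + sigma * np.sqrt(dt) * rng.normal(size=action_dim)
        out[i] = x
    return out

def gaussian_noise(n_steps, action_dim, sigma=1.0, rng=None):
    """Independent Gaussian noise at every step, for comparison."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(0.0, sigma, size=(n_steps, action_dim))
```

With theta = 4.0 this explicit update is only stable for dt below 0.5, which is why a small `dt` is assumed here. Consecutive OU samples are strongly correlated, whereas the Gaussian samples are independent from step to step.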

Finally, the difference between the two experiments was in the learning rates:

Experiment 1 used:

ACTOR_LR      : 5e-05
CRITIC_LR     : 0.0005

Experiment 2 increased both learning rates by a factor of 10:

ACTOR_LR      : 0.0005
CRITIC_LR     : 0.005

Results:

In each case I’m really looking for evidence of one noise process working better than the others. Our sample sizes are tiny because it takes ages and costs money to run this stuff remotely. Because the sample sizes are so small, I’d really need to see one noise process perform substantially better than the others before drawing any conclusion.
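One way to put a number on "better than the others" with only 25 runs per group is a permutation test on the final rewards. The sketch below is not something the post ran: the function is hypothetical, and the example data is synthetic placeholder data, not the results reported here.

```python
import numpy as np

def permutation_test(a, b, n_resamples=10_000, rng=None):
    """Two-sided permutation test on the difference in mean final reward."""
    rng = np.random.default_rng() if rng is None else rng
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_resamples):
        # Shuffle the group labels and recompute the difference in means.
        shuffled = rng.permutation(pooled)
        diff = shuffled[: len(a)].mean() - shuffled[len(a):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (count + 1) / (n_resamples + 1)

# Synthetic placeholder data only: NOT the rewards from these experiments.
rng = np.random.default_rng(0)
fake_smooth = rng.normal(300.0, 100.0, size=25)
fake_gaussian = rng.normal(280.0, 100.0, size=25)
diff, p_value = permutation_test(fake_smooth, fake_gaussian, rng=rng)
```

A small p-value would suggest the gap between two noise types is unlikely to come from run-to-run variation alone. With four noise types there are six pairwise comparisons, so some correction for multiple comparisons would also be needed.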

Note that the AntBulletEnv-v0 environment should run for a lot longer than 300 steps, namely 1000. Because of this we’re not going to get anywhere close to the reward of 2500.0 required to solve the environment.

Experiment 1:

The best results achieved for each noise category were 580.0 for the smooth segment noise, 481.12 for the Ornstein–Uhlenbeck process, 470.83 for linear segment noise and 381.04 for the Gaussian noise.

Reward density plot

The following plots show the density of reward over all 25 runs at each time step for each noise process.

Reward Graphs Experiment 1

Outcome histogram

The histogram of the 25 end rewards for each noise process:

Outcome reward histograms Experiment 1

Experiment 2:

The best results achieved for each noise category were 574.47 for the smooth segment noise, 511.55 for the Ornstein–Uhlenbeck process, 595.36 for linear segment noise and 548.44 for the Gaussian noise.

Reward density plot

Reward Graphs Experiment 2

Outcome histogram

Outcome reward histograms Experiment 2

Conclusion and issues:

While I think you might be able to make the case that in experiment 1 the smoother noise processes led to slightly better performance, the small sample size means it’s a pretty weak case. Overall, I don’t think the differences in learning are significant enough to say anything particularly profound.

While I tried to keep all the variables we weren’t interested in constant between runs, it wasn’t clear to me how to normalise the variance of the noise processes relative to each other. We’d consider Gaussian noise with different variances to be different noise and so expect different results. It’s hard to say in what way the smooth noise process is the same as or different from Gaussian noise, for instance, because they’re described by different parameters. I wasn’t really sure how to account for this, so in the end I just ensured each noise process fell within the same range of [-0.02, 0.02] in each dimension.
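The post does not say whether the range was enforced by rescaling or by clipping. As an illustration only, the sketch below rescales a noise trajectory so each dimension peaks at 0.02, and shows matching standard deviations as one alternative way to put the processes on a common footing. Both helper names are hypothetical.

```python
import numpy as np

NOISE_LIMIT = 0.02

def rescale_to_range(noise, limit=NOISE_LIMIT):
    """Scale each action dimension so its largest absolute value equals `limit`."""
    noise = np.asarray(noise, dtype=float)
    peak = np.abs(noise).max(axis=0, keepdims=True)
    peak = np.where(peak == 0.0, 1.0, peak)  # leave all-zero dimensions untouched
    return noise * (limit / peak)

def match_std(noise, target_std):
    """Alternative: scale each dimension to a common standard deviation."""
    noise = np.asarray(noise, dtype=float)
    std = noise.std(axis=0, keepdims=True)
    std = np.where(std == 0.0, 1.0, std)  # avoid dividing by zero
    return noise * (target_std / std)

# Hypothetical usage on a stand-in (n_steps, action_dim) trajectory.
raw = np.random.default_rng(0).normal(size=(300, 8))
bounded = rescale_to_range(raw)
```

Two processes with the same range can still have quite different variances, which is the ambiguity described above. Clipping with `np.clip(noise, -limit, limit)` would also keep the noise in range, but it changes the shape of the distribution near the bounds.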

Best solutions:

The best solutions came from the second experiment. In fact, in the first experiment both the Gaussian and Ornstein–Uhlenbeck processes basically failed to get the ant to walk any further than a couple of steps. Here are the best of each noise category for the second experiment:

Smooth segment noise Linear segment noise
Gaussian noise Ornstein–Uhlenbeck