Electrical and Computer Engineering Publications

Document Type

Conference Proceeding

Publication Date

July 2020

Abstract

This paper presents Noisy Importance Sampling Actor-Critic (NISAC), a set of empirically validated modifications to the advantage actor-critic algorithm (A2C) that enable off-policy reinforcement learning and improve performance. NISAC uses additive action-space noise, aggressive truncation of importance-sampling weights, and large batch sizes. We show that additive noise drastically changes how off-policy experience is weighted in policy updates. The modified algorithm converges faster and is more sample efficient than both the on-policy A2C and the importance-weighted off-policy actor-critic algorithm. Compared to state-of-the-art (SOTA) methods such as actor-critic with experience replay (ACER), NISAC approaches their performance on several of the tested environments while training 40% faster and being significantly easier to implement. The effectiveness of NISAC is demonstrated against existing on-policy and off-policy actor-critic algorithms on a subset of the Atari domain.
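For intuition only, the sketch below illustrates the kind of update the abstract describes on a toy softmax-bandit problem: an importance-weighted policy-gradient step whose weights are aggressively truncated, computed over a large batch, with additive noise perturbing the policy. The noise scale, truncation threshold, batch size, learning rate, and the choice to add the noise to the policy logits are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy setup: a 2-armed bandit with a softmax policy.
theta = np.zeros(2)                            # current policy logits
theta_old = np.array([0.5, -0.5])              # stale behavior policy
advantages_per_action = np.array([1.0, -1.0])  # hypothetical advantage estimates

# Collect a large batch under the behavior policy (NISAC favors large batches).
batch_size = 4096
pi_behavior = softmax(theta_old)
actions = rng.choice(2, size=batch_size, p=pi_behavior)
advantages = advantages_per_action[actions]

# One noisy, importance-weighted policy-gradient step.
noise_scale = 0.1   # hypothetical additive noise scale
clip = 1.0          # hypothetical aggressive truncation threshold

# Additive noise on the logits (a stand-in for action-space noise).
pi_noisy = softmax(theta + noise_scale * rng.normal(size=theta.shape))

# Truncated importance weights: min(pi/mu, clip) bounds update variance.
rho = np.minimum(pi_noisy[actions] / pi_behavior[actions], clip)

# Policy gradient for a softmax policy: grad log pi(a) = one_hot(a) - pi.
grad_log_pi = np.eye(2)[actions] - pi_noisy
grad = (rho * advantages)[:, None] * grad_log_pi
theta += 1e-1 * grad.mean(axis=0)              # gradient ascent on logits

print("updated logits:", theta)
```

Note that with a threshold of 1 the truncation only ever down-weights off-policy samples, so actions the current policy prefers more than the behavior policy did are never up-weighted, which caps the variance of the update.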
