Electrical and Computer Engineering Publications
Document Type
Conference Proceeding
Publication Date
7-2021
Abstract
We introduce Dynamic Planning Networks (DPN), a novel architecture for deep reinforcement learning that combines model-based and model-free aspects for online planning. Our architecture learns to dynamically construct plans using a learned state-transition model, selecting and traversing between simulated states and actions to maximize information before acting. DPN learns to form plans efficiently by expanding a single action-conditional state transition at a time instead of exhaustively evaluating every action, reducing the number of state transitions used during planning. We observe emergent planning patterns in our agent, including classical search methods such as breadth-first and depth-first search. DPN shows improved performance over existing baselines across multiple axes.
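The core mechanism the abstract describes, expanding one action-conditional transition per planning step rather than all actions at once, can be illustrated with a minimal toy sketch. The names transition_model and select_expansion below are hypothetical stand-ins for the paper's learned components, and the dynamics are a toy chain environment; this is an illustration of the single-expansion idea under stated assumptions, not the paper's implementation.

```python
import random

def transition_model(state, action):
    """Toy stand-in for a LEARNED state-transition model (assumption, not
    DPN's actual model): predicts next state and reward for (state, action)."""
    next_state = state + (1 if action == 1 else -1)
    reward = 1.0 if next_state == 5 else 0.0
    return next_state, reward

def select_expansion(frontier):
    """Stand-in for the learned selection policy: choose ONE simulated
    (state, action) pair to expand. Here we pick uniformly at random; the
    paper's agent instead learns a selection that maximizes information."""
    state = random.choice(sorted(frontier))
    action = random.choice([0, 1])
    return state, action

def plan(root_state, budget=8):
    """Expand a single action-conditional transition per planning step,
    instead of exhaustively expanding every action at every simulated state,
    so the total number of simulated transitions is capped by `budget`."""
    frontier = {root_state}
    best_action, best_reward = None, float("-inf")
    for _ in range(budget):
        state, action = select_expansion(frontier)
        next_state, reward = transition_model(state, action)
        # Traversal may branch from any previously simulated state,
        # which is what lets patterns like BFS or DFS emerge.
        frontier.add(next_state)
        if state == root_state and reward > best_reward:
            best_action, best_reward = action, reward
    return best_action if best_action is not None else random.choice([0, 1])

if __name__ == "__main__":
    print("chosen action:", plan(root_state=0))
```

With an exhaustive expansion, each planning step would simulate |A| transitions per frontier state; the sketch instead spends exactly one simulated transition per step, which is the cost reduction the abstract claims.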