# A data-driven policy iteration scheme based on linear programming

G. Banjac and J. Lygeros

in IEEE Conference on Decision and Control (CDC), Nice, France, December 2019.

@inproceedings{BL:2019a,
author = {G. Banjac and J. Lygeros},
title = {A data-driven policy iteration scheme based on linear programming},
booktitle = {IEEE Conference on Decision and Control (CDC)},
year = {2019},
url = {https://doi.org/10.1109/CDC40024.2019.9029405},
doi = {10.1109/CDC40024.2019.9029405}
}


We consider the problem of learning discounted-cost optimal control policies for unknown deterministic discrete-time systems with continuous state and action spaces. We show that the policy evaluation step of the well-known policy iteration (PI) algorithm can be characterized as a solution to an infinite-dimensional linear program (LP). However, when such an LP is approximated by a finite-dimensional program, the PI algorithm loses its nominal properties. We propose a data-driven PI scheme that ensures a certain monotonic behavior and allows for the incorporation of expert knowledge on the system. A numerical example illustrates the effectiveness of the proposed algorithm.
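
As a rough illustration of the LP characterization of policy evaluation mentioned in the abstract, the sketch below uses standard notation that is assumed here rather than taken from the paper: dynamics $x^+ = f(x,u)$, stage cost $\ell$, discount factor $\gamma \in (0,1)$, a fixed policy $\pi$, and a positive weighting measure $c$ on the state space $\mathcal{X}$. The paper's own formulation may differ, for instance by working with state-action value functions.

```latex
% Policy evaluation: Bellman equation for a fixed policy \pi
% (all symbols are illustrative assumptions, not the paper's notation)
V^{\pi}(x) = \ell\bigl(x,\pi(x)\bigr) + \gamma\, V^{\pi}\bigl(f(x,\pi(x))\bigr)
\qquad \forall x \in \mathcal{X}

% Infinite-dimensional LP whose optimizer is V^{\pi}
\begin{aligned}
\sup_{V} \quad & \int_{\mathcal{X}} V(x)\, c(\mathrm{d}x) \\
\text{s.t.} \quad & V(x) \le \ell\bigl(x,\pi(x)\bigr) + \gamma\, V\bigl(f(x,\pi(x))\bigr)
\qquad \forall x \in \mathcal{X}
\end{aligned}
```

Under the usual boundedness assumptions, any feasible $V$ satisfies $V \le V^{\pi}$ pointwise, because the Bellman operator for $\pi$ is monotone and a $\gamma$-contraction, so $V^{\pi}$ attains the supremum. The program is infinite-dimensional because $V$ ranges over a function space and the constraint must hold at every state; restricting $V$ to a finite basis and enforcing the constraint only at sampled states gives the kind of finite-dimensional approximation the abstract refers to.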