Online Partial Conditional Plan Synthesis for POMDPs With Safe-Reachability Objectives: Methods and Experiments

Y. Wang, A. A. R. Newaz, J. D. Hernández, S. Chaudhuri, and L. E. Kavraki, “Online Partial Conditional Plan Synthesis for POMDPs With Safe-Reachability Objectives: Methods and Experiments,” IEEE Transactions on Automation Science and Engineering, vol. 18, pp. 932–945, Jul. 2021.


The framework of partially observable Markov decision processes (POMDPs) offers a standard approach to model uncertainty in many robot tasks. Traditionally, POMDPs are formulated with optimality objectives. In this article, we study a different formulation of POMDPs with Boolean objectives. For robotic domains that require a correctness guarantee of accomplishing tasks, Boolean objectives are natural formulations. We investigate the problem of POMDPs with a common Boolean objective: safe reachability, requiring that the robot eventually reaches a goal state with a probability above a threshold while keeping the probability of visiting unsafe states below a different threshold. Our approach builds upon the previous work that represents POMDPs with Boolean objectives using symbolic constraints. We employ a satisfiability modulo theories (SMTs) solver to efficiently search for solutions, i.e., policies or conditional plans that specify the action to take contingent on every possible event. A full policy or conditional plan is generally expensive to compute. To improve computational efficiency, we introduce the notion of partial conditional plans that cover sampled events to approximate a full conditional plan. Our approach constructs a partial conditional plan parameterized by a replanning probability. We prove that the failure rate of the constructed partial conditional plan is bounded by the replanning probability. Our approach allows users to specify an appropriate bound on the replanning probability to balance efficiency and correctness. Moreover, we update this bound properly to quickly detect whether the current partial conditional plan meets the bound and avoid unnecessary computation. In addition, to further improve the efficiency, we cache partial conditional plans for sampled belief states and reuse these cached plans if possible. We validate our approach in several robotic domains. The results show that our approach outperforms a previous policy synthesis approach for POMDPs with safe-reachability objectives in these domains.


PDF preprint: