文章目录 Markov Decision ProcessDefinitionRelated Concepts Policy for MDP AgentDefinitionJudgement for PolicyValue FunctionsTD formula for value functionsRelation of V and QPolicy CriterionPolicy Improvement TheoremOptimal PolicyReinforcement Learning Fund…