Table of Contents
- Net_Structure
- Tips
- constraint
Net_Structure
Tips
- We can learn a fully centralised state-action value function Q_tot and then use it to guide the optimisation of decentralised policies in an actor-critic framework.
- QMIX consists of agent networks representing each Q_a, and a mixing network that combines them into Q_tot, not as a simple sum as in VDN, but in a complex non-linear way that ensures consistency between the centralised and decentralised policies.
- Non-linear mixing of agent Q-values is used in order to achieve consistent performance across tasks.
- The setting is cooperative.
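The mixing step in the notes above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the reference implementation: the names `mix`, `elu` and `random_params` are hypothetical, the mixing weights are made non-negative with `abs()` (one common way to keep Q_tot from decreasing when any Q_a increases), and the weights are random stand-ins, whereas in QMIX they are produced by hypernetworks conditioned on the global state.

```python
import math
import random


def elu(x):
    """Exponential linear unit: monotonically increasing in x."""
    return x if x > 0.0 else math.exp(x) - 1.0


def mix(agent_qs, w1, b1, w2, b2):
    """Combine per-agent Q-values into a scalar Q_tot.

    agent_qs: list of n_agents floats (the chosen-action Q_a of each agent)
    w1: n_agents x hidden raw weights, b1: hidden biases
    w2: hidden raw weights, b2: scalar bias
    Taking abs() of the weights keeps them non-negative, so Q_tot
    cannot decrease when any single Q_a increases (assumption of this sketch).
    """
    hidden = []
    for h in range(len(b1)):
        pre = sum(abs(w1[a][h]) * q for a, q in enumerate(agent_qs)) + b1[h]
        hidden.append(elu(pre))
    return sum(abs(w) * x for w, x in zip(w2, hidden)) + b2


def random_params(n_agents, hidden, rng):
    """Hypothetical stand-in for hypernetwork outputs (random, not learned)."""
    w1 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(n_agents)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = rng.uniform(-1, 1)
    return w1, b1, w2, b2


if __name__ == "__main__":
    rng = random.Random(0)
    params = random_params(n_agents=3, hidden=4, rng=rng)
    qs = [0.2, -0.5, 1.0]
    raised = [0.2, 0.3, 1.0]  # only agent 1's Q-value goes up
    # Compare Q_tot before and after raising a single agent's Q-value.
    print(mix(qs, *params) <= mix(raised, *params))
```

Unlike the plain sum used in VDN, the hidden layer lets Q_tot depend non-linearly on the agent values, while the non-negative weights keep the dependence monotonic.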
constraint
QMIX constrains Q_tot to be monotonic in each agent's Q_a, which allows each agent to take part in decentralised execution by choosing greedy actions with respect to its own value function.
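The constraint in this section can be written out explicitly. The following is a sketch in the usual QMIX notation, which is an assumption here because the notes do not define it: τ^a and u^a denote agent a's action-observation history and action, bold symbols the joint versions, and n the number of agents.

```latex
\operatorname*{argmax}_{\mathbf{u}} Q_{tot}(\boldsymbol{\tau}, \mathbf{u}) =
\begin{pmatrix}
\operatorname*{argmax}_{u^1} Q_1(\tau^1, u^1) \\
\vdots \\
\operatorname*{argmax}_{u^n} Q_n(\tau^n, u^n)
\end{pmatrix},
\qquad
\frac{\partial Q_{tot}}{\partial Q_a} \ge 0 \quad \forall a .
```

The first expression says that the greedy joint action under Q_tot coincides with each agent's individual greedy action. The second, monotonicity of Q_tot in every Q_a, is a sufficient condition for the first.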