Anonymous included in Draft

2024-09-15 2024-10-24

Contents

Concepts

Behavior Cloning

\begin{align*} \mathbf{\hat{\theta}} = \argmax_{\theta} \mathbb{E}_{(\mathbf{s},\mathbf{a})\sim \mathbf{\tau_E}}[\log \pi_{\theta}(\mathbf{a}|\mathbf{s})] \end{align*}

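The key idea of behavior cloning is to find the policy parameters $\theta$ that maximize the expected log-likelihood of the expert's actions under the policy $\pi_{\theta}(\mathbf{a}|\mathbf{s})$, where the state-action pairs $(\mathbf{s}, \mathbf{a})$ are drawn from the expert trajectories $\mathbf{\tau_E}$.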
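To make the objective concrete, here is a minimal sketch under strong simplifying assumptions: a one-dimensional state and action, and a linear-Gaussian policy $\pi_{\theta}(a|s) = \mathcal{N}(a;\ \theta s,\ 1)$. The expert rule `a = 2 * s`, the learning rate, and the function names are all hypothetical and chosen only for illustration. With this policy, maximizing the log-likelihood by gradient ascent should recover a value of $\theta$ close to the expert's slope.

```python
import random


def log_likelihood_grad(theta, s, a):
    """Gradient of log N(a; theta * s, 1) with respect to theta."""
    return (a - theta * s) * s


def fit_behavior_cloning(demos, lr=0.5, epochs=200):
    """Gradient ascent on the mean log-likelihood of expert (s, a) pairs."""
    theta = 0.0
    for _ in range(epochs):
        grad = sum(log_likelihood_grad(theta, s, a) for s, a in demos) / len(demos)
        theta += lr * grad  # ascent step: we maximize, not minimize
    return theta


# Hypothetical expert demonstrations: the expert always plays a = 2 * s.
rng = random.Random(0)
states = [rng.uniform(-1.0, 1.0) for _ in range(200)]
demos = [(s, 2.0 * s) for s in states]

theta_hat = fit_behavior_cloning(demos)
print(f"estimated theta: {theta_hat:.4f}")
```

For a Gaussian policy with fixed variance, this log-likelihood objective is equivalent to least-squares regression of actions on states, which is why behavior cloning is often described as plain supervised learning.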

Action Chunking Transformer

Implicit Behavior Cloning

\begin{align*} \mathbf{\hat{a}} = \argmin_{\mathbf{a}} E_{\theta}(\mathbf{s},\mathbf{a}) \end{align*}
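The inference step above can be sketched as a search over candidate actions for the one with the lowest energy. In the sketch below, the quadratic `energy` function is a hypothetical stand-in for a learned $E_{\theta}$, and the coarse grid search is just one simple way to approximate the $\argmin$; it is not necessarily the optimizer used in practice.

```python
def energy(s, a):
    """Hypothetical stand-in for a learned E_theta(s, a); lowest at a = 2 * s."""
    return (a - 2.0 * s) ** 2


def infer_action(s, energy_fn, low=-3.0, high=3.0, num_candidates=601):
    """Approximate argmin_a E(s, a) by evaluating a uniform grid of actions."""
    step = (high - low) / (num_candidates - 1)
    candidates = [low + i * step for i in range(num_candidates)]
    return min(candidates, key=lambda a: energy_fn(s, a))


a_hat = infer_action(0.5, energy)
print(f"selected action: {a_hat:.3f}")
```

Unlike the explicit policy in behavior cloning, the action here is never produced directly by a network output; it is whatever minimizes the energy for the given state, so the quality of the result depends on how well the candidate set covers the action space.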

Methodology