Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning

NeurIPS 2025 (Spotlight)

(* denotes equal contribution)
1. Department of Electrical and Computer Engineering (ECE), Seoul National University
2. Interdisciplinary Program in Artificial Intelligence (IPAI), Seoul National University
3. IPAI / ASRI / INMC, Seoul National University

High-Level Policy is the Bottleneck of Hierarchical Policy.


We evaluate HIQL by varying only the high-level policy while keeping the low-level policy fixed. With the learned high-level policy, performance drops substantially, whereas the oracle high-level policy achieves high success rates, indicating that the high-level policy is the main bottleneck.
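The ablation above can be mimicked in a toy setting (all names and the 1-D environment here are illustrative, not HIQL's actual code): keep the low-level policy fixed and swap only the high-level subgoal picker between an oracle and an error-prone stand-in.

```python
import random

random.seed(0)

class LineEnv:
    """Agent on integers 0..size; reaching `goal` ends the episode."""
    def __init__(self, size=50, goal=50):
        self.size, self.goal = size, goal
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):                       # a in {-1, 0, +1}
        self.s = max(0, min(self.size, self.s + a))
        return self.s, self.s == self.goal

def low_policy(s, subgoal):                  # kept fixed across variants
    return (subgoal > s) - (subgoal < s)     # one step toward the subgoal

def oracle_high(s, goal, k=5):               # ground-truth waypoint k steps ahead
    return min(s + k, goal)

def noisy_high(s, goal, k=5):                # stand-in for an error-prone
    return random.randint(0, goal)           # learned high-level policy

def rollout(env, high_policy, goal, max_steps=200):
    s = env.reset()
    for _ in range(max_steps):
        a = low_policy(s, high_policy(s, goal))
        s, done = env.step(a)
        if done:
            return True
    return False
```

With the oracle high-level policy the agent reaches the goal every time; replacing it with the random subgoal picker makes the same low-level policy wander, mirroring the bottleneck observation above.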

Noisy Advantage Signals in Long-Horizon Regimes.


As the distance between $s_t$ and $g$ increases, the value estimates become increasingly erroneous, leading to imprecise estimates of the high-level advantage.
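A small Monte Carlo sketch illustrates why this degrades advantage-based learning. Assume (as a modeling choice, not the paper's exact error model) that the learned value is $\hat V(s, g) = -d(s, g) + \epsilon$ with noise scale proportional to the distance; then the sign of the one-step high-level advantage flips far more often at long range:

```python
import random

random.seed(0)

def flip_rate(d, noise_per_step=0.05, trials=10_000):
    """Fraction of trials where noise flips the sign of the advantage
    of moving one step closer to the goal (true advantage is +1)."""
    flips = 0
    for _ in range(trials):
        # noisy value estimates at distance d and d - 1
        v_far = -d + random.gauss(0, noise_per_step * d)
        v_near = -(d - 1) + random.gauss(0, noise_per_step * (d - 1))
        if v_near - v_far <= 0:
            flips += 1
    return flips / trials

# sign-flip probability grows with goal distance
print(flip_rate(5), flip_rate(100))
```

At short range the advantage sign is almost always correct, while at long range it approaches a coin flip, which is the regime where high-level policy extraction breaks down.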

Our method:
Option-aware Temporally Abstracted (OTA) Value


By using temporally extended actions (options) in planning, we reduce the effective horizon, i.e., the number of planning steps, to approximately $d^\star(s_t, g)/n$. Specifically, we modify the reward and target value to be option-aware, thereby ensuring that the high-level value $V^h$ is suitable for long-term planning.
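The horizon-reduction effect can be seen in a minimal tabular example (an illustrative chain MDP, not the paper's actual OTA update): with one unit of cost per option of length $n$, the option-level value of a state at distance $d$ from the goal converges to $-\lceil d/n \rceil$ instead of $-d$.

```python
import math

def option_value_iteration(N, n, gamma=1.0, iters=100):
    """Value iteration on a deterministic chain 0..N with the goal at N,
    using temporally extended jumps of length n and one unit of cost
    per option (sparse negative-reward convention)."""
    V = [0.0] * (N + 1)                  # V[N] = 0 at the goal
    for _ in range(iters):
        for s in range(N):
            s_next = min(s + n, N)       # option-level transition
            V[s] = -1.0 + gamma * V[s_next]
    return V

N, n = 20, 5
V_h = option_value_iteration(N, n)
# effective horizon: ceil(d / n) planning steps instead of d
assert V_h[0] == -math.ceil(N / n)       # -4 rather than -20
```

Shrinking the number of backups between any state and the goal in this way is what keeps the high-level value (and hence the advantage) informative at long range.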

Experimental Results

Evaluation on OGBench

  • HIQL struggles with long-horizon tasks, especially the humanoid mazes.
  • OTA achieves superior performance on long-horizon tasks.

Value estimation

  • OTA shows more monotonic and order-consistent value estimates than HIQL.
  • OTA improves high-level value estimation and policy learning in long-horizon tasks.

Videos

HumanoidMaze-large-navigate-v0

HIQL ❌

OTA (Ours) ✅

HumanoidMaze-giant-navigate-v0

HIQL ❌

OTA (Ours) ✅

BibTeX

@inproceedings{ota2025,
  title={Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning},
  author={Ahn, Hongjoon and Choi, Heewoong and Han, Jisu and Moon, Taesup},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2025},
}