Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation

  • Zhenyu He
  • Guhao Feng
  • Shengjie Luo
  • Kai Yang
  • Liwei Wang
  • Jingjing Xu
  • Zhi Zhang
  • Hongxia Yang
  • Di He

Research output: Journal article publication, Conference article, Academic research, peer-review

Abstract

In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE). For each position, our BiPE blends an intra-segment encoding and an inter-segment encoding. The intra-segment encoding identifies the locations within a segment and helps the model capture the semantic information therein via absolute positional encoding. The inter-segment encoding specifies the segment index, models the relationships between segments, and aims to improve extrapolation capabilities via relative positional encoding. Theoretical analysis shows this disentanglement of positional information makes learning more effective. The empirical results also show that our BiPE has superior length extrapolation capabilities across a wide range of tasks in diverse text modalities.
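
The two-level indexing described in the abstract can be illustrated with a minimal sketch. It only computes the two position indices per token, not the encodings themselves, and it is not the authors' implementation. The function names, the choice of separator tokens, and the convention that a separator belongs to the segment it closes are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical separator set: the abstract does not say how segments are delimited.
DEFAULT_SEPARATORS: FrozenSet[str] = frozenset({".", "\n"})


def bilevel_positions(
    tokens: Sequence[str],
    separators: FrozenSet[str] = DEFAULT_SEPARATORS,
) -> Tuple[List[int], List[int]]:
    """Return (intra_segment_positions, segment_indices) for a token sequence.

    intra_segment_positions[i] is the offset of token i inside its segment
    (the index an absolute positional encoding would consume).
    segment_indices[i] is the index of the segment containing token i
    (the index a relative positional encoding would compare across tokens).
    """
    intra: List[int] = []
    inter: List[int] = []
    pos, seg = 0, 0
    for tok in tokens:
        intra.append(pos)
        inter.append(seg)
        if tok in separators:
            # Assumption: the separator closes the current segment,
            # so the next token starts a new segment at offset 0.
            pos, seg = 0, seg + 1
        else:
            pos += 1
    return intra, inter


def segment_distance(segment_indices: Sequence[int], i: int, j: int) -> int:
    """Signed segment-level distance between tokens i and j."""
    return segment_indices[i] - segment_indices[j]


tokens = ["The", "cat", "sat", ".", "It", "slept", "."]
intra, inter = bilevel_positions(tokens)
print(intra)  # [0, 1, 2, 3, 0, 1, 2]
print(inter)  # [0, 0, 0, 0, 1, 1, 1]
print(segment_distance(inter, 5, 0))  # 1
```

In this sketch the intra-segment offset resets at every segment boundary, so it stays bounded by the longest segment, while only the segment index grows with sequence length. That is one way to read why the abstract assigns the extrapolation role to the inter-segment, relative component.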

Original language: English
Pages (from-to): 17858-17876
Number of pages: 19
Journal: Proceedings of Machine Learning Research
Volume: 235
Publication status: Published - Jul 2024
Externally published: Yes
Event: 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria
Duration: 21 Jul 2024 to 27 Jul 2024

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
