Pretraining PopT on large amounts of neural data significantly improves downstream decoding performance compared to baseline non-pretrained aggregation methods.
We see this across tasks (x-axis), data modalities (a: iEEG; b: EEG), and temporal embedding types (hatch):
We also see this across channel ensemble sizes (x-axis):
Pretraining PopT improves sample efficiency, requiring fewer labeled examples (x-axis) for strong performance:
Gains in decoding performance transfer to new (held-out) subjects:
Increasing the amount of pretraining data (colors) leads to improvements in downstream decoding performance:
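To make the comparison with non-pretrained aggregation concrete, here is a minimal sketch, assuming a PopT-style setup in which frozen per-channel embeddings are attention-pooled through a learnable [CLS] token; the module names, dimensions, and mean-pooling baseline below are illustrative, not the released implementation.

```python
# Illustrative sketch only (not the released PopT code): a transformer aggregator
# over frozen per-channel embeddings vs. a non-pretrained mean-pooling baseline.
import torch
import torch.nn as nn

class TransformerAggregator(nn.Module):
    """Attention-pool per-channel embeddings through a learnable [CLS] token."""
    def __init__(self, dim: int = 256, heads: int = 8, layers: int = 4):
        super().__init__()
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, channel_embs: torch.Tensor) -> torch.Tensor:
        # channel_embs: (batch, n_channels, dim), e.g. frozen single-channel embeddings
        cls = self.cls.expand(channel_embs.size(0), -1, -1)
        out = self.encoder(torch.cat([cls, channel_embs], dim=1))
        return out[:, 0]  # the [CLS] position summarizes the channel ensemble

def mean_pool_baseline(channel_embs: torch.Tensor) -> torch.Tensor:
    """Non-pretrained aggregation baseline: average embeddings across channels."""
    return channel_embs.mean(dim=1)

# Either summary vector can then be fed to a linear decoding head.
```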
We explore how a pretrained PopT can generate insights into neural data.
We can recover connectivity maps from the pretrained model (right) and compare them with coherence analysis (left):
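As a rough sketch of how such a comparison could be set up (not the paper's released analysis code): average the model's inter-channel self-attention weights into a channel-by-channel map and compare it against pairwise spectral coherence. How the attention tensors in `attn_per_layer` are collected (e.g. via forward hooks) depends on the model; all names below are illustrative.

```python
import numpy as np
import torch
from scipy.signal import coherence

def coherence_matrix(signals: np.ndarray, fs: float) -> np.ndarray:
    """Pairwise magnitude-squared coherence between channels, averaged over frequency.
    signals: (n_channels, n_samples) raw traces sampled at fs Hz."""
    n = signals.shape[0]
    C = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            _, cxy = coherence(signals[i], signals[j], fs=fs)
            C[i, j] = C[j, i] = float(cxy.mean())
    return C

def attention_connectivity(attn_per_layer: list[torch.Tensor]) -> np.ndarray:
    """Average attention over layers, batch, and heads as a proxy connectivity map.
    Each tensor: (batch, heads, n_tokens, n_tokens), with token 0 being [CLS]."""
    A = torch.stack(attn_per_layer).mean(dim=(0, 1, 2)).cpu().numpy()
    A = A[1:, 1:]           # drop the [CLS] row/column, keep channel-channel weights
    return 0.5 * (A + A.T)  # symmetrize for comparison with coherence

# The two maps can then be compared, e.g. by rank-correlating their upper triangles.
```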
We can discover functional brain regions through [CLS] token attention weight analysis, finding that auditory and language regions (arrows) are attended to:
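A similarly hedged sketch of the [CLS]-attention analysis: score each channel by the attention mass the [CLS] token assigns to it, then aggregate scores by anatomical label. `attn_per_layer` and `channel_regions` (electrode localization labels) are assumed inputs, not names from the released code.

```python
from collections import defaultdict
import numpy as np
import torch

def cls_attention_per_channel(attn_per_layer: list[torch.Tensor]) -> np.ndarray:
    """Attention mass from the [CLS] token (index 0) to every channel token.
    Each tensor: (batch, heads, n_tokens, n_tokens)."""
    A = torch.stack(attn_per_layer).mean(dim=(0, 1, 2))  # (n_tokens, n_tokens)
    return A[0, 1:].cpu().numpy()                        # [CLS] -> channel weights

def score_regions(cls_scores: np.ndarray, channel_regions: list[str]) -> dict[str, float]:
    """Average [CLS] attention per anatomical region; highly scored regions
    (e.g. auditory or language areas) are the ones the model attends to most."""
    buckets = defaultdict(list)
    for score, region in zip(cls_scores, channel_regions):
        buckets[region].append(score)
    return {r: float(np.mean(v)) for r, v in buckets.items()}
```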
@misc{chau2024populationtransformer,
      title={Population Transformer: Learning Population-level Representations of Neural Activity},
      author={Geeling Chau and Christopher Wang and Sabera Talukder and Vighnesh Subramaniam and Saraswati Soedarmadji and Yisong Yue and Boris Katz and Andrei Barbu},
      year={2024},
      eprint={2406.03044},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2406.03044},
}