One ICML-2024 paper to appear: STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment.