Abstract: This study explores how Bangladeshi university students develop
English vocabulary through
engagement with captioned YouTube videos, conceptualised at the intersection of
incidental learning, multimodality, and connectivism. Grounded in the view that
vocabulary acquisition emerges through meaning-focused, multimodal, and
digitally networked encounters, the study examines how learners notice,
process, and apply new lexical items in their informal viewing practices.
Data were drawn from semi-structured interviews, reflective journals, and
content-use logs from 15 students across two private universities and one
public university, and were analysed using reflexive thematic analysis. Four
interrelated themes captured
learners’ experiences: captions as adaptive scaffolds for noticing and recall;
learning beyond classroom boundaries through curated digital routines;
balancing entertainment and education to sustain motivation; and coping with
infrastructural and cognitive constraints. Findings reveal that students
transform captioned viewing into strategic, identity-driven vocabulary learning
by managing caption modes, pacing, and genre selection. The study proposes
integrating “caption literacy” and multimodal awareness into tertiary English
curricula, recognising informal digital environments as legitimate learning
spaces. By connecting cognitive, social, and technological dimensions of
vocabulary acquisition, this research extends current understandings of
informal digital learning and offers context-responsive insights for language
pedagogy in the Global South.
Keywords: Captioned YouTube videos; incidental vocabulary learning; multimodal
pedagogy; informal digital learning; Bangladeshi higher education