In multistability, a constant stimulus induces alternating perceptual interpretations. For many forms of visual multistability, the transition from one interpretation to another ("perceptual switch") is accompanied by a dilation of the pupil. Here we ask whether the same holds for auditory multistability, specifically auditory streaming. Two tones were played in alternation, yielding four distinct interpretations: the tones can be perceived as one integrated percept (single sound source), or as segregated with either tone or both tones in the foreground. We found that the pupil dilates significantly around the time a perceptual switch is reported ("multistable condition"). When participants instead responded to actual stimulus changes that closely mimicked the multistable perceptual experience ("replay condition"), the pupil dilated more around such responses than in multistability. This still held when data were corrected for the pupil response to the stimulus change as such. Hence, active responses to an exogenous stimulus change trigger a stronger or temporally more confined pupil dilation than responses to an endogenous perceptual switch. In another condition, participants randomly pressed the buttons used for reporting multistability. In Study 1, this "random condition" failed to sufficiently mimic the temporal pattern of multistability. By adapting the instructions, in Study 2 we obtained a response pattern more similar to the multistable condition. In this case, the pupil dilated significantly around the random button presses. Albeit numerically smaller, this pupil response was not significantly different from the multistable condition. While there are several possible explanations (related, e.g., to the decision to respond), this underlines the difficulty of isolating a purely perceptual effect in multistability. Our data extend previous findings from visual to auditory multistability.
They highlight methodological challenges in interpreting such data and suggest possible approaches to meet them, including a novel stimulus to simulate the experience of perceptual switches in auditory streaming.