SAMOSA: Enhancing XR Auditory Realism via Multimodal Scene-Aware Acoustic Rendering

SAMOSA System

What if sounds in your XR headset actually matched the room you're in, reverberating off the right walls and absorbing into the right surfaces? SAMOSA makes virtual audio feel real by understanding your physical environment in real time.

📄 Paper: doi:10.1145/3746059.3747730 · Published at UIST 2025

🎤 Talk: Mon, Sep 29 | 11:24 AM – 11:36 AM

๐Ÿ“ Paradise Hotel Busan, Sydney Room โ€” Busan, Republic of Korea


🎯 The Problem

In XR, realistic sound is crucial for immersion, but existing spatial audio systems use static, one-size-fits-all acoustics. A cathedral and a closet get the same reverb. This mismatch between what you see and what you hear breaks the illusion.

SAMOSA fixes this by building a real-time understanding of your physical space (its geometry, materials, and acoustic character) and using that to synthesize audio that sounds like it truly belongs there.


๐Ÿ—๏ธ How It Works

SAMOSA fuses three real-time sensing streams into a rich multimodal scene representation:

1. **Room Geometry Estimation** – Real-time 3D understanding of room shape and dimensions from device sensors.
2. **Surface Material Detection** – Visual identification of surfaces (concrete, wood, glass, carpet) that determine how sound reflects and absorbs.
3. **Semantic Acoustic Context** – LLM-driven interpretation of the scene's acoustic character, understanding that a "library" sounds different from a "gym" even at similar sizes.

These three streams feed into an efficient acoustic calibration engine that synthesizes a realistic Room Impulse Response (RIR), the acoustic fingerprint of your space, in real time.
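To make the rendering step concrete: once an RIR is available, producing the reverberant signal is a standard convolution of the dry source audio with that RIR. The sketch below is a minimal, hypothetical illustration of that final step only; the toy exponentially decaying noise tail merely stands in for SAMOSA's scene-aware RIR synthesis, which is the paper's actual contribution.

```python
import numpy as np
from scipy.signal import fftconvolve

# Toy stand-in for a synthesized RIR: a direct-path impulse followed by
# an exponentially decaying noise tail (~0.5 s). In SAMOSA, this RIR
# would instead come from the multimodal acoustic calibration engine.
sr = 16_000                                   # sample rate (Hz), assumed
t = np.arange(int(0.5 * sr)) / sr
rng = np.random.default_rng(0)
rir = rng.standard_normal(t.size) * np.exp(-6.9 * t)
rir[0] = 1.0                                  # direct sound

# Dry source: 1 s of silence containing a single click.
dry = np.zeros(sr)
dry[100] = 1.0

# Rendering = convolve dry audio with the room's impulse response.
wet = fftconvolve(dry, rir)
print(wet.shape)                              # len(dry) + len(rir) - 1
```

The FFT-based convolution here is the offline textbook form; a real-time engine would use partitioned (block-wise) convolution so the RIR can be swapped as the scene estimate updates.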


🎬 Demo


📊 Results

We validated SAMOSA through a technical evaluation, using acoustic metrics for RIR synthesis across various room configurations and sound types, alongside an expert evaluation (N=12).

| Aspect | Finding |
| --- | --- |
| Room configurations | Validated across diverse geometries and surface materials |
| Sound types | Tested with speech, music, and environmental audio |
| Expert evaluation | 12 audio professionals confirmed enhanced realism |
| On-device performance | Runs in real time on XR hardware |
(Xu et al., 2025)

References

1. Tianyu Xu, Jihan Li, Penghe Zu, Pranav Sahay, Maruchi Kim, Jack Obeng-Marnu, Farley Miller, Xun Qian, Katrina Passarella, Mahitha Rachumalla, Rajeev Nongpiur, and D. Shin. 2025. SAMOSA: Enhancing XR Auditory Realism via Multimodal Scene-Aware Acoustic Rendering. In Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology (UIST '25).