GenAI for sound design

Event date: 2025-03-20

Welcome to this week’s Learning Machines seminar.

This seminar is a collaboration between RISE and Climate AI Nordics – climateainordics.com.

Title: GenAI for sound design

Speaker: Oriol Nieto, Adobe

Abstract: This presentation explores the forefront of generative AI research for sound design at Adobe Research. I will provide an overview of Latent Diffusion Models, which form the foundation of our work, and introduce several recent advancements focused on controllability and multimodality. I will begin with SILA [1], a technique designed to enhance the control of sound effects generated through text prompts. Following this, I will present Sketch2Sound [2], a model that generates sound effects conditioned on both audio recordings and text. Lastly, I will examine MultiFoley [3], a model capable of generating sound effects from both silent videos and text. Throughout the talk, I will showcase a series of examples and demos to illustrate the practical applications and potential of these models, making the case that we are only beginning to unveil a completely new paradigm in how to approach sound design.

[1] Sonal Kumar, Prem Seetharaman, Justin Salamon, Dinesh Manocha, Oriol Nieto, "SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation", In review for IEEE SPL

[2] Hugo Flores García, Oriol Nieto, Justin Salamon, Bryan Pardo, Prem Seetharaman, "Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations", ICASSP 2025

[3] Ziyang Chen, Prem Seetharaman, Bryan Russell, Oriol Nieto, David Bourgin, Andrew Owens, Justin Salamon, "Video-Guided Foley Sound Generation with Multimodal Controls", In review for CVPR 2025

About the speaker: Oriol is a Senior Research Engineer at Adobe Research, where he focuses on human-centered AI for audio creativity, encompassing everything from music to audiobooks, video editing, and sound design. He holds a PhD in Music Technology from MARL, NYU, a Master's in Music, Science, and Technology from Stanford University, and a Master's in Information Technologies from Pompeu Fabra University. Highly involved with the Music Information Retrieval community, he was one of the three General Chairs for ISMIR 2024 in San Francisco this past November. Oriol has helped develop relevant open-source MIR packages such as librosa, mir_eval, and MSAF, and has contributed to PyTorch. In his spare time he plays guitar, violin, and cajón, and sings (and screams).

Location: This is an online seminar. Connect using Zoom.

Date: 2025-03-20 15:00 CET

Upcoming seminars:

  • 2025-03-27: María J. Molina, University of Maryland
  • 2025-04-03: Abdul Shaamala, Queensland University of Technology
  • 2025-04-24: John Martinsson, RISE and Lund University
  • All seminars start at 15:00 CET.

More information and coming seminars: https://ri.se/lm-sem

– The Learning Machines Team