Systems and methods for separating and identifying audio in an audio file using machine learning

Patented

Patent Number: 12,062,375

Date of Patent: August 13, 2024

Disclosed herein are systems and methods for processing an audio file to perform audio segmentation and Speaker Role Identification (SRID). The systems unify audio separation and automatic speech recognition (ASR) techniques in a single system, training low-level classifier and high-level clustering components to separate and identify audio from different sources in the file. Segmentation and SRID can include dividing the audio in an audio file into one or more segments based on a determination of the identity of the speaker, the category of the speaker, or the source of the audio in each segment. In one or more examples, the disclosed systems and methods use machine learning and artificial intelligence to determine the source of each audio segment from a combination of acoustic and language information. In some examples, this acoustic and language information is used to classify the audio in each frame and to cluster the classified frames into segments.
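The frame-then-segment idea can be illustrated with a minimal sketch. This is not the patented method: it assumes that an acoustic model and an ASR-based language model have already produced per-frame probabilities over speaker roles, fuses them with a simple weighted log-linear rule, and then uses majority-vote smoothing plus run-length grouping as a stand-in for the high-level clustering component. The role names, the fusion `weight`, the smoothing `window`, and all function names are hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical speaker roles; the abstract only refers to a "category of the speaker".
ROLES = ["agent", "customer"]


@dataclass
class Segment:
    role: str
    start_frame: int
    end_frame: int  # exclusive


def classify_frames(acoustic_probs, language_probs, weight=0.5):
    """Low-level step (assumed): fuse per-frame acoustic and language
    probabilities log-linearly and pick the highest-scoring role per frame."""
    labels = []
    for a, l in zip(acoustic_probs, language_probs):
        scores = [
            weight * math.log(max(pa, 1e-12)) + (1 - weight) * math.log(max(pl, 1e-12))
            for pa, pl in zip(a, l)
        ]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


def smooth(labels, window=3):
    """Majority vote in a sliding window to suppress isolated label flips.
    Ties keep the frame's own label."""
    half = window // 2
    out = []
    for i, own in enumerate(labels):
        hood = labels[max(0, i - half): i + half + 1]
        best = own
        for cand in hood:
            if hood.count(cand) > hood.count(best):
                best = cand
        out.append(best)
    return out


def to_segments(labels):
    """Group runs of identical frame labels into contiguous segments
    (a simple stand-in for the high-level clustering component)."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append(Segment(ROLES[labels[start]], start, i))
            start = i
    return segments


def segment_audio(acoustic_probs, language_probs, weight=0.5, window=3):
    labels = classify_frames(acoustic_probs, language_probs, weight)
    return to_segments(smooth(labels, window))


if __name__ == "__main__":
    # Made-up per-frame probabilities for six frames: [P(agent), P(customer)].
    acoustic = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]]
    language = [[0.8, 0.2], [0.7, 0.3], [0.6, 0.4], [0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
    for seg in segment_audio(acoustic, language):
        print(seg)
```

Keeping fusion, smoothing, and grouping as separate functions makes it easy to swap any one of them, for example replacing run-length grouping with a real clustering algorithm over segment embeddings, without touching the frame classifier.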
