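In a typical VoIP application or video conferencing system, suppose only two people, A and B, are on a call. A's voice is sent to B and played through B's loudspeaker. B's microphone then picks up the sound coming out of the loudspeaker and sends it back to A. If the delay along this path is large enough, A hears a copy of what A has just said: this is echo. An acoustic echo canceller runs on B's side, processes the audio captured at B, and removes A's voice from it before it is sent back, so that A no longer hears their own words.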
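I will not go into the theory of acoustic echo cancellation, since plenty of documents online cover it. What is hard to find online is an implementation, so here I introduce an open-source acoustic echo canceller in the hope that it is useful to someone. If anyone knows how to use this canceller in VoIP software based on real-time streams, please share.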
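This acoustic echo canceller is part of Speex, a well-known audio codec. The echo canceller only works from version 1.1.9 onward; in earlier versions it does not work, and 1.1.9 is the version I use. In my tests with the same simulated file, it gave better results than the acoustic echo canceller in version 4.1 of the Intel IPP library.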
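First, building. Download the Speex 1.1.9 source code from www.speex.org, unpack it, and open libspeex.dsw in speex/win32/libspeex. The workspace contains two projects, libspeex and libspeex_dynamic. Add the file mdf.c from libspeex to the libspeex project, then build.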
Below is a wrapper class I wrote based on the documentation, together with a test program:
//file name: speexEC.h
#ifndef SPEEX_EC_H
#define SPEEX_EC_H
#include <stdio.h>
#include <stdlib.h>
#include "speex/speex_echo.h"
#include "speex/speex_preprocess.h"
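// Thin wrapper around the Speex echo canceller and preprocessor.
class CSpeexEC
{
public:
    CSpeexEC();
    ~CSpeexEC();
    // frame_size and filter_length are in samples, sampling_rate is in Hz.
    void Init(int frame_size=160, int filter_length=1280, int sampling_rate=8000);
    // mic: captured frame (contains the echo); ref: frame sent to the loudspeaker;
    // out: echo-cancelled result. Each buffer holds frame_size samples.
    void DoAEC(short *mic, short *ref, short *out);

protected:
    // Releases the Speex states and the internal buffer.
    void Reset();

private:
    bool m_bHasInit;
    SpeexEchoState* m_pState;
    SpeexPreprocessState* m_pPreprocessorState;
    int m_nFrameSize;
    int m_nFilterLen;
    int m_nSampleRate;
    float* m_pfNoise;   // frame_size+1 values written by speex_echo_cancel
};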

#endif
//file name: speexEC.cpp
#include "speexEC.h"

CSpeexEC::CSpeexEC()
{
    m_bHasInit = false;
    m_pState = NULL;
    m_pPreprocessorState = NULL;
    m_nFrameSize = 160;
    m_nFilterLen = 160*8;
    m_nSampleRate = 8000;
    m_pfNoise = NULL;
}

CSpeexEC::~CSpeexEC()
{
    Reset();
}

void CSpeexEC::Init(int frame_size, int filter_length, int sampling_rate)
{
    Reset();

    // Fall back to the defaults if any argument is invalid.
    if (frame_size<=0 || filter_length<=0 || sampling_rate<=0)
    {
        m_nFrameSize = 160;
        m_nFilterLen = 160*8;
        m_nSampleRate = 8000;
    }
    else
    {
        m_nFrameSize = frame_size;
        m_nFilterLen = filter_length;
        m_nSampleRate = sampling_rate;
    }

    m_pState = speex_echo_state_init(m_nFrameSize, m_nFilterLen);
    m_pPreprocessorState = speex_preprocess_state_init(m_nFrameSize, m_nSampleRate);
    m_pfNoise = new float[m_nFrameSize+1];
    m_bHasInit = true;
}

void CSpeexEC::Reset()
{
    if (m_pState != NULL)
    {
        speex_echo_state_destroy(m_pState);
        m_pState = NULL;
    }
    if (m_pPreprocessorState != NULL)
    {
        speex_preprocess_state_destroy(m_pPreprocessorState);
        m_pPreprocessorState = NULL;
    }
    if (m_pfNoise != NULL)
    {
        delete []m_pfNoise;
        m_pfNoise = NULL;
    }
    m_bHasInit = false;
}

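void CSpeexEC::DoAEC(short* mic, short* ref, short* out)
{
    if (!m_bHasInit)
        return;

    // Cancel the echo of ref contained in mic; m_pfNoise is filled by the
    // canceller and then handed to the preprocessor.
    speex_echo_cancel(m_pState, mic, ref, out, m_pfNoise);
    speex_preprocess(m_pPreprocessorState, out, m_pfNoise);
}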
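
The default arguments of Init correspond to 20 ms frames (160 samples at 8000 Hz) and a 160 ms echo tail (1280 samples). When choosing values for a different sampling rate, that arithmetic can be sketched as follows. SamplesFromMs is a hypothetical helper written for this illustration and is not part of Speex.

```cpp
#include <cassert>

// Hypothetical helper (not part of Speex): converts a duration in
// milliseconds to a number of samples at the given sampling rate.
inline int SamplesFromMs(int sampling_rate_hz, int duration_ms)
{
    assert(sampling_rate_hz > 0 && duration_ms >= 0);
    // Use a wider type for the product so it cannot overflow an int.
    return static_cast<int>(static_cast<long long>(sampling_rate_hz) * duration_ms / 1000);
}
```

For example, `ec.Init(SamplesFromMs(8000, 20), SamplesFromMs(8000, 160), 8000);` passes the same values as the defaults.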

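As you can see, this echo canceller class is simple: initialize it once and it is ready to call. Note, however, that the two signals passed to the echo canceller must be very well synchronized. On B's side, once A's speech has been received, the audio data is first handed to the echo canceller as the reference and then passed to the sound card, which plays it after some delay. B then captures audio and passes it to the echo canceller, which compares it against the reference and removes from the captured data the parts whose frequency-domain content matches the reference. If the two signals are poorly synchronized, so that no matching frequency-domain content can be found between them, nothing can be cancelled.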
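
One simple way to bring the reference closer to the captured signal is to hold it back by a fixed number of samples before it reaches the echo canceller. The sketch below assumes the playback-to-capture delay is roughly constant and has been measured separately; ReferenceDelay is a hypothetical helper, not part of Speex, and it does not deal with a delay that changes over time.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical helper (not part of Speex): delays the reference signal by a
// fixed number of samples. The delay value itself is an assumption that has
// to be measured for the actual sound card and buffer sizes.
class ReferenceDelay
{
public:
    explicit ReferenceDelay(std::size_t delay_samples)
        : m_fifo(delay_samples, 0)   // pre-fill with silence
    {
    }

    // played:  the frame being handed to the sound card now.
    // delayed: receives the samples that were handed over delay_samples earlier.
    void Process(const short* played, short* delayed, std::size_t frame_size)
    {
        assert(played != nullptr && delayed != nullptr);
        m_fifo.insert(m_fifo.end(), played, played + frame_size);
        for (std::size_t i = 0; i < frame_size; ++i)
        {
            delayed[i] = m_fifo.front();
            m_fifo.pop_front();
        }
    }

private:
    std::deque<short> m_fifo;
};
```

The delayed frame, rather than the frame just played, would then be passed as `ref` to `DoAEC`.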
Test program:
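#include <stdio.h>
#include "speexEC.h"

#define NN 160   // samples per frame; must match the frame_size used in Init

int main()
{
    FILE *ref_fd, *mic_fd, *out_fd;
    short ref[NN], mic[NN], out[NN];

    ref_fd = fopen("ref.pcm", "rb");   // reference file: the sound to be cancelled
    mic_fd = fopen("mic.pcm", "rb");   // sound captured by the microphone, echo included
    out_fd = fopen("echo.pcm", "wb");  // output file with the echo removed
    if (ref_fd == NULL || mic_fd == NULL || out_fd == NULL)
    {
        fprintf(stderr, "cannot open ref.pcm, mic.pcm or echo.pcm\n");
        return 1;
    }

    CSpeexEC ec;
    ec.Init();

    // Process whole frames only; stop as soon as either input runs out.
    while (fread(mic, sizeof(short), NN, mic_fd) == NN &&
           fread(ref, sizeof(short), NN, ref_fd) == NN)
    {
        ec.DoAEC(mic, ref, out);
        fwrite(out, sizeof(short), NN, out_fd);
    }

    fclose(ref_fd);
    fclose(mic_fd);
    fclose(out_fd);
    return 0;
}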
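The program above uses files to simulate the echo and the microphone, but a real-time stream is very different. In typical VoIP software, receiving the remote party's audio and playing it through the sound card happens in one thread, while capturing local audio and sending it to the remote party happens in another. While cancelling echo from the captured audio, the acoustic echo canceller also needs the data from the playback thread as its reference, and synchronizing the data of these two threads is very difficult: with even a slight misalignment, the adaptive filter inside the echo canceller diverges. It then not only fails to remove the echo but also damages the originally captured audio, making it hard to understand. I have made many attempts and have never managed to synchronize the data of the two threads in software, so my implementation failed. I hope readers with experience in this area will share it.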
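
As a starting point for experiments, the hand-off of reference frames between the two threads can be sketched as a small mutex-protected queue. This is only an illustration under stated assumptions: ReferenceQueue is a hypothetical helper, not part of Speex, and it requires C++11. It delivers frames in playback order but does nothing about sound card latency or clock drift, which is the hard part described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical helper (not part of Speex): passes played frames from the
// playback thread to the capture thread in playback order.
class ReferenceQueue
{
public:
    ReferenceQueue(std::size_t frame_size, std::size_t max_frames)
        : m_frameSize(frame_size), m_maxFrames(max_frames)
    {
        assert(frame_size > 0 && max_frames > 0);
    }

    // Playback thread: call when a frame is handed to the sound card.
    void Push(const short* frame)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_frames.size() == m_maxFrames)
            m_frames.pop_front();   // capture fell behind: drop the oldest frame
        m_frames.push_back(std::vector<short>(frame, frame + m_frameSize));
    }

    // Capture thread: fetch the oldest pending reference frame.
    // Returns false and fills ref with silence when nothing is pending.
    bool Pop(short* ref)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_frames.empty())
        {
            std::fill(ref, ref + m_frameSize, static_cast<short>(0));
            return false;
        }
        std::copy(m_frames.front().begin(), m_frames.front().end(), ref);
        m_frames.pop_front();
        return true;
    }

private:
    std::size_t m_frameSize;
    std::size_t m_maxFrames;
    std::mutex m_mutex;
    std::deque<std::vector<short> > m_frames;
};
```

In the capture thread, each captured frame would be paired with one call to `Pop(ref)` followed by `DoAEC(mic, ref, out)`.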
Sample code
This section shows sample code for encoding and decoding speech using the Speex API. The commands can be used to encode and decode a file by calling:
% sampleenc in_file.sw | sampledec out_file.sw
where both files are raw (no header) files encoded at 16 bits per sample (in the machine natural endianness).
sampleenc takes a raw 16 bits/sample file, encodes it and outputs a Speex stream to stdout. Note that the packing used is NOT compatible with that of speexenc/speexdec.
#include <speex/speex.h>
#include <stdio.h>

/*The frame size is hardcoded for this sample code but it doesn't have to be*/
#define FRAME_SIZE 160
int main(int argc, char **argv)
{
    char *inFile;
    FILE *fin;
    short in[FRAME_SIZE];
    float input[FRAME_SIZE];
    char cbits[200];
    int nbBytes;

    /*Holds the state of the encoder*/
    void *state;

    /*Holds bits so they can be read and written to by the Speex routines*/
    SpeexBits bits;
    int i, tmp;

    if (argc < 2)
    {
        fprintf(stderr, "usage: sampleenc in_file.sw\n");
        return 1;
    }
    inFile = argv[1];

    /*Open in binary mode so the raw samples are read unchanged*/
    fin = fopen(inFile, "rb");
    if (fin == NULL)
    {
        fprintf(stderr, "cannot open %s\n", inFile);
        return 1;
    }

    /*Create a new encoder state in narrowband mode*/
    state = speex_encoder_init(&speex_nb_mode);

    /*Set the quality to 8 (15 kbps)*/
    tmp=8;
    speex_encoder_ctl(state, SPEEX_SET_QUALITY, &tmp);

    /*Initialization of the structure that holds the bits*/
    speex_bits_init(&bits);
    while (1)
    {
        /*Read a 16 bits/sample audio frame; stop at end of file or on a short read*/
        if (fread(in, sizeof(short), FRAME_SIZE, fin) != FRAME_SIZE)
            break;

        /*Copy the 16 bits values to float so Speex can work on them*/
        for (i=0;i<FRAME_SIZE;i++)
            input[i]=in[i];

        /*Flush all the bits in the struct so we can encode a new frame*/
        speex_bits_reset(&bits);

        /*Encode the frame*/
        speex_encode(state, input, &bits);

        /*Copy the bits to an array of char that can be written*/
        nbBytes = speex_bits_write(&bits, cbits, 200);

        /*Write the size of the frame first. This is what sampledec expects but
          it's likely to be different in your own application*/
        fwrite(&nbBytes, sizeof(int), 1, stdout);

        /*Write the compressed data*/
        fwrite(cbits, 1, nbBytes, stdout);
    }

    /*Destroy the encoder state*/
    speex_encoder_destroy(state);

    /*Destroy the bit-packing struct*/
    speex_bits_destroy(&bits);
    fclose(fin);
    return 0;
}
sampledec reads a Speex stream from stdin, decodes it and outputs it to a raw 16 bits/sample file. Note that the packing used is NOT compatible with that of speexenc/speexdec.
#include <speex/speex.h>
#include <stdio.h>

/*The frame size is hardcoded for this sample code but it doesn't have to be*/
#define FRAME_SIZE 160
int main(int argc, char **argv)
{
    char *outFile;
    FILE *fout;

    /*Holds the audio that will be written to file (16 bits per sample)*/
    short out[FRAME_SIZE];

    /*Speex handles samples as float, so we need an array of floats*/
    float output[FRAME_SIZE];
    char cbits[200];
    int nbBytes;

    /*Holds the state of the decoder*/
    void *state;

    /*Holds bits so they can be read and written to by the Speex routines*/
    SpeexBits bits;
    int i, tmp;

    if (argc < 2)
    {
        fprintf(stderr, "usage: sampledec out_file.sw\n");
        return 1;
    }
    outFile = argv[1];

    /*Open in binary mode so the raw samples are written unchanged*/
    fout = fopen(outFile, "wb");
    if (fout == NULL)
    {
        fprintf(stderr, "cannot open %s\n", outFile);
        return 1;
    }

    /*Create a new decoder state in narrowband mode*/
    state = speex_decoder_init(&speex_nb_mode);

    /*Set the perceptual enhancement on*/
    tmp=1;
    speex_decoder_ctl(state, SPEEX_SET_ENH, &tmp);

    /*Initialization of the structure that holds the bits*/
    speex_bits_init(&bits);
    while (1)
    {
        /*Read the size encoded by sampleenc, this part will likely be
          different in your application*/
        if (fread(&nbBytes, sizeof(int), 1, stdin) != 1)
            break;
        fprintf (stderr, "nbBytes: %d\n", nbBytes);

        /*Reject sizes that do not fit in cbits*/
        if (nbBytes < 0 || nbBytes > (int)sizeof(cbits))
            break;

        /*Read the "packet" encoded by sampleenc*/
        if (fread(cbits, 1, nbBytes, stdin) != (size_t)nbBytes)
            break;

        /*Copy the data into the bit-stream struct*/
        speex_bits_read_from(&bits, cbits, nbBytes);

        /*Decode the data*/
        speex_decode(state, &bits, output);

        /*Copy from float to short (16 bits) for output*/
        for (i=0;i<FRAME_SIZE;i++)
            out[i]=output[i];

        /*Write the decoded audio to file*/
        fwrite(out, sizeof(short), FRAME_SIZE, fout);
    }

    /*Destroy the decoder state*/
    speex_decoder_destroy(state);

    /*Destroy the bit-stream struct*/
    speex_bits_destroy(&bits);
    fclose(fout);
    return 0;
}
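
The stream exchanged by sampleenc and sampledec is a sequence of packets, each consisting of an int length followed by that many payload bytes. The sketch below isolates that framing, using only the C standard I/O functions, so it can be tried without Speex. WritePacket and ReadPacket are hypothetical names introduced for this illustration; the length is written in the machine's native int format, as in the sample programs, so the stream is not portable between machines with a different int size or endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: writes one packet as [int length][payload bytes].
// Returns true if both parts were written completely.
inline bool WritePacket(std::FILE* f, const char* data, int nbBytes)
{
    assert(f != nullptr && data != nullptr && nbBytes >= 0);
    return std::fwrite(&nbBytes, sizeof(int), 1, f) == 1 &&
           std::fwrite(data, 1, nbBytes, f) == static_cast<std::size_t>(nbBytes);
}

// Hypothetical helper: reads one packet into buf, which holds cap bytes.
// Returns the payload size, or -1 at end of stream, on a short read, or
// when the stored length does not fit into buf.
inline int ReadPacket(std::FILE* f, char* buf, int cap)
{
    assert(f != nullptr && buf != nullptr && cap >= 0);
    int nbBytes = 0;
    if (std::fread(&nbBytes, sizeof(int), 1, f) != 1)
        return -1;
    if (nbBytes < 0 || nbBytes > cap)
        return -1;
    if (std::fread(buf, 1, nbBytes, f) != static_cast<std::size_t>(nbBytes))
        return -1;
    return nbBytes;
}
```

Checking the length against the buffer capacity before reading the payload keeps a corrupted or truncated stream from overrunning the 200-byte buffer used in the sample programs.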