牽著老婆滿街逛

Before introducing the Speex features, here are some concepts in speech coding that make the rest of the manual easier to follow. Some are general concepts in speech/audio processing; others are specific to Speex.

Sampling rate

The sampling rate, expressed in Hertz (Hz), is the number of samples taken from a signal per second. For a sampling rate of Fs kHz, the highest frequency that can be represented is Fs/2 kHz (Fs/2 is known as the Nyquist frequency). This is a fundamental property in signal processing and is described by the sampling theorem. Speex is mainly designed for three sampling rates: 8 kHz, 16 kHz, and 32 kHz. These are referred to as narrowband, wideband and ultra-wideband, respectively.
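As a small illustration of the Fs/2 relationship above, the following sketch computes the Nyquist frequency for the three rates named in the text. The function and dictionary names are made up for this example and are not part of any Speex API.

```python
# Speex's three nominal sampling rates, in Hz, as listed in the text.
SPEEX_RATES_HZ = {
    "narrowband": 8000,
    "wideband": 16000,
    "ultra-wideband": 32000,
}


def nyquist_hz(sampling_rate_hz):
    """Highest frequency representable at the given sampling rate (Fs/2)."""
    return sampling_rate_hz / 2


for name, fs in SPEEX_RATES_HZ.items():
    print(f"{name}: Fs = {fs} Hz, Nyquist = {nyquist_hz(fs):.0f} Hz")
```

So narrowband audio can carry content up to 4 kHz, wideband up to 8 kHz, and ultra-wideband up to 16 kHz.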

Bit-rate

When encoding a speech signal, the bit-rate is the number of bits per unit of time required to encode the speech. It is measured in bits per second (bps), or more commonly kilobits per second (kbps). It is important to distinguish between kilobits per second (kbps) and kilobytes per second (kBps).
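The kbps/kBps distinction is a factor of eight, which is easy to get wrong when estimating storage or bandwidth. A minimal sketch follows; the helper names are hypothetical, and "kilo" is taken to mean 1000.

```python
BITS_PER_BYTE = 8


def kbps_to_kBps(kilobits_per_second):
    """Convert kilobits per second to kilobytes per second."""
    return kilobits_per_second / BITS_PER_BYTE


def encoded_size_bytes(bitrate_kbps, duration_s):
    """Approximate payload size of a stream at a constant bit-rate."""
    return bitrate_kbps * 1000 * duration_s / BITS_PER_BYTE


print(kbps_to_kBps(8))             # kilobytes per second at 8 kbps
print(encoded_size_bytes(8, 60))   # bytes for one minute at 8 kbps
```

The size estimate ignores any container or packet overhead; it only counts the encoded speech bits.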

Quality (variable)

Speex is a lossy codec, which means that it achieves compression at the expense of the fidelity of the input speech signal. Unlike some other speech codecs, it lets you control the trade-off between quality and bit-rate. The Speex encoding process is controlled most of the time by a quality parameter that ranges from 0 to 10. In constant bit-rate (CBR) operation, the quality parameter is an integer, while for variable bit-rate (VBR) it is a float.

Complexity (variable)

With Speex, it is possible to vary the complexity allowed for the encoder. This is done by controlling how the search is performed with an integer ranging from 1 to 10, in a way that is similar to the -1 to -9 options of the gzip and bzip2 compression utilities. (Blogger's note: for bzip2 these options set the compression block size, from 100k to 900k.) For normal use, the noise level at complexity 1 is between 1 and 2 dB higher than at complexity 10, but the CPU requirements for complexity 10 are about 5 times higher than for complexity 1. In practice, the best trade-off is between complexity 2 and 4, though higher settings are often useful when encoding non-speech sounds like DTMF tones.
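To put the 1 to 2 dB figure in perspective, decibels can be converted to a plain power ratio. This is only the standard dB definition applied to the numbers quoted above, not a measurement; the function name is made up for the example.

```python
def db_to_power_ratio(db):
    """Convert a level difference in decibels to a power ratio."""
    return 10 ** (db / 10)


for db in (1, 2):
    print(f"+{db} dB noise is about {db_to_power_ratio(db):.2f}x the noise power")
```

In other words, complexity 1 trades roughly 1.26x to 1.58x more noise power for about one fifth of the CPU cost of complexity 10.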

Variable Bit-Rate (VBR)

Variable bit-rate (VBR) allows a codec to change its bit-rate dynamically to adapt to the "difficulty" of the audio being encoded. In the case of Speex, sounds like vowels and high-energy transients require a higher bit-rate to achieve good quality, while fricatives (e.g. s and f sounds) can be coded adequately with fewer bits. For this reason, VBR can achieve a lower bit-rate for the same quality, or better quality for a given bit-rate. Despite its advantages, VBR has two main drawbacks. First, because only the quality is specified, there is no guarantee about the final average bit-rate. Second, for some real-time applications like voice over IP (VoIP), what counts is the maximum bit-rate, which must be low enough for the communication channel.

Average Bit-Rate (ABR)

Average bit-rate solves one of the problems of VBR, as it dynamically adjusts the VBR quality in order to meet a specific target bit-rate. Because the quality/bit-rate is adjusted in real time (open-loop), the global quality will be slightly lower than that obtained by encoding in VBR with exactly the right quality setting to meet the target average bit-rate.

Voice Activity Detection (VAD)

When enabled, voice activity detection detects whether the audio being encoded is speech or silence/background noise. VAD is always implicitly activated when encoding in VBR, so the option is only useful in non-VBR operation. In this case, Speex detects non-speech periods and encodes them with just enough bits to reproduce the background noise. This is called "comfort noise generation" (CNG).

Discontinuous Transmission (DTX)

Discontinuous transmission is an addition to VAD/VBR operation that allows the encoder to stop transmitting completely when the background noise is stationary. In file-based operation, since we cannot just stop writing to the file, only 5 bits are used for such frames (corresponding to 250 bps).
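The two numbers quoted above (5 bits per frame and 250 bps) together imply a frame duration, which can be checked with a line of arithmetic. The helper names below are made up for this sketch.

```python
def frame_duration_s(bits_per_frame, bitrate_bps):
    """Frame duration implied by a per-frame bit count and a bit-rate."""
    return bits_per_frame / bitrate_bps


def bitrate_bps(bits_per_frame, frame_s):
    """Bit-rate produced by sending a fixed number of bits every frame."""
    return bits_per_frame / frame_s


duration = frame_duration_s(5, 250)
print(f"implied frame duration: {duration * 1000:.0f} ms")
print(f"frames per second: {1 / duration:.0f}")
```

So a DTX frame in a file costs 5 bits for every 20 ms of stationary background noise.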

Perceptual enhancement

Perceptual enhancement is a part of the decoder which, when turned on, attempts to reduce the perception of the noise/distortion produced by the encoding/decoding process. In most cases, perceptual enhancement brings the sound further from the original objectively (e.g. considering only SNR), but in the end it still sounds better (subjective improvement).

Algorithmic delay

Every speech codec introduces a delay in the transmission. For Speex, this delay is equal to the frame size, plus some amount of "look-ahead" required to process each frame. In narrowband operation (8 kHz), the delay is 30 ms, while for wideband (16 kHz), the delay is 34 ms. These values do not account for the CPU time it takes to encode or decode the frames.
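The delay figures above can be split into frame size plus look-ahead. The sketch below assumes a 20 ms frame, which is not stated in this passage (it is the value implied by the 5 bits per frame / 250 bps figure given for DTX); the look-ahead values are therefore derived, not quoted. All names are hypothetical.

```python
FRAME_MS = 20  # assumption: frame duration, not stated in this passage


def lookahead_ms(total_delay_ms, frame_ms=FRAME_MS):
    """Look-ahead left over once the frame size is subtracted from the delay."""
    return total_delay_ms - frame_ms


def ms_to_samples(duration_ms, sampling_rate_hz):
    """Number of samples covering a duration at a given sampling rate."""
    return duration_ms * sampling_rate_hz // 1000


for mode, fs, delay in (("narrowband", 8000, 30), ("wideband", 16000, 34)):
    print(
        f"{mode}: {delay} ms total = {FRAME_MS} ms frame + "
        f"{lookahead_ms(delay)} ms look-ahead "
        f"({ms_to_samples(delay, fs)} samples at {fs} Hz)"
    )
```

Network transit, jitter buffering and CPU time all add to this algorithmic delay in a real call.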

2.2 Codec

The main characteristics of Speex can be summarized as follows:
    • Free software/open-source, patent and royalty-free
    • Integration of narrowband and wideband using an embedded bit-stream
    • Wide range of bit-rates available (from 2.15 kbps to 44 kbps)
    • Dynamic bit-rate switching (AMR) and Variable Bit-Rate (VBR) operation
    • Voice Activity Detection (VAD, integrated with VBR) and discontinuous transmission (DTX)
    • Variable complexity
    • Embedded wideband structure (scalable sampling rate)
    • Ultra-wideband sampling rate at 32 kHz
    • Intensity stereo encoding option
    • Fixed-point implementation

2.3 Preprocessor

This part refers to the preprocessor module introduced in the 1.1.x branch. The preprocessor is designed to be used on the audio before running the encoder. It provides three main functionalities:
• noise suppression
• automatic gain control (AGC)
• voice activity detection (VAD)

The denoiser can be used to reduce the amount of background noise present in the input signal. This provides higher quality speech whether or not the denoised signal is encoded with Speex (or at all). However, when the denoised signal is used with the codec, there is an additional benefit. Speech codecs in general (Speex included) tend to perform poorly on noisy input, and the coding tends to amplify the noise. The denoiser greatly reduces this effect.

Automatic gain control (AGC) is a feature that deals with the fact that the recording volume may vary by a large amount between different setups. The AGC provides a way to adjust a signal to a reference volume. This is useful for voice over IP because it removes the need for manual adjustment of the microphone gain. A secondary advantage is that by setting the microphone gain to a conservative (low) level, it is easier to avoid clipping.
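The idea of "adjusting a signal to a reference volume" can be sketched as scaling a block of samples so that its RMS level matches a target. This is a deliberately simple, static illustration and not how the Speex preprocessor implements AGC (a real AGC adapts its gain smoothly over time); the function names and the `max_gain` cap are assumptions made for the example.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_to_reference(samples, target_rms, max_gain=20.0):
    """Scale a block so its RMS matches target_rms, capping the gain.

    The cap keeps near-silent blocks from being amplified without bound.
    """
    level = rms(samples)
    if level == 0:
        return list(samples)
    gain = min(target_rms / level, max_gain)
    return [s * gain for s in samples]


quiet = [0.01, -0.01, 0.01, -0.01]   # a low-level recording
louder = normalize_to_reference(quiet, target_rms=0.1)
print(f"before: {rms(quiet):.3f}, after: {rms(louder):.3f}")
```

Recording at a conservative microphone gain and raising the level digitally like this is what makes clipping at the input easier to avoid.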

The voice activity detector (VAD) provided by the preprocessor is more advanced than the one directly provided in the codec.

2.4 Adaptive Jitter Buffer

When transmitting voice (or any content for that matter) over UDP or RTP, packets may be lost, arrive with different delays, or even arrive out of order. The purpose of a jitter buffer is to reorder packets and buffer them long enough (but no longer than necessary) so they can be sent to be decoded.
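The reordering part of a jitter buffer can be sketched with a small priority queue keyed on sequence number. This toy version uses a fixed depth, so it is not adaptive and is not the Speex jitter buffer API; the class, its methods and the `reorder` helper are all hypothetical names. Duplicate and lost packets are not handled beyond dropping packets that arrive too late.

```python
import heapq


class ReorderBuffer:
    """Toy jitter buffer: hold a few packets, release them in sequence order."""

    def __init__(self, depth):
        self.depth = depth        # packets to keep buffered before releasing
        self._heap = []           # min-heap of (sequence number, payload)
        self._next_seq = None     # lowest sequence number still acceptable

    def put(self, seq, payload):
        # A packet older than what was already released is too late to use.
        if self._next_seq is not None and seq < self._next_seq:
            return
        heapq.heappush(self._heap, (seq, payload))

    def pop_ready(self):
        """Release packets, oldest first, while more than `depth` are held."""
        ready = []
        while len(self._heap) > self.depth:
            ready.append(self._release())
        return ready

    def flush(self):
        """Release everything that is left, in order (end of stream)."""
        ready = []
        while self._heap:
            ready.append(self._release())
        return ready

    def _release(self):
        seq, payload = heapq.heappop(self._heap)
        self._next_seq = seq + 1
        return seq, payload


def reorder(arrivals, depth):
    """Feed sequence numbers in arrival order; return them in release order."""
    buf = ReorderBuffer(depth)
    released = []
    for seq in arrivals:
        buf.put(seq, b"")
        released.extend(s for s, _ in buf.pop_ready())
    released.extend(s for s, _ in buf.flush())
    return released


print(reorder([2, 1, 3, 5, 4, 6], depth=2))
```

The depth is the trade-off the text describes: a deeper buffer tolerates more reordering but adds delay, which is why a practical jitter buffer adjusts it to the observed network conditions.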

2.5 Acoustic Echo Canceller

Figure 2.1: Acoustic echo model

In any hands-free communication system (Fig. 2.1), speech from the remote end is played in the local loudspeaker, propagates in the room and is captured by the microphone. If the audio captured from the microphone is sent directly to the remote end, then the remote user hears an echo of his own voice. An acoustic echo canceller is designed to remove the acoustic echo before it is sent to the remote end. It is important to understand that the echo canceller is meant to improve the quality on the remote end.
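To make the idea concrete, here is a minimal sketch of echo cancellation using a textbook normalized LMS (NLMS) adaptive filter. NLMS is a stand-in chosen for brevity and is not taken from Speex's implementation; the simulated "room" (a two-sample delay with 0.6 attenuation, no near-end speech, no noise) and all names and parameter values are assumptions for the example.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the far-end echo from the mic signal.

    Returns the residual, i.e. what would be sent back to the remote end.
    """
    weights = [0.0] * taps    # current estimate of the echo path
    history = [0.0] * taps    # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        # Normalized LMS update: step size scaled by far-end signal energy.
        energy = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / energy for w, h in zip(weights, history)]
        residual.append(e)
    return residual


def power(samples):
    """Mean squared value of a block of samples."""
    return sum(s * s for s in samples) / len(samples)


# Simulated room: the mic picks up the far-end signal delayed and attenuated.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0, 0.0] + [0.6 * x for x in far_end[:-2]]

residual = nlms_echo_cancel(far_end, mic)
print(f"echo power at mic:       {power(mic[-200:]):.3e}")
print(f"residual after adapting: {power(residual[-200:]):.3e}")
```

Once the filter has adapted, the residual is far below the echo level, which is the improvement the remote user hears. A real canceller also has to cope with near-end speech, noise and much longer echo paths, none of which this sketch models.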

2.6 Resampler

In some cases, it may be useful to convert audio from one sampling rate to another. There are many reasons for that: mixing streams that have different sampling rates, supporting sampling rates that the soundcard does not support, transcoding, etc. That is why there is now a resampler that is part of the Speex project. This resampler can be used to convert between any two arbitrary rates (the only requirement is that the ratio be a rational number), and there is control over the quality/complexity trade-off.
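The "rational ratio" requirement simply means the conversion factor can be written as a fraction of two integers, which is always the case for integer sampling rates. The sketch below reduces that fraction and estimates the output length; it does not call the Speex resampler, and the function names are made up for the example.

```python
from fractions import Fraction


def resampling_ratio(in_rate_hz, out_rate_hz):
    """Output/input rate as a reduced fraction."""
    return Fraction(out_rate_hz, in_rate_hz)


def output_samples(n_input, in_rate_hz, out_rate_hz):
    """Approximate number of output samples for n_input input samples."""
    return n_input * out_rate_hz // in_rate_hz


for src, dst in ((8000, 16000), (44100, 48000), (48000, 16000)):
    ratio = resampling_ratio(src, dst)
    print(f"{src} -> {dst} Hz: ratio {ratio.numerator}/{ratio.denominator}")
```

For example, 44100 Hz to 48000 Hz reduces to 160/147: conceptually, every 147 input samples become 160 output samples.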

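Postscript:

Overall, this translation feels fairly rough, and there are places I did not fully understand. I am posting it here for everyone to critique.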

posted on 2012-11-22 00:01 by 楊粼波. Categories: Article Collection, C++


Speex Manual: Introduction to the Codec

2.1 Concepts

2.2 Codec

2.3 Preprocessor

2.4 Adaptive Jitter Buffer

2.5 Acoustic Echo Canceller

2.6 Resampler