Big and Little endianness quest

Posted on 01 August 2009 in Articles • 3 min read

I was working with the G.729 Annex B source code <http://www.itu.int/rec/T-REC-G.729/en> to extract the codec's Voice Activity Detector (VAD).

Voice activity detection is a technique used in speech processing wherein the presence or absence of human speech is detected in regions of audio (which may also contain music, noise, or other sound).

—Wikipedia [1]

The result of my work was a tool that produced binary output, 1 for speech and 0 for non-speech, for each 10 ms section of a sound file with a sampling frequency of 8000 Hz and 16-bit PCM data. At 8000 Hz, a 10 ms section is exactly 80 samples.

Data from the Aurora speech recognition experimental framework [2] was used to test the program. That was the point where the problems began.

The VAD was not working: almost every part of every file was marked as "speech". But that was not true! The G.729 Annex B VAD could not possibly perform that badly. Where was the problem?

The original G.729 reads the input file as follows:

while (fread(new_speech, sizeof(Word16), L_FRAME, f_speech) == L_FRAME)

Where new_speech is an L_FRAME-long array of Word16 (short) elements; in the reference code L_FRAME is 80 samples, i.e. one 10 ms frame. The new_speech data is then pre-processed and compressed by the codec. I removed the compression part, and the code looked like:

Pre_Process(new_speech, L_FRAME);
Vad = Coder_ld8a(prm, frame);

The error was probably in the Coder_ld8a() function. But how could that be? All I had done was remove the compression stuff and make it (the resulting loop is sketched right after this list)

  • return 1 for speech sections
  • return 0 for non-speech sections
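Putting the pieces together, the stripped-down driver loop was roughly the following. This is a reconstruction from the snippets above, not the exact modified source; it assumes the reference code's names (new_speech, prm, frame, L_FRAME) and my Coder_ld8a() returning the decision:

while (fread(new_speech, sizeof(Word16), L_FRAME, f_speech) == L_FRAME)
{
    Pre_Process(new_speech, L_FRAME);   /* reference pre-processing (high-pass filter) */
    Vad = Coder_ld8a(prm, frame);       /* modified: returns 1 = speech, 0 = non-speech */
    printf("%d\n", Vad);                /* one decision per 10 ms frame */
}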

I could have spent a lot of time hunting the "ghost" bug if my supervisor, Rahim Saedi, hadn't asked whether the files were being read correctly. OMG! A possible bug in the fread() call?

I checked the values of the new_speech array for a random file from the Aurora database and found something weird: the value sequence was -1 -1 0 0 256 -1 0 0 0 256 ...
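A check along these lines reproduces it (a sketch, not my exact debug code):

int i;

fread(new_speech, sizeof(Word16), L_FRAME, f_speech);
for (i = 0; i < 10; i++)
    printf("%d ", new_speech[i]);   /* printed: -1 -1 0 0 256 -1 0 0 0 256 */

Something clicked in my mind.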

In computing, endianness is the byte (and sometimes bit) ordering used to represent some kind of data. Most modern computer processors agree on bit ordering "inside" individual bytes (this was not always the case). This means that any single-byte value will be read the same on almost any computer one may send it to.

Integers are usually stored as sequences of bytes, so that the encoded value can be obtained by simple concatenation. The two most common of them are:

  • increasing numeric significance with increasing memory addresses or increasing time, known as little-endian, and
  • its opposite, most-significant byte first, called big-endian.

Well known processor architectures that use the little-endian format include x86.

—Wikipedia [3]

I was using an x86 processor, which uses the little-endian format, to read a file that contained 16-bit integers in big-endian format!
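A quick way to check which format a machine uses is to look at the first byte of a known 16-bit value. Here is a minimal, self-contained sketch:

#include <stdio.h>

int main(void)
{
    unsigned short x = 1;
    unsigned char *first = (unsigned char *)&x;   /* byte at the lowest address */

    /* Little-endian machines store the least significant byte first. */
    printf("%s-endian\n", *first == 1 ? "little" : "big");
    return 0;
}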

It's easier to see what happened by writing out the byte-level representation of decimal 1 in the 16-bit big-endian and little-endian formats:

  • 00000000 00000001 - Big endian
  • 00000001 00000000 - Little endian

Conversely, decimal 256 is:

  • 00000001 00000000 - Big endian
  • 00000000 00000001 - Little endian

Now you can see what happened: the program read big-endian data and interpreted it as little-endian, so the two bytes of every sample were swapped. Small values were effectively multiplied by 256, which is why the 1s in the file showed up as 256, while 0 and -1 (0xFFFF) are byte-symmetric and came through unchanged.
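To see it concretely, take the two bytes of big-endian 1 and read them back as a native short. A small self-contained sketch, which prints 256 on x86:

#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned char big_endian_one[2] = { 0x00, 0x01 };   /* decimal 1, most significant byte first */
    short v;

    memcpy(&v, big_endian_one, sizeof v);   /* reinterpret in the native byte order */
    printf("%d\n", v);                      /* 256 on a little-endian machine */
    return 0;
}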

Finally, a small piece of code helped to eliminate the problem:

short reverseShort(short s)
{
    unsigned char c1, c2;

    c1 = s & 255;          /* extract the low byte  */
    c2 = (s >> 8) & 255;   /* extract the high byte */

    /* Reassemble with the two bytes swapped. */
    return (c1 << 8) + c2;
}
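Applied to every sample right after reading, the fix looks roughly like this (a sketch of the patched loop):

while (fread(new_speech, sizeof(Word16), L_FRAME, f_speech) == L_FRAME)
{
    int i;

    for (i = 0; i < L_FRAME; i++)
        new_speech[i] = reverseShort(new_speech[i]);   /* big-endian -> native */

    /* ... pre-processing and VAD as before ... */
}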