Problem with an infinite loop in bcg729 codec

Problem with an infinite loop in bcg729 codec

BOITEUX, FREDERIC-2
        Hello,

We're using the Linphone G729 codec (bcg729) in our software, which handles multiple communications at the same time. It runs fine, except that from time to time it enters an endless loop, freezing all current communications until the software is restarted. I can't always access the system once the lock-up has started, but I managed to get 2 core dumps for this problem.
I've tried to analyze the origin of the problem; the endless loop occurs in VAD processing:

(gdb) bt
#0  countLeadingZeros (x=0) at utils.h:113
#1  g729Log2_Q0Q16 (x=-4096) at g729FixedPointMath.h:65
#2  bcg729_vad (VADChannelContext=0x7fe20c513040,
    reflectionCoefficient=2065246114,
    LSFCoefficients=LSFCoefficients@entry=0x7fe235e8f940,
    autoCorrelationCoefficients=autoCorrelationCoefficients@entry=0x7fe235e8fa70,
    autoCorrelationCoefficientsScale=<optimized out>,
    signalCurrentFrame=0x7fe20c940680) at vad.c:206
#3  0x00007fe234c276c3 in bcg729Encoder (encoderChannelContext=0x7fe20c940590,
    inputFrame=<optimized out>,
    bitStream=0x7fe20c9fa9d0 "\320j\356e\253\266\061O\214K\n",
    bitStreamLength=0x7fe20c9fa9da "\n") at encoder.c:170
#4  0x00007fe235be90ff in md_g729Encoder_encode (sess=0x7fe231e8fc80,
    encoderData=0x7fe20c9fa980, in=0x7fe2280b78c0)
    at ./threadPool_impl/src/modules/codecG729.c:391
#5  0x00007fe235bdaa8e in sessionSendReceiveProcess (worker=0x55cc9e74e200,
    session=0x7fe231e8fc80, arg=<optimized out>)
    at ./threadPool_impl/src/sessionProcess.c:184
#6  0x00007fe235bd673b in workerRun (arg=0x55cc9e74e200)
    at ./threadPool_impl/src/scheduler.c:609
#7  0x00007fe2357364a4 in start_thread (arg=0x7fe235e90700)
    at pthread_create.c:456
#8  0x00007fe235174d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Frames 0 to 3 are located in the bcg729 codec. In frame #1, the source of the g729Log2_Q0Q16() function states explicitly that its input must be > 0 and that this is not checked; in both core dumps, the value passed was < 0.
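
To illustrate why an unchecked non-positive input can matter here, below is a minimal sketch of a shift-until-normalized loop of the kind a leading-zero count might use. It is not the bcg729 source: the function names, the loop condition and the iteration cap are all assumptions made for the example. With a negative input the value never satisfies the exit condition and eventually shifts down to 0, which would be consistent with the x=0 seen in frame #0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical normalization loop (NOT the bcg729 source): count how many
 * left shifts are needed before bit 30 of x is set. max_iter caps the loop
 * so this sketch always terminates; it returns -1 when the cap is hit. */
static int shifts_to_normalize(int32_t x, int max_iter)
{
    assert(max_iter > 0);
    int n = 0;
    while (x < (int32_t)0x40000000) {
        if (n == max_iter) {
            return -1; /* an uncapped loop would still be spinning here */
        }
        /* Shift in the unsigned domain so negative inputs are not undefined;
         * the conversion back wraps on mainstream two's-complement targets. */
        x = (int32_t)((uint32_t)x << 1);
        n++;
    }
    return n;
}

/* Hypothetical caller-side guard: accept only inputs inside the documented
 * domain (x > 0). Returns 1 if x is usable, 0 otherwise. */
static int log2_input_is_valid(int32_t x)
{
    return x > 0;
}
```

In this sketch, a positive input terminates after a bounded number of shifts, while 0 and any negative value only stop because of the cap. Checking the input before the call, as the second helper does, is one possible way to avoid entering the loop at all; whether that matches the real fix is not something this sketch can tell.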

I've tried to reproduce the problem, first with some of our automated call systems (without success), then in a standalone program that calls the bcg729 codec with the PCM frame that caused the core dump, after setting the VAD context to the same state as in the core. I didn't succeed; I suspect some of these context variables had already been modified between the start of the input frame processing and the point where the bug occurs. I'm a bit lost in the VAD processing anyway.

Have you experienced this kind of problem?
Do you have any clue about how to solve this endless loop?

  Any help would be appreciated, thanks!

        Frédéric Boiteux - Odigo IVR product development
This message contains information that may be privileged or confidential and is the property of the Capgemini Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorized to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message.


_______________________________________________
Linphone-developers mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/linphone-developers
Re: Problem with an infinite loop in bcg729 codec

BOITEUX, FREDERIC-2
In reply to this post by BOITEUX, FREDERIC-2
        Hello,

  I've made progress on this bcg729 codec problem:

- the problem occurs when encoding data coming from a decoded G711 stream, with VAD enabled on this stream.


- I now have a [short] stream which reproduces the bug (endless loop) 😊


- I've seen that the bcg729 codec comes with its own tests: I've compiled and run them, and I got some "fails":

$ ./testCampaignAll                                    
Test LP2LSPConversion bloc                                                        
  speech  ... Pass
  tame  ... Pass                                                                  
  lsp  ... Pass                          

Test postFilter bloc                    
  test  ... Pass                                                                  
  parity  ... Pass
  speech  ... Pass                                                                
  erasure  ... Pass                      
  tame  ... Pass                                                                  
  algthm ... Fail                        
  fixed  ... Pass                        
  lsp ... Fail    
  pitch  ... Pass                                                                  
  overflow  ... Pass

Test decoder bloc
  speechDecode 3750 frames in 0.092994 seconds : 24.798400 us/frame                
  ... Pass          
  lspDecode 2232 frames in 0.055679 seconds : 24.945789 us/frame                  
 ... Fail                                

  algthmDecode 35 frames in 0.000884 seconds : 25.257143 us/frame                  
 ... Fail                                                                          



- When I compile the codec with the ASAN checker (included in GCC), I get a lot of runtime errors about integer overflow or undefined behaviour:
src/qLSP2LP.c:87:10: runtime error: left shift of negative value -67560105
src/qLSP2LP.c:93:10: runtime error: left shift of negative value -10805
src/postFilter.c:226:28: runtime error: left shift of negative value -1032
src/computeLP.c:76:30: runtime error: left shift of negative value -118198051
src/computeLP.c:91:9: runtime error: left shift of negative value -75628195
src/computeLP.c:94:24: runtime error: left shift of negative value -6604576
src/LP2LSPConversion.c:65:11: runtime error: left shift of negative value -7304
src/LP2LSPConversion.c:66:11: runtime error: left shift of negative value -411
src/LP2LSPConversion.c:141:8: runtime error: left shift of negative value -3003
src/LP2LSPConversion.c:105:22: runtime error: left shift of negative value -552
src/LP2LSPConversion.c:105:22: runtime error: left shift of negative value -552
src/LP2LSPConversion.c:137:17: runtime error: left shift of negative value -2057
src/g729FixedPointMath.h:259:7: runtime error: left shift of negative value -7173
src/g729FixedPointMath.h:300:25: runtime error: left shift of negative value -2563
src/dtx.c:165:27: runtime error: left shift of negative value -3037
src/g729FixedPointMath.h:259:7: runtime error: left shift of negative value -7069
src/g729FixedPointMath.h:300:25: runtime error: left shift of negative value -2664
src/g729FixedPointMath.h:144:6: runtime error: left shift of negative value -5
src/g729FixedPointMath.h:144:6: runtime error: left shift of negative value -5
src/computeWeightedSpeech.c:63:18: runtime error: left shift of negative value -1
src/utils.c:136:18: runtime error: left shift of negative value -1
src/computeWeightedSpeech.c:55:18: runtime error: left shift of negative value -1
src/computeLP.c:195:37: runtime error: left shift of negative value -35
src/computeLP.c:75:23: runtime error: left shift of negative value -293242880
src/cng.c:205:8: runtime error: left shift of negative value -1
src/LPSynthesisFilter.c:41:18: runtime error: left shift of negative value -2
src/postFilter.c:106:18: runtime error: left shift of negative value -2
src/g729FixedPointMath.h:144:6: runtime error: left shift of negative value -1
src/g729FixedPointMath.h:144:6: runtime error: left shift of negative value -1
src/g729FixedPointMath.h:144:6: runtime error: left shift of negative value -4
src/g729FixedPointMath.h:144:6: runtime error: left shift of negative value -4
src/fixedCodebookSearch.c:214:28: runtime error: left shift of negative value -1



- I've converted my failing stream sample to the bcg729 test format, and added an encoderTest2 which is the same as encoderTest but activates VAD, and I can reproduce the problem! I've attached both to this e-mail; I hope the list system will accept them.


Could you confirm that you can also reproduce the problem?

What about the failed tests: do you get the same results?

About the ASAN runtime errors: I've tried to work around them by checking the sign in the left-shift macros, but that leads to other runtime errors about integer overflows…
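
For reference, here is a minimal sketch of two ways a left-shift helper can avoid shifting a negative signed value, which is undefined behaviour in C. The helper names are hypothetical and are not the macros used in bcg729. The first variant shifts in the unsigned domain and wraps silently on overflow; the second widens to 64 bits and saturates, which is one possible answer to the overflow reports that appear once the sign is handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: wrapping left shift. The shift is performed on
 * uint32_t, where it is defined for every bit pattern; the conversion back
 * to int32_t wraps on mainstream two's-complement targets. */
static int32_t shl32_wrap(int32_t x, unsigned n)
{
    assert(n < 32);
    return (int32_t)((uint32_t)x << n);
}

/* Hypothetical helper: saturating left shift. The product is computed in
 * 64 bits (it always fits for n < 32), then clamped to the int32_t range. */
static int32_t shl32_sat(int32_t x, unsigned n)
{
    assert(n < 32);
    int64_t wide = (int64_t)x * ((int64_t)1 << n);
    if (wide > INT32_MAX) {
        return INT32_MAX;
    }
    if (wide < INT32_MIN) {
        return INT32_MIN;
    }
    return (int32_t)wide;
}
```

Which variant is appropriate depends on what the fixed-point code expects at each call site: wrapping keeps the bit pattern a plain shift would usually produce, while saturation changes the result whenever the value does not fit.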


    With regards,

        Frédéric Boiteux - Odigo IVR product development

Attachments: encoderTest2.c (7K), partie_pb_G729_v4.in.gz (90K)
Re: Problem with an infinite loop in bcg729 codec

Johan Pascal

Hi Frederic,

Thanks for the bug report.

I do not reproduce the test failures. Are you using the latest version of the code (not that it changed recently, but you could be using an older version)? The test patterns are downloaded automatically and have been updated with bug fixes. This keeps the repository lighter for people who do not want to run the tests, but I never set up version management for the patterns, so old code won't pass the tests now that the patterns have evolved.

I do not get the runtime errors from ASAN (some memory leaks in the test executables, nothing more).

I do reproduce the infinite loop (using the encoderVADTest test program); I just pushed a fix.

https://gitlab.linphone.org/BC/public/bcg729/commit/6c7c7f3576e3c2f2aad864e1da0404b21c33b492


Regards,

Johan

On 14/09/2020 16:48, BOITEUX, FREDERIC wrote:
[...]
Re: Problem with an infinite loop in bcg729 codec

BOITEUX, FREDERIC-2
        Hi Johan,

Thanks for your quick reply!

About the test suite, you're correct: I didn't have the latest pattern archive (I had an obsolete URL which no longer worked, and found an outdated archive somewhere). With the latest sources and patterns, all tests pass 😊.

About the ASAN errors: I still get them. Here are all the options I use. I build the test suite with the following command:

(mkdir build_testASAN && cd build_testASAN && cmake -D ENABLE_TESTS=YES -DCMAKE_SKIP_INSTALL_RPATH=ON  -D CMAKE_C_FLAGS:STRING="${CMAKE_C_FLAGS} -fsanitize=address -fno-omit-frame-pointer  -fno-common -fsanitize=undefined -fsanitize=shift -fsanitize=integer-divide-by-zero -fsanitize=unreachable -fsanitize=vla-bound -fsanitize=signed-integer-overflow -fsanitize=bounds-strict -fsanitize=object-size -fsanitize=nonnull-attribute -fsanitize=returns-nonnull-attribute -fsanitize=bool -fsanitize=enum" .. && make)
[I'm running on Debian 10 Buster]

To be able to fully compile the test suite, I had to fix some trivial warnings in some test programs. You'll find the fixes attached to this e-mail; feel free to include them in your source tree.

I run the tests with:
(cd build_testASAN/test && ./testCampaignAll 2>&1|tee testCampaignAll.log)

The ASAN system produces a lot of logs, and I still get errors such as:


Test fixedCodebookSearch bloc
… lot of ASAN log…
…/bcg729.git/src/fixedCodebookSearch.c:211:28: runtime error: left shift of negative value -1
…/bcg729.git/src/fixedCodebookSearch.c:212:28: runtime error: left shift of negative value -1
…/bcg729.git/src/fixedCodebookSearch.c:213:28: runtime error: left shift of negative value -1
…/bcg729.git/src/fixedCodebookSearch.c:210:28: runtime error: left shift of negative value -1
  ... Pass

And when counting these errors for the whole test suite, I get:
$ grep error testCampaignAll.log| wc -l
574

I'm not sure about the severity/impact of these messages…


About the endless loop: many thanks for the fix. I've also tested it successfully, it's perfect 😊!


        With regards,
        Frédéric Boiteux - Odigo IVR


Attachment: fix_warnings_in_test_suite.patch (2K)