Possible data races in OpenSSL libcrypto.1.0.0 CRYPTO_malloc() reported by Helgrind

I am having issues testing a multithreaded (pthreads) application: Helgrind reports hundreds of possible data races inside CRYPTO_malloc() in libcrypto.1.0.0.

I have read and fully understood all the available documentation, including The Definitive Guide to Linux Network Programming pages 255-259 and http://www.openssl.org/docs/crypto/threads.html.

My code initializes the static and dynamic lock structures and callbacks, and registers the thread ID callback:

CRYPTO_THREADID_set_callback(ssl_mitm_thread_id);
CRYPTO_set_dynlock_create_callback(ssl_mitm_dyn_create_lock);
CRYPTO_set_dynlock_lock_callback(ssl_mitm_dyn_locking_callback);
CRYPTO_set_dynlock_destroy_callback(ssl_mitm_dyn_destroy_lock);
ssl_mitm_lock_mutexes = (pthread_mutex_t*)malloc(CRYPTO_num_locks() *
  sizeof(pthread_mutex_t));
for(i = 0; i < CRYPTO_num_locks(); i++) {
  pthread_mutex_init(&ssl_mitm_lock_mutexes[i], NULL);
}
CRYPTO_set_locking_callback(ssl_mitm_locking_callback);

SSL_library_init();
SSL_load_error_strings();
...etc...
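
The callback bodies are not shown above. As a rough sketch only, a static locking callback of the kind passed to CRYPTO_set_locking_callback() usually has the shape below. The demo_ names and the fixed lock count are hypothetical, and DEMO_CRYPTO_LOCK simply repeats the 1.0.x value of CRYPTO_LOCK so that the sketch builds without the OpenSSL headers:

```c
#include <assert.h>
#include <pthread.h>

/* Stand-ins so the sketch builds without <openssl/crypto.h>. */
#define DEMO_CRYPTO_LOCK 1   /* mirrors CRYPTO_LOCK in OpenSSL 1.0.x */
#define DEMO_NUM_LOCKS   64  /* hypothetical; real code asks CRYPTO_num_locks() */

static pthread_mutex_t demo_lock_mutexes[DEMO_NUM_LOCKS];

/* Initialise one mutex per static lock slot. */
static void demo_init_locks(void)
{
    int i;
    for (i = 0; i < DEMO_NUM_LOCKS; i++)
        pthread_mutex_init(&demo_lock_mutexes[i], NULL);
}

/* Same signature as the function given to CRYPTO_set_locking_callback():
 * lock slot n when the LOCK bit is set in mode, otherwise unlock it. */
static void demo_locking_callback(int mode, int n, const char *file, int line)
{
    (void)file;
    (void)line;
    assert(n >= 0 && n < DEMO_NUM_LOCKS);
    if (mode & DEMO_CRYPTO_LOCK)
        pthread_mutex_lock(&demo_lock_mutexes[n]);
    else
        pthread_mutex_unlock(&demo_lock_mutexes[n]);
}
```

A callback like this only protects the code paths in which OpenSSL actually asks for a lock; it has no effect on writes that the library performs without calling it.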

I have added debug statements to the locking callback functions, and OpenSSL is only calling the callback registered with the older CRYPTO_set_locking_callback(), and not using the new dynamic lock interface.

Even so, it is calling the old static lock function, so it is making some attempt at thread synchronisation, but Helgrind still thinks there are lots of possible data races. Most if not all of them are within CRYPTO_malloc().

So, is this Helgrind being overzealous, or is it a bug in OpenSSL/libcrypto?

e.g.

==20093== ----------------------------------------------------------------
==20093==
==20093== Possible data race during write of size 4 at 0x63BA368 by thread #3
==20093== Locks held: none
==20093==    at 0x604F0FE: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==20093==    by 0x57F4186: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==20093==    by 0x57D20C4: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==20093==    by 0x57D62E1: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==20093==    by 0x57DF0B6: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
[snip] (SSL_connect() called)
==20093==
==20093== This conflicts with a previous write of size 4 by thread #6
==20093== Locks held: none
==20093==    at 0x604F0FE: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==20093==    by 0x57E1673: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==20093==    by 0x57DF359: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
[snip] (SSL_connect() called)
==20093==
==20093== ----------------------------------------------------------------

There are other examples of data races in CRYPTO_malloc() called from SSL_read, SSL_write, SSL_connect and SSL_accept, and others.


The last line of the snippet below is the line of code Helgrind is complaining about:

void *CRYPTO_malloc(int num, const char *file, int line)
        {
        void *ret = NULL;

        if (num <= 0) return NULL;

        allow_customize = 0;

The allow_customize global variable is statically initialized to 1. It is used to make sure the malloc customization functions (the CRYPTO_set_mem... and variations) are not called after the allocation routines themselves have been called. Setting it to 0 is the write that happens outside of any lock, and that unlocked write is what Helgrind reports. Every caller of CRYPTO_malloc() stores the same value, so concurrent allocations on their own do not change the outcome. Technically, there is a race condition if one thread calls a CRYPTO_...alloc function while another thread is trying to set custom functions for CRYPTO, but that can be considered an application bug.
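
The same pattern can be reproduced outside OpenSSL. The following is a minimal sketch, not OpenSSL code: demo_allow_customize, demo_malloc, demo_worker, demo_run and the thread count are hypothetical names chosen for illustration. Several threads clear a shared flag with a plain store and no lock, which is the kind of access Helgrind is designed to flag, even though every thread writes the same value:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define DEMO_THREADS 4                 /* hypothetical thread count */

static int demo_allow_customize = 1;   /* stands in for allow_customize */

/* Same shape as the quoted CRYPTO_malloc() prologue: the flag is cleared
 * with a plain store while no lock is held. This is deliberate here, to
 * mirror the original code. */
static void *demo_malloc(int num)
{
    if (num <= 0) return NULL;
    demo_allow_customize = 0;          /* unsynchronised write, always 0 */
    return malloc((size_t)num);
}

/* Each worker performs one allocation and frees it again. */
static void *demo_worker(void *arg)
{
    (void)arg;
    free(demo_malloc(16));
    return NULL;
}

/* Starts the workers, waits for all of them, and returns the final flag. */
static int demo_run(void)
{
    pthread_t tid[DEMO_THREADS];
    int i, rc;

    for (i = 0; i < DEMO_THREADS; i++) {
        rc = pthread_create(&tid[i], NULL, demo_worker, NULL);
        assert(rc == 0);
    }
    for (i = 0; i < DEMO_THREADS; i++) {
        rc = pthread_join(tid[i], NULL);
        assert(rc == 0);
    }
    (void)rc;
    return demo_allow_customize;
}
```

Calling demo_run() from a small main() and running the program under valgrind --tool=helgrind should produce a "Possible data race during write of size 4" report with "Locks held: none", much like the one quoted in the question. The flag ends up as 0 whichever thread writes first.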