Lifting / Reorganization in C, C ++ and Java: Do mandatory variable declarations always have to be at the top in a context?

advertisements

I read a little about hoisting and reordering, so it seems that Java VM may choose to hoist some expressions. I also read about hoisting of function declarations in Javascript.

First Question: Can someone confirm if hoisting usually exist in C, C++ and Java? or are they all compiler/optimization dependent?

I read a lot of example C codes that always put variable declarations on top, before any assert or boundary condition. I thought it would be a little faster to do all the asserts and boundary cases before variable declarations given that the function could just terminate.

Main Question: Must variable declarations always be on top in a context? (is there hoisting at work here?) Or does the compiler automatically optimize the code by checking these independent asserts and boundary cases first (before irrelevant variable declaration)?

Here's a related example:

void MergeSort(struct node** headRef) {
    struct node* a;
    struct node* b;
    if ((*headRef == NULL) || ((*headRef)->next == NULL)) {
        return;
    }
    FrontBackSplit(*headRef, &a, &b);
    MergeSort(&a);
    MergeSort(&b);
    *headRef = SortedMerge(a, b);
}

As shown above, the boundary case does not depend on variables "a" and "b". Thus, putting the boundary case above variable declarations would make it slightly faster?


Updates:

The above example isn't as good as I hoped because variables "a" and "b" were only declared, not initialized there. Compiler would ignore declaration until we actually need to use them.

I checked GNU GCC assemblies for variable declarations with initializations, the assemblies have different execution sequence. Compiler did not change my ordering of independent asserts and boundary cases. So, reordering these asserts and boundary cases do change the assemblies, thus changing how machine runs them.

I suppose the difference is minuscule that most people never cared about this.


The compiler may reorder/modify your code as it wishes, as long as the modified code is equivalent to the original if executed sequentially. So hoisting is allowed, but not required. This is an optimization and it is completely compiler specific.

Variable declarations in C++ can be wherever you wish. In C they used to have to be on top in a context, but when the c99 standard was introduced, the rules were relaxed and now they can be wherever you want, similarly to c++. Still, many c programmers stick to putting them on top in a context.

In your example, the compiler is free to move the if statements to the top, but I don't think it would. These variables are just pointers that are declared on stack and are un-initialized, the cost of declaring them is minimal, moreover it might be more efficient to create them at the beginning of the function, rather than after the asserts.

If your declarations would involve any side-effects, for example

struct node *a = some_function();

then compiler would be limited in what it can reorder.

Edit:

I checked GCC's loop hoisting in practice with this short program:

#include <stdio.h>
int main(int argc, char **argv) {
    int dummy = 2 * argc;
    int i = 1;
    while (i<=10 && dummy != 4)
        printf("%d\n", i++);
    return 0;
}

I've compiled it with this command:

gcc -std=c99 -pedantic test.c -S -o test.asm

This is the output:

    .file   "test.c"
    .def    ___main;    .scl    2;  .type   32; .endef
    .section .rdata,"dr"
LC0:
    .ascii "%d\12\0"
    .text
    .globl  _main
    .def    _main;  .scl    2;  .type   32; .endef
_main:
LFB7:
    .cfi_startproc
    pushl   %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl    %esp, %ebp
    .cfi_def_cfa_register 5
    andl    $-16, %esp
    subl    $32, %esp
    call    ___main
    movl    8(%ebp), %eax
    addl    %eax, %eax
    movl    %eax, 24(%esp)
    movl    $1, 28(%esp)
    jmp L2
L4:
    movl    28(%esp), %eax
    leal    1(%eax), %edx
    movl    %edx, 28(%esp)
    movl    %eax, 4(%esp)
    movl    $LC0, (%esp)
    call    _printf
L2:
    cmpl    $10, 28(%esp)
    jg  L3
    cmpl    $4, 24(%esp)
    jne L4
L3:
    movl    $0, %eax
    leave
    .cfi_restore 5
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc
LFE7:
    .ident  "GCC: (GNU) 4.8.2"
    .def    _printf;    .scl    2;  .type   32; .endef

Then I've compiled it with this command:

gcc -std=c99 -pedantic test.c -O3 -S -o test.asm

This is the output:

    .file   "test.c"
    .def    ___main;    .scl    2;  .type   32; .endef
    .section .rdata,"dr"
LC0:
    .ascii "%d\12\0"
    .section    .text.startup,"x"
    .p2align 4,,15
    .globl  _main
    .def    _main;  .scl    2;  .type   32; .endef
_main:
LFB7:
    .cfi_startproc
    pushl   %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl    %esp, %ebp
    .cfi_def_cfa_register 5
    pushl   %ebx
    andl    $-16, %esp
    subl    $16, %esp
    .cfi_offset 3, -12
    call    ___main
    movl    8(%ebp), %eax
    leal    (%eax,%eax), %edx
    movl    $1, %eax
    cmpl    $4, %edx
    jne L8
    jmp L6
    .p2align 4,,7
L12:
    movl    %ebx, %eax
L8:
    leal    1(%eax), %ebx
    movl    %eax, 4(%esp)
    movl    $LC0, (%esp)
    call    _printf
    cmpl    $11, %ebx
    jne L12
L6:
    xorl    %eax, %eax
    movl    -4(%ebp), %ebx
    leave
    .cfi_restore 5
    .cfi_restore 3
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc
LFE7:
    .ident  "GCC: (GNU) 4.8.2"
    .def    _printf;    .scl    2;  .type   32; .endef

So basically, with optimization turned on the original code was transformed to something like this:

#include <stdio.h>
int main(int argc, char **argv) {
    int dummy = 2 * argc;
    int i = 1;
    if (dummy != 4)
        while (i<=10)
            printf("%d\n", i++);
    return 0;
}

So, as you can see, there is indeed hoisting in C.