Is the RLE algorithm erroneous?

advertisements

I was looking at a recent Code Golf on the removal of duplicate characters in a string. i mulled it over and thought that the RLE algorithm would solve it, in fact, I did believe that would resolve removing duplicates, I wrote an implementation here in C, to see how far I could go with it

char *rle(const char *src){
    char *p=(char *)src;
    char *q=(char *)src+1;
    char *rle_enc=NULL, *tmp_rle, buf[10];
    int run=1;
    while (*p){
        while(*q){
            if (*p==*q++) run++,p++;
        }
        sprintf(buf,"%d%c",run,*(p-1));
        p++;
        if (!rle_enc){
            if ((rle_enc=malloc(strlen(buf)+1))!=NULL){
                strcpy(rle_enc,buf);
            }
        }else{
            if ((tmp_rle=realloc(rle_enc,(strlen(rle_enc)+strlen(buf)+1)))!=NULL){
                rle_enc=tmp_rle;
                strcat(rle_enc,buf);
            }
        }
        q=(p+1);
        run=1;
    }
    return rle_enc;
}

Sure enough, here's the main for this:

int main(int argc, char **argv){
    char *test1 = "HHHHHHeeeeeelllllloooooooo";
    char *test2 = "nbHHkRvrXbvkn";
    char *p = rle(test1);
    printf("s = %s\n", test1);
    printf("p = %s\n", p);
    if (p) free(p);
    return 0;
}

According to Code Golf on meta, it should be reusable and solve a set of problems, BUT in the shortest set of characters, fair enough I thought I'd just change the variables to 1 letters and compact the code to make it small..but something wasn't quite right with it as this lead me to think about the RLE algorithm itself, here's a page on Wikipedia about what it has to say and the implementation in Java.

The code does appear to be doing what it should, so I thought, now, it's just a matter of going through the encoded string result from rle looking for those that have a 1 followed by the letter..

I did however notice the limitation of the RLE algorithm, it is only suitable for those that have a set of repetitive characters adjacent to each other. But it failed the test case for the Code Golf, which looks deceptively simple, which leads me to this question:

Is the RLE algorithm flawed? Where would it be used nowadays? Gathering dust I presume due to the volume of data and information flowing around that RLE no longer fits a purpose...

Edit: Thanks to Moonshadow, John and Steve for posting their answers.

There is a fundamental lesson that I've still failed to learn - never ever go OTT and think complex when it comes to this kind of thing, that is a fallacy on my part and shows how big thinking can get in the way and I can get sucked into it too deeply and get carried away without looking at the right angle!!!!! Thanks again! :)

Best regards, Tom.


RLE will not solve that code golf problem for you.

The code golf problem requires you to strip out all characters that occur more than once in the input, regardless of where the occurrences are. However, RLE, "run length encoding", encodes "runs" - repeated sequences of the same character; multiple runs of the same character can occur in a string, and RLE will encode these separately, by design.

RLE is intended to encode sequences of repeated data elements more compactly by replacing the sequence with just one element followed by the number of times it is repeated. For this purpose it is perfectly adequate. Any "flaw" is not in the algorithm, but rather in the decision to use it for a purpose for which it is poorly suited.