Efficiency of the KMP search algorithm

Based on the section "Efficiency of the search algorithm" for KMP, I really don't understand why the loop can execute at most 2n times, where n is the length of S.

The following is the pseudocode from Wikipedia:

algorithm kmp_search:
    input:
        an array of characters, S (the text to be searched)
        an array of characters, W (the word sought)
    output:
        an integer (the zero-based position in S at which W is found)

    define variables:
        an integer, m ← 0 (the beginning of the current match in S)
        an integer, i ← 0 (the position of the current character in W)
        an array of integers, T (the table, computed elsewhere)

    while m + i < length(S) do
        if W[i] = S[m + i] then
            if i = length(W) - 1 then
                return m
            let i ← i + 1
        else
            let m ← m + i - T[i]
            if T[i] > -1 then
                let i ← T[i]
            else
                let i ← 0

    (if we reach here, we have searched all of S unsuccessfully)
    return the length of S

I think the while loop executes at most n times, not 2n times. There are two branches in the loop. The first branch increases i but does not increase m. The second branch adds i - T[i] to m, and since i > T[i], m increases. Thus m + i always increases in the while loop. I think the loop runs at most n times in total, so why 2n?


It is easy to visualize the algorithm as follows. You have a stationary string S, a movable string W directly under S, and a small rectangular sliding window that covers one character in both W and S. Initially the beginning of W is under the beginning of S, and the window covers initial characters of both. Each step of the algorithm is as follows:

  1. If the two characters covered by the sliding window match, move the window one position to the right. (If we were at the last character of W, the search is over and we have a success.)
  2. Otherwise, if the window is covering the first character of W, move both W and the window one position to the right.
  3. Otherwise, move W to the right by the amount calculated from the table, leaving the window where it is (the exact amount does not matter for this analysis, only that it is a positive number).

I recommend doing that physically with two strips of paper and a paper window or a piece of glass!
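If paper is not at hand, the three steps above can also be replayed in code. The following is only a sketch in Python: the function name `kmp_steps` is made up for illustration, and the table T is passed in by the caller, just as the pseudocode leaves it "computed elsewhere". It assumes the table variant in which only T[0] is -1, so that the T[i] = -1 case corresponds to step 2.

```python
def kmp_steps(S, W, T):
    """Replay the search loop and label every iteration as step 1, 2, or 3.

    T is the table, supplied by the caller (hypothetical helper, not part of
    the original pseudocode). Returns (position, labels); position is len(S)
    when W does not occur in S.
    """
    m = 0  # where W currently starts under S
    i = 0  # offset of the window inside W; the window sits over S[m + i]
    labels = []
    while m + i < len(S):
        if W[i] == S[m + i]:
            labels.append(1)  # step 1: the window moves right
            if i == len(W) - 1:
                return m, labels
            i += 1
        elif T[i] == -1:
            labels.append(2)  # step 2: W and the window both move right
            m += i + 1
            i = 0
        else:
            labels.append(3)  # step 3: only W moves right, by i - T[i]
            m += i - T[i]
            i = T[i]
    return len(S), labels


# W = "ab" against a run of a's, with T = [-1, 0]
position, labels = kmp_steps("aaaa", "ab", [-1, 0])
print(position, labels)  # 4 [1, 3, 1, 3, 1, 3, 1]
```

Here the window is position m + i of S, so step 3 leaves m + i unchanged while m grows.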

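It is easy to see that, overall, step 3 can be done at most once for each step 1: step 3 moves W to the right while the window stays put, so it shortens the matched part of W to the left of the window by at least one character, and only step 1 lengthens that part (by exactly one character). Steps 1 and 2 each move the window one position to the right, which can happen at most n times, so there are at most about 2n steps in total. The worst case is that you do step 3 exactly once for each step 1, and no step 2 is done at all. That is exactly what happens when the first character of W always matches S and the second character never does, for example W="ab", S="aaaaaaaaaaaaa...".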
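To check the bound numerically, here is a small Python sketch that counts how often the loop body runs. The name `count_iterations` is hypothetical and the table is again supplied by the caller. The sketch also asserts one possible way to phrase the argument: the quantity 2m + i grows on every pass through the loop and, because m + i < n and i >= 0, it is at most 2n - 2 whenever the loop body is entered, so the body can run at most 2n - 1 times.

```python
def count_iterations(S, W, T):
    """Run the search loop and return how many times its body executes.

    T is the table, passed in by the caller as in the pseudocode.
    """
    n = len(S)
    m = i = 0
    iterations = 0
    potential = -1  # value of 2*m + i on the previous pass
    while m + i < n:
        # 2*m + i strictly increases and never exceeds 2*n - 2 here
        assert potential < 2 * m + i <= 2 * n - 2
        potential = 2 * m + i
        iterations += 1
        if W[i] == S[m + i]:
            if i == len(W) - 1:
                break  # match found
            i += 1
        else:
            m += i - T[i]
            i = max(T[i], 0)  # same effect as the if/else on T[i] > -1
    return iterations


# Worst case from the answer: W = "ab", S = "aaa...a", T = [-1, 0]
for n in (1, 5, 100):
    print(n, count_iterations("a" * n, "ab", [-1, 0]))
# prints: 1 1, then 5 9, then 100 199
```

For this input the count is 2n - 1, which is where the "at most 2n" comes from: m + i does not change in step 3, so it is not true that m + i increases on every iteration.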