Preg_match kills the page

I am using preg_match to find and remove eval'd, base64-encoded viruses within files.

The regex below:

/\s*eval\s*\(\s*base64_decode\s*\(\s*('[a-zA-Z0-9\+\/]*={0,2}'|"[a-zA-Z0-9\+\/]*={0,2}")\s*\)\s*\s*\)\s*(;)?\s*/

matches the following code:

eval(base64_decode("BASE64+ENCODED+VIRUS+HERE"));

The above regex works fine.

I also wanted to match base64 strings that are word-wrapped via concatenation, so it should match the following as well: "BASE64+EN" . "CODED+VIRUS+HERE".

So I changed the regex into:

/\s*eval\s*\(\s*base64_decode\s*\(\s*\'([a-zA-Z0-9\+\/]*(\'\s*\.\s*\')?[a-zA-Z0-9\+\/]*)*={0,2}\'|"([a-zA-Z0-9\+\/]*("\s*\.\s*")?[a-zA-Z0-9\+\/]*)*={0,2}"\s*\)\s*\s*\)\s*(;)?\s*/

Which finds a partial match for:

"BASE64+ENCODED+VIRUS+HERE"));

But when I try to apply the match to this entire file: http://pastebin.com/ED8sFUP0 the page dies with the browser message "The connection to the server was reset while the page was loading."

I have error reporting activated:

error_reporting(E_ALL);
ini_set('display_errors', TRUE);
ini_set('scream.enabled', TRUE);

But nothing shows up, neither on the page nor in Apache's error logs.

The very same regex works as expected on files that do not contain the offending string: preg_match does not return boolean false, it returns 0, meaning that there was no regex error and that it did not find any matches.

My concern is not necessarily why the regex finds only a partial match. That's probably some typo I made that happens to work.

I want to know when and how the regex compiler can fail and break the entire process chain:

apache > php > regex_compiler

I understand that it may very well be "because" of my regex, which just happens to compile correctly but not match correctly, and that it might cause something bad down the road. But my interest is in why the regex compiler fails with no error and how I can get the error message that it should be yielding.

Something similar is discussed but unresolved here: php preg_match_all kills page for unknown reason


I think your regex has too many possible ways to match ==> Catastrophic Backtracking.

/\s*eval\s*\(\s*base64_decode\s*\(\s*\'([a-zA-Z0-9\+\/]*(\'\s*\.\s*\')?[a-zA-Z0-9\+\/]*)*={0,2}\'|"([a-zA-Z0-9\+\/]*("\s*\.\s*")?[a-zA-Z0-9\+\/]*)*={0,2}"\s*\)\s*\s*\)\s*(;)?\s*/
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The regex will need a huge number of steps to match the part I marked ==> you have a performance problem, the regex just doesn't finish in time!

Since (\'\s*\.\s*\')? is optional, the engine has to try many ways of splitting the input between the [a-zA-Z0-9\+\/]* before the optional part and the same class after it, and the enclosing (...)* repeats those attempts for every split.
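To see this kind of failure reported instead of silently killing the request, here is a minimal sketch. It assumes a toy pattern, /^(a+)+$/, that has the same nested-quantifier shape as the marked part, and it deliberately lowers pcre.backtrack_limit and turns off pcre.jit so the limit is hit quickly and predictably; those values are illustrative, not recommendations. When the engine gives up, preg_match returns false and preg_last_error() gives the reason. A hard crash of the PHP process itself (one plausible cause of a reset connection with nothing in the logs) cannot be caught this way.

```php
<?php
// Assumptions for illustration only: interpreter mode and a small limit,
// so the engine gives up quickly instead of running for a long time.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '10000');

// Hypothetical toy pattern with a quantifier nested inside a quantifier.
$pattern = '/^(a+)+$/';
// The trailing "!" makes the overall match fail, which forces backtracking.
$subject = str_repeat('a', 30) . '!';

$result = preg_match($pattern, $subject);

if ($result === false) {
    // false (not 0) means the engine aborted; ask PCRE why.
    $reason = preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR
        ? 'backtrack limit exhausted'
        : 'PCRE error code ' . preg_last_error();
    echo "preg_match failed: $reason\n";
} else {
    echo "preg_match returned $result\n";
}
```

Checking for === false matters here: 0 means "no match", while false means the match attempt itself was abandoned.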

What you can do is use possessive quantifiers (you make a quantifier possessive by adding a + after it). They prevent backtracking: a possessive quantifier never gives back a character that it has matched. So, try this:

/\s*eval\s*\(\s*base64_decode\s*\(\s*\'([a-zA-Z0-9\+\/]*+(\'\s*\.\s*\')?[a-zA-Z0-9\+\/]*+)*={0,2}\'|"([a-zA-Z0-9\+\/]*+("\s*\.\s*")?[a-zA-Z0-9\+\/]*+)*={0,2}"\s*\)\s*\s*\)\s*(;)?\s*/
                                                       ^^                               ^^                           ^^                            ^^
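The effect of the possessive quantifier can be checked on a small case. This is a sketch on a hypothetical toy pattern, not on the full regex above: the subject cannot match, and with a++ the engine has nothing to retry, so preg_match returns 0 without any PCRE error.

```php
<?php
// The trailing "!" means the pattern can never match this subject.
$subject = str_repeat('a', 30) . '!';

// a++ is possessive: once it has consumed the run of "a" characters it
// never gives any of them back, so there are no alternative splits to try.
$result = preg_match('/^(a++)+$/', $subject);

var_dump($result);                             // match result: 0 means "no match"
var_dump(preg_last_error() === PREG_NO_ERROR); // whether the engine finished cleanly
```

The greedy form /^(a+)+$/ on the same subject is the classic case that backtracks exponentially, which is why only the quantifiers on the character classes need the extra +.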