PHP: How to run a sampling profiler in production?


Production environment: a load balancer / HTTP reverse proxy in front of a cluster of worker machines running Apache 2.2 with mod_php 5.3 on 64-bit Linux. All worker machines run identical, fully custom PHP code and talk to a single backend PostgreSQL database. The PHP code is optimized to spend CPU rather than talk to the database, and the database machine has been verified to still have plenty of idle capacity.

What I'm looking for: a sampling profiler that can attach to a PHP process by PID, periodically stop the process (e.g. with SIGSTOP), collect the PHP stack via memory inspection, and then continue the process (e.g. with SIGCONT). The sampling interval should be adjustable, but I think it should be around 1-10 ms.
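The stop / inspect / continue cycle described above can be sketched as a simple loop. This is a minimal PHP sketch (PHP 7.4+ syntax, assuming Linux), not an existing tool: sample_process() and $collectStack are hypothetical names, and $collectStack stands in for the genuinely hard part, reading the stopped process's memory to reconstruct the PHP call stack. The signal sender defaults to posix_kill() from the posix extension and is injectable so the loop can be exercised without a real worker.

```php
<?php
// Hypothetical sketch of a stop / inspect / continue sampling loop.
// Assumptions: Linux, and a caller-supplied $collectStack callback that reads
// the stopped process's memory and returns its PHP call stack (not shown).

// Linux x86/ARM signal numbers, used only when the pcntl extension
// (which defines SIGSTOP and SIGCONT) is not loaded.
const SAMPLER_SIGSTOP = 19;
const SAMPLER_SIGCONT = 18;

/**
 * Take up to $maxSamples samples of process $pid, sleeping $intervalUs
 * microseconds between samples. $sendSignal(int $pid, int $sig): bool
 * defaults to posix_kill().
 */
function sample_process(
    int $pid,
    int $intervalUs,
    int $maxSamples,
    callable $collectStack,
    ?callable $sendSignal = null
): array {
    $sendSignal ??= 'posix_kill';
    $stop = defined('SIGSTOP') ? SIGSTOP : SAMPLER_SIGSTOP;
    $cont = defined('SIGCONT') ? SIGCONT : SAMPLER_SIGCONT;

    $samples = [];
    for ($i = 0; $i < $maxSamples; $i++) {
        if (!$sendSignal($pid, $stop)) {
            break; // Process is gone or cannot be signalled.
        }
        try {
            $samples[] = $collectStack($pid);
        } finally {
            $sendSignal($pid, $cont); // Always resume the worker.
        }
        usleep($intervalUs);
    }
    return $samples;
}
```

One caveat: kill() returns once the signal has been sent, not once the target has actually stopped, so a real sampler should confirm the stop (for example, state T in /proc/<pid>/stat) before reading memory. It also needs permission to signal the Apache worker (same user or root).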

Each PHP process on a worker machine is expected to always finish a single request in less than 100 ms. I'm mostly interested in collecting profile data for the requests that take longer than 100 ms. The best-case scenario would be a sampling profiler that is notified at the start of each request and, if the PHP process handling that request is still running 100 ms later, starts collecting samples at 1 ms intervals. That way every normally running request would run to completion without interruption, and I would still get profiles for the problematic cases.
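The threshold policy above can be sketched as a small watcher loop. This is a hypothetical illustration in PHP 7+ syntax, not an existing tool: watch_request() and its four callbacks are names introduced here, and the callbacks are injected so the real pieces (a check that the request is still running, one stop / inspect / continue cycle, a clock, a sleep) can be swapped for fakes.

```php
<?php
// Hypothetical sketch of "only sample requests that run past a threshold".
// Injected callbacks (all assumptions, supplied by the caller):
//   $isRunning():        bool   is the watched request still running?
//   $takeSample():       void   e.g. one stop / inspect / continue cycle
//   $nowMs():            float  milliseconds since the request started
//   $sleepMs(float $ms): void
// Returns the number of samples taken.
function watch_request(
    callable $isRunning,
    callable $takeSample,
    callable $nowMs,
    callable $sleepMs,
    float $thresholdMs = 100.0,
    float $intervalMs = 1.0
): int {
    // Phase 1: leave the worker alone until the threshold has passed,
    // polling so the watcher exits early if the request finishes.
    while ($isRunning() && $nowMs() < $thresholdMs) {
        $sleepMs(min($intervalMs, $thresholdMs - $nowMs()));
    }

    // Phase 2: the request is slow, so sample it until it finishes.
    $samples = 0;
    while ($isRunning()) {
        $takeSample();
        $samples++;
        $sleepMs($intervalMs);
    }
    return $samples;
}
```

In a real deployment $nowMs could be built from hrtime(true) and $sleepMs from usleep(). Because a mod_php worker process outlives its requests, $isRunning cannot simply check that the PID exists; it needs a per-request signal from the worker itself, which is the "notified at the start of the request" part of the wish list.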

Does this kind of sampling profiler for PHP exist? I don't want an instrumenting profiler, because the overhead is too high and the instrumentation skews the statistics (been there, done that).

I'm already aware of XHProf and Xdebug, but as far as I know both are instrumenting profilers and affect the actual opcodes of the PHP program. I'd strongly prefer something that runs the normal PHP opcodes instead.

The closest thing I know of that would work is running the PHP code with HipHop and using a sampling profiler for C/C++ code, but I'm not yet ready to port the software to HipHop. And in that case the profiling results would be representative only of HipHop, not of mod_php.

Although XHProf does add overhead to a request when it is enabled (via the function call, not merely by having the extension loaded), the amount varies with the flags you pass. I measured this recently and found that passing only XHPROF_FLAGS_MEMORY gives the lowest overhead.

I just run XHProf on a small number of requests like so:

function xhprof_enabled() {
  // Profile roughly 1 in every 300 requests.
  return mt_rand(1, 300) == 1;
}
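A fuller version of this sampling pattern might look like the following. It is a sketch under stated assumptions, not the code behind the measurements: xhprof_should_profile() is a hypothetical helper playing the role of xhprof_enabled(), and writing the serialized profile to the system temp directory is an assumed destination. Only xhprof_enable(), xhprof_disable() and the XHPROF_FLAGS_MEMORY flag come from the extension itself.

```php
<?php
// Sketch: profile roughly 1 in $rate requests with XHProf, memory flag only.
// xhprof_should_profile() is a hypothetical stand-in for xhprof_enabled();
// the output path below is also an assumption.
function xhprof_should_profile(int $rate = 300): bool
{
    return mt_rand(1, $rate) === 1;
}

// Only touch XHProf when the extension is loaded and this request was picked.
if (extension_loaded('xhprof') && xhprof_should_profile()) {
    xhprof_enable(XHPROF_FLAGS_MEMORY);

    // Stop profiling when the request ends and save the raw profile data.
    register_shutdown_function(function () {
        $profile = xhprof_disable();
        $file = sys_get_temp_dir() . '/xhprof.' . getmypid() . '.' . uniqid() . '.ser';
        file_put_contents($file, serialize($profile));
    });
}
```

Sampling at the request level like this keeps the unprofiled requests on the normal code path; the profiled ones still pay the instrumentation cost.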

Unlike Xdebug, though, simply having the XHProf extension loaded doesn't seem to add any overhead at all.