I intend to implement a thread pool to manage threads in my project. The basic structure that comes to mind is a queue: some threads put tasks into the queue, and the threads managed by the pool wait to take tasks off it and handle them. I think this is the classic producer-consumer problem. But when I search the web for thread pool implementations, I find that they seldom use this classic model. So my question is: why don't they use it? Does this model have any drawbacks? Why don't they use a "full" semaphore and an "empty" semaphore for synchronization?
If you have multiple threads waiting on a single resource (in this case the semaphores and the queue), then you are creating a bottleneck. You are forcing all tasks through one queue, even though you have multiple workers. This might make sense if the workers are usually idle, but the whole point of a thread pool is to deal with a heavily loaded scenario where the workers are kept busy (for maximum throughput). A single input queue is particularly bad on a multi-processor system, where every worker reads and writes the head of the queue when it tries to get the next task. Even if lock contention is low, the queue head pointer still has to be transferred from one CPU's cache to another's each time it is updated.
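For reference, the classic model from the question might look roughly like the sketch below. This is only a minimal illustration, not how any particular library does it: `SingleQueuePool`, its `capacity` parameter, the `None` shutdown sentinel and `run_single_queue_demo` are all names and choices made up for this example. Note that every producer and every worker goes through the same lock and the same two semaphores, which is the shared resource discussed above.

```python
import threading
from collections import deque


class SingleQueuePool:
    """Hypothetical bounded-buffer pool: one shared queue, two counting semaphores."""

    def __init__(self, num_workers, capacity):
        self._queue = deque()
        self._lock = threading.Lock()                 # protects the shared queue
        self._empty = threading.Semaphore(capacity)   # counts free slots
        self._full = threading.Semaphore(0)           # counts queued tasks
        self._workers = [threading.Thread(target=self._run) for _ in range(num_workers)]
        for worker in self._workers:
            worker.start()

    def submit(self, task):
        self._empty.acquire()          # block while the queue is full
        with self._lock:
            self._queue.append(task)
        self._full.release()           # signal that one more task is available

    def _run(self):
        while True:
            self._full.acquire()       # block while the queue is empty
            with self._lock:
                task = self._queue.popleft()
            self._empty.release()      # one slot is free again
            if task is None:           # sentinel (an assumption of this sketch): stop
                return
            task()

    def shutdown(self):
        # One sentinel per worker; FIFO order means real tasks are taken first.
        for _ in self._workers:
            self.submit(None)
        for worker in self._workers:
            worker.join()


def run_single_queue_demo():
    results = []
    results_lock = threading.Lock()

    def make_task(i):
        def task():
            with results_lock:
                results.append(i * i)
        return task

    pool = SingleQueuePool(num_workers=4, capacity=8)
    for i in range(20):
        pool.submit(make_task(i))
    pool.shutdown()
    return sorted(results)
```

Calling `run_single_queue_demo()` runs 20 small tasks on 4 workers and returns the sorted results. The model is correct; the concern is only that it scales poorly when many workers hit the one queue at once.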
Think about the ideal case: all workers are always busy. When a new task is enqueued you want it to be dispatched to the worker that will complete its current/pending task(s) first.
Suppose that, as a client, you had a contention-free oracle that could tell you which worker to give a new task to. You could then give each worker its own multi-writer-single-reader queue and always dispatch new tasks to the best queue, eliminating worker contention on a single shared input queue. Of course you don't have such an oracle, but this mechanism still works pretty well until a worker runs out of tasks or the queues become imbalanced. "Work stealing" deals with these cases, while still reducing contention compared to the single-queue case.
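A rough sketch of the per-worker-queue idea with work stealing could look like this. It rests on several assumptions that are not part of the answer above: round-robin dispatch stands in for the oracle, `submit` is only called from one thread, tasks do not submit further tasks, idle workers simply sleep briefly, and the owner takes from the front of its queue while thieves take from the back. `WorkStealingPool` and `run_work_stealing_demo` are hypothetical names. Real implementations typically use more careful (often lock-free) deques.

```python
import threading
import time
from collections import deque


class WorkStealingPool:
    """Hypothetical pool with one queue per worker and simple work stealing."""

    def __init__(self, num_workers):
        self._queues = [deque() for _ in range(num_workers)]
        self._locks = [threading.Lock() for _ in range(num_workers)]  # one lock per queue
        self._stopping = threading.Event()
        self._next = 0  # round-robin cursor; assumes a single submitting thread
        self._workers = [
            threading.Thread(target=self._run, args=(i,)) for i in range(num_workers)
        ]
        for worker in self._workers:
            worker.start()

    def submit(self, task):
        # Round-robin is a stand-in for the "oracle" that picks the best queue.
        i = self._next
        self._next = (i + 1) % len(self._queues)
        with self._locks[i]:
            self._queues[i].append(task)

    def _take(self, me):
        # Prefer the worker's own queue (oldest task first).
        with self._locks[me]:
            if self._queues[me]:
                return self._queues[me].popleft()
        # Own queue is empty: try to steal from the back of another queue.
        n = len(self._queues)
        for k in range(1, n):
            victim = (me + k) % n
            with self._locks[victim]:
                if self._queues[victim]:
                    return self._queues[victim].pop()
        return None

    def _run(self, me):
        while True:
            # Read the flag before scanning, so no task submitted before
            # shutdown() can be left behind in this worker's queue.
            stopping = self._stopping.is_set()
            task = self._take(me)
            if task is not None:
                task()
            elif stopping:
                return
            else:
                time.sleep(0.001)  # idle back-off; a real pool would park the thread

    def shutdown(self):
        self._stopping.set()
        for worker in self._workers:
            worker.join()


def run_work_stealing_demo():
    results = []
    results_lock = threading.Lock()

    def make_task(i):
        def task():
            with results_lock:
                results.append(i)
        return task

    pool = WorkStealingPool(num_workers=4)
    for i in range(100):
        pool.submit(make_task(i))
    pool.shutdown()
    return sorted(results)
```

In the common case each worker only touches its own lock, so workers do not contend with each other; the cross-queue scan happens only when a worker has nothing left to do.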
See also: Is Work Stealing always the most appropriate user-level thread scheduling algorithm?