Is it possible to use multithreading without creating threads again and again?


First and once more, thanks to all that already answered my question. I am not a very experienced programmer and it is my first experience with multithreading.

I got an example that is working quite like my problem. I hope it could ease our case here.

public class ThreadMeasuring {
private static final int TASK_TIME = 1; //microseconds
private static class Batch implements Runnable {
    CountDownLatch countDown;
    public Batch(CountDownLatch countDown) {
        this.countDown = countDown;

    public void run() {
        long t0 =System.nanoTime();
        long t = 0;
        while(t<TASK_TIME*1e6){ t = System.nanoTime() - t0; }

        if(countDown!=null) countDown.countDown();

public static void main(String[] args) {
    ThreadFactory threadFactory = new ThreadFactory() {
        int counter = 1;
        public Thread newThread(Runnable r) {
            Thread t = new Thread(r, "Executor thread " + (counter++));
            return t;

  // the total duty to be divided in tasks is fixed (problem dependent).
  // Increase ntasks will mean decrease the task time proportionally.
  // 4 Is an arbitrary example.
  // This tasks will be executed thousands of times, inside a loop alternating
  // with serial processing that needs their result and prepare the next ones.
    int ntasks = 4;
    int nthreads = 2;
    int ncores = Runtime.getRuntime().availableProcessors();
    if (nthreads<ncores) ncores = nthreads;     

    Batch serial = new Batch(null);
    long serialTime = System.nanoTime();;
    serialTime = System.nanoTime() - serialTime;

    ExecutorService executor = Executors.newFixedThreadPool( nthreads, threadFactory );
    CountDownLatch countDown = new CountDownLatch(ntasks);

    ArrayList<Batch> batches = new ArrayList<Batch>();
    for (int i = 0; i < ntasks; i++) {
        batches.add(new Batch(countDown));

    long start = System.nanoTime();
    for (Batch r : batches){

    // wait for all threads to finish their task
    try {
    } catch (InterruptedException e) {
        // TODO Auto-generated catch block
    long tmeasured = (System.nanoTime() - start);

    System.out.println("Task time= " + TASK_TIME + " ms");
    System.out.println("Number of tasks= " + ntasks);
    System.out.println("Number of threads= " + nthreads);
    System.out.println("Number of cores= " + ncores);
    System.out.println("Measured time= " + tmeasured);
    System.out.println("Theoretical serial time= " + TASK_TIME*1000000*ntasks);
    System.out.println("Theoretical parallel time= " + (TASK_TIME*1000000*ntasks)/ncores);
    System.out.println("Speedup= " + (serialTime*ntasks)/(double)tmeasured);


Instead of doing the calculations, each batch just waits for some given time. The program calculates the speedup, that would allways be 2 in theory but can get less than 1 (actually a speed down) if the 'TASK_TIME' is small.

My calculations take at the top 1 ms and are commonly faster. For 1 ms I find a little speedup of around 30%, but in practice, with my program, I notice a speed down.

The structure of this code is very similar to my program, so if you could help me to optimise the thread handling I would be very grateful.

Kind regards.

Below, the original question:


I would like to use multithreading on my program, since it could increase its efficiency considerably, I believe. Most of its running time is due to independent calculations.

My program has thousands of independent calculations (several linear systems to solve), but they just happen at the same time by minor groups of dozens or so. Each of this groups would take some miliseconds to run. After one of these groups of calculations, the program has to run sequentially for a little while and then I have to solve the linear systems again.

Actually, it can be seen as these independent linear systems to solve are inside a loop that iterates thousands of times, alternating with sequential calculations that depends on the previous results. My idea to speed up the program is to compute these independent calculations in parallel threads, by dividing each group into (the number of processors I have available) batches of independent calculation. So, in principle, there isn't queuing at all.

I tried using the FixedThreadPool and CachedThreadPool and it got even slower than serial processing. It seems to takes too much time creating new Treads each time I need to solve the batches.

Is there a better way to handle this problem? These pools I've used seem to be proper for cases when each thread takes more time instead of thousands of smaller threads...

Thanks! Best Regards!

From what i've read: "thousands of independent calculations... happen at the same time... would take some miliseconds to run" it seems to me that your problem is perfect for GPU programming.

And i think it answers you question. GPU programming is becoming more and more popular. There are Java bindings for CUDA & OpenCL. If it is possible for you to use it, i say go for it.