Embracing the Messiness in Search of Epic Solutions

Groovy + GPars: Handling Concurrency

Posted

in

PROBLEM

Let’s assume given a list of user IDs (ex: 1, 2, 3, 4, and 5), we need to query 2 data sources to get the names and the addresses before returning a list of Employee objects. The Employee class looks like this:-

class App {
    String nameLookup(Integer id) {
        log("Name lookup: ${id}")
        Thread.sleep(2000)
        "User ${id}"
    }

    String addressLookup(Integer id) {
        log("Addr lookup: ${id}")
        Thread.sleep(4000)
        "Address ${id}"
    }

    void log(String message) {
        println "${Thread.currentThread()} - ${new Date().format('HH:mm:ss')} - ${message}"
    }

    List<Employee> lookup(List<Integer> userIds) {
        ???
    }

    void run() {
        def start = System.currentTimeMillis()

        def employees = lookup([1, 2, 3, 4, 5])

        def end = System.currentTimeMillis()

        println '---------------------------------'
        println "Total time in seconds: ${(end - start) / 1000}"
        println '---------------------------------'

        employees.each {
            println it
        }
    }

    static void main(String[] args) {
        new App().run()
    }
}

We are going to explore multiple solutions to implement lookup(..) to run as fast as possible.

ATTEMPT 1

The simplest and the most straightforward approach is to perform the task synchronously.

Code:-

List<Employee> lookup(List<Integer> userIds) {
    userIds.collect { id ->
        new Employee(
                id: id,
                name: nameLookup(id),
                address: addressLookup(id)
        )
    }
}

Output:-

Thread[main,5,main] - 20:54:09 - Name lookup: 1
Thread[main,5,main] - 20:54:11 - Addr lookup: 1
Thread[main,5,main] - 20:54:15 - Name lookup: 2
Thread[main,5,main] - 20:54:17 - Addr lookup: 2
Thread[main,5,main] - 20:54:21 - Name lookup: 3
Thread[main,5,main] - 20:54:23 - Addr lookup: 3
Thread[main,5,main] - 20:54:27 - Name lookup: 4
Thread[main,5,main] - 20:54:29 - Addr lookup: 4
Thread[main,5,main] - 20:54:33 - Name lookup: 5
Thread[main,5,main] - 20:54:35 - Addr lookup: 5
---------------------------------
Total time in seconds: 30.129
---------------------------------
Employee(1, User 1, Address 1)
Employee(2, User 2, Address 2)
Employee(3, User 3, Address 3)
Employee(4, User 4, Address 4)
Employee(5, User 5, Address 5)

Since everything runs synchronously in one thread, it takes about 30 seconds to complete.

ATTEMPT 2

In this attempt and the rest of the attempts, we are going to use GPars, which is a concurrency and parallelism library.

<dependency>
    <groupId>org.codehaus.gpars</groupId>
    <artifactId>gpars</artifactId>
    <version>1.2.1</version>
</dependency>

In this attempt, we are going to use GParsPool.executeAsyncAndWait(..).

Code:-

List<Employee> lookup(List<Integer> userIds) {
    (List<Employee>) GParsPool.withPool {
        userIds.collect { id ->
            def results = GParsPool.executeAsyncAndWait(
                    this.&nameLookup.curry(id),
                    this.&addressLookup.curry(id)
            )

            new Employee(
                    id: id,
                    name: results[0],
                    address: results[1]
            )
        }
    }
}

Output:-

Thread[ForkJoinPool-1-worker-2,5,main] - 20:54:40 - Addr lookup: 1
Thread[ForkJoinPool-1-worker-1,5,main] - 20:54:40 - Name lookup: 1
Thread[ForkJoinPool-1-worker-2,5,main] - 20:54:44 - Name lookup: 2
Thread[ForkJoinPool-1-worker-1,5,main] - 20:54:44 - Addr lookup: 2
Thread[ForkJoinPool-1-worker-1,5,main] - 20:54:48 - Name lookup: 3
Thread[ForkJoinPool-1-worker-2,5,main] - 20:54:48 - Addr lookup: 3
Thread[ForkJoinPool-1-worker-1,5,main] - 20:54:52 - Addr lookup: 4
Thread[ForkJoinPool-1-worker-2,5,main] - 20:54:52 - Name lookup: 4
Thread[ForkJoinPool-1-worker-2,5,main] - 20:54:56 - Addr lookup: 5
Thread[ForkJoinPool-1-worker-1,5,main] - 20:54:56 - Name lookup: 5
---------------------------------
Total time in seconds: 20.184
---------------------------------
Employee(1, User 1, Address 1)
Employee(2, User 2, Address 2)
Employee(3, User 3, Address 3)
Employee(4, User 4, Address 4)
Employee(5, User 5, Address 5)

This allows us to run both the name lookup and the address lookup concurrently for each user ID.

Speed gain compared to first attempt: 1.5x

ATTEMPT 3

How about collectParallel(..)?

Code:-

List<Employee> lookup(List<Integer> userIds) {
    (List<Employee>) GParsPool.withPool {
        userIds.collectParallel { Integer id ->
            new Employee(
                    id: id,
                    name: nameLookup(id),
                    address: addressLookup(id)
            )
        }
    }
}

Output:-

Thread[ForkJoinPool-2-worker-1,5,main] - 20:55:00 - Name lookup: 1
Thread[ForkJoinPool-2-worker-4,5,main] - 20:55:00 - Name lookup: 4
Thread[ForkJoinPool-2-worker-3,5,main] - 20:55:00 - Name lookup: 2
Thread[ForkJoinPool-2-worker-2,5,main] - 20:55:00 - Name lookup: 3
Thread[ForkJoinPool-2-worker-5,5,main] - 20:55:00 - Name lookup: 5
Thread[ForkJoinPool-2-worker-4,5,main] - 20:55:02 - Addr lookup: 4
Thread[ForkJoinPool-2-worker-3,5,main] - 20:55:02 - Addr lookup: 2
Thread[ForkJoinPool-2-worker-1,5,main] - 20:55:02 - Addr lookup: 1
Thread[ForkJoinPool-2-worker-2,5,main] - 20:55:02 - Addr lookup: 3
Thread[ForkJoinPool-2-worker-5,5,main] - 20:55:02 - Addr lookup: 5
---------------------------------
Total time in seconds: 6.156
---------------------------------
Employee(1, User 1, Address 1)
Employee(2, User 2, Address 2)
Employee(3, User 3, Address 3)
Employee(4, User 4, Address 4)
Employee(5, User 5, Address 5)

This allows us to run all name lookups concurrently, followed by all address lookup concurrently.

Speed gain compared to first attempt: 4.9x

ATTEMPT 4

Perhaps, another approach is to kick start all the lookups asynchronously and hold on to the returned Future objects:-

Code:-

List<Employee> lookup(List<Integer> userIds) {
    (List<Employee>) GParsPool.withPool {
        List<Future> nameFutures = userIds.collect {
            this.&nameLookup.callAsync(it)
        }

        List<Future> addressFutures = userIds.collect {
            this.&addressLookup.callAsync(it)
        }

        userIds.withIndex().collect { Integer id, Integer i ->
            new Employee(
                    id: id,
                    name: nameFutures[i].get(),
                    address: addressFutures[i].get()
            )
        }
    }
}

Output:-

Thread[ForkJoinPool-3-worker-5,5,main] - 20:55:06 - Name lookup: 5
Thread[ForkJoinPool-3-worker-4,5,main] - 20:55:06 - Name lookup: 4
Thread[ForkJoinPool-3-worker-2,5,main] - 20:55:06 - Name lookup: 2
Thread[ForkJoinPool-3-worker-3,5,main] - 20:55:06 - Name lookup: 3
Thread[ForkJoinPool-3-worker-1,5,main] - 20:55:06 - Name lookup: 1
Thread[ForkJoinPool-3-worker-3,5,main] - 20:55:08 - Addr lookup: 4
Thread[ForkJoinPool-3-worker-2,5,main] - 20:55:08 - Addr lookup: 3
Thread[ForkJoinPool-3-worker-4,5,main] - 20:55:08 - Addr lookup: 2
Thread[ForkJoinPool-3-worker-5,5,main] - 20:55:08 - Addr lookup: 1
Thread[ForkJoinPool-3-worker-1,5,main] - 20:55:08 - Addr lookup: 5
---------------------------------
Total time in seconds: 6.032
---------------------------------
Employee(1, User 1, Address 1)
Employee(2, User 2, Address 2)
Employee(3, User 3, Address 3)
Employee(4, User 4, Address 4)
Employee(5, User 5, Address 5)

It’s slightly better than the previous attempt, but the performance gain is rather negligible.

Speed gain compared to first attempt: 5.0x

ATTEMPT 5

What if we take the previous attempt and increase the number of threads to 10?

Code:-

List<Employee> lookup(List<Integer> userIds) {
    GParsPool.withPool 10, {
        List<Future> nameFutures = userIds.collect {
            this.&nameLookup.callAsync(it)
        }

        List<Future> addressFutures = userIds.collect {
            this.&addressLookup.callAsync(it)
        }

        userIds.withIndex().collect { Integer id, Integer i ->
            new Employee(
                    id: id,
                    name: nameFutures[i].get(),
                    address: addressFutures[i].get()
            )
        }
    }
}

Output:-

Thread[ForkJoinPool-4-worker-1,5,main] - 20:55:12 - Name lookup: 1
Thread[ForkJoinPool-4-worker-2,5,main] - 20:55:12 - Name lookup: 2
Thread[ForkJoinPool-4-worker-3,5,main] - 20:55:12 - Name lookup: 3
Thread[ForkJoinPool-4-worker-4,5,main] - 20:55:12 - Name lookup: 4
Thread[ForkJoinPool-4-worker-5,5,main] - 20:55:12 - Name lookup: 5
Thread[ForkJoinPool-4-worker-6,5,main] - 20:55:12 - Addr lookup: 1
Thread[ForkJoinPool-4-worker-8,5,main] - 20:55:12 - Addr lookup: 3
Thread[ForkJoinPool-4-worker-7,5,main] - 20:55:12 - Addr lookup: 2
Thread[ForkJoinPool-4-worker-9,5,main] - 20:55:12 - Addr lookup: 4
Thread[ForkJoinPool-4-worker-10,5,main] - 20:55:12 - Addr lookup: 5
---------------------------------
Total time in seconds: 4.016
---------------------------------
Employee(1, User 1, Address 1)
Employee(2, User 2, Address 2)
Employee(3, User 3, Address 3)
Employee(4, User 4, Address 4)
Employee(5, User 5, Address 5)

Speed gain compared to first attempt: 7.5x

And now, we have successfully improve our implementation performance from 30 seconds to 4 seconds.

Tags:

Comments

Leave a Reply