Groovy + GPars: Handling Concurrency

PROBLEM

Let’s assume given a list of user IDs (ex: 1, 2, 3, 4, and 5), we need to query 2 data sources to get the names and the addresses before returning a list of Employee objects. The Employee class looks like this:-

@ToString(includePackage = false)
class Employee {
    Long id
    String name
    String address
}

To simplify the example, let’s assume the name lookup takes 2 seconds to run and the address lookup takes 4 seconds to run.

And now, we have the following code:-

class App {
    String nameLookup(Integer id) {
        log("Name lookup: ${id}")
        Thread.sleep(2000)
        "User ${id}"
    }

    String addressLookup(Integer id) {
        log("Addr lookup: ${id}")
        Thread.sleep(4000)
        "Address ${id}"
    }

    void log(String message) {
        println "${Thread.currentThread()} - ${new Date().format('HH:mm:ss')} - ${message}"
    }

    List<Employee> lookup(List<Integer> userIds) {
        ???
    }

    void run() {
        def start = System.currentTimeMillis()

        def employees = lookup([1, 2, 3, 4, 5])

        def end = System.currentTimeMillis()

        println '---------------------------------'
        println "Total time in seconds: ${(end - start) / 1000}"
        println '---------------------------------'

        employees.each {
            println it
        }
    }

    static void main(String[] args) {
        new App().run()
    }
}

We are going to explore multiple solutions to implement lookup(..) to run as fast as possible.

ATTEMPT 1

The simplest and the most straightforward approach is to perform the task synchronously.

Code:-

List<Employee> lookup(List<Integer> userIds) {
    userIds.collect { id ->
        new Employee(
                id: id,
                name: nameLookup(id),
                address: addressLookup(id)
        )
    }
}

Output:-

Thread[main,5,main] - 20:54:09 - Name lookup: 1
Thread[main,5,main] - 20:54:11 - Addr lookup: 1
Thread[main,5,main] - 20:54:15 - Name lookup: 2
Thread[main,5,main] - 20:54:17 - Addr lookup: 2
Thread[main,5,main] - 20:54:21 - Name lookup: 3
Thread[main,5,main] - 20:54:23 - Addr lookup: 3
Thread[main,5,main] - 20:54:27 - Name lookup: 4
Thread[main,5,main] - 20:54:29 - Addr lookup: 4
Thread[main,5,main] - 20:54:33 - Name lookup: 5
Thread[main,5,main] - 20:54:35 - Addr lookup: 5
---------------------------------
Total time in seconds: 30.129
---------------------------------
Employee(1, User 1, Address 1)
Employee(2, User 2, Address 2)
Employee(3, User 3, Address 3)
Employee(4, User 4, Address 4)
Employee(5, User 5, Address 5)

Since everything runs synchronously in one thread, it takes about 30 seconds to complete.

ATTEMPT 2

In this attempt and the rest of the attempts, we are going to use GPars, which is a concurrency and parallelism library.

<dependency>
    <groupId>org.codehaus.gpars</groupId>
    <artifactId>gpars</artifactId>
    <version>1.2.1</version>
</dependency>

In this attempt, we are going to use GParsPool.executeAsyncAndWait(..).

Code:-

List<Employee> lookup(List<Integer> userIds) {
    (List<Employee>) GParsPool.withPool {
        userIds.collect { id ->
            def results = GParsPool.executeAsyncAndWait(
                    this.&nameLookup.curry(id),
                    this.&addressLookup.curry(id)
            )

            new Employee(
                    id: id,
                    name: results[0],
                    address: results[1]
            )
        }
    }
}

Output:-

Thread[ForkJoinPool-1-worker-2,5,main] - 20:54:40 - Addr lookup: 1
Thread[ForkJoinPool-1-worker-1,5,main] - 20:54:40 - Name lookup: 1
Thread[ForkJoinPool-1-worker-2,5,main] - 20:54:44 - Name lookup: 2
Thread[ForkJoinPool-1-worker-1,5,main] - 20:54:44 - Addr lookup: 2
Thread[ForkJoinPool-1-worker-1,5,main] - 20:54:48 - Name lookup: 3
Thread[ForkJoinPool-1-worker-2,5,main] - 20:54:48 - Addr lookup: 3
Thread[ForkJoinPool-1-worker-1,5,main] - 20:54:52 - Addr lookup: 4
Thread[ForkJoinPool-1-worker-2,5,main] - 20:54:52 - Name lookup: 4
Thread[ForkJoinPool-1-worker-2,5,main] - 20:54:56 - Addr lookup: 5
Thread[ForkJoinPool-1-worker-1,5,main] - 20:54:56 - Name lookup: 5
---------------------------------
Total time in seconds: 20.184
---------------------------------
Employee(1, User 1, Address 1)
Employee(2, User 2, Address 2)
Employee(3, User 3, Address 3)
Employee(4, User 4, Address 4)
Employee(5, User 5, Address 5)

This allows us to run both the name lookup and the address lookup concurrently for each user ID.

Speed gain compared to first attempt: 1.5x

ATTEMPT 3

How about collectParallel(..)?

Code:-

List<Employee> lookup(List<Integer> userIds) {
    (List<Employee>) GParsPool.withPool {
        userIds.collectParallel { Integer id ->
            new Employee(
                    id: id,
                    name: nameLookup(id),
                    address: addressLookup(id)
            )
        }
    }
}

Output:-

Thread[ForkJoinPool-2-worker-1,5,main] - 20:55:00 - Name lookup: 1
Thread[ForkJoinPool-2-worker-4,5,main] - 20:55:00 - Name lookup: 4
Thread[ForkJoinPool-2-worker-3,5,main] - 20:55:00 - Name lookup: 2
Thread[ForkJoinPool-2-worker-2,5,main] - 20:55:00 - Name lookup: 3
Thread[ForkJoinPool-2-worker-5,5,main] - 20:55:00 - Name lookup: 5
Thread[ForkJoinPool-2-worker-4,5,main] - 20:55:02 - Addr lookup: 4
Thread[ForkJoinPool-2-worker-3,5,main] - 20:55:02 - Addr lookup: 2
Thread[ForkJoinPool-2-worker-1,5,main] - 20:55:02 - Addr lookup: 1
Thread[ForkJoinPool-2-worker-2,5,main] - 20:55:02 - Addr lookup: 3
Thread[ForkJoinPool-2-worker-5,5,main] - 20:55:02 - Addr lookup: 5
---------------------------------
Total time in seconds: 6.156
---------------------------------
Employee(1, User 1, Address 1)
Employee(2, User 2, Address 2)
Employee(3, User 3, Address 3)
Employee(4, User 4, Address 4)
Employee(5, User 5, Address 5)

This allows us to run all name lookups concurrently, followed by all address lookup concurrently.

Speed gain compared to first attempt: 4.9x

ATTEMPT 4

Perhaps, another approach is to kick start all the lookups asynchronously and hold on to the returned Future objects:-

Code:-

List<Employee> lookup(List<Integer> userIds) {
    (List<Employee>) GParsPool.withPool {
        List<Future> nameFutures = userIds.collect {
            this.&nameLookup.callAsync(it)
        }

        List<Future> addressFutures = userIds.collect {
            this.&addressLookup.callAsync(it)
        }

        userIds.withIndex().collect { Integer id, Integer i ->
            new Employee(
                    id: id,
                    name: nameFutures[i].get(),
                    address: addressFutures[i].get()
            )
        }
    }
}

Output:-

Thread[ForkJoinPool-3-worker-5,5,main] - 20:55:06 - Name lookup: 5
Thread[ForkJoinPool-3-worker-4,5,main] - 20:55:06 - Name lookup: 4
Thread[ForkJoinPool-3-worker-2,5,main] - 20:55:06 - Name lookup: 2
Thread[ForkJoinPool-3-worker-3,5,main] - 20:55:06 - Name lookup: 3
Thread[ForkJoinPool-3-worker-1,5,main] - 20:55:06 - Name lookup: 1
Thread[ForkJoinPool-3-worker-3,5,main] - 20:55:08 - Addr lookup: 4
Thread[ForkJoinPool-3-worker-2,5,main] - 20:55:08 - Addr lookup: 3
Thread[ForkJoinPool-3-worker-4,5,main] - 20:55:08 - Addr lookup: 2
Thread[ForkJoinPool-3-worker-5,5,main] - 20:55:08 - Addr lookup: 1
Thread[ForkJoinPool-3-worker-1,5,main] - 20:55:08 - Addr lookup: 5
---------------------------------
Total time in seconds: 6.032
---------------------------------
Employee(1, User 1, Address 1)
Employee(2, User 2, Address 2)
Employee(3, User 3, Address 3)
Employee(4, User 4, Address 4)
Employee(5, User 5, Address 5)

It’s slightly better than the previous attempt, but the performance gain is rather negligible.

Speed gain compared to first attempt: 5.0x

ATTEMPT 5

What if we take the previous attempt and increase the number of threads to 10?

Code:-

List<Employee> lookup(List<Integer> userIds) {
    GParsPool.withPool 10, {
        List<Future> nameFutures = userIds.collect {
            this.&nameLookup.callAsync(it)
        }

        List<Future> addressFutures = userIds.collect {
            this.&addressLookup.callAsync(it)
        }

        userIds.withIndex().collect { Integer id, Integer i ->
            new Employee(
                    id: id,
                    name: nameFutures[i].get(),
                    address: addressFutures[i].get()
            )
        }
    }
}

Output:-

Thread[ForkJoinPool-4-worker-1,5,main] - 20:55:12 - Name lookup: 1
Thread[ForkJoinPool-4-worker-2,5,main] - 20:55:12 - Name lookup: 2
Thread[ForkJoinPool-4-worker-3,5,main] - 20:55:12 - Name lookup: 3
Thread[ForkJoinPool-4-worker-4,5,main] - 20:55:12 - Name lookup: 4
Thread[ForkJoinPool-4-worker-5,5,main] - 20:55:12 - Name lookup: 5
Thread[ForkJoinPool-4-worker-6,5,main] - 20:55:12 - Addr lookup: 1
Thread[ForkJoinPool-4-worker-8,5,main] - 20:55:12 - Addr lookup: 3
Thread[ForkJoinPool-4-worker-7,5,main] - 20:55:12 - Addr lookup: 2
Thread[ForkJoinPool-4-worker-9,5,main] - 20:55:12 - Addr lookup: 4
Thread[ForkJoinPool-4-worker-10,5,main] - 20:55:12 - Addr lookup: 5
---------------------------------
Total time in seconds: 4.016
---------------------------------
Employee(1, User 1, Address 1)
Employee(2, User 2, Address 2)
Employee(3, User 3, Address 3)
Employee(4, User 4, Address 4)
Employee(5, User 5, Address 5)

Speed gain compared to first attempt: 7.5x

And now, we have successfully improve our implementation performance from 30 seconds to 4 seconds.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s