Parallel Execution

etl4s has an elegant shorthand for grouping and parallelizing operations that share the same input type:

import etl4s._

/* Simulate slow IO operations (e.g: DB calls, API requests) */

val e1 = Extract { Thread.sleep(100); 42 }
val e2 = Extract { Thread.sleep(100); "hello" }
val e3 = Extract { Thread.sleep(100); true }

Sequential run of e1, e2, and e3 (~300ms total)

val sequential: Extract[Unit, ((Int, String), Boolean)] =
     e1 & e2 & e3

Parallel run of e1, e2, e3 on their own JVM threads with Scala Futures (~100ms total, same result, 3X faster)

import scala.concurrent.ExecutionContext.Implicits.global

val parallel: Extract[Unit, ((Int, String), Boolean)] =
     e1 &> e2 &> e3

Use the built-in zip method to flatten unwieldly nested tuples:

val clean: Extract[Unit, (Int, String, Boolean)] =
     (e1 & e2 & e3).zip

Mix sequential and parallel execution (First two parallel (~100ms), then third (~100ms)):

val mixed = (e1 &> e2) & e3

Full example of a parallel pipeline:

val consoleLoad: Load[String, Unit] = Load(println(_))
val dbLoad:      Load[String, Unit] = Load(x => println(s"DB Load: ${x}"))

val merge = Transform[(Int, String, Boolean), String] { t => 
    val (i, s, b) = t
    s"$i-$s-$b"
  }

val pipeline =
  (e1 &> e2 &> e3).zip ~> merge ~> (consoleLoad &> dbLoad)