Continuous Integration for Spark Apps

Continuous Integration for Spark Apps by Sean McIntyre

Hi, I’m Sean!

© 2015 Uncharted Software Inc.

It’s hard to test Spark Apps :(

Case Study: Uncharted Spark Pipeline

Some key issues:

● Ensure reliability
● Prevent regressions
● Maintain compatibility with multiple versions of Spark
● Open source: need a quick and easy way to evaluate PRs

What is Continuous Integration?

“Continuous Integration (CI) is a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is then verified by an automated build, allowing teams to detect problems early.”

-- ThoughtWorks

“Continuous Integration (CI) is a development practice that is pretty damned important for writing quality software.”

-- Me

So, What is Continuous Integration?

Best Practices, courtesy of Wikipedia

1. Maintain a code repository (Git)
2. Automate the build (Gradle)
3. Tests should be part of the build (ScalaTest)
4. Commit/push feature branches often
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds

Items 1-4: duh.
Items 5-8: ...less duh.

Why are these difficult with Apache Spark?

5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds

What is a Spark App?

[Diagram: Source, JAR, Spark, "?". Callout on the JAR: "This thing."]

And...

[Diagram: Source, JAR, Spark, "?". Callout: "We need to test this"]

But...

[Diagram: Source, JAR, ScalaTest, Scala RE. Callout: "By default, we have this" (boom)]

v1: Squish Spark inside ScalaTest

[Diagram: Source, JAR, ScalaTest with SparkContext. Callout: "So, we try this"]

it works! (sort of)
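
The v1 approach can be sketched roughly as follows. This is a minimal illustration, not the project's actual test code: the suite name `WordCountSpec` and the word-count logic are hypothetical, and it assumes Spark 1.3 or later and a ScalaTest release that still provides `org.scalatest.FunSpec`. The suite creates its own SparkContext with a `local[2]` master, so Spark runs inside the test JVM rather than on a real cluster, which is presumably where the "sort of" comes from.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FunSpec}

// Hypothetical suite: the class name and word-count logic are illustrative only.
class WordCountSpec extends FunSpec with BeforeAndAfterAll {

  // One context for the whole suite, created before the first test runs.
  private var sc: SparkContext = _

  override def beforeAll(): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")       // in-process master, not a real cluster
      .setAppName("WordCountSpec")
    sc = new SparkContext(conf)
  }

  override def afterAll(): Unit = {
    // Stop the context so it does not leak into other suites.
    if (sc != null) sc.stop()
  }

  describe("word count") {
    it("counts each distinct word") {
      val counts = sc.parallelize(Seq("a", "b", "a"))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
        .collectAsMap()
      assert(counts == Map("a" -> 2, "b" -> 1))
    }
  }
}
```

Because the master is hard-coded to local mode, a suite like this says little about how the same code behaves under spark-submit against a particular Spark version.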

6. Test in a clone of the production environment

v2: Squish ScalaTest into Spark

[Diagram: Source builds the JAR; Tests and Main.scala build the Test JAR; Spark runs the JAR and the Test JAR and emits Test Output]

Main.scala
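
A minimal sketch of what such an entry point might look like, assuming ScalaTest's command-line `Runner` is on the classpath of the test JAR. The helper `runnerArgs` and the default runpath `build/classes/test` are assumptions made for illustration; the project's real file may differ.

```scala
import org.scalatest.tools.Runner

// Hypothetical entry point for the test JAR: it is submitted to Spark
// like any other Spark app, and runs the ScalaTest suites from inside it.
object Main {

  // "-o" selects the standard-output reporter; "-R" sets the runpath,
  // i.e. where ScalaTest looks for compiled suites.
  def runnerArgs(runpath: String): Array[String] =
    Array("-o", "-R", runpath)

  def main(args: Array[String]): Unit = {
    // Assumed default; the real location depends on the build layout.
    val runpath = args.headOption.getOrElse("build/classes/test")

    // Runner.run returns true only if every test passed.
    val allPassed = Runner.run(runnerArgs(runpath))

    // Exit non-zero on failure so the calling script or CI job can notice.
    if (!allPassed) sys.exit(1)
  }
}
```

Something like `spark-submit --class Main path/to/test.jar` (the JAR path is a placeholder) would then run the suites under the installed Spark. Suites run this way would typically build their SparkContext from a plain `new SparkConf()`, leaving the master to spark-submit instead of hard-coding local mode.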

6. Test in a clone of the production environment

Progress?

5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds

What now?

v3: Squish Spark and Test JAR into Docker

[Diagram: Source builds the JAR; Tests and Main.scala build the Test JAR; Spark, the JAR, and the Test JAR sit inside a Docker container (uncharted/sparklet), which emits Test Output]

test.sh

build.gradle (excerpt)

Progress?

5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds

v4: Squish Docker into Travis CI

[Diagram: Source builds the JAR; Tests and Main.scala build the Test JAR; Spark, the JAR, and the Test JAR sit inside a Docker container, which runs inside a Travis CI VM and emits Test Output]

.travis.yml

Voilà!

Progress?

5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds

All done!

5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds

Next Steps?

Alpine Linux

docker-compose

Windows (dev environment) support

Python

Questions?

https://github.com/unchartedsoftware/sparkpipe-core

https://github.com/Ghnuberath

@Ghnuberath

https://hub.docker.com/r/uncharted/sparklet/
