All Things Open 2018 (ATO 2018), a premier open source conference, will open its doors on October 21st, 2018, at the Raleigh Convention Center, in the heart of North Carolina’s capital.
I will be giving a double session on October 22nd, titled Big Data Made Easy With a Spark. As you can easily imagine, I am going to talk about Big Data, Java, and Spark. If you are interested in those topics, come over, have fun, and get a chance to win a copy of Spark in Action, second edition.
The promise of this session is:
In this hands-on session, you will learn how to run a full Big Data scenario, from ingestion to publication. You will see how we can use Java and Apache Spark to ingest data, perform some transformations, and save the data. You will then perform a second lab where you will run your very first Machine Learning algorithm!
As you can see in the title, it’s a hands-on tutorial. This means you will have to do things on your own! Don’t think it’s the kind of lesson where you just take a seat and listen…
Seriously though, to make things smoother, read this quick article and try to have the material downloaded and installed before the session. It will simplify our work!
- You have administrator access to your machine.
- You have the right to install stuff on your machine.
Material to download & install
- You will need a JDK (Java Development Kit) v8 on your machine, which you can download from http://bit.ly/javadk8 (or https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html). Note that I will only use JDK 8 in this tutorial. Unfortunately, I will not be able to support any other version of Java.
- To ease our development experience, I will use Eclipse as my IDE. I am not opposed to other products; however, I will not be able to support any other IDE or any Eclipse release prior to Oxygen (Eclipse 4.7). You can download Eclipse at https://www.eclipse.org/downloads/packages/. On my side, I will use Eclipse SimRel 2018-09 on stage.
- Nice to have but not required: Maven, SourceTree, or git on the command line.
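Since the labs assume JDK 8, a quick way to confirm which Java version is active on your machine is a tiny program like the one below. This is only a minimal sketch and not part of the lab material: the class name JavaVersionCheck and the helper isJava8 are made up for illustration, and the check relies on the standard java.version system property, which Java 8 reports in the form 1.8.0_xxx.

```java
public class JavaVersionCheck {

    /**
     * Returns true when a java.version string denotes Java 8,
     * for example "1.8.0_181". Hypothetical helper, for illustration only.
     */
    public static boolean isJava8(String version) {
        return version != null
                && (version.equals("1.8") || version.startsWith("1.8."));
    }

    public static void main(String[] args) {
        // java.version is a standard system property set by the running JVM.
        String version = System.getProperty("java.version");
        if (isJava8(version)) {
            System.out.println("Java 8 detected (" + version + ").");
        } else {
            System.out.println("Found Java " + version + ", but the labs assume JDK 8.");
        }
    }
}
```

Save it as JavaVersionCheck.java, compile it with javac JavaVersionCheck.java, and run it with java JavaVersionCheck. If javac is not found, you most likely have a JRE rather than a full JDK on your path.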
Lab #1 – file ingestion
The code can be downloaded from the examples of my book Spark with Java / Spark in Action, second edition (the book is being renamed). Go to: https://github.com/jgperrin/net.jgp.books.sparkWithJava.ch01.
Lab #2 – a bit of analytics
This example is not yet in my book. However, it will be included, along with others and explained in more detail, in chapters 11 to 13. Of course, you can already access a draft of the code at: https://github.com/jgperrin/net.jgp.labs.spark.
Lab #3 – an even smaller bit of AI (artificial intelligence)
This example is not yet in the book either, but chapter 18 will cover AI and ML (machine learning). Nevertheless, you can download it from GitHub at https://github.com/jgperrin/net.jgp.labs.sparkdq4ml.
The slides are available on SlideShare.
- 2018-10-24: slides and link to slides added.