ATO 2018 is a wonderful open source technology event in Raleigh, NC, where more than 3,000 attendees are expected over two days.

All Things Open 2018 (ATO 2018), a premier open source conference, will open its doors on October 21st, 2018, at the Raleigh Convention Center, in the heart of North Carolina’s capital.

I will be giving a double session on October 22nd, titled Big Data Made Easy With a Spark. As you can easily imagine, I am going to talk about Big Data, Java, and Spark. If you are interested in those topics, come over, have fun, and get a chance to win a copy of Spark in Action, second edition.

The promise of this session is:

In this hands-on session, you will learn how to build a full Big Data scenario from ingestion to publication. You will see how we can use Java and Apache Spark to ingest data, perform some transformations, and save the data. You will then perform a second lab where you will run your very first Machine Learning algorithm!

As you can see in the title, it’s a hands-on tutorial. That means you will have to do things on your own! Don’t expect the kind of lesson where you just take a seat and listen…

Seriously, though, to make things smoother, read this quick article and try to have the material downloaded and installed beforehand. It will simplify our work!

Prerequisites

Make sure:

  • You have administrator access to your machine.
  • You have the right to install stuff on your machine.

Material to download & install

Source code

Lab #1 – file ingestion

The code can be downloaded from the examples of my book, Spark with Java / Spark in Action, second edition (the book is being renamed). Go to: https://github.com/jgperrin/net.jgp.books.sparkWithJava.ch01.

Lab #2 – a bit of analytics

This example is not yet in my book. However, it will be included, along with others, with more detail and explanation in chapters 11 to 13. In the meantime, you can already access a draft of the code at: https://github.com/jgperrin/net.jgp.labs.spark.

Lab #3 – an even smaller bit of AI (artificial intelligence)

This example is not yet in the book either, but chapter 18 will cover AI and ML (machine learning). In the meantime, you can download it from GitHub at https://github.com/jgperrin/net.jgp.labs.sparkdq4ml.

Slides

The slides are available on SlideShare.

Please share your feedback in the comments below or via Twitter; my handle is @jgperrin.

Update

  • 2018-10-24: slides and link to slides added.
