In this episode, you will learn about doing a basic ETL (extract, transform, and load) operation using Apache Spark. You will load a basic CSV file with Apache Spark, make […]

Chapter 8 of Spark with Java is out and it covers ingestion, as did chapter 7. However, as chapter 7 was focusing on ingestion from files, chapter 8 focus on […]

Earlier this month, I was in San Francisco, CA, to attend Spark Summit 2017. I gave a talk on the phase before you can apply Machine Learning on data, using […]

Hortonworks Data Platform (HDP) v2.6 has been released and you can download the platform from their website. The sandbox is not yet available in v2.6. New Versions of Key Components […]