When I assembled my first data science team, the term was barely appearing in the Harvard Business Review. I had no clue that I was building a team pioneering […]

As you may know, I started writing Apache Spark with Java (now renamed Spark in Action, 2nd edition). Usually, as a book develops, authors share a few excerpts of the book […]

Read about eight hot predictions for data management in 2019, covering usage, shape, governance, and people.

A couple of weeks ago, I chatted with Tobias Macey about data engineering, and more specifically about Apache Spark. Tobias Macey runs the Data Engineering Podcast, which you can directly […]

Yesterday, during Ignite 2018, Microsoft announced that it will integrate Apache Spark more tightly with SQL Server 2019. If you missed previous announcements around SQL Server, it now runs on […]

Chapter 8 of Spark with Java is out, and it covers ingestion, as did chapter 7. However, while chapter 7 focused on ingestion from files, chapter 8 focuses on […]

Apache Spark has been a game changer for distributed data processing, thanks to an easy-to-understand API, a focus on simplicity, and an adoption of modern infrastructure. However, rumors […]

Spark Summit Europe 2017 just concluded here in Dublin. More than 102 speakers, 1,200 attendees, and an impressive Databricks team attended the three-day celebration. Spark is reaching a new […]