Machine Learning with Apache Spark

Introduction to machine learning, deep learning and distributed parallel computing with Apache Spark, written by our founder Jillur Quddus.

Jillur Quddus • Founder & Chief Data Scientist • 28th December 2018

Introduction

In September 2018 I was fortunate enough to be approached, and soon thereafter commissioned, by Packt Publishing to write a book on Machine Learning with Apache Spark. After three frantic months of juggling client commitments with putting together the contents and case studies for this book, I am delighted to announce that, as of 28th December 2018, the book is published and available to purchase from retailers, bookstores and online learning platforms.

Book Summary

Short Answer

A hands-on theoretical and applied introduction to machine learning and deep learning using Apache Spark.

Long Answer

Every person and every organization in the world manages data, whether they realize it or not. Data is used to describe the world around us and can be used for almost any purpose, from analyzing consumer habits in order to recommend the latest products and services to fighting disease, climate change, and serious organized crime. Ultimately, we manage data in order to derive value from it, whether personal or business value, and many organizations around the world have traditionally invested in tools and technologies to help them process their data faster and more efficiently in order to deliver actionable insights.

But we now live in a highly interconnected world driven by mass data creation and consumption, where data is no longer rows and columns restricted to a spreadsheet but an organic and evolving asset in its own right. With this realization come major challenges for organizations as we enter the intelligence-driven fourth industrial revolution: how do we manage the sheer amount of data being created every second in all of its various formats (think not only spreadsheets and databases, but also social media posts, images, videos, music, online forums and articles, computer log files, and more)? And once we know how to manage all of this data, how do we know what questions to ask of it in order to derive real personal or business value?

The focus of this book is to help us answer those questions in a hands-on manner starting from first principles. It introduces the latest cutting-edge technologies (the big data ecosystem, including Apache Spark) that can be used to manage and process big data. It then explores advanced classes of algorithms (machine learning, deep learning, natural language processing, and cognitive computing) that can be applied to the big data ecosystem to help us uncover previously hidden relationships in order to understand what the data is telling us so that we may ultimately solve real-world challenges.

Target Audience

Short Answer

Anyone interested in making a hands-on start in the world of machine learning and deep learning, with no prior mathematical or software engineering experience required.

Long Answer

This book is aimed at business analysts, data analysts, data scientists, data engineers, and software engineers for whom a typical day may currently involve analyzing data using spreadsheets or relational databases, perhaps using VBA, Structured Query Language (SQL), or even Python to compute statistical aggregations (such as averages) and to generate graphs, charts, pivot tables and other reporting mediums.

With the explosion of data in all of its various formats and frequencies, perhaps you are now challenged with not only managing all of that data but also understanding what it is telling you. You have most likely heard buzzwords like big data, artificial intelligence and machine learning, and now wish to understand where to start in order to take advantage of these new technologies and frameworks, not just in theory but in practice as well, to solve your business challenges. If this sounds familiar, then this book is for you!

Book Chapters

  • Chapter 1 - The Big Data Ecosystem
  • Chapter 2 - Setting Up a Local Development Environment
  • Chapter 3 - Artificial Intelligence and Machine Learning
  • Chapter 4 - Supervised Learning using Apache Spark
  • Chapter 5 - Unsupervised Learning using Apache Spark
  • Chapter 6 - Natural Language Processing using Apache Spark
  • Chapter 7 - Deep Learning using Apache Spark
  • Chapter 8 - Real-Time Machine Learning using Apache Spark

Acknowledgements

I would like to thank the Packt Publishing team for providing me with this tremendous opportunity, with special thanks to:

  • Siddharth Mandal - Acquisition Editor
  • Mohammed Yusuf Imaratwale - Content Development Editor
  • Diksha Wakode - Technical Editor
  • Emmanuel Asimadi - Reviewer
