What will you learn in this course?
In this course, you will learn about:
- The purpose of Spark, and why and when you would use it
- How to list and describe the components of the Spark unified stack
- The basics of the Resilient Distributed Dataset, Spark's primary data abstraction
- How to download and install Spark standalone
- An overview of Scala and Python.
Spark Fundamentals I
- Domain
Software Tools & Programming Languages
- Course Category
Popular Tech Topics
- Certificate Earned
Partner Completion Certificate
- Course Price
INR 2,999
- Course Duration
5 Hours
Why should you take this course?
- Learn the fundamentals of Spark, the technology that is revolutionizing the analytics and big data world!
- Spark is an open-source processing engine built around speed, ease of use, and analytics. If you have large amounts of data that require low latency processing that a typical MapReduce program cannot provide, Spark is the way to go.
Who should take this course?
This course is designed for anyone who wants to use Spark for big data processing and analytics, particularly for low-latency workloads that a typical MapReduce program cannot handle.
Curriculum
- Module 1 – Introduction to Spark: Getting started
- What is Spark and what is its purpose?
- Components of the Spark unified stack
- Resilient Distributed Dataset (RDD)
- Downloading and installing Spark standalone
- Scala and Python overview
- Launching and using Spark’s Scala and Python shells
- Module 2 – Resilient Distributed Dataset and DataFrames
- Understand how to create parallelized collections and external datasets
- Work with Resilient Distributed Dataset (RDD) operations
- Utilize shared variables and key-value pairs
- Describe how data is stored in an HDFS cluster
- Module 3 – Spark application programming
- Understand the purpose and usage of the SparkContext
- Initialize Spark with the various programming languages
- Describe and run some Spark examples
- Pass functions to Spark
- Create and run a Spark standalone application
- Submit applications to the cluster
- Module 4 – Introduction to Spark libraries
- Understand and use the various Spark libraries
- Module 5 – Spark configuration, monitoring and tuning
- Understand components of the Spark cluster
- Configure Spark by modifying Spark properties, environment variables, or logging properties
- Monitor Spark using the web UIs, metrics, and external instrumentation
- Understand performance tuning considerations
- Final Exam
Tools you will learn in the course
- Scala
- Python shell
FAQs
Apache Spark is a data processing framework that can process large datasets quickly and accurately. It can also distribute processing tasks across many computers, which makes it an important tool for big data processing and machine learning. In addition, its easy-to-use API reduces the burden on developers.
After completing this course, you will be able to:
• Perform fast iterative algorithms.
• Carry out interactive data mining.
• Perform in-memory cluster computing.
• Develop applications using the Java, Python, R, and Scala APIs.
• Combine SQL, streaming, and complex analytics in the same application.
• Run Spark applications on top of Hadoop, Mesos, standalone, or in the cloud.
• Work with HDFS, Cassandra, HBase, or S3.
As soon as you enroll in this course, you will have access to all the information and materials in your dashboard.
It is recommended that you have a basic understanding of Apache Hadoop and big data. It is also beneficial to have a basic knowledge of Linux, and basic skills in using Scala, Python, and Java.
This course is run by our partner SkillUp Online. It is 100% online, and you do not need to attend any classes in person. All you need is reliable internet access and a device that can open the course materials, which come in the form of articles, videos, and knowledge checks. Plus, you will be able to connect easily with others on the course and your mentors through the discussion space.