What will you learn in this course?
After completing this course, you should be able to:
- Describe what Spark is all about and know why you would want to use it
- Use Resilient Distributed Datasets (RDDs) and DataFrame operations
- Use Scala, Java, or Python to create and run a Spark application (a sketch follows this list)
- Create applications using Spark SQL, MLlib, Spark Streaming, and GraphX
- Configure, monitor, and tune Spark
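For a feel of what such an application looks like, here is a minimal Scala sketch that runs locally and touches both the RDD and DataFrame APIs. The object name and sample data are illustrative, not course material:

```scala
import org.apache.spark.sql.SparkSession

object SparkIntro {
  def main(args: Array[String]): Unit = {
    // Local SparkSession for experimentation; a cluster deployment
    // would pass a real master URL instead of local[*]
    val spark = SparkSession.builder()
      .appName("SparkIntro")
      .master("local[*]")
      .getOrCreate()

    // RDD API: classic word count over an in-memory collection
    val sc = spark.sparkContext
    val counts = sc.parallelize(Seq("spark makes big data simple", "spark is fast"))
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.collect().foreach(println)

    // DataFrame API: the same engine behind a declarative, SQL-like interface
    import spark.implicits._
    val df = Seq(("alice", 34), ("bob", 29)).toDF("name", "age")
    df.filter($"age" > 30).show()

    spark.stop()
  }
}
```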
Spark Fundamentals II
- Skill Type: Emerging Tech
- Domain: Software Tools & Programming Languages
- Course Category: Popular Tech Topics
- Certificate Earned: Partner Completion Certificate
- Course Price: INR 2,999
- Course Duration: 5 Hours
Why should you take this course?
In this course, you will learn about:
- Apache Spark architecture overview
- How Spark instructions are translated into jobs and what causes multiple stages within a job
- Efficiently using Spark’s memory caching for iterative processing (a sketch follows this list)
- Developing, testing, and debugging Spark applications using SBT, Eclipse, and IntelliJ
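As a taste of the caching topic, here is a minimal Scala sketch (the data and names are illustrative) showing how cache() keeps an RDD in memory so that repeated actions do not recompute its full lineage:

```scala
import org.apache.spark.sql.SparkSession

object CachingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CachingSketch").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // An RDD that would be expensive to recompute (simulated here with a map)
    val base = sc.parallelize(1 to 1000000).map(_.toDouble)

    // cache() marks the RDD for in-memory storage; without it, every
    // action below would recompute the whole pipeline from scratch
    base.cache()

    // Several passes over the same data: only the first pays the
    // computation cost, later passes read from the cache
    val sum  = base.sum()
    val max  = base.max()
    val mean = sum / base.count()
    println(s"sum=$sum max=$max mean=$mean")

    spark.stop()
  }
}
```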
Who should take this course?
This course is designed for developers who want to leverage the capabilities of Spark and optimize their applications by improving RDD performance.
Curriculum
- Introduction
- Module 1: Introduction to Notebooks
- Module 2: RDD Architecture
- Module 3: Optimizing Transformations and Actions
- Module 4: Caching and Serialization
- Module 5: Development and Testing
- Final Exam
Tools you will learn in the course
- SBT
- Eclipse
- IntelliJ
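By way of illustration, a minimal build.sbt for a Spark project might look like the sketch below. The project name and version numbers are assumptions, not values prescribed by the course; match them to your own cluster:

```scala
// build.sbt — a minimal project definition for a Spark application
name := "spark-fundamentals"
version := "0.1.0"
scalaVersion := "2.12.18"

// "provided" because spark-submit supplies Spark on the cluster;
// drop it to run directly from the IDE (Eclipse or IntelliJ)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.5.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "3.5.1" % "provided"
)
```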
FAQs
What is Apache Spark?
Apache Spark is a data processing framework that can process large datasets quickly. It can also distribute these processing tasks across many computers, which makes it an important tool for big data processing and machine learning development. In addition, it has an easy-to-use API that reduces the burden on developers.
What will I learn in this course?
After completing this course, you will:
- Have an understanding of Apache Spark architecture.
- Be able to perform input, partitioning, and parallelization.
- Be able to operate on and join multiple datasets (a sketch follows this list).
- Understand how Spark operations are translated into jobs.
- Be able to use Spark’s memory caching for iterative processing.
- Be able to develop, test, and debug Spark applications using SBT, Eclipse, and IntelliJ.
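To illustrate the partitioning and join outcomes above, here is a minimal Scala sketch; the keys, values, and partition counts are illustrative:

```scala
import org.apache.spark.sql.SparkSession

object JoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("JoinSketch").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // parallelize distributes a local collection; the second argument
    // sets the number of partitions explicitly
    val users  = sc.parallelize(Seq((1, "alice"), (2, "bob"), (3, "carol")), numSlices = 4)
    val orders = sc.parallelize(Seq((1, 250.0), (1, 80.0), (3, 40.0)), numSlices = 4)

    // join on the key (user id) yields (id, (name, amount)) pairs
    val joined = users.join(orders)
    joined.collect().foreach { case (id, (name, amount)) =>
      println(s"user $id ($name) spent $amount")
    }

    println(s"users partitions: ${users.getNumPartitions}")
    spark.stop()
  }
}
```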
When will I get access to the course materials?
As soon as you enroll in this course, you will have access to all the information and materials in your dashboard.
What are the prerequisites for this course?
It is recommended that you have a basic understanding of Apache Hadoop and big data. It is also beneficial to have basic knowledge of Linux and basic skills in Scala, Python, and Java.
How is this course delivered?
This course is run by our partner SkillUp Online. It is 100% online, and you do not need to attend any classes in person. You simply need reliable internet access and the technology required to use the course materials, which come in the form of articles, videos, and knowledge checks. Plus, you will be able to connect easily with other learners on the course and with your mentors through the discussion space.