Apache Spark online training

Apache Spark


What is Apache Spark?

Apache Spark is a fast cluster computing framework. It builds on the ideas of Hadoop MapReduce and extends the MapReduce model so that it can be used efficiently for more types of computation, including interactive queries and stream processing. The main feature of Spark is its in-memory cluster computing, which increases the processing speed of an application.
Spark is designed to cover a wide range of workloads, such as batch applications, iterative algorithms, interactive queries, and streaming. Because it supports all of these workloads in a single system, it reduces the management burden of maintaining separate tools.
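As a rough illustration of the MapReduce model that Spark extends, here is a minimal single-machine sketch in plain Python. It is not Spark or Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample lines are made up for this example, and a real cluster would distribute each phase across many machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group the emitted values by key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single count."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical sample input, used only for illustration.
sample_lines = [
    "spark extends the mapreduce model",
    "the mapreduce model has two stages",
]
counts = word_count(sample_lines)
print(counts)
```

In classic MapReduce, the output of each stage is written to disk before the next job starts. The sketch keeps everything in memory, which is closer in spirit to how Spark chains such steps.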
Features of Apache Spark:

Speed − Spark can run an application on a Hadoop cluster up to 100 times faster in memory, and up to 10 times faster on disk, than Hadoop MapReduce. It achieves this by reducing the number of read/write operations to disk and keeping intermediate processing data in memory.
Supports multiple languages − Spark provides built-in APIs in Java, Scala, and Python, so you can write applications in any of these languages. Spark also comes with more than 80 high-level operators for interactive querying.
Advanced Analytics − Spark supports more than ‘map’ and ‘reduce’ operations. It also supports SQL queries, streaming data, machine learning (ML), and graph algorithms.
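The idea of keeping intermediate data in memory can be sketched with a toy model in plain Python. This is not Spark's API: ToyDataset, read_numbers, and read_log are hypothetical names invented for this example, and the whole thing runs on one machine. It assumes a simple design in which transformations (map, filter) only record work to do, actions (collect, count) trigger the computation, and cache() keeps the computed result in memory so that later actions do not re-read the source.

```python
class ToyDataset:
    """A toy, single-machine stand-in for a distributed dataset (not Spark's API)."""

    def __init__(self, source=None, parent=None, op=None):
        self._source = source        # zero-argument function that reads the raw data
        self._parent = parent        # dataset this one is derived from, if any
        self._op = op                # function applied to the parent's rows
        self._cache_enabled = False  # set by cache()
        self._cached = None          # in-memory copy of the computed rows

    # Transformations: describe the work and return a new dataset; nothing runs yet.
    def map(self, fn):
        return ToyDataset(parent=self, op=lambda rows: [fn(r) for r in rows])

    def filter(self, predicate):
        return ToyDataset(parent=self, op=lambda rows: [r for r in rows if predicate(r)])

    def cache(self):
        """Ask for the result to be kept in memory once it has been computed."""
        self._cache_enabled = True
        return self

    def _compute(self):
        if self._cached is not None:
            return self._cached      # reuse the in-memory result
        if self._parent is None:
            rows = list(self._source())
        else:
            rows = self._op(self._parent._compute())
        if self._cache_enabled:
            self._cached = rows
        return rows

    # Actions: trigger the actual computation.
    def collect(self):
        return list(self._compute())

    def count(self):
        return len(self._compute())


# Count how often the (hypothetical) source is read.
read_log = {"calls": 0}


def read_numbers():
    read_log["calls"] += 1
    return range(1, 11)


numbers = ToyDataset(source=read_numbers)
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0).cache()

calls_before_action = read_log["calls"]  # transformations alone read nothing
result = even_squares.collect()          # first action: reads the source once
total = even_squares.count()             # second action: served from the cache
print(result, total, read_log["calls"])
```

Without the cache() call, the second action would read the source again. Avoiding that kind of repeated work is the effect the Speed feature above describes.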
What is Spark in Big Data?

Apache Spark is an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley. In contrast to Hadoop's two-stage disk-based MapReduce paradigm, Spark's in-memory primitives provide performance up to 100 times faster for certain applications.
  • Some experience coding in Python, Java, or Scala, plus some familiarity with Big Data issues/concepts.
  • It is a 12-day program, with sessions of up to 2 hours each.
  • The format is 20% theory, 80% hands-on.

  • It is a 3-day program, with sessions of up to 8 hours each.
  • The format is 20% theory, 80% hands-on.
    Private classroom sessions are arranged on request; the minimum number of attendees per batch is 4.
Course Content
  • History of Big Data & Apache Spark
  • Introduction to the Spark Shell and the training environment
  • Intro to Spark DataFrames and Spark SQL
  • Introduction to RDDs
    • Lazy Evaluation
    • Transformations and Actions
    • Caching
    • Using the Spark UIs
  • Data Sources: reading from Parquet, S3, Cassandra, HDFS, and your local file system
  • Spark's Architecture
  • Programming with Accumulators and Broadcast variables
  • Debugging and tuning Spark jobs using Spark's admin UIs
  • Memory & Persistence
  • Advanced programming with RDDs (understanding the shuffle phase, partitioning, etc.)
  • Visualization: matplotlib, gg_plot, dashboards, exploration and visualization in notebooks
  • Introduction to Spark Streaming
  • Introduction to MLlib and GraphX
