Add blog on apache spark and its core. #71

Open
wants to merge 2 commits into
base: main

ShivaniBanke
Collaborator

This blog post provides an overview of Apache Spark, covering its core components and architecture.

netlify bot commented Nov 25, 2024

Deploy Preview for infraspec ready!

Name Link
🔨 Latest commit ea031c8
🔍 Latest deploy log https://app.netlify.com/sites/infraspec/deploys/6743fc33c2ebce00087d087c
😎 Deploy Preview https://deploy-preview-71--infraspec.netlify.app

@@ -0,0 +1,224 @@
---
title: "Apache Spark: Unleashing Big Data with Rdds, DataFrames and Beyond."

Spelling issue: Rdds
Suggestion: RDDs

weight: 1
---

Have you ever wondered how companies like Netflix recommend your favourite movies or how e-commerce platforms handle vast amounts of data to personalize your shopping experience 🤔?

Spelling issue: favourite
Suggestion: favorite


`Computing Engine`: It focuses on computation rather than storage, allowing it to work with various storage systems like Hadoop, Amazon S3, and Apache Cassandra. This flexibility makes Spark suitable for diverse environments, including cloud and streaming applications.

`Libraries`: It provides a unified API for common data analysis tasks. It supports both standard libraries that ship with the engine as well as external libraries published as third-party packages by the open-source communities. The standard libraries includes libraries for SQL (Spark SQL), machine learning (MLlib), stream processing (Structured Streaming), and graph analytics (GraphX).

Grammar issue: includes
Suggestion: include

```python
# Collect the RDD data
rdd_data = rdd.collect()
```
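
As a side note on the `collect()` call in the excerpt above: in Spark, transformations such as `map` and `filter` are lazy, and an action such as `collect()` is what triggers the computation and returns the results to the driver. The following is a toy pure-Python sketch of that idea, not PySpark itself. `ToyRDD`, `double`, and `calls` are hypothetical names introduced only for illustration, and the real RDD implementation is distributed and considerably more involved.

```python
from typing import Any, Callable, Iterable, List, Tuple


class ToyRDD:
    """Hypothetical stand-in for an RDD (illustration only, not the PySpark API)."""

    def __init__(self, data: Iterable[Any],
                 steps: Tuple[Tuple[str, Callable], ...] = ()):
        self._data = list(data)
        self._steps = tuple(steps)  # transformations are recorded, not executed

    def map(self, fn: Callable[[Any], Any]) -> "ToyRDD":
        # Return a new ToyRDD with one more recorded step; nothing runs yet.
        return ToyRDD(self._data, self._steps + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(self._data, self._steps + (("filter", pred),))

    def collect(self) -> List[Any]:
        # The "action": replay every recorded step and return a plain list.
        result = self._data
        for kind, fn in self._steps:
            if kind == "map":
                result = [fn(x) for x in result]
            else:
                result = [x for x in result if fn(x)]
        return list(result)


calls: List[int] = []  # records when double() is actually invoked


def double(x: int) -> int:
    calls.append(x)
    return x * 2


rdd = ToyRDD([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
# At this point double() has not been called; only the plan exists.

# Collect the RDD data
rdd_data = rdd.collect()
```

In this sketch, `calls` stays empty until `collect()` runs, which mirrors why calling `collect()` on a large dataset in real Spark should be done with care: it materializes the full result on the driver.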

Spelling issue: Sparks’s
Suggestion: Spark’s
