Task 1/2 – Big Data

Introduction

Welcome to our digital daily digest! Today we will explore the universe of data, and not just data, but Big Data. In this blog, we'll begin by defining what exactly Big Data is, then discuss its importance and, more importantly, its applications. We'll also present the three most widely used tools for handling Big Data.

What is Big Data?

First things first: to understand Big Data, we need to define data. According to Sedkaoui (2018), "Data are a collection of facts, such as numbers, words, measurements, observations or even just descriptions of things, which give more information about an individual, an object or an observation."

Ohlhorst (2013) defines Big Data as a collection of data that can no longer be handled with traditional data-processing tools. Sedkaoui (2018) defines Big Data as a collection of data with a large volume, arriving at high velocity and in various formats. From this definition, we can see that Big Data is not only about size but also about the speed and variety with which the data arrives. This is referred to as the 3Vs model, which is used to characterize Big Data.

The 3Vs Model

Volume: This refers to the amount of data in your datasets. Companies like Google store vast numbers of websites, images, and search terms in their data centers. The huge amount of data generated daily poses significant challenges, such as storage and processing, but it also provides opportunities for valuable discoveries.
Velocity: This refers to the speed at which your organization receives data. Imagine a company like Facebook: every day, every minute, people are interacting with the application, and every interaction is data being sent to the company.
Variety: This refers to the different types of data in your dataset. Data sources can include structured, unstructured, and semi-structured data. LinkedIn, for example, deals with images, text, clicks, real-time messages, and much more, and it must somehow organize all this data and make sense of it.

Practical Applications of Big Data

Predictive Analytics

Predictive analytics is a key application of Big Data. It involves using advanced algorithms to identify patterns and trends in historical data, which allows companies to make more precise predictions and anticipate market behavior.
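To make the idea of finding a trend and extrapolating it a little more concrete, here is a minimal sketch that fits a straight line (ordinary least squares) to a hypothetical series of monthly sales and forecasts the next month. The function names and the numbers are illustrative assumptions; real predictive analytics on Big Data uses much richer models and distributed tools.

```python
from statistics import mean

def fit_linear_trend(values):
    """Fit y = intercept + slope * x by ordinary least squares, with x = 0, 1, 2, ..."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    # Slope: covariance of x and y divided by the variance of x
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    slope = num / den
    intercept = y_bar - slope * x_bar
    return intercept, slope

def forecast(values, steps_ahead=1):
    """Extrapolate the fitted line a given number of periods past the last value."""
    intercept, slope = fit_linear_trend(values)
    next_x = len(values) - 1 + steps_ahead
    return intercept + slope * next_x

# Hypothetical monthly sales (illustrative numbers, not real data)
monthly_sales = [100, 110, 120, 130, 140]
print(forecast(monthly_sales))  # → 150.0
```

A straight line is the simplest possible "pattern"; the same fit-then-extrapolate idea underlies more sophisticated forecasting models.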

Personalized Experience

With Big Data, companies can learn more about their customers, such as their behavior and interaction history, and thus provide a more personalized experience.

"For example, Amazon's recommendation engine provides each user with a customized homepage." (Peng, 2022). By analyzing users' behavior, preferences, and purchase history, Amazon can predict what a customer may be interested in.
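As a rough illustration of how purchase history can drive recommendations, here is a minimal sketch of a "customers who bought this also bought" approach based on item co-occurrence counts. This is a simplified technique chosen for illustration, not Amazon's actual system; the users, items, and function names are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase histories: user -> set of items bought
purchases = {
    "alice": {"book", "lamp", "pen"},
    "bob": {"book", "pen"},
    "carol": {"book", "mug"},
    "dave": {"lamp", "mug"},
}

def build_co_occurrence(histories):
    """Count how often each ordered pair of items appears in the same history."""
    co_counts = Counter()
    for items in histories.values():
        for a, b in permutations(sorted(items), 2):
            co_counts[(a, b)] += 1
    return co_counts

def recommend(user, histories, top_n=2):
    """Score items the user has not bought by how often they co-occur with items they own."""
    co_counts = build_co_occurrence(histories)
    owned = histories[user]
    scores = Counter()
    for (a, b), count in co_counts.items():
        if a in owned and b not in owned:
            scores[b] += count
    # Sort by score (highest first), then alphabetically to break ties deterministically
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

print(recommend("bob", purchases))  # → ['lamp', 'mug']
```

Bob bought a book and a pen; the lamp ranks first because it was bought together with both of those items, while the mug only co-occurs with the book.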

Decision Making

With so much data available and the ability to analyze it, companies can make better strategic, management, operational, and product development decisions.

In the field of healthcare, IBM Watson, an artificial intelligence system, was used to assist oncologists in providing better cancer treatment (Mishra, 2021). It analyzed large amounts of health information and patient records to generate evidence-based treatment options.

Technologies and Tools

Apache Hadoop

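Hadoop is an open-source framework designed to store and process large amounts of data across clusters of computers. It combines a distributed file system (HDFS) with the MapReduce programming model for parallel processing (Apache, 2023).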
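To show what the MapReduce model looks like, here is a minimal sketch of the classic word-count example written in plain Python. It only simulates the map, shuffle, and reduce phases in a single process; it does not use the Hadoop API (real Hadoop jobs are typically written in Java or run through Hadoop Streaming), and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(mapped_pairs):
    """Shuffle: group together all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # In Hadoop, input splits would be mapped in parallel on different nodes;
    # here every phase runs sequentially in one process.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

documents = ["Big Data is big", "data is everywhere"]
print(word_count(documents))  # → {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each map call only looks at its own line and each reduce call only looks at one key, both phases can be spread across many machines, which is what lets Hadoop scale to very large datasets.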

Apache Spark

Apache Spark is an open-source engine for large-scale data processing. It processes data in memory, which makes it faster and more efficient than Hadoop's MapReduce for many workloads (Apache, 2023).

Apache Cassandra

Apache Cassandra is a distributed NoSQL database management system designed to handle large amounts of data spread across many servers (Apache, 2023).
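One idea behind how a distributed database like Cassandra spreads data across servers is a token ring: each node owns a position on a hash ring, and a row is stored on the first node clockwise from the hash of its partition key, with extra copies on the following nodes. The sketch below is a simplified, hypothetical illustration of that idea in plain Python. It assumes one token per node and uses MD5 for hashing, whereas real Cassandra clusters typically use the Murmur3 partitioner and many tokens per node (virtual nodes); the class and node names are made up.

```python
import bisect
import hashlib

def token(value):
    """Map a string to a position on the ring (MD5 here, purely for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class TokenRing:
    """Hypothetical, simplified token ring with one position per node."""

    def __init__(self, nodes):
        # Sort nodes by their token so we can walk the ring clockwise
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key, replication_factor=2):
        """Return the nodes that would store this key: the first node whose token
        is >= the key's token (wrapping around), plus the next nodes on the ring."""
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        count = min(replication_factor, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

ring = TokenRing(["node-a", "node-b", "node-c"])
print(ring.replicas("user:42"))
```

Because each key's location is computed from its hash, any node can work out where a row lives without a central directory, and adding a node only moves the keys in the part of the ring it takes over.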

Challenges with Ethics

Naturally, storing vast amounts of data raises ethical questions about what companies can and cannot do with our data. With great power comes great responsibility: regulations such as the General Data Protection Regulation (GDPR) set strict rules on how personal data should be treated.

Conclusion

There is so much we can do with Big Data. Its universe is vast and dynamic, offering both great possibilities and ethical challenges. In this blog, we explored the fundamentals of Big Data, from the 3Vs model to the most widely used tools, such as Apache Cassandra, Hadoop, and Spark. We showed how Big Data can be leveraged to make a real-life impact in sectors like medicine and retail. Its ability to help prevent diseases, provide insights into health treatments, and give customers a more personalized experience illustrates its potential. However, it is also important to be mindful of its challenges, such as data privacy and security, which call for more ethical approaches.
