Wednesday, 11 June 2014

What is Big Data?, Big Data Characteristics, What is MapReduce?

What is Big Data?

Big data is the term for a collection of data sets so large and complex that it becomes
difficult to process using on-hand database management tools or traditional data processing
applications.
• Large-scale data sets
• Complex data sets

How big is Big Data?

Any data that challenges our current technology in some manner can be considered Big Data, for example through its:
- Volume
- Speed of generation

Big Data Characteristics (3Vs)

• High volume
• High velocity
• High variety

What is Hadoop?

At Google, MapReduce operations run on a special file system called the Google File
System (GFS), which is highly optimized for this purpose.
Doug Cutting and others, with much of the work done at Yahoo!, built an open-source
counterpart based on Google's published GFS design and called it the Hadoop
Distributed File System (HDFS).
The software framework that supports HDFS, MapReduce and other related entities is
called the Hadoop project, or simply Hadoop.
It is open source and distributed by the Apache Software Foundation.

Implementation of Big Data

• MapReduce
• Parallel DBMS technologies

What is MapReduce?

MapReduce is a programming model that Google has used successfully in processing its “big data” sets.
The programmer supplies:
• A map function
• A reduce function
The framework then:
• automatically parallelizes the computation
• handles machine failures
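
To make the map function and the reduce function concrete, here is a minimal single-machine sketch of the classic word-count job in Python. It only imitates the programming model and is not Hadoop or Google code. The names map_fn, reduce_fn and run_mapreduce are made up for this illustration, and the grouping step in the middle stands in for the "shuffle" that a real framework performs across machines.

```python
from collections import defaultdict


def map_fn(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: combine all values emitted for one key into a single result."""
    return word, sum(counts)


def run_mapreduce(documents):
    """Hypothetical driver: run map, group by key, then run reduce."""
    # Group every emitted value under its key (the "shuffle" step).
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    # Call the reduce function once per distinct key.
    return dict(reduce_fn(key, values) for key, values in groups.items())


word_counts = run_mapreduce(["big data is big", "data sets are large"])
print(word_counts)
```

The programmer writes only map_fn and reduce_fn. In a real MapReduce system the driver part is supplied by the framework and runs across many machines.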

MapReduce Advantages

• Automatic parallelization
• Run-time:
    - Data partitioning
    - Task scheduling
    - Handling machine failures
    - Managing inter-machine communication
• Completely transparent to the programmer/analyst/user
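
The run-time duties listed above can be imitated on one machine to show what "transparent to the programmer" means. The following Python sketch is only an analogy under stated assumptions: threads stand in for machines, and partition, run_with_retry and run_parallel are hypothetical helper names, not part of Hadoop or any real MapReduce API.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_tasks):
    """Data partitioning: split the input into num_tasks chunks."""
    return [records[i::num_tasks] for i in range(num_tasks)]


def run_with_retry(task, chunk, max_attempts=3):
    """Failure handling: re-run a task if it raises, up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk)
        except Exception:
            if attempt == max_attempts:
                raise


def run_parallel(task, records, num_tasks=4):
    """Task scheduling: give each chunk to a worker and collect the results."""
    chunks = partition(records, num_tasks)
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        return list(pool.map(lambda chunk: run_with_retry(task, chunk), chunks))


# The only code the "user" writes is this function.
def count_records(chunk):
    return len(chunk)


partial_counts = run_parallel(count_records, list(range(10)))
total = sum(partial_counts)
print(partial_counts, total)
```

The user-supplied count_records never mentions chunks, workers or retries. That separation is the advantage the list describes, although a real framework also has to move data between machines, which this sketch leaves out.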

   

