What is Big Data?
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process them using on-hand database management tools or traditional data processing applications.
- Large-scale data sets
- Complex data sets
How big is Big Data?
Any data that can challenge our current technology in some manner can be considered Big Data:
- Volume
- Speed of generation
Big Data Characteristics (3Vs)
- High volume
- High velocity
- High variety
What is Hadoop?
At Google, MapReduce operations are run on a special file system called the Google File System (GFS) that is highly optimized for this purpose. Doug Cutting and others at Yahoo! reverse-engineered GFS and called their version the Hadoop Distributed File System (HDFS). The software framework that supports HDFS, MapReduce, and other related components is called the Hadoop project, or simply Hadoop. It is open source and distributed by Apache.
Implementation of Big Data
- MapReduce
- Parallel DBMS technologies
What is MapReduce?
MapReduce is a programming model Google has used successfully in processing its "big-data" sets. The programmer supplies:
- A map function
- A reduce function
The framework then automatically parallelizes the computation and handles machine failures.
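The model above can be sketched in a few lines of plain Python. This is an illustrative single-machine word-count example, not Hadoop's actual Java API; the function names (map_fn, reduce_fn, run_mapreduce) are made up for this sketch, and a real framework would additionally partition the input and distribute these calls across machines.

```python
from collections import defaultdict

def map_fn(document):
    # Map: emit a (word, 1) pair for every word in the input document.
    for word in document.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # Reduce: sum all the counts emitted for the same word.
    return (word, sum(counts))

def run_mapreduce(documents):
    # Shuffle: group intermediate values by key, then reduce each group.
    # In a real cluster this grouping happens across machines.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

print(run_mapreduce(["big data is big", "data is data"]))
# → {'big': 2, 'data': 3, 'is': 2}
```

Because the map calls are independent of each other (and likewise the reduce calls, once the shuffle has grouped the keys), the framework is free to run them in parallel on many machines, which is exactly where the automatic parallelization comes from.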
MapReduce Advantages
- Automatic parallelization
- Run-time handling of:
  - Data partitioning
  - Task scheduling
  - Machine failures
  - Inter-machine communication
- Completely transparent to the programmer/analyst/user