The volume of big data is enormous, so an infrastructure technology for analyzing it is needed. What can serve as that infrastructure?
The core technology of big data is Hadoop, an open source project. Because it is developed in the open, an ecosystem has formed around it. To make Hadoop easier to use and to connect it with existing systems, there is the so-called Hadoop Animal Farm of tools such as Pig, HBase, Sqoop, and Flume, and they are growing rapidly. From now on, I'm going to post about Hadoop, the Hadoop ecosystem, and various other big data technologies. Hadoop is an open source project for distributed processing. It is currently one of the most favored solutions for analyzing unstructured or structured big data. The technology is used by Yahoo, Facebook, and others, and its adoption keeps growing.
a. Hadoop
Hadoop creates clusters of machines and coordinates work among them. Clusters can be built and scaled out with inexpensive computers.
The Hadoop software package includes the robust, reliable Hadoop Distributed File System (HDFS), which splits user data across servers in a cluster. It uses replication to ensure that even multiple node failures will not cause data loss.
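To make the splitting and replication idea concrete, here is a minimal Python sketch. It is not HDFS code and does not use any Hadoop API: the function names, the tiny block size, and the round-robin placement are all assumptions made for illustration. Real HDFS uses much larger blocks and a rack-aware placement policy.

```python
# Toy model of block splitting and replication (illustrative only, not HDFS).

def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut the data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, nodes: list, replication: int = 3) -> dict:
    """Assign each block to `replication` distinct nodes, round-robin.

    Round-robin is an assumption made for simplicity.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }

def surviving_blocks(placement: dict, failed: set) -> set:
    """Return the blocks that still have at least one replica on a live node."""
    return {
        block
        for block, replicas in placement.items()
        if any(node not in failed for node in replicas)
    }

blocks = split_into_blocks(b"x" * 10, block_size=4)
nodes = ["n1", "n2", "n3", "n4", "n5"]
placement = place_replicas(len(blocks), nodes, replication=3)
```

With three replicas per block, any two nodes can fail in this toy model and every block is still readable. Data is lost only when all the nodes holding a given block fail together.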
In addition, Hadoop includes MapReduce, a parallel distributed processing system designed for clusters of commodity, shared-nothing hardware. An analysis is expressed as a map function and a reduce function, and the framework takes care of running it in parallel, so no special parallel programming techniques are required; many existing algorithms can be adapted with little change. MapReduce takes advantage of the distribution and replication of data in HDFS to spread execution of any job across many nodes in a cluster.
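The map and reduce idea can be sketched in plain Python with the classic word count. This runs in a single process and does not use the Hadoop API; the function names are my own, and the `shuffle` step stands in for the grouping that the framework would do between the two phases across nodes.

```python
# Single-process sketch of the MapReduce model (illustrative only, not Hadoop).
from collections import defaultdict

def map_phase(line: str):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs) -> dict:
    """Group all values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list):
    """Combine all counts for one word into a single total."""
    return key, sum(values)

def run_job(lines: list) -> dict:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = run_job(["big data", "big cluster"])
```

Because each call to `map_phase` looks at only one line and each call to `reduce_phase` looks at only one key, the calls are independent of each other. That independence is what lets a real cluster run them on many nodes at once.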


