Some little-known facts about the Hadoop Distributed File System
Jared Diamond once said, “Technology has to be invented or adopted.” At times, people have qualms about adopting a specific technology.
They tend to do so when they are not familiar with the benefits of adopting that technology. Perhaps this explains why the Hadoop Distributed File System (HDFS) has failed to cause a stir despite revolutionizing the IT industry. Most businesses are yet to come to terms with the benefits of this file system. Nevertheless, the following facts should clear the air:
• As the name suggests, this file system is part of Hadoop and is used by MapReduce applications. In fact, it is the primary storage system that these applications use.
• Businesses often look for a file system that can replicate data blocks, ideally by creating multiple replicas of each block. HDFS does this as a matter of course.
• Replicated blocks of data can do more harm than good if they are not distributed across the compute nodes in a consistent manner. It is therefore essential to look for a storage system that spreads the blocks throughout a cluster. HDFS does so.
• Computations can often turn into nightmares. It is therefore advisable to look for a storage system that facilitates fast and reliable computations. HDFS is known to enable rapid computations, since work can be scheduled on the nodes that already hold the relevant blocks.
• Not many people are aware that data from HDFS can be integrated with an Enterprise Data Warehouse (EDW). However, this requires SQL, FastLoad, or a similar tool.
• It is not widely known that table-valued UDFs (user-defined functions) play a major role in this integration. Every UDF in the AMP (Access Module Processor) accesses the files stored in HDFS while integrating the data.
• Table-valued UDFs can help you load new data into the EDW. You can also generate a report by joining the HDFS data to existing tables. These UDFs therefore serve several purposes.
• It is said that the current generation of HDFS lacks the sophistication of a typical enterprise data warehouse. Some experts claim that users might find it difficult to place limits on individual queries or to perform all vital tasks using this storage system. It is also believed that an EDW is far better than HDFS at balancing mixed workloads.
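To make the points about replication and distribution a little more concrete, here is a minimal Python sketch of how a file might be cut into blocks and how the replicas of each block might be spread over a cluster. It is a toy model, not HDFS code. The function names (split_into_blocks, place_replicas) and the node names are hypothetical, the 128 MB block size and replication factor of 3 are the usual Hadoop defaults and are configurable, and the round-robin placement is a simplification: real HDFS uses a rack-aware placement policy.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB   # assumption: common default block size, configurable per cluster
REPLICATION = 3         # assumption: common default replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes (in bytes) of the blocks a file would be cut into."""
    if file_size <= 0:
        return []
    full_blocks, remainder = divmod(file_size, block_size)
    # Every block is full-sized except, possibly, the last one.
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Toy round-robin placement: map each block index to `replication` distinct nodes.

    Real HDFS is rack-aware; this only illustrates that no two replicas
    of the same block land on the same node.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300 * MB)
    placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
    for block, holders in placement.items():
        print(f"block {block} ({blocks[block] // MB} MB) -> {holders}")
```

Because every block lives on several different nodes, losing one node does not lose the data, and a computation on a given block can run on whichever of its holders is free.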