How to install Hadoop quickly on Linux

Release time: 2021-09-01 11:25:03 Author: Anonymous
How do you install Hadoop quickly on Linux? This post walks through a detailed installation tutorial. The configuration steps are broadly the same across Linux distributions; only the specific commands differ, so you can also use this as a general reference.

Everyone is interested in big data, but few people get around to actually practicing it. When we learn any technology, the natural first step is to install it and then experiment. If you search the Internet for how to install Hadoop, most of the results describe installation on Ubuntu, and many of them are not explained clearly. A reader asked for a tutorial on installing Hadoop on Linux, so here is a step-by-step guide that everyone can follow.

Preparatory work

1. First, we can rent a server from Alibaba Cloud or Huawei Cloud; an entry-level server is not expensive. I am reusing one rented earlier, running CentOS 8. If you are installing on your own machine instead, download a CentOS 8 image and install it in a virtual machine. After the installation is complete, we can start to install Hadoop.

Let's start by talking about what Hadoop can do and what people often misunderstand about Hadoop.

Hadoop is a distributed computing and storage framework. Its working process rests on two pillars: the Hadoop Distributed File System (HDFS) and the MapReduce distributed computing framework.

But a lot of people misunderstand Hadoop. Some enthusiasts claim Hadoop can do anything; in fact, each technology exists to solve a particular class of problems. Hadoop is suitable for data analysis, but it is definitely not BI. Traditional BI belongs to the data presentation layer, while Hadoop is a data carrier focused on semi-structured and unstructured data; the two are concepts at different levels.

Others say that Hadoop is ETL, i.e. data processing, but Hadoop is not an ETL tool in the strict sense.

Hadoop Installation tutorial

1. Install SSH

yum install openssh-server 

OpenSSH is an open-source implementation of the Secure Shell protocol. After openssh-server is installed, a service named sshd is registered with the system (on CentOS 8 it is managed by systemd). In a moment, we will place the generated key in the specified location and then use it for authentication.

2. Install rsync

yum -y install rsync 

3. Generate an SSH key for subsequent passwordless authentication. Note that recent OpenSSH releases (including the one shipped with CentOS 8) disable DSA keys by default, so use RSA instead of the `-t dsa` seen in older tutorials:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa 

4. Put the generated public key into the authorized-keys file, and make sure its permissions are strict enough for sshd to accept it:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

Install Hadoop

Before installing Hadoop, we must first install the JDK and configure its environment variables. If running java -version prints version information, the JDK has been installed.

1. Decompress Hadoop

We start by getting the Hadoop distribution archive onto the server.

Then extract it:

tar zxvf hadoop-3.2.1.tar.gz
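If the tarball is not on the server yet, a minimal sketch of fetching and placing it looks like the following; the Apache archive URL and the /usr/local/hadoop target (which matches the HADOOP_HOME exported in the next step) are assumptions you may need to adjust:

```shell
# download the release from the Apache archive (assumed URL; a closer mirror also works)
curl -fLO https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
# unpack and move into place so that HADOOP_HOME=/usr/local/hadoop (used below) is valid
tar zxvf hadoop-3.2.1.tar.gz
mv hadoop-3.2.1 /usr/local/hadoop
```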

2. Modify the bashrc file

vim ~/.bashrc

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.212.b04-0.el7_6.x86_64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH

Copy these lines into the file, then save and exit.

3. Make the changes take effect

source ~/.bashrc
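As a quick sanity check that the exports took effect, you can count how many PATH entries now point into the Hadoop directory; this sketch assumes the same HADOOP_HOME=/usr/local/hadoop as above:

```shell
# re-create the two PATH exports from ~/.bashrc, then verify both landed on PATH
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
echo "$PATH" | tr ':' '\n' | grep -c "^$HADOOP_HOME"   # prints 2
```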

4. Modify the configuration file etc/hadoop/core-site.xml (config paths here and below are relative to the Hadoop installation directory)

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
</property>
<!-- Cache storage path -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>hadooptemp</value>
</property>

5. Modify etc/hadoop/hdfs-site.xml

<!-- The default value is 3. Because it is a single machine, configure 1 -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<!-- Configure the http address -->
<property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:9870</value>
</property>
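The tutorial later starts YARN (./start-all.sh and the port 8088 UI) but does not show the YARN-side configuration. On a single node, a minimal setup typically also needs the following two files; treat these snippets as a sketch of the standard properties, not the author's exact configuration.

etc/hadoop/mapred-site.xml:

```xml
<!-- run MapReduce jobs on YARN instead of the local runner -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
```

etc/hadoop/yarn-site.xml:

```xml
<!-- auxiliary shuffle service required by MapReduce on YARN -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
```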

6. Modify etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.212.b04-0.el7_6.x86_64 

7. Modify the etc/hadoop/yarn-env.sh file

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.212.b04-0.el7_6.x86_64 

8. Modify both the sbin/start-dfs.sh and sbin/stop-dfs.sh files, adding the following at the top of each

HDFS_NAMENODE_USER=root

HDFS_DATANODE_USER=root

HDFS_SECONDARYNAMENODE_USER=root

YARN_RESOURCEMANAGER_USER=root

YARN_NODEMANAGER_USER=root

9-1. Modify the start-yarn.sh file

YARN_RESOURCEMANAGER_USER=root

HADOOP_SECURE_DN_USER=yarn

YARN_NODEMANAGER_USER=root

9-2. Modify the stop-yarn.sh file

YARN_RESOURCEMANAGER_USER=root

HADOOP_SECURE_DN_USER=yarn

YARN_NODEMANAGER_USER=root

The variables above prevent the "Attempting to operate on ... as root" permission error that otherwise appears when you start or stop Hadoop as the root user.

10. Format the NameNode: go to the bin folder of hadoop and run the following command

./hdfs namenode -format

11. Go to the sbin folder and start hadoop

./start-dfs.sh 

You can also start everything at once with ./start-all.sh

Then you can access the YARN web UI directly on port 8088.

12. Enable ports on the firewall. If you use a cloud server, add port 9870 to the security group

# add port 9870 (HDFS web UI) to the firewall
firewall-cmd --zone=public --add-port=9870/tcp --permanent
# port 8088 (YARN web UI) needs the same treatment if you want to reach it
firewall-cmd --zone=public --add-port=8088/tcp --permanent
# reload the firewall for the rules to take effect
firewall-cmd --reload

13. Run jps. If it lists four or five Java processes (NameNode, DataNode, SecondaryNameNode, and so on), the configuration succeeded and you can continue.

Access the Hadoop web UI at http://<server-IP>:9870

When we see the web page, we have installed Hadoop successfully. Note that in Hadoop 3.x the YARN web port (8088) has not changed, but the HDFS web port has moved from 50070 to 9870, which is worth remembering.

