Server preparation
Prepare four servers: Node01, Node02, Node03, and Node04. Node01 serves as the Master, Node02 and Node03 as Workers, and Node04 as the client from which jobs are submitted.
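The commands below address the machines by hostname, so every node must be able to resolve node01–node04. A minimal /etc/hosts sketch, assuming example IP addresses (replace them with the real ones):
# /etc/hosts on every node (example addresses)
192.168.100.101   node01
192.168.100.102   node02
192.168.100.103   node03
192.168.100.104   node04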
Install JDK 1.8
The Spark version installed here is 2.3, which requires JDK 1.8, so install JDK 1.8 first.
In a previous lab JDK 1.7 was installed via rpm. This time JDK 1.8 is installed from a tarball, which means the default target of /usr/bin/java has to be re-pointed manually.
Extract and install
[root@node01 sxt]# tar zxvf jdk-8u181-linux-x64.tar.gz -C /opt/sxt
# Distribute the extracted directory to Node02, Node03, and Node04
[root@node01 sxt]# scp -r jdk1.8.0_181/ node02:`pwd`
[root@node01 sxt]# scp -r jdk1.8.0_181/ node03:`pwd`
[root@node01 sxt]# scp -r jdk1.8.0_181/ node04:`pwd`
Configure environment variables for JDK 8 (on every node: Node01–Node04)
[root@node01 sxt]# vi /etc/profile
# Add or update the following line
export JAVA_HOME=/opt/sxt/jdk1.8.0_181
[root@node01 sxt]# source /etc/profile
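A quick check after sourcing, plus an optional alternative to the symlink fix described next (a minimal sketch; prepending $JAVA_HOME/bin to PATH also makes the new JDK the default java):
[root@node01 sxt]# echo $JAVA_HOME
/opt/sxt/jdk1.8.0_181
# Optional: put the new JDK first on PATH in /etc/profile instead of re-pointing /usr/bin/java
export PATH=$JAVA_HOME/bin:$PATH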
A tarball-extracted JDK requires updating the /usr/bin/java symlink
Unlike an rpm install, installing JDK 1.8 from the extracted tarball does not overwrite the default target of /usr/bin/java. The symlink has to be re-pointed manually; otherwise the old JDK 1.7 is still executed by default.
Original target:
[root@node01 bin]# cd /usr/bin
[root@node01 bin]# ll
lrwxrwxrwx 1 root root 26 Nov 26 18:48 java -> /usr/java/default/bin/java
Run on every node:
[root@node01 java]# ln -sf /opt/sxt/jdk1.8.0_181/bin/java /usr/bin/java
Target after the change:
lrwxrwxrwx 1 root root 30 Dec 23 04:40 java -> /opt/sxt/jdk1.8.0_181/bin/java
[root@node01 bin]# java -version
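The first line of the output should now report the tarball JDK on every node (build details may differ):
java version "1.8.0_181"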
Install Spark 2.3
Extract and install (Node01)
[root@node01 software]# tar zxvf spark-2.3.1-bin-hadoop2.6.tgz -C /opt/sxt
[root@node01 sxt]# mv spark-2.3.1-bin-hadoop2.6/ spark-2.3.1
Modify the configuration (Node01)
Modify slaves
[root@node01 sxt]# cd spark-2.3.1/conf/
[root@node01 conf]# cp slaves.template slaves
[root@node01 conf]# vi slaves
# Configure the Worker nodes
# A Spark Worker will be started on each of the machines listed below.
node02
node03
Modify spark-env.sh
[root@node01 conf]# cp spark-env.sh.template spark-env.sh
[root@node01 conf]# vi spark-env.sh
# Add the following settings
export SPARK_MASTER_HOST=node01
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=3g
export SPARK_MASTER_WEBUI_PORT=8888
# Options for the daemons used in the standalone deploy mode
# - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_DIR, to set the working directory of worker processes
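Because the rpm JDK 1.7 and the tarball JDK 1.8 coexist on these machines, it can also help to pin the JDK used by the Spark daemons. JAVA_HOME is a standard spark-env.sh option; a minimal sketch:
# Optional: point the Spark daemons explicitly at the tarball JDK
export JAVA_HOME=/opt/sxt/jdk1.8.0_181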
Distribute the installation directory to Node02 and Node03
[root@node01 sxt]# scp -r spark-2.3.1/ node02:`pwd`
[root@node01 sxt]# scp -r spark-2.3.1/ node03:`pwd`
Start the Spark Standalone cluster
[root@node01 /]# cd /opt/sxt/spark-2.3.1/sbin/
[root@node01 sbin]# ./start-all.sh
[root@node01 sbin]# jps
# node01
1175 Jps
1103 Master
# node02 and node03
1168 Jps
1103 Worker
View in a browser: http://node01:8888 (defaults to 8080 if SPARK_MASTER_WEBUI_PORT is not configured)
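A quick non-browser check that the Master web UI is up (a minimal sketch; assumes curl is installed):
[root@node01 sbin]# curl -s -o /dev/null -w "%{http_code}\n" http://node01:8888
200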
Submit a job to the Spark Standalone cluster
Example submission: computing the value of π
The example jars ship under /opt/sxt/spark-2.3.1/examples/jars.
# Submit on Node01
[root@node01 ~]# cd /opt/sxt/spark-2.3.1/bin
[root@node01 bin]# ./spark-submit \
--master spark://node01:7077 \
--class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.3.1.jar 100
# Result
2019-12-24 00:10:18 INFO DAGScheduler:54 - Job 0 finished: reduce at SparkPi.scala:38, took 73.106669 s
Pi is roughly 3.1412499141249914
Note: jobs can be submitted from Node01, Node02, or Node03.
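By default the driver runs in client mode on the machine where spark-submit is invoked; it can instead be launched inside the cluster with --deploy-mode cluster. A sketch of the same Pi example (in cluster mode the application jar path must be reachable from the Workers, which holds here because the full Spark directory was copied to Node02 and Node03):
[root@node01 bin]# ./spark-submit \
--master spark://node01:7077 \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
/opt/sxt/spark-2.3.1/examples/jars/spark-examples_2.11-2.3.1.jar 100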
Submit a job from Node04 as a client
Send a copy of the Spark installation directory to Node04
[root@node01 sxt]# scp -r spark-2.3.1/ node04:`pwd`
# On Node04 the previous cluster configuration can be deleted; jobs can still be submitted
[root@node04 conf]# cd /opt/sxt/spark-2.3.1/conf
[root@node04 conf]# rm -rf spark-env.sh
[root@node04 conf]# rm -rf slaves
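A sketch of submitting the same Pi example from Node04; the client only needs the master URL, so the configuration files deleted above are not required:
[root@node04 conf]# cd /opt/sxt/spark-2.3.1/bin
[root@node04 bin]# ./spark-submit \
--master spark://node01:7077 \
--class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.3.1.jar 100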
Stop the cluster
# Run on Node01
[root@node01 sbin]# ./stop-all.sh
node03: stopping org.apache.spark.deploy.worker.Worker
node02: stopping org.apache.spark.deploy.worker.Worker
stopping org.apache.spark.deploy.master.Master
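A quick check that the daemons are gone (run jps on each node; only the Jps process itself should remain):
[root@node01 sbin]# jps
# Master on node01 and Worker on node02/node03 should no longer be listed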