This tutorial uses the CDH builds to avoid errors caused by version dependency conflicts; it applies equally on Linux (CentOS recommended).
Hadoop version: hadoop-2.6.0-cdh5.9.3.tar.gz
Spark version: spark-1.6.0-cdh5.9.3 (built from source; see the companion post "Big Data: Building and Packaging Spark")
Unpacking Spark
tar -zxvf spark-1.6.0-cdh5.9.3.tgz   # adjust the extension to match your built archive
Configuring Spark environment variables
Append the following entries to your shell profile (e.g. ~/.bash_profile on macOS):
#added by Spark installer
export SPARK_HOME=/Users/zhangzhao/develop/hadoop/spark-1.6.0-cdh5.9.3
export PATH=$PATH:$SPARK_HOME/bin
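After saving, reload the profile so the variables take effect in the current shell. A minimal check, assuming the entries went into ~/.bash_profile:
# reload the profile and confirm SPARK_HOME resolves
source ~/.bash_profile
echo $SPARK_HOME
# spark-shell should now be found on the PATH
which spark-shell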
Starting Spark in Local mode
spark-shell --master local[2]
local[2] means the shell runs locally with 2 worker threads.
The startup output looks like the screenshot below:
Open the Web UI address printed in that output, http://192.168.1.126:4040; the page looks like the screenshot below:
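To confirm the local shell actually runs jobs, you can pipe a one-line job into it. A minimal smoke test (the example expression is illustrative, not from the original post):
# sum the integers 1..100 in a trivial Spark job; expect 5050.0
echo 'println(sc.parallelize(1 to 100).sum())' | spark-shell --master local[2]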
Setting up Spark Standalone mode
First create conf/spark-env.sh from the bundled template and open it for editing:
zhangzhaodeMacBook-Pro:conf zhangzhao$ cp spark-env.sh.template spark-env.sh
zhangzhaodeMacBook-Pro:conf zhangzhao$ vim spark-env.sh
Then add the following settings to conf/spark-env.sh:
# hostname of the master; check it with the hostname command
SPARK_MASTER_HOST=zhangzhaodeMacBook-Pro.local
# number of cores available to the Spark worker
SPARK_WORKER_CORES=2
# initial memory for the Spark worker
SPARK_WORKER_MEMORY=2g
The resulting configuration is shown in the screenshot below:
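start-all.sh launches workers on the hosts listed in conf/slaves and falls back to localhost when that file is absent, which is why the startup log below shows the worker starting on localhost. To make the worker list explicit (an optional step, not in the original post):
# the shipped template already contains just "localhost"
cp slaves.template slaves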
Starting Spark
From the Spark root directory, run sbin/start-all.sh:
zhangzhaodeMacBook-Pro:spark-2.4.0-bin-cdh5.9.3 zhangzhao$ sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /Users/zhangzhao/develop/hadoop/spark-2.4.0-bin-cdh5.9.3/logs/spark-zhangzhao-org.apache.spark.deploy.master.Master-1-zhangzhaodeMacBook-Pro.local.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /Users/zhangzhao/develop/hadoop/spark-2.4.0-bin-cdh5.9.3/logs/spark-zhangzhao-org.apache.spark.deploy.worker.Worker-1-zhangzhaodeMacBook-Pro.local.out
zhangzhaodeMacBook-Pro:spark-2.4.0-bin-cdh5.9.3 zhangzhao$
Verify with the jps command; if the Master and Worker processes appear, as in the screenshot below, the startup succeeded:
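With the master and worker running, you can attach a shell to the standalone cluster instead of local mode. A minimal sketch, assuming the default master port 7077 and the SPARK_MASTER_HOST value configured above:
# connect spark-shell to the standalone master (default port 7077)
spark-shell --master spark://zhangzhaodeMacBook-Pro.local:7077
# the master also serves a cluster Web UI, by default on port 8080:
# http://zhangzhaodeMacBook-Pro.local:8080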
Stopping Spark
From the Spark root directory, run sbin/stop-all.sh:
sbin/stop-all.sh
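As a quick sanity check (not in the original post), rerun jps after stop-all.sh completes; the Master and Worker processes should be gone:
# only non-Spark JVMs (such as Jps itself) should remain
jps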