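Simplified Source Build
Many people have complained that StreamingPro is too hard to build, to the point that even reading the source is difficult. I therefore spent a day on a fairly large refactoring, and StreamingPro now depends only on the ServiceFramework project. The build steps are as follows: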
git clone https://github.com/allwefantasy/ServiceFramework.git
cd ServiceFramework
mvn install -Pscala-2.11 -Pjetty-9 -Pweb-include-jetty-9
mvn install -Pscala-2.10 -Pjetty-9 -Pweb-include-jetty-9
// If you need to switch the Scala version, remember to run the following command before building
./dev/change-version-to-2.10.sh
After that, you can build StreamingPro:
git clone https://github.com/allwefantasy/streamingpro.git
cd streamingpro
// for spark 1.6.*
mvn -DskipTests clean package -pl streamingpro-spark -am -Ponline -Pscala-2.10 -Pcarbondata -Phive-thrift-server -Pspark-1.6.1 -Pshade
// for spark 2.*
mvn -DskipTests clean package -pl streamingpro-spark-2.0 -am -Ponline -Pscala-2.11 -Phive-thrift-server -Pspark-2.1.0 -Pshade
StreamingPro on Spark 2.1.1 supports both Spark Streaming and Structured Streaming
For Structured Streaming support, see the article:
StreamingPro Supports Structured Streaming Again
Spark Streaming takes exactly the same form as Structured Streaming.
Here is the concrete configuration file:
{
  "scalamaptojson": {
    "desc": "test",
    "strategy": "spark",
    "algorithm": [],
    "ref": [],
    "compositor": [
      {
        "name": "stream.sources",
        "params": [
          {
            "format": "socket",
            "outputTable": "test",
            "port": "9999",
            "host": "localhost",
            "path": "-"
          },
          {
            "format": "com.databricks.spark.csv",
            "outputTable": "sample",
            "header": "true",
            "path": "/Users/allwefantasy/streamingpro/sample.csv"
          }
        ]
      },
      {
        "name": "stream.sql",
        "params": [
          {
            "sql": "select city from test left join sample on test.content == sample.name",
            "outputTableName": "test3"
          }
        ]
      },
      {
        "name": "stream.outputs",
        "params": [
          {
            "mode": "Overwrite",
            "format": "console",
            "inputTableName": "test3",
            "path": "-"
          }
        ]
      }
    ],
    "configParams": {}
  }
}
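The configuration chains three compositor steps: the sources register the tables test and sample, the SQL step joins them into test3, and the output step prints test3 to the console. As a minimal sketch (not part of StreamingPro), the following Python script checks that every table an output step reads was registered by an earlier step. The helper name check_wiring is hypothetical, and the embedded config is a trimmed copy that keeps only the fields relevant to this wiring.

```python
import json

# Trimmed copy of the configuration above; only the fields used for wiring are kept.
CONFIG = """
{
  "scalamaptojson": {
    "strategy": "spark",
    "compositor": [
      {"name": "stream.sources", "params": [
        {"format": "socket", "outputTable": "test"},
        {"format": "com.databricks.spark.csv", "outputTable": "sample"}
      ]},
      {"name": "stream.sql", "params": [
        {"sql": "select city from test left join sample on test.content == sample.name",
         "outputTableName": "test3"}
      ]},
      {"name": "stream.outputs", "params": [
        {"format": "console", "inputTableName": "test3"}
      ]}
    ]
  }
}
"""


def check_wiring(config_text):
    """Map each job name to (registered_tables, problems).

    Hypothetical helper: walks the compositor list in order, records the
    table names that source and SQL steps register, and reports any output
    step whose inputTableName was never registered.
    """
    report = {}
    for job_name, job in json.loads(config_text).items():
        known, problems = [], []
        for step in job["compositor"]:
            for params in step["params"]:
                if step["name"].endswith(".sources"):
                    known.append(params["outputTable"])
                elif step["name"].endswith(".sql"):
                    known.append(params["outputTableName"])
                elif step["name"].endswith(".outputs"):
                    table = params["inputTableName"]
                    if table not in known:
                        problems.append("unknown table: " + table)
        report[job_name] = (known, problems)
    return report


for job, (tables, problems) in check_wiring(CONFIG).items():
    print(job, "tables:", tables, "problems:", problems)
```

The check is deliberately shallow: it does not parse the SQL text, so a misspelled table name inside the sql string would still pass.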
The only difference is that the ss prefix is replaced with stream. Launch the job as follows:
SHome=/Users/allwefantasy/streamingpro
./bin/spark-submit --class streaming.core.StreamingApp \
--master local[2] \
--name test \
$SHome/streamingpro-spark-2.0-0.4.15-SNAPSHOT.jar \
-streaming.name test \
-streaming.platform spark_streaming \
-streaming.job.file.path file://$SHome/spark-streaming.json
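The job only produces output if something is serving text on localhost:9999. The sketch below assumes that StreamingPro's socket source behaves like Spark's socketTextStream, which connects to the given host and port as a TCP client and reads newline-delimited text. It starts a small TCP server that sends a few lines to the first client and then closes. The function name serve_lines and the sample values are hypothetical. Running `nc -lk 9999` in another terminal is a common alternative.

```python
import socket
import threading


def serve_lines(lines, host="localhost", port=9999):
    """Listen on host:port, send each line to the first client, then close.

    Returns (thread, bound_port). Pass port=0 to let the OS pick a free port.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    bound_port = server.getsockname()[1]

    def run():
        # Block until the streaming job (or any other client) connects.
        conn, _ = server.accept()
        with conn:
            for line in lines:
                # One record per line, newline-terminated.
                conn.sendall((line + "\n").encode("utf-8"))
        server.close()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread, bound_port
```

For example, calling `serve_lines(["alice", "bob"])` and then joining the returned thread before submitting the job would feed two records to the test table. For the join to return a city, those values would need to match entries in the name column of sample.csv.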