This post shows how to connect Spark SQL to a JDBC database (using a locally installed MySQL instance as the example).
Add the driver dependency to your Maven POM:
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>6.0.5</version>
</dependency>
Code example
package com.yzy.spark;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class demo5 {
    private static String appName = "spark.sql.demo";
    private static String master = "local[*]";

    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName(appName)
                .master(master)
                .getOrCreate();

        // Read the "user" table from the local MySQL database "smvc"
        Dataset<Row> df = spark.read()
                .format("jdbc")
                .option("url", "jdbc:mysql://localhost:3306/smvc")
                .option("dbtable", "user")
                .option("user", "root")
                // "pass" + "word" is split on purpose; see "Problems encountered" below
                .option("pass" + "word", "123456")
                .load();

        df.printSchema();
    }
}
Output:
root
|-- id: long (nullable = true)
|-- name: string (nullable = true)
|-- nicename: string (nullable = true)
|-- age: integer (nullable = true)
|-- pwd: string (nullable = true)
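Besides the option-chaining style above, Spark's DataFrameReader also has a jdbc(url, table, properties) overload that takes the credentials as a java.util.Properties object. A minimal sketch (class and method names are ours; the Spark call itself is shown as a comment, since it needs a live SparkSession and a reachable MySQL instance):

```java
import java.util.Properties;

public class JdbcPropsDemo {
    // Builds the connection properties that would be passed to spark.read().jdbc(...)
    static Properties connectionProperties() {
        Properties props = new Properties();
        props.setProperty("user", "root");
        // Same Sonar workaround as in the main example
        props.setProperty("pass" + "word", "123456");
        return props;
    }

    public static void main(String[] args) {
        // With a live SparkSession, these properties would be used as:
        // Dataset<Row> df = spark.read()
        //         .jdbc("jdbc:mysql://localhost:3306/smvc", "user", connectionProperties());
        System.out.println(connectionProperties());
    }
}
```

This style keeps all connection settings in one object, which is convenient when several tables are read from the same database.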
Problems encountered
1. .option("pass" + "word", "123456")
The string is written this way because the build's Sonar check rejects password as a sensitive hard-coded field; splitting it into "pass" + "word" gets past the check.
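The Sonar warning can also be avoided by not hard-coding the credential at all, for example by reading it from an environment variable. A sketch under that assumption (DB_PASSWORD is a hypothetical variable name; the fallback value is only for local testing):

```java
public class EnvPassword {
    // Hypothetical helper: read the password from the environment,
    // falling back to a default only for local development.
    static String dbPassword() {
        String fromEnv = System.getenv("DB_PASSWORD");
        return fromEnv != null ? fromEnv : "123456";
    }

    public static void main(String[] args) {
        // Would be passed to Spark as .option("pass" + "word", dbPassword())
        System.out.println(System.getenv("DB_PASSWORD") != null
                ? "using DB_PASSWORD from environment"
                : "using local fallback");
    }
}
```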
2. .format("jdbc")
If this option is omitted, the job fails with: Unable to infer schema for Parquet. It must be specified manually. Spark defaults to the Parquet data source, so you must state explicitly that the source is JDBC.
3. .option("url", "jdbc:mysql://localhost:3306/smvc")
This can fail with: java.sql.SQLException: The server time zone value '中国标准时间' is unrecognized or represents more than one time zone. You must configure either the server or JDBC driver (via the serverTimezone configuration property) to use a more specific time zone value if you want to utilize time zone support.
The fix is to change the URL to jdbc:mysql://localhost:3306/smvc?useUnicode=true&characterEncoding=UTF-8&serverTimezone=Asia/Shanghai
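If the same parameters are needed on several URLs, they can be appended with a small helper; a sketch (the helper name is ours, not part of any library):

```java
public class JdbcUrl {
    // Hypothetical helper: appends the encoding and timezone parameters
    // described in the fix above to a base JDBC URL.
    static String withTimezone(String baseUrl) {
        return baseUrl
                + "?useUnicode=true"
                + "&characterEncoding=UTF-8"
                + "&serverTimezone=Asia/Shanghai";
    }

    public static void main(String[] args) {
        System.out.println(withTimezone("jdbc:mysql://localhost:3306/smvc"));
    }
}
```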