Converting between RDD and DataFrame
1. Specify the columns manually
peopleRDD.map{ x =>
val para = x.split(",")
(para(0),para(1).trim.toInt)
}.toDF("name","age")
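A minimal end-to-end sketch of the manual approach, assuming a local SparkSession named spark and a text file with lines such as "Michael,29" (the file path is only an example):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("RddToDf").master("local[*]").getOrCreate()
import spark.implicits._  // provides .toDF on an RDD of tuples

val peopleRDD = spark.sparkContext.textFile("data/people.txt")  // hypothetical path
val manualDF = peopleRDD.map { x =>
  val para = x.split(",")
  (para(0), para(1).trim.toInt)  // build a (String, Int) tuple; toDF supplies the column names
}.toDF("name", "age")
manualDF.show()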
2. Infer the schema via reflection (case class)
case class People(name:String,age:Int)
peopleRdd.map{ x =>
val para = x.split(",")
People(para(0),para(1).trim.toInt)
}.toDF
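A sketch of the reflection approach, reusing the spark, import spark.implicits._, and peopleRDD from the sketch above; the case class field names become the column names:

case class People(name: String, age: Int)  // define at top level (or directly in spark-shell)

val reflectDF = peopleRDD.map { x =>
  val para = x.split(",")
  People(para(0), para(1).trim.toInt)
}.toDF()
reflectDF.printSchema()  // columns: name (string), age (integer)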
3. Build the schema programmatically
Prepare the schema
val schema = StructType(StructField("name",StringType)::StructField("age",IntegerType)::Nil)
Prepare the data (the RDD must contain Row objects)
val data = peopleRdd.map{x =>
val para = x.split(",")
Row(para(0),para(1).trim.toInt)}
Create the DataFrame
val dataFrame = spark.createDataFrame(data,schema)
dataFrame.rdd returns an RDD[Row]
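Putting the programmatic approach together with the imports it needs (a sketch, assuming the same spark and peopleRDD as above):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(StructField("name", StringType) :: StructField("age", IntegerType) :: Nil)
val data = peopleRDD.map { x =>
  val para = x.split(",")
  Row(para(0), para(1).trim.toInt)  // each element must be a Row that matches the schema
}
val dataFrame = spark.createDataFrame(data, schema)
val rowRdd = dataFrame.rdd  // going back: DataFrame -> RDD[Row]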
Converting between RDD and Dataset (requires a case class, e.g. People(...))
case class People(name:String,age:Int)
peopleRDD.map{x=>
val para = x.split(",")
People(para(0),para(1).trim.toInt)
}.toDS
dataSet.rdd returns an RDD[People]
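A sketch of the round trip, assuming the spark, peopleRDD, and People case class from the earlier sketches:

val peopleDS = peopleRDD.map { x =>
  val para = x.split(",")
  People(para(0), para(1).trim.toInt)
}.toDS()  // RDD[People] -> Dataset[People]

val typedRdd = peopleDS.rdd  // Dataset[People] -> RDD[People]; the typed objects are preserved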
Converting between DataFrame and Dataset
Dataset -> DataFrame
Just call dataSet.toDF; the column names are reused directly from the case class fields.
DataFrame -> Dataset
case class People(name:String,age:Int)
Just call dataFrame.as[People] (requires an implicit Encoder[People], provided by import spark.implicits._).
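A sketch of both directions, assuming dataFrame has columns matching the fields of People and that import spark.implicits._ is in scope (it supplies the Encoder[People]):

val peopleDS2 = dataFrame.as[People]  // DataFrame -> Dataset[People]
val backToDF = peopleDS2.toDF()       // Dataset -> DataFrame; column names come from the case class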
dataFrame.createOrReplaceTempView("people")
Accessible only within the current SparkSession; the view is dropped automatically when that SparkSession ends, and the table name is used without any prefix.
dataFrame.createGlobalTempView("people")
Accessible across the whole Spark application; the view is dropped automatically when the SparkContext ends, and the table name must be prefixed with "global_temp.".
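A sketch of querying both kinds of view, reusing dataFrame and spark from above:

dataFrame.createOrReplaceTempView("people")
spark.sql("SELECT name, age FROM people").show()  // session-scoped: no prefix needed

dataFrame.createGlobalTempView("people")
spark.sql("SELECT name, age FROM global_temp.people").show()       // application-scoped: needs the global_temp. prefix
spark.newSession().sql("SELECT * FROM global_temp.people").show()  // visible from another SparkSession in the same application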