spark 3.2.3
hudi 0.11.0
Spark writes to Hudi and the commit fails. Under the .hoodie directory there are .commit.requested and .inflight files, but no completed .commit file:
-rw-r--r--@ 1 lqq staff 1572 5 23 09:54 20230512145004274.rollback
-rw-r--r--@ 1 lqq staff 0 5 23 09:54 20230512145004274.rollback.inflight
-rw-r--r--@ 1 lqq staff 1384 5 23 09:54 20230512145004274.rollback.requested
-rw-r--r--@ 1 lqq staff 0 5 23 09:54 20230522173618331.commit.requested
-rw-r--r--@ 1 lqq staff 3123 5 23 09:54 20230522173618331.inflight
Checking the log shows an error entry, but no details about what actually failed:
ERROR HoodieSparkSqlWriter$: UPSERT failed with errors
Digging into the source shows that the detailed per-record errors are only logged at TRACE level:
private def commitAndPerformPostOperations(spark: SparkSession,
                                           schema: StructType,
                                           writeResult: HoodieWriteResult,
                                           parameters: Map[String, String],
                                           client: SparkRDDWriteClient[HoodieRecordPayload[Nothing]],
                                           tableConfig: HoodieTableConfig,
                                           jsc: JavaSparkContext,
                                           tableInstantInfo: TableInstantInfo
                                          ): (Boolean, common.util.Option[java.lang.String], common.util.Option[java.lang.String]) = {
........
    } else {
      log.error(s"${tableInstantInfo.operation} failed with errors")
      if (log.isTraceEnabled) {
        log.trace("Printing out the top 100 errors")
        writeResult.getWriteStatuses.rdd.filter(ws => ws.hasErrors)
          .take(100)
          .foreach(ws => {
            log.trace("Global error :", ws.getGlobalError)
            if (ws.getErrors.size() > 0) {
              ws.getErrors.foreach(kt =>
                log.trace(s"Error for key: ${kt._1}", kt._2))
            }
          })
      }
      (false, common.util.Option.empty(), common.util.Option.empty())
    }
Lower the log level (see //www.greatytc.com/u/c2bc3695bc47), rerun the job, and the specific error is printed:
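Spark 3.2 still ships log4j 1.x, so one way to get the TRACE output is an entry in conf/log4j.properties. A minimal sketch; the logger name below is an assumption based on the class name in the log line above, so adjust it to whatever your logs show:

```properties
# Keep the rest of Spark quiet, but enable TRACE for the Hudi writer only
log4j.rootCategory=WARN, console
log4j.logger.org.apache.hudi.HoodieSparkSqlWriter=TRACE
```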
org.apache.hudi.exception.SchemaCompatibilityException: Unable to validate the rewritten record {"gender": "male", "id": 708075384135690, "count": null} against schema {{"name":"id","type":["null","long"],{"name":"gender","type":["null","string"],"default":null},{"name":"count","type":["null","int"],"default":null}}
Cause: schema incompatibility. The count field was originally written to Hudi as int, but a later batch declared it as long, so the write fails validation.
Fix: change count back to int on the write side, or drop the Hudi table and rewrite it with the new schema.
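Why does Hudi reject the batch instead of just casting? Avro, which Hudi uses when validating rewritten records, allows widening int to long but not narrowing long back to int, because large values would be silently corrupted. A minimal pure-Scala sketch of the narrowing problem (the value 3000000000L is a made-up example, not taken from the failing record):

```scala
object LongToIntNarrowing {
  def main(args: Array[String]): Unit = {
    // A count value that fits the new long schema but not the table's int schema.
    val newCount: Long = 3000000000L // > Int.MaxValue (2147483647)

    // Widening int -> long is always safe; narrowing long -> int is not:
    println(newCount.isValidInt) // false: cannot be represented as an Int
    println(newCount.toInt)      // -1294967296: silent truncation to the low 32 bits
  }
}
```

This is why the safe options are the two above: keep writing the column as int, or recreate the table so its schema is long from the start.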