1 Three ways to import data into Neo4j
- CREATE statements
- LOAD CSV statements
- neo4j import (bulk import)
For large datasets, neo4j import is the first choice because it is fast.
2 neo4j import
Before importing, empty the existing graph.db folder, because the import tool expects an empty target database.
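Emptying the folder can be scripted. The following is only a minimal sketch: the helper name `clear_directory` is hypothetical, the path to graph.db depends on your installation and must be supplied by you, and Neo4j should be stopped before anything is deleted.

```python
import shutil
from pathlib import Path


def clear_directory(folder: Path) -> int:
    """Delete everything inside `folder` but keep the folder itself.

    Returns the number of top-level entries that were removed.
    """
    removed = 0
    for entry in folder.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # remove a sub-folder and its contents
        else:
            entry.unlink()         # remove a plain file or a symlink
        removed += 1
    return removed
```

Call it as `clear_directory(Path(...))` with the location of your own graph.db folder. The deletion cannot be undone, so back up anything you still need first.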
2.1 Preparing the datasets (CSV)
Node dataset
- Header format: "node_id:ID","name",":LABEL"
personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
laurence,"Laurence Harvey",Actor
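- Values in the ID column must be unique (no duplicates). The last sample row above reuses the ID laurence, so it counts as a duplicate and is only tolerated when the import runs with --ignore-duplicate-nodes.
- :LABEL is the node's label, which identifies the category the node belongs to.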
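The uniqueness rule can be checked before the import is started. The sketch below reuses the sample rows from this section; the helper names `duplicate_ids` and `to_node_csv` are hypothetical and not part of Neo4j.

```python
import csv
import io
from collections import Counter

# Header in the format described above: ID column, a property, the label.
NODE_HEADER = ["personId:ID", "name", ":LABEL"]


def duplicate_ids(rows):
    """Return the IDs (first column) that occur more than once, in first-seen order."""
    counts = Counter(row[0] for row in rows)
    return [node_id for node_id, n in counts.items() if n > 1]


def to_node_csv(rows):
    """Serialize the rows as CSV text with the node header on the first line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(NODE_HEADER)
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    ["keanu", "Keanu Reeves", "Actor"],
    ["laurence", "Laurence Fishburne", "Actor"],
    ["carrieanne", "Carrie-Anne Moss", "Actor"],
    ["laurence", "Laurence Harvey", "Actor"],
]

print(duplicate_ids(rows))  # prints ['laurence']
```

If the list of duplicates is not empty, either clean the data or pass --ignore-duplicate-nodes to the importer.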
Edge (relationship) dataset
- Header format: ":START_ID","name",":END_ID",":TYPE"
"4565904","homepage","0","Predicate"
"4654000","homepage","0","Predicate"
"2254843","homepage","0","Predicate"
"2346995","homepage","0","Predicate"
"3535680","homepage","0","Predicate"
"2090446","homepage","0","Predicate"
2.2 Importing the data
- Node files: applyer.csv, address.csv
- Edge file: relation.csv
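Every :START_ID and :END_ID in the edge file has to match an ID in one of the node files, so a quick consistency check before the import may save a failed run. The sketch below assumes the edge file carries the header row from section 2.1; the function name `dangling_relationships` and the set of node IDs are made up for illustration.

```python
import csv
import io

REL_HEADER = [":START_ID", "name", ":END_ID", ":TYPE"]


def dangling_relationships(rel_csv_text, node_ids):
    """Return (line_number, row) pairs whose start or end ID is not a known node ID."""
    reader = csv.reader(io.StringIO(rel_csv_text))
    header = next(reader)
    if header != REL_HEADER:
        raise ValueError(f"unexpected header: {header}")
    start_col = header.index(":START_ID")
    end_col = header.index(":END_ID")
    bad = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if row[start_col] not in node_ids or row[end_col] not in node_ids:
            bad.append((line_no, row))
    return bad


sample = (
    '":START_ID","name",":END_ID",":TYPE"\n'
    '"4565904","homepage","0","Predicate"\n'
    '"4654000","homepage","0","Predicate"\n'
)
known_ids = {"4565904", "0"}  # hypothetical IDs collected from the node files

print(dangling_relationships(sample, known_ids))
```

With these inputs only the row on line 3 is reported, because 4654000 is not among the known node IDs.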
Open a terminal in the bin folder of the Neo4j installation and run the following command:
.\bin> neo4j-admin import --database graph.db --id-type string --nodes:applyer C:\Users\DELL\Desktop\neo4j\applyer.csv --nodes:address C:\Users\DELL\Desktop\neo4j\address.csv --relationships C:\Users\DELL\Desktop\neo4j\relation.csv --ignore-duplicate-nodes
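When there are many node or edge files, the command line gets long and is easier to assemble programmatically. The following sketch only builds the argument list and uses no options beyond those in the command above; the function name `build_import_command` is hypothetical.

```python
def build_import_command(database, node_files, relationship_files,
                         id_type="string", ignore_duplicate_nodes=True):
    """Build the neo4j-admin import argument list.

    node_files is a list of (label, path) pairs, relationship_files a list of paths.
    """
    cmd = ["neo4j-admin", "import", "--database", database, "--id-type", id_type]
    for label, path in node_files:
        cmd += [f"--nodes:{label}", path]
    for path in relationship_files:
        cmd += ["--relationships", path]
    if ignore_duplicate_nodes:
        cmd.append("--ignore-duplicate-nodes")
    return cmd


command = build_import_command(
    "graph.db",
    [("applyer", r"C:\Users\DELL\Desktop\neo4j\applyer.csv"),
     ("address", r"C:\Users\DELL\Desktop\neo4j\address.csv")],
    [r"C:\Users\DELL\Desktop\neo4j\relation.csv"],
)
print(" ".join(command))
```

The list can then be handed to `subprocess.run(command, check=True)` from inside the bin folder. Keeping the arguments as a list avoids quoting problems with paths that contain spaces.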
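3 Common errors
- Duplicate entries in the node files
- Malformed file headers
- Unexpected values in the data (when a CSV is saved from a spreadsheet tool, long integers may be rewritten in scientific notation, which corrupts the values and causes errors)
- The data is larger than the available memory
- Other errors (see the error message at the end of the terminal log)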
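The scientific-notation problem is easy to overlook, so scanning the ID columns before importing can help. This is a minimal sketch; the function name `find_scientific_ids` and the sample values are invented for illustration.

```python
import csv
import io
import re

# Matches values such as 4.5659E+13 or 1e10.
SCI_NOTATION = re.compile(r"^[+-]?\d+(\.\d+)?[eE][+-]?\d+$")


def find_scientific_ids(csv_text):
    """Return (line_number, column_name, value) for ID cells in scientific notation."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Only ID-like columns are checked: node IDs and relationship endpoints.
    id_cols = [i for i, name in enumerate(header)
               if name.endswith((":ID", ":START_ID", ":END_ID"))]
    hits = []
    for line_no, row in enumerate(reader, start=2):
        for i in id_cols:
            if i < len(row) and SCI_NOTATION.match(row[i].strip()):
                hits.append((line_no, header[i], row[i]))
    return hits


sample_nodes = (
    "node_id:ID,name,:LABEL\n"
    "4565904,a,Person\n"
    "4.5659E+13,b,Person\n"
)
print(find_scientific_ids(sample_nodes))
```

Any hit means the original integer has already been lost in that file, so the CSV needs to be regenerated from the source data rather than patched.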
# Resources available to the free Neo4j Community Edition
Available resources:
Total machine memory: 15.89 GB
Free machine memory: 6.72 GB
Max heap memory : 3.53 GB
Processors: 4
Configured max memory: 11.12 GB
High-IO: false
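4 Checking the result
- Run neo4j.bat console to start Neo4j and bring up the graph database's visualization front end
- Open the local database to confirm whether the dataset was imported successfully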