I was happily deploying a ceph test environment, but starting the OSD produced the following errors:
Sep 28 09:32:09 ceph-n1 ceph-osd-prestart.sh[17684]: /usr/lib/ceph/ceph-osd-prestart.sh: line 55: [: too many arguments
Sep 28 09:32:09 ceph-n1 ceph-osd[17724]: 2016-09-28 09:32:09.662028 7f92866a7800 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-0: (13) Permission denied
The script was failing, so I opened ceph-osd-prestart.sh and looked at line 55:
20 data="/var/lib/ceph/osd/${cluster:-ceph}-$id"
53 # ensure ownership is correct
54 owner="`stat -c %U $data/.`"
55 if [ $owner != 'ceph' -a $owner != 'root' ]; then
56 echo "ceph-osd data dir $data is not owned by 'ceph' or 'root'"
57 echo "you must 'chown -R ceph:ceph ...' or similar to fix ownership"
58 exit 1
59 fi
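Given the error message, I concluded that owner was not picking up the owner of the $data directory. So I started debugging the script, adding a few echo statements to check whether the stat command worked and whether $data was set correctly. Along the way I picked up a bit of knowledge about "@" in unit names and "%" in unit files.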
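To see why line 55 complains about "too many arguments" instead of printing the ownership message, it helps to reproduce the test with an empty $owner, which is what you get when stat fails and prints nothing. The sketch below does that and contrasts it with a quoted variant. owner_is_unexpected is a hypothetical helper name for illustration, not something from the ceph script.

```shell
#!/bin/sh
# Simulate what happens when `stat` fails and prints nothing to stdout.
owner=""

# Unquoted, as on line 55: the empty variable disappears during expansion,
# leaving `[ != ceph -a != root ]`, a malformed expression. `[` then reports
# a syntax error instead of comparing owners.
if [ $owner != 'ceph' -a $owner != 'root' ] 2>/dev/null; then
    unquoted_status=0
else
    unquoted_status=$?
fi

# Hypothetical helper: succeeds when the owner is neither 'ceph' nor 'root'.
# Quoting keeps an empty value as a real (empty) argument.
owner_is_unexpected() {
    [ "$1" != 'ceph' ] && [ "$1" != 'root' ]
}

if owner_is_unexpected "$owner"; then
    echo "owner is '$owner' (an empty value means stat itself failed)"
fi
echo "unquoted test exit status: $unquoted_status"
```

With the quoted form, an empty owner falls through to the "not owned by ceph or root" branch, which at least points at the directory rather than at the test syntax.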
[root@ceph-n1 osd]# systemctl reset-failed ceph-osd@0
[root@ceph-n1 osd]# systemctl start ceph-osd@0
[root@ceph-n1 osd]# journalctl -xe
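While debugging, I ran the commands above a hundred times~~, hundred times~~, times~~, times~~....
Then half a day went by.
Conclusion: $data was set correctly, the stat command was fine, and running that stat command by hand worked, so the script itself could more or less be ruled out. However, every command in ceph-osd-prestart.sh that touched $data failed, so I turned to the $data directory itself.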
[root@ceph-n1 osd]# ll /var/lib/ceph/osd/
total 0
lrwxrwxrwx 1 root root 15 Sep 28 09:30 ceph-0 -> /home/ceph/osd0
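It turned out that the directory $data points to is a symlink. Since the same commands worked when run by hand, I began to suspect that systemd had trouble with symlinks.
Searching Google for "systemd symlink" mostly turned up material about systemctl enable; I found nothing related to ExecStartPre.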
https://bugzilla.redhat.com/show_bug.cgi?id=955379#c14
Lennart Poettering 2013-05-06 12:16:16 EDT
"systemctl enable" is about enabling vendor supplied unit files. It will only create and remove symlinks in /etc/ and /run/, that's all it does. So right now it's a pretty safe tool: it will create/override/remove the modifiable configuration via symlinks and strictly leave vendor supplied static data untouched, since it is stored in real files. However, if we suddenly allow enabling of symlinks, then this clear separation goes away.This gets particularly nasty for disabling things, because that removes all symlinks to the destination file, and how should it know when to stop precisely?
So, yeah, I am pretty sure we shouldn't allow "enabling of symlinks".
What we should support however is enabling of unit files that are outside of the usualy search paths, via specifiying full absolute paths. i.e. "systemctl enable /var/lib/foo/bar.service" should link it to /etc/systemd/system/bar.service and do everything listed in [Install]. Now, I originally implemented things to work like that, but this might got broken one time...
Andrew, so if you'd call "systemctl enable" directly on the original unit file, instead of via a symlink, then everything should be fine for you, right?
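Lennart Poettering is the author of systemd. Roughly, his reply says: "systemctl enable" only creates and removes symlinks to unit files under /etc and /run, nothing more. For manageability and safety, the unit file being linked must be a real file, not a symlink. In addition, to make systemctl enable more flexible, it should accept an absolute path as its argument, so that unit files outside the default search paths are supported.
That was no help, though. It is about symlinks, but it has nothing to do with my problem. Just as I was about to give up, I noticed two settings in ceph-osd@.service: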
ProtectHome=true
ProtectSystem=full
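A quick way to spot sandboxing settings like these is to pull every Protect* line out of a unit file. The following is a minimal sketch; list_protect_directives is a hypothetical helper name, and the unit file path in the example is an assumption that may differ between distributions.

```shell
#!/bin/sh
# Hypothetical helper: print the Protect* directives set in a unit file.
# $1 is the path to the unit file.
list_protect_directives() {
    grep -E '^[[:space:]]*Protect[A-Za-z]*=' "$1"
}

# Example (the path is an assumption; adjust for your distribution):
# list_protect_directives /usr/lib/systemd/system/ceph-osd@.service
```

On a live system, `systemctl cat ceph-osd@0` prints the unit file along with any drop-ins, so its output can be filtered with the same grep pattern.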
With my limited English, my gut told me the case was about to crack. I went straight to the official documentation, which says:
ProtectHome=
Takes a boolean argument or "read-only". If true, the directories
/home, /root and /run/user are made inaccessible and empty for
processes invoked by this unit. If set to "read-only", the three
directories are made read-only instead. It is recommended to
enable this setting for all long-running services (in particular
network-facing ones), to ensure they cannot get access to private
user data, unless the services actually require access to the
user's private data. Note however that processes retaining the
CAP_SYS_ADMIN capability can undo the effect of this setting.
This setting is hence particularly useful for daemons which have
this capability removed, for example with CapabilityBoundingSet=.
Defaults to off.
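ProtectHome accepts true, false, or read-only. When set to true, /home, /root and /run/user are inaccessible and appear empty to the service. When set to read-only, those three directories are read-only for the service. When set to false, the service can access them normally. The default is false. To keep services away from users' private data, it is recommended to enable this option for all long-running services.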
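Given that description, a data directory is affected whenever its fully resolved path lands under /home, /root or /run/user, even if the path written in the configuration lives elsewhere. The sketch below checks for that. Both function names are hypothetical, and it assumes `readlink -f` from GNU coreutils is available.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if an already-resolved absolute path lies in
# one of the directories that ProtectHome=true hides from the service.
path_hidden_by_protect_home() {
    case "$1" in
        /home|/home/*|/root|/root/*|/run/user|/run/user/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: resolve symlinks first, then apply the check above.
# Returns non-zero if the path cannot be resolved at all.
data_dir_hidden() {
    resolved=$(readlink -f "$1") || return 1
    path_hidden_by_protect_home "$resolved"
}

# Example:
# data_dir_hidden /var/lib/ceph/osd/ceph-0 && echo "resolves into a hidden directory"
```

Run from an interactive root shell, this sees the real filesystem, which is exactly why manual checks can pass while the same path is unreachable from inside the unit.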
ProtectSystem=
Takes a boolean argument or "full". If true, mounts the /usr and
/boot directories read-only for processes invoked by this unit.
If set to "full", the /etc directory is mounted read-only, too.
This setting ensures that any modification of the vendor-supplied
operating system (and optionally its configuration) is prohibited
for the service. It is recommended to enable this setting for all
long-running services, unless they are involved with system
updates or need to modify the operating system in other ways.
Note however that processes retaining the CAP_SYS_ADMIN
capability can undo the effect of this setting. This setting is
hence particularly useful for daemons which have this capability
removed, for example with CapabilityBoundingSet=. Defaults to
off.
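ProtectSystem accepts true, false, or full. When set to true, /usr and /boot are mounted read-only. When set to full, /usr, /boot and /etc are mounted read-only. When set to false, the service can access these directories normally. This option keeps services from modifying system directories, and it is recommended to enable it for all long-running services.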
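With that, the case is closed. Because ceph-osd@.service enables ProtectHome, ceph cannot access /home/ceph/osd0, so the /var/lib/ceph/osd/ceph-0 symlink effectively points at nothing and ceph fails to start. It also explains the line 55 error: inside the unit, stat on $data failed and printed no owner, so the unquoted test ended up with an empty $owner and broke with "too many arguments".
There are two ways to fix it:
- Turn off the ProtectHome option
- Move /home/ceph/osd0 out of /home
To follow the official recommendation, I chose the second option.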
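The second option comes down to moving the directory and re-pointing the symlink. Below is a minimal sketch under stated assumptions: relocate_osd_dir is a hypothetical helper, the /data/ceph/osd0 target in the example is made up for illustration, and the OSD is assumed to be stopped while the directory moves.

```shell
#!/bin/sh
# Hypothetical helper: move an OSD data directory and re-point its symlink.
# Arguments: current directory, new directory, symlink path.
relocate_osd_dir() {
    old_dir=$1
    new_dir=$2
    link=$3
    # Refuse to run if the target already exists, so that mv does not nest
    # the old directory inside it.
    [ ! -e "$new_dir" ] || return 1
    mkdir -p "$(dirname "$new_dir")" &&
        mv "$old_dir" "$new_dir" &&
        ln -sfn "$new_dir" "$link"
}

# Example (the target path is an assumption; stop the OSD first):
# systemctl stop ceph-osd@0
# relocate_osd_dir /home/ceph/osd0 /data/ceph/osd0 /var/lib/ceph/osd/ceph-0
# systemctl start ceph-osd@0
```

The new location should also stay clear of /root and /run/user, since ProtectHome hides those as well.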
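Lessons learned:
- Good English really pays off!! If only I had noticed the ProtectHome option sooner.....
- Good English really pays off!! If only I had noticed the ProtectHome option sooner.....
- Good English really pays off!! If only I had noticed the ProtectHome option sooner.....
- When searching Google, if two pages of results turn up nothing useful, you are almost certainly looking in the wrong direction.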