Comment 8 for bug 932898

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

Security review:

This is a pretty big code base with 340k+ lines of code and this review should only be considered a very shallow audit.

From my notes in the UDS session where members of the security team, the kernel team, the server team and ceph upstream were present:
From discussions with upstream at UDS:
 * librbd provides kvm remote block
 * librados provides S3 to replace swift
 * radosgw provides RESTful interface for S3 replacement
 * data is unencrypted on wire
 * kernel module is upstream as of 2.6.34
 * fuse module doesn't work on 32 bit
 * kernel client auth: secret in kernel key ring, then it authorizes per session
 * At UDS menioned QA started 6 months ago, meaning that it now has had almost a year
 * has testsuites. It would be nice to have them enabled in the builds (LP: #932895)
 * cluster expected to be run within the data center (private network usage) because there is no encryption
 * ipv6 is supported (should be able to use ippsec if desired, but not tested)
 * in terms of access controls: client authenticates to object/pool, clients only given access to that object/pool. Someone once suggested 'maat', but it was never merged.

In terms of code quality, spot-checking various places, I noticed that defensive coding is not done consistently throughout the codebase. For example, various parts of the codebase have a lot of string operations that don't use their more secure counterparts (sprintf() instead of snprintf(), strcpy instead of strncpy()). There are also a lot of low level memory operations some of which are not checking return codes. Eg, I found things like:
   *buf = (char*)malloc(l);
    strcpy(*buf, str.c_str());
I did not have time to check if these various places could be controlled by an attacker, but considering the amount of code and that defensive coding is not done consistently throughout the codebase, I think we can expect a non-trivial maintenance cost associated with supporting this for 5 years. Building using --without-radosgw should help with this maintenance cost.

Has the following compiler warning:
common/admin_socket_client.cc:166:19: warning: 'socket_fd' may be used uninitialized in this function [-Wuninitialized]
osd/PG.cc:1331:20: warning: variable 'plu' set but not used [-Wunused-but-set-variable]

Ships initscripts: /etc/init.d/ceph, /etc/init.d/radosgw and various daemons (ceph-mon, ceph-mds, ceph-osd) listen on the network. radosgw uses fastcgi is is meant to be used in combination with a webserver and does not run as root.

No dbus services, no setuid programs, no sudo fragments, no cron jobs

As mentioned, has a fuse module. Fuse is hard to get right and it isn't strategic. Let's not support it.

Has the option of using secure authentication between the nodes (cephx)

Upstream wiki states "Filesystems: Ceph performs best when using BTRFS, but other filesystems such as ext3 and ext4 can also be used." The kernel team had some concerns on btrfs at UDS surrounding the fact that the on disk format was not finalized and the recovery tool was non-existent. I'm told by the kernel engineer on my team that the disk format is 'pretty much set' but that the recovery tool is not robust and there could be data loss when using it. I am worried that if ceph recommends btrfs and people use it and get an update that changes the disk format that there will be data loss. I understand that in the ceph world, it doesn't necessary matter because everything is distributed, but a two node cluster that both upgrades at the same time is not outside the realm of possibility.

Conditional ACK based on the following:
- use dh_python2 instead of python-support
- fix lintian warnings
- build with --without-radosgw
- demote ceph-fuse to universe or preferably build using --without-fuse
- update 'man ceph' to note that data is unencrypted on the wire and that the cluster is expected to run in a trusted environment
- update 'man mkcephfs' to include in its --mkbtrfs description something about how btrfs is still considered experimental (but feel free to mention how ceph deals with this especially when using a lot of nodes)
- compile with nss. While it may not be as nice to use as libcrypto++, it is a well supported library with a long history in Ubuntu and a responsive upstream
- fix the compiler warning