Containers Microconference

Registered by Glauber de Oliveira Costa

Containers Topics:
1) Status of cgroups
2) CRIU - Checkpoint/Restore in Userspace
3) Kernel Memory accounting - memcg
4) ploop - container in a file
5) /proc virtualization
6) Building application sandboxes on top of LXC and KVM with libvirt
7) Syslog Virtualization
8) Time Virtualization

=== Status of cgroups ===
cgroups are a crucial part of container technologies, providing resource isolation and allowing multiple independent tasks to share resources without harming each other (too much). They are, however, a rather controversial piece of the Linux kernel, and a lot of work has been done lately to change that.

We should discuss what kind of changes core cgroup users should expect and how to cope with them.

=== CRIU - Checkpoint/Restore in Userspace ===
There have been many attempts to merge Checkpoint/Restore functionality into the kernel, none with much success. CRIU is an attempt to solve the problem in userspace, extending the kernel only where absolutely necessary.

=== Kernel Memory accounting - memcg ===
The cgroup memory controller is already well established in the kernel. Recently, work has started on adding kernel memory tracking to it, and patches exist for part of that. We could use this session to explore areas not yet covered and to discuss the use cases.

=== ploop - container in a file ===
When dealing with container filesystems, some problems arise. First, how do we limit the amount of data a container uses, given that container-aware quota solutions are not there yet? Second, when migrating a container to another host, one usually recreates its filesystem on the destination by copying over the files (unless shared storage is assumed). This means that inode numbers are not preserved, and it also creates a bad I/O pattern: randomly copying a huge number of tiny files is much slower than sequentially copying a block-device image.
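The inode-number problem can be seen with a minimal C sketch, assuming a plain file-by-file copy of the kind described above. The helpers `copy_file()` and `same_inode()` are hypothetical illustrations, not part of ploop or any migration tool: the copy has identical contents, but `stat()` reports a different inode for it.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Recreate a file by copying its bytes, as a file-level migration would do
 * for every file in the container. Returns 0 on success, -1 on error. */
static int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (!in)
        return -1;
    FILE *out = fopen(dst, "wb");
    if (!out) {
        fclose(in);
        return -1;
    }

    char buf[4096];
    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    fclose(in);
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}

/* Returns 1 if both paths refer to the same inode on the same device,
 * 0 if they do not, and -1 if either path cannot be stat()ed. */
static int same_inode(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Calling `copy_file()` and then `same_inode()` on the source and the copy yields 0: the data is the same, but it lives in a freshly allocated inode. Copying a filesystem image instead moves the image's own metadata, inode numbers included, in one sequential stream.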

Another problem with all containers (CTs) sharing the host filesystem is that the host filesystem's journal does not scale: as soon as one CT fills the journal, the other CTs have to wait for it to be freed.

=== /proc virtualization ===
When we run a fully featured container, we need a containerized view of the proc filesystem. The current upstream kernel can provide that for the process tree and a few other things, but that is just not enough. To run top, for instance, one needs to know not only that, but also how much CPU time each container has used in total, how much of that was system time, how long the container was kept off the CPU, etc. That information is available (or will be; some of it is in pending patches) in different cgroups.
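As one illustration of how a userspace implementation might gather per-container CPU time, the C sketch below parses the text of a cgroup v1 `cpuacct.stat` file, which reports `user` and `system` time in USER_HZ ticks. It assumes the cpuacct controller is mounted and that the caller has already read the file (for example from a path like `/sys/fs/cgroup/cpuacct/<group>/cpuacct.stat`, which depends on the host setup). The struct and function names are hypothetical, and turning the result into a synthesized `/proc/stat` is left out.

```c
#include <stdio.h>
#include <string.h>

/* CPU time consumed by all tasks of one container's cgroup,
 * in USER_HZ ticks, as reported by the cgroup v1 cpuacct controller. */
struct ct_cpu_times {
    unsigned long long user;
    unsigned long long system;
};

/* Parse the contents of a cpuacct.stat file ("user N\nsystem M\n").
 * Returns 0 if both fields were found, -1 otherwise. */
static int parse_cpuacct_stat(const char *text, struct ct_cpu_times *out)
{
    int have_user = 0, have_system = 0;
    const char *p = text;

    while (*p != '\0') {
        char key[32];
        unsigned long long val;
        int used = 0;

        if (sscanf(p, "%31s %llu%n", key, &val, &used) != 2)
            return -1;
        if (strcmp(key, "user") == 0) {
            out->user = val;
            have_user = 1;
        } else if (strcmp(key, "system") == 0) {
            out->system = val;
            have_system = 1;
        }
        /* Unknown keys are ignored; skip to the next line. */
        p += used;
        while (*p == '\n' || *p == ' ')
            p++;
    }
    return (have_user && have_system) ? 0 : -1;
}

/* Total CPU time in seconds; hz is USER_HZ, e.g. sysconf(_SC_CLK_TCK). */
static double ct_cpu_seconds(const struct ct_cpu_times *t, long hz)
{
    return (double)(t->user + t->system) / (double)hz;
}
```

With `"user 300\nsystem 100\n"` and a USER_HZ of 100, the container has used 4 seconds of CPU, 1 of them in system time. Keeping the parser separate from file I/O makes it easy to reuse whether the data ends up in a kernel interface or a userspace daemon.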

What are the best ways to achieve this? What problems will we face? Is there any value in continuing to push this functionality into the kernel, or would a userspace implementation suffice?

=== Building application sandboxes on top of LXC and KVM with libvirt ===
This session will provide an overview of the recent virt-sandbox project, which aims to provide sandboxing of applications via lightweight guests (both KVM and LXC). Discussion will cover techniques such as sVirt (use of SELinux labeling to prevent the sandboxed virtual machine/container from altering unauthorized resources in the host), filesystem sharing & isolation (allowing the guest to share a specified portion of the host filesystem), integration with systemd for containerized application management, and tools to facilitate the setup of sandboxed application service environments.

Topic Lead: Daniel Berrange
Daniel has been the lead architect & developer of libvirt for more than 5 years, original developer of the libvirt-sandbox, virt-manager & entangle applications, and part-time hacker on QEMU, KVM, OpenStack, GTK-VNC, SPICE and more.

=== Syslog Virtualization ===
There is currently a single buffer in the system holding all messages generated through syslog. For isolation purposes, it would be ideal to give each container access to its own view of the syslog buffer.

=== Time Virtualization ===
Containers should have a stable notion of time that is independent of the host. This is especially important when live migrating them. It should also be perfectly possible for a process calling gettimeofday() in a container to get a different value than the host system does.

We will discuss strategies to achieve that.
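One strategy that could be part of that discussion is a per-container offset added to the host clock. The C sketch below assumes such an offset exists; `struct ct_clock_offset`, `ct_apply_offset()` and `ct_offset_for_migration()` are hypothetical names, not an existing kernel interface. On migration, the offset is chosen as "last container time on the source minus current host time on the destination", so the container's clock resumes where it stopped instead of jumping to the destination host's time.

```c
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical per-container clock offset; nsec is kept in [0, NSEC_PER_SEC). */
struct ct_clock_offset {
    time_t sec;
    long nsec;
};

/* Container time = host time + offset, with nanoseconds renormalized. */
static struct timespec ct_apply_offset(struct timespec host, struct ct_clock_offset off)
{
    struct timespec t;
    t.tv_sec = host.tv_sec + off.sec;
    t.tv_nsec = host.tv_nsec + off.nsec;
    if (t.tv_nsec >= NSEC_PER_SEC) {
        t.tv_nsec -= NSEC_PER_SEC;
        t.tv_sec += 1;
    }
    return t;
}

/* Offset that makes the container clock continue from last_ct when it is
 * restored on a host whose clock currently reads dest_host_now. */
static struct ct_clock_offset ct_offset_for_migration(struct timespec last_ct,
                                                      struct timespec dest_host_now)
{
    struct ct_clock_offset off;
    off.sec = last_ct.tv_sec - dest_host_now.tv_sec;
    off.nsec = last_ct.tv_nsec - dest_host_now.tv_nsec;
    if (off.nsec < 0) {
        off.nsec += NSEC_PER_SEC;
        off.sec -= 1;
    }
    return off;
}
```

For example, if the container last read 1000.1 s on the source and the destination host reads 5000.9 s at restore time, the offset is -4001 s + 0.2 s, and applying it to the destination clock gives back 1000.1 s. Keeping the nanosecond part normalized avoids carrying negative values through later additions.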

Blueprint information

Status: Not started
Approver: None
Priority: Undefined
Drafter: None
Direction: Needs approval
Assignee: None
Definition: New
Series goal: None
Implementation: Unknown
Milestone target: None
