project/procd.git
3 years ago jail: seccomp: improve code readability
Daniel Golle [Mon, 30 Nov 2020 00:44:53 +0000 (00:44 +0000)]
jail: seccomp: improve code readability

Break overly long line, add some comments.
No functional changes.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
3 years ago jail: always call cgroups_free()
Daniel Golle [Sun, 29 Nov 2020 23:21:04 +0000 (23:21 +0000)]
jail: always call cgroups_free()

In commit 3019f50 ("jail: leak less memory") memory handling in cgroups
related code was refactored. That allows calling cgroups_free()
unconditionally and removing the child branch in free_opts().

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
3 years ago jail: improve seccomp BPF generator
Daniel Golle [Sun, 29 Nov 2020 19:12:17 +0000 (19:12 +0000)]
jail: improve seccomp BPF generator

Restructure and add code to process rules based on syscall arguments as
defined in OCI run-time spec. Generated BPF code became more efficient
as now only one BPF instruction for each syscall is required.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: properly initialize timens_fd
Daniel Golle [Thu, 26 Nov 2020 16:34:38 +0000 (16:34 +0000)]
jail: properly initialize timens_fd

So we are safe for the future.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: enter existing cgroups namespace if given
Daniel Golle [Thu, 26 Nov 2020 16:24:47 +0000 (16:24 +0000)]
jail: enter existing cgroups namespace if given

Call to enter an existing cgroups namespace was missing. Add it.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: don't attempt to mount /sys with noatime
Daniel Golle [Thu, 26 Nov 2020 04:49:35 +0000 (04:49 +0000)]
jail: don't attempt to mount /sys with noatime

Because that won't work. Use relatime instead.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: fix typo in usage output
Daniel Golle [Thu, 26 Nov 2020 03:29:45 +0000 (03:29 +0000)]
jail: fix typo in usage output

'-j' is wrong, it should be '-i' (for _i_mmediately).

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: seteuid before clone(CLONE_NEWUSER)
Daniel Golle [Thu, 26 Nov 2020 01:44:50 +0000 (01:44 +0000)]
jail: seteuid before clone(CLONE_NEWUSER)

Resolve the userid in parent namespace mapped to the root user of the
new user namespace. Before clone(), seteuid() to that user in the parent
namespace.
Use SECBIT_NO_SETUID_FIXUP so the parent process can later on switch
back using seteuid(0).

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: don't fail if can't mount-bind /etc/resolv.conf
Daniel Golle [Thu, 26 Nov 2020 01:01:14 +0000 (01:01 +0000)]
jail: don't fail if can't mount-bind /etc/resolv.conf

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: don't use NULL arguments for mount syscall
Daniel Golle [Thu, 26 Nov 2020 00:55:20 +0000 (00:55 +0000)]
jail: don't use NULL arguments for mount syscall

Make valgrind happier.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: relax /etc/resolv.conf creation
Daniel Golle [Thu, 26 Nov 2020 00:26:43 +0000 (00:26 +0000)]
jail: relax /etc/resolv.conf creation

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: fix and simplify userns uid/gid maps from OCI
Daniel Golle [Wed, 25 Nov 2020 23:25:58 +0000 (23:25 +0000)]
jail: fix and simplify userns uid/gid maps from OCI

Pre-calculate the allocation length in a simpler way and make sure maps
are properly generated.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: fix segfault on missing name and refactor
Daniel Golle [Wed, 25 Nov 2020 20:00:10 +0000 (20:00 +0000)]
jail: fix segfault on missing name and refactor

Move check for named jail up to main() function, and also add that
condition in case an OCI container is loaded as that would segfault
in case no name was given.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: leak less memory
Daniel Golle [Tue, 24 Nov 2020 21:03:12 +0000 (21:03 +0000)]
jail: leak less memory

Always free everything before exiting, clean up dynamic structures,
add missing free() calls in various places, ...

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: add 'debug' extern variable to preload_seccomp
Daniel Golle [Sun, 22 Nov 2020 22:50:22 +0000 (22:50 +0000)]
jail: add 'debug' extern variable to preload_seccomp

ujail's seccomp ld-preload support broke recently with
Error relocating /lib/libpreload-seccomp.so: debug: symbol not found
Fix that by adding a debug variable to seccomp.c.

Fixes: be6da62 ("seccomp: silence 'unknown syscall' warnings")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago uxc: also delete procd runtime state on 'delete'
Daniel Golle [Sun, 22 Nov 2020 04:23:29 +0000 (04:23 +0000)]
uxc: also delete procd runtime state on 'delete'

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago uxc: fix incomplete commit
Daniel Golle [Sun, 22 Nov 2020 03:16:31 +0000 (03:16 +0000)]
uxc: fix incomplete commit

Fixes: 04a2edd ("uxc: make force-delete kill container process")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: cgroup hack: rewrite cgroup -> cgroup2
Daniel Golle [Wed, 28 Oct 2020 13:06:07 +0000 (13:06 +0000)]
jail: cgroup hack: rewrite cgroup -> cgroup2

"I'm sure you said cgroup2"

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago seccomp: silence 'unknown syscall' warnings
Daniel Golle [Fri, 20 Nov 2020 23:56:13 +0000 (23:56 +0000)]
seccomp: silence 'unknown syscall' warnings

Output them as debugging messages instead.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago uxc: make force-delete kill container process
Daniel Golle [Thu, 19 Nov 2020 17:12:54 +0000 (17:12 +0000)]
uxc: make force-delete kill container process

Don't allow deleting running containers unless '--force' is
specified. If '--force' is specified, send KILL signal to container
process before deleting it.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago trace: switch to OCI seccomp JSON output
Daniel Golle [Sun, 15 Nov 2020 23:58:44 +0000 (23:58 +0000)]
trace: switch to OCI seccomp JSON output

Generate JSON as specified in the OCI runtime spec [1] for seccomp syscall
filter instead of our previous OpenWrt-specific format.

[1]: https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md#seccomp
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago seccomp: switch to new OCI compliant parser
Daniel Golle [Sun, 15 Nov 2020 23:22:13 +0000 (23:22 +0000)]
seccomp: switch to new OCI compliant parser

Drop the old OpenWrt-specific seccomp rule parser in favour of reusing
the OCI compliant variant.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago seccomp: specifying architectures is optional
Daniel Golle [Sun, 15 Nov 2020 23:45:38 +0000 (23:45 +0000)]
seccomp: specifying architectures is optional

Specifying the architecture used for system calls is optional in OCI
spec. Make it optional in the parser.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: fix capabilities
Daniel Golle [Fri, 6 Nov 2020 18:42:25 +0000 (18:42 +0000)]
jail: fix capabilities

Allocate enough stack space for capget()/capset(), which require
2*sizeof(struct __user_cap_data_struct), each containing 32-bit fields,
where the 2nd struct contains the bits for high (>= 32) capabilities.
Failing to do that not only left those high capabilities inaccessible
but also overwrote the stack, resulting in ujail hanging indefinitely
instead of returning from applyOCIcapabilities().
Also adapt debugging output to 64-bit format.
Apart from that, don't set SECBIT_NO_SETUID_FIXUP when not actually
modifying capabilities explicitly, as that would result in ALL
capabilities being retained in the subsequent setuid() call instead of
having them all dropped.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
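
The buffer sizing described in "jail: fix capabilities" can be illustrated with a minimal sketch. This is not the ujail code: the helper name read_effective_caps() is hypothetical, and the sketch only assumes the Linux uapi header and the raw capget syscall. With version 3 of the capability ABI the kernel writes two data structs, so the array must have _LINUX_CAPABILITY_U32S_3 (2) elements.

```c
#include <linux/capability.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: read the effective capability set of the calling
 * process as a single 64-bit mask. Returns 0 on success, -1 on error. */
static int read_effective_caps(uint64_t *effective)
{
	struct __user_cap_header_struct hdr;
	/* Two structs: data[0] holds capabilities 0..31, data[1] holds 32..63.
	 * Passing a single struct would let the kernel write past the buffer. */
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

	memset(&hdr, 0, sizeof(hdr));
	memset(data, 0, sizeof(data));
	hdr.version = _LINUX_CAPABILITY_VERSION_3;
	hdr.pid = 0; /* 0 selects the calling process */

	if (syscall(SYS_capget, &hdr, data) < 0)
		return -1;

	/* Combine both 32-bit halves into one 64-bit value */
	*effective = ((uint64_t)data[1].effective << 32) | data[0].effective;
	return 0;
}
```

The 64-bit result is also why debugging output needs a 64-bit format specifier rather than a 32-bit one.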
4 years ago uxc: mimic runc cmdline by using getopt_long
Daniel Golle [Tue, 27 Oct 2020 16:34:06 +0000 (16:34 +0000)]
uxc: mimic runc cmdline by using getopt_long

Imitate runc (or crun) cmdline parameters. This allows using uxc as
runtime with podman.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: don't fail if maskedPath cannot be found
Daniel Golle [Wed, 28 Oct 2020 13:01:52 +0000 (13:01 +0000)]
jail: don't fail if maskedPath cannot be found

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: add support for absolute root path in OCI spec
Daniel Golle [Wed, 28 Oct 2020 11:59:10 +0000 (11:59 +0000)]
jail: add support for absolute root path in OCI spec

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: relax seccomp unknown syscall handling
Daniel Golle [Wed, 28 Oct 2020 01:39:34 +0000 (01:39 +0000)]
jail: relax seccomp unknown syscall handling

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: handle mount propagation flags
Daniel Golle [Wed, 28 Oct 2020 00:30:03 +0000 (00:30 +0000)]
jail: handle mount propagation flags

Add support for propagation mount options (private, slave, shared,
unbindable, rprivate, rslave, rshared, runbindable).

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
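
As a rough illustration of the option names listed in "jail: handle mount propagation flags", the following sketch maps each name to the corresponding mount(2) flags. The function name propagation_flags() and the table layout are assumptions made for this example and are not taken from jail's source; only the MS_* constants come from <sys/mount.h>.

```c
#include <stddef.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical helper: translate a propagation option name into mount(2)
 * flags. Returns 0 if the name is not a propagation option. */
static unsigned long propagation_flags(const char *opt)
{
	static const struct {
		const char *name;
		unsigned long flags;
	} map[] = {
		{ "private",     MS_PRIVATE },
		{ "slave",       MS_SLAVE },
		{ "shared",      MS_SHARED },
		{ "unbindable",  MS_UNBINDABLE },
		/* the r* variants apply recursively to the whole subtree */
		{ "rprivate",    MS_PRIVATE | MS_REC },
		{ "rslave",      MS_SLAVE | MS_REC },
		{ "rshared",     MS_SHARED | MS_REC },
		{ "runbindable", MS_UNBINDABLE | MS_REC },
	};
	size_t i;

	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (!strcmp(opt, map[i].name))
			return map[i].flags;

	return 0;
}
```

Per mount(2), a propagation type is changed with its own mount() call on an existing mount point, so a caller would typically keep these flags separate from the regular mount flags.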
4 years ago jail: add option for pidfile
Daniel Golle [Wed, 28 Oct 2020 00:09:51 +0000 (00:09 +0000)]
jail: add option for pidfile

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: guard boolean blobmsg attributes
Daniel Golle [Tue, 27 Oct 2020 22:15:09 +0000 (22:15 +0000)]
jail: guard boolean blobmsg attributes

ujail tried to parse boolean values in config.json even if they were
not present, which led to segfaults.
Check if booleans are actually present before trying to parse them.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago ujail: elf: work around GCC bug on MIPS64
Daniel Golle [Thu, 22 Oct 2020 21:59:14 +0000 (22:59 +0100)]
ujail: elf: work around GCC bug on MIPS64

Work-around gcc bug which leads to segfault parsing ELF on MIPS64.
The codepath added in this commit gets triggered when parsing
/lib/ld-musl-mips64-sf.so.1 (a symlink to /lib/libc.so) on MIPS64
(built with gcc-8.4.0 and musl 1.1.24) in qemu-system-mips64 on the
malta/be64 target.
Include work-around outputting an error message, but preventing
segfault when building for MIPS64.

Tested-by: Roman Kuzmitskii <damex.pp@icloud.com>
[tested on edgerouter 4 and edgerouter lite]
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: mount more stuff read-only
Daniel Golle [Thu, 22 Oct 2020 01:44:14 +0000 (02:44 +0100)]
jail: mount more stuff read-only

Mount /etc/resolv.conf, /etc/passwd, /etc/group and /etc/nsswitch.conf
read-only in ujail slim-containers.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: capabilities: apply in two phases
Daniel Golle [Mon, 19 Oct 2020 18:30:13 +0000 (19:30 +0100)]
jail: capabilities: apply in two phases

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: nuke old capabilities code in favour of reusing OCI code
Daniel Golle [Mon, 19 Oct 2020 16:15:11 +0000 (17:15 +0100)]
jail: nuke old capabilities code in favour of reusing OCI code

Previously capabilities could be defined for slim-containers using
our own JSON format, which only allowed modifying capabilities in the
bounding set. As apparently that was never used by even a single
package, drop that old parser and logic in favour of reusing the now
existing OCI capability handling functions.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago instance: actually wire up capabilities filename
Daniel Golle [Mon, 19 Oct 2020 16:50:19 +0000 (17:50 +0100)]
instance: actually wire up capabilities filename

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: adapt to new ubus socket path
Daniel Golle [Mon, 19 Oct 2020 16:00:26 +0000 (17:00 +0100)]
jail: adapt to new ubus socket path

The previous commit
3121467 ("early: run ubusd non-root as user ubus, group ubus")
changed the path of the ubus socket from /var/run/ubus.sock to
/var/run/ubus/ubus.sock. Adapt jail to also mount-bind that new
path for jails which include ubus access (eg. dnsmasq).

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago early: run ubusd non-root as user ubus, group ubus
Daniel Golle [Mon, 19 Oct 2020 12:43:23 +0000 (13:43 +0100)]
early: run ubusd non-root as user ubus, group ubus

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago cgroups: memory controller fixes
Daniel Golle [Thu, 13 Aug 2020 00:54:21 +0000 (01:54 +0100)]
cgroups: memory controller fixes

OCI 'swap' value encodes memory+swap, make the best out of that.
Ignore 'kernel' and 'kernelTCP' values rather than returning with
error as kernel memory is accounted in the existing limits in cgroup2.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
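
One plausible way to read "OCI 'swap' value encodes memory+swap" from "cgroups: memory controller fixes" is sketched below. It assumes the OCI 'limit' and 'swap' values are given in bytes and that the target is cgroup2's memory.swap.max, which counts swap only. The helper name is hypothetical, unlimited values (-1 in OCI) are deliberately not handled, and the actual conversion in procd may differ.

```c
#include <stdint.h>

/* Hypothetical helper: derive a swap-only limit from the OCI values.
 * 'limit' is the OCI memory limit, 'memswap' the OCI memory+swap limit.
 * Returns 0 and stores the result on success, -1 on inconsistent input. */
static int oci_swap_to_cgroup2(int64_t limit, int64_t memswap,
			       int64_t *swap_max)
{
	/* memory+swap can never be smaller than the memory limit itself */
	if (limit < 0 || memswap < limit)
		return -1;

	*swap_max = memswap - limit;
	return 0;
}
```

For example, a 256 MiB memory limit combined with a 384 MiB memory+swap limit leaves 128 MiB for swap alone.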
4 years ago cgroups: restrict allowed keys in 'unified' section
Daniel Golle [Thu, 13 Aug 2020 00:22:11 +0000 (01:22 +0100)]
cgroups: restrict allowed keys in 'unified' section

Prevent specifying directories by banning the use of '/' characters
and disallow some internal cgroup.* files as suggested in [1].

[1]: https://github.com/opencontainers/runtime-spec/pull/1040

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
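
The key filtering described in "cgroups: restrict allowed keys in 'unified' section" could look roughly like the sketch below. The function name and, in particular, the deny-list entries are assumptions chosen for illustration; the commit message does not say which cgroup.* files are rejected.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decide whether a key from the 'unified' section
 * may be written into the container's cgroup directory. */
static bool unified_key_allowed(const char *key)
{
	/* Illustrative deny-list only; the real list is not given above */
	static const char *const denied[] = {
		"cgroup.procs",
		"cgroup.threads",
		"cgroup.subtree_control",
	};
	size_t i;

	if (!key || !*key)
		return false;

	/* A '/' would allow escaping into another directory */
	if (strchr(key, '/'))
		return false;

	for (i = 0; i < sizeof(denied) / sizeof(denied[0]); i++)
		if (!strcmp(key, denied[i]))
			return false;

	return true;
}
```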
4 years ago initd/init: add minimal SELinux policy loading support
Thomas Petazzoni [Mon, 10 Aug 2020 01:15:20 +0000 (15:15 -1000)]
initd/init: add minimal SELinux policy loading support

In order to support SELinux in OpenWrt, this commit introduces minimal
support for loading the SELinux policy in the init code. The logic is
very much inspired by what Busybox is doing: call
selinux_init_load_policy() from libselinux, and then re-execute init
so that it runs with the SELinux policy in place and enforced.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni at bootlin.com>
[fix spelling of OpenWrt]
Signed-off-by: Paul Spooren <mail@aparcar.org>
4 years ago jail: fix freeing cgroups avl
Daniel Golle [Thu, 6 Aug 2020 14:34:27 +0000 (15:34 +0100)]
jail: fix freeing cgroups avl

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: only free cgroups if they were allocated
Daniel Golle [Thu, 6 Aug 2020 14:34:27 +0000 (15:34 +0100)]
jail: only free cgroups if they were allocated

Fixes segfault on shutdown with slim containers.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: parse OCI cgroups resources
Daniel Golle [Wed, 5 Aug 2020 17:37:53 +0000 (18:37 +0100)]
jail: parse OCI cgroups resources

Start pure cgroup2 implementation with emulation of (some) cgroup1
properties.
Initially support converting cpu, memory, blockIO, pids to unified in
addition to directly specifying unified attributes as suggested in
https://github.com/opencontainers/runtime-spec/pull/1040

Support for converting devices and network into BPF programs is
planned.

Now that containers have their representation in the unified cgroup
hierarchy, make sure using cgroup namespaces also produces meaningful
results.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago instance: add instances into unified cgroup hierarchy
Daniel Golle [Wed, 5 Aug 2020 13:36:44 +0000 (14:36 +0100)]
instance: add instances into unified cgroup hierarchy

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: make use of BLOBMSG_CAST_INT64 for OCI rlimits
Daniel Golle [Tue, 4 Aug 2020 00:55:40 +0000 (01:55 +0100)]
jail: make use of BLOBMSG_CAST_INT64 for OCI rlimits

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: use pidns semantics also for timens
Daniel Golle [Sun, 2 Aug 2020 18:25:29 +0000 (19:25 +0100)]
jail: use pidns semantics also for timens

Just like pidns, timens is also only applied to children forked after
the setns() call, so use the same semantics here as well when joining
an existing time namespace.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago initd: attempt to mount cgroup2
Daniel Golle [Wed, 29 Jul 2020 13:26:51 +0000 (14:26 +0100)]
initd: attempt to mount cgroup2

Prepare for using cgroup2 in procd and ujail.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago service: add method to query available container features
Daniel Golle [Wed, 29 Jul 2020 12:49:38 +0000 (13:49 +0100)]
service: add method to query available container features

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago uxc: remove debugging left-over
Daniel Golle [Thu, 30 Jul 2020 11:58:42 +0000 (12:58 +0100)]
uxc: remove debugging left-over

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago instance: make sure values are not inherited from previous runs
Daniel Golle [Wed, 29 Jul 2020 21:17:05 +0000 (22:17 +0100)]
instance: make sure values are not inherited from previous runs

Code to update and move instance attributes has been neglected when
new instance and jail options were added.
Add the ones which were missing.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago uxc: use new container.%s kill ubus API
Daniel Golle [Tue, 28 Jul 2020 23:41:32 +0000 (00:41 +0100)]
uxc: use new container.%s kill ubus API

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: add 'kill' method to container.%s object
Daniel Golle [Tue, 28 Jul 2020 23:36:19 +0000 (00:36 +0100)]
jail: add 'kill' method to container.%s object

Using the current container signal method to send a signal to the
jailed process works fine, as signals are being forwarded by the
ujail parent process. However, in case of the KILL (==9) signal, both
parent and jailed process are killed immediately, which results in the
'poststop' OCI hook being skipped.
Add new 'kill' method to ujail's container object to allow sending
signals to the jailed process directly instead of having to send
signals to the parent.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago uxc: fix create operation
Daniel Golle [Tue, 28 Jul 2020 23:13:27 +0000 (00:13 +0100)]
uxc: fix create operation

The 'create' operation needs uxc to reload its configuration, so that
after adding the container to uxc's persistent state tracking the
follow-up call to create the run-time can find it.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago uxc: behave more like a compliant OCI run-time
Daniel Golle [Tue, 28 Jul 2020 08:06:39 +0000 (09:06 +0100)]
uxc: behave more like a compliant OCI run-time

Follow CLI syntax as described in OCI run-time spec[1].
In addition, allow 'create' call also without 'path' parameter to
re-create previously created containers, also after reboot.

Usual workflow:
uxc create debian /mnt/sda3/debian
uxc start debian
uxc kill debian 1
uxc create debian
uxc start debian
...

To create a container and have it automatically launched at boot:
uxc create debian /mnt/sda3/debian true

 [1]: https://github.com/opencontainers/runtime-spec/blob/master/runtime.md#operations

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: add some remaining OCI features
Daniel Golle [Tue, 28 Jul 2020 08:05:24 +0000 (09:05 +0100)]
jail: add some remaining OCI features

 * register ubus object for container to query state
 * wait on 'created' state until 'start' command is issued via ubus
 * have a way to bypass waiting on 'created' state
 * support OCI annotations pass-through

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: serialize hook execution
Daniel Golle [Sat, 25 Jul 2020 17:28:25 +0000 (18:28 +0100)]
jail: serialize hook execution

Make sure hook execution is completed before continuing with any
further actions. This involves a major refactoring of ujail to use a
single uloop mainloop for each process to avoid concurrency issues.
Also fix other remaining problems in code for OCI hooks, such as making
sure memory allocated to store hook information is zeroed.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: fix build on glibc and uclibc
Daniel Golle [Sat, 25 Jul 2020 15:30:29 +0000 (16:30 +0100)]
jail: fix build on glibc and uclibc

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: add support for referencing existing namespaces
Daniel Golle [Mon, 20 Jul 2020 14:00:23 +0000 (15:00 +0100)]
jail: add support for referencing existing namespaces

Allow OCI containers to specify paths to existing namespaces.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: fix wrong format for 32-bit
Rosen Penev [Mon, 20 Jul 2020 22:35:27 +0000 (15:35 -0700)]
jail: fix wrong format for 32-bit

The proper format for size_t is %zu .

Signed-off-by: Rosen Penev <rosenp@gmail.com>
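
The point of "jail: fix wrong format for 32-bit" can be shown with a small sketch. The helper name format_size() is hypothetical; the example only relies on the C99 'z' length modifier, which matches size_t regardless of whether it is 32 or 64 bits wide.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: print a size_t portably. Using %lu or %u here
 * would be wrong on targets where size_t has a different width. */
static int format_size(char *buf, size_t buflen, size_t value)
{
	return snprintf(buf, buflen, "%zu", value);
}
```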
4 years ago rcS: cast format string to int64_t
Rosen Penev [Mon, 20 Jul 2020 22:35:26 +0000 (15:35 -0700)]
rcS: cast format string to int64_t

musl 1.2.0 turns time_t into a 64-bit value, even on 32-bit. This makes it
compatible.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
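
A minimal sketch of the approach named in "rcS: cast format string to int64_t": cast the time_t argument to a fixed-width type and print it with the matching macro from <inttypes.h>. The helper name format_time() is an assumption for this example and does not appear in rcS.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: print a time_t without depending on its width.
 * The cast makes the argument type match PRId64 on every target. */
static int format_time(char *buf, size_t buflen, time_t t)
{
	return snprintf(buf, buflen, "%" PRId64, (int64_t)t);
}
```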
4 years ago jail: re-implement /proc/sys/net read-write in netns hack
Daniel Golle [Mon, 20 Jul 2020 00:37:15 +0000 (01:37 +0100)]
jail: re-implement /proc/sys/net read-write in netns hack

Hack to make /proc/sys/net read-write while the rest of /proc/sys is
read-only, which cannot be expressed with OCI spec but happens to be
very useful. Only apply it if '/proc/sys' is not already listed as
mount, maskedPath or readonlyPath.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: refactor default mounts into new structure
Daniel Golle [Sun, 19 Jul 2020 23:12:44 +0000 (00:12 +0100)]
jail: refactor default mounts into new structure

Add default mounts of /dev, /dev/pts, /dev/shm, /proc and /sys to
the restructured mounts AVL list instead of calling mount directly.
While for slim containers this change shouldn't make any difference,
it allows OCI containers to override options of those default
filesystems.
The previous hack keeping /proc/sys/net mounted read-write if inside
a new network namespace while all the rest of /proc/sys is read-only
cannot easily be translated and is removed for now.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: actually apply filesystem-specific mount options
Daniel Golle [Sun, 19 Jul 2020 23:30:06 +0000 (00:30 +0100)]
jail: actually apply filesystem-specific mount options

OCI supplied filesystems-specific mount options have not been stored
in the add_mount() function. strdup() them there and free the original
string in the OCI function.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: add support for defining devices
Daniel Golle [Sun, 19 Jul 2020 20:45:53 +0000 (21:45 +0100)]
jail: add support for defining devices

OCI run-time spec allows containers to specify devices to be created
in /dev in addition to the default devices.
Parse devices from linux section in config.json; clean-up and refactor
default entries in /dev into the same function using a similar scheme.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: move /tmp/resolv.conf.d to /dev/resolv.conf.d
Daniel Golle [Sun, 19 Jul 2020 19:21:33 +0000 (20:21 +0100)]
jail: move /tmp/resolv.conf.d to /dev/resolv.conf.d

OCI spec implicitly intends /dev to be used as tmpfs mounted by
default while /tmp may not be mounted or may not even exist.
Hence move /tmp/resolv.conf.d to /dev/resolv.conf.d inside
container.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: /proc/$pid/oom_score_adj to OCI defined oomScoreAdj
Daniel Golle [Sun, 19 Jul 2020 18:09:34 +0000 (19:09 +0100)]
jail: /proc/$pid/oom_score_adj to OCI defined oomScoreAdj

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: parse and apply POSIX rlimits
Daniel Golle [Sun, 19 Jul 2020 16:31:42 +0000 (17:31 +0100)]
jail: parse and apply POSIX rlimits

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: read and apply umask from OCI if defined
Daniel Golle [Sun, 19 Jul 2020 00:32:55 +0000 (01:32 +0100)]
jail: read and apply umask from OCI if defined

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: implement OCI user additionalGIDs
Daniel Golle [Sat, 18 Jul 2020 23:35:16 +0000 (00:35 +0100)]
jail: implement OCI user additionalGIDs

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: parse and apply OCI sysctl values
Daniel Golle [Sat, 18 Jul 2020 21:58:22 +0000 (22:58 +0100)]
jail: parse and apply OCI sysctl values

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: fix hooks
Daniel Golle [Sat, 18 Jul 2020 11:31:09 +0000 (12:31 +0100)]
jail: fix hooks

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: add support for maskedPaths and readonlyPaths
Daniel Golle [Thu, 16 Jul 2020 22:00:35 +0000 (23:00 +0100)]
jail: add support for maskedPaths and readonlyPaths

Parse maskedPaths and readonlyPaths string arrays if defined in OCI
container linux section. readonlyPaths are implemented by adding a
recursive read-only bind-mount on the path, maskedPaths are done by
mounting a zero-sized tmpfs with mode 000 for directories and mount-
binding an empty file having mode 000 for non-directories.
Mounts of both, maskedPaths and readonlyPaths, may fail silently if
the path doesn't exist.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: fix some more mount options
Daniel Golle [Thu, 16 Jul 2020 12:14:51 +0000 (13:14 +0100)]
jail: fix some more mount options

Make sure 'rbind' works as expected and add support for 'iversion' and
'noiversion' options.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: fs: fix build on uClibc-ng
Daniel Golle [Wed, 15 Jul 2020 21:59:59 +0000 (22:59 +0100)]
jail: fs: fix build on uClibc-ng

MS_LAZYTIME is apparently not defined on uClibc-ng. Define that macro
if not defined already. Also fix a copy&paste error which broke
'nolazytime' and 'nostrictatime' mount options.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
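
The fallback definition mentioned in "jail: fs: fix build on uClibc-ng" can be sketched as follows. The value (1 << 25) is the one used by the Linux kernel headers for MS_LAZYTIME; the helper apply_time_option() is hypothetical and only illustrates that each "no..." option must clear the flag belonging to its own positive counterpart, which is the kind of pairing a copy-and-paste error can break.

```c
#include <string.h>
#include <sys/mount.h>

/* Provide the flag on C libraries whose <sys/mount.h> lacks it */
#ifndef MS_LAZYTIME
#define MS_LAZYTIME (1 << 25)
#endif

/* Hypothetical helper: apply one timestamp-related mount option to an
 * existing flag set and return the updated flags. */
static unsigned long apply_time_option(unsigned long flags, const char *opt)
{
	if (!strcmp(opt, "lazytime"))
		return flags | (unsigned long)MS_LAZYTIME;
	if (!strcmp(opt, "nolazytime"))
		return flags & ~(unsigned long)MS_LAZYTIME;
	if (!strcmp(opt, "strictatime"))
		return flags | (unsigned long)MS_STRICTATIME;
	if (!strcmp(opt, "nostrictatime"))
		return flags & ~(unsigned long)MS_STRICTATIME;

	/* unknown options leave the flags untouched */
	return flags;
}
```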
4 years ago procd: fix compile if procd-ujail is not selected
Daniel Golle [Wed, 15 Jul 2020 00:13:58 +0000 (01:13 +0100)]
procd: fix compile if procd-ujail is not selected

Generating syscall-names.h was added as a dependency for ujail in order
to support seccomp for OCI containers.
This, however, slipped into the wrong place and broke cmake in case
of procd-seccomp being selected but procd-ujail not being selected.
Move dependency to the right place to fix that.

Fixes: bb4a446 ("uxc: add container management CLI tool")
Reported-by: Paul Blazejowski <paulb@blazebox.homeip.net>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: fix false return in case of nofail mount
Daniel Golle [Mon, 13 Jul 2020 23:40:20 +0000 (00:40 +0100)]
jail: fix false return in case of nofail mount

In some cases mounts could still fail even though 'nofail' was set.
Make sure to always return successfully also in those cases.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago procd: add service instance watchdog
Daniel Bailey [Mon, 13 Jul 2020 22:05:31 +0000 (15:05 -0700)]
procd: add service instance watchdog

Added instance watchdog which will eventually either terminate
or respawn an instance depending on the instance respawn setting.

Added service ubus method 'watchdog' which services the watchdog
timer and allows updating the watchdog mode of the instance.

Two modes: disabled or passive.

Disabled: cancels watchdog timer set for a given instance.

Passive: sets an instance timer which must be serviced or the
instance will be stopped/restarted (dependent upon the instance
respawn value) when the timer expires.

Signed-off-by: Daniel Bailey <danielb@meshplusplus.com>
4 years ago uxc: fix build with uClibc-ng
Daniel Golle [Mon, 13 Jul 2020 23:14:07 +0000 (00:14 +0100)]
uxc: fix build with uClibc-ng

Also here _GNU_SOURCE was missing.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago uxc: fix 'stop' command
Daniel Golle [Mon, 13 Jul 2020 09:10:05 +0000 (10:10 +0100)]
uxc: fix 'stop' command

The 'stop' command was requesting an invalid ubus method. Fix method
name to make 'stop' operation work.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: don't make mount source read-only
Daniel Golle [Mon, 13 Jul 2020 08:57:05 +0000 (09:57 +0100)]
jail: don't make mount source read-only

From mount(2):
Specifying mountflags as:
  MS_REMOUNT | MS_BIND | MS_RDONLY
will make access through this mountpoint read-only, without affecting
other mount points.
Hence use MS_BIND when remounting container rootfs read-only.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: refactor mount support to cover OCI spec
Daniel Golle [Sun, 12 Jul 2020 22:47:52 +0000 (23:47 +0100)]
jail: refactor mount support to cover OCI spec

Extend existing support for bind-mounts to allow arbitrary mounts
defined in OCI spec.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: memory allocation fixes
Daniel Golle [Sun, 12 Jul 2020 17:57:56 +0000 (18:57 +0100)]
jail: memory allocation fixes

Make sure envp and argv are allocated and NULL-terminated arrays.
Free opts before parent process quits, free everything but argv,
envp and seccomp filter before execv into user process.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: parse and run OCI hooks
Daniel Golle [Sun, 12 Jul 2020 14:18:58 +0000 (15:18 +0100)]
jail: parse and run OCI hooks

OCI run-time allows defining hooks to be executed at various stages of
the container life cycle. Parse hooks and run them accordingly.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: actually chdir into OCI defined CWD
Daniel Golle [Mon, 13 Jul 2020 11:11:32 +0000 (12:11 +0100)]
jail: actually chdir into OCI defined CWD

Current working directory was parsed but never applied. Apply it just
before executing user process.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: consider PATH for argv in OCI container
Daniel Golle [Mon, 13 Jul 2020 02:00:22 +0000 (03:00 +0100)]
jail: consider PATH for argv in OCI container

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: fix segfault with len(uidmap/gidmap) > 1
Daniel Golle [Sun, 12 Jul 2020 16:36:05 +0000 (17:36 +0100)]
jail: fix segfault with len(uidmap/gidmap) > 1

Allocate enough memory for all uidmap/gidmap entries.

Fixes: ea7a790 ("jail: add support for running OCI bundle")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago procd: fix compilation with uClibc-ng
Rosen Penev [Wed, 24 Jun 2020 23:48:54 +0000 (16:48 -0700)]
procd: fix compilation with uClibc-ng

_GNU_SOURCE was missing.

Also defined two macros unavailable with uClibc-ng.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
[resolved conflict in jail.c]
Signed-off-by: Petr Štetiar <ynezz@true.cz>
4 years ago jail: use linux/capability.h instead of sys/capability.h
Daniel Golle [Sat, 11 Jul 2020 10:03:56 +0000 (11:03 +0100)]
jail: use linux/capability.h instead of sys/capability.h

Remove bogus build-dependency on libcap by using linux uapi header
and libc-provided syscall wrappers for capget/capset.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago ujail: add dependency on syscall-names-h
Daniel Golle [Sat, 11 Jul 2020 09:42:43 +0000 (10:42 +0100)]
ujail: add dependency on syscall-names-h

Makes sure syscall-names.h gets generated before trying to compile
ujail with OCI seccomp support.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: fix build on platforms without seccomp support
Daniel Golle [Fri, 10 Jul 2020 22:53:59 +0000 (23:53 +0100)]
jail: fix build on platforms without seccomp support

buildbots started failing due to -Werror=missing-declarations
for 'parseOCIlinuxseccomp' and 'applyOCIlinuxseccomp'.
Make sure functions were declared before defining compatibility stubs
for non-seccomp platforms.

Fixes: ea7a790 ("jail: add support for running OCI bundle")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago uxc: add container management CLI tool
Daniel Golle [Fri, 10 Jul 2020 09:57:23 +0000 (10:57 +0100)]
uxc: add container management CLI tool

As procd can now provide a fully featured container runtime using ujail,
add a (for now) simple CLI tool to list, add, delete, start and stop
OCI-compliant container bundles and to select whether they should be
launched on boot.
In future commits, this will be extended to provide state output, take
care of hooks, send signals and fetch remote container images in
accordance with the Open Container Initiative Runtime Specification.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: add support for running OCI bundle
Daniel Golle [Fri, 10 Jul 2020 09:56:58 +0000 (10:56 +0100)]
jail: add support for running OCI bundle

Prepare ujail for running OCI bundled Linux containers.
This adds handling of most of the JSON schema defined by the
Open Container Initiative Runtime Specification.

What is supported by this commit:
 * basic OCI process definition
 * seccomp filters (no args yet)
 * capabilities (100%)
 * namespaces (100%)
 * uid/gid mappings for userns (100%)
 * mounts (no free form mounts yet)
 * env (100%, limited to a small number of entries)
 * hostname (100%)
 * terminal (no consoleSize yet)

What is still missing:
 * complex mounts
 * maskedPaths, readonlyPaths
 * referencing existing namespaces
 * all hooks
 * rlimits
 * oomScoreAdj
 * additionalGids
 * cgroups
 * devices
 * sysctl
 * rootfsPropagation
 * personality and bi-arch (ie. 32-bit container on 64-bit host)

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: handle containers separately
Daniel Golle [Wed, 20 May 2020 14:26:08 +0000 (15:26 +0100)]
jail: handle containers separately

To make the API cleaner and running containers less of a hidden
feature, offer a new ubus object 'containers' to handle container
operations similar to how services are handled.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: use sane termios settings for console pts
Daniel Golle [Wed, 20 May 2020 13:57:21 +0000 (14:57 +0100)]
jail: use sane termios settings for console pts

The previously used expression (inspired by LXC) didn't actually make
a lot of sense. Replace it with something inspired by a more recent
version of LXC...

Reported-by: Oldřich Jedlička <oldium.pro@gmail.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: add option to provide /dev/console to containers
Daniel Golle [Sun, 12 Apr 2020 21:35:25 +0000 (22:35 +0100)]
jail: add option to provide /dev/console to containers

Create a UNIX 98 PTY, pass the master fd to procd and set up a bind mount
of the slave PTS device on /dev/console inside the jail.
Allow attaching to an instance's console by using the newly introduced
ujail-console command (no multiplexing for now).

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
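
The PTY allocation step from the console commit above might look roughly like the sketch below. It is not the code from jail.c: the helper name open_console_pty is hypothetical, and it uses the Linux-specific /dev/ptmx ioctls rather than whichever libc calls the commit uses. Passing the master fd to procd and bind-mounting the slave on /dev/console are left out.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Open a new pseudo-terminal pair (Linux only).
 * Writes the slave device path into slave_path and returns the
 * master fd, or -1 on failure.
 */
int open_console_pty(char *slave_path, size_t len)
{
	int unlock = 0;
	unsigned int ptn;
	int n;
	int master = open("/dev/ptmx", O_RDWR | O_NOCTTY);

	if (master < 0)
		return -1;

	/* Unlock the slave side and query its index under /dev/pts. */
	if (ioctl(master, TIOCSPTLCK, &unlock) < 0 ||
	    ioctl(master, TIOCGPTN, &ptn) < 0) {
		close(master);
		return -1;
	}

	n = snprintf(slave_path, len, "/dev/pts/%u", ptn);
	if (n < 0 || (size_t)n >= len) {
		close(master);
		return -1;
	}

	return master;
}
```

The returned path is what a jail setup would bind-mount onto /dev/console, while the master fd stays with the process that relays console I/O.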
4 years ago jail: unnamed jails can not have netns (fix segfault)
Leonardo Mörlein [Fri, 8 May 2020 00:58:25 +0000 (02:58 +0200)]
jail: unnamed jails can not have netns (fix segfault)

Signed-off-by: Leonardo Mörlein <me@irrelefant.net>
4 years ago jail: SIGSEGV must not be forwarded to the child process
Leonardo Mörlein [Fri, 8 May 2020 00:58:24 +0000 (02:58 +0200)]
jail: SIGSEGV must not be forwarded to the child process

A segfault in ujail caused ujail to hang with no chance to abort.
Raising the debug level revealed that SIGSEGV was delivered to
the child process instead of being handled directly by ujail. The
corresponding debug message was emitted over and over without
end:

forwarding signal 11 to the jailed process
forwarding signal 11 to the jailed process
forwarding signal 11 to the jailed process
forwarding signal 11 to the jailed process
forwarding signal 11 to the jailed process
forwarding signal 11 to the jailed process
forwarding signal 11 to the jailed process
[...]

Signed-off-by: Leonardo Mörlein <me@irrelefant.net>
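
The hang described above follows from how a fault is retried: when a handler for a genuine SIGSEGV simply returns, the faulting instruction runs again and raises the signal again. A minimal sketch of the filtering idea, assuming a hypothetical helper that the forwarding code would consult (only SIGSEGV is named by the commit; the real ujail logic may differ):

```c
#include <signal.h>
#include <stdbool.h>

/*
 * Decide whether a signal received by the supervisor should be
 * relayed to the jailed child. SIGSEGV indicates a fault in the
 * supervisor itself, so it must keep its default handling.
 */
bool should_forward_signal(int signo)
{
	switch (signo) {
	case SIGSEGV:
		return false;
	default:
		return true;
	}
}
```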
4 years ago jail: don't load libpreload-seccomp.so if it doesn't exist
Daniel Golle [Sat, 25 Apr 2020 09:24:35 +0000 (10:24 +0100)]
jail: don't load libpreload-seccomp.so if it doesn't exist

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years ago jail: don't fail unless requirejail is set
Daniel Golle [Sat, 25 Apr 2020 08:48:46 +0000 (09:48 +0100)]
jail: don't fail unless requirejail is set

Pass the requirejail attribute to ujail. A service that has a seccomp
policy defined now fails to start on a system without procd-seccomp
installed only if requirejail is set.

Fixes: bcb8655 ("instance: add 'requirejail' attribute")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>