project/procd.git
4 years agouxc: use new container.%s kill ubus API
Daniel Golle [Tue, 28 Jul 2020 23:41:32 +0000 (00:41 +0100)]
uxc: use new container.%s kill ubus API

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: add 'kill' method to container.%s object
Daniel Golle [Tue, 28 Jul 2020 23:36:19 +0000 (00:36 +0100)]
jail: add 'kill' method to container.%s object

Using the current container signal method to send a signal to the
jailed process works fine, as signals are forwarded by the
ujail parent process. However, in the case of the KILL (==9) signal, both
the parent and the jailed process are killed immediately, which results in
the 'poststop' OCI hook being skipped.
Add a new 'kill' method to ujail's container object to allow sending
signals to the jailed process directly instead of having to send
signals to the parent.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agouxc: fix create operation
Daniel Golle [Tue, 28 Jul 2020 23:13:27 +0000 (00:13 +0100)]
uxc: fix create operation

The 'create' operation needs uxc to reload its configuration, so that after
adding the container to uxc's persistent state tracking the follow-up
call to create the run-time can find it.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agouxc: behave more like a compliant OCI run-time
Daniel Golle [Tue, 28 Jul 2020 08:06:39 +0000 (09:06 +0100)]
uxc: behave more like a compliant OCI run-time

Follow CLI syntax as described in OCI run-time spec[1].
In addition, allow 'create' call also without 'path' parameter to
re-create previously created containers, also after reboot.

Usual workflow:
uxc create debian /mnt/sda3/debian
uxc start debian
uxc kill debian 1
uxc create debian
uxc start debian
...

To create a container and have it automatically launched at boot:
uxc create debian /mnt/sda3/debian true

 [1]: https://github.com/opencontainers/runtime-spec/blob/master/runtime.md#operations

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: add some remaining OCI features
Daniel Golle [Tue, 28 Jul 2020 08:05:24 +0000 (09:05 +0100)]
jail: add some remaining OCI features

 * register ubus object for container to query state
 * wait on 'created' state until 'start' command is issued via ubus
 * have a way to bypass waiting on 'created' state
 * support OCI annotations pass-through

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: serialize hook execution
Daniel Golle [Sat, 25 Jul 2020 17:28:25 +0000 (18:28 +0100)]
jail: serialize hook execution

Make sure hook execution is completed before continuing with any
further actions. This involves a major refactoring of ujail to use a
single uloop mainloop for each process to avoid concurrency issues.
Also fix other remaining problems in the code for OCI hooks, such as making
sure memory allocated to store hook information is zeroed.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: fix build on glibc and uclibc
Daniel Golle [Sat, 25 Jul 2020 15:30:29 +0000 (16:30 +0100)]
jail: fix build on glibc and uclibc

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: add support for referencing existing namespaces
Daniel Golle [Mon, 20 Jul 2020 14:00:23 +0000 (15:00 +0100)]
jail: add support for referencing existing namespaces

Allow OCI containers to specify paths to existing namespaces.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: fix wrong format for 32-bit
Rosen Penev [Mon, 20 Jul 2020 22:35:27 +0000 (15:35 -0700)]
jail: fix wrong format for 32-bit

The proper format for size_t is %zu .

Signed-off-by: Rosen Penev <rosenp@gmail.com>
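The format fix above can be illustrated with a minimal sketch. The helper name format_size() is hypothetical and does not come from jail.c; it only shows that "%zu" matches size_t on both 32-bit and 64-bit targets, whereas "%lu" or "%u" would only be correct on one of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of procd): format a size_t value.
 * "%zu" is the C99 conversion for size_t, independent of word size.
 */
int format_size(char *buf, size_t buflen, size_t value)
{
	assert(buf != NULL && buflen > 0);
	return snprintf(buf, buflen, "%zu", value);
}
```

A caller would pass a buffer and its size, for example format_size(buf, sizeof(buf), len), and check that the return value is smaller than the buffer size.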
4 years agorcS: cast format string to int64_t
Rosen Penev [Mon, 20 Jul 2020 22:35:26 +0000 (15:35 -0700)]
rcS: cast format string to int64_t

musl 1.2.0 turns time_t into a 64-bit value, even on 32-bit. This makes it
compatible.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
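A minimal sketch of the cast described in the rcS commit above, assuming the value is only being printed. The function name format_timestamp() is made up for illustration; the point is that casting time_t to int64_t and using PRId64 gives one format string that is valid whether time_t is 32 or 64 bits wide.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper (not part of procd): print a time_t portably.
 * The width of time_t depends on the libc and architecture, so cast to a
 * fixed-width type and use the matching format macro.
 */
int format_timestamp(char *buf, size_t buflen, time_t t)
{
	assert(buf != NULL && buflen > 0);
	return snprintf(buf, buflen, "%" PRId64, (int64_t)t);
}
```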
4 years agojail: re-implement /proc/sys/net read-write in netns hack
Daniel Golle [Mon, 20 Jul 2020 00:37:15 +0000 (01:37 +0100)]
jail: re-implement /proc/sys/net read-write in netns hack

Hack to make /proc/sys/net read-write while the rest of /proc/sys is
read-only, which cannot be expressed with the OCI spec but happens to be
very useful. Only apply it if '/proc/sys' is not already listed as a
mount, maskedPath or readonlyPath.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: refactor default mounts into new structure
Daniel Golle [Sun, 19 Jul 2020 23:12:44 +0000 (00:12 +0100)]
jail: refactor default mounts into new structure

Add default mounts of /dev, /dev/pts, /dev/shm, /proc and /sys to
the restructured mounts AVL list instead of calling mount directly.
While for slim containers this change shouldn't make any difference,
it allows OCI containers to override options of those default
filesystems.
The previous hack keeping /proc/sys/net mounted read-write if inside
a new network namespace while all the rest of /proc/sys is read-only
cannot easily be translated and is removed for now.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: actually apply filesystem-specific mount options
Daniel Golle [Sun, 19 Jul 2020 23:30:06 +0000 (00:30 +0100)]
jail: actually apply filesystem-specific mount options

OCI-supplied filesystem-specific mount options were not stored
by the add_mount() function. strdup() them there and free the original
string in the OCI function.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: add support for defining devices
Daniel Golle [Sun, 19 Jul 2020 20:45:53 +0000 (21:45 +0100)]
jail: add support for defining devices

OCI run-time spec allows containers to specify devices to be created
in /dev in addition to the default devices.
Parse devices from linux section in config.json; clean-up and refactor
default entries in /dev into the same function using a similar scheme.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: move /tmp/resolv.conf.d to /dev/resolv.conf.d
Daniel Golle [Sun, 19 Jul 2020 19:21:33 +0000 (20:21 +0100)]
jail: move /tmp/resolv.conf.d to /dev/resolv.conf.d

OCI spec implicitly intends /dev to be used as tmpfs mounted by
default while /tmp may not be mounted or may not even exist.
Hence move /tmp/resolv.conf.d to /dev/resolv.conf.d inside
container.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: /proc/$pid/oom_score_adj to OCI defined oomScoreAdj
Daniel Golle [Sun, 19 Jul 2020 18:09:34 +0000 (19:09 +0100)]
jail: /proc/$pid/oom_score_adj to OCI defined oomScoreAdj

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: parse and apply POSIX rlimits
Daniel Golle [Sun, 19 Jul 2020 16:31:42 +0000 (17:31 +0100)]
jail: parse and apply POSIX rlimits

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: read and apply umask from OCI if defined
Daniel Golle [Sun, 19 Jul 2020 00:32:55 +0000 (01:32 +0100)]
jail: read and apply umask from OCI if defined

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: implement OCI user additionalGIDs
Daniel Golle [Sat, 18 Jul 2020 23:35:16 +0000 (00:35 +0100)]
jail: implement OCI user additionalGIDs

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: parse and apply OCI sysctl values
Daniel Golle [Sat, 18 Jul 2020 21:58:22 +0000 (22:58 +0100)]
jail: parse and apply OCI sysctl values

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: fix hooks
Daniel Golle [Sat, 18 Jul 2020 11:31:09 +0000 (12:31 +0100)]
jail: fix hooks

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: add support for maskedPaths and readonlyPaths
Daniel Golle [Thu, 16 Jul 2020 22:00:35 +0000 (23:00 +0100)]
jail: add support for maskedPaths and readonlyPaths

Parse maskedPaths and readonlyPaths string arrays if defined in OCI
container linux section. readonlyPaths are implemented by adding a
recursive read-only bind-mount on the path, maskedPaths are done by
mounting a zero-sized tmpfs with mode 000 for directories and mount-
binding an empty file having mode 000 for non-directories.
Mounts of both, maskedPaths and readonlyPaths, may fail silently if
the path doesn't exist.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: fix some more mount options
Daniel Golle [Thu, 16 Jul 2020 12:14:51 +0000 (13:14 +0100)]
jail: fix some more mount options

Make sure 'rbind' works as expected and add support for 'iversion' and
'noiversion' options.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: fs: fix build on uClibc-ng
Daniel Golle [Wed, 15 Jul 2020 21:59:59 +0000 (22:59 +0100)]
jail: fs: fix build on uClibc-ng

MS_LAZYTIME is apparently not defined on uClibc-ng. Define that macro
if not defined already. Also fix a copy&paste error which broke
'nolazytime' and 'nostrictatime' mount options.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
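The two fixes in the uClibc-ng commit above can be sketched together. This is an illustration under assumptions, not the code from jail/fs.c: the table layout and the apply_mount_opt() name are invented. The fallback value of MS_LAZYTIME matches the kernel's definition (bit 25). Keeping each "no..." option next to its positive counterpart in one table makes the kind of copy and paste slip mentioned in the commit easier to spot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mount.h>

/* Some libcs do not define this flag; the kernel value is bit 25. */
#ifndef MS_LAZYTIME
#define MS_LAZYTIME (1 << 25)
#endif

/* Hypothetical option table: each name either sets or clears one flag. */
struct mount_opt {
	const char *name;
	unsigned long flag;
	int set; /* 1: set the flag, 0: clear it */
};

static const struct mount_opt mount_opts[] = {
	{ "lazytime",      MS_LAZYTIME,    1 },
	{ "nolazytime",    MS_LAZYTIME,    0 },
	{ "strictatime",   MS_STRICTATIME, 1 },
	{ "nostrictatime", MS_STRICTATIME, 0 },
};

/* Apply one option string to *flags; returns 0 if known, -1 otherwise. */
int apply_mount_opt(const char *name, unsigned long *flags)
{
	size_t i;

	assert(name != NULL && flags != NULL);
	for (i = 0; i < sizeof(mount_opts) / sizeof(mount_opts[0]); i++) {
		if (strcmp(name, mount_opts[i].name) != 0)
			continue;
		if (mount_opts[i].set)
			*flags |= mount_opts[i].flag;
		else
			*flags &= ~mount_opts[i].flag;
		return 0;
	}
	return -1;
}
```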
4 years agoprocd: fix compile if procd-ujail is not selected
Daniel Golle [Wed, 15 Jul 2020 00:13:58 +0000 (01:13 +0100)]
procd: fix compile if procd-ujail is not selected

Generating syscall-names.h was added as a dependency for ujail in order
to support seccomp for OCI containers.
This, however, slipped into the wrong place and broke cmake in case
of procd-seccomp being selected but procd-ujail not being selected.
Move dependency to the right place to fix that.

Fixes: bb4a446 ("uxc: add container management CLI tool")
Reported-by: Paul Blazejowski <paulb@blazebox.homeip.net>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: fix false return in case of nofail mount
Daniel Golle [Mon, 13 Jul 2020 23:40:20 +0000 (00:40 +0100)]
jail: fix false return in case of nofail mount

In some cases mounts could still fail even though 'nofail' was set.
Make sure to always return successfully also in those cases.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agoprocd: add service instance watchdog
Daniel Bailey [Mon, 13 Jul 2020 22:05:31 +0000 (15:05 -0700)]
procd: add service instance watchdog

Added instance watchdog which will eventually either terminate
or respawn an instance depending on the instance respawn setting.

Added service ubus method 'watchdog' which services the watchdog
timer and allows updating the instance watchdog mode.

Two modes: disabled or passive.

Disabled: cancels watchdog timer set for a given instance.

Passive: sets an instance timer which must be serviced or the
instance will be stopped/restarted (dependent upon the instance
respawn value) when the timer expires.

Signed-off-by: Daniel Bailey <danielb@meshplusplus.com>
4 years agouxc: fix build with uClibc-ng
Daniel Golle [Mon, 13 Jul 2020 23:14:07 +0000 (00:14 +0100)]
uxc: fix build with uClibc-ng

Also here _GNU_SOURCE was missing.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agouxc: fix 'stop' command
Daniel Golle [Mon, 13 Jul 2020 09:10:05 +0000 (10:10 +0100)]
uxc: fix 'stop' command

The 'stop' command was requesting an invalid ubus method. Fix method
name to make 'stop' operation work.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: don't make mount source read-only
Daniel Golle [Mon, 13 Jul 2020 08:57:05 +0000 (09:57 +0100)]
jail: don't make mount source read-only

From mount(2):
Specifying mountflags as:
  MS_REMOUNT | MS_BIND | MS_RDONLY
will make access through this mountpoint read-only, without affecting
other mount points.
Hence use MS_BIND when remounting container rootfs read-only.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
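The flag combination quoted from mount(2) above could be used roughly as follows. This is a sketch only: remount_bind_readonly() is a hypothetical name, the path is supplied by the caller, and the call needs CAP_SYS_ADMIN and an existing mount point to succeed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Hypothetical helper (not ujail's actual function): make an existing
 * mount point read-only without touching other mounts of the same
 * filesystem. Returns 0 on success, -1 with errno set on failure.
 */
int remount_bind_readonly(const char *path)
{
	assert(path != NULL);
	/* source, filesystem type and data are ignored for this remount */
	return mount(NULL, path, NULL, MS_REMOUNT | MS_BIND | MS_RDONLY, NULL);
}
```

Without MS_BIND, the MS_REMOUNT | MS_RDONLY request applies to the filesystem itself, which is how the mount source ended up read-only as well.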
4 years agojail: refactor mount support to cover OCI spec
Daniel Golle [Sun, 12 Jul 2020 22:47:52 +0000 (23:47 +0100)]
jail: refactor mount support to cover OCI spec

Extend existing support for bind-mounts to allow arbitrary mounts
defined in OCI spec.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: memory allocation fixes
Daniel Golle [Sun, 12 Jul 2020 17:57:56 +0000 (18:57 +0100)]
jail: memory allocation fixes

Make sure envp and argv are allocated and NULL-terminated arrays.
Free opts before parent process quits, free everything but argv,
envp and seccomp filter before execv into user process.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
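The requirement that argv and envp be allocated, NULL-terminated arrays can be sketched like this. The names dup_strv() and free_strv() are hypothetical and not taken from jail.c; the vector stays NULL-terminated after every step so that cleanup is safe even when an allocation fails halfway.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* strdup() under strict -std= modes */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated string vector and all strings it owns. */
void free_strv(char **v)
{
	char **p;

	if (!v)
		return;
	for (p = v; *p; p++)
		free(*p);
	free(v);
}

/* Copy n strings into a newly allocated, NULL-terminated vector. */
char **dup_strv(const char *const *src, size_t n)
{
	char **v;
	size_t i;

	assert(src != NULL || n == 0);
	v = malloc((n + 1) * sizeof(*v));
	if (!v)
		return NULL;
	v[0] = NULL;
	for (i = 0; i < n; i++) {
		v[i] = strdup(src[i]);
		if (!v[i]) {
			/* v[i] is NULL here, so the vector is terminated */
			free_strv(v);
			return NULL;
		}
		v[i + 1] = NULL;
	}
	return v;
}
```

The result can be passed directly to execv() or execve(), which both expect the terminating NULL entry.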
4 years agojail: parse and run OCI hooks
Daniel Golle [Sun, 12 Jul 2020 14:18:58 +0000 (15:18 +0100)]
jail: parse and run OCI hooks

OCI run-time allows defining hooks to be executed at various stages of
the container life cycle. Parse hooks and run them accordingly.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: actually chdir into OCI defined CWD
Daniel Golle [Mon, 13 Jul 2020 11:11:32 +0000 (12:11 +0100)]
jail: actually chdir into OCI defined CWD

Current working directory was parsed but never applied. Apply it just
before executing user process.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: consider PATH for argv in OCI container
Daniel Golle [Mon, 13 Jul 2020 02:00:22 +0000 (03:00 +0100)]
jail: consider PATH for argv in OCI container

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: fix segfault with len(uidmap/gidmap) > 1
Daniel Golle [Sun, 12 Jul 2020 16:36:05 +0000 (17:36 +0100)]
jail: fix segfault with len(uidmap/gidmap) > 1

Allocate enough memory for all uidmap/gidmap entries.

Fixes: ea7a790 ("jail: add support for running OCI bundle")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agoprocd: fix compilation with uClibc-ng
Rosen Penev [Wed, 24 Jun 2020 23:48:54 +0000 (16:48 -0700)]
procd: fix compilation with uClibc-ng

_GNU_SOURCE was missing.

Also defined two macros unavailable with uClibc-ng.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
[resolved conflict in jail.c]
Signed-off-by: Petr Štetiar <ynezz@true.cz>
4 years agojail: use linux/capability.h instead of sys/capability.h
Daniel Golle [Sat, 11 Jul 2020 10:03:56 +0000 (11:03 +0100)]
jail: use linux/capability.h instead of sys/capability.h

Remove bogus build-dependency on libcap by using linux uapi header
and libc-provided syscall wrappers for capget/capset.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
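Reading capabilities without libcap might look like the sketch below. It is an assumption-laden illustration: the commit refers to libc-provided wrappers for capget/capset, whereas this example calls syscall(2) directly to stay independent of which libc declares them, and get_effective_caps() is an invented name. The header and data structures come from the Linux uapi header linux/capability.h.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* syscall() declaration */
#endif
#include <assert.h>
#include <linux/capability.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: fetch the effective capability set of the calling
 * thread as a 64-bit mask. Returns 0 on success, -1 on failure.
 */
int get_effective_caps(uint64_t *mask)
{
	struct __user_cap_header_struct hdr;
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

	assert(mask != NULL);
	memset(&hdr, 0, sizeof(hdr));
	memset(data, 0, sizeof(data));
	hdr.version = _LINUX_CAPABILITY_VERSION_3;
	hdr.pid = 0; /* 0 selects the calling thread */

	if (syscall(SYS_capget, &hdr, data) < 0)
		return -1;

	/* version 3 splits the 64 capability bits over two 32-bit words */
	*mask = ((uint64_t)data[1].effective << 32) | data[0].effective;
	return 0;
}
```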
4 years agoujail: add dependency on syscall-names-h
Daniel Golle [Sat, 11 Jul 2020 09:42:43 +0000 (10:42 +0100)]
ujail: add dependency on syscall-names-h

Makes sure syscall-names.h gets generated before trying to compile
ujail with OCI seccomp support.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: fix build on platforms without seccomp support
Daniel Golle [Fri, 10 Jul 2020 22:53:59 +0000 (23:53 +0100)]
jail: fix build on platforms without seccomp support

buildbots started failing due to -Werror=missing-declarations
for 'parseOCIlinuxseccomp' and 'applyOCIlinuxseccomp'.
Make sure functions were declared before defining compatibility stubs
for non-seccomp platforms.

Fixes: ea7a790 ("jail: add support for running OCI bundle")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agouxc: add container management CLI tool
Daniel Golle [Fri, 10 Jul 2020 09:57:23 +0000 (10:57 +0100)]
uxc: add container management CLI tool

As procd can now provide a fully featured container runtime using ujail,
add a (for now) simple CLI tool to list, add, delete, start and stop
OCI-compliant container bundles and to select whether they should be
launched on boot.
In future commits, this will be extended to provide state output, take
care of hooks, send signals and fetch remote container images in
accordance with the Open Container Initiative Runtime Specification.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: add support for running OCI bundle
Daniel Golle [Fri, 10 Jul 2020 09:56:58 +0000 (10:56 +0100)]
jail: add support for running OCI bundle

Prepare ujail for running OCI bundled Linux containers.
This adds handling of most of the JSON schema defined by the
Open Container Initiative Runtime Specification.

What is supported by this commit:
 * basic OCI process definition
 * seccomp filters (no args yet)
 * capabilities (100%)
 * namespaces (100%)
 * uid/gid mappings for userns (100%)
 * mounts (no free form mounts yet)
 * env (100%, limited to a low number entries)
 * hostname (100%)
 * terminal (no consoleSize yet)

What is still missing:
 * complex mounts
 * maskedPaths, readonlyPaths
 * referencing existing namespaces
 * all hooks
 * rlimits
 * oomScoreAdj
 * additionalGids
 * cgroups
 * devices
 * sysctl
 * rootfsPropagation
 * personality and bi-arch (ie. 32-bit container on 64-bit host)

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: handle containers seperately
Daniel Golle [Wed, 20 May 2020 14:26:08 +0000 (15:26 +0100)]
jail: handle containers seperately

To make the API cleaner and running containers less of a hidden
feature, offer a new ubus object 'containers' to handle container
operations similar to how services are handled.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: use sane termios settings for console pts
Daniel Golle [Wed, 20 May 2020 13:57:21 +0000 (14:57 +0100)]
jail: use sane termios settings for console pts

The previously used expression (inspired by LXC) didn't actually make
a lot of sense. Replace it with something inspired by a more recent
version of LXC...

Reported-by: Oldřich Jedlička <oldium.pro@gmail.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: add option to provide /dev/console to containers
Daniel Golle [Sun, 12 Apr 2020 21:35:25 +0000 (22:35 +0100)]
jail: add option to provide /dev/console to containers

Create UNIX/98 PTY, pass master fd to procd and setup mount-bind of
slave PTS device on /dev/console inside jail.
Allow attaching to an instance's console by using the newly introduced
ujail-console command (no multiplexing for now).

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: unnamed jails can not have netns (fix segfault)
Leonardo Mörlein [Fri, 8 May 2020 00:58:25 +0000 (02:58 +0200)]
jail: unnamed jails can not have netns (fix segfault)

Signed-off-by: Leonardo Mörlein <me@irrelefant.net>
4 years agojail: SIGSEGV must not be forwarded to the child process
Leonardo Mörlein [Fri, 8 May 2020 00:58:24 +0000 (02:58 +0200)]
jail: SIGSEGV must not be forwarded to the child process

A segfault in ujail caused ujail to hang with no chance to abort.
Raising the debug level revealed that SIGSEGV was delivered to
the child process instead of handled directly by ujail. The
corresponding debug message was triggered infinitely again and
again:

forwarding signal 11 to the jailed process
forwarding signal 11 to the jailed process
forwarding signal 11 to the jailed process
forwarding signal 11 to the jailed process
forwarding signal 11 to the jailed process
forwarding signal 11 to the jailed process
forwarding signal 11 to the jailed process
[...]

Signed-off-by: Leonardo Mörlein <me@irrelefant.net>
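The rule from the SIGSEGV commit above can be expressed as a small predicate. This is a hypothetical sketch, not the logic in jail.c: should_forward_signal() is an invented name, and the only exclusions shown are SIGSEGV (from the commit) plus SIGKILL and SIGSTOP, which cannot be caught and therefore can never reach a forwarding handler.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/*
 * Hypothetical policy helper: should the jail parent relay this signal
 * to the jailed child process?
 */
bool should_forward_signal(int signo)
{
	assert(signo > 0);
	switch (signo) {
	case SIGSEGV: /* a fault in the parent itself must be handled locally */
	case SIGKILL: /* cannot be caught, so it cannot be relayed */
	case SIGSTOP: /* cannot be caught either */
		return false;
	default:
		return true;
	}
}
```

Relaying SIGSEGV leaves the faulting instruction in the parent to be retried after the handler returns, which is consistent with the endless "forwarding signal 11" messages quoted above.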
4 years agojail: don't load libpreload-seccomp.so if it doesn't exist
Daniel Golle [Sat, 25 Apr 2020 09:24:35 +0000 (10:24 +0100)]
jail: don't load libpreload-seccomp.so if it doesn't exist

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: don't fail unless requirejail is set
Daniel Golle [Sat, 25 Apr 2020 08:48:46 +0000 (09:48 +0100)]
jail: don't fail unless requirejail is set

Pass requirejail attribute to ujail and only fail to start a service
which has seccomp policy defined on a system which doesn't have
procd-seccomp installed in case requirejail is set.

Fixes: bcb8655 ("instance: add 'requirejail' attribute")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: include /etc/nsswitch.conf in jail for glibc.
Daniel Golle [Sun, 19 Apr 2020 22:06:51 +0000 (23:06 +0100)]
jail: include /etc/nsswitch.conf in jail for glibc.

/etc/nsswitch.conf is needed to resolve usernames and groups from
/etc/passwd and /etc/group, name resolution and a bunch of other
things when using glibc.
Mount /etc/nsswitch.conf in jail when building against glibc.

Reported-by: Tobias Waldvogel <tobias.waldvogel@gmail.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: always mount /dev as additional tmpfs
Daniel Golle [Tue, 14 Apr 2020 14:46:03 +0000 (15:46 +0100)]
jail: always mount /dev as additional tmpfs

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: replace /etc/resolv.conf with symlink in extroot+overlay
Daniel Golle [Mon, 13 Apr 2020 01:03:53 +0000 (02:03 +0100)]
jail: replace /etc/resolv.conf with symlink in extroot+overlay

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: mount /sys read-only
Daniel Golle [Sun, 12 Apr 2020 20:39:05 +0000 (21:39 +0100)]
jail: mount /sys read-only

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: make /proc more secure
Daniel Golle [Sun, 12 Apr 2020 20:12:20 +0000 (21:12 +0100)]
jail: make /proc more secure

Make sure /proc/sys is read-only while keeping read-write access to
/proc/sys/net if spawning a new network namespace.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agoinstance: harmonize instance API
Daniel Golle [Sun, 12 Apr 2020 18:31:36 +0000 (19:31 +0100)]
instance: harmonize instance API

Move attributes in generated output to match their place in the
expected input.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: only mess with rootfs if CLONE_NEWNS was set
Daniel Golle [Sun, 12 Apr 2020 14:51:49 +0000 (15:51 +0100)]
jail: only mess with rootfs if CLONE_NEWNS was set

Avoid messing up rootfs of the parent/only mount namespace for the
unusual case of a jailed process which does use namespaces, but
doesn't make use of mount namespaces.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: add support for (ram-)overlayfs
Daniel Golle [Fri, 20 Mar 2020 18:21:43 +0000 (18:21 +0000)]
jail: add support for (ram-)overlayfs

Add support for running service with a read/write filesystem overlay.
This can either be a user-defined directory for persistency or reside
on a tmpfs with fixed size in the RAM.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: add support for userns and cgroupsns
Daniel Golle [Fri, 20 Mar 2020 18:20:51 +0000 (18:20 +0000)]
jail: add support for userns and cgroupsns

Add options to have jailed process inside new user namespace and
cgroups namespace.
Currently only the root user inside the container is mapped.
Also, mounting /proc currently still fails in the new user namespace
with permission denied for unknown reasons.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: add support for launching extroot containers
Daniel Golle [Fri, 20 Mar 2020 18:19:53 +0000 (18:19 +0000)]
jail: add support for launching extroot containers

Add option to ujail to use an existing rootfs when launching a
containerized service. Later on this option will also be used to
launch full-system containers.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: mount-bind /etc/resolv.conf for non-netns jails
Daniel Golle [Thu, 12 Mar 2020 21:54:19 +0000 (22:54 +0100)]
jail: mount-bind /etc/resolv.conf for non-netns jails

Many applications won't work without name resolution and expect
/etc/resolv.conf in place. While this is already handled for
netns-jails, simply mount-bind /etc/resolv.conf for non-netns-jails.

Signed-off-by: Daniel Golle <daniel@makrotoia.org>
4 years agoseccomp: fix resource leak
Kevin Darbyshire-Bryant [Tue, 11 Feb 2020 09:07:00 +0000 (09:07 +0000)]
seccomp: fix resource leak

Fix coverity reported resource leaks:

CID 1446217:    (RESOURCE_LEAK)
   Variable "filter" going out of scope leaks the storage it points to.

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
4 years agoinstance: add 'requirejail' attribute
Kevin Darbyshire-Bryant [Thu, 30 Jan 2020 17:35:06 +0000 (17:35 +0000)]
instance: add 'requirejail' attribute

Since commit b44417c ("instance: provide error feedback if ujail binary
is missing"), worrying log spam of the form "unable to find /sbin/jail ..."
may be encountered.

On systems not configured with jail capabilities the lack of jail binary
is not an error, whilst on systems with jail capabilities the warning
will be issued and the process is started outside of a jail.

This commit adds a new procd jail parameter 'requirejail' which, if set,
issues an error and does NOT start the process outside of a jailed
environment.

The original 'unable to find jail binary' warning is output in DEBUG
mode, thus processes started in a 'may jail' but non-jail capable
environment do not spam the log.

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
4 years agoprocd: show process's exit code
Ondřej Votava [Thu, 23 Jan 2020 14:31:31 +0000 (15:31 +0100)]
procd: show process's exit code

Adds feature to show exit code of processes launched by procd.
The exit code is shown for finished process when ubus's
service list method is called.

The exit code value is computed according to waitpid(2)
and http://tldp.org/LDP/abs/html/exitcodes.html

Signed-off-by: Ondřej Votava <ondrej.votava@cvut.cz>
4 years agostate: fix reboot causing shutdown inside LXC container
Petr Štetiar [Wed, 15 Jan 2020 19:28:38 +0000 (20:28 +0100)]
state: fix reboot causing shutdown inside LXC container

Executing `reboot` command in OpenWrt system running inside LXC container
results in a shutdown of the container instead of rebooting the
container.

This appears to have been caused by commit 832369078d81 ("state: fix
shutdown when running in a container (FS#2425)"), which exits PID 1
instead of calling reboot().

While at it, refactor the halting code into separate function to shorten
the switch/case block and make it clearer, decrease the indentation
level by reversing the container if condition, replace magic 0 with
EXIT_SUCCESS constant in exit() and make it wait 1s for reboot message
delivery in both container/host cases as well.

Ref: FS#2666
Cc: Paul Spooren <mail@aparcar.org>
Fixes: 832369078d81 ("state: fix shutdown when running in a container (FS#2425)")
Tested-by: Baptiste Jonglez <lede@bitsofnetworks.org>
Signed-off-by: Petr Štetiar <ynezz@true.cz>
4 years agoinstance: provide error feedback if ujail binary is missing
Petr Štetiar [Fri, 17 Jan 2020 15:21:51 +0000 (16:21 +0100)]
instance: provide error feedback if ujail binary is missing

Otherwise it's quite hard to track such issues.  While at it, be DRY and
use UJAIL_BIN_PATH constant for ujail binary.

Signed-off-by: Petr Štetiar <ynezz@true.cz>
4 years agojail: more strict mount options for /tmp/resolv.conf.d/
Daniel Golle [Fri, 3 Jan 2020 13:54:57 +0000 (15:54 +0200)]
jail: more strict mount options for /tmp/resolv.conf.d/

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: create resolv.conf symlink for netns jails
Daniel Golle [Fri, 3 Jan 2020 10:29:17 +0000 (12:29 +0200)]
jail: create resolv.conf symlink for netns jails

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: add basic support for network namespaces
Daniel Golle [Wed, 4 Dec 2019 13:06:06 +0000 (14:06 +0100)]
jail: add basic support for network namespaces

Add new 'netns' flag for procd_add_jail to make ujail setup a new
network namespace for the jailed service.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agoinstance: Fix instance_config_move_strdup() function
Daniel Golle [Sun, 19 Jan 2020 07:42:37 +0000 (09:42 +0200)]
instance: Fix instance_config_move_strdup() function

instance_config_move_strdup() previously returned too early in case of
a value being previously unassigned.

Fixes: 153820c ("instance: fix pidfile and seccomp attributes double free")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agoinstance: fix typo in error message
Petr Štetiar [Fri, 10 Jan 2020 21:56:31 +0000 (22:56 +0100)]
instance: fix typo in error message

Fixes `removed` to proper `remove` in "Failed to removed pidfile".

Fixes: b12bb150ed38 ("procd: service: Support writing pidfiles")
Signed-off-by: Petr Štetiar <ynezz@true.cz>
4 years agoinstance: fix pidfile and seccomp attributes double free
Petr Štetiar [Fri, 17 Jan 2020 17:22:37 +0000 (18:22 +0100)]
instance: fix pidfile and seccomp attributes double free

Commit a5af33ce9a16 ("instance: strdup string attributes") has
introduced duplication of various string attributes in order to fix
use-after-free, but missed handling of `pidfile` and `seccomp` attribute
cases in instance_config_move() where the new value of `pidfile` or
`seccomp` is being copied/assigned. The source of these values is then
free()d in subsequent call to instance_free() and then again for 2nd
time during the service stop command handling, leading to double free
crash:

 #0  unmap_chunk at src/malloc/malloc.c:515
 #1  free at src/malloc/malloc.c:526
 #2  instance_free (in=0xd5e300) at instance.c:1100
 #3  instance_delete (in=0xd5e300) at instance.c:559
 #4  instance_stop (in=0xd5e300, halt=true) at instance.c:611

While at it, add missing handling of jail.name and jail.hostname
attributes as well.

Ref: FS#2723
Fixes: a5af33ce9a16 ("instance: strdup string attributes")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Petr Štetiar <ynezz@true.cz>
4 years agoinstance: strdup string attributes
Daniel Golle [Sat, 4 Jan 2020 14:16:12 +0000 (16:16 +0200)]
instance: strdup string attributes

Previously string attributes were set to pointers returned by
blobmsg_get_string() which caused use-after-free problems.
Use strdup() to have copies of all stored strings and free them
during cleanup.

Reviewed-by: Petr Štetiar <ynezz@true.cz>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
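The ownership rule behind the strdup commit above can be shown with a reduced sketch. The struct and function names are hypothetical and far simpler than procd's instance code; the idea is that the stored string is always a private copy, so it stays valid after the message buffer it came from is released and is freed exactly once.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* strdup() under strict -std= modes */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for an instance's string attributes. */
struct instance_cfg {
	char *pidfile; /* owned copy, or NULL */
};

/* Store a private copy of value, releasing any previous copy. */
int cfg_set_pidfile(struct instance_cfg *cfg, const char *value)
{
	char *copy = NULL;

	assert(cfg != NULL);
	if (value) {
		copy = strdup(value);
		if (!copy)
			return -1;
	}
	free(cfg->pidfile);
	cfg->pidfile = copy;
	return 0;
}

/* Release owned strings; resetting the pointer guards against double free. */
void cfg_free(struct instance_cfg *cfg)
{
	assert(cfg != NULL);
	free(cfg->pidfile);
	cfg->pidfile = NULL;
}
```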
4 years agosystem: watchdog_set: fix misleading indentation
Petr Štetiar [Sun, 5 Jan 2020 10:48:35 +0000 (11:48 +0100)]
system: watchdog_set: fix misleading indentation

Fixes error reported by clang version 10.0.0-+20200102091410:

 system.c:367:4: error: misleading indentation; statement is not part of the previous 'if' [-Werror,-Wmisleading-indentation]
                  watchdog_timeout(timeout);
                  ^
 system.c:365:3: note: previous statement is here
                 if (timeout <= frequency)

Signed-off-by: Petr Štetiar <ynezz@true.cz>
4 years agosystem: sysupgrade: fix possibly misleading error
Petr Štetiar [Fri, 3 Jan 2020 00:26:50 +0000 (01:26 +0100)]
system: sysupgrade: fix possibly misleading error

Fix possibly misleading error "Firmware image is broken and cannot be
installed" which could be produced by JSON without expected validation
variables, where "Validation script provided invalid input" error message
makes more sense.

Cc: Rafał Miłecki <rafal@milecki.pl>
Tested-by: Kuan-Yi Li <kyli@abysm.org>
Signed-off-by: Petr Štetiar <ynezz@true.cz>
4 years agosystem: sysupgrade: rework firmware validation
Petr Štetiar [Mon, 30 Dec 2019 21:34:50 +0000 (22:34 +0100)]
system: sysupgrade: rework firmware validation

Fixes following deficiencies:

 * unhandled read() errors
 * everything bundled in one long function, which is hard to follow and
   reason about
 * JSON parser errors are being ignored, anything else than
   json_tokener_continue is fatal error
 * JSON parser errors are being output to stderr, thus invisible via SSH
 * validate_firmware_image_call can fail at a lot of places, but we just
   get one generic "Firmware image couldn't be validated" so it's hard
   to debug

Cc: Rafał Miłecki <rafal@milecki.pl>
Tested-by: Kuan-Yi Li <kyli@abysm.org>
Signed-off-by: Petr Štetiar <ynezz@true.cz>
4 years agosystem: fix failing image validation due to EINTR
Petr Štetiar [Sun, 29 Dec 2019 20:25:33 +0000 (21:25 +0100)]
system: fix failing image validation due to EINTR

It was quite common to see following error during sysupgrade on serial
console:

 Failed to parse JSON: 4

This happens because validate_firmware_image_call
fork()s and then waits in a blocking read() for input from the child
process. The child finishes its tasks and exits, emitting a SIGCHLD
signal, which interrupts the blocking read() in
the parent process with an EINTR error.

It seems like the recent fixes in the libubox library, particularly in
the jshn sub-component (which powers json_dump used in the shell
script executed by the child process) made the execution somewhat faster,
thus exposing this racy behaviour in validate_firmware_image_call, at
least on the RPi-4 (Cortex-A72) target.

So this patch fixes this issue by checking the read() return value and
retrying the read() if interrupted due to the EINTR error.
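
The retry described above can be sketched as follows. This is only an
illustration of the general pattern, not the actual patch; the function
name read_retry_eintr is made up for this example.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Read up to len bytes from fd, restarting the call whenever a signal
 * (for example SIGCHLD from an exiting child) interrupts it.
 * Returns the number of bytes read, 0 on EOF, or -1 on any other error.
 */
ssize_t read_retry_eintr(int fd, void *buf, size_t len)
{
	ssize_t n;

	do {
		n = read(fd, buf, len);
	} while (n < 0 && errno == EINTR);

	return n;
}
```

A caller would still loop until EOF to collect the complete output of
the child, since a single read() may return only part of it.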

Ref: http://lists.infradead.org/pipermail/openwrt-devel/2020-January/020994.html
Fixes: e990e215e8a3 ("system: add "validate_firmware_image" ubus method")
Cc: Rafał Miłecki <rafal@milecki.pl>
Tested-by: Kuan-Yi Li <kyli@abysm.org>
Tested-by: Petr Novák <petrn@me.com>
Reported-by: Petr Novák <petrn@me.com>
Reviewed-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Petr Štetiar <ynezz@true.cz>
4 years agocmake: fix lookup of external libraries
Petr Štetiar [Sun, 29 Dec 2019 14:56:43 +0000 (15:56 +0100)]
cmake: fix lookup of external libraries

In order to make it compile properly in more environments.

Tested-by: Petr Novák <petrn@me.com>
Tested-by: Kuan-Yi Li <kyli@abysm.org>
Signed-off-by: Petr Štetiar <ynezz@true.cz>
4 years agojail: remove accidentally added lines
Daniel Golle [Mon, 30 Dec 2019 18:22:45 +0000 (20:22 +0200)]
jail: remove accidentally added lines

The previous commit accidentally added unrelated lines which broke
the build. Remove them.

Fixes: 2c5c19 ("jail: set user and group inside jail")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
4 years agojail: set user and group inside jail
Daniel Golle [Sun, 29 Dec 2019 14:23:34 +0000 (16:23 +0200)]
jail: set user and group inside jail

This allows jailed services to run as users other than root, simply
because some services refuse to be run as UID 0.
Previously, setting the process UID and GID before launching the
jail wrapper prevented the jail from starting.
Rather than setting them in procd/service.c, pass user and group
parameters to ujail and set them inside ujail just before executing the
service.
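
As a rough sketch of what "set them just before executing the service"
involves, assuming plain setgid()/setuid() calls: the helper name
set_ids_before_exec is hypothetical and the real ujail code may differ,
for instance in how it handles supplementary groups.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Switch the calling process to the given group and user.
 * The group is changed first: once the UID is no longer 0, the process
 * may lack the privilege to change its GID.
 * Supplementary groups are deliberately not handled in this sketch.
 * Returns 0 on success, -1 on failure.
 */
int set_ids_before_exec(uid_t uid, gid_t gid)
{
	if (setgid(gid) < 0)
		return -1;
	if (setuid(uid) < 0)
		return -1;
	return 0;
}
```

The caller would invoke this after the jail has been set up and
immediately before the exec of the service binary.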

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
5 years agosystem: sysupgrade: close input side of pipe before reading
Dustin Lundquist [Mon, 28 Oct 2019 16:52:06 +0000 (16:52 +0000)]
system: sysupgrade: close input side of pipe before reading

When /usr/libexec/validate_firmware_image is not present on the system
procd will hang indefinitely on the read() since the input side of the
pipe is still open.

Also fix pipe file descriptor leak when fork() fails.
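
Both fixes can be illustrated with a minimal sketch. It assumes a helper
that writes its result to stdout; the function name read_helper_output
is made up for this example and is not taken from procd.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run the program at path with its stdout connected to a pipe and copy
 * up to len bytes of its output into buf.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t read_helper_output(const char *path, char *buf, size_t len)
{
	int fds[2];
	pid_t pid;
	ssize_t n, total = 0;

	if (pipe(fds) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		/* fork() failed: close both ends so no descriptor leaks */
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* child: send stdout into the pipe, then run the helper */
		close(fds[0]);
		dup2(fds[1], STDOUT_FILENO);
		close(fds[1]);
		execl(path, path, (char *)NULL);
		_exit(127); /* exec failed, e.g. the helper is missing */
	}

	/*
	 * parent: close our copy of the write end before reading.
	 * Otherwise read() never sees EOF after the child has exited.
	 */
	close(fds[1]);

	while ((size_t)total < len) {
		n = read(fds[0], buf + total, len - total);
		if (n < 0 && errno == EINTR)
			continue;
		if (n < 0)
			total = -1;
		if (n <= 0)
			break;
		total += n;
	}

	close(fds[0]);
	waitpid(pid, NULL, 0);
	return total;
}
```

With the write end closed in the parent, a missing helper results in an
immediate EOF (zero bytes read) instead of an indefinite hang.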

Signed-off-by: Dustin Lundquist <d.lundquist@temperednetworks.com>
Acked-by: Rafał Miłecki <rafal@milecki.pl>
5 years agoinstance: Warn about unexpected number of parameters
Hauke Mehrtens [Fri, 1 Nov 2019 16:16:39 +0000 (17:16 +0100)]
instance: Warn about unexpected number of parameters

Warn when the number of allocated parameters for the jail argv does not
match the number of used parameters. This normally leads to a buffer
overflow.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
5 years agoinstance: ujail: Fix allocated size for no_new_privs parameter
Hauke Mehrtens [Fri, 1 Nov 2019 16:16:38 +0000 (17:16 +0100)]
instance: ujail: Fix allocated size for no_new_privs parameter

When the no_new_privs parameter is given, the size of the array which
contains the argv pointers is not increased in instance_jail_parse()
which causes a buffer overflow. Fix this by requesting one more entry in
instance_jail_parse() for the allocation.
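
The class of bug can be sketched as follows: the slot count and the code
that fills argv must agree for every option. The struct, the function
names and the "-n" option are assumptions made for this illustration;
only the "-c" flag is taken from the commit referenced below.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical subset of jail options, for illustration only. */
struct jail_opts {
	const char *name;   /* emitted as "-n <name>" (assumed) */
	bool no_new_privs;  /* emitted as "-c" */
};

/* Count the argv slots needed; every emitted entry must be counted. */
int jail_argc(const struct jail_opts *o)
{
	int argc = 2; /* program path + terminating NULL */

	if (o->name)
		argc += 2;
	if (o->no_new_privs)
		argc += 1; /* omitting this line under-allocates argv */

	return argc;
}

/* Build argv; returns NULL on allocation failure or count mismatch. */
char **jail_argv(const struct jail_opts *o)
{
	int argc = jail_argc(o);
	int used = 0;
	char **argv = calloc(argc, sizeof(*argv));

	if (!argv)
		return NULL;

	argv[used++] = "/sbin/ujail";
	if (o->name) {
		argv[used++] = "-n";
		argv[used++] = (char *)o->name;
	}
	if (o->no_new_privs)
		argv[used++] = "-c";
	argv[used++] = NULL;

	/* sanity check: allocated and used slot counts must match */
	if (used != argc) {
		fprintf(stderr, "argv: allocated %d, used %d\n", argc, used);
		free(argv);
		return NULL;
	}

	return argv;
}
```

Note that a check placed after filling the array can only report the
mismatch; the out-of-bounds write has already happened by then.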

Fixes: dfd5816bcbef ("instance, ujail: wire no_new_privs (-c) option")
Cc: Etienne CHAMPETIER <champetier.etienne@gmail.com>
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
5 years agoprocd: simplify code in procd_inittab_run
Michael Heimpold [Tue, 1 Jan 2019 23:44:53 +0000 (00:44 +0100)]
procd: simplify code in procd_inittab_run

This is an attempt to make it more obvious what the historically
grown code is actually doing.

Signed-off-by: Michael Heimpold <mhei@heimpold.de>
5 years agoprocd: replace exit(-1) with exit(EXIT_FAILURE)
Michael Heimpold [Tue, 1 Jan 2019 23:44:59 +0000 (00:44 +0100)]
procd: replace exit(-1) with exit(EXIT_FAILURE)

Signed-off-by: Michael Heimpold <mhei@heimpold.de>
5 years agoprocd: add upgraded binary to .gitignore
Michael Heimpold [Tue, 1 Jan 2019 23:44:58 +0000 (00:44 +0100)]
procd: add upgraded binary to .gitignore

Signed-off-by: Michael Heimpold <mhei@heimpold.de>
5 years agoprocd: add start-console support
Michael Heimpold [Tue, 1 Jan 2019 23:44:57 +0000 (00:44 +0100)]
procd: add start-console support

This adds a hotplug function to (re-)start inittab entries with askfirst or respawn.

At the moment the devices used with these actions must be present during boot,
otherwise such lines are skipped.

However, this prevents having inittab entries with consoles for e.g. USB gadget
devices which only appear after kernel module loading and after configuring them
with configfs.

While it would be possible to scan the inittab only for the desired item to start,
I assume the inittab is short and the cost of re-running the whole list is negligible.

Signed-off-by: Michael Heimpold <mhei@heimpold.de>
5 years agoprocd: shift arguments for askfirst only once
Michael Heimpold [Tue, 1 Jan 2019 23:44:56 +0000 (00:44 +0100)]
procd: shift arguments for askfirst only once

In case we want to process an inittab item multiple times (e.g. in case
of hotplugging) we must not shift the arguments for askfirst multiple
times. So check whether we already did it.

Signed-off-by: Michael Heimpold <mhei@heimpold.de>
5 years agoprocd: skip respawn in case device disappeared
Michael Heimpold [Tue, 1 Jan 2019 23:44:55 +0000 (00:44 +0100)]
procd: skip respawn in case device disappeared

Signed-off-by: Michael Heimpold <mhei@heimpold.de>
5 years agoprocd: guard fork_worker calls
Michael Heimpold [Tue, 1 Jan 2019 23:44:54 +0000 (00:44 +0100)]
procd: guard fork_worker calls

Usually respawn(), askfirst(), askconsole() and rcrespawn() are run only
one time to start a worker child for the given inittab entry.

In case we want to allow calling these functions several times, we need
to ensure that we do not start multiple workers at the same time for the
same inittab item.

For this, we can re-use the remembered pid of the worker child,
however, we need to reset this pid to allow a new instance in case the
previous child exited.
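
A minimal sketch of this guard, under simplifying assumptions: the
struct and function names are invented for the example, the child does
no real work, and the exit is collected with a blocking waitpid()
rather than from an event-loop callback.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-entry state: pid is 0 while no worker is running. */
struct worker {
	pid_t pid;
};

/*
 * Start a worker unless one is already running for this entry.
 * Returns 1 if a child was started, 0 if one was already running,
 * and -1 if fork() failed.
 */
int worker_start(struct worker *w)
{
	pid_t pid;

	if (w->pid > 0)
		return 0; /* guard: do not start a second instance */

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0); /* child: stand-in for the real work */

	w->pid = pid; /* remember the child for later calls */
	return 1;
}

/* Collect the exited child and reset pid so a new one may start. */
void worker_reap(struct worker *w)
{
	if (w->pid > 0 && waitpid(w->pid, NULL, 0) == w->pid)
		w->pid = 0;
}
```

Calling worker_start() twice in a row starts only one child; after
worker_reap() the entry can be started again.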

Signed-off-by: Michael Heimpold <mhei@heimpold.de>
5 years agoprocd: Add cached and available to memory table
Zachary Cook [Tue, 8 Oct 2019 05:02:50 +0000 (01:02 -0400)]
procd: Add cached and available to memory table

Provides a better measure of actual system memory usage for Luci/users.
"cached" will be used to add a new progress bar, "available" is the
kernel's estimate of memory that is actually useable, and is more
accurate than (memory.free + memory.buffered) that Luci currently uses
to calculate available memory.

Signed-off-by: Zachary Cook <zachcook1991@gmail.com>
5 years agoprocd: Switch to nanosleep
Rosen Penev [Sun, 1 Sep 2019 20:26:43 +0000 (13:26 -0700)]
procd: Switch to nanosleep

usleep has been deprecated by POSIX.1-2001 and removed in POSIX.1-2008.
Fixes compilation when libc does not include usleep (optional with
uClibc-ng).

nanosleep also has the advantage of being more accurate.
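
A drop-in replacement for a usleep() call could look roughly like this.
The helper name sleep_usec is an assumption for this sketch and is not
necessarily how the patch itself is structured.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <time.h>

/*
 * Sleep for usec microseconds using nanosleep().
 * If a signal interrupts the sleep, continue with the remaining time.
 * Returns 0 on success, -1 on error.
 */
int sleep_usec(unsigned long usec)
{
	struct timespec ts = {
		.tv_sec = usec / 1000000UL,
		.tv_nsec = (long)(usec % 1000000UL) * 1000L,
	};

	/* on EINTR nanosleep() stores the remaining time back into ts */
	while (nanosleep(&ts, &ts) < 0) {
		if (errno != EINTR)
			return -1;
	}

	return 0;
}
```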

Signed-off-by: Rosen Penev <rosenp@gmail.com>
5 years agosystem: Fix possible integer overflows
Hauke Mehrtens [Fri, 13 Sep 2019 20:04:03 +0000 (22:04 +0200)]
system: Fix possible integer overflows

This multiplication was previously done on 32 bit integers. Explicitly
cast the operands to 64 bit values first to make sure the multiplication
is done on 64 bit numbers.
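
The difference can be shown with a small sketch. The function names and
the memory-size use case are assumptions chosen for illustration.

```c
#include <stdint.h>

/* Cast before multiplying so the product is computed in 64 bits. */
uint64_t mem_bytes(uint32_t units, uint32_t unit_size)
{
	return (uint64_t)units * (uint64_t)unit_size;
}

/*
 * Without the casts the product is computed in 32 bits and wraps
 * around before it is widened to the 64 bit return type.
 */
uint64_t mem_bytes_truncated(uint32_t units, uint32_t unit_size)
{
	return units * unit_size;
}
```

For 1048576 units of 4096 bytes, the first function yields 4294967296
while the second wraps around to 0.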

Coverity: #1412417, #1412410, #1412409, #1412411, #1412424, #1412407
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
5 years agosystem: sysupgrade: send reply on error
Rafał Miłecki [Wed, 11 Sep 2019 09:21:59 +0000 (11:21 +0200)]
system: sysupgrade: send reply on error

This provides some meaningful info on why sysupgrade has failed.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Jo-Philipp Wich <jo@mein.io>
5 years agosystem: refuse sysupgrade with backup if it's unsupported
Rafał Miłecki [Wed, 11 Sep 2019 08:34:41 +0000 (10:34 +0200)]
system: refuse sysupgrade with backup if it's unsupported

Don't allow it if validation methods marked firmware as not supporting a
backup.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
5 years agosysupgrade: support "backup" attribute
Rafał Miłecki [Wed, 11 Sep 2019 06:58:15 +0000 (08:58 +0200)]
sysupgrade: support "backup" attribute

This new attribute allows passing the path of the backup archive. It
provides much more flexibility than hardcoding /tmp/sysupgrade.tgz. It
may help avoid some cp/mv for a user-provided backup archive.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
5 years agosysupgrade: set UPGRADE_BACKUP env variable
Rafał Miłecki [Thu, 5 Sep 2019 07:20:13 +0000 (09:20 +0200)]
sysupgrade: set UPGRADE_BACKUP env variable

It points to the backup file to use during the sysupgrade process. Right
now it's hardcoded to /tmp/sysupgrade.tgz. Once all cleanups are in
place, the "sysupgrade" ubus method should be extended to allow passing
any custom path.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
5 years agosystem: fix uninitialized variables in firmware validation code
Rafał Miłecki [Thu, 5 Sep 2019 21:07:21 +0000 (23:07 +0200)]
system: fix uninitialized variables in firmware validation code

This fixes:
system.c: In function 'validate_firmware_image':
system.c:403:6: error: 'fd' may be used uninitialized in this function [-Werror=maybe-uninitialized]
   if (fd >= 0) {
      ^
system.c:446:4: error: 'jsobj' may be used uninitialized in this function [-Werror=maybe-uninitialized]
    blobmsg_add_object(&b, jsobj);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: e990e215e8a3 ("system: add "validate_firmware_image" ubus method")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
5 years agosystem: reject sysupgrade of invalid firmware images by default
Rafał Miłecki [Wed, 4 Sep 2019 09:06:52 +0000 (11:06 +0200)]
system: reject sysupgrade of invalid firmware images by default

This validation step can be bypassed by passing the "force" argument.
This is very similar to the /sbin/sysupgrade behavior and --force.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
5 years agosystem: reject sysupgrade of broken firmware images
Rafał Miłecki [Fri, 30 Aug 2019 15:46:07 +0000 (17:46 +0200)]
system: reject sysupgrade of broken firmware images

This uses the recently added "validate_firmware_image" to validate the
passed firmware. If it happens to be invalid and marked as impossible to
force, then sysupgrade simply exits with an error.

This change is needed to avoid bricking devices with some totally broken
images.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
5 years agosystem: add "validate_firmware_image" ubus method
Rafał Miłecki [Fri, 30 Aug 2019 07:28:34 +0000 (09:28 +0200)]
system: add "validate_firmware_image" ubus method

This new method allows validating a firmware image (stored on a device)
using ubus. It uses a new executable helper that provides detailed info
about the firmware image.

The point of this method is to allow user interfaces to provide various
info before starting the actual upgrade process.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>