The Linux version of core used a part of the BSS to simulate access to
physical memory. All dataspaces would refer to a portion of 'some_mem'.
So every time when core would access the dataspace content, it would
access its local BSS. For all processes outside of core, dataspaces were
represented as files. This patch removes the distinction between core
and non-core processes. Now, core uses the same 'Rm_session_mmap'
implementation as regular processes. This way, the 'some_mem' could be
abandoned. We still use BSS variable for allocating core-local meta
data through.
This patch improves the life-time management of socket descriptors and
addresses several corner cases exposed by the 'bomb' test.
The lookup and association of file descriptors with global IDs have been
turned into an atomic operation. Otherwise, multiple threads interacting
with the singleton 'ep_sd_registry' may override each other's
associations.
Closing the socket pair used for the reply channel has been implemented
via the RAII pattern to capture all corner cases, in particular
exceptions.
If blocking operations are interrupted by signals, we throw a
'Blocking_canceled' exception.
We preserve core's socket descriptor at 'PARENT_SOCKET_HANDLE' to avoid
a corner case where the parent capability is going to dup2'ed to the
same handle.
Support for 'Thread_base::join' within core to enable leaving Genode via
Control-C.
With this patch, custom UIDs and GIDs can be assigned to individual
Genode processes or whole Genode subsystems.
The new 'base-linux/run/lx_uid.run' script contains an example of how to
use the feature.
Fixes#510
On Linux, we want to attach additional attributes to processes, i.e.,
the chroot location, the designated UID, and GID. Instead of polluting
the generic code with such Linux-specific platform details, I introduced
the new 'Native_pd_args' type, which can be customized for each
platform. The platform-dependent policy of init is factored out in the
new 'pd_args' library.
The new 'base-linux/run/lx_pd_args.run' script can be used to validate
the propagation of those attributes into core.
Note that this patch does not add the interpretation of the new UID and
PID attributes by core. This will be subject of a follow-up patch.
Related to #510.
This ensures that the cwd of the process is within the chroot
environment, improving security for root processes.
The cwd after the chroot is the same as before, this is needed to
start binaries given as relative path name.
Using the new 'join()' function, the caller can explicitly block for the
completion of the thread's 'entry()' function. The test case for this
feature can be found at 'os/src/test/thread_join'. For hybrid
Linux/Genode programs, the 'Thread_base::join()' does not map directly
to 'pthread_join'. The latter function gets already called by the
destructor of 'Thread_base'. According to the documentation, subsequent
calls of 'pthread_join' for one thread may result in undefined behaviour.
So we use a 'Genode::Lock' on this platform, which is in line with the
other platforms.
Related to #194, #501
When an IPC server is finalized two important things should happen:
First, the association of the server socket with a capability must be
invalidated. And finally, the server socket pair (server side and client
side) must be closed.
Related to #38.
This patch fixes the 'lx_hybrid_pthread_ipc.run' test. In order to use
the 'Genode::Lock' we need to set the SIGUSR1 handler to an empty handler.
Normally, this happens when creating a thread via the Genode API. But as
this test creates a thread via the pthread library and thereby bypasses
the Genode API, the signal handler remained unset.
Using the host compiler in this case seems to be an artifact from an
older change. On x86_64, this approach ended in unsable hybrid binaries
due to incompatible handling of non-trivial return values, i.e.
structures. See '-freg-struct-return' in GCC manual page:
"[...] If there is no standard convention, GCC defaults to
-fpcc-struct-return, except on targets where GCC is the principal
compiler. In those cases, we can choose the standard, and we chose
the more efficient register return alternative."
In other words: All x86_64 Linux systems break the ABI standard :-(
The thread ID reported to core was not always initialized prior the RPC
call. The 'startup_lock' ensures that the thread is completely
initialized before this information gets propagated.
Since the recent move of the process creation into core, the original chroot trampoline
mechanism implemented in 'os/src/app/chroot' does not work anymore. A
process could simply escape the chroot environment by spawning a new
process via core's PD service. Therefore, this patch moves the chroot
support into core. So the chroot policy becomes mandatory part of the
process creation. For each process created by core, core checks for
'root' argument of the PD session. If a path is present, core takes the
precautions needed to execute the new process in the specified chroot
environment.
This conceptual change implies minor changes with respect to the Genode
API and the configuration of the init process. The API changes are the
enhancement of the 'Genode::Child' and 'Genode::Process' constructors to
take the root path as argument. Init supports the specification of a
chroot per process by specifying the new 'root' attribute to the
'<start>' node of the process. In line with these changes, the
'Loader::Session::start' function has been enhanced with the additional
(optional) root argument.
When building in hybrid Linux/Genode mode, there exist two definitions
of 'size_t', one in the 'Genode' namespace and one imported from the
glibc headers.
On Linux, we use the session label for naming the corresponding Linux
process. When looking up the processes via 'ps', the Genode process
hierarchy becomes immediately visible.
Genode used to create new processes by directly forking from the
respective Genode parent using the process library. The forking process
created a PD session at core merely for propagating the PID of the new
process into core (for later destruction). This traditional mechanisms
has the following disadvantages:
First, the PID reported by the creating process to core cannot easily be
validated by core. Therefore core has to trust the PD client to not
specify a PID of an existing process, which would happen to be killed
once the PD session gets destructed. This problem is documented by
issue #318. Second, there is no way for a Genode process to detect the
failure of its any grandchildren. The immediate parent of a faulting
process could use the SIGCHLD-and-waitpid mechanism to observe its
children but this mechanism does not work transitively.
By performing the process creation exclusively within core, all Genode
processes become immediate child processes of core. Hence, core can
respond to failures of any of those processes and reflect such
conditions via core's session interfaces. Furthermore, the PID
associated to a PD session is locally known within core and cannot be
forged anymore. In fact, there is actually no need at all to make
processes aware of any PIDs of other processes.
Please note that this patch breaks the 'chroot' mechanism that comes in
the form of the 'os/src/app/chroot' program. Because all processes are
forked from core, a chroot'ed process could sneak outside its chroot
environment by just creating a new Genode process. To address this
issue, the chroot mechanism must be added to core.
This patch simplifies the system call bindings. The common syscall
bindings in 'src/platform/' have been reduced to the syscalls needed by
non-core programs. The additional syscalls that are needed solely by
core have been moved to 'src/core/include/core_linux_syscalls.h'.
Furthermore, the resource path is not used outside of core anymore.
Hence, we could get rid of the rpath library. The resource-path code has
been moved to 'src/core/include/resource_path.h'. The IPC-related parts
of 'src/platform' have been moved to the IPC library. So there is now a
clean separation between low-level syscall bindings (in 'src/platform')
and higher-level code.
The code for the socket-descriptor registry is now located in the
'src/base/ipc/socket_descriptor_registry.h' header. The interface is
separated from 'ipc.cc' because core needs to access the registry from
outside the ipc library.
This patch changes the way of how dataspace content is accessed by
processes outside of core. Dataspaces are opened by core only and the
corresponding file descriptors are handed out the other processes via
the 'Linux_dataspace::fd()' RPC function. At the client side, the
returned file descriptor is then used to mmap the file.
Consequently, this patch eliminates all files from 'lx_rpath'. The
path is still needed by core to temporarily create dataspaces and
unix domain sockets. However, those files are unlinked immediately
after their creation.
This patch alleviates the need for any non-core process to create Unix
domain sockets locally. All sockets used for RPC communication are
created by core and subsequently passed to the other processes via RPC
or the parent interface. The immediate benefit is that no process other
than core needs to access the 'rpath' directory in order to communicate.
However, access to 'rpath' is still needed for accessing dataspaces.
Core creates one socket pair per thread on demand on the first call of
the 'Linux_cpu_session::server_sd()' or 'Linux_cpu_session::client_sd()'
functions. 'Linux_cpu_session' is a Linux-specific extension to the CPU
session interface. In addition to the socket accessors, the extension
provides a mechanism to register the PID/TID of a thread. Those
information were formerly propagated into core along with the thread
name as argument to 'create_thread()'.
Because core creates socket pairs for entrypoints, it needs to know all
threads that are potential entrypoints. For lx_hybrid programs, we
hadn't had propagated any thread information into core, yet. Hence, this
patch also contains the code for registering threads of hybrid
applications at core.
This patch eliminates the thread ID portion of the 'Native_capability'
type. The access to entrypoints is now exclusively handled by passing
socket descripts over Unix domain sockets and by inheriting the socket
descriptor of the parent entrypoint at process-creation time.
Each entrypoint creates a socket pair. The server-side socket is bound
to a unique name defined by the server. The client-side socket is then
connected to the same name. Whereas the server-side socket is meant to
be exclusively used by the server to wait for incoming requests, the
client-side socket can be delegated to other processes as payload of RPC
messages (via SCM rights). Anyone who receives a capability over RPC
receives the client-side socket of the entrypoint to which the
capability refers. Given this socket descriptor, the unique name (as
defined by the server) can be requested using 'getpeername'. Using this
name, it is possible to compare socket descriptors, which is important
to avoid duplicates from polluting the limited socket-descriptor name
space.
Wheras this patch introduces capability-based delegation of access
rights to entrypoints, it does not cover the protection of the integrity
of RPC objects. RPC objects are still referenced by a global ID passed
as normal message payload.
This patch adds prinicipal support for transmitting socket descriptors
as RPC payload. Socket descriptors are handled by the linux-specific
implementation of the capability marshalling and unmarshalling functions
in 'ipc.h'. The 'Message' type in 'src/platform/linux_socket.h' has been
extended to carry multiple descriptors in a single message.
Unfortuately, we hit a problem (and potential show stopper) here:
lx_sendmsg failed with -109 in lx_call()
The error code corresponds to ETOOMANYREFS. There is only one place in
the Linux kernel where this error code is used (net/unix/af_unix.c).
The code for 'unix_attach_fds()' suggests that there is a limit with
regard to the maximum number of references for a given Unix domain
socket. When the error occurs, core and init are running. The socket
of core's server entrypoint is present in the '/proc/pid/fd' of those
processes 8 times. The error occurs when core tries to perform an
RPC to the entrypoint to perform 'Ram_session::transfer_quota()'
(base/include/base/child.h at line 248).
By storing the reply socket descriptor inside the 'Ipc_ostream::_dst'
capability instead as part of the connection state object, we can
use the 'explicit_reply' mechanism as usual. Right now, we store
both the tid and socket handle in 'Native_capability::Dst'. In the
final version, the 'tid' member will be gone.
In the final version, the 'socket' will be the only member to remain in
the 'Dst' time. In the transition phase, we store both the old 'tid' and
the 'socket'.
This patch, which was originally created by Christian Helmuth,
represents the first step towards using SCM rights as capability
mechanism on Linux. It employs the SCM rights mechanism for transmitting
a reply capability to the server as argument of each IPC call. The
server will then send its respond to this reply file descriptor. This
way, the reply channel does not need to be globally visible anymore.
By now all services in core where created, and registered in the generic
main routine. Although there exists already a x86-specific service (I/O ports)
there was no possibility to announce core-services for certain platforms only.
This commit introduces a hook function in the 'Platform' class, that enables
registration of platform-specific services. Moreover, the io-port service
is offered on x86 platforms only now.
GCC warns about uninitialized local variables in cases where no
initialization is needed, in particular in the overloads of the
'Capability::call()' function. Prior this patch, we dealt with those
warnings by using an (unreliable) GCC pragma or by disabling the
particular warning altogether (which is a bad idea). This patch removes
the superfluous warnings by telling the compiler that the variable in
question is volatile.
This patch introduces the functions 'affinity' and 'num_cpus' to the CPU
session interface. The interface extension will allow the assignment of
individual threads to CPUs. At this point, it is just a stub with no
actual platform support.
Setting the handler for SIGCHLD to SIG_IGN (ignore) informs the kernel
not to enter the zombie state: (man 2 wait)
POSIX.1-2001 specifies that if the disposition of SIGCHLD is set to
SIG_IGN or the SA_NOCLDWAIT flag is set for SIGCHLD (see
sigaction(2)), then children that terminate do not become zombies
[...]
Fixes#271.
Without this patch the compilation failed with:
/usr/bin/ld: main.o: relocation R_X86_64_32S against
`vtable for Genode::Dataspace' can not be used when making a shared object;
recompile with -fPIC
main.o: could not read symbols: Bad value
collect2: ld returned 1 exit status
make[6]: *** [init] Error 1
For this patch the use of the hardening tool chain must be indicated
using the "hardening_tool_chain" SPECS entry within the file
<build>/etc/specs.conf
Fixes#79
This patch extends the RAM session interface with the ability to
allocate DMA buffers. The client specifies the type of RAM dataspace to
allocate via the new 'cached' argument of the 'Ram_session::alloc()'
function. By default, 'cached' is true, which correponds to the common
case and the original behavior. When setting 'cached' to 'false', core
takes the precautions needed to register the memory as uncached in the
page table of each process that has the dataspace attached.
Currently, the support for allocating DMA buffers is implemented for
Fiasco.OC only. On x86 platforms, it is generally not needed. But on
platforms with more relaxed cache coherence (such as ARM), user-level
device drivers should always use uncacheable memory for DMA transactions.
With this patch clients of the RM service can state if they want a mapping
to be executable or not. This allows dataspaces to be mapped as
non-executable on Linux by default and as executable only if needed.
Partially fixes#176.