Invoking a system call from within the kernel
Demi Marie Obenour
demiobenour at gmail.com
Sat Nov 18 13:15:27 EST 2017
On Thu, Nov 16, 2017 at 10:54:24AM +0100, Greg KH wrote:
> On Wed, Nov 15, 2017 at 09:16:35PM -0500, Demi Marie Obenour wrote:
> > I am looking to write my first driver. This driver will create a single
> > character device, which can be opened by any user. The device will
> > support one ioctl:
> >
> > long ioctl_syscall(int fd, long syscall, long args[6]);
> >
> > This is simply equivalent to:
> >
> > syscall(syscall, args[0], args[1], args[2], args[3], args[4],
> >         args[5]);
>
> Wait, why? Why do you want to do something like this, what problem are
> you trying to solve that you feel that something like this is the
> solution? Let's step back and see if there isn't a better way to do
> this.
>

You are correct that there is a different problem that I really want to
solve.

Here is the different problem: I want to have a new device (let's call
it `/dev/async_syscall`), with root:root owner and 0600 permissions.
When the user opens the device, the returned file descriptor can be used
to submit an async syscall request using the following ioctl:

    /* Fixed-size types to avoid a 32-bit compat layer */
    struct linux_async_syscall {
            __u64 syscall;
            __u64 args[6];
            __u64 user1;
            __u64 user2;
    };

    /* arguments is really a struct linux_async_syscall * */
    /* n_syscalls is really a size_t */
    /* num_succeeded is really an int * */
    int ioctl(int fd, LINUX_ASYNC_SYSCALL, __u64 n_syscalls,
              __u64 arguments, __u64 num_succeeded);

Here `arguments` points to an array of `n_syscalls` entries of type
`struct linux_async_syscall`, and `num_succeeded` is a pointer to an
`int` that receives the number of successfully submitted system calls.
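
As a rough illustration of how userspace might fill in one request, here
is a minimal sketch. It assumes `uint64_t` as a stand-in for `__u64`, and
the helper `make_write_request` is a hypothetical name, not part of any
existing API. Using `user1` as a caller-chosen tag is also an assumption;
the proposal does not say what the two user fields are for.

```c
#include <assert.h>        /* static_assert */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>   /* SYS_write */

/* Userspace mirror of the proposed struct; uint64_t stands in for __u64. */
struct linux_async_syscall {
        uint64_t syscall;
        uint64_t args[6];
        uint64_t user1;
        uint64_t user2;
};

/* Nine 64-bit fields and no padding, so 32- and 64-bit layouts agree. */
static_assert(sizeof(struct linux_async_syscall) == 9 * sizeof(uint64_t),
              "unexpected padding in struct linux_async_syscall");

/* Hypothetical helper: describe write(fd, buf, len) as one request. */
static struct linux_async_syscall
make_write_request(int fd, const void *buf, size_t len, uint64_t tag)
{
        struct linux_async_syscall req;

        memset(&req, 0, sizeof(req));            /* unused args stay zero */
        req.syscall = SYS_write;
        req.args[0] = (uint64_t)fd;
        req.args[1] = (uint64_t)(uintptr_t)buf;  /* pointer widened to 64 bits */
        req.args[2] = (uint64_t)len;
        req.user1 = tag;                         /* assumed: opaque caller tag */
        return req;
}
```

An array of such requests would then be passed as `arguments`, with its
length in `n_syscalls`.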

In the kernel, this does the following:

1. Check that the parameters make sense.
2. Copy them into kernel memory, and place the memory somewhere where it
   will be freed if the process terminates.
3. For each `struct linux_async_syscall` passed:
   1. Run seccomp filters to ensure that the process can actually make
      the syscall.
   2. Check the syscall against a whitelist of system calls that can be
      made asynchronously.
4. Call the in-kernel implementation of clone(), creating a new
   kernel thread.
5. In the parent, return success if and only if the thread creation was
   successful.
6. In the child, for each `struct linux_async_syscall` passed, invoke
   the system call, as if from userspace. Upon return, post a message
   to the file descriptor, which the userspace process can then
   retrieve with read(2).
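
Steps 1 and 3.2 above can be sketched in plain C, outside the kernel, to
show the intended checks. Everything named here is hypothetical: the batch
limit `ASYNC_MAX_BATCH`, the contents of `async_whitelist`, the function
names, and the choice of error codes are assumptions made for
illustration. The seccomp check in step 3.1 is omitted because it can only
be done in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>

struct linux_async_syscall {
        uint64_t syscall;
        uint64_t args[6];
        uint64_t user1;
        uint64_t user2;
};

/* Assumed upper bound on a batch; the proposal does not specify one. */
#define ASYNC_MAX_BATCH 1024

/* Assumed whitelist of calls that are safe to run asynchronously. */
static const uint64_t async_whitelist[] = { SYS_read, SYS_write, SYS_fsync };

static int async_syscall_allowed(uint64_t nr)
{
        size_t i;

        for (i = 0; i < sizeof(async_whitelist) / sizeof(async_whitelist[0]); i++)
                if (async_whitelist[i] == nr)
                        return 1;
        return 0;
}

/*
 * Returns 0 if every request is acceptable, or a negative errno.
 * *num_ok receives the number of leading requests that passed, which is
 * one possible reading of the num_succeeded output parameter.
 */
static int async_validate_batch(const struct linux_async_syscall *reqs,
                                uint64_t n, uint64_t *num_ok)
{
        uint64_t i;

        if (num_ok == NULL)
                return -EINVAL;
        *num_ok = 0;
        if (reqs == NULL || n == 0 || n > ASYNC_MAX_BATCH)
                return -EINVAL;                 /* step 1: sanity checks */
        for (i = 0; i < n; i++) {
                if (!async_syscall_allowed(reqs[i].syscall))
                        return -EPERM;          /* step 3.2: not whitelisted */
                (*num_ok)++;
        }
        return 0;
}
```

A real driver would first copy the array in from userspace (step 2)
before running checks like these on the kernel copy.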

I am sure there are more optimizations to be made, or possibly an
entirely different and superior approach.
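
For comparison, the submit-then-read(2) flow of steps 4 to 6 can be
approximated entirely in userspace. The sketch below is only an analogue,
not the proposed driver: it uses fork(2) and a pipe in place of an
in-kernel clone(), so the child runs with its own copy of the address
space and file descriptor table. The `struct async_completion` layout and
the name `async_submit_userspace` are assumptions; the proposal does not
define the format of the posted message.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct linux_async_syscall {
        uint64_t syscall;
        uint64_t args[6];
        uint64_t user1;
        uint64_t user2;
};

/* Hypothetical completion record, one per finished request. */
struct async_completion {
        int64_t  result;   /* return value, or -errno on failure */
        uint64_t user1;    /* echoed from the request */
        uint64_t user2;
};

/*
 * Run the batch in a child process. Returns the read end of a pipe that
 * delivers one struct async_completion per request, or -1 on error.
 * The caller should waitpid() on *child once the pipe reports EOF.
 */
static int async_submit_userspace(const struct linux_async_syscall *reqs,
                                  size_t n, pid_t *child)
{
        int fds[2];
        size_t i;

        if (pipe(fds) < 0)
                return -1;
        *child = fork();
        if (*child < 0) {
                close(fds[0]);
                close(fds[1]);
                return -1;
        }
        if (*child == 0) {
                close(fds[0]);
                for (i = 0; i < n; i++) {
                        struct async_completion c;
                        long ret = syscall((long)reqs[i].syscall,
                                           (long)reqs[i].args[0], (long)reqs[i].args[1],
                                           (long)reqs[i].args[2], (long)reqs[i].args[3],
                                           (long)reqs[i].args[4], (long)reqs[i].args[5]);

                        c.result = ret < 0 ? -errno : ret;
                        c.user1 = reqs[i].user1;
                        c.user2 = reqs[i].user2;
                        /* Records are smaller than PIPE_BUF, so each write is atomic. */
                        if (write(fds[1], &c, sizeof(c)) != (ssize_t)sizeof(c))
                                _exit(1);
                }
                _exit(0);
        }
        close(fds[1]);          /* parent keeps only the read end */
        return fds[0];
}
```

The parent returns as soon as the child exists and later collects
results by calling read(2) on the returned descriptor, which is the shape
of interface described above.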
> > and indeed I want it to behave *identically* to that. That means that
> > ptracers are notified about the syscall (and given the opportunity to
> > update its arguments), and that seccomp_bpf filters are applied.
> > Furthermore, it means that all arguments to the syscall need full
> > validation, as if they came from userspace (because they do).
> >
> > Is there an in-kernel API that allows one to invoke an arbitrary syscall
> > with arguments AND proper ptrace/seccomp_bpf filtering? If not, how
> > difficult would it be to create one?
>
> Wouldn't creating such an interface be more work than just using the
> correct user/kernel interface in the first place? :)
>
Yes, it would. :)
However, the ioctl I actually want to implement (see above) does the
system call asynchronously. That isn’t possible using the existing
APIs.
>
> Again, what is the problem you are trying to solve here.
>
See above :) Basically, I am trying to improve performance and reduce
complexity of programs that need to do a lot of buffered file I/O.
>
> thanks,
>
> greg k-h
>
Thank you, Greg!
Demi