Invoking a system call from within the kernel
Demi Marie Obenour
demiobenour at gmail.com
Sat Nov 18 14:09:31 EST 2017
On Sat, Nov 18, 2017 at 01:44:44PM -0500, valdis.kletnieks at vt.edu wrote:
> On Sat, 18 Nov 2017 13:15:27 -0500, Demi Marie Obenour said:
> > However, the ioctl I actually want to implement (see above) does the
> > system call asynchronously. That isn't possible using the existing
> > APIs.
> Ever consider that it's because there's no clear semantics to what
> executing an arbitrary syscall asynchronously even *means*?
> What does an async getuid() mean? For bonus points, what does it
> return if the program does an async getuid(), and then does a
> setuid() call *before the async call completes*?
Only whitelisted system calls would be allowed, such as open(), read(),
and write(). Async getuid() would not be allowed. Nor would async
exit() or exit_group().
The only system calls that would be whitelisted for async use are those
that could potentially block on I/O. “Block” is used in a general
sense: it includes disc I/O as well as network I/O.
> What is the return value of an async call that fails? How is it
> returned, and how do you tell if a negative return code is
> from the async code failing, or the syscall failing?
If an async call fails, the packet posted to the file descriptor
contains the negative error code.
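
As a rough sketch of what such a packet could look like: the struct name, its fields, and the helper below are hypothetical and only illustrate the idea, they are not an existing kernel ABI. The one thing borrowed from existing practice is the usual kernel convention of reporting failure as a negative errno value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical completion packet; layout and names are assumptions
 * made for illustration only. */
struct async_completion {
    uint64_t cookie;  /* caller-chosen tag identifying the request */
    int64_t  result;  /* syscall return value, or -errno on failure */
};

/* Returns 0 and stores the syscall's return value in *value when the
 * call succeeded; otherwise returns the positive errno carried in the
 * packet and leaves *value untouched. */
static int async_completion_decode(const struct async_completion *c,
                                   int64_t *value)
{
    assert(c != NULL && value != NULL);
    if (c->result < 0)
        return (int)-c->result;
    *value = c->result;
    return 0;
}
```

This only works because the whitelisted calls (open(), read(), write()) never return a negative value on success.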
> > See above :) Basically, I am trying to improve performance and reduce
> > complexity of programs that need to do a lot of buffered file I/O.
> We already have an AIO subsystem for exactly this. And eventfd's, and
> poll(), and a bunch of other stuff.
This actually works with poll()/epoll()/etc. Specifically, the device
file descriptor becomes readable when a completion event is posted to
it, indicating that an async system call has completed and its result is
available.
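
A minimal sketch of the userspace side, assuming each completion is delivered as one fixed-size packet by a single read(). The packet layout and the function name are hypothetical; poll() and read() are the ordinary calls.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical completion packet; layout is an assumption. */
struct async_completion {
    uint64_t cookie;  /* caller-chosen tag identifying the request */
    int64_t  result;  /* syscall return value, or -errno on failure */
};

/* Waits up to timeout_ms for one completion on fd.
 * Returns 1 if a packet was read into *out, 0 on timeout,
 * and -1 on error with errno set. */
static int wait_for_completion(int fd, struct async_completion *out,
                               int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    ssize_t got;
    int n;

    assert(out != NULL);

    /* Retry if a signal interrupts poll(); for simplicity the full
     * timeout is reused on each retry. */
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);

    if (n <= 0)
        return n;               /* 0: timed out, -1: poll() failed */
    if (!(pfd.revents & POLLIN)) {
        errno = EIO;            /* hangup or error without data */
        return -1;
    }

    got = read(fd, out, sizeof(*out));
    if (got < 0)
        return -1;
    if (got != (ssize_t)sizeof(*out)) {
        errno = EIO;            /* short packet */
        return -1;
    }
    return 1;
}
```

In an event loop the same descriptor would simply be registered alongside the sockets and timers the loop already watches.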
> And they improve performance, but increase complexity. It's pretty
> hard to make
> while ((rc = read(....)) > 0)
>     rc2 = write(....);
> less complex. Catching the return of an async call makes it more complex.
Many programs (such as Node.js, NGINX, Firefox, Chrome, and every other
GUI program) use an event loop architecture. To maintain
responsiveness, it is necessary to avoid blocking calls on the main
thread (the thread that runs the event loop). For filesystem
operations, this is generally done by doing the operation in a thread
pool.
Async system calls move the thread pool to the kernel. The kernel has
system-wide information and can perform optimizations regarding e.g.
scheduling and threadpool size that userspace cannot. Furthermore,
the kernel threadpool threads have no userspace counterparts, so they
avoid requiring a userspace stack or other data structures.
There was a previous attempt to implement async system calls using the
AIO interface. Linus rejected it on the basis that an async system call
API should be more general.