|
|
web page you might find easier to navigate when reading it for the first
|
|
|
time: L<http://pod.tst.eu/http://cvs.schmorp.de/libeio/eio.pod>. |
|
|
|
|
|
|
|
Note that this library is a by-product of the C<IO::AIO> perl |
|
|
|
module, and many of the subtler points regarding request lifetimes
|
|
|
and so on are only documented in its documentation at the |
|
|
|
moment: L<http://pod.tst.eu/http://cvs.schmorp.de/IO-AIO/AIO.pm>. |
|
|
|
|
|
|
|
=head2 FEATURES |
|
|
|
|
|
|
|
This library provides fully asynchronous versions of most POSIX functions |
|
|
|
dealing with I/O. Unlike most asynchronous libraries, this not only |
|
|
|
includes C<read> and C<write>, but also C<open>, C<stat>, C<unlink> and |
|
|
|
similar functions, as well as less common ones such as C<mknod>, C<futime>
|
|
|
or C<readlink>. |
|
|
=head2 TIME REPRESENTATION
|
|
|
Libeio represents time as a single floating point number, representing the |
|
|
|
(fractional) number of seconds since the (POSIX) epoch (somewhere near |
|
|
|
the beginning of 1970, details are complicated, don't ask). This type is |
|
|
|
called C<eio_tstamp>, but it is guaranteed to be of type C<double> (or |
|
|
|
better), so you can freely use C<double> yourself. |
|
|
|
|
|
|
|
Unlike the name component C<stamp> might indicate, it is also used for |
|
|
time differences throughout libeio.
|
|
|
|
|
|
|
=head2 FORK SUPPORT |
|
|
|
|
|
|
|
Usage of pthreads in a program changes the semantics of fork |
|
|
|
considerably. Specifically, only async-safe functions can be called after |
|
|
|
fork. Libeio uses pthreads, so this applies, and makes using fork hard for |
|
|
|
anything but relatively simple fork + exec uses.
|
|
|
|
|
|
|
This library only works in the process that initialised it: Forking is |
|
|
|
fully supported, but using libeio in any other process than the one that |
|
|
|
called C<eio_init> is not. |
|
|
|
|
|
|
|
You might get around by not I<using> libeio before (or after) forking in |
|
|
|
the parent, and using it in the child afterwards. You could also try to |
|
|
|
call the C<eio_init> function again in the child, which will brutally
|
|
|
reinitialise all data structures, which isn't POSIX conformant, but |
|
|
|
typically works. |
|
|
|
|
|
|
|
Otherwise, the only recommendation you should follow is: treat fork code |
|
|
|
the same way you treat signal handlers, and only ever call C<eio_init> in |
|
|
|
the process that uses it, and only once ever. |
|
|
|
|
|
|
|
=head1 INITIALISATION/INTEGRATION |
|
|
|
|
|
|
=item int eio_init (void (*want_poll)(void), void (*done_poll)(void))

This function initialises the library. On success it returns C<0>, on
failure it returns C<-1> and sets C<errno> appropriately.
|
|
|
It accepts two function pointers specifying callbacks as argument, both of |
|
|
|
which can be C<0>, in which case the callback isn't called. |
|
|
|
|
|
|
|
There is currently no way to change these callbacks later, or to |
|
|
|
"uninitialise" the library again. |
|
|
|
|
|
|
|
=item want_poll callback |
|
|
|
|
|
|
|
The C<want_poll> callback is invoked whenever libeio wants attention (i.e.
it wants to be polled by calling C<eio_poll>).
|
|
|
|
|
Note that C<eio_poll> might return after C<done_poll> and C<want_poll> |
|
|
|
have been called again, so watch out for races in your code. |
|
|
|
|
|
|
|
As with C<want_poll>, this callback is called while locks are being held, |
|
|
|
so you I<must not call any libeio functions from within this callback>.
|
|
|
|
|
|
|
=item int eio_poll () |
|
|
|
|
|
For libev, you would typically use an C<ev_async> watcher: the |
|
|
|
C<want_poll> callback would invoke C<ev_async_send> to wake up the event |
|
|
|
loop. Inside the callback set for the watcher, one would call C<eio_poll |
|
|
|
()>. |
|
|
|
|
|
|
|
If C<eio_poll ()> is configured to not handle all results in one go |
|
|
|
(i.e. it returns C<-1>) then you should start an idle watcher that calls |
|
|
|
C<eio_poll> until it returns something C<!= -1>. |
|
|
|
|
|
|
|
A full-featured connector between libeio and libev would look as follows |
|
|
|
(if C<eio_poll> is handling all requests, it can of course be simplified a |
|
|
|
lot by removing the idle watcher logic): |
|
|
|
|
|
|
|
static struct ev_loop *loop; |
|
|
|
static ev_idle repeat_watcher; |
|
|
|
static ev_async ready_watcher; |
|
|
|
|
|
|
|
/* idle watcher callback, only used when eio_poll */ |
|
|
|
/* didn't handle all results in one call */ |
|
|
|
static void |
|
|
|
repeat (EV_P_ ev_idle *w, int revents) |
|
|
|
{ |
|
|
|
if (eio_poll () != -1) |
|
|
|
ev_idle_stop (EV_A_ w); |
|
|
|
} |
|
|
|
|
|
|
|
/* eio has some results, process them */ |
|
|
|
static void |
|
|
|
ready (EV_P_ ev_async *w, int revents) |
|
|
|
{ |
|
|
|
if (eio_poll () == -1) |
|
|
|
ev_idle_start (EV_A_ &repeat_watcher); |
|
|
|
} |
|
|
|
|
|
|
|
/* wake up the event loop */ |
|
|
|
static void |
|
|
|
want_poll (void) |
|
|
|
{ |
|
|
|
     ev_async_send (loop, &ready_watcher);
|
|
|
} |
|
|
|
|
|
|
|
void |
|
|
|
my_init_eio () |
|
|
|
{ |
|
|
|
loop = EV_DEFAULT; |
|
|
|
|
|
|
|
ev_idle_init (&repeat_watcher, repeat); |
|
|
|
ev_async_init (&ready_watcher, ready); |
|
|
|
     ev_async_start (loop, &ready_watcher);
|
|
|
|
|
|
|
eio_init (want_poll, 0); |
|
|
|
} |
|
|
|
|
|
|
|
For most other event loops, you would typically use a pipe - the event |
|
|
|
loop should be told to wait for read readiness on the read end. In |
|
|
|
C<want_poll> you would write a single byte, in C<done_poll> you would try |
|
|
|
to read that byte, and in the callback for the read end, you would call |
|
|
|
C<eio_poll>. |
|
|
|
|
|
|
|
You don't have to take special care in the case C<eio_poll> doesn't handle |
|
|
|
all requests, as the done callback will not be invoked, so the event loop |
|
|
|
will still signal readiness for the pipe until I<all> results have been |
|
|
|
processed. |
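
As an illustration only (the C<pipe_fd>, C<event_loop_add_fd> and
C<on_pipe_readable> names below are placeholders for whatever your event
loop provides, the usual POSIX headers are assumed to be included, and
error checking is omitted), such a pipe-based connector might look like
this:

   static int pipe_fd [2]; /* [0] = read end, [1] = write end */

   static void
   pipe_want_poll (void)
   {
     char c = 0;
     write (pipe_fd [1], &c, 1);
   }

   static void
   pipe_done_poll (void)
   {
     char c;
     read (pipe_fd [0], &c, 1);
   }

   /* called by your event loop whenever pipe_fd [0] is readable */
   static void
   on_pipe_readable (void)
   {
     eio_poll (); /* the byte is consumed by pipe_done_poll, not here */
   }

   static void
   my_init_eio (void)
   {
     pipe (pipe_fd);
     event_loop_add_fd (pipe_fd [0], on_pipe_readable); /* placeholder */
     eio_init (pipe_want_poll, pipe_done_poll);
   }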
|
|
|
|
|
|
|
|
|
|
|
=head1 HIGH LEVEL REQUEST API |
|
|
|
|
|
|
|
Libeio has both a high-level API, which consists of calling a request |
|
|
|
function with a callback to be called on completion, and a low-level API |
|
|
|
where you fill out request structures and submit them. |
|
|
|
|
|
|
|
This section describes the high-level API. |
|
|
|
|
|
|
|
=head2 REQUEST SUBMISSION AND RESULT PROCESSING |
|
|
|
|
|
|
|
You submit a request by calling the relevant C<eio_TYPE> function with the |
|
|
|
required parameters, a callback of type C<int (*eio_cb)(eio_req *req)> |
|
|
|
(called C<eio_cb> below) and a freely usable C<void *data> argument. |
|
|
|
|
|
|
|
The return value will either be 0, in case something went really wrong |
|
|
|
(which can basically only happen on very fatal errors, such as C<malloc> |
|
|
|
returning 0, which is rather unlikely), or a pointer to the newly-created |
|
|
|
and submitted C<eio_req *>. |
|
|
|
|
|
|
|
The callback will be called with an C<eio_req *> which contains the |
|
|
|
results of the request. The members you can access inside that structure |
|
|
|
vary from request to request, except for: |
|
|
|
|
|
|
|
=over 4 |
|
|
|
|
|
|
|
=item C<ssize_t result> |
|
|
|
|
|
|
|
This contains the result value from the call (usually the same as the |
|
|
|
syscall of the same name). |
|
|
|
|
|
|
|
=item C<int errorno> |
|
|
|
|
|
|
|
This contains the value of C<errno> after the call. |
|
|
|
|
|
|
|
=item C<void *data> |
|
|
|
|
|
|
|
The C<void *data> member simply stores the value of the C<data> argument. |
|
|
|
|
|
|
|
=back |
|
|
|
|
|
|
|
The return value of the callback is normally C<0>, which tells libeio to |
|
|
|
continue normally. If a callback returns a nonzero value, libeio will |
|
|
|
stop processing results (in C<eio_poll>) and will return the value to its |
|
|
|
caller. |
|
|
|
|
|
|
|
Memory areas passed to libeio must stay valid as long as a request |
|
|
|
executes, with the exception of paths, which are being copied |
|
|
|
internally. Any memory libeio itself allocates will be freed after the |
|
|
|
finish callback has been called. If you want to manage all memory passed |
|
|
|
to libeio yourself you can use the low-level API. |
|
|
|
|
|
|
|
For example, to open a file, you could do this: |
|
|
|
|
|
|
|
static int |
|
|
|
file_open_done (eio_req *req) |
|
|
|
{ |
|
|
|
if (req->result < 0) |
|
|
|
{ |
|
|
|
/* open() returned -1 */ |
|
|
|
errno = req->errorno; |
|
|
|
perror ("open"); |
|
|
|
} |
|
|
|
else |
|
|
|
{ |
|
|
|
int fd = req->result; |
|
|
|
/* now we have the new fd in fd */ |
|
|
|
} |
|
|
|
|
|
|
|
return 0; |
|
|
|
} |
|
|
|
|
|
|
|
/* the first three arguments are passed to open(2) */ |
|
|
|
/* the remaining are priority, callback and data */ |
|
|
|
if (!eio_open ("/etc/passwd", O_RDONLY, 0, 0, file_open_done, 0)) |
|
|
|
abort (); /* something went wrong, we will all die!!! */ |
|
|
|
|
|
|
|
Note that you additionally need to call C<eio_poll> when the C<want_poll>
callback indicates that requests are ready to be processed.
|
|
|
|
|
|
|
=head2 CANCELLING REQUESTS |
|
|
|
|
|
|
|
Sometimes the need for a request goes away before the request is |
|
|
|
finished. In that case, one can cancel the request by a call to |
|
|
|
C<eio_cancel>: |
|
|
|
|
|
|
|
=over 4 |
|
|
|
|
|
|
|
=item eio_cancel (eio_req *req) |
|
|
|
|
|
|
|
Cancel the request (and all its subrequests). If the request is currently |
|
|
|
executing it might still continue to execute, and in other cases it might |
|
|
|
still take a while till the request is cancelled. |
|
|
|
|
|
|
|
Even if cancelled, the finish callback will still be invoked - the |
|
|
|
callbacks of all cancellable requests need to check whether the request |
|
|
|
has been cancelled by calling C<EIO_CANCELLED (req)>: |
|
|
|
|
|
|
|
   static int
   my_eio_cb (eio_req *req)
   {
     if (EIO_CANCELLED (req))
       return 0;

     /* ... process the results as usual ... */

     return 0;
   }
|
|
|
|
|
|
|
In addition, cancelled requests will I<either> have C<< req->result >> |
|
|
|
set to C<-1> and C<errno> to C<ECANCELED>, or I<otherwise> they were |
|
|
|
successfully executed, despite being cancelled (e.g. when they have |
|
|
|
already been executed at the time they were cancelled). |
|
|
|
|
|
|
|
C<EIO_CANCELLED> is still true for requests that have successfully |
|
|
|
executed, as long as C<eio_cancel> was called on them at some point. |
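
For example, a common pattern (sketched here; it assumes a single-threaded
program where the stored pointer is cleared in the callback, so a request
is only ever cancelled while it is still outstanding):

   static eio_req *pending_stat;

   static int
   stat_done (eio_req *req)
   {
     pending_stat = 0;

     if (EIO_CANCELLED (req))
       return 0;

     /* ... use req->result / req->ptr2 as usual ... */
     return 0;
   }

   /* submit the request somewhere */
   pending_stat = eio_stat ("/some/path", 0, stat_done, 0);

   /* ... later, when the result is no longer needed ... */
   if (pending_stat)
     eio_cancel (pending_stat);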
|
|
|
|
|
|
|
=back |
|
|
|
|
|
|
|
=head2 AVAILABLE REQUESTS |
|
|
|
|
|
|
|
The following request functions are available. I<All> of them return the |
|
|
|
C<eio_req *> on success and C<0> on failure, and I<all> of them have the |
|
|
|
same three trailing arguments: C<pri>, C<cb> and C<data>. The C<cb> is |
|
|
|
mandatory, but in most cases, you pass in C<0> as C<pri> and C<0> or some |
|
|
|
custom data value as C<data>. |
|
|
|
|
|
|
|
=head3 POSIX API WRAPPERS |
|
|
|
|
|
|
|
These requests simply wrap the POSIX call of the same name, with the same |
|
|
|
arguments. If a function is not implemented by the OS and cannot be emulated |
|
|
|
in some way, then all of these return C<-1> and set C<errorno> to C<ENOSYS>. |
|
|
|
|
|
|
|
=over 4 |
|
|
|
|
|
|
|
=item eio_open (const char *path, int flags, mode_t mode, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_truncate (const char *path, off_t offset, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_chown (const char *path, uid_t uid, gid_t gid, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_chmod (const char *path, mode_t mode, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_mkdir (const char *path, mode_t mode, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_rmdir (const char *path, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_unlink (const char *path, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_utime (const char *path, eio_tstamp atime, eio_tstamp mtime, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_mknod (const char *path, mode_t mode, dev_t dev, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_link (const char *path, const char *new_path, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_symlink (const char *path, const char *new_path, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_rename (const char *path, const char *new_path, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_mlock (void *addr, size_t length, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_close (int fd, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_sync (int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_fsync (int fd, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_fdatasync (int fd, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_futime (int fd, eio_tstamp atime, eio_tstamp mtime, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_ftruncate (int fd, off_t offset, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_fchmod (int fd, mode_t mode, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_fchown (int fd, uid_t uid, gid_t gid, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_dup2 (int fd, int fd2, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
These have the same semantics as the syscall of the same name, their |
|
|
|
return value is available as C<< req->result >> later. |
|
|
|
|
|
|
|
=item eio_read (int fd, void *buf, size_t length, off_t offset, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_write (int fd, void *buf, size_t length, off_t offset, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
These two requests are called C<read> and C<write>, but actually wrap |
|
|
|
C<pread> and C<pwrite>. On systems that lack these calls (such as cygwin), |
|
|
|
libeio uses lseek/read_or_write/lseek and a mutex to serialise the |
|
|
|
requests, so all these requests run serially and do not disturb each |
|
|
|
other. However, they still disturb the file offset while they run, so it's |
|
|
|
not safe to call these functions concurrently with non-libeio functions on |
|
|
|
the same fd on these systems. |
|
|
|
|
|
|
|
Not surprisingly, pread and pwrite are not thread-safe on Darwin (OS/X), |
|
|
|
so it is advised not to submit multiple requests on the same fd on this |
|
|
|
horrible pile of garbage. |
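
For example, to read the first kilobyte of an already-open file descriptor
C<fd> (a sketch - the buffer is static here simply to keep it valid until
the request has finished, as required above):

   static char buf [1024];

   static int
   read_done (eio_req *req)
   {
     if (req->result < 0)
       {
         /* the read failed, req->errorno has the error */
       }
     else
       {
         /* req->result bytes are now in buf */
       }

     return 0;
   }

   eio_read (fd, buf, sizeof (buf), 0, 0, read_done, 0);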
|
|
|
|
|
|
|
=item eio_mlockall (int flags, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
Like C<mlockall>, but the flag value constants are called |
|
|
|
C<EIO_MCL_CURRENT> and C<EIO_MCL_FUTURE>. |
|
|
|
|
|
|
|
=item eio_msync (void *addr, size_t length, int flags, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
Just like msync, except that the flag values are called C<EIO_MS_ASYNC>, |
|
|
|
C<EIO_MS_INVALIDATE> and C<EIO_MS_SYNC>. |
|
|
|
|
|
|
|
=item eio_readlink (const char *path, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
If successful, the path read by C<readlink(2)> can be accessed via C<< |
|
|
|
req->ptr2 >> and is I<NOT> null-terminated, with the length specified as |
|
|
|
C<< req->result >>. |
|
|
|
|
|
|
|
if (req->result >= 0) |
|
|
|
{ |
|
|
|
char *target = strndup ((char *)req->ptr2, req->result); |
|
|
|
|
|
|
|
free (target); |
|
|
|
} |
|
|
|
|
|
|
|
=item eio_realpath (const char *path, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
Similar to the realpath libc function, but unlike that one, C<< |
|
|
|
req->result >> is C<-1> on failure. On success, the result is the length |
|
|
|
of the returned path in C<ptr2> (which is I<NOT> 0-terminated) - this is |
|
|
|
similar to readlink. |
|
|
|
|
|
|
|
=item eio_stat (const char *path, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_lstat (const char *path, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_fstat (int fd, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
Stats a file - if C<< req->result >> indicates success, then you can |
|
|
|
access the C<struct stat>-like structure via C<< req->ptr2 >>: |
|
|
|
|
|
|
|
EIO_STRUCT_STAT *statdata = (EIO_STRUCT_STAT *)req->ptr2; |
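
A full callback might look like this sketch, which only prints the size of
the stated file:

   static int
   print_size_done (eio_req *req)
   {
     if (req->result < 0)
       return 0; /* stat failed, see req->errorno */

     EIO_STRUCT_STAT *statdata = (EIO_STRUCT_STAT *)req->ptr2;

     printf ("size: %llu bytes\n", (unsigned long long)statdata->st_size);

     return 0;
   }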
|
|
|
|
|
|
|
=item eio_statvfs (const char *path, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
=item eio_fstatvfs (int fd, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
Stats a filesystem - if C<< req->result >> indicates success, then you can |
|
|
|
access the C<struct statvfs>-like structure via C<< req->ptr2 >>: |
|
|
|
|
|
|
|
EIO_STRUCT_STATVFS *statdata = (EIO_STRUCT_STATVFS *)req->ptr2; |
|
|
|
|
|
|
|
=back |
|
|
|
|
|
|
|
=head3 READING DIRECTORIES |
|
|
|
|
|
|
|
Reading directories sounds simple, but can be rather demanding, especially |
|
|
|
if you want to do stuff such as traversing a directory hierarchy or |
|
|
|
processing all files in a directory. Libeio can assist these complex tasks |
|
|
|
with its C<eio_readdir> call.
|
|
|
|
|
|
|
=over 4 |
|
|
|
|
|
|
|
=item eio_readdir (const char *path, int flags, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
This is a very complex call. It basically reads through a whole directory |
|
|
|
(via the C<opendir>, C<readdir> and C<closedir> calls) and returns either |
|
|
|
the names or an array of C<struct eio_dirent>, depending on the C<flags> |
|
|
|
argument. |
|
|
|
|
|
|
|
The C<< req->result >> indicates either the number of files found, or |
|
|
|
C<-1> on error. On success, null-terminated names can be found as C<< req->ptr2 >>, |
|
|
|
and an array of C<struct eio_dirent>, if requested by C<flags>, can be found via C<<
|
|
|
req->ptr1 >>. |
|
|
|
|
|
|
|
Here is an example that prints all the names: |
|
|
|
|
|
|
|
int i; |
|
|
|
char *names = (char *)req->ptr2; |
|
|
|
|
|
|
|
for (i = 0; i < req->result; ++i) |
|
|
|
{ |
|
|
|
printf ("name #%d: %s\n", i, names); |
|
|
|
|
|
|
|
/* move to next name */ |
|
|
|
names += strlen (names) + 1; |
|
|
|
} |
|
|
|
|
|
|
|
Pseudo-entries such as F<.> and F<..> are never returned by C<eio_readdir>. |
|
|
|
|
|
|
|
C<flags> can be any combination of: |
|
|
|
|
|
|
|
=over 4 |
|
|
|
|
|
|
|
=item EIO_READDIR_DENTS |
|
|
|
|
|
|
|
If this flag is specified, then, in addition to the names in C<ptr2>, |
|
|
|
also an array of C<struct eio_dirent> is returned, in C<ptr1>. A C<struct |
|
|
|
eio_dirent> looks like this: |
|
|
|
|
|
|
|
struct eio_dirent |
|
|
|
{ |
|
|
|
int nameofs; /* offset of null-terminated name string in (char *)req->ptr2 */ |
|
|
|
unsigned short namelen; /* size of filename without trailing 0 */ |
|
|
|
unsigned char type; /* one of EIO_DT_* */ |
|
|
|
signed char score; /* internal use */ |
|
|
|
ino_t inode; /* the inode number, if available, otherwise unspecified */ |
|
|
|
}; |
|
|
|
|
|
|
|
The only members you normally would access are C<nameofs>, which is the |
|
|
|
byte-offset from C<ptr2> to the start of the name, C<namelen> and C<type>. |
|
|
|
|
|
|
|
C<type> can be one of: |
|
|
|
|
|
|
|
C<EIO_DT_UNKNOWN> - if the type is not known (very common) and you have to C<stat> |
|
|
|
the name yourself if you need to know, |
|
|
|
one of the "standard" POSIX file types (C<EIO_DT_REG>, C<EIO_DT_DIR>, C<EIO_DT_LNK>, |
|
|
|
C<EIO_DT_FIFO>, C<EIO_DT_SOCK>, C<EIO_DT_CHR>, C<EIO_DT_BLK>) |
|
|
|
or some OS-specific type (currently |
|
|
|
C<EIO_DT_MPC> - multiplexed char device (v7+coherent), |
|
|
|
C<EIO_DT_NAM> - xenix special named file, |
|
|
|
C<EIO_DT_MPB> - multiplexed block device (v7+coherent), |
|
|
|
C<EIO_DT_NWK> - HP-UX network special, |
|
|
|
C<EIO_DT_CMP> - VxFS compressed, |
|
|
|
C<EIO_DT_DOOR> - solaris door, or |
|
|
|
C<EIO_DT_WHT>). |
|
|
|
|
|
|
|
This example prints all names and their type: |
|
|
|
|
|
|
|
int i; |
|
|
|
struct eio_dirent *ents = (struct eio_dirent *)req->ptr1; |
|
|
|
char *names = (char *)req->ptr2; |
|
|
|
|
|
|
|
for (i = 0; i < req->result; ++i) |
|
|
|
{ |
|
|
|
struct eio_dirent *ent = ents + i; |
|
|
|
char *name = names + ent->nameofs; |
|
|
|
|
|
|
|
printf ("name #%d: %s (type %d)\n", i, name, ent->type); |
|
|
|
} |
|
|
|
|
|
|
|
=item EIO_READDIR_DIRS_FIRST |
|
|
|
|
|
|
|
When this flag is specified, then the names will be returned in an order |
|
|
|
where likely directories come first, in optimal C<stat> order. This is |
|
|
|
useful when you need to quickly find directories, or you want to find all |
|
|
|
directories without having to stat() each entry.
|
|
|
|
|
|
|
If the system returns type information in readdir, then this is used |
|
|
|
to find directories directly. Otherwise, likely directories are names |
|
|
|
beginning with ".", or otherwise names with no dots, of which names with |
|
|
|
short names are tried first. |
|
|
|
|
|
|
|
=item EIO_READDIR_STAT_ORDER |
|
|
|
|
|
|
|
When this flag is specified, then the names will be returned in an order |
|
|
|
suitable for stat()'ing each one. That is, when you plan to stat() |
|
|
|
all files in the given directory, then the returned order will likely |
|
|
|
be fastest. |
|
|
|
|
|
|
|
If both this flag and C<EIO_READDIR_DIRS_FIRST> are specified, then the |
|
|
|
likely directories come first, resulting in a less optimal stat order. |
|
|
|
|
|
|
|
=item EIO_READDIR_FOUND_UNKNOWN |
|
|
|
|
|
|
|
This flag should not be specified when calling C<eio_readdir>. Instead, |
|
|
|
it is being set by C<eio_readdir> (you can access the C<flags> via C<< |
|
|
|
req->int1 >>), when any of the C<type>'s found were C<EIO_DT_UNKNOWN>. The
|
|
|
absence of this flag therefore indicates that all C<type>'s are known, |
|
|
|
which can be used to speed up some algorithms. |
|
|
|
|
|
|
|
A typical use case would be to identify all subdirectories within a |
|
|
|
directory - you would ask C<eio_readdir> for C<EIO_READDIR_DIRS_FIRST>. If |
|
|
|
this flag is then I<NOT> set, all the entries at the beginning of the
|
|
|
returned array of type C<EIO_DT_DIR> are the directories. Otherwise, you |
|
|
|
should start C<stat()>'ing the entries starting at the beginning of the |
|
|
|
array, stopping as soon as you have found all directories (the count can be
|
|
|
deduced by the link count of the directory). |
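
A sketch of this technique (it assumes the request was submitted with
C<EIO_READDIR_DENTS | EIO_READDIR_DIRS_FIRST>):

   static int
   readdir_done (eio_req *req)
   {
     struct eio_dirent *ents = (struct eio_dirent *)req->ptr1;
     char *names = (char *)req->ptr2;
     int all_known = !(req->int1 & EIO_READDIR_FOUND_UNKNOWN);
     int i;

     if (req->result < 0)
       return 0;

     for (i = 0; i < req->result; ++i)
       {
         struct eio_dirent *ent = ents + i;

         if (ent->type == EIO_DT_DIR)
           printf ("subdirectory: %s\n", names + ent->nameofs);
         else if (all_known)
           break; /* directories are sorted to the front, we are done */
         /* else: unknown type - a real program would stat () this entry */
       }

     return 0;
   }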
|
|
|
|
|
|
|
=back |
|
|
|
|
|
|
|
=back |
|
|
|
|
|
|
|
=head3 OS-SPECIFIC CALL WRAPPERS |
|
|
|
|
|
|
|
These wrap OS-specific calls (usually Linux ones), and might or might not |
|
|
|
be emulated on other operating systems. Calls that are not emulated will |
|
|
|
return C<-1> and set C<errno> to C<ENOSYS>. |
|
|
|
|
|
|
|
=over 4 |
|
|
|
|
|
|
|
=item eio_sendfile (int out_fd, int in_fd, off_t in_offset, size_t length, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
Wraps the C<sendfile> syscall. The arguments follow the Linux version, but |
|
|
|
libeio supports and will use similar calls on FreeBSD, HP/UX, Solaris and |
|
|
|
Darwin. |
|
|
|
|
|
|
|
If the OS doesn't support some sendfile-like call, or the call fails, |
|
|
|
indicating a lack of support for the given file descriptor type (for example,
|
|
|
Linux's sendfile might not support file to file copies), then libeio will |
|
|
|
emulate the call in userspace, so there are almost no limitations on its |
|
|
|
use. |
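
For example, to copy C<count> bytes from the start of one open file
descriptor to another (C<in_fd>, C<out_fd> and C<count> are assumed to
exist in your program):

   static int
   sendfile_done (eio_req *req)
   {
     /* req->result is the number of bytes copied, or -1 on error */
     return 0;
   }

   eio_sendfile (out_fd, in_fd, 0, count, 0, sendfile_done, 0);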
|
|
|
|
|
|
|
=item eio_readahead (int fd, off_t offset, size_t length, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
Calls C<readahead(2)>. If the syscall is missing, then the call is |
|
|
|
emulated by simply reading the data (currently in 64kiB chunks). |
|
|
|
|
|
|
|
=item eio_syncfs (int fd, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
Calls Linux' C<syncfs> syscall, if available. Returns C<-1> and sets |
|
|
|
C<errno> to C<ENOSYS> if the call is missing I<but still calls sync()>, |
|
|
|
if the C<fd> is C<< >= 0 >>, so you can probe for the availability of the |
|
|
|
syscall with a negative C<fd> argument and check for C<-1/ENOSYS>.
|
|
|
|
|
|
|
=item eio_sync_file_range (int fd, off_t offset, size_t nbytes, unsigned int flags, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
Calls C<sync_file_range>. If the syscall is missing, then this is the same |
|
|
|
as calling C<fdatasync>. |
|
|
|
|
|
|
|
Flags can be any combination of C<EIO_SYNC_FILE_RANGE_WAIT_BEFORE>, |
|
|
|
C<EIO_SYNC_FILE_RANGE_WRITE> and C<EIO_SYNC_FILE_RANGE_WAIT_AFTER>. |
|
|
|
|
|
|
|
=item eio_fallocate (int fd, int mode, off_t offset, off_t len, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
Calls C<fallocate> (note: I<NOT> C<posix_fallocate>!). If the syscall is |
|
|
|
missing, then it returns failure and sets C<errno> to C<ENOSYS>. |
|
|
|
|
|
|
|
The C<mode> argument can be C<0> (for behaviour similar to |
|
|
|
C<posix_fallocate>), or C<EIO_FALLOC_FL_KEEP_SIZE>, which keeps the size |
|
|
|
of the file unchanged (but still preallocates space beyond end of file). |
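
For example, to preallocate one megabyte at the start of an open file
descriptor C<fd> without changing its visible size (C<alloc_done> being an
ordinary C<eio_cb> of your own):

   eio_fallocate (fd, EIO_FALLOC_FL_KEEP_SIZE, 0, 1024 * 1024,
                  0, alloc_done, 0);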
|
|
|
|
|
|
|
=back |
|
|
|
|
|
|
|
=head3 LIBEIO-SPECIFIC REQUESTS |
|
|
|
|
|
|
|
These requests are specific to libeio and do not correspond to any OS call. |
|
|
|
|
|
|
|
=over 4 |
|
|
|
|
|
|
|
=item eio_mtouch (void *addr, size_t length, int flags, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
Reads (C<flags == 0>) or modifies (C<flags == EIO_MT_MODIFY>) the given
|
|
|
memory area, page-wise, that is, it reads (or reads and writes back) the |
|
|
|
first octet of every page that spans the memory area. |
|
|
|
|
|
|
|
This can be used to page in some mmapped file, or dirty some pages. Note |
|
|
|
that dirtying is an unlocked read-write access, so races can ensue when |
|
|
|
some other thread modifies the data stored in that memory area.
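
For example, to pre-fault an existing mapping in the background
(C<map_base> and C<map_length> are assumed to come from an earlier
C<mmap> call, and C<touch_done> is an ordinary C<eio_cb>):

   /* read-touch every page of the mapping, without modifying it */
   eio_mtouch (map_base, map_length, 0, 0, touch_done, 0);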
|
|
|
|
|
|
|
=item eio_custom (void (*)(eio_req *) execute, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
Executes a custom request, i.e., a user-specified callback. |
|
|
|
|
|
|
|
The callback gets the C<eio_req *> as parameter and is expected to read |
|
|
|
and modify any request-specific members. Specifically, it should set C<< |
|
|
|
req->result >> to the result value, just like other requests. |
|
|
|
|
|
|
|
Here is an example that simply calls C<open>, like C<eio_open>, but it |
|
|
|
uses the C<data> member as filename and uses a hardcoded C<O_RDONLY>. If |
|
|
|
you want to pass more/other parameters, you either need to pass some |
|
|
|
struct or so via C<data> or provide your own wrapper using the low-level |
|
|
|
API. |
|
|
|
|
|
|
|
static int |
|
|
|
my_open_done (eio_req *req) |
|
|
|
{ |
|
|
|
int fd = req->result; |
|
|
|
|
|
|
|
return 0; |
|
|
|
} |
|
|
|
|
|
|
|
static void |
|
|
|
my_open (eio_req *req) |
|
|
|
{ |
|
|
|
req->result = open (req->data, O_RDONLY); |
|
|
|
} |
|
|
|
|
|
|
|
eio_custom (my_open, 0, my_open_done, "/etc/passwd"); |
|
|
|
|
|
|
|
=item eio_busy (eio_tstamp delay, int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
This is a request that takes C<delay> seconds to execute, but otherwise |
|
|
|
does nothing - it simply puts one of the worker threads to sleep for this |
|
|
|
long. |
|
|
|
|
|
|
|
This request can be used to artificially increase load, e.g. for debugging |
|
|
|
or benchmarking reasons. |
|
|
|
|
|
|
|
=item eio_nop (int pri, eio_cb cb, void *data) |
|
|
|
|
|
|
|
This request does nothing, except go through the whole request cycle. This |
|
|
|
can be used to measure latency or in some cases to simplify code, but is |
|
|
|
not really of much use. |
|
|
|
|
|
|
|
=back |
|
|
|
|
|
|
|
=head3 GROUPING AND LIMITING REQUESTS |
|
|
|
|
|
|
|
There is one more rather special request, C<eio_grp>. It is a very special |
|
|
|
aio request: Instead of doing something, it is a container for other eio |
|
|
|
requests. |
|
|
|
|
|
|
|
There are two primary use cases for this: a) bundle many requests into a |
|
|
|
single, composite, request with a definite callback and the ability to |
|
|
|
cancel the whole request with its subrequests and b) limiting the number |
|
|
|
of "active" requests. |
|
|
|
|
|
|
|
Further below you will find more discussion of these topics - first |
|
|
|
follows the reference section detailing the request generator and other |
|
|
|
methods. |
|
|
|
|
|
|
|
=over 4 |
|
|
|
|
|
|
|
=item eio_req *grp = eio_grp (eio_cb cb, void *data) |
|
|
|
|
|
|
|
Creates, submits and returns a group request. Note that it doesn't have a |
|
|
|
priority, unlike all other requests. |
|
|
|
|
|
|
|
=item eio_grp_add (eio_req *grp, eio_req *req) |
|
|
|
|
|
|
|
Adds a request to the request group. |
|
|
|
|
|
|
|
=item eio_grp_cancel (eio_req *grp) |
|
|
|
|
|
|
|
Cancels all requests I<in> the group, but I<not> the group request |
|
|
|
itself. You can cancel the group request I<and> all subrequests via a |
|
|
|
normal C<eio_cancel> call. |
|
|
|
|
|
|
|
=back |
|
|
|
|
|
|
|
=head4 GROUP REQUEST LIFETIME |
|
|
|
|
|
|
|
Left alone, a group request will instantly move to the pending state and |
|
|
|
will be finished at the next call of C<eio_poll>. |
|
|
|
|
|
|
|
The usefulness stems from the fact that, if a subrequest is added to a |
|
|
|
group I<before> a call to C<eio_poll>, via C<eio_grp_add>, then the group |
|
|
|
will not finish until all the subrequests have finished. |
|
|
|
|
|
|
|
So the usage cycle of a group request is like this: after it is created, |
|
|
|
you normally instantly add a subrequest. If none is added, the group |
|
|
|
request will finish on its own. As long as subrequests are added before
the group request is finished, it will be kept from finishing; that is, the
|
|
|
callbacks of any subrequests can, in turn, add more requests to the group, |
|
|
|
and as long as any requests are active, the group request itself will not |
|
|
|
finish. |
|
|
|
|
|
|
|
=head4 CREATING COMPOSITE REQUESTS |
|
|
|
|
|
|
|
Imagine you wanted to create an C<eio_load> request that opens a file, |
|
|
|
reads it and closes it. This means it has to execute at least three eio |
|
|
|
requests, but for various reasons it might be nice if that request looked |
|
|
|
like any other eio request. |
|
|
|
|
|
|
|
This can be done with groups: |
|
|
|
|
|
|
|
=over 4 |
|
|
|
|
|
|
|
=item 1) create the request object |
|
|
|
|
|
|
|
Create a group that contains all further requests. This is the request you |
|
|
|
can return as "the load request". |
|
|
|
|
|
|
|
=item 2) open the file, maybe |
|
|
|
|
|
|
|
Next, open the file with C<eio_open> and add the request to the group |
|
|
|
request and you are finished setting up the request. |
|
|
|
|
|
|
|
If, for some reason, you cannot C<eio_open> (path is a null ptr?) you |
|
|
|
can set C<< grp->result >> to C<-1> to signal an error and let the group |
|
|
|
request finish on its own. |
|
|
|
|
|
|
|
=item 3) open callback adds more requests |
|
|
|
|
|
|
|
In the open callback, if the open was not successful, copy C<< |
|
|
|
req->errorno >> to C<< grp->errorno >> and set C<< grp->result >> to
|
|
|
C<-1> to signal an error. |
|
|
|
|
|
|
|
Otherwise, malloc some memory or so and issue a read request, adding the |
|
|
|
read request to the group. |
|
|
|
|
|
|
|
=item 4) continue issuing requests till finished |
|
|
|
|
|
|
|
In the read callback, check for errors and possibly continue with
|
|
|
C<eio_close> or any other eio request in the same way. |
|
|
|
|
|
|
|
As soon as no new requests are added the group request will finish. Make |
|
|
|
sure you I<always> set C<< grp->result >> to some sensible value. |
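
A code sketch of such an C<eio_load> request follows after this list.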
|
|
|
|
|
|
|
=back |
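
Here is a sketch of the above (C<my_eio_load> is not part of libeio): it
loads up to C<len> bytes from the start of a file into a caller-supplied
buffer, which must stay valid until the group request has finished. Checks
for C<0> returns from the request functions and from C<calloc> are omitted
for brevity.

   /* assumes <fcntl.h>, <stdlib.h> and "eio.h" have been included */

   struct load_ctx
   {
     eio_req *grp;
     int fd;
     char *buf;
     size_t len;
   };

   static int
   load_close_done (eio_req *req)
   {
     free (req->data); /* the struct load_ctx is no longer needed */
     return 0;
   }

   static int
   load_read_done (eio_req *req)
   {
     struct load_ctx *ctx = (struct load_ctx *)req->data;

     if (req->result < 0)
       {
         ctx->grp->errorno = req->errorno;
         ctx->grp->result  = -1;
       }
     else
       ctx->grp->result = req->result; /* bytes now in the caller's buffer */

     /* step 4: close the fd; this keeps the group alive until that is done */
     eio_grp_add (ctx->grp, eio_close (ctx->fd, 0, load_close_done, ctx));

     return 0;
   }

   static int
   load_open_done (eio_req *req)
   {
     struct load_ctx *ctx = (struct load_ctx *)req->data;

     if (req->result < 0)
       {
         /* step 3, error case: add no further requests, the group finishes */
         ctx->grp->errorno = req->errorno;
         ctx->grp->result  = -1;
         free (ctx);
         return 0;
       }

     /* step 3, success: issue the read and add it to the group */
     ctx->fd = req->result;
     eio_grp_add (ctx->grp,
                  eio_read (ctx->fd, ctx->buf, ctx->len, 0, 0,
                            load_read_done, ctx));

     return 0;
   }

   /* steps 1 and 2: create the group request and submit the open */
   eio_req *
   my_eio_load (const char *path, char *buf, size_t len,
                eio_cb cb, void *data)
   {
     struct load_ctx *ctx = calloc (1, sizeof (*ctx));
     eio_req *grp = eio_grp (cb, data);

     ctx->grp = grp;
     ctx->buf = buf;
     ctx->len = len;

     eio_grp_add (grp, eio_open (path, O_RDONLY, 0, 0, load_open_done, ctx));

     return grp;
   }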
|
|
|
|
|
|
|
=head4 REQUEST LIMITING |
|
|
|
|
|
|
|
|
|
|
|
#TODO |
|
|
|
|
|
|
|
void eio_grp_limit (eio_req *grp, int limit); |
|
|
|
|
|
|
|
|
|
|
|
=back |
|
|
|
|
|
|
|
|
|
|
|
=head1 LOW LEVEL REQUEST API |
|
|
|
|
|
|
|
#TODO |
|
|
|
|
|
|
|
|
|
|
|
=head1 ANATOMY AND LIFETIME OF AN EIO REQUEST |
|
|
|
|
|
|
|
A request is represented by a structure of type C<eio_req>. To initialise |
|
|
|
it, clear it to all zero bytes: |
|
|
|
|
|
|
|
eio_req req; |
|
|
|
|
|
|
|
memset (&req, 0, sizeof (req)); |
|
|
|
|
|
|
|
A more common way to initialise a new C<eio_req> is to use C<calloc>: |
|
|
|
|
|
|
|
eio_req *req = calloc (1, sizeof (*req)); |
|
|
|
|
|
|
|
In either case, libeio neither allocates, initialises nor frees the
|
|
|
C<eio_req> structure for you - it merely uses it. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
#TODO |
|
|
|
|
|
|
|
=head2 CONFIGURATION |
|
|
|
|
|
|
|
|
|
|
|
|
|
Note that: |
|
|
|
|
|
|
|
=over 4 |
|
|
|
|
|
|
|
=item a) libeio doesn't know how long your request callbacks take, so the |
|
|
|
time spent in C<eio_poll> is up to one callback invocation longer than
|
|
|
this interval. |
|
|
|
|
|
|
|
=item b) this is implemented by calling C<gettimeofday> after each |
|
|
|
request, which can be costly. |
|
|
|
|
|
|
|
=item c) at least one request will be handled. |
|
|
|
|
|
|
|
=back |
|
|
|
|
|
|
|
=item eio_set_max_poll_reqs (unsigned int nreqs) |
|
|
|
|
|
|
|
|
|
Libeio uses threads internally to handle most requests, and will start and stop threads on demand. |
|
|
|
|
|
|
|
This call can be used to limit the number of idle threads (threads without |
|
|
|
work to do): libeio will keep some threads idle in preperation for more |
|
|
|
work to do): libeio will keep some threads idle in preparation for more |
|
|
|
requests, but never longer than C<nthreads> threads. |
|
|
|
|
|
|
|
In addition to this, libeio will also stop threads when they are idle for |
|
|
|
|
|
|
|
|
|
=back |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
=head1 EMBEDDING |
|
|
|
|
|
|
|
Libeio can be embedded directly into programs. This functionality is not |
|
|
was written to use very little stackspace, but when using C<EIO_CUSTOM>
|
|
|
requests, you might want to increase this. |
|
|
|
|
|
|
|
If this symbol is undefined (the default) then libeio will use its default |
|
|
|
stack size (C<sizeof (void *) * 4096> currently). If it is defined, but |
|
|
|
C<0>, then the default operating system stack size will be used. In all |
|
|
|
other cases, the value must be an expression that evaluates to the desired |
|
|
|
stack size. |
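
For example, to give each worker thread a one megabyte stack, you could
define the symbol before compiling F<eio.c>, whether via a configuration
header, a wrapper file that includes F<eio.c>, or C<-DEIO_STACKSIZE=...>
on the compiler command line (the exact value is application-specific):

   #define EIO_STACKSIZE (1024 * 1024) /* 1MiB stack per worker thread */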
|
|
|