This is a problem present in both v0.10 and v0.11, where the 'setup'
event is synchronously emitted by `cluster.setupMaster()`, a mostly
harmless anti-pattern.
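A minimal sketch of why that matters, assuming the typical pattern of
attaching a listener right after the call:

    var cluster = require('cluster');

    cluster.setupMaster({ exec: 'worker.js' });
    // With a synchronous emit, this listener is attached too late and
    // never fires; emitting asynchronously gives callers a chance to
    // subscribe first.
    cluster.on('setup', function() {
      console.log('setup done');
    });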
Fix inadvertent v0.11 changes to the definition of suicide, particularly
the relationship between suicide state, the disconnect event, and when
exit should occur.
In v0.10, workers don't forcibly exit on disconnect, since that would
not give them time to gracefully finish open client connections; they
exit under normal node rules - when there is nothing left to do. But on
unexpected disconnect they do exit, so that workers aren't left around
after the master.
Note that a test, as written, was invalid: it failed against the v0.10
cluster API, demonstrating that the behavior it tested was an
undocumented API change.
Fixes an issue in 0.11 where the callback doesn't fire if the worker
count is already zero. In 0.10 the callback fires after the worker
count reaches zero, and fires on the next tick if the worker count is
already zero.
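A short sketch of the fixed semantics (the log line is illustrative):

    var cluster = require('cluster');

    // No workers were ever forked, so the count is already zero; the
    // callback still fires, on the next tick.
    cluster.disconnect(function() {
      console.log('all workers disconnected');
    });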
Don't emit the 'disconnect' event until all workers have gone away.
Before this commit, the event was emitted when all open handles were
closed, which usually - but not always - amounts to the same thing.
Fixes #6346.
If the `obj` given to `cluster._getServer` has `_setServerData` or
`_getServerData` methods, the data will be synchronized across workers
and stored in the master.
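The message doesn't spell out the exact signatures, so this is a
hypothetical sketch of an object exposing the two hooks; everything
beyond the two method names is illustrative:

    // Hypothetical handle-like object; only the hook names come from
    // the commit message above.
    var obj = {
      _serverData: null,
      // Data the master should store on behalf of this handle.
      _getServerData: function() {
        return this._serverData;
      },
      // Data synchronized back from the master.
      _setServerData: function(data) {
        this._serverData = data;
      }
    };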
Libuv now returns errors directly. Make everything in src/ and lib/
follow suit.
The changes to lib/ are not strictly necessary but they remove the need
for the abominations that are process._errno and node::SetErrno().
Empirical evidence suggests that OS-level load balancing (that is,
having multiple processes listen on a socket and have the operating
system wake up one when a connection comes in) produces skewed load
distributions on Linux, Solaris and possibly other operating systems.
The observed behavior is that a fraction of the listening processes
receive the majority of the connections. From the perspective of the
operating system, that somewhat makes sense: a task switch is expensive,
to be avoided whenever possible. That's why the operating system likes
to give preferential treatment to a few processes, because it reduces
the number of switches.
However, that rather subverts the purpose of the cluster module, which
is to distribute the load as evenly as possible. That's why this commit
adds (and defaults to) round-robin support, meaning that the master
process accepts connections and distributes them to the workers in a
round-robin fashion, effectively bypassing the operating system.
Round-robin is currently disabled on Windows due to how IOCP is wired
up. It works and you can select it manually but it probably results in
a heavy performance hit.
Fixes #4435.
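A minimal sketch of selecting the policy via the
cluster.schedulingPolicy knob introduced alongside this change; the
HTTP server is just a placeholder workload:

    var cluster = require('cluster');
    var http = require('http');

    // Must be set before calling fork(); SCHED_NONE restores the old
    // OS-level balancing, SCHED_RR is the new master-mediated default.
    cluster.schedulingPolicy = cluster.SCHED_RR;

    if (cluster.isMaster) {
      for (var i = 0; i < 4; i++)
        cluster.fork();
    } else {
      http.createServer(function(req, res) {
        res.end('handled by worker ' + cluster.worker.id + '\n');
      }).listen(8000);
    }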
Implement support for debugging cluster workers. Each worker process
is assigned a new debug port in an increasing sequence, i.e. when the
master process uses port 5858, worker 1 uses port 5859, worker 2 uses
port 5860, and so on.
Introduce a new command-line parameter '--debug-port=' which sets the
debug port but does not start the debugger. This option works for all
node processes; it is not specific to cluster workers.
Fixes joyent/node#5318.
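A rough sketch of the assignment scheme; the helper is hypothetical,
the real logic lives in lib/cluster.js:

    // process.debugPort holds the port set by --debug or --debug-port.
    var masterDebugPort = process.debugPort;  // e.g. 5858

    // Hypothetical helper: the flag the n-th worker would receive.
    function workerDebugArg(n) {
      return '--debug-port=' + (masterDebugPort + n);
    }

    workerDebugArg(1);  // '--debug-port=5859'
    workerDebugArg(2);  // '--debug-port=5860'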
Clean up and DRY the cluster source code. Fix a few bugs while we're
here:
* Short-lived handles in long-lived worker processes were never
reclaimed, resulting in resource leaks.
* Handles in the master process are now closed when the last worker
that holds a reference to them quits. Previously, they were only
closed at cluster shutdown.
* The cluster object no longer exposes functions/properties that are
only valid in the 'other' process, e.g. cluster.fork() is no longer
exported in worker processes (see the sketch below).
So much goodness and still manages to reduce the line count from 590
to 320.
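A small sketch of the per-role API surface described in the last
bullet:

    var cluster = require('cluster');

    if (cluster.isMaster) {
      cluster.fork();  // master-only API
    } else {
      // cluster.fork is not exported here after the cleanup;
      // worker-side state lives on cluster.worker instead.
      console.log('worker id:', cluster.worker.id);
    }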
Don't scan the whole string for a "NODE_CLUSTER_" substring; just check
that the string starts with the expected prefix. The linear scan was
causing a noticeable (but unsurprising) slowdown on messages with a
large .cmd string property.
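A sketch of the difference; the function names are hypothetical:

    var PREFIX = 'NODE_CLUSTER_';

    // Before: linear scan of the entire .cmd string for the substring.
    function isInternalMessageBefore(message) {
      return message.cmd.indexOf(PREFIX) !== -1;
    }

    // After: compare only the first PREFIX.length characters.
    function isInternalMessageAfter(message) {
      return message.cmd.slice(0, PREFIX.length) === PREFIX;
    }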
Profiling in clustered environments doesn't work out of the box.
By default, V8 writes the profile data of all processes to a single
v8.log.
Running that log file through a tick processor produces bogus numbers
because many events won't match up with the recorded memory mappings,
and you end up with graphs where 80+% of the ticks are unaccounted for.
Fixing the tick processor to deal with multi-process output is not very
useful because the processes may be running wildly disparate workloads.
That's why we fix up the command line arguments to include
a "--logfile=v8-%p.log" argument (where %p is expanded to the PID)
unless it already contains a --logfile argument.
Fixes #4617.
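A sketch of the fix-up; the helper name is hypothetical and the real
change applies this to the workers' exec arguments:

    // Give each worker its own V8 log unless the user already chose
    // one.
    function fixupExecArgv(execArgv) {
      var hasLogfile = execArgv.some(function(arg) {
        return arg.indexOf('--logfile') === 0;
      });
      if (!hasLogfile)
        execArgv = execArgv.concat('--logfile=v8-%p.log');  // %p -> PID
      return execArgv;
    }

    fixupExecArgv(['--prof']);  // ['--prof', '--logfile=v8-%p.log']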
Make the 'listening' event handler in the master process see the actual port
that the worker bound to when the worker specified port 0, i.e. a random port.
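Sketch of the observable effect (port 0 requests an ephemeral port):

    var cluster = require('cluster');
    var http = require('http');

    if (cluster.isMaster) {
      cluster.fork();
      cluster.on('listening', function(worker, address) {
        // address.port is the port actually bound, not the 0 the
        // worker asked for.
        console.log('worker bound to port', address.port);
      });
    } else {
      http.createServer(function(req, res) {
        res.end('ok');
      }).listen(0);
    }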
This commit reverts the following commits (in reverse chronological order):
74d076c errnoException must be done immediately
ddb02b9 net: support Server.listen(Pipe)
085a098 cluster: do not use internal server API
d138875 net: lazy listen on handler
Commit d138875 introduced a backwards incompatible change that broke the
simple/test-net-socket-timeout and simple/test-net-lazy-listen tests - it
defers listening on the target port until the `net.Server` instance has at
least one 'connection' event listener.
The other patches had to be reverted in order to revert d138875.
Fixes #3832.
This implements server.listen({ fd: <filedescriptor> }). The fd should
refer to an underlying resource that is already bound and listening, and
causes the new server to also accept connections on it.
Not supported on Windows. Raises ENOTSUP.
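A usage sketch, assuming fd 3 was inherited from a supervisor and is
already bound and listening:

    var net = require('net');

    var server = net.createServer(function(conn) {
      conn.end('hello\n');
    });

    // fd 3 is assumed to refer to a socket that is already bound and
    // listening; the server starts accepting connections on it.
    server.listen({ fd: 3 });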
In case a worker spawns a new subprocess with process.env,
NODE_UNIQUE_ID would have been part of the env, making the new
subprocess believe it is a worker. This would result in some confusion
if the subprocess were to listen on a port, since the server handle
request would then be relayed to the worker.
This patch removes the NODE_UNIQUE_ID flag from process.env on worker
startup, so any subprocess spawned by a worker is a normal process with
no cluster behavior.
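The scenario the patch guards against, sketched (task.js is a
placeholder):

    var fork = require('child_process').fork;

    // Inside a cluster worker: spawn a helper with the worker's env.
    // Before this patch the child inherited NODE_UNIQUE_ID and came up
    // believing it was a cluster worker; now it starts as a plain
    // process.
    fork('task.js', [], { env: process.env });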
Regarding discussion in #3198: passing the worker as an argument to an
event emitted on the worker is redundant, and an unnecessary break in
consistency vs. the events on ChildProcess objects. It was removed from
'exit', but 'listening' and others were overlooked. This corrects that
oversight.
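The resulting signatures, sketched:

    var cluster = require('cluster');

    if (cluster.isMaster) {
      var worker = cluster.fork();

      // On the worker object the worker argument is gone; the target
      // is already implied:
      worker.on('listening', function(address) { /* ... */ });

      // On the cluster object the worker is still passed:
      cluster.on('listening', function(worker, address) { /* ... */ });
    }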
test: fixes due to new cluster api.
- changed worker `death` to `exit`.
- corrected argument type expected by worker `exit` handler.
test: more tests of cluster.worker death
cluster: fixed arguments on worker 'exit' event
The worker 'exit' event now emits arguments consistent with the
corresponding event in the child_process module.
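The aligned signature, sketched:

    var cluster = require('cluster');

    if (cluster.isMaster) {
      var worker = cluster.fork();

      // The same (code, signal) pair that child_process 'exit' emits:
      worker.on('exit', function(code, signal) {
        console.log('worker exited', code, signal);
      });
    }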
This patch kills the worker once it has lost its connection with the
parent. However, if the worker is committing suicide, other measures
are used.
The patch adds a worker.disconnect() method that stops the worker from
accepting new connections and then closes the IPC channel, allowing
the worker to die gracefully. When the IPC channel has been
disconnected, a 'disconnect' event is emitted.
The patch also adds a cluster.disconnect() method, which calls
worker.disconnect() on all connected workers. When the workers have
disconnected, all server handles are closed, allowing the cluster
itself to terminate in a graceful way.
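A usage sketch of the two methods; the timeout is illustrative:

    var cluster = require('cluster');

    if (cluster.isMaster) {
      var worker = cluster.fork();

      worker.on('disconnect', function() {
        console.log('worker IPC channel closed');
      });

      // Stop all workers gracefully; server handles are closed once
      // every worker has disconnected.
      setTimeout(function() {
        cluster.disconnect();
      }, 1000);
    }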
This simplifies the internalMessage and exit event handling, and
simply relays the message and error events to the worker object. Note
that the error event was not relayed before.
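The relayed events, sketched:

    var cluster = require('cluster');

    if (cluster.isMaster) {
      var worker = cluster.fork();

      // Both are now relayed from the underlying child process:
      worker.on('message', function(msg) { /* ... */ });
      worker.on('error', function(err) { /* not relayed before */ });
    }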
uncaughtException handlers installed by the user override the default one that
the cluster module installs, the one that kills off the master process.
Fixes #2556.
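A sketch of the precedence described above; the recovery logic is
illustrative:

    var cluster = require('cluster');

    if (cluster.isMaster) {
      cluster.fork();

      // With a user handler installed, the cluster module's default
      // handler (which kills off the master) no longer runs.
      process.on('uncaughtException', function(err) {
        console.error('recovering from:', err.message);
      });
    }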