From ff39d413a3c14bec2d997570e26df61d372a4001 Mon Sep 17 00:00:00 2001 From: isaacs Date: Wed, 9 Feb 2011 13:56:59 -0800 Subject: [PATCH] Document module loading --- doc/api/modules.markdown | 313 ++++++++++++++++++++++++++++++++------- 1 file changed, 260 insertions(+), 53 deletions(-) diff --git a/doc/api/modules.markdown b/doc/api/modules.markdown index eb3a4905a3..425f77c5af 100644 --- a/doc/api/modules.markdown +++ b/doc/api/modules.markdown @@ -21,13 +21,13 @@ one-to-one correspondence. As an example, `foo.js` loads the module The contents of `foo.js`: - var circle = require('./circle'); + var circle = require('./circle.js'); console.log( 'The area of a circle of radius 4 is ' + circle.area(4)); The contents of `circle.js`: - var PI = 3.14; + var PI = Math.PI; exports.area = function (r) { return PI * r * r; @@ -39,78 +39,285 @@ The contents of `circle.js`: The module `circle.js` has exported the functions `area()` and `circumference()`. To export an object, add to the special `exports` -object. (Alternatively, one can use `this` instead of `exports`.) Variables +object. + +Variables local to the module will be private. In this example the variable `PI` is -private to `circle.js`. The function `puts()` comes from the module `'util'`, -which is a built-in module. Modules which are not prefixed by `'./'` are -built-in modules--more about this later. +private to `circle.js`. + +### Core Modules + +Node has several modules compiled into the binary. These modules are +described in greater detail elsewhere in this documentation. + +The core modules are defined in node's source in the `lib/` folder. + +Core modules are always preferentially loaded if their identifier is +passed to `require()`. For instance, `require('http')` will always +return the built in HTTP module, even if there is a file by that name. 
-### Module Resolving +### File Modules + +If the exact filename is not found, then node will attempt to load the +required filename with the added extension of `.js`, and then `.node`. + +`.js` files are interpreted as JavaScript text files, and `.node` files +are interpreted as compiled addon modules loaded with `dlopen`. + +A module prefixed with `'/'` is an absolute path to the file. For +example, `require('/home/marco/foo.js')` will load the file at +`/home/marco/foo.js`. A module prefixed with `'./'` is relative to the file calling `require()`. That is, `circle.js` must be in the same directory as `foo.js` for `require('./circle')` to find it. -Without the leading `'./'`, like `require('assert')` the module is searched -for in the `require.paths` array. `require.paths` on my system looks like -this: +Without a leading '/' or './' to indicate a file, the module is either a +"core module" or is loaded from a `node_modules` folder. + +### Loading from `node_modules` Folders + +If the module identifier passed to `require()` is not a native module, +and does not begin with `'/'`, `'../'`, or `'./'`, then node starts at the +parent directory of the current module, and adds `/node_modules`, and +attempts to load the module from that location. -`[ '/home/ryan/.node_modules' ]` +If it is not found there, then it moves to the parent directory, and so +on, until either the module is found, or the root of the tree is +reached. 
-That is, when `require('foo')` is called Node looks for: +For example, if the file at `'/home/ry/projects/foo.js'` called +`require('bar.js')`, then node would look in the following locations, in +this order: -* 1: `/home/ryan/.node_modules/foo` -* 2: `/home/ryan/.node_modules/foo.js` -* 3: `/home/ryan/.node_modules/foo.node` -* 4: `/home/ryan/.node_modules/foo/index.js` -* 5: `/home/ryan/.node_modules/foo/index.node` +* `/home/ry/projects/node_modules/bar.js` +* `/home/ry/node_modules/bar.js` +* `/home/node_modules/bar.js` +* `/node_modules/bar.js` -interrupting once a file is found. Files ending in `'.node'` are binary Addon -Modules; see 'Addons' below. `'index.js'` allows one to package a module as -a directory. +This allows programs to localize their dependencies, so that they do not +clash. -Additionally, a `package.json` file may be used to treat a folder as a -module, if it specifies a `'main'` field. For example, if the file at -`./foo/bar/package.json` contained this data: +#### Optimizations to the `node_modules` Lookup Process - { "name" : "bar", - "version" : "1.2.3", - "main" : "./lib/bar.js" } +When there are many levels of nested dependencies, it is possible for +these file trees to get fairly long. The following optimizations are thus +made to the process. -then `require('./foo/bar')` would load the file at -`'./foo/bar/lib/bar.js'`. This allows package authors to specify an -entry point to their module, while structuring their package how it -suits them. +First, `/node_modules` is never appended to a folder already ending in +`/node_modules`. -Any folders named `"node_modules"` that exist in the current module path -will also be appended to the effective require path. This allows for -bundling libraries and other dependencies in a 'node_modules' folder at -the root of a program. 
+Second, if the file calling `require()` is already inside a `node_modules` +hierarchy, then the top-most `node_modules` folder is treated as the +root of the search tree. -To avoid overly long lookup paths in the case of nested packages, -the following 2 optimizations are made: +For example, if the file at +`'/home/ry/projects/foo/node_modules/bar/node_modules/baz/quux.js'` +called `require('asdf.js')`, then node would search the following +locations: -1. If the module calling `require()` is already within a `node_modules` - folder, then the lookup will not go above the top-most `node_modules` - directory. -2. Node will not append `node_modules` to a path already ending in - `node_modules`. +* `/home/ry/projects/foo/node_modules/bar/node_modules/baz/node_modules/asdf.js` +* `/home/ry/projects/foo/node_modules/bar/node_modules/asdf.js` +* `/home/ry/projects/foo/node_modules/asdf.js` -So, for example, if the file at -`/usr/lib/node_modules/foo/node_modules/bar.js` were to do -`require('baz')`, then the following places would be searched for a -`baz` module, in this order: +### Folders as Modules -* 1: `/usr/lib/node_modules/foo/node_modules` -* 2: `/usr/lib/node_modules` +It is convenient to organize programs and libraries into self-contained +directories, and then provide a single entry point to that library. +There are three ways in which a folder may be passed to `require()` as +an argument. -`require.paths` can be modified at runtime by simply unshifting new -paths onto it, or at startup with the `NODE_PATH` environmental -variable (which should be a list of paths, colon separated). +The first is to create a `package.json` file in the root of the folder, +which specifies a `main` module. An example package.json file might +look like this: -The second time `require('foo')` is called, it is not loaded again from -disk. It looks in the `require.cache` object to see if it has been loaded -before. 
+ { "name" : "some-library", + "main" : "./lib/some-library.js" } + +If this was in a folder at `./some-library`, then +`require('./some-library')` would attempt to load +`./some-library/lib/some-library.js`. + +This is the extent of Node's awareness of package.json files. + +If there is no package.json file present in the directory, then node +will attempt to load an `index.js` or `index.node` file out of that +directory. For example, if there was no package.json file in the above +example, then `require('./some-library')` would attempt to load: + +* `./some-library/index.js` +* `./some-library/index.node` + +### Caching + +Modules are cached after the first time they are loaded. This means +(among other things) that every call to `require('foo')` will get +exactly the same object returned, if it would resolve to the same file. + +### All Together... To get the exact filename that will be loaded when `require()` is called, use the `require.resolve()` function. + +Putting together all of the above, here is the high-level algorithm +in pseudocode of what require.resolve does: + + require(X) + 1. If X is a core module, + a. return the core module + b. STOP + 2. If X begins with `./` or `/`, + a. LOAD_AS_FILE(Y + X) + b. LOAD_AS_DIRECTORY(Y + X) + 3. LOAD_NODE_MODULES(X, dirname(Y)) + 4. THROW "not found" + + LOAD_AS_FILE(X) + 1. If X is a file, load X as JavaScript text. STOP + 2. If X.js is a file, load X.js as JavaScript text. STOP + 3. If X.node is a file, load X.node as binary addon. STOP + + LOAD_AS_DIRECTORY(X) + 1. If X/package.json is a file, + a. Parse X/package.json, and look for "main" field. + b. let M = X + (json main field) + c. LOAD_AS_FILE(M) + 2. LOAD_AS_FILE(X/index) + + LOAD_NODE_MODULES(X, START) + 1. let DIRS=NODE_MODULES_PATHS(START) + 2. for each DIR in DIRS: + a. LOAD_AS_FILE(DIR/X) + b. LOAD_AS_DIRECTORY(DIR/X) + + NODE_MODULES_PATHS(START) + 1. let PARTS = path split(START) + 2. 
let ROOT = index of first instance of "node_modules" in PARTS, or 0 + 3. let I = count of PARTS - 1 + 4. let DIRS = [] + 5. while I > ROOT, + a. if PARTS[I] = "node_modules" CONTINUE + b. DIR = path join(PARTS[0 .. I] + "node_modules") + c. DIRS = DIRS + DIR + 6. return DIRS + +### Loading from the `require.paths` Folders + +In node, `require.paths` is an array of strings that represent paths to +be searched for modules when they are not prefixed with `'/'`, `'./'`, or +`'../'`. For example, if require.paths were set to: + + [ '/home/micheil/.node_modules', + '/usr/local/lib/node_modules' ] + +Then calling `require('bar/baz.js')` would search the following +locations: + +* 1: `'/home/micheil/.node_modules/bar/baz.js'` +* 2: `'/usr/local/lib/node_modules/bar/baz.js'` + +The `require.paths` array can be mutated at run time to alter this +behavior. + +It is set initially from the `NODE_PATH` environment variable, which is +a colon-delimited list of absolute paths. In the previous example, +the `NODE_PATH` environment variable might have been set to: + + /home/micheil/.node_modules:/usr/local/lib/node_modules + +#### **Note:** Please Avoid Modifying `require.paths` + +For compatibility reasons, `require.paths` is still given first priority +in the module lookup process. However, it may disappear in a future +release. + +While it seemed like a good idea at the time, and enabled a lot of +useful experimentation, in practice a mutable `require.paths` list is +often a troublesome source of confusion and headaches. + +##### Setting `require.paths` to some other value does nothing. + +This does not do what one might expect: + + require.paths = [ '/usr/lib/node' ]; + +All that does is lose the reference to the *actual* node module lookup +paths, and create a new reference to some other thing that isn't used +for anything. + +##### Putting relative paths in `require.paths` is... weird. 
+ +If you do this: + + require.paths.push('./lib'); + +then it does *not* add the full resolved path to where `./lib` +is on the filesystem. Instead, it literally adds `'./lib'`, +meaning that if you do `require('y.js')` in `/a/b/x.js`, then it'll look +in `/a/b/lib/y.js`. If you then did `require('y.js')` in +`/l/m/n/o/p.js`, then it'd look in `/l/m/n/o/lib/y.js`. + +In practice, people have used this as an ad hoc way to bundle +dependencies, but this technique is brittle. + +##### Zero Isolation + +There is (by regrettable design) only one `require.paths` array used by +all modules. + +As a result, if one node program comes to rely on this behavior, it may +permanently and subtly alter the behavior of all other node programs in +the same process. As the application stack grows, we tend to assemble +functionality, and it is a problem when those parts interact in ways +that are difficult to predict. + +## Addenda: Package Manager Tips + +If you were to build a package manager, the tools above provide you with +all you need to very elegantly set up modules in a folder structure such +that they get the required dependencies and do not conflict with one +another. + +Let's say that we wanted to have the folder at +`/usr/lib//` hold the contents of a specific +version of a package. + +Packages can depend on one another. So, in order to install +package `foo`, you may have to install a specific version of package `bar`. +The `bar` package may itself have dependencies, and in some cases, these +dependencies may even collide or form cycles. + +Since Node looks up the `realpath` of any modules it loads, and then +looks for their dependencies in the `node_modules` folders as described +above, this situation is very simple to resolve with the following +architecture: + +* `/usr/lib/foo/1.2.3/` - Contents of the `foo` package, version 1.2.3. +* `/usr/lib/bar/4.3.2/` - Contents of the `bar` package that `foo` + depends on. 
+* `/usr/lib/foo/1.2.3/node_modules/bar` - Symbolic link to + `/usr/lib/bar/4.3.2/`. +* `/usr/lib/bar/4.3.2/node_modules/*` - Symbolic links to the packages + that `bar` depends on. + +Thus, even if a cycle is encountered, or if there are dependency +conflicts, every module will be able to get a version of its dependency +that it can use. + +When the code in the `foo` package does `require('bar')`, it will get +the version that is symlinked into +`/usr/lib/foo/1.2.3/node_modules/bar`. Then, when the code in the `bar` +package calls `require('quux')`, it'll get the version that is symlinked +into `/usr/lib/bar/4.3.2/node_modules/quux`. + +Furthermore, to make the module lookup process even more optimal, rather +than putting packages directly in `/usr/lib`, we could put them in +`/usr/lib/node_modules//`. Then node will not bother +looking for missing dependencies in `/usr/node_modules` or +`/node_modules`. + +In order to make modules available to the node repl, it might be useful +to also add the `/usr/lib/node_modules` folder to the `NODE_PATH` +environment variable. Since the module lookups using `node_modules` +folders are all relative, and based on the real path of the files +making the calls to `require()`, the packages themselves can be anywhere.
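
Reviewer note: the `node_modules` lookup order documented in this patch can be checked against its two worked examples with a short sketch. The function name `nodeModulesPaths` is hypothetical and this is not Node's internal implementation. It assumes POSIX-style absolute paths, and it follows the prose examples, which include the top-most `node_modules` folder and `/node_modules` in the search list.

```javascript
// Hypothetical helper (not Node's internal code): list the `node_modules`
// folders that would be searched for a file living in directory `start`.
// Assumes a POSIX-style absolute path such as '/home/ry/projects'.
function nodeModulesPaths(start) {
  var parts = start.split('/');
  // Do not climb above the folder that holds the top-most `node_modules`.
  var first = parts.indexOf('node_modules');
  var root = first > 0 ? first - 1 : 0;
  var dirs = [];
  for (var i = parts.length - 1; i >= root; i--) {
    // Never append `node_modules` to a folder already named `node_modules`.
    if (parts[i] === 'node_modules') continue;
    dirs.push(parts.slice(0, i + 1).concat('node_modules').join('/'));
  }
  return dirs;
}

console.log(nodeModulesPaths('/home/ry/projects'));
console.log(nodeModulesPaths(
  '/home/ry/projects/foo/node_modules/bar/node_modules/baz'));
```

Appending the requested identifier (for example `bar.js`) to each returned folder gives the candidate locations listed in the documentation text.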