Suppose you're writing a web server which does video encoding on each file upload. Video encoding is very much compute bound. Some recent blog posts suggest that Node.js would fail miserably at this.
Using Node does not mean that you have to write a video encoding algorithm in JavaScript (a language without even 64 bit integers) and crunch away in the main server event loop. The suggested approach is to separate the I/O bound task of receiving uploads and serving downloads from the compute bound task of video encoding. In the case of video encoding this is accomplished by forking out to ffmpeg. Node provides advanced means of asynchronously controlling subprocesses for work like this.
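For illustration, here is a minimal sketch of that pattern (not from the original post; the file paths and callback shape are my own assumptions): the upload handler stays in the event loop while the CPU-heavy encoding is handed off to an ffmpeg child process.
<pre>// Hand the compute-bound work to ffmpeg in a child process so the
// event loop stays free to accept uploads and serve downloads.
var spawn = require('child_process').spawn;

function encode(inputPath, outputPath, callback) {
  var ffmpeg = spawn('ffmpeg', ['-i', inputPath, outputPath]);
  ffmpeg.on('exit', function (code) {
    // A non-zero exit code means the encode failed.
    callback(code === 0 ? null : new Error('ffmpeg exited with code ' + code));
  });
}</pre>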
It has also been suggested that Node does not take advantage of multicore machines. Node has long supported load-balancing connections over multiple processes in just a few lines of code - in this way a Node server will use the available cores. In coming releases we'll make it even easier: just pass <code>--balance</code> on the command line and Node will manage the cluster of processes.
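As a concrete sketch of the multi-process approach, here is one possible shape using the core <code>cluster</code> module rather than the <code>--balance</code> flag described above, so treat the details as illustrative rather than the announced interface:
<pre>// Fork one worker per CPU; the workers share the listening socket,
// so incoming connections are spread across the available cores.
var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  for (var i = 0; i < numCPUs; i++) cluster.fork();
} else {
  http.createServer(function (req, res) {
    res.writeHead(200);
    res.end('handled by worker pid ' + process.pid + '\n');
  }).listen(8000);
}</pre>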
Node has a clear purpose: provide an easy way to build scalable network programs. It is not a tool for every problem. Do not write a ray tracer with Node. Do not write a web browser with Node. Do however reach for Node if tasked with writing a DNS server, DHCP server, or even a video encoding server.
By relying on the kernel to schedule and preempt computationally expensive tasks and to load balance incoming connections, Node appears less magical than server platforms that employ userland scheduling. So far, our focus on simplicity and transparency has paid off: <a href="http://joyeur.com/2011/08/11/node-js-meetup-distributed-web-architectures/">the</a> <a href="http://venturebeat.com/2011/08/16/linkedin-node/">number</a> <a href="http://corp.klout.com/blog/2011/10/the-tech-behind-klout-com/">of</a> <a href="http://www.joelonsoftware.com/items/2011/09/13.html">success</a> <a href="http://pow.cx/">stories</a> from developers and corporations who are adopting the technology continues to grow.
If you're compiling a software package because you need a particular version (e.g. the latest), then it requires a little bit more maintenance than using a package manager like <code>dpkg</code>. Software that you compile yourself should <i>not</i> go into <code>/usr</code>; it should go into your home directory. This is part of being a software developer.
One way of doing this is to install everything into <code>$HOME/local/$PACKAGE</code>. Here is how I install node on my machine:<pre>./configure --prefix=$HOME/local/node-v0.4.5 && make install</pre>
To have my paths automatically set I put this inside my <code>$HOME/.zshrc</code>:<pre>PATH="$HOME/local/bin:/opt/local/bin:/usr/bin:/sbin:/bin"</pre>
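That <code>PATH</code> by itself does not include the per-package directories like <code>$HOME/local/node-v0.4.5/bin</code>. A small loop in the same rc file can pick those up; this is my own sketch, not part of the original snippet:
<pre># Add the bin directory of every package under $HOME/local to PATH.
for dir in $HOME/local/*/bin; do
  PATH="$dir:$PATH"
done
export PATH</pre>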
Node is under sufficiently rapid development that <i>everyone</i> should be compiling it themselves. A corollary of this is that <code>npm</code> (which should be installed alongside Node) does not require root to install packages.
CPAN and RubyGems have blurred the lines between development tools and system package managers. With <code>npm</code> we wish to draw a clear line: it is not a system package manager. It is not for installing Firefox or ffmpeg or OpenSSL; it is for rapidly downloading, building, and setting up Node packages. <code>npm</code> is a <i>development</i> tool. When a program written in Node becomes sufficiently mature it should be distributed as a tarball, <code>.deb</code>, <code>.rpm</code>, or through some other package system. It should not be distributed to end users with <code>npm</code>.
To echo <a href="http://nodejs.org/">Node</a>’s evolutionary nature, we have refreshed the identity to help mark an exciting time for developers, businesses and users who benefit from the pioneering technology.
<strong>Building a brand</strong>
We began exploring elements to express Node.js and jettisoned preconceived notions about what we thought Node should look like, and focused on what Node is: <strong>kinetic</strong>, <strong>connected</strong>, <strong>scalable</strong>, <strong>modular</strong>, <strong>mechanical</strong> and <strong>organic</strong>. Working with designer <a href="http://www.chrisglass.com">Chris Glass</a>, our explorations emphasized Node's dynamism and formed a visual language based on structure, relationships and interconnectedness.
Inspired by <strong>process visualization</strong>, we discovered pattern, form, and by relief, the hex shape. The angled infrastructure encourages energy to move through the letterforms.
We look forward to exploring this visual language as the technology charges into a very promising future.
To download the new logo, visit <a href="http://nodejs.org/logos/">nodejs.org/logos</a>.
This week Microsoft announced <a href="https://www.windowsazure.com/en-us/develop/nodejs/">support for Node in Windows Azure</a>, their cloud computing platform. For the Node core team and the community, this is an important milestone. We've worked hard over the past six months reworking Node's machinery to support I/O completion ports and Visual Studio to provide a good native port to Windows. The overarching goal of the port was to expand our user base to the largest number of developers. Happily, this has paid off in the form of being a first-class citizen on Azure. Many users who would never have used Node as a pure Unix tool are now up and running on the Windows platform. More users translate into a deeper and better ecosystem of modules, which makes for a better experience for everyone.
We also redesigned <a href="http://nodejs.org">our website</a> - something that we've put off for a long time because we felt that Node was too nascent to dedicate marketing to it. But now that we have binary distributions for Macintosh and Windows, have bundled npm, and are <a href="https://twitter.com/#!/mranney/status/145778414165569536">serving millions of users</a> at various companies, we felt ready to indulge in a new website and share a few of our success stories on the home page.
Work is ongoing. We continue to improve the software, making performance improvements and adding isolate support, but Node is growing up.
This post has been about 10 years in the making. My first job out of college was at IBM working on the <a title="Tivoli Directory Server" href="http://www-01.ibm.com/software/tivoli/products/directory-server/">Tivoli Directory Server</a>, and at the time I had a preconceived notion that working on anything related to Internet RFCs was about as hot as you could get. I spent a lot of time back then getting "down and dirty" with everything about LDAP: the protocol, performance, storage engines, indexing and querying, caching, customer use cases and patterns, general network server patterns, etc. Basically, I soaked up as much as I possibly could while I was there. On top of that, I listened to all the "gray beards" tell me about the history of LDAP, which was a bizarre marriage of telecommunications conglomerates and graduate students. The point of this blog post is to give you a crash course in LDAP, and explain what makes <a title="ldapjs" href="http://ldapjs.org">ldapjs</a> different. Allow me to be the gray beard for a bit...
<h2>What is LDAP and where did it come from?</h2>
Directory services were largely pioneered by the telecommunications companies (e.g., AT&T) to allow fast information retrieval of all the crap you'd expect to be in a telephone book and directory. That is, given a name, an address, an area code, a phone number, or some other attribute, support looking up customer records, billing information, routing information, etc. The efforts of several telcos came to exist in the <a title="X.500" href="http://en.wikipedia.org/wiki/X.500">X.500</a> standard(s). An X.500 directory is one of the most complicated beasts you can possibly imagine, but on a high note, there's probably not a thing you can imagine in a directory service that wasn't thought of in there. It is literally the kitchen sink. Oh, and it doesn't run over IP (it's <em>actually</em> on the <a title="OSI Model" href="http://en.wikipedia.org/wiki/OSI_model">OSI</a> model).
Several years after X.500 had been deployed (at telcos, academic institutions, etc.), it became clear that the Internet was "for real." <a title="LDAP" href="http://en.wikipedia.org/wiki/Lightweight_Directory_Access_Protocol">LDAP</a>, the "Lightweight Directory Access Protocol," was invented to act purely as an IP-accessible gateway to an X.500 directory.
At some point in the early 90's, a <a title="Tim Howes" href="http://en.wikipedia.org/wiki/Tim_Howes">graduate student</a> at the University of Michigan (with some help) cooked up the "grandfather" implementation of the LDAP protocol, which wasn't actually a "gateway," but rather a stand-alone implementation of LDAP. Said implementation, like many things at the time, used a process-per-connection concurrency model, and had "backends" (aka storage engines) for the file system and the Unix DB API. At some point the <a title="Berkeley Database" href="http://www.oracle.com/technetwork/database/berkeleydb/index.html">Berkeley Database</a> (BDB) backend was added, and it still remains the de facto storage engine for most LDAP directories.
OK, so a graduate student at UM wrote an LDAP server that wasn't a gateway. So what? Well, that UM code base turns out to be the thing that pretty much every vendor took a source license for. Those graduate students went off to Netscape later in the 90's, and largely dominated the market of LDAP middleware until <a title="Active Directory" href="http://en.wikipedia.org/wiki/Active_Directory">Active Directory</a> came along many years later (as far as I know, Active Directory is "from scratch," since while it's "almost" LDAP, it's different in a lot of ways). That Netscape code base was further bought and sold over the years to iPlanet, Sun Microsystems, and Red Hat (I'm probably missing somebody in that chain). It now lives under the Fedora umbrella as '<a title="389 Directory Server" href="http://directory.fedoraproject.org/">389 Directory Server</a>.' Probably the most popular fork of that code base now is <a title="OpenLDAP" href="http://www.openldap.org/">OpenLDAP</a>.
IBM did the same thing, and the Directory Server I worked on was a fork of the UM code too, but it diverged heavily from the Netscape branches. The divergence was primarily due to: (1) using DB2 as the backing store instead of BDB, and (2) needing to run on IBM's big iron like OS/400 and Z series mainframes.
The macro point is that there have actually been very few "fresh" implementations of LDAP, and it gets a pretty bad reputation because at the end of the day you've got 20 years of "bolt-ons" to grad student code. Oh, and it was born out of ginormous telcos, so of course the protocol is overly complex.
That said, while there certainly is some wacky stuff in the LDAP protocol itself, it really suffered more from poor and buggy implementations than from LDAP itself being fundamentally flawed. As <a title="Engine Yard LDAP" href="http://www.engineyard.com/blog/2009/ldap-directories-the-forgotten-nosql/">Engine Yard pointed out a few years back</a>, you can think of LDAP as the original NoSQL store.
<h2>LDAP: The Good Parts</h2>
So what's awesome about LDAP? Since it's a directory system it maintains a hierarchy of your data, which as an information management pattern aligns with <em>a lot</em> of use cases (the quintessential example is white pages for people in your company, but subscriptions to SaaS applications, "host groups" for tracking machines/instances, physical goods tracking, etc., all have use cases that fit that organization scheme). For example, presumably at your job you have a "reporting chain." Let's say a given record in LDAP (I'll use myself as a guinea pig here) looks like:
<pre> firstName: Mark
lastName: Cavage
city: Seattle
uid: markc
state: Washington
mail: mcavage@gmail.com
phone: (206) 555-1212
title: Software Engineer
department: 123456
objectclass: joyentPerson</pre>
My record would live under the tree of engineers I report to; as an example (including some other popular engineers under said vice president), the tree would look like:
<pre>              uid=david
                  |
              uid=bryan
             /    |    \
     uid=markc uid=ryah uid=isaacs</pre>
Ok, so we've got a tree. It's not tremendously different from your filesystem, but how do we find people? LDAP has a rich search filter syntax that makes a lot of sense for key/value data (far more than tacking Map Reduce jobs on does, imo), and all search queries take a "start point" in the tree. Here's an example: let's say I wanted to find all "Software Engineers" in the entire company, a filter would look like:
<pre> (title="Software Engineer")</pre>
And I'd just start my search from 'uid=david' in the example above. Let's say I wanted to find all software engineers who worked in Seattle:
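The combined filter itself is not reproduced in this copy of the post, but using the attribute names from the record above it would be an AND of the two conditions, something like:
<pre> (&(title="Software Engineer")(city="Seattle"))</pre>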
I could keep going, but the gist is that LDAP has "full" boolean predicate logic, wildcard filters, etc. It's really rich.
Oh, and on top of the technical merits, for better or worse, it's an established standard for both administrators and applications (i.e., most "shipped" intranet software has either a local user repository or the ability to leverage an LDAP server somewhere). So there are a lot of compelling reasons to look at leveraging LDAP.
<h2>ldapjs: Why do I care?</h2>
As I said earlier, I spent a lot of time at IBM observing how customers used LDAP, and the real items I took away from that experience were:
<ul>
<li>LDAP implementations have suffered a lot from never having been designed from the ground up for a large number of concurrent connections with asynchronous operations.</li>
<li>There are use cases for LDAP that just don't always fit the traditional "here's my server and storage engine" model. A lot of simple customer use cases wanted an LDAP access point, but didn't want to be forced into taking the heavy backends that came with it (they wanted the original gateway model!). There was an entire "sub" industry for this known as "<a title="Metadirectory" href="http://en.wikipedia.org/wiki/Metadirectory">meta directories</a>" back in the late 90's and early 2000's.</li>
<li>Replication was always a sticking point. LDAP vendors all tried to offer a big multi-master, multi-site replication model. It was a lot of "bolt-on" complexity, done before the <a title="CAP Theorem" href="http://en.wikipedia.org/wiki/CAP_theorem">CAP theorem</a> was written, and certainly before it was accepted as "truth."</li>
<li>Nobody uses all of the protocol. In fact, 20% of the features solve 80% of the use cases (I'm making that number up, but you get the idea).</li>
</ul>
For all the good parts of LDAP, those are really damned big failing points, and even I eventually abandoned LDAP for the greener pastures of NoSQL somewhere along the way. But it always nagged at me that LDAP didn't get its due because of a lot of implementation problems (to be clear, if I could, I'd change some aspects of the protocol itself too, but that's a lot harder).
Well, in the last year, I went to work for <a title="Joyent" href="http://www.joyent.com/">Joyent</a>, and like everyone else, we have several use cases that are classic directory service problems. If you break down the list I outlined above:
<ul>
<li><strong>Connection-oriented and asynchronous:</strong> Holy smokes batman, <a title="node.js" href="http://nodejs.org/">node.js</a> is a completely kick-ass event-driven asynchronous server platform that manages connections like a boss. Check!</li>
<li><strong>Lots of use cases:</strong> Yeah, we've got some. Man, the <a title="sinatra" href="http://www.sinatrarb.com/">sinatra</a>/<a title="express" href="http://expressjs.com/">express</a> paradigm is so easy to slap over anything. How about we just do that and leave as many use cases open as we can. Check!</li>
<li><strong>Replication is hard. CAP is right:</strong> There are a lot of distributed databases out there vying to solve exactly this problem. At Joyent we went with <a title="Riak" href="http://www.basho.com/">Riak</a>. Check!</li>
<li><strong>Don't need all of the protocol:</strong> I'm lazy. Let's just skip the stupid things most people don't need. Check!</li>
</ul>
So that's the crux of ldapjs right there: giving you the ability to put LDAP back into your application while nailing those four fundamental problems that plague most existing LDAP deployments.
The obvious question is how it turned out, and the answer is, honestly, better than I thought it would. When I set out to do this, I actually assumed I'd be shipping a much smaller percentage of the RFC than is there. There's actually about 95% of the core RFC implemented. I wasn't sure if the marriage of this protocol to node/JavaScript would work out, but if you've ever used express, this should be <em>really</em> familiar. And I tried to make it as natural as possible to use "pure" JavaScript objects, rather than requiring the developer to understand <a title="ASN.1" href="http://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One">ASN.1</a> (the binary wire protocol) or the <a title="RFC 4510" href="http://tools.ietf.org/html/rfc4510">LDAP RFC</a> in detail (this one mostly worked out; ldap_modify is still kind of a PITA).
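To make the express comparison concrete, here is a minimal server sketch along the lines of the ldapjs guide; the base DN and the attributes are made up for illustration:
<pre>var ldap = require('ldapjs');
var server = ldap.createServer();

// Route search requests under a base DN, express-style.
server.search('o=joyent', function (req, res, next) {
  var entry = {
    dn: 'uid=markc, o=joyent',
    attributes: { uid: 'markc', title: 'Software Engineer', city: 'Seattle' }
  };
  // Plain JavaScript objects in, LDAP wire protocol out.
  if (req.filter.matches(entry.attributes)) res.send(entry);
  res.end();
  return next();
});

server.listen(1389, function () {
  console.log('ldapjs listening at ' + server.url);
});</pre>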
Within 24 hours of releasing ldapjs on <a title="twitter" href="http://twitter.com/#!/mcavage/status/106767571012952064">Twitter</a>, there was an <a title="github ldapjs address book" href="https://gist.github.com/1173999">implementation of an address book</a> that works with Thunderbird/Evolution, by the end of that weekend there was some <a href="http://i.imgur.com/uR16U.png">slick integration with CouchDB</a>, and ldapjs even got used in one of the <a href="http://twitter.com/#!/jheusala/status/108977708649811970">node knockout apps</a>. Off to a pretty good start!
<h2>The Road Ahead</h2>
Hopefully you've been motivated to learn a little bit more about LDAP and try out <a href="http://ldapjs.org">ldapjs</a>. The best place to start is probably the <a title="ldapjs guide" href="http://ldapjs.org/guide.html">guide</a>. After that you'll probably need to pick up a book from <a href="http://www.amazon.com/Understanding-Deploying-LDAP-Directory-Services/dp/0672323168">back in the day</a>. ldapjs itself is still in its infancy; there's quite a bit of room to add some slick client-side logic (e.g., connection pools, automatic reconnects), easy-to-use schema validation, backends, etc. By the time this post is live, there will be experimental <a href="http://en.wikipedia.org/wiki/DTrace">dtrace</a> support if you're running on Mac OS X or preferably Joyent's <a href="http://smartos.org/">SmartOS</a> (shameless plug). And that nagging percentage of the protocol I didn't do will get filled in over time, I suspect. If you've got an interest in any of this, send me some pull requests, but most importantly, I just want to see LDAP not be a skeleton in the closet and get used in places where you should be using it. So get out there and write you some LDAP.
We <a href="http://blog.nodejs.org/2011/06/23/porting-node-to-windows-with-microsoft%E2%80%99s-help/">announced</a> back in July that with Microsoft's support Joyent would be porting Node to Windows. This effort is ongoing but I thought it would be nice to make a status report post about the new platform library <code><a href="https://github.com/joyent/libuv">libuv</a></code> which has resulted from porting Node to Windows.
<code>libuv</code>'s purpose is to abstract platform-dependent code in Node into one place where it can be tested for correctness and performance before bindings to V8 are added. Since Node is totally non-blocking, <code>libuv</code> turns out to be a rather useful library itself: a BSD-licensed, minimal, high-performance, cross-platform networking library.
We attempt to not reinvent the wheel where possible. The entire Unix backend sits heavily on Marc Lehmann's beautiful libraries <a href="http://software.schmorp.de/pkg/libev.html">libev</a> and <a href="http://software.schmorp.de/pkg/libeio.html">libeio</a>. For DNS we integrated with Daniel Stenberg's <a href="http://c-ares.haxx.se/">C-Ares</a>. For cross-platform build-system support we're relying on Chrome's <a href="http://code.google.com/p/gyp/">GYP</a> meta-build system.
The currently implemented features are:
<ul>
<li>Non-blocking TCP sockets (using IOCP on Windows)</li>
<li>Non-blocking named pipes</li>
<li>UDP</li>
<li>Timers</li>
<li>Child process spawning</li>
<li>Asynchronous DNS via <a href="http://c-ares.haxx.se/">c-ares</a> or <code>uv_getaddrinfo</code>.</li>
<li>Asynchronous file system APIs <code>uv_fs_*</code></li>
<li>High resolution time <code>uv_hrtime</code></li>
<li>Current executable path look up <code>uv_exepath</code></li>
<li>Thread pool scheduling <code>uv_queue_work</code></li>
</ul>
The features still being worked on are:
<ul>
<li>File system events (currently backed by inotify and <code>ReadDirectoryChangesW</code>; kqueue and event ports support will come in the near future) <code>uv_fs_event_t</code></li>
<li>VT100 TTY <code>uv_tty_t</code></li>
<li>Socket sharing between processes <code>uv_ipc_t</code> (<a href="https://gist.github.com/1233593">planned API</a>)</li>
</ul>
For complete documentation see the header file: <a href="https://github.com/joyent/libuv/blob/03d0c57ea216abd611286ff1e58d4e344a459f76/include/uv.h">include/uv.h</a>. There are a number of tests in <a href="https://github.com/joyent/libuv/tree/3ca382be741ec6ce6a001f0db04d6375af8cd642/test">the test directory</a> which demonstrate the API.
<code>libuv</code> supports Microsoft Windows operating systems since Windows XP SP2; it can be built with either Visual Studio or MinGW. It also supports Solaris 121 and later using the GCC toolchain, Linux 2.6 or better using the GCC toolchain, and Macintosh Darwin using the GCC or XCode toolchain. It is known to work on the BSDs but we do not check the build regularly.
In addition to Node v0.5, a number of projects have begun to use <code>libuv</code>:
This week office hours are only from 4pm to 6pm. Isaac will be in the Joyent office in SF - everyone else is out of town. Sign up at http://nodeworkup.eventbrite.com/ if you would like to come.
The week after, Thursday May 5th, we will all be at NodeConf in Portland.
Starting next Thursday Isaac, Tom, and I will be holding weekly office hours at <a href="http://maps.google.com/maps?q=345+California+St,+San+Francisco,+CA+94104&layer=c&sll=37.793040,-122.400491&cbp=13,178.31,,0,-60.77&cbll=37.793131,-122.400484&hl=en&sspn=0.006295,0.006295&ie=UTF8&hq=&hnear=345+California+St,+San+Francisco,+California+94104&ll=37.793131,-122.400484&spn=0.001295,0.003428&z=19&panoid=h0dlz3VG-hMKlzOu0LxMIg">Joyent HQ</a> in San Francisco. Office hours are meant to be subdued working time - there are no talks and no alcohol. Bring your bugs or just come and hack with us.
Our building requires that everyone attending be on a list so you must sign up at <a href="http://nodeworkup01.eventbrite.com/">Event Brite</a>.
I'm pleased to announce that Microsoft is partnering with Joyent in formally contributing resources towards porting Node to Windows. As you may have heard in <a href="http://nodejs.org/nodeconf.pdf" title="a talk">a talk</a> we gave earlier this year, we have started the undertaking of a native port to Windows - targeting the high-performance IOCP API.
This requires a rather large modification of the core structure, and we're very happy to have official guidance and engineering resources from Microsoft. <a href="https://www.cloudkick.com/">Rackspace</a> is also contributing <a href="https://github.com/piscisaureus">Bert Belder</a>'s time to this undertaking.
The result will be official binary node.exe releases on nodejs.org, which will work on Windows Azure and other Windows versions as far back as Server 2003.
<ul>
<li>Superfeedr released <a href="http://blog.superfeedr.com/node-xmpp-server/">a Node XMPP Server</a>. "<i>Since <a href="http://spaceboyz.net/~astro/">astro</a> had been doing an <strong>amazing work</strong> with his <a href="https://github.com/astro/node-xmpp">node-xmpp</a> library to build <em>Client</em>, <em>Components</em> and even <em>Server to server</em> modules, the logical next step was to try to build a <em>Client to Server</em> module so that we could have a full blown server. That’s what we worked on the past couple days, and <a href="https://github.com/superfeedr/node-xmpp">it’s now on Github</a>!</i>"</li>
<li>Joyent's Mark Cavage released <a href="http://ldapjs.org/">LDAP.js</a>. "<i>ldapjs is a pure JavaScript, from-scratch framework for implementing <a href="http://tools.ietf.org/html/rfc4510">LDAP</a> clients and servers in <a href="http://nodejs.org">Node.js</a>. It is intended for developers used to interacting with HTTP services in node and <a href="http://expressjs.com">express</a>.</i>"</li>
<li>Microsoft's Tomasz Janczuk released <a href="http://tomasz.janczuk.org/2011/08/hosting-nodejs-applications-in-iis-on.html">iisnode</a>. "<i>The <a href="https://github.com/tjanczuk/iisnode">iisnode</a> project provides a native IIS 7.x module that allows hosting of node.js applications in IIS.</i>"<br/><br/>Scott Hanselman posted <a href="http://www.hanselman.com/blog/InstallingAndRunningNodejsApplicationsWithinIISOnWindowsAreYouMad.aspx">a detailed walkthrough</a> of how to get started with iisnode.</li>
</ul>
One of the things Joyent accepted when we took on the Node project was to provide resources to help the community grow. The Node project is amazing because of the expertise, dedication and hard work of the community. However, in all communities there is the possibility of people acting inappropriately. We decided to introduce trademarks on the "Node.js" name and the "Node logo" in order to ensure that people or organisations who are not investing in the Node community don't misrepresent, or create confusion about, the role of themselves or their products with Node.
We are big fans of the people who have contributed to Node and we have worked hard to make sure that existing members of the community will be unaffected by this change. Most people don't have to do anything; they are free to use the Node.js marks in their free open source projects (see the guidelines). For others, we've already granted them licenses to use Node.js marks in their domain names and their businesses. We value all of these contributions to the Node community and hope that we can continue to protect their good names and hard work.
Where does our trademark policy come from? We started by looking at popular open source foundations like the Apache Software Foundation and Linux. By strongly basing our policy on the one used by the Apache Software Foundation we feel that we’ve created a policy which is liberal enough to allow the open source community to easily make use of the mark in the context of free open source software, but secure enough to protect the community’s work from being misrepresented by other organisations.
While we realise that any changes involving lawyers can be intimidating to the community we want to make this transition as smoothly as possible and welcome your questions and feedback on the policy and how we are implementing it.
Version 0.6.0 will be released next week. Please spend some time this week upgrading your code to v0.5.10. Report any API differences at <a href="https://github.com/joyent/node/wiki/API-changes-between-v0.4-and-v0.6">https://github.com/joyent/node/wiki/API-changes-between-v0.4-and-v0.6</a> or report a bug to us at <a href="http://github.com/joyent/node/issues">http://github.com/joyent/node/issues</a> if you hit problems.
The API changes between v0.4.12 and v0.5.10 are 99% cosmetic, minor, and easy to fix. Most people are able to migrate their code in 10 minutes. Don't fear.
Once you've ported your code to v0.5.10 please help out by testing third party modules. Make bug reports. Encourage authors to publish new versions of their modules. Go through the list of modules at <ahref="http://search.npmjs.org/">http://search.npmjs.org/</a> and try out random ones. This is especially encouraged of Windows users!
<p><img style="float:right;margin-left:1.2em;" alt="substack" src="http://substack.net/images/substackistan.png"><i>This is a guest post by James "SubStack" Halliday, originally posted <a href="http://substack.net/posts/16a9d8/multi-server-continuous-deployment-with-fleet">on his blog</a>, and reposted here with permission.</i></p>
<p>Writing applications as a sequence of tiny services that all talk to each other over the network has many upsides, but it can be annoyingly tedious to get all the subsystems up and running. </p>
<p>Running a <a href="http://substack.net/posts/7a1c42">seaport</a> can help with getting all the services to talk to each other, but running the processes is another matter, especially when you have new code to push into production. </p>
<p><a href="http://github.com/substack/fleet">fleet</a> aims to make it really easy for anyone on your team to push new code from git to an armada of servers and manage all the processes in your stack. </p>
<p>To start using fleet, just install the fleet command with <a href="http://npmjs.org">npm</a>: </p>
<prestyle="">npm install -g fleet </pre>
<p>Then on one of your servers, start a fleet hub. From a fresh directory, give it a passphrase and a port to listen on: </p>
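<p>The hub command itself is missing from this copy of the post; based on the ports mentioned below, it would be along these lines (the secret value is just a placeholder): </p>
<pre>fleet hub --port=7000 --secret=beepboop</pre>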
<p>Now fleet is listening on :7000 for commands and has started a git server on :7001 over http. There are no ssh keys or post-commit hooks to configure; just run that command and you're ready to go! </p>
<p>Next set up some worker drones to run your processes. You can have as many workers as you like on a single server but each worker should be run from a separate directory. Just do: </p>
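<p>The drone command is also missing here; it would look something like this (again with placeholder values for the hub address and secret): </p>
<pre>fleet drone --hub=x.x.x.x:7000 --secret=beepboop</pre>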
<p>where <span class="code">x.x.x.x</span> is the address where the fleet hub is running. Spin up a few of these drones. </p>
<p>Now navigate to the directory of the app you want to deploy. First set a remote so you don't need to type <span class="code">--hub</span> and <span class="code">--secret</span> all the time. </p>
<p>Fleet just created a <span class="code">fleet.json</span> file for you to save your settings. </p>
<p>From the same app directory, to deploy your code just do: </p>
<prestyle="">fleet deploy </pre>
<p>The deploy command does a <span class="code">git push</span> to the fleet hub's git http server and then the hub instructs all the drones to pull from it. Your code gets checked out into a new directory on all the fleet drones every time you deploy. </p>
<p>Because fleet is designed specifically for managing applications with lots of tiny services, the deploy command isn't tied to running any processes. Starting processes is up to the programmer but it's super simple. Just use the <span class="code">fleet spawn</span> command: </p>
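<p>The spawn example is not reproduced in this copy; invoking it looks roughly like this (the server.js script name is just an illustration): </p>
<pre>fleet spawn -- node server.js</pre>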
<p>By default fleet picks a drone at random to run the process on. You can specify which drone you want to run a particular process on with the <span class="code">--drone</span> switch if it matters. </p>
<p>Start a few processes across all your worker drones and then show what is running with the <span class="code">fleet ps</span> command: </p>
<p>Now suppose that you have new code to push out into production. By default, fleet lets you spin up new services without disturbing your existing services. If you <spanclass="code">fleet deploy</span> again after checking in some new changes to git, the next time you <spanclass="code">fleet spawn</span> a new process, that process will be spun up in a completely new directory based on the git commit hash. To stop a process, just use <spanclass="code">fleet stop</span>. </p>
<p>This approach lets you verify that the new services work before bringing down the old services. You can even start experimenting with heterogeneous and incremental deployment by hooking into a custom <ahref="http://substack.net/posts/5bd18d">http proxy</a>! </p>
<p>Even better, if you use a service registry like <ahref="http://substack.net/posts/7a1c42">seaport</a> for managing the host/port tables, you can spin up new ad-hoc staging clusters all the time without disrupting the normal operation of your site before rolling out new code to users. </p>
<p>Fleet has many more commands that you can learn about with its git-style manpage-based help system! Just do <spanclass="code">fleet help</span> to get a list of all the commands you can run. </p>
<prestyle="">fleet help
Usage: fleet <command> [<args>]
The commands are:
deploy Push code to drones.
drone Connect to a hub as a worker.
exec Run commands on drones.
hub Create a hub for drones to connect.
monitor Show service events system-wide.
ps List the running processes on the drones.
remote Manage the set of remote hubs.
spawn Run services on drones.
stop Stop processes running on drones.
For help about a command, try `fleet help <command>`.</pre>
<p><spanclass="code">npm install -g fleet</span> and <ahref="https://github.com/substack/fleet">check out the code on github</a>! </p>
<div style="float:right;margin:0 0 15px 15px;">
<img class="alignnone size-full wp-image-469" title="Bunyan" src="http://nodeblog.files.wordpress.com/2012/03/bunyan.png" alt="Paul Bunyan and Babe the Blue Ox" width="240" height="320"/><br/>
<a href="http://www.flickr.com/photos/stublag/2876034487">Photo by Paul Carroll</a>
</div>
<p>Service logs are gold, if you can mine them. We scan them for occasional debugging. Perhaps we grep them looking for errors or warnings, or set up an occasional Nagios log regex monitor. If that. This is a waste of the best channel for data about a service.</p>
<p><ahref="http://www.youtube.com/watch?v=01-2pNCZiNk">"Log. (Huh) What is it good for. Absolutely ..."</a></p>
<ul>
<li>debugging</li>
<li>monitoring tools that alert operators</li>
<li>non real-time analysis (business or operational analysis)</li>
<li>historical analysis</li>
</ul>
<p>These are what logs are good for. The current state of logging is barely adequate for the first of these. Doing reliable analysis, and even monitoring, of varied <a href="http://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/">"printf-style" logs</a> is a grueling or hacky task that most either don't bother with, fall back to paying someone else to do (viz. Splunk's great successes), or, for web sites, punt and use the plethora of JavaScript-based web analytics tools.</p>
<p>Let's log in JSON. Let's format log records with a filter <em>outside</em> the app. Let's put more info in log records by not shoehorning it into a printf-style message. Debuggability can be improved. Monitoring and analysis can <em>definitely</em> be improved. Let's <em>not</em> write another regex-based parser, and instead use the time we've saved to write tools that collate logs from multiple nodes and services, query structured logs (from all services, not just web servers), etc.</p>
<p>At <ahref="http://joyent.com">Joyent</a> we use node.js for running many core services -- loosely coupled through HTTP REST APIs and/or AMQP. In this post I'll draw on experiences from my work on Joyent's <ahref="http://www.joyent.com/products/smartdatacenter/">SmartDataCenter product</a> and observations of <ahref="http://www.joyentcloud.com/">Joyent Cloud</a> operations to suggest some improvements to service logging. I'll show the (open source) <strong>Bunyan logging library and tool</strong> that we're developing to improve the logging toolchain.</p>
<h1 style="margin:48px 0 24px;" id="current-state-of-log-formatting">Current State of Log Formatting</h1>
<pre><code>[Mon, 21 Nov 2011 20:52:11 GMT] 200 GET /foo (1ms)
Blah, some other unstructured output from a console.log call.
</code></pre>
<p>What're we doing here? Five logs at random. Five different date formats. As <a href="http://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/">Paul Querna points out</a>, we haven't improved log parsability in 20 years. Parsability is enemy number one. You can't use your logs until you can parse the records, and faced with the above, the inevitable solution is a one-off regular expression.</p>
<p>The current state of the art is various <a href="http://search.cpan.org/~akira/Apache-ParseLog-1.02/ParseLog.pm">parsing libs</a>, <a href="http://www.webalizer.org/">analysis</a> <a href="http://awstats.sourceforge.net/">tools</a>, and homebrew scripts ranging from grep to Perl, whose scope is limited to a few niche log formats.</p>
<h1style="margin:48px 0 24px;"id="json-for-logs">JSON for Logs</h1>
<p><codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">JSON.parse()</code> solves all that. Let's log in JSON. But it means a change in thinking: <strong>The first-level audience for log files shouldn't be a person, but a machine.</strong></p>
<p>That is not said lightly. The "Unix Way" of small focused tools lightly coupled with text output is important. JSON is less "text-y" than, e.g., Apache common log format. JSON makes <codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">grep</code> and <codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">awk</code> awkward. Using <codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">less</code> directly on a log is handy.</p>
<p>But not handy enough. That <ahref="http://bit.ly/wTPlN3">80's pastel jumpsuit awkwardness</a> you're feeling isn't the JSON, it's your tools. Time to find a <codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">json</code> tool -- <ahref="https://github.com/trentm/json">json</a> is one, <codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">bunyan</code> described below is another one. Time to learn your JSON library instead of your regex library: <ahref="https://developer.mozilla.org/en/JSON">JavaScript</a>, <ahref="http://docs.python.org/library/json.html">Python</a>, <ahref="http://flori.github.com/json/">Ruby</a>, <ahref="http://json.org/java/">Java</a>, <ahref="http://search.cpan.org/~makamaka/JSON-2.53/lib/JSON.pm">Perl</a>.</p>
<p>Time to burn your log4j Layout classes and move formatting to the tools side. Creating a log message with semantic information and throwing that away to make a string is silly. The win of being able to trivially parse log records is huge. The possibility of being able to add ad hoc structured information to individual log records is interesting: think program state metrics, think feeding to Splunk or loggly, think easy audit logs.</p>
<p><ahref="https://github.com/trentm/node-bunyan">Bunyan</a> is <strong>a node.js module for logging in JSON</strong> and <strong>a <codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">bunyan</code> CLI tool</strong> to view those logs.</p>
<p>Logging with Bunyan basically looks like this:</p>
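<p><em>A minimal sketch (Bunyan's API at the time of writing; the original example may differ slightly):</em></p>
<pre><code>// hi.js -- log a couple of records; each record is one line of JSON on stdout
var Logger = require('bunyan');
var log = new Logger({name: 'myapp'});
log.info('hi');
log.warn({lang: 'fr'}, 'au revoir');
</code></pre>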
<p>Pipe that through the <codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">bunyan</code> tool that is part of the "node-bunyan" install to get more readable output:</p>
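<p><em>Illustrative output (the timestamps, pid, and hostname below are made up; the rendered format may differ slightly):</em></p>
<pre><code>$ node hi.js | ./node_modules/.bin/bunyan
[2012-03-28T17:25:00.000Z]  INFO: myapp/40341 on banana.local: hi
[2012-03-28T17:25:00.000Z]  WARN: myapp/40341 on banana.local: au revoir (lang=fr)
</code></pre>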
<p>Bunyan is log4j-like: create a Logger with a name, call <codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">log.info(...)</code>, etc. However it has no intention of reproducing much of the functionality of log4j. IMO, much of that is overkill for the types of services you'll tend to be writing with node.js.</p>
<p>Let's walk through a bigger example to show some interesting things in Bunyan. We'll create a very small "Hello API" server using the excellent <ahref="https://github.com/mcavage/node-restify">restify</a> library -- which we used heavily here at <ahref="http://joyent.com">Joyent</a>. (Bunyan doesn't require restify at all, you can easily use Bunyan with <ahref="http://expressjs.com/">Express</a> or whatever.)</p>
<p><em>You can follow along in <ahref="https://github.com/trentm/hello-json-logging">https://github.com/trentm/hello-json-logging</a> if you like. Note that I'm using the current HEAD of the bunyan and restify trees here, so details might change a bit. Prerequisite: a node 0.6.x installation.</em></p>
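<p><em>The logger for this example is created along these lines (a sketch matching the description in the next three paragraphs; the option names follow Bunyan's documented API):</em></p>
<pre><code>var Logger = require('bunyan');
var log = new Logger({
  // 1. every logger needs a name; it lands on each record as the "name" field
  name: 'helloapi',
  // 2. two streams: DEBUG and above to stdout, TRACE and above appended to hello.log
  streams: [
    { stream: process.stdout, level: 'debug' },
    { path: 'hello.log', level: 'trace' }
  ],
  // 3. serialize the "req" field with the standard request serializer
  serializers: {
    req: Logger.stdSerializers.req
  }
});
</code></pre>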
<p>Every Bunyan logger must have a <strong>name</strong>. Unlike log4j, this is not a hierarchical dotted namespace. It is just a name field for the log records.</p>
<p>Every Bunyan logger has one or more <strong>streams</strong>, to which log records are written. Here we've defined two: logging at DEBUG level and above is written to stdout, and logging at TRACE and above is appended to 'hello.log'.</p>
<p>Bunyan has the concept of <strong>serializers</strong>: a registry of functions that know how to convert a JavaScript object for a certain log record field to a nice JSON representation for logging. For example, here we register the <codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">Logger.stdSerializers.req</code> function to convert HTTP Request objects (using the field name "req") to JSON. More on serializers later.</p>
<h2id="restify-server">Restify Server</h2>
<p>Restify 1.x and above has bunyan support baked in. You pass in your Bunyan logger like this:</p>
<prestyle="overflow:auto;color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:5px;"><code>var server = restify.createServer({
name: 'Hello API',
log: log // Pass our logger to restify.
});
</code></pre>
<p>Our simple API will have a single <codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">GET /hello?name=NAME</code> endpoint:</p>
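<p><em>A sketch of the handler (the route name "SayHello" and the debug call match the log records shown later; query parsing is assumed to be enabled so <code>req.params.name</code> is populated):</em></p>
<pre><code>server.get({path: '/hello', name: 'SayHello'}, function (req, res, next) {
  var caller = req.params.name || 'caller';
  req.log.debug('caller is "%s"', caller); // (2)
  res.send({hello: caller});
  return next();
});
</code></pre>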
<p>If we run that, <codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">node server.js</code>, and call the endpoint, we get the expected restify response:</p>
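<p><em>For example (the port is an assumption; the JSON body follows from the handler above):</em></p>
<pre><code>$ curl -sS 'http://localhost:8080/hello?name=paul'
{"hello":"paul"}
</code></pre>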
<h2id="setup-server-logging">Setup Server Logging</h2>
<p>Let's add two things to our server. First, we'll use <code style="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">server.pre</code> to hook into restify's request handling before routing, where we'll <strong>log the request</strong>.</p>
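<p><em>A sketch matching the "start" record described below:</em></p>
<pre><code>server.pre(function (request, response, next) {
  request.log.info({req: request}, 'start'); // (1)
  return next();
});
</code></pre>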
<p>This is the first time we've seen this <codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">log.info</code> style with an object as the first argument. Bunyan logging methods (<codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">log.trace</code>, <codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">log.debug</code>, ...) all support an optional <strong>first object argument with extra log record fields</strong>:</p>
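<p><em>For example (an illustrative call, not taken from the original example code):</em></p>
<pre><code>log.info({foo: 'bar'}, 'some message'); // adds a "foo" field to the log record
</code></pre>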
<p>Here we pass in the restify Request object, <codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">req</code>. The "req" serializer we registered above will come into play here, but bear with me.</p>
<p>Remember that we already had this debug log statement in our endpoint handler:</p>
<prestyle="overflow:auto;color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:5px;"><code>req.log.debug('caller is "%s"', caller); // (2)
</code></pre>
<p>Second, use the restify server <codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">after</code> event to <strong>log the response</strong>:</p>
<prestyle="overflow:auto;color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:5px;"><code>server.on('after', function (req, res, route) {
req.log.info({res: res}, "finished"); // (3)
});
</code></pre>
<h2id="log-output">Log Output</h2>
<p>Now let's see what log output we get when somebody hits our API's endpoint:</p>
<pre><code>{"name":"helloapi","hostname":"banana.local","pid":40341,"route":"SayHello","req_id":"9496dfdd-4ec7-4b59-aae7-3fed57aed5ba","level":20,"msg":"caller is \"paul\"","time":"2012-03-28T17:37:29.507Z","v":0}
</code></pre>
<p>Let's look at each in turn to see what is interesting -- pretty-printed with <code style="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">node server.js | ./node_modules/.bin/bunyan -j</code>:</p>
<p>Here we logged the incoming request with <codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">request.log.info({req: request}, 'start')</code>. The use of the "req" field triggers the <ahref="https://github.com/trentm/node-bunyan/blob/master/lib/bunyan.js#L857-870">"req" serializer</a><ahref="https://github.com/trentm/hello-json-logging/blob/master/server.js#L24">registered at Logger creation</a>.</p>
<p>Next the <codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">req.log.debug</code> in our handler:</p>
<li><p>The last two log messages include <strong>a "req_id" field</strong> (added to the <codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">req.log</code> logger by restify). Note that this is the same UUID as the "X-Request-Id" header in the <codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">curl</code> response. This means that if you use <codestyle="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">req.log</code> for logging in your API handlers you will get an easy way to collate all logging for particular requests.</p>
<p>If yours is an SOA system with many services, a best practice is to carry that X-Request-Id/req_id through your system to enable collating the handling of a single top-level request.</p></li>
<li><p>The last two log messages include <strong>a "route" field</strong>. This tells you to which handler restify routed the request. While possibly useful for debugging, this can be very helpful for log-based monitoring of endpoints on a server.</p></li>
</ol>
<p>Recall that we also set up all logging to go to the "hello.log" file. This was set at the TRACE level. Restify will log more detail of its operation at the trace level. See <a href="https://gist.github.com/1761772">my "hello.log"</a> for an example. The <code style="color:#999;background-color:#2f2f2f;border:1px solid #484848;padding:.2em .4em;">bunyan</code> tool does a decent job of <a href="https://gist.github.com/1761772#file_2.+cat+hello.log+pipe+bunyan">nicely formatting</a> multiline messages and "req"/"res" keys (with color, not shown in the gist).</p>
<p><em>This</em> is logging you can use effectively.</p>
<p>Bunyan is just one of many options for logging in node.js-land. Others (that I know of) supporting JSON logging are <ahref="https://github.com/flatiron/winston#readme">winston</a> and <ahref="https://github.com/pquerna/node-logmagic/">logmagic</a>. Paul Querna has <ahref="http://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/">an excellent post on using JSON for logging</a>, which shows logmagic usage and also touches on topics like the GELF logging format, log transporting, indexing and searching.</p>
<p>Parsing challenges won't ever completely go away, but they can for your logs if you use JSON. Collating log records across logs from multiple nodes is facilitated by a common "time" field. Correlating logging across multiple services is enabled by carrying a common "req_id" (or equivalent) through all such logs.</p>
<p>Using separate log files for a single service is an anti-pattern. The typical Apache example of separate access and error logs is legacy, not an example to follow. A JSON log provides the structure necessary for tooling to easily filter for log records of a particular type.</p>
<p>JSON logs bring possibilities. Feeding to tools like Splunk becomes easy. Ad hoc fields allow for a lightly spec'd comm channel from apps to other services: records with a "metric" could feed to <ahref="http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/">statsd</a>, records with a "loggly: true" could feed to loggly.com.</p>
<p>Here I've described a very simple example of restify and bunyan usage for node.js-based API services with easy JSON logging. Restify provides a powerful framework for robust API services. Bunyan provides a light API for nice JSON logging and the beginnings of tooling to help consume Bunyan JSON logs.</p>
<p><strong>Update (29-Mar-2012):</strong> Fix styles somewhat for RSS readers.</p>
<p>Managing dependencies is a fundamental problem in building complex software. The terrific success of GitHub and <a href="http://npmjs.org/">npm</a> has made code reuse especially easy in the Node world, where packages don't exist in isolation but rather as nodes in a large graph. The software is constantly changing (releasing new versions), and each package has its own constraints about what other packages it requires to run (dependencies). npm keeps track of these constraints, and authors express what kinds of changes are compatible using <a href="http://npmjs.org/doc/semver.html">semantic versioning</a>, allowing authors to specify that their package will work with even future versions of its dependencies as long as the semantic versions are assigned properly.
</p>
<p>This does mean that when you "npm install" a package with dependencies, there's no guarantee that you'll get the same set of code now that you would have gotten an hour ago, or that you would get if you were to run it again an hour later. You may get a bunch of bug fixes now that weren't available an hour ago. This is great during development, where you want to keep up with changes upstream. It's not necessarily what you want for deployment, though, where you want to validate whatever bits you're actually shipping.
</p>
<p>Put differently, <strong>it's understood that all software changes incur some risk, and it's critical to be able to manage this risk on your own terms</strong>. Taking that risk in development is good because by definition that's when you're incorporating and testing software changes. On the other hand, if you're shipping production software, you probably don't want to take this risk when cutting a release candidate (i.e. build time) or when you actually ship (i.e. deploy time) because you want to validate whatever you ship.
</p>
<p>You can address a simple case of this problem by only depending on specific versions of packages, allowing no semver flexibility at all, but this falls apart when you depend on packages that don't also adopt the same principle. Many of us at Joyent started wondering: can we generalize this approach?
</p>
<h2>Shrinkwrapping packages</h2>
<p>That brings us to <ahref="http://npmjs.org/doc/shrinkwrap.html">npm shrinkwrap</a><ahref="#note1-note"name="note1-top">[1]</a>:
</p>
<pre><code>NAME
npm-shrinkwrap -- Lock down dependency versions
SYNOPSIS
npm shrinkwrap
DESCRIPTION
This command locks down the versions of a package's dependencies so
that you can control exactly which versions of each dependency will
be used when your package is installed.</code></pre>
<p>Let's consider package A:
</p>
<pre><code>{
"name": "A",
"version": "0.1.0",
"dependencies": {
"B": "<0.1.0"
}
}</code></pre>
<p>package B:
</p>
<pre><code>{
"name": "B",
"version": "0.0.1",
"dependencies": {
"C": "<0.1.0"
}
}</code></pre>
<p>and package C:
</p>
<pre><code>{
"name": "C,
"version": "0.0.1"
}</code></pre>
<p>If these are the only versions of A, B, and C available in the registry, then a normal "npm install A" will install:
</p>
<pre><code>A@0.1.0
└─┬ B@0.0.1
└── C@0.0.1</code></pre>
<p>Then if B@0.0.2 is published, then a fresh "npm install A" will install:
</p>
<pre><code>A@0.1.0
└─┬ B@0.0.2
└── C@0.0.1</code></pre>
<p>assuming the new version did not modify B's dependencies. Of course, the new version of B could include a new version of C and any number of new dependencies. As we said before, if A's author doesn't want that, she could specify a dependency on B@0.0.1. But if A's author and B's author are not the same person, there's no way for A's author to say that she does not want to pull in newly published versions of C when B hasn't changed at all.
</p>
<p>In this case, A's author can use
</p>
<pre><code># npm shrinkwrap</code></pre>
<p>This generates npm-shrinkwrap.json, which will look something like this:</p>
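<p><em>A sketch reconstructed from the A/B/C example above (the actual file may include additional fields, e.g. resolved tarball URLs):</em></p>
<pre><code>{
  "name": "A",
  "version": "0.1.0",
  "dependencies": {
    "B": {
      "version": "0.0.1",
      "dependencies": {
        "C": {
          "version": "0.0.1"
        }
      }
    }
  }
}</code></pre>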
<p>The shrinkwrap command has locked down the dependencies based on what's currently installed in node_modules. <strong>When "npm install" installs a package with an npm-shrinkwrap.json file in the package root, the shrinkwrap file (rather than package.json files) completely drives the installation of that package and all of its dependencies (recursively).</strong> So now the author publishes A@0.1.0, and subsequent installs of this package will use B@0.0.1 and C@0.0.1, regardless of the dependencies and versions listed in A's, B's, and C's package.json files. If the authors of B and C publish new versions, they won't be used to install A because the shrinkwrap refers to older versions. Even if you generate a new shrinkwrap, it will still reference the older versions, since "npm shrinkwrap" uses what's installed locally rather than what's available in the registry.
</p>
<h4>Using shrinkwrapped packages</h4>
<p>Using a shrinkwrapped package is no different than using any other package: you can "npm install" it by hand, or add a dependency to your package.json file and "npm install" it.
</p>
<h4>Building shrinkwrapped packages</h4>
<p>To shrinkwrap an existing package (a condensed command-line sketch follows the list):
</p>
<ol>
<li>Run "npm install" in the package root to install the current versions of all dependencies.</li>
<li>Validate that the package works as expected with these versions.</li>
<li>Run "npm shrinkwrap", add npm-shrinkwrap.json to git, and publish your package.</li>
</ol>
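<p><em>In shell terms, the three steps above amount to roughly this (substitute your own validation step):</em></p>
<pre><code># 1. install the current versions of all dependencies
npm install
# 2. validate that the package works with these versions
npm test
# 3. lock the tree down, commit the lockfile, and publish
npm shrinkwrap
git add npm-shrinkwrap.json
git commit -m "Add npm-shrinkwrap.json"
npm publish
</code></pre>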
<p>To add or update a dependency in a shrinkwrapped package:
</p>
<ol>
<li>Run "npm install" in the package root to install the current versions of all dependencies.</li>
<li>Add or update dependencies. "npm install" each new or updated package individually and then update package.json.</li>
<li>Validate that the package works as expected with the new dependencies.</li>
<li>Run "npm shrinkwrap", commit the new npm-shrinkwrap.json, and publish your package.</li>
</ol>
<p>You can still use <ahref="http://npmjs.org/doc/outdated.html">npm outdated(1)</a> to view which dependencies have newer versions available.
</p>
<p>For more details, check out the full docs on <ahref="http://npmjs.org/doc/shrinkwrap.html">npm shrinkwrap</a>, from which much of the above is taken.
</p>
<h2>Why not just check <code>node_modules</code> into git?</h2>
<p>One previously <ahref="http://www.mikealrogers.com/posts/nodemodules-in-git.html">proposed solution</a> is to "npm install" your dependencies during development and commit the results into source control. Then you deploy your app from a specific git SHA knowing you've got exactly the same bits that you tested in development. This does address the problem, but it has its own issues: for one, binaries are tricky because you need to "npm install" them to get their sources, but this builds the [system-dependent] binary too. You can avoid checking in the binaries and use "npm rebuild" at build time, but we've had a lot of difficulty trying to do this.<ahref="#note2-note"name="note2-top">[2]</a> At best, this is second-class treatment for binary modules, which are critical for many important types of Node applications.<ahref="#note3-note"name="note3-top">[3]</a>
</p>
<p>Besides the issues with binary modules, this approach just felt wrong to many of us. There's a reason we don't check binaries into source control, and it's not just because they're platform-dependent. (After all, we could build and check in binaries for all supported platforms and operating systems.) It's because that approach is error-prone and redundant: error-prone because it introduces a new human failure mode where someone checks in a source change but doesn't regenerate all the binaries, and redundant because the binaries can always be built from the sources alone. An important principle of software version control is that you don't check in files derived directly from other files by a simple transformation.<ahref="#note4-note"name="note4-top">[4]</a> Instead, you check in the original sources and automate the transformations via the build process.
</p>
<p>Dependencies are just like binaries in this regard: they're files derived from a simple transformation of something else that is (or could easily be) already available: the name and version of the dependency. Checking them in has all the same problems as checking in binaries: people could update package.json without updating the checked-in module (or vice versa). Besides that, adding new dependencies has to be done by hand, introducing more opportunities for error (checking in the wrong files, not checking in certain files, inadvertently changing files, and so on). Our feeling was: why check in this whole dependency tree (and create a mess for binary add-ons) when we could just check in the package name and version and have the build process do the rest?
</p>
<p>Finally, the approach of checking in node_modules doesn't really scale for us. We've got at least a dozen repos that will use restify, and it doesn't make sense to check that in everywhere when we could instead just specify which version each one is using. There's another principle at work here, which is <strong>separation of concerns</strong>: each repo specifies <em>what</em> it needs, while the build process figures out <em>where to get it</em>.
</p>
<h2>What if an author republishes an existing version of a package?</h2>
<p>We're not suggesting deploying a shrinkwrapped package directly and running "npm install" to install from shrinkwrap in production. We already have a build process to deal with binary modules and other automateable tasks. That's where we do the "npm install". We tar up the result and distribute the tarball. Since we test each build before shipping, we won't deploy something we didn't test.
</p>
<p>It's still possible to pick up newly published versions of existing packages at build time. We assume force publish is not that common in the first place, let alone force publish that breaks compatibility. If you're worried about this, you can use git SHAs in the shrinkwrap or even consider maintaining a mirror of the part of the npm registry that you use and require human confirmation before mirroring unpublishes.
</p>
<h2>Final thoughts</h2>
<p>Of course, the details of each use case matter a lot, and the world doesn't have to pick just one solution. If you like checking in node_modules, you should keep doing that. We've chosen the shrinkwrap route because that works better for us.
</p>
<p>It's not exactly news that Joyent is heavy on Node. Node is the heart of our SmartDataCenter (SDC) product, whose public-facing web portal, public API, Cloud Analytics, provisioning, billing, heartbeating, and other services are all implemented in Node. That's why it's so important to us to have robust components (like <a href="https://github.com/trentm/node-bunyan">logging</a> and <a href="http://mcavage.github.com/node-restify/">REST</a>) and tools for <a href="http://dtrace.org/blogs/dap/2012/01/13/playing-with-nodev8-postmortem-debugging/">understanding production failures post mortem</a>, <a href="http://dtrace.org/blogs/dap/2012/01/05/where-does-your-node-program-spend-its-time/">profiling Node apps in production</a>, and now managing Node dependencies. Again, we're interested to hear feedback from others using these tools.
</p>
<hr/>
Dave Pacheco blogs at <ahref="http://dtrace.org/blogs/dap/">dtrace.org</a>.
<p><ahref="#note1-top"name="note1-note">[1]</a> Much of this section is taken directly from the "npm shrinkwrap" documentation.
</p>
<p><ahref="#note2-top"name="note2-note">[2]</a> We've had a lot of trouble with checking in node_modules with binary dependencies. The first problem is figuring out exactly which files <em>not</em> to check in (<em>.o, </em>.node, <em>.dynlib, </em>.so, *.a, ...). When <ahref="https://twitter.com/#!/mcavage">Mark</a> went to apply this to one of our internal services, the "npm rebuild" step blew away half of the dependency tree because it ran "make clean", which in dependency <ahref="http://ldapjs.org/">ldapjs</a> brings the repo to a clean slate by blowing away its dependencies. Later, a new (but highly experienced) engineer on our team was tasked with fixing a bug in our Node-based DHCP server. To fix the bug, we went with a new dependency. He tried checking in node_modules, which added 190,000 lines of code (to this repo that was previously a few hundred LOC). And despite doing everything he could think of to do this correctly and test it properly, the change broke the build because of the binary modules. So having tried this approach a few times now, it appears quite difficult to get right, and as I pointed out above, the lack of actual documentation and real world examples suggests others either aren't using binary modules (which we know isn't true) or haven't had much better luck with this approach.
</p>
<p><ahref="#note3-top"name="note3-note">[3]</a> Like a good Node-based distributed system, our architecture uses lots of small HTTP servers. Each of these serves a REST API using <ahref="http://mcavage.github.com/node-restify/">restify</a>. restify uses the binary module <ahref="https://github.com/chrisa/node-dtrace-provider">node-dtrace-provider</a>, which gives each of our services <ahref="http://mcavage.github.com/node-restify/#DTrace">deep DTrace-based observability for free</a>. So literally almost all of our components are or will soon be depending on a binary add-on. Additionally, the foundation of <ahref="http://dtrace.org/blogs/dap/2011/03/01/welcome-to-cloud-analytics/">Cloud Analytics</a> are a pair of binary modules that extract data from <ahref="https://github.com/bcantrill/node-libdtrace">DTrace</a> and <ahref="https://github.com/bcantrill/node-kstat">kstat</a>. So this isn't a corner case for us, and we don't believe we're exceptional in this regard. The popular <ahref="https://github.com/pietern/hiredis-node">hiredis</a> package for interfacing with redis from Node is also a binary module.
</p>
<p>Managing dependencies is a fundamental problem in building complex software. The terrific success of github and <ahref="http://npmjs.org/">npm</a> have made code reuse especially easy in the Node world, where packages don't exist in isolation but rather as nodes in a large graph. The software is constantly changing (releasing new versions), and each package has its own constraints about what other packages it requires to run (dependencies). npm keeps track of these constraints, and authors express what kind of changes are compatible using <ahref="http://npmjs.org/doc/semver.html">semantic versioning</a>, allowing authors to specify that their package will work with even future versions of its dependencies as long as the semantic versions are assigned properly.
</p>
<p>This does mean that when you "npm install" a package with dependencies, there's no guarantee that you'll get the same set of code now that you would have gotten an hour ago, or that you would get if you were to run it again an hour later. You may get a bunch of bug fixes now that weren't available an hour ago. This is great during development, where you want to keep up with changes upstream. It's not necessarily what you want for deployment, though, where you want to validate whatever bits you're actually shipping.
</p>
<p>Put differently, <strong>it's understood that all software changes incur some risk, and it's critical to be able to manage this risk on your own terms</strong>. Taking that risk in development is good because by definition that's when you're incorporating and testing software changes. On the other hand, if you're shipping production software, you probably don't want to take this risk when cutting a release candidate (i.e. build time) or when you actually ship (i.e. deploy time) because you want to validate whatever you ship.
</p>
<p>You can address a simple case of this problem by only depending on specific versions of packages, allowing no semver flexibility at all, but this falls apart when you depend on packages that don't also adopt the same principle. Many of us at Joyent started wondering: can we generalize this approach?
</p>
<h2>Shrinkwrapping packages</h2>
<p>That brings us to <ahref="http://npmjs.org/doc/shrinkwrap.html">npm shrinkwrap</a><ahref="#note1-note"name="note1-top">[1]</a>:
</p>
<pre><code>NAME
npm-shrinkwrap -- Lock down dependency versions
SYNOPSIS
npm shrinkwrap
DESCRIPTION
This command locks down the versions of a package's dependencies so
that you can control exactly which versions of each dependency will
be used when your package is installed.</code></pre>
<p>Let's consider package A:
</p>
<pre><code>{
"name": "A",
"version": "0.1.0",
"dependencies": {
"B": "<0.1.0"
}
}</code></pre>
<p>package B:
</p>
<pre><code>{
"name": "B",
"version": "0.0.1",
"dependencies": {
"C": "<0.1.0"
}
}</code></pre>
<p>and package C:
</p>
<pre><code>{
"name": "C,
"version": "0.0.1"
}</code></pre>
<p>If these are the only versions of A, B, and C available in the registry, then a normal "npm install A" will install:
</p>
<pre><code>A@0.1.0
└─┬ B@0.0.1
└── C@0.0.1</code></pre>
<p>Then if B@0.0.2 is published, then a fresh "npm install A" will install:
</p>
<pre><code>A@0.1.0
└─┬ B@0.0.2
└── C@0.0.1</code></pre>
<p>assuming the new version did not modify B's dependencies. Of course, the new version of B could include a new version of C and any number of new dependencies. As we said before, if A's author doesn't want that, she could specify a dependency on B@0.0.1. But if A's author and B's author are not the same person, there's no way for A's author to say that she does not want to pull in newly published versions of C when B hasn't changed at all.
</p>
<p>In this case, A's author can use
</p>
<pre><code># npm shrinkwrap</code></pre>
<p>This generates npm-shrinkwrap.json, which will look something like this:
<p>The shrinkwrap command has locked down the dependencies based on what's currently installed in node_modules. <strong>When "npm install" installs a package with a npm-shrinkwrap.json file in the package root, the shrinkwrap file (rather than package.json files) completely drives the installation of that package and all of its dependencies (recursively).</strong> So now the author publishes A@0.1.0, and subsequent installs of this package will use B@0.0.1 and C@0.1.0, regardless the dependencies and versions listed in A's, B's, and C's package.json files. If the authors of B and C publish new versions, they won't be used to install A because the shrinkwrap refers to older versions. Even if you generate a new shrinkwrap, it will still reference the older versions, since "npm shrinkwrap" uses what's installed locally rather than what's available in the registry.
</p>
<h4>Using shrinkwrapped packages</h4>
<p>Using a shrinkwrapped package is no different than using any other package: you can "npm install" it by hand, or add a dependency to your package.json file and "npm install" it.
</p>
<h4>Building shrinkwrapped packages</h4>
<p>To shrinkwrap an existing package:
</p>
<ol>
<li>Run "npm install" in the package root to install the current versions of all dependencies.</li>
<li>Validate that the package works as expected with these versions.</li>
<li>Run "npm shrinkwrap", add npm-shrinkwrap.json to git, and publish your package.</li>
</ol>
<p>To add or update a dependency in a shrinkwrapped package:
</p>
<ol>
<li>Run "npm install" in the package root to install the current versions of all dependencies.</li>
<li>Add or update dependencies. "npm install" each new or updated package individually and then update package.json.</li>
<li>Validate that the package works as expected with the new dependencies.</li>
<li>Run "npm shrinkwrap", commit the new npm-shrinkwrap.json, and publish your package.</li>
</ol>
<p>You can still use <a href="http://npmjs.org/doc/outdated.html">npm outdated(1)</a> to view which dependencies have newer versions available.
</p>
<p>For more details, check out the full docs on <a href="http://npmjs.org/doc/shrinkwrap.html">npm shrinkwrap</a>, from which much of the above is taken.
</p>
<h2>Why not just check <code>node_modules</code> into git?</h2>
<p>One previously <a href="http://www.mikealrogers.com/posts/nodemodules-in-git.html">proposed solution</a> is to "npm install" your dependencies during development and commit the results into source control. Then you deploy your app from a specific git SHA knowing you've got exactly the same bits that you tested in development. This does address the problem, but it has its own issues: for one, binaries are tricky because you need to "npm install" them to get their sources, but this builds the [system-dependent] binary too. You can avoid checking in the binaries and use "npm rebuild" at build time, but we've had a lot of difficulty trying to do this.<a href="#note2-note" name="note2-top">[2]</a> At best, this is second-class treatment for binary modules, which are critical for many important types of Node applications.<a href="#note3-note" name="note3-top">[3]</a>
</p>
<p>Besides the issues with binary modules, this approach just felt wrong to many of us. There's a reason we don't check binaries into source control, and it's not just because they're platform-dependent. (After all, we could build and check in binaries for all supported platforms and operating systems.) It's because that approach is error-prone and redundant: error-prone because it introduces a new human failure mode where someone checks in a source change but doesn't regenerate all the binaries, and redundant because the binaries can always be built from the sources alone. An important principle of software version control is that you don't check in files derived directly from other files by a simple transformation.<a href="#note4-note" name="note4-top">[4]</a> Instead, you check in the original sources and automate the transformations via the build process.
</p>
<p>Dependencies are just like binaries in this regard: they're files derived from a simple transformation of something else that is (or could easily be) already available: the name and version of the dependency. Checking them in has all the same problems as checking in binaries: people could update package.json without updating the checked-in module (or vice versa). Besides that, adding new dependencies has to be done by hand, introducing more opportunities for error (checking in the wrong files, not checking in certain files, inadvertently changing files, and so on). Our feeling was: why check in this whole dependency tree (and create a mess for binary add-ons) when we could just check in the package name and version and have the build process do the rest?
</p>
<p>Finally, the approach of checking in node_modules doesn't really scale for us. We've got at least a dozen repos that will use restify, and it doesn't make sense to check that in everywhere when we could instead just specify which version each one is using. There's another principle at work here, which is <strong>separation of concerns</strong>: each repo specifies <em>what</em> it needs, while the build process figures out <em>where to get it</em>.
</p>
<h2>What if an author republishes an existing version of a package?</h2>
<p>We're not suggesting deploying a shrinkwrapped package directly and running "npm install" to install from shrinkwrap in production. We already have a build process to deal with binary modules and other automateable tasks. That's where we do the "npm install". We tar up the result and distribute the tarball. Since we test each build before shipping, we won't deploy something we didn't test.
</p>
<p>It's still possible to pick up newly published versions of existing packages at build time. We assume force publish is not that common in the first place, let alone force publish that breaks compatibility. If you're worried about this, you can use git SHAs in the shrinkwrap or even consider maintaining a mirror of the part of the npm registry that you use and require human confirmation before mirroring unpublishes.
</p>
<h2>Final thoughts</h2>
<p>Of course, the details of each use case matter a lot, and the world doesn't have to pick just one solution. If you like checking in node_modules, you should keep doing that. We've chosen the shrinkwrap route because that works better for us.
</p>
<p>It's not exactly news that Joyent is heavy on Node. Node is the heart of our SmartDataCenter (SDC) product, whose public-facing web portal, public API, Cloud Analytics, provisioning, billing, heartbeating, and other services are all implemented in Node. That's why it's so important to us to have robust components (like <a href="https://github.com/trentm/node-bunyan">logging</a> and <a href="http://mcavage.github.com/node-restify/">REST</a>) and tools for <a href="http://dtrace.org/blogs/dap/2012/01/13/playing-with-nodev8-postmortem-debugging/">understanding production failures post mortem</a>, <a href="http://dtrace.org/blogs/dap/2012/01/05/where-does-your-node-program-spend-its-time/">profiling Node apps in production</a>, and now managing Node dependencies. Again, we're interested to hear feedback from others using these tools.
</p>
<hr/>
Dave Pacheco blogs at <a href="http://dtrace.org/blogs/dap/">dtrace.org</a>.
<p><a href="#note1-top" name="note1-note">[1]</a> Much of this section is taken directly from the "npm shrinkwrap" documentation.
</p>
<p><ahref="#note2-top"name="note2-note">[2]</a> We've had a lot of trouble with checking in node_modules with binary dependencies. The first problem is figuring out exactly which files <em>not</em> to check in (<em>.o, </em>.node, <em>.dynlib, </em>.so, *.a, ...). When <ahref="https://twitter.com/#!/mcavage">Mark</a> went to apply this to one of our internal services, the "npm rebuild" step blew away half of the dependency tree because it ran "make clean", which in dependency <ahref="http://ldapjs.org/">ldapjs</a> brings the repo to a clean slate by blowing away its dependencies. Later, a new (but highly experienced) engineer on our team was tasked with fixing a bug in our Node-based DHCP server. To fix the bug, we went with a new dependency. He tried checking in node_modules, which added 190,000 lines of code (to this repo that was previously a few hundred LOC). And despite doing everything he could think of to do this correctly and test it properly, the change broke the build because of the binary modules. So having tried this approach a few times now, it appears quite difficult to get right, and as I pointed out above, the lack of actual documentation and real world examples suggests others either aren't using binary modules (which we know isn't true) or haven't had much better luck with this approach.
</p>
<p><ahref="#note3-top"name="note3-note">[3]</a> Like a good Node-based distributed system, our architecture uses lots of small HTTP servers. Each of these serves a REST API using <ahref="http://mcavage.github.com/node-restify/">restify</a>. restify uses the binary module <ahref="https://github.com/chrisa/node-dtrace-provider">node-dtrace-provider</a>, which gives each of our services <ahref="http://mcavage.github.com/node-restify/#DTrace">deep DTrace-based observability for free</a>. So literally almost all of our components are or will soon be depending on a binary add-on. Additionally, the foundation of <ahref="http://dtrace.org/blogs/dap/2011/03/01/welcome-to-cloud-analytics/">Cloud Analytics</a> are a pair of binary modules that extract data from <ahref="https://github.com/bcantrill/node-libdtrace">DTrace</a> and <ahref="https://github.com/bcantrill/node-kstat">kstat</a>. So this isn't a corner case for us, and we don't believe we're exceptional in this regard. The popular <ahref="https://github.com/pietern/hiredis-node">hiredis</a> package for interfacing with redis from Node is also a binary module.
</p>
<p><ahref="#note4-top"name="note4-note">[4]</a> Note that I said this is an important principle for <em>software version control</em>, not using git in general. People use git for lots of things where checking in binaries and other derived files is probably fine. Also, I'm not interested in proselytizing; if you want to do this for software version control too, go ahead. But don't do it out of ignorance of existing successful software engineering practices.</p>
<p><i>npm 1.0 is in release candidate mode. <ahref="http://groups.google.com/group/npm-/browse_thread/thread/43d3e76d71d1f141">Go get it!</a></i></p>
<p>More than anything else, the driving force behind the npm 1.0 rearchitecture was the desire to simplify what a package installation directory structure looks like.</p>
<p>In npm 0.x, there was a command called <code>bundle</code> that a lot of people liked. <code>bundle</code> let you install your dependencies locally in your project, but even still, it was basically a hack that never really worked very reliably.</p>
<p>Also, there was that activation/deactivation thing. That’s confusing.</p>
<h2>Two paths</h2>
<p>In npm 1.0, there are two ways to install things:</p>
<ol><li>globally — This drops modules in <code>{prefix}/lib/node_modules</code>, and puts executable files in <code>{prefix}/bin</code>, where <code>{prefix}</code> is usually something like <code>/usr/local</code>. It also installs man pages in <code>{prefix}/share/man</code>, if they’re supplied.</li><li>locally — This installs your package in the current working directory. Node modules go in <code>./node_modules</code>, executables go in <code>./node_modules/.bin/</code>, and man pages aren’t installed at all.</li></ol>
<h2>Which to choose</h2>
<p>Whether to install a package globally or locally depends on the <code>global</code> config, which is aliased to the <code>-g</code> command line switch.</p>
<p>Just like how global variables are kind of gross, but also necessary in some cases, global packages are important, but best avoided if not needed.</p>
<p>In general, the rule of thumb is:</p>
<ol><li>If you’re installing something that you want to use <em>in</em> your program, using <code>require('whatever')</code>, then install it locally, at the root of your project.</li><li>If you’re installing something that you want to use in your <em>shell</em>, on the command line or something, install it globally, so that its binaries end up in your <code>PATH</code> environment variable.</li></ol>
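<p>For example (assuming a project that uses <code>express</code> as a library and <code>coffee-script</code> as a command-line tool):</p>
<pre><code>cd ~/projects/my-app
npm install express            # lands in ./node_modules/express, for require('express')
npm install -g coffee-script   # lands in {prefix}/lib/node_modules, puts 'coffee' on your PATH</code></pre>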
<h2>When you can't choose</h2>
<p>Of course, there are some cases where you want to do both. <a href="http://coffeescript.org/">Coffee-script</a> and <a href="http://expressjs.com/">Express</a> are both good examples of apps that have a command line interface, as well as a library. In those cases, you can do one of the following:</p>
<ol><li>Install it in both places. Seriously, are you that short on disk space? It’s fine, really. They’re tiny JavaScript programs.</li><li>Install it globally, and then <code>npm link coffee-script</code> or <code>npm link express</code> (if you’re on a platform that supports symbolic links.) Then you only need to update the global copy to update all the symlinks as well.</li></ol>
<p>The first option is the best in my opinion. Simple, clear, explicit. The second is really handy if you are going to re-use the same library in a bunch of different projects. (More on <code>npm link</code> in a future installment.)</p>
<p>You can probably think of other ways to do it by messing with environment variables. But I don’t recommend those ways. Go with the grain.</p>
<h2id="slight_exception_it8217s_not_always_the_cwd">Slight exception: It’s not always the cwd.</h2>
<p>Let’s say you do something like this:</p>
<prestyle="background:#333!important;color:#ccc!important;overflow:auto!important;padding:2px!important;"><code>cd ~/projects/foo # go into my project
npm install express # ./node_modules/express
cd lib/utils # move around in there
vim some-thing.js # edit some stuff, work work work
<p>In this case, npm will install <code>redis</code> into <code>~/projects/foo/node_modules/redis</code>. Sort of like how git will work anywhere within a git repository, npm will work anywhere within a package, defined by having a <code>node_modules</code> folder.</p>
<h2>Test runners and stuff</h2>
<p>If your package's <code>scripts.test</code> command uses a command-line program installed by one of your dependencies, not to worry. npm makes <code>./node_modules/.bin</code> the first entry in the <code>PATH</code> environment variable when running any lifecycle scripts, so this will work fine, even if your program is not globally installed:
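<pre><code>{
  "name": "my-program",
  "version": "1.2.3",
  "dependencies": { "tap": "*" },
  "scripts": { "test": "tap test/*.js" }
}</code></pre>
<p>(An illustrative sketch: <code>tap</code> here stands in for whichever test runner your package actually depends on. Because it is a local dependency, its <code>tap</code> executable is found via <code>./node_modules/.bin</code> when "npm test" runs.)</p>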
<p><i>npm 1.0 is in release candidate mode. <a href="http://groups.google.com/group/npm-/browse_thread/thread/43d3e76d71d1f141">Go get it!</a></i></p>
<p>In npm 0.x, there was a command called <code>link</code>. With it, you could “link-install” a package so that changes would be reflected in real-time. This is especially handy when you’re actually building something. You could make a few changes, run the command again, and voila, your new code would be run without having to re-install every time.</p>
<p>Of course, compiled modules still have to be rebuilt. That’s not ideal, but it’s a problem that will take more powerful magic to solve.</p>
<p>In npm 0.x, this was a pretty awful kludge. Back then, every package existed in a version-specific folder, and the "installed" package was just a symlink pointing into it.</p>
<p>It was easy enough to point that symlink to a different location. However, since the <em>package.json file could change</em>, that meant that the connection between the version and the folder was not reliable.</p>
<p>At first, this was just sort of something that we dealt with by saying, “Relink if you change the version.” However, as more and more edge cases arose, eventually the solution was to give link packages this fakey version of “9999.0.0-LINK-hash” so that npm knew it was an imposter. Sometimes the package was treated as if it had the 9999.0.0 version, and other times it was treated as if it had the version specified in the package.json.</p>
<h2id="a_better_way">A better way</h2>
<p>For npm 1.0, we backed up and looked at what the actual use cases were. Most of the time when you link something you want one of the following:</p>
<ol>
<li>globally install this package I’m working on so that I can run the command it creates and test its stuff as I work on it.</li>
<li>locally install my thing into some <em>other</em> thing that depends on it, so that the other thing can <code>require()</code> it.</li>
</ol>
<p>And, in both cases, changes should be immediately apparent and not require any re-linking.</p>
<p><em>Also</em>, there’s a third use case that I didn’t really appreciate until I started writing more programs that had more dependencies:</p>
<olstart="3"><li><p>Globally install something, and use it in development in a bunch of projects, and then update them all at once so that they all use the latest version. </ol>
<p>Really, the second case above is a special-case of this third case.</p>
<p>The first step is to link your local project into the global install space. (See <a href="http://blog.nodejs.org/2011/03/23/npm-1-0-global-vs-local-installation/">global vs local installation</a> for more on this global/local business.)</p>
<p>I do this as I’m developing node projects (including npm itself).</p>
<pre><code>cd ~/dev/js/node-tap # go into the project dir
npm link # create symlinks into {prefix}
</code></pre>
<p>Because of how I have my computer set up, with <code>/usr/local</code> as my install prefix, I end up with a symlink from <code>/usr/local/lib/node_modules/tap</code> pointing to <code>~/dev/js/node-tap</code>, and the executable linked to <code>/usr/local/bin/tap</code>.</p>
<p>Of course, if you <a href="http://blog.nodejs.org/2011/04/04/development-environment/">set your paths differently</a>, then you’ll have different results. (That’s why I tend to talk in terms of <code>prefix</code> rather than <code>/usr/local</code>.)</p>
<h2 id="link_global_local">Link global → local</h2>
<p>When you want to link the globally-installed package into your local development folder, you run <code>npm link pkg</code> where <code>pkg</code> is the name of the package that you want to install.</p>
<p>For example, let’s say that I wanted to write some tap tests for my node-glob package. I’d <em>first</em> do the steps above to link tap into the global install space, and <em>then</em> I’d do this:</p>
<pre><code>cd ~/dev/js/node-glob # go to the project that uses the thing.
npm link tap # link the global thing into my project.
</code></pre>
<p>Now when I make changes in <code>~/dev/js/node-tap</code>, they’ll be immediately reflected in <code>~/dev/js/node-glob/node_modules/tap</code>.</p>
<h2id="link_to_stuff_you_don8217t_build">Link to stuff you <em>don’t</em> build</h2>
<p>Let’s say I have 15 sites that all use express. I want the benefits of local development, but I also want to be able to update all my dev folders at once. You can globally install express, and then link it into your local development folder.</p>
<pre><code>npm link express # link the global express into ./node_modules
cd ~/dev/js/photo-site # other project folder
npm link express # link express into here, as well
# time passes
# TJ releases some new stuff.
# you want this new stuff.
npm update express -g # update the global install.
# this also updates my project folders.
</code></pre>
<h2id="caveat_not_for_real_servers">Caveat: Not For Real Servers</h2>
<p>npm link is a development tool. It’s <em>awesome</em> for managing packages on your local development box. But deploying with npm link is basically asking for problems, since it makes it super easy to update things without realizing it.</p>
<p>I highly doubt that a native Windows node will ever have comparable symbolic link support to what Unix systems provide. I know that there are junctions and such, and I've heard legends about symbolic links on Windows 7.</p>
<p>When there is a native Windows port of Node, if that native Windows port has <code>fs.symlink</code> and <code>fs.readlink</code> support that is exactly identical to the way they work on Unix, then this should work fine.</p>
<p>But I wouldn't hold my breath. Any bugs about this not working on a native Windows system (i.e., not Cygwin) will most likely be closed with <code>wontfix</code>.</p>
<h2id="aside_credit_where_credit8217s_due">Aside: Credit where Credit’s Due</h2>
<p>Back before the Great Package Management Wars of Node 0.1, before npm or kiwi or mode or seed.js could do much of anything, and certainly before any of them had more than 2 users, Mikeal Rogers invited me to the Couch.io offices for lunch to talk about this npm registry thingie I’d mentioned wanting to build. (That is, to convince me to use CouchDB for it.)</p>
<p>Since he was volunteering to build the first version of it, and since couch is pretty much the ideal candidate for this use-case, it was an easy sell.</p>
<p>While I was there, he said, “Look. You need to be able to link a project directory as if it was installed as a package, and then have it all Just Work. Can you do that?”</p>
<p>I was like, “Well, I don’t know… I mean, there’s these edge cases, and it doesn’t really fit with the existing folder structure very well…”</p>
<p>“Dude. Either you do it, or I’m going to have to do it, and then there’ll be <em>another</em> package manager in node, instead of writing a registry for npm, and it won’t be as good anyway. Don’t be python.”</p>
<p>npm 1.0 has been released. Here are the highlights:</p>
<ul><li><ahref="http://blog.nodejs.org/2011/03/23/npm-1-0-global-vs-local-installation/">Global vs local installation</a></li><li><ahref="http://blog.nodejs.org/2011/03/17/npm-1-0-the-new-ls/">ls displays a tree</a>, instead of being a remote search</li><li>No more “activation” concept - dependencies are nested</li><li><ahref="http://blog.nodejs.org/2011/04/06/npm-1-0-link/">Updates to link command</a></li><li>Install script cleans up any 0.x cruft it finds. (That is, it removes old packages, so that they can be installed properly.)</li><li>Simplified “search” command. One line per package, rather than one line per version.</li><li>Renovated “completion” approach</li><li>More help topics</li><li>Simplified folder structure</li></ul>
<p>The focus is on npm being a development tool, rather than an apt-wannabe.</p>
<h2id="installing_it">Installing it</h2>
<p>To get the new version, run this command:</p>
<prestyle="background:#333;color:#ccc;overflow:auto;padding:2px;"><code>curl http://npmjs.org/install.sh | sh </code></pre>
<p>This will prompt to ask you if it’s ok to remove all the old 0.x cruft. If you want to not be asked, then do this:</p>
<prestyle="background:#333;color:#ccc;overflow:auto;padding:2px;"><code>curl http://npmjs.org/install.sh | clean=yes sh </code></pre>
<p>Or, if you want to not do the cleanup, and leave the old stuff behind, then do this:</p>
<prestyle="background:#333;color:#ccc;overflow:auto;padding:2px;"><code>curl http://npmjs.org/install.sh | clean=no sh </code></pre>
<p>A lot of people in the node community were brave testers and helped make this release a lot better (and swifter) than it would have otherwise been. Thanks :)</p>
<h2id="code_freeze">Code Freeze</h2>
<p>npm will not have any major feature enhancements or architectural changes <span style="border-bottom:1px dotted;cursor:default;" title="That is, the freeze ends no sooner than November 1, 2011">for at least 6 months</span>. There are interesting developments planned that leverage npm in some ways, but it’s time to let the client itself settle. Also, I want to focus attention on some other problems for a little while.</p>
<p>Of course, <a href="https://github.com/isaacs/npm/issues">bug reports</a> are always welcome.</p>
<p><em>This is the first in a series of hopefully more than 1 posts, each detailing some aspect of npm 1.0.</em></p>
<p>In npm 0.x, the <code>ls</code> command was a combination of both searching the registry as well as reporting on what you have installed.</p>
<p>As the registry has grown in size, this has gotten unwieldy. Also, since npm 1.0 manages dependencies differently, nesting them in the <code>node_modules</code> folder and installing locally by default, there are different things that you want to view.</p>
<p>The functionality of the <code>ls</code> command was split into two different parts. <code>search</code> is now the way to find things on the registry (and it only reports one line per package, instead of one line per version), and <code>ls</code> shows a tree view of the packages that are installed locally.</p>
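<p>In other words (the package name here is just for illustration):</p>
<pre><code>npm search tap      # query the registry: one line per matching package
npm ls              # print the tree of packages installed under ./node_modules</code></pre>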
<p>Here’s an example of the output:</p>
<prestyle="background:#333;color:#ccc;overflow:auto;padding:2px;"><code>$ npm ls
<p>This is after I’ve done <code>npm install semver ronn express</code> in the npm source directory. Since express isn’t actually a dependency of npm, it shows up with that “extraneous” marker.</p>
<p>Let’s see what happens when we create a broken situation:</p>
<p>Tree views are great for human readability, but sometimes you want to pipe that stuff to another program. For that output, I took the same data structure, but instead of building up a treeview string for each line, it spits out just the folders, like this:</p>
<pre style="background:#333;color:#ccc;overflow:auto;padding:2px;"><code>$ npm ls -p</code></pre>
<li>#572 Don't print result of --eval in CLI (Ben Noordhuis)
<li>#1223 Fix http.ClientRequest crashes if end() was called twice (koichik)
<li>#1383 Emit 'close' after all connections have closed (Felix Geisendörfer)
<li>Add sprintf-like util.format() function (Ben Noordhuis)
<li>Add support for TLS SNI (Fedor Indutny)
<li>New http agent implementation. Off by default; the command line flag <code>--use-http2</code> will enable it. <code>make test-http2</code> will run the tests for the new implementation. (Mikeal Rogers)
<b>Update:</b> The <code>.exe</code> has a bug that results in incompatibility with Windows XP and Server 2003. This has been reported in <a href="https://github.com/joyent/node/issues/1592">issue #1592</a> and fixed. A new binary was made that is compatible with the older versions of Windows: <a href="http://nodejs.org/dist/v0.5.5/node-186364e.exe">http://nodejs.org/dist/v0.5.5/node-186364e.exe</a>.
<li>#1586 Make socket write encoding case-insensitive (Koichi Kobayashi)</li>
<li>#1591, #1656, #1657 Implement fs in libuv, remove libeio and pthread-win32 dependency on windows (Igor Zinkovsky, Ben Noordhuis, Ryan Dahl, Isaac Schlueter)</li>
<li>#1592 Don't load-time link against CreateSymbolicLink on windows (Peter Bright)</li>
<li>#1601 Improve API consistency when dealing with the socket underlying a HTTP client request (Mikeal Rogers)</li>
<li>#1610 Remove DigiNotar CA from trusted list (Isaac Schlueter)</li>
<li>#1617 Added some win32 os functions (Karl Skomski)</li>
<li>#1624 avoid buffer overrun with 'binary' encoding (Koichi Kobayashi)</li>
<li>#1633 make Buffer.write() always set _charsWritten (Koichi Kobayashi)</li>
<li>#1644 Windows: set executables to be console programs (Peter Bright)</li>
<li>#1651 improve inspection for sparse array (Koichi Kobayashi)</li>
<li>#1672 set .code='ECONNRESET' on socket hang up errors (Ben Noordhuis)</li>
<li>Add test case for foaf+ssl client certificate (Niclas Hoyer)</li>
We are happy to announce the third stable branch of Node v0.6. We will be freezing JavaScript, C++, and binary interfaces for all v0.6 releases.
The major differences between v0.4 and v0.6 are:<ul>
<li>Native Windows support using I/O Completion Ports for sockets.
<li>Integrated load balancing over multiple processes. <a href="http://nodejs.org/docs/v0.6.0/api/cluster.html">docs</a>
<li>Better support for IPC between Node instances <a href="http://nodejs.org/docs/v0.6.0/api/child_processes.html#child_process.fork">docs</a>
<li>Improved command line debugger <a href="http://nodejs.org/docs/v0.6.0/api/debugger.html">docs</a>
<li>Built-in binding to zlib for compression <a href="http://nodejs.org/docs/v0.6.0/api/zlib.html">docs</a>
<li>Upgrade v8 from 3.1 to 3.6</ul>
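<p>For example, the integrated load balancing can be used with just a few lines of JavaScript. This is a minimal sketch along the lines of the cluster docs; the port and the one-worker-per-CPU choice are arbitrary:</p>
<pre><code>var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork one worker per CPU; incoming connections are balanced across them.
  for (var i = 0; i < numCPUs; i++) cluster.fork();
} else {
  // Each worker shares the same listening socket.
  http.createServer(function (req, res) {
    res.writeHead(200);
    res.end('hello world\n');
  }).listen(8000);
}</code></pre>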
In order to support Windows we reworked much of the core architecture. There was some fear that our work would degrade performance on UNIX systems but this was not the case. Here is a Linux system we benched for demonstration:
Bigger is better in http and io benchmarks, smaller is better in startup. The http benchmark was done with 600 clients on a 10GE network served from three load generation machines.
In the last version of Node, v0.4, we could only run Node on Windows with Cygwin. Therefore we've gotten massive improvements by targeting the native APIs. Benchmarks on the same machine:
We consider this a good intermediate stage for the Windows port. There is still work to be done. For example, we are not yet providing users with a blessed path for building addon modules in MS Visual Studio. Work will continue in later releases.
For users upgrading code bases from v0.4 to v0.6 <a href="https://github.com/joyent/node/wiki/API-changes-between-v0.4-and-v0.6">we've documented</a> most of the issues that you will run into. Most people find the change painless. Despite the long list of changes most core APIs remain untouched.
Our release cycle will be tightened dramatically now. Expect to see a new stable branch in January. We wish to eventually have our releases in sync with Chrome and V8's 6 week cycle.
Thank you to everyone who contributed code, tests, docs, or sent in bug reports.
Here are the changes between v0.5.12 and v0.6.0:
2011.11.04, Version 0.6.0 (stable)
<ul><li>print undefined on undefined values in REPL (Nathan Rajlich)</li>
<li>doc improvements (koichik, seebees, bnoordhuis, Maciej Małecki, Jacob Kragh)</li>
<li>support native addon loading in windows (Bert Belder)</li>
<li>rename getNetworkInterfaces() to networkInterfaces() (bnoordhuis)</li>
<li>add pending accepts knob for windows (igorzi)</li>
<li>http.request(url.parse(x)) (seebees)</li>
<li>#1929 zlib Respond to 'resume' events properly (isaacs)</li>
<li>stream.pipe: Remove resume and pause events</li>
<li>test fixes for windows (igorzi)</li>
<li>build system improvements (bnoordhuis)</li>
<li>#1936 tls: does not emit 'end' from EncryptedStream (koichik)</li>
<li><p>unix: don't flush tty on switch to raw mode (Ben Noordhuis)</p>
</li>
<li><p>windows: reset brightness when reverting to default text color (Bert Belder)</p>
</li>
<li><p>npm: update to 1.1.1</p>
<p>- Update which, fstream, mkdirp, request, and rimraf<br>- Fix #2123 Set path properly for lifecycle scripts on windows<br>- Mark the root as seen, so we don't recurse into it. Fixes #1838. (Martin Cooper)</p>
<p>This is the last release on the 0.7 branch. Version 0.8.0 will be released some time later this week, barring any major problems. </p>
<p>As with other even-numbered Node releases before it, the v0.8.x releases will maintain API and binary compatibility. </p>
<p>The major changes between v0.6 and v0.8 are detailed in <a href="https://github.com/joyent/node/wiki/API-changes-between-v0.6-and-v0.8">https://github.com/joyent/node/wiki/API-changes-between-v0.6-and-v0.8</a></p>
<p>Please try out this release. There will be virtually no changes between this and the v0.8.x release family. This is the last chance to comment before it is locked down for stability. The API is effectively frozen now. </p>
<p>This version adds backwards-compatible shims for binary addons that use libeio and libev directly. If you find that binary modules that could compile on v0.6 cannot compile on this version, please let us know. Note that libev is officially deprecated in v0.8, and will be removed in v0.9. You should be porting your modules to use libuv as soon as possible. </p>
<p>V8 is on 3.11.10 currently, and will remain on the V8 3.11.x branch for the duration of Node v0.8.x. </p>
<ul><li><p>npm: Upgrade to 1.1.30<br> - Improved 'npm init'<br> - Fix the 'cb never called' error from 'outdated' and 'update'<br> - Add --save-bundle|-B config<br> - Fix isaacs/npm#2465: Make npm script and windows shims cygwin-aware<br> - Fix isaacs/npm#2452 Use --save(-dev|-optional) in npm rm<br> - <code>logstream</code> option to replace removed <code>logfd</code> (Rod Vagg)<br> - Read default descriptions from README.md files </p>
</li><li><p>Shims to support deprecated <code>ev_*</code> and <code>eio_*</code> methods (Ben Noordhuis)</p>
</li><li><p>#3118 net.Socket: Delay pause/resume until after connect (isaacs)</p>
</li><li><p>#3465 Add ./configure --no-ifaddrs flag (isaacs)</p>
</li><li><p>child_process: add .stdin stream to forks (Fedor Indutny)</p>
Bryan Cantrill, VP of Engineering at Joyent, describes the challenges of instrumenting a distributed, dynamic, highly virtualized system -- and what their experiences taught them about the problem, the technologies used to tackle it, and promising approaches.
This talk was given at Velocity Conf in 2011.
<p>I'm extremely grateful that Matthew took the time to report the problem to us with such an elegant explanation, and in such a way that we had a reasonable amount of time to fix the issue before making it public. </p>