Merge pull request #17 from trezor/master

BIP39 changes
11 years ago · 8ac51158bd
2 changed files with 2099 additions and 38 deletions
--- a/bip-0039.mediawiki
+++ b/bip-0039.mediawiki
@@ -5,6 +5,7 @@
           Pavol Rusnak <stick@satoshilabs.com>
           ThomasV <thomasv@bitcointalk.org>
           Aaron Voisine <voisine@gmail.com>
+           Sean Bowe <ewillbefull@gmail.com>
  Status:  Draft
  Type:    Standards Track
  Created: 10-09-2013
@@ -12,35 +13,39 @@

 ==Abstract==

-This BIP describes an usage of mnemonic code or mnemonic sentence - a group of
-easy to remember words - to generate deterministic wallets.
+This BIP describes the implementation of a mnemonic code or mnemonic sentence --
+a group of easy to remember words -- for the generation of deterministic wallets.

-It consists of two parts: generating the mnemonic and converting it into
-a binary seed. This seed can be later used to generate deterministic wallets
-using BIP-0032 or similar methods.
+It consists of two parts: generating the mnemonic, and converting it into a
+binary seed. This seed can be later used to generate deterministic wallets using
+BIP-0032 or similar methods.

 ==Motivation==

-Such mnemonic code or mnemonic sentence is much easier to work with than working
-with the binary data directly (or its hexadecimal interpretation). The sentence
-could be writen down on paper (e.g. for storing in a secure location such as
-safe), told over telephone or other voice communication method, or memorized
-in ones memory (this method is called brainwallet).
+A mnemonic code or sentence is superior for human interaction compared to the
+handling of raw binary or hexadecimal representations of a wallet seed. The
+sentence could be written on paper or spoken over the telephone.
+
+This guide is meant as a way to transport computer-generated randomness via a
+human-readable transcription. It is not a way to process user-created
+sentences (also known as brainwallets) into a wallet seed.

 ==Generating the mnemonic==

-First, we decide how much entropy we want mnemonic to encode. Recommended size
-is 128-256 bits, but basically any multiple of 32 bits will do. More bits
-mean more security, but also longer word sentence.
+The mnemonic must encode entropy in any multiple of 32 bits. With larger entropy
+security is improved but the sentence length increases. We can refer to the
+initial entropy length as ENT. The recommended size of ENT is 128-256 bits.

-We take initial entropy of ENT bits and compute its checksum by taking first
-ENT / 32 bits of its SHA256 hash. We append these bits to the end of the initial
-entropy. Next we take these concatenated bits and split them into groups of 11
-bits. Each group encodes number from 0-2047 which is a position in a wordlist.
-We convert numbers into words and use joined words as mnemonic sentence.
+First, an initial entropy of ENT bits is generated. A checksum is generated by
+taking the first <pre>ENT / 32</pre> bits of its SHA256 hash. This checksum is
+appended to the end of the initial entropy. Next, these concatenated bits are
+split into groups of 11 bits, each encoding a number from 0-2047, serving
+as an index to a wordlist. Later, we will convert these numbers into words and
+use the joined words as a mnemonic sentence.

-The following table describes the relation between initial entropy length (ENT),
-checksum length (CS) and length of the generated mnemonic sentence (MS) in words.
+The following table describes the relation between the initial entropy
+length (ENT), the checksum length (CS) and length of the generated mnemonic
+sentence (MS) in words.

 <pre>
 CS = ENT / 32
@@ -57,49 +62,57 @@ MS = (ENT + CS) / 11

 ==Wordlist==

-In previous section we described how to pick words from a wordlist. Now we
-describe how does a good wordlist look like.
+An ideal wordlist has the following characteristics:

 a) smart selection of words
-   - wordlist is created in such way that it's enough to type just first four
+   - wordlist is created in such a way that it's enough to type the first four
     letters to unambiguously identify the word

 b) similar words avoided
-   - words as "build" and "built", "woman" and "women" or "quick" or "quickly"
+   - word pairs like "build" and "built", "woman" and "women", or "quick" and "quickly"
     not only make remembering the sentence difficult, but are also more error
-     prone and more difficult to guess (see point below)
-   - we avoid these words by carefully selecting them during addition
+     prone and more difficult to guess

 c) sorted wordlists
-   - wordlist is sorted which allow more efficient lookup of the code words
+   - wordlist is sorted which allows for more efficient lookup of the code words
     (i.e. implementation can use binary search instead of linear search)
   - this also allows trie (prefix tree) to be used, e.g. for better compression

-Wordlist can contain native characters, but they have to be encoded using UTF-8.
+The wordlist can contain native characters, but they have to be encoded in UTF-8
+using Normalization Form Compatibility Decomposition (NFKD).

 ==From mnemonic to seed==

-User can decide to protect his mnemonic by passphrase. If passphrase is not present
-an empty string "" is used instead.
+A user may decide to protect their mnemonic with a passphrase. If a passphrase is not
+present, an empty string "" is used instead.

-To create binary seed from mnemonic, we use PBKDF2 function with mnemonic sentence
-(in UTF-8) used as a password and string "mnemonic" + passphrase (again in UTF-8)
-used as a salt. Iteration count is set to 4096 and HMAC-SHA512 is used as a pseudo-
-random function. Desired length of the derived key is 512 bits (= 64 bytes).
+To create a binary seed from the mnemonic, we use the PBKDF2 function with the mnemonic
+sentence (in UTF-8 NFKD) used as the password and the string "mnemonic" + passphrase (again
+in UTF-8 NFKD) used as the salt. The iteration count is set to 2048 and HMAC-SHA512 is used as
+the pseudo-random function. The desired length of the derived key is 512 bits (= 64 bytes).

 This seed can be later used to generate deterministic wallets using BIP-0032 or
 similar methods.

 The conversion of the mnemonic sentence to binary seed is completely independent
-from generating the sentence. This results in rather simple code, there are no
+from generating the sentence. This results in rather simple code; there are no
 constraints on sentence structure and clients are free to implement their own
-wordlists or even whole sentence generators (they'll lose the proposed method
-for typo detection in that case, but they can come up with their own).
+wordlists or even whole sentence generators, allowing for flexibility in wordlists
+for typo detection or other purposes.
+
+Although using a mnemonic not generated by the algorithm described in the "Generating the
+mnemonic" section is possible, this is not advised, and software must compute the
+checksum of the mnemonic sentence using the wordlist and issue a warning if it is
+invalid.

-Described method also provides plausable deniability, because every passphrase
+The described method also provides plausible deniability, because every passphrase
 generates a valid seed (and thus deterministic wallet) but only the correct one
 will make the desired wallet available.

+==Wordlists==
+
+* [[bip-0039/english.txt|English]]
+
 ==Test vectors==

 See https://github.com/trezor/python-mnemonic/blob/master/vectors.json
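
The two steps described in the revised text (entropy to wordlist indices, then mnemonic to seed) can be sketched in Python as follows. This is a minimal illustration and not the reference implementation: the function names `entropy_to_indices` and `mnemonic_to_seed` are hypothetical, the upper bound on ENT is an assumption added only to keep the checksum within one SHA256 digest, and the wordlist lookup itself is left out because english.txt is not reproduced here.

```python
import hashlib
import unicodedata


def entropy_to_indices(entropy: bytes) -> list:
    """Map ENT bits of entropy to 11-bit wordlist indices (0-2047)."""
    ent = len(entropy) * 8
    # Assumption: cap ENT so the checksum fits inside a single SHA256 digest.
    if ent == 0 or ent % 32 != 0 or ent > 8192:
        raise ValueError("entropy length must be a non-zero multiple of 32 bits")
    cs = ent // 32  # CS = ENT / 32
    digest = hashlib.sha256(entropy).digest()
    checksum = int.from_bytes(digest, "big") >> (256 - cs)  # first CS bits
    # Append the checksum bits to the end of the entropy bits.
    bits = (int.from_bytes(entropy, "big") << cs) | checksum
    ms = (ent + cs) // 11  # MS = (ENT + CS) / 11
    # Split into MS groups of 11 bits, most significant group first.
    return [(bits >> (11 * (ms - 1 - i))) & 0x7FF for i in range(ms)]


def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Derive the 64-byte seed with PBKDF2-HMAC-SHA512, 2048 iterations."""
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)


if __name__ == "__main__":
    indices = entropy_to_indices(bytes(16))  # 128 bits of zero entropy
    print(len(indices), indices)
    print(mnemonic_to_seed("example sentence", "example passphrase").hex())
```

Each returned index would select one line of the sorted wordlist, and the selected words joined by spaces form the sentence passed to `mnemonic_to_seed`. Any passphrase yields some 64-byte seed, which is what the plausible deniability remark in the text relies on.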
--- a/bip-0039/english.txt
+++ b/bip-0039/english.txt