Basic blockchain knowledge: how do wallet apps derive your address from your seed phrase?
This is a rather complex process, involving many steps, summarized in the diagram below:
#1 --> #2
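The 1st step is to convert your human-readable seed phrase, aka mnemonic phrase (#1 in the diagram), into a raw binary seed (#2). Note that the seed is not the raw entropy: the words encode 128 to 256 bits of entropy plus a checksum, while the 512-bit seed is derived by stretching the words (and an optional passphrase) with PBKDF2-HMAC-SHA512. The standardized algorithm for this conversion is Bitcoin Improvement Proposal (BIP)-39:
github.com/bitcoin/bips/blob…. Despite the "bitcoin" in the name, this standard has been adopted by most other chains as well.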
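A minimal Python sketch of the BIP-39 mnemonic-to-seed step, using only the standard library. It assumes the mnemonic has already been validated: checking the words against the BIP-39 wordlist and verifying the checksum are deliberately left out.

```python
import hashlib
import unicodedata


def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Stretch a BIP-39 mnemonic sentence into a 64-byte (512-bit) seed.

    Sketch only: the words are NOT checked against the BIP-39 wordlist
    and the embedded checksum is NOT verified.
    """
    # BIP-39 normalizes both the sentence and the salt to Unicode NFKD.
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    # PBKDF2-HMAC-SHA512 with 2048 iterations; output length defaults to 64 bytes.
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048)
```

Because the optional passphrase is part of the salt, the same words with a different passphrase produce an entirely different seed, and therefore entirely different wallets.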
#2 --> #3
Next we derive the private/public key pair (#3) from the seed. The algorithm for this is defined by BIP-32:
github.com/bitcoin/bips/blob…
Here it gets a bit complicated. The BIP-32 algorithm takes 2 parameters: the seed, and another thing called the derivation path. The idea is that people may have multiple wallets for different purposes, but they probably don't want to keep track of many seed phrases. BIP-32 lets users keep a single seed and combine it with various derivation paths to generate different wallet addresses.
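To make the two parameters concrete, here is a minimal Python sketch of BIP-32 private-key derivation (master key generation plus CKDpriv), built on a deliberately simple, slow, non-constant-time secp256k1 implementation. It is for illustration only, never for real keys. It assumes the path has already been turned into a list of integer indices, where values at or above 2^31 select hardened derivation.

```python
import hashlib
import hmac

# secp256k1 parameters: y^2 = x^3 + 7 over F_P, group order N, generator G
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
HARDENED = 0x80000000  # indices >= 2**31 use hardened derivation


def point_add(a, b):
    """Add two curve points; None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P  # tangent slope (curve a = 0)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P  # chord slope
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P


def point_mul(k, point=G):
    """Double-and-add scalar multiplication (NOT constant time)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def ser_p(point) -> bytes:
    """33-byte compressed public key: parity prefix byte + x coordinate."""
    x, y = point
    return bytes([2 + (y & 1)]) + x.to_bytes(32, "big")


def master_key(seed: bytes) -> tuple[int, bytes]:
    """Return (master private key, master chain code)."""
    i = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    k = int.from_bytes(i[:32], "big")
    if not 0 < k < N:
        raise ValueError("invalid master key")
    return k, i[32:]


def ckd_priv(k_par: int, c_par: bytes, index: int) -> tuple[int, bytes]:
    """BIP-32 CKDpriv: derive a child private key and chain code."""
    if index >= HARDENED:
        data = b"\x00" + k_par.to_bytes(32, "big")  # hardened: hashes the private key
    else:
        data = ser_p(point_mul(k_par))  # normal: hashes the public key
    i = hmac.new(c_par, data + index.to_bytes(4, "big"), hashlib.sha512).digest()
    il = int.from_bytes(i[:32], "big")
    k = (il + k_par) % N
    if il >= N or k == 0:
        raise ValueError("invalid child key; BIP-32 says to try the next index")
    return k, i[32:]


def derive(seed: bytes, path: list[int]) -> tuple[int, bytes]:
    """Walk the path one index at a time, starting from the master key."""
    k, c = master_key(seed)
    for index in path:
        k, c = ckd_priv(k, c, index)
    return k, c
```

Each index in the path feeds one HMAC-SHA512 step, so changing any single field yields an unrelated key pair from the same seed; `ser_p(point_mul(k))` gives the matching compressed public key.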
The format of the derivation path is standardized by yet another BIP, BIP-44 (github.com/bitcoin/bips/blob…). It looks like this, where a trailing ' marks a "hardened" level in BIP-32 terms:
m / purpose' / coin_type' / account' / change / address_index
The two fields worth mentioning are coin_type and address_index. The latter is easy to understand: your first address uses index 0, the 2nd uses 1, and so on. When you click "Add account" in MetaMask to generate a new address, what it does behind the scenes is increment this address_index.
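A small Python sketch of how a path string like the one above maps to the integer indices BIP-32 actually consumes: a trailing ' marks a hardened level, which BIP-32 encodes by adding 2^31 to the index. The helper `nth_address_path` is hypothetical and simply mirrors the MetaMask behavior described above, assuming MetaMask's default Ethereum prefix m/44'/60'/0'/0.

```python
HARDENED = 0x80000000  # 2**31: hardened indices start here


def parse_path(path: str) -> list[int]:
    """Turn e.g. "m/44'/60'/0'/0/0" into a list of BIP-32 child indices."""
    parts = path.replace(" ", "").split("/")
    if parts[0] != "m":
        raise ValueError("a derivation path must start with 'm'")
    indices = []
    for part in parts[1:]:
        hardened = part.endswith("'")
        value = int(part[:-1] if hardened else part)
        if not 0 <= value < HARDENED:
            raise ValueError(f"index out of range: {part}")
        indices.append(value + HARDENED if hardened else value)
    return indices


def nth_address_path(n: int) -> str:
    """Hypothetical helper: only address_index changes from one address to the next."""
    return f"m/44'/60'/0'/0/{n}"
```

Every field except the last stays fixed, which is why "Add account" in a wallet can be as cheap as bumping one integer.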
coin_type is where it gets a bit controversial, at least within the Cosmos ecosystem. This parameter designates which chain the account is intended for. SatoshiLabs, the creator of the Trezor wallet, maintains a registry called SLIP-0044 (github.com/satoshilabs/slips…) that allocates a number to each chain. Bitcoin gets coin_type 0, Litecoin 2, Dogecoin 3, Ethereum 60, and so on.
The controversial part is that giving each chain its own coin_type means the same seed phrase yields a different address on every chain, which can hurt user experience. There has been a debate on whether all Cosmos chains should use the same coin_type, 118, that of Cosmos Hub. Another opinion is that Cosmos chains should just align with Ethereum and use 60. Let's not go down that rabbit hole for now.
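To see why the choice matters, here is a sketch that builds BIP-44 paths from the SLIP-0044 numbers mentioned above, including 118 for Cosmos Hub. The dictionary holds only those values and is not the full registry; `same_default_keys` is a hypothetical helper. Chains with different coin_types get different paths, and therefore unrelated keys from the same seed; chains that agree on a coin_type share a key pair (their addresses can still differ, since step #3 --> #4 is chain-specific).

```python
# Only the SLIP-0044 values mentioned above; NOT the full registry.
SLIP44_COIN_TYPE = {
    "bitcoin": 0,
    "litecoin": 2,
    "dogecoin": 3,
    "ethereum": 60,
    "cosmos-hub": 118,
}


def bip44_path(coin_type: int, account: int = 0, change: int = 0, address_index: int = 0) -> str:
    """Format m / purpose' / coin_type' / account' / change / address_index with purpose 44."""
    return f"m/44'/{coin_type}'/{account}'/{change}/{address_index}"


def same_default_keys(chain_a: str, chain_b: str) -> bool:
    """Hypothetical check: do two chains' default paths (and thus key pairs) coincide?"""
    return bip44_path(SLIP44_COIN_TYPE[chain_a]) == bip44_path(SLIP44_COIN_TYPE[chain_b])
```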
#3 --> #4
Now we derive the address in its raw binary form (#4) from the public key. This is where it gets really messy. Unlike the previous steps, which are well standardized, here each chain has its own methodology. To name a few:
Bitcoin
address := ripemd160(sha256(pubkey))
Ethereum
address := keccak256(pubkey)[-20:]
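Cosmos
address := ripemd160(sha256(pubkey))  (secp256k1 keys, same as Bitcoin)
address := sha256(sha256(typ) || pubkey) **1  (newer key types, per ADR-028)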
Sui
address := blake2b256(flag || pubkey) **2
Here sha256, ripemd160, keccak256, and blake2b256 are all cryptographic hash functions.
**1 `typ` is a string designating the pubkey type. Typically the Protobuf typeURL is used. See:
github.com/cosmos/cosmos-sdk…
**2 `flag` is a single byte that designates which signature scheme the account uses (e.g. Ed25519, Secp256k1 or Secp256r1). See:
docs.sui.io/concepts/cryptog…
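The formulas above can be sketched in Python. Everything except Keccak-256 comes from hashlib; Keccak-256 is written out by hand because Ethereum uses the original Keccak padding, so `hashlib.sha3_256` (FIPS 202 SHA-3) gives different output. Assumptions: `ethereum_address` expects the 64-byte uncompressed secp256k1 public key without its 0x04 prefix; `cosmos_typed_address` follows the typed-hash formula with a caller-supplied `typ`; `bitcoin_hash160` needs an OpenSSL build that exposes RIPEMD-160 to hashlib; and the Sui `flag` is passed in by the caller (Sui documents 0x00 for Ed25519, 0x01 for Secp256k1, 0x02 for Secp256r1).

```python
import hashlib

# ---- Keccak-256 (original Keccak padding, as used by Ethereum) ----
_RC = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]
_ROT = [  # rotation offsets, indexed as _ROT[x][y]
    [0, 36, 3, 41, 18],
    [1, 44, 10, 45, 2],
    [62, 6, 43, 15, 61],
    [28, 55, 25, 21, 56],
    [27, 20, 39, 8, 14],
]
_MASK = (1 << 64) - 1


def _rol(v: int, n: int) -> int:
    return ((v << n) | (v >> (64 - n))) & _MASK


def _keccak_f(a: list[list[int]]) -> list[list[int]]:
    """Keccak-f[1600] permutation on a 5x5 array of 64-bit lanes a[x][y]."""
    for rc in _RC:
        c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]  # theta
        d = [c[(x - 1) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        a = [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        b = [[0] * 5 for _ in range(5)]
        for x in range(5):  # rho and pi
            for y in range(5):
                b[y][(2 * x + 3 * y) % 5] = _rol(a[x][y], _ROT[x][y])
        a = [[b[x][y] ^ (~b[(x + 1) % 5][y] & b[(x + 2) % 5][y]) for y in range(5)]
             for x in range(5)]  # chi
        a[0][0] ^= rc  # iota
    return a


def keccak256(data: bytes) -> bytes:
    rate = 136  # bytes absorbed per permutation for a 256-bit digest
    msg = bytearray(data) + b"\x01"  # Keccak padding byte (SHA-3 would use 0x06)
    msg += b"\x00" * (-len(msg) % rate)
    msg[-1] |= 0x80
    a = [[0] * 5 for _ in range(5)]
    for off in range(0, len(msg), rate):
        for i in range(rate // 8):  # lane i sits at x = i % 5, y = i // 5
            a[i % 5][i // 5] ^= int.from_bytes(msg[off + 8 * i:off + 8 * i + 8], "little")
        a = _keccak_f(a)
    return b"".join(a[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


# ---- Raw addresses per chain (step #3 --> #4) ----
def bitcoin_hash160(pubkey: bytes) -> bytes:
    # Raises ValueError if this Python's OpenSSL build lacks RIPEMD-160.
    return hashlib.new("ripemd160", hashlib.sha256(pubkey).digest()).digest()


def ethereum_address(pubkey_xy: bytes) -> bytes:
    if len(pubkey_xy) != 64:
        raise ValueError("expected x || y (64 bytes), without the 0x04 prefix")
    return keccak256(pubkey_xy)[-20:]


def cosmos_typed_address(typ: str, pubkey: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(typ.encode()).digest() + pubkey).digest()


def sui_address(flag: int, pubkey: bytes) -> bytes:
    # BLAKE2b configured for a 32-byte digest (not a truncated 64-byte digest).
    return hashlib.blake2b(bytes([flag]) + pubkey, digest_size=32).digest()
```

Note how the output length differs by design: 20 bytes for Bitcoin and Ethereum, 32 bytes for the typed-hash and Sui schemes.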
#4 --> #5
Our address is now in its raw binary form: a string of 0s and 1s, typically 160 or 256 bits (20 or 32 bytes) long depending on the chain. The final step is to encode it into a human-readable form.
Here's another place where chains take different approaches. Three encoding schemes are commonly used:
- bech32: Bitcoin (SegWit addresses) and Cosmos
- hex: Ethereum, Aptos, Sui
- base58: Solana, as well as Bitcoin's legacy addresses (in the Base58Check variant)
What's worth knowing is that encoding schemes exist purely for human convenience. Blockchains operate on raw bytes and don't care how humans encode them. The very same address can look completely different in different encodings, but its underlying bytes are unchanged. For example, this is the hex-encoded address for vitalik.eth:
0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045
This is the very same address in Bech32 encoding:
cosmos1mrdxhunfvjhe6lhdncp72dq46da2jcz9d9sh93
In base58:
42E5f5XLWJkBaQc3DdthcgUycdYQ
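The three encodings above can be reproduced with a short Python sketch. Assumptions: `to_base58` is plain base58 with the Bitcoin alphabet and no checksum (Base58Check would add a version byte and a 4-byte checksum); `to_bech32` encodes the raw bytes Cosmos-style, whereas Bitcoin SegWit addresses also prepend a witness version; and `to_hex` emits lowercase, while the mixed case in the hex example above is Ethereum's EIP-55 checksum capitalization, which is not implemented here.

```python
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
BECH32_CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def to_hex(raw: bytes) -> str:
    return "0x" + raw.hex()  # lowercase, no EIP-55 checksum casing


def to_base58(raw: bytes) -> str:
    """Plain base58 (Bitcoin alphabet), no checksum."""
    n = int.from_bytes(raw, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + out  # each leading zero byte is written as '1'


def _bech32_polymod(values: list[int]) -> int:
    """BCH checksum from BIP-173."""
    gen = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= gen[i]
    return chk


def _hrp_expand(hrp: str) -> list[int]:
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def _to_5bit(raw: bytes) -> list[int]:
    """Regroup 8-bit bytes into 5-bit groups, zero-padding the last group."""
    acc, bits, out = 0, 0, []
    for byte in raw:
        acc = (acc << 8) | byte
        bits += 8
        while bits >= 5:
            bits -= 5
            out.append((acc >> bits) & 31)
    if bits:
        out.append((acc << (5 - bits)) & 31)
    return out


def to_bech32(hrp: str, raw: bytes) -> str:
    """Original bech32 (checksum constant 1), Cosmos-style: no witness version."""
    data = _to_5bit(raw)
    pm = _bech32_polymod(_hrp_expand(hrp) + data + [0] * 6) ^ 1
    checksum = [(pm >> 5 * (5 - i)) & 31 for i in range(6)]
    return hrp + "1" + "".join(BECH32_CHARSET[d] for d in data + checksum)
```

Feeding all three functions the same 20 bytes illustrates the point made above: only the presentation changes, never the bytes.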
Implementation
I have a Rust code snippet for deriving the address (using Bitcoin's approach and bech32 encoding) for anyone interested:
gist.github.com/larry0x/9239…
-------------------
That's it! Thanks for tuning in to my lecture. Let's summarize the nerd words you learned today so that you can show off to your friends:
- BIP-32
- BIP-39
- BIP-44
- SLIP-0044
- coin_type
- address_index
- bech32
- base58
- hex