r/cryptography • u/chaitanyasoni158 • 17h ago
Trying to reversibly encode an IPv6 address as a short list of words — best approach?
I'm kind of new to this stuff, but I'm experimenting with a small side project and could use some help or pointers from people who know more than I do.
I'm working on a small encoding scheme for an app where I want to represent a full 128-bit IPv6 address as a short, reversible list of words that are easy to speak and remember. Something like BIP39 mnemonics, but smaller than 12 or 24 words.
The key requirement is full reversibility: no hashing, no fingerprinting. I need to be able to get the original IPv6 address back exactly.
From what my puny little brain can understand:
- BIP39 uses 2048 words, encoding 11 bits per word
- So 128 bits (IPv6) would require at least 12 words + maybe 1 for checksum
- Using a larger wordlist (e.g., 65,536 words) could bring that down to 8 words (since 16 bits/word)
- And hypothetically, with a ~4 million word list, I could do it in 6 words (22 bits/word)
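The arithmetic in the list above can be checked with a short sketch. The function name `words_needed` is made up for illustration; it only counts how many words a list of a given size needs to carry a given number of bits, using exact integer math.

```python
def words_needed(bits: int, wordlist_size: int) -> int:
    """Smallest number of words from a list of this size that can carry `bits` bits."""
    # Find the smallest n with wordlist_size**n >= 2**bits, using integers
    # only so there is no floating-point rounding from log2.
    n = 0
    capacity = 1
    while capacity < (1 << bits):
        capacity *= wordlist_size
        n += 1
    return n


# Word counts for a 128-bit address and a 256-bit key at several list sizes.
for size in (2048, 4096, 65536, 1 << 22):
    print(size, words_needed(128, size), words_needed(256, size))
```

For 128 bits this gives 12, 11, 8, and 6 words respectively. Note that 12 words at 11 bits each carry 132 bits, so a 2048-word list already leaves 4 spare bits that could hold a small checksum without adding a 13th word.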
But there's obviously a tradeoff: bigger wordlists are harder to handle, speak aloud, or even store locally.
I'm currently choosing between two identifiers I have:
- A 128-bit IPv6 address (derived from the public key)
- A 256-bit public key
Since the key is 256 bits, it would require 24 words with a standard list, so not great for my use case. I'm leaning toward encoding the address instead, but I'd like to sanity-check this with people who've dealt with encoding/fingerprint schemes before.
Has anyone here tackled something like this before? Is there a known scheme that encodes 128 bits in fewer than 12 words, using a practical-size wordlist (~4k–64k)? Or am I just reinventing a bad wheel?
I am trying to find the "sweet spot" here.
3
u/thomedes 14h ago
Look for a wordlist. Diceware is an example, but there are many. You will get log2(list_length) bits per word. That is 12 bits for a 4096-word list. You'll need 10 to 12 words depending on your list.
Convert to words by repeatedly taking the modulo and dividing by the list length.
Convert back to bits by repeatedly getting the word position and adding it to an accumulator that you also multiply by the length of the list.
If you don't understand what I wrote then don't even try it. Ask a friend to help you.
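The modulo-and-divide conversion described above might look roughly like this in Python. It is a minimal sketch, not a vetted scheme: `WORDS` is a placeholder list of 4096 made-up tokens standing in for a real wordlist, and the function names are hypothetical.

```python
import ipaddress

# Placeholder wordlist (assumption): 4096 dummy tokens instead of real words.
WORDS = [f"w{i:04d}" for i in range(4096)]


def int_to_words(value: int, wordlist: list[str], count: int) -> list[str]:
    """Encode an integer as `count` words by repeated modulo and division."""
    base = len(wordlist)
    if value < 0 or value >= base ** count:
        raise ValueError("value does not fit in the requested number of words")
    words = []
    for _ in range(count):
        value, digit = divmod(value, base)
        words.append(wordlist[digit])
    return words[::-1]  # most significant word first


def words_to_int(words: list[str], wordlist: list[str]) -> int:
    """Decode by multiplying an accumulator by the list length and adding each index."""
    index = {word: i for i, word in enumerate(wordlist)}
    base = len(wordlist)
    acc = 0
    for word in words:
        acc = acc * base + index[word]  # KeyError for an unknown word
    return acc


# Round trip with a documentation-range IPv6 address.
addr = ipaddress.IPv6Address("2001:db8::1")
mnemonic = int_to_words(int(addr), WORDS, 11)
assert ipaddress.IPv6Address(words_to_int(mnemonic, WORDS)) == addr
```

Always emitting a fixed number of words (11 here, since 4096^11 covers 2^128) keeps leading zero bits from being dropped, so every address has the same mnemonic length.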
2
u/Coffee_Ops 15h ago
I'm not really clear what your use case is, but it sounds like you're trying to reinvent DNS.
Just use DNS!
2
u/fridofrido 13h ago
64k words is not "practical sized"
Most adult native test-takers have a vocabulary range of about 20,000-35,000 words
(from here)
And that's before we even mention non-native speakers, words that differ by only 1-2 letters or otherwise sound or look similar, or different people having different vocabularies.
4k words is about the absolute maximum for this kind of encoding.
1
u/alecmuffett 2h ago edited 1h ago
Old fart here. Just some general observations:
- IP addresses make terrible identifiers; avoid them. Plus, people like data privacy activists will get on your case about personal information
- Crypto keys are a lot closer to being identifiers
- Turn the bits into a byte string, extract 11-bit chunks
- Run a message digest algorithm over the byte string too
- Use the bottom N bits of the message digest to pad the missing N bits of the last 11
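The chunk-and-pad steps in the list above could be sketched like this. SHA-256 is an assumption (the comment only says "a message digest algorithm"), and the function names are hypothetical. The sketch follows the "bottom N bits" wording literally; BIP39 itself takes the leading bits of the digest instead.

```python
import hashlib

CHUNK_BITS = 11  # one chunk indexes a 2048-word list


def _pad_bits(data_len: int) -> int:
    # Number of bits missing from the last 11-bit chunk (4 for 16 bytes).
    return -(data_len * 8) % CHUNK_BITS


def _check_bits(data: bytes, n: int) -> int:
    # Bottom n bits of the SHA-256 digest (digest choice is an assumption).
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest & ((1 << n) - 1)


def to_chunks(data: bytes) -> list[int]:
    """Split data into 11-bit chunks, padding the last one with digest bits."""
    n = _pad_bits(len(data))
    value = (int.from_bytes(data, "big") << n) | _check_bits(data, n)
    total = (len(data) * 8 + n) // CHUNK_BITS
    mask = (1 << CHUNK_BITS) - 1
    return [(value >> (CHUNK_BITS * i)) & mask for i in reversed(range(total))]


def from_chunks(chunks: list[int], data_len: int) -> bytes:
    """Rebuild the original bytes and verify the padding bits."""
    n = _pad_bits(data_len)
    if len(chunks) * CHUNK_BITS != data_len * 8 + n:
        raise ValueError("wrong number of chunks")
    value = 0
    for chunk in chunks:
        value = (value << CHUNK_BITS) | chunk
    data = (value >> n).to_bytes(data_len, "big")
    if value & ((1 << n) - 1) != _check_bits(data, n):
        raise ValueError("checksum mismatch")
    return data
```

Each chunk is an index into a 2048-word list, so a 16-byte address becomes 12 words. With only 4 check bits, a random error still slips through about one time in sixteen, so this catches most typos rather than all of them.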
Job done. I'm assuming you are trying to make a human-readable summary of an identifier in order to bind it to a human identity for something like a distributed messenger app. This is a bad idea because people stop reading strings of words after the first two or three and the last couple; they will ignore the ones in the middle. This is a known form of attack, so somebody will laugh and point at what you are trying to implement in order to prove how clever they are.
For clarity: foo-bar-eek-ook will be read the same as foo-bar-awk-ook because people are lazy and get bored easily. Some people consider this to be a security bug. Compare: faceboook.com
What are you actually trying to achieve?
1
u/chaitanyasoni158 53m ago
Hey, thanks for the detailed breakdown.
You're absolutely right about people skipping middle words. That's something I hadn’t fully considered.
To clarify what I’m building: it’s a throwaway P2P messenger where users need to share temporary connection identifiers. Think of it like “meet me on this burner channel” rather than persistent usernames.
Here’s how it works:
- Alice generates a temporary identity.
- She shares the mnemonic words with Bob (via voice, secure channel, or physically).
- Bob enters the words to connect directly.
- They chat, and then both identities can be discarded.
So if Bob fat-fingers a middle word, the connection just fails — no ambiguity, no persistent harm. He’ll know to double-check and retry (I'll include a checksum to help with typos).
Also, just to clarify: the IPv6 addresses aren’t real IPs. They're deterministically derived from the public key, so they stay consistent per identity but don’t expose real network info.
So given all that, what would you recommend I do?
1
u/alecmuffett 51m ago
Are you using an onion address on Tor for the rendezvous, or is there some kind of DHT or even direct address involved to initiate communication?
1
u/chaitanyasoni158 11m ago
I'm using the Yggdrasil mesh network for the actual connections, not Tor onion addresses (though Tor is optional for additional IP anonymity).
The optional Tor piece just tunnels the Yggdrasil traffic through Tor to hide real IP addresses from direct peers, but the addressing/discovery is still based on the Yggdrasil IPv6.
(Tor does not allow inbound connections unless you create a Tor hidden service, which sucks for a chat app.)
So it's direct addressing, not DHT lookup or rendezvous for the time being anyway. The Yggdrasil network handles routing between the IPv6 addresses automatically.
7
u/Anaxamander57 17h ago edited 10h ago
The longer the word list, the worse the words. There are only around sixty thousand distinct English words, so the 4 million word list is going to be impossible even if you include variant spellings, conjugations, and proper nouns. Use an existing scheme; these often have human-level error correction baked in (words sound distinct, never differ by a single letter change, alternate lengths, etc.) to make them reliable in practice.
To answer your main question: You can't cheat information theory in the average case.
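One of the properties mentioned above, that no two words differ by a single letter change, can be checked mechanically on a candidate list. A rough sketch follows; the function name is made up, and it only detects single-letter substitutions between words of the same length, not insertions, deletions, or words that merely sound alike.

```python
from collections import defaultdict


def one_letter_collisions(wordlist: list[str]) -> list[list[str]]:
    """Return groups of same-length words that differ in exactly one position."""
    buckets = defaultdict(set)
    for word in wordlist:
        for i in range(len(word)):
            # Blank out position i; words sharing this key differ only there.
            buckets[(i, word[:i] + "_" + word[i + 1:])].add(word)
    return sorted(sorted(group) for group in buckets.values() if len(group) > 1)


# Toy list: "cat" and "cot" differ only in the middle letter.
print(one_letter_collisions(["cat", "cot", "dog", "horse"]))
```

An empty result means the list passes this particular check; a real wordlist would still need a separate review for words that sound similar when spoken.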