Building Multi-Character Emojis
Unicode has always supported building accented characters by combining letters and diacritics. This idea has been extended to meet the growing demand for emojis.
This post explains how to combine special markers and characters to make new pictographs.
Country Flags
Throughout history, countries split, join, mutate or simply adopt new flags. The Unicode Consortium found a way to avoid keeping up with those changes and outsource the problem to the systems that claim Unicode support: its character database has no country flags. Instead, there is a set of 26 "regional indicator symbols", corresponding to Latin letters from A to Z and assigned codes from U+1F1E6 to U+1F1FF. When you combine two of those indicator letters to form a two-letter ISO 3166-1 country code, you get the corresponding country flag, if the UI supports it. Example 1 shows how.
# REGIONAL INDICATOR SYMBOLS
RIS_A = '\U0001F1E6' # LETTER A
RIS_U = '\U0001F1FA' # LETTER U
print(RIS_A + RIS_U) # AU: Australia
print(RIS_U + RIS_A) # UA: Ukraine
print(RIS_A + RIS_A) # AA: no such country
If your program outputs a combination of indicator letters that is not recognized by the app, you get the indicators displayed as letters inside dashed squares—again, depending on the UI. See the last line in Figure 2.
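The pairing rule generalizes to any two-letter code. Here is a minimal sketch, not part of the original examples: the helper name `flag` is hypothetical, and it only checks that the input is two ASCII letters, not that the code is a real ISO 3166-1 entry.

```python
RIS_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def flag(country_code: str) -> str:
    """Return the regional indicator pair for a two-letter code.

    Hypothetical helper: whether the result renders as a flag
    still depends on the UI.
    """
    if len(country_code) != 2 or not (
        country_code.isascii() and country_code.isalpha()
    ):
        raise ValueError(f'expected two ASCII letters, got {country_code!r}')
    # Each letter maps to the indicator at the same offset from A.
    return ''.join(
        chr(RIS_A + ord(c) - ord('A')) for c in country_code.upper()
    )


for code in ('AU', 'UA', 'BR', 'AA'):
    print(code, flag(code))
```

As in Example 1, a pair such as AA produces two valid characters that simply have no flag assigned.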
Note
Europe and the United Nations are not countries, but their flags are supported by the regional indicator pairs EU and UN, respectively. England, Scotland, and Wales may or may not be separate countries by the time you read this, but they also have flags supported by Unicode. However, instead of regional indicator letters, those flags require a more complicated scheme. Read Emoji Flags Explained on Emojipedia to learn how that works.
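For the curious, here is a rough sketch of that scheme as I understand it from the emoji tag sequences in UTS #51: WAVING BLACK FLAG, followed by invisible tag characters spelling a region code, closed by CANCEL TAG. The helper name `subdivision_flag` is hypothetical, and the region codes `gbeng`, `gbsct`, and `gbwls` are my assumption of the three sequences in question.

```python
BLACK_FLAG = '\U0001F3F4'  # WAVING BLACK FLAG
CANCEL_TAG = '\U000E007F'  # CANCEL TAG
TAG_BASE = 0xE0000         # tag characters mirror ASCII at this offset


def subdivision_flag(region: str) -> str:
    """Build an emoji tag sequence for a region code (hypothetical helper)."""
    if not (region.isascii() and region.isalnum()):
        raise ValueError(f'expected ASCII letters or digits, got {region!r}')
    # Each ASCII character becomes the tag character with the same low bits.
    tags = ''.join(chr(TAG_BASE + ord(c)) for c in region.lower())
    return BLACK_FLAG + tags + CANCEL_TAG


for region in ('gbeng', 'gbsct', 'gbwls'):  # assumed codes
    print(region, subdivision_flag(region))
```

Where the sequence is not supported, you will most likely see only a black flag, because the tag characters have no visible glyph.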
Now let’s see how emoji modifiers can be used to set the skin tone of emojis that show human faces, hands, noses, etc.
Skin Tones
Unicode provides a set of 5 emoji modifiers to set skin tone from pale to dark brown. They are based on the Fitzpatrick scale—developed to study the effects of ultraviolet light on human skin. Example 2 shows the use of those modifiers to set the skin tone of the thumbs up emoji.
from unicodedata import name

SKIN1 = 0x1F3FB  # EMOJI MODIFIER FITZPATRICK TYPE-1-2  # (1)
SKINS = [chr(i) for i in range(SKIN1, SKIN1 + 5)]      # (2)
THUMB = '\U0001F44d'  # THUMBS UP SIGN 👍

examples = [THUMB]                                     # (3)
examples.extend(THUMB + skin for skin in SKINS)        # (4)

for example in examples:
    print(example, end='\t')                           # (5)
    print(' + '.join(name(char) for char in example))  # (6)
(1) EMOJI MODIFIER FITZPATRICK TYPE-1-2 is the first modifier.
(2) Build list with all five modifiers.
(3) Start list with the unmodified THUMBS UP SIGN.
(4) Extend list with the same emoji followed by each of the modifiers.
(5) Display emoji and tab.
(6) Display names of characters combined in the emoji, joined by ' + '.
The output of Example 2 looks like Figure 3 on macOS. As you can see, the unmodified emoji has a cartoonish yellow color, while the others have more realistic skin tones.
Let’s now move to more complex emoji combinations using special markers.
Rainbow Flag and ZWJ Combinations
Besides the special purpose indicators and modifiers we’ve seen, Unicode provides an invisible marker used as glue between emojis and other characters, to produce new combinations: U+200D, ZERO WIDTH JOINER—nicknamed "ZWJ" in many Unicode documents.
For example, the rainbow flag is built by joining the emojis WAVING WHITE FLAG and RAINBOW, as Figure 4 shows.
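In code, that combination might look like the sketch below. It assumes the sequence is 1F3F3 FE0F 200D 1F308, with VARIATION SELECTOR-16 after the white flag to request emoji presentation; the constant names are mine.

```python
from unicodedata import name

WHITE_FLAG = '\U0001F3F3'  # WAVING WHITE FLAG
VS16 = '\uFE0F'            # VARIATION SELECTOR-16 (assumed part of the sequence)
ZWJ = '\u200D'             # ZERO WIDTH JOINER
RAINBOW = '\U0001F308'     # RAINBOW

rainbow_flag = WHITE_FLAG + VS16 + ZWJ + RAINBOW

print(rainbow_flag)
# List the code points that make up the single displayed pictograph.
for char in rainbow_flag:
    print(f'U+{ord(char):04X}\t{name(char)}')
```

A UI that does not support the sequence will usually fall back to showing the white flag and the rainbow side by side.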
Unicode 13 supports more than 1100 ZWJ emoji sequences as RGI—"recommended for general interchange […] intended to be widely supported across multiple platforms".[1] You can find the full list of RGI ZWJ emoji sequences in emoji-zwj-sequences.txt and a small sample in Figure 5.
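If you want to process that file yourself, a small parser is enough. The sketch below assumes each data line has the form `code points ; type ; description # comment`, which is how I read the file; the function name `parse_sequence_line` is hypothetical and the sample line is written in the style of the file rather than copied from it.

```python
def parse_sequence_line(line: str):
    """Parse one line; return (sequence, description) or None.

    Assumes semicolon-separated fields, with everything after
    '#' treated as a comment.
    """
    data, _, _comment = line.partition('#')
    fields = [field.strip() for field in data.split(';')]
    if len(fields) < 3 or not fields[0]:
        return None  # blank line or comment-only line
    sequence = ''.join(chr(int(cp, 16)) for cp in fields[0].split())
    return sequence, fields[2]


# Illustrative line, assumed to match the file's layout.
SAMPLE = '1F468 200D 1F9B0 ; RGI_Emoji_ZWJ_Sequence ; man: red hair # E11.0 [1]'

print(parse_sequence_line(SAMPLE))
print(parse_sequence_line('# a comment-only line'))
```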
Example 3 is the source code that produced Figure 5. You can run it from your shell, but for better results I recommend pasting it inside a Jupyter Notebook to run it in a browser. Browsers often lead the way in Unicode support, and provide prettier emoji pictographs.
from unicodedata import name

# Each line: code points | description | emoji version
zwj_sample = """
1F468 200D 1F9B0 |man: red hair |E11.0
1F9D1 200D 1F91D 200D 1F9D1 |people holding hands |E12.0
1F3CA 1F3FF 200D 2640 FE0F |woman swimming: dark skin tone |E4.0
1F469 1F3FE 200D 2708 FE0F |woman pilot: medium-dark skin tone |E4.0
1F468 200D 1F469 200D 1F467 |family: man, woman, girl |E2.0
1F3F3 FE0F 200D 26A7 FE0F |transgender flag |E13.0
1F469 200D 2764 FE0F 200D 1F48B 200D 1F469 |kiss: woman, woman |E2.0
"""

# Invisible characters, shown by abbreviation instead of by name
markers = {'\u200D': 'ZWJ',  # ZERO WIDTH JOINER
           '\uFE0F': 'V16',  # VARIATION SELECTOR-16
           }

for line in zwj_sample.strip().split('\n'):
    code, descr, version = (s.strip() for s in line.split('|'))
    chars = [chr(int(c, 16)) for c in code.split()]
    # First the combined emoji, then its components on following lines
    print(''.join(chars), version, descr, sep='\t', end='')
    for char in chars:
        if char in markers:
            print(' + ' + markers[char], end='')
        else:
            ucode = f'U+{ord(char):04X}'
            print(f'\n\t{char}\t{ucode}\t{name(char)}', end='')
    print()
One trend in modern Unicode is the addition of gender-neutral emojis such as SWIMMER (U+1F3CA) or ADULT (U+1F9D1), which can then be shown as they are, or with different gender in ZWJ sequences with the female sign ♀ (U+2640) or the male sign ♂ (U+2642).
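The pattern can be sketched as a small helper. The name `gendered` is hypothetical; the order of the parts (base, optional skin tone modifier, ZWJ, gender sign, VARIATION SELECTOR-16) is taken from the "woman swimming: dark skin tone" line in Example 3.

```python
ZWJ = '\u200D'           # ZERO WIDTH JOINER
VS16 = '\uFE0F'          # VARIATION SELECTOR-16
FEMALE = '\u2640'        # FEMALE SIGN
MALE = '\u2642'          # MALE SIGN
SWIMMER = '\U0001F3CA'   # SWIMMER
DARK = '\U0001F3FF'      # EMOJI MODIFIER FITZPATRICK TYPE-6


def gendered(base: str, sign: str, skin: str = '') -> str:
    """Join a base emoji and a gender sign (hypothetical helper).

    The skin tone modifier, if any, goes right after the base
    character and before the ZWJ.
    """
    return base + skin + ZWJ + sign + VS16


print(SWIMMER, gendered(SWIMMER, FEMALE), gendered(SWIMMER, MALE),
      gendered(SWIMMER, FEMALE, DARK), sep='\t')
```

The helper does not check that the result is an RGI sequence, so an unsupported combination will show up as separate glyphs.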
Modern Families
The Unicode Consortium is also moving towards more diversity in the supported family emojis. Figure 6 is a matrix of family emojis showing support for families with different combinations of parents and children—as of January 2020.
The code I wrote to build Figure 6 uses the Bottle framework to build and serve an HTML page with that matrix of family emojis. You can find emoji_families.py in this site’s public repository.
To see the page, install Bottle, run the emoji_families.py script and visit http://localhost:8080/.
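If you just want to see how the sequences are assembled, the sketch below prints a similar matrix as plain text. It is not the emoji_families.py script: the `family` helper and the chosen combinations of parents and children are my assumptions, and nothing checks which combinations are RGI.

```python
ZWJ = '\u200D'  # ZERO WIDTH JOINER
MAN, WOMAN = '\U0001F468', '\U0001F469'
BOY, GIRL = '\U0001F466', '\U0001F467'


def family(*members: str) -> str:
    """Join family members with ZWJ (hypothetical helper)."""
    return ZWJ.join(members)


# Assumed combinations, for illustration only.
parents = [(MAN, WOMAN), (MAN, MAN), (WOMAN, WOMAN), (MAN,), (WOMAN,)]
children = [(BOY,), (GIRL,), (GIRL, BOY), (BOY, BOY), (GIRL, GIRL)]

# One row per set of parents, one column per set of children.
for adults in parents:
    print('\t'.join(family(*adults, *kids) for kids in children))
```

Combinations the UI does not support appear as the individual people side by side.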
Tip
Browsers follow the evolution of Unicode Emoji closely, and here no OS has a clear advantage. While preparing this chapter, I captured Figure 5 on Ubuntu 19.10 and Figure 6 on Windows 10, using Firefox 72 on both, because those were the OS/browser combinations with the most complete support for the emojis in those examples.
Further Reading
To learn more about Unicode Emoji standards, visit the Unicode Emoji index page, which links to the Technical Standard #51: Unicode Emoji and the emoji data files, where you’ll find emoji-zwj-sequences.txt—the source of the samples I used in Figure 5.
Emojipedia is the best site to find emojis and learn about them. Besides a comprehensive searchable database, Emojipedia also has a blog including posts like Emoji ZWJ Sequences: Three Letters, Many Possibilities and Emoji Flags Explained.
In 2016, the Museum of Modern Art (MoMA) in NYC added to its collection The Original Emoji, the 176 emojis designed by Shigetaka Kurita in 1999 for NTT DOCOMO—the Japanese mobile carrier.
Going further back in history, Emojipedia published Correcting the Record on the First Emoji Set, crediting Japan’s SoftBank for the earliest known emoji set, deployed in cell phones in 1997. SoftBank’s set is the source of 90 emojis now in Unicode, including U+1F4A9 (PILE OF POO).
The culture and politics of emoji evolution in the 2010-2019 decade are the subject of Paddy Johnson’s article Emoji We Lost for Gizmodo.
Matthew Rothenberg’s emojitracker.com is a live, real-time dashboard that counts emoji usage on Twitter. As I write this on July 4, 2021, FACE WITH TEARS OF JOY (U+1F602) is the most popular emoji on Twitter, with 3,314,598,733 recorded occurrences.
Soapbox
The Power of Football
Why are England, Scotland, and Wales entitled to their own Unicode flags, but Catalonia, Pernambuco, and Texas are not? Because the first three are permanent members of the International Football Association Board which controls the rules of football. As such, they are allowed to enter their "national teams" in the FIFA World Cup—therefore media outlets need their flags to display tournament charts. At least that’s my unproven hypothesis.