Unicode Emojis in Python


When dealing with emojis, you might notice that some look like normal characters: they are not colored and look roughly the same on every computer, no matter the font. Others, however, are colored and look different on every phone, computer and operating system.

This is because some emojis are made up of multiple characters (Unicode code points), while others consist of a single one.

While that explanation might sound simple enough that you could click away from this article right now, the world of Unicode is far more complicated. This post explains the basics of Unicode emojis and how to deal with them in Python.

Multi-Character Emojis
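
As a rough illustration of what "multiple characters" means, the sketch below uses the standard-library unicodedata module to list the code points behind three emojis. The helper name describe and the choice of sample emojis are made up for this example; the emojis are written as escape sequences so the code does not depend on the editor's encoding.

```python
import unicodedata


def describe(emoji: str) -> list[str]:
    """Return the Unicode name of every code point in the string."""
    return [unicodedata.name(ch, "<unnamed>") for ch in emoji]


# One code point: SPARKLES.
sparkles = "\u2728"
# Two code points: a base character followed by VARIATION SELECTOR-16.
heart_exclamation = "\u2763\ufe0f"
# Seven code points: man, woman, girl and boy joined by zero width joiners.
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"

for emoji in (sparkles, heart_exclamation, family):
    # len() counts code points, not what is drawn on screen.
    print(len(emoji), describe(emoji))
```

len() reports 1, 2 and 7 here, even though each of the three emojis is displayed as a single symbol. The family emoji is four people held together by three ZERO WIDTH JOINER characters.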


Extracting Emojis from Strings

If the emojis are embedded in ‘normal’ text, you’ll find the regex module invaluable.

Note: Do not confuse the regex module with the re module. The regex module is a third-party module that provides more advanced functionality than the standard re module. Install it with pip install regex.

For example, given a string like this: 💘 I 💖 love ❣️ 💝👨‍👩‍💞✨ emojis! 👨‍👩‍👧‍👦, you’ll find that traditional methods of splitting the string will not work as expected.
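To see why the usual approaches fall short, here is a small sketch that uses only built-in string operations. The variable names and the shortened sample string are made up for illustration.

```python
# Revolving hearts and sparkles with no space between them.
pair = "\U0001F49E\u2728"
# Man, woman, girl and boy joined by zero width joiners (U+200D).
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"
text = f"{pair} emojis! {family}"

# Splitting on whitespace cannot separate emojis that sit next to each other.
words = text.split()
print(len(words), [ascii(w) for w in words])

# Iterating character by character goes too far the other way:
# the single family emoji falls apart into seven code points.
pieces = list(family)
print(len(pieces), [ascii(p) for p in pieces])
```

Whitespace splitting leaves the two adjacent emojis glued together as one "word", while per-character iteration turns one visible emoji into seven pieces, three of which are invisible joiners.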

import regex

embedded_emojis = "💘 I 💖 love ❣️ 💝👨‍👩‍💞✨ emojis! 👨‍👩‍👧‍👦"
for match in regex.finditer(r"\X", embedded_emojis):
    # Print each grapheme next to its escaped code points.
    print(match.group(0), ascii(match.group(0)))

The special \X matcher matches a complete grapheme cluster (what a reader perceives as a single character), following the Unicode text segmentation rules. In other words, it properly separates emojis from normal letters, and it won’t break apart multi-character emojis.
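
Once the string is split into graphemes, keeping only the emojis is a matter of filtering. The sketch below is a heuristic, not an official emoji test: it assumes that any grapheme containing a code point in the Unicode category So (Symbol, other) is an emoji. That also matches some symbols that are not emojis, such as the degree sign, and it misses a few emojis such as keycaps. The function names are hypothetical, and the hard-coded list stands in for the graphemes that \X would yield, so the example runs with the standard library alone.

```python
import unicodedata


def looks_like_emoji(grapheme: str) -> bool:
    """Heuristic: True if any code point is in category So (Symbol, other)."""
    return any(unicodedata.category(ch) == "So" for ch in grapheme)


def keep_emojis(graphemes):
    """Return only the graphemes that the heuristic considers emojis."""
    return [g for g in graphemes if looks_like_emoji(g)]


family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"

# Stand-in for the graphemes of "<sparkling heart> I <heart exclamation> <family>";
# in real code this list would come from regex.findall(r"\X", text).
graphemes = ["\U0001F496", " ", "I", " ", "\u2763\ufe0f", " ", family]

emojis = keep_emojis(graphemes)
print([ascii(e) for e in emojis])
```

Because the filter works on whole graphemes, the family emoji and the heart with its variation selector each come back in one piece.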

