Unicode Encode: Convert Text to Unicode Escape Sequences

Convert any text to Unicode escape sequences (\uXXXX format) for safe embedding in source code, JSON, and systems that require ASCII-only strings.

Published January 15, 2025 | Updated June 1, 2025 | 4 min read

Unicode escape sequences represent characters as \uXXXX (where XXXX is a four-digit hexadecimal code point) and allow any Unicode character to be expressed using only ASCII. This is essential when working in environments that do not support non-ASCII text directly in source files, configuration files, or data formats.

JavaScript, Java, Python, JSON, and many other languages and formats support Unicode escape sequences natively. They allow developers to include characters from any language script, emoji, mathematical symbols, and special characters in code without relying on the file's character encoding or the editor's Unicode support.

This tool converts any text to its Unicode escape sequence representation. Whether you are embedding a Chinese character in a Java string literal, storing emoji in an ASCII-safe format, or debugging character encoding issues, this encoder gives you the exact escape sequences you need.

What Are Unicode Escape Sequences?

Unicode is a universal character encoding standard that assigns a unique code point to every character in every human writing system. Unicode escape sequences express a code point as a text string using only ASCII characters. The most common format is \uXXXX, where XXXX is a four-digit hexadecimal number representing the code point.
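As a rough illustration of the \uXXXX format, the following Python sketch turns a single character into its four-digit escape. The helper name escape_bmp is hypothetical and chosen only for this example; it assumes the character lies in the Basic Multilingual Plane (U+0000 to U+FFFF).

```python
def escape_bmp(ch: str) -> str:
    """Return the four-hex-digit escape for one BMP character (hypothetical helper)."""
    code_point = ord(ch)
    if code_point > 0xFFFF:
        raise ValueError("code point is outside the Basic Multilingual Plane")
    # Zero-pad to four uppercase hexadecimal digits.
    return f"\\u{code_point:04X}"


print(escape_bmp("A"))        # \u0041
print(escape_bmp("\u20ac"))   # \u20AC (euro sign)
print(escape_bmp("\u4e2d"))   # \u4E2D (CJK character)
```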

For code points beyond U+FFFF (supplementary characters like many emoji), the format depends on the language. JavaScript and JSON use surrogate pairs (\uD83D\uDE00 for the grinning face emoji, U+1F600), while some languages and formats use \U00XXXXXX (uppercase U with eight hex digits) or \u{XXXXX} with braces.
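The surrogate pair for a supplementary code point follows from the standard UTF-16 arithmetic: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. The sketch below shows that calculation alongside the other two notations; the function names are hypothetical and exist only for illustration.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def supplementary_formats(code_point: int) -> dict[str, str]:
    """Return the three notations for one supplementary code point."""
    high, low = to_surrogate_pair(code_point)
    return {
        "surrogate_pair": f"\\u{high:04X}\\u{low:04X}",
        "eight_digit": f"\\U{code_point:08X}",
        "braces": f"\\u{{{code_point:X}}}",
    }


for name, text in supplementary_formats(0x1F600).items():
    print(name, text)
# surrogate_pair \uD83D\uDE00
# eight_digit \U0001F600
# braces \u{1F600}
```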

The \uXXXX format is supported in JavaScript string literals, JSON string values, Java source code, C# source code, and Python 3 strings (via \uXXXX for BMP characters and \UXXXXXXXX for supplementary characters). CSS uses a different format: a backslash followed by one to six hexadecimal digits (\XXXXXX), without the u prefix.

How to Use This Tool

Converting text to Unicode escape sequences is fast with this tool.

1. Enter the text. Type or paste the text containing Unicode characters you want to convert to escape sequences.
2. Choose the escape format. Select JavaScript/JSON (\uXXXX), Python (\uXXXX or \UXXXXXXXX), Java, or CSS depending on where the output will be used.
3. Choose encoding scope. Select whether to encode all characters or only non-ASCII characters; ASCII-safe characters do not need escaping in most contexts.
4. Click Encode. The tool outputs the escape sequence representation of your text.
5. Copy and paste into your code. The output is ready to paste directly into a string literal in your source code or JSON file.
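For readers who want to reproduce the steps above in a script, here is a minimal Python sketch of an encoder for the JavaScript/JSON format with the "all characters" versus "non-ASCII only" choice. It is not the tool's actual implementation: the function name unicode_escape and the only_non_ascii parameter are assumptions made for this example, and the sketch does not escape quotes or backslashes, which a string literal may also need.

```python
def unicode_escape(text: str, only_non_ascii: bool = True) -> str:
    """Escape text in JavaScript/JSON style (hypothetical helper).

    Supplementary characters become surrogate pairs; ASCII is left
    untouched unless only_non_ascii is False.
    """
    parts = []
    for ch in text:
        cp = ord(ch)
        if only_non_ascii and cp < 0x80:
            parts.append(ch)                      # keep ASCII as-is
        elif cp <= 0xFFFF:
            parts.append(f"\\u{cp:04X}")          # one escape for BMP
        else:
            offset = cp - 0x10000                 # two escapes otherwise
            high = 0xD800 + (offset >> 10)
            low = 0xDC00 + (offset & 0x3FF)
            parts.append(f"\\u{high:04X}\\u{low:04X}")
    return "".join(parts)


print(unicode_escape("Hi \u4e2d"))                    # Hi \u4E2D
print(unicode_escape("Hi", only_non_ascii=False))     # \u0048\u0069
print(unicode_escape("\U0001F600"))                   # \uD83D\uDE00
```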

Common Use Cases

Unicode escape sequences are used across many development contexts.

Embedding non-ASCII characters in Java source files that must remain ASCII-safe for legacy build tools.
Writing test fixtures with specific Unicode characters without requiring the test file to be saved as UTF-8.
Storing emoji or special symbols in JSON configuration files in an escape-safe format.
Encoding right-to-left markers, zero-width spaces, and other invisible control characters for debugging.
Creating ASCII-safe versions of localised strings for systems that do not fully support Unicode.
Encoding characters that might be altered by copy-paste operations or text processing tools.

Tips and Best Practices

Prefer UTF-8 source files with actual Unicode characters over escape sequences for readability; use escapes only when the target environment requires ASCII-only strings.
In JSON, only the quotation mark, the backslash, and control characters (U+0000 to U+001F) are required to be escaped; all other Unicode characters can appear as literal UTF-8 bytes.
Be aware that supplementary characters (emoji, many CJK extension characters) require surrogate pairs in JavaScript and JSON, which means two \uXXXX sequences per character.
Use the \u{XXXXX} brace syntax in modern JavaScript (ES2015+) to express supplementary code points directly without surrogate pairs.
When debugging encoding issues, encoding the entire string to Unicode escapes and comparing the code points is one of the most reliable ways to identify unexpected characters.
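The debugging tip can be tried with a few lines of Python. This sketch lists each character's code point and official Unicode name using the standard unicodedata module; the helper name describe and the sample strings are made up for illustration.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List each character as 'U+XXXX NAME' (hypothetical helper)."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


clean = "ab"
suspect = "a\u200bb"   # looks identical on screen, but hides a zero-width space

print(clean == suspect)   # False
for line in describe(suspect):
    print(line)
# U+0061 LATIN SMALL LETTER A
# U+200B ZERO WIDTH SPACE
# U+0062 LATIN SMALL LETTER B
```

Comparing the two listings side by side makes the extra U+200B stand out, even though the strings render the same.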

Frequently Asked Questions

What is the difference between \uXXXX and \UXXXXXXXX?

\uXXXX (lowercase u, 4 hex digits) represents Basic Multilingual Plane characters with code points U+0000 to U+FFFF. \UXXXXXXXX (uppercase U, 8 hex digits) is used in Python and some other languages to represent supplementary characters with code points above U+FFFF. JavaScript uses surrogate pairs (two \uXXXX sequences) for supplementary characters.
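In Python 3 the two forms can be seen side by side. The sketch below uses the built-in unicode_escape codec, which emits lowercase hex digits; note that this codec writes code points below U+0100 as \xNN rather than \uXXXX, so the example uses a character above that range.

```python
bmp = "\u4e2d"                  # U+4E2D, written with four hex digits
supplementary = "\U0001F600"    # U+1F600, written with eight hex digits

# Each literal is a single character in a Python 3 string.
print(len(bmp), len(supplementary))             # 1 1
print(hex(ord(supplementary)))                  # 0x1f600

# The codec picks the short or long form based on the code point.
print(bmp.encode("unicode_escape"))             # b'\\u4e2d'
print(supplementary.encode("unicode_escape"))   # b'\\U0001f600'
```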

Why do emoji require two \uXXXX escape sequences in JavaScript?

Emoji and other supplementary characters have code points above U+FFFF. JavaScript strings use UTF-16 encoding internally, and characters above U+FFFF are represented as surrogate pairs: a high surrogate in the range U+D800 to U+DBFF followed by a low surrogate in the range U+DC00 to U+DFFF, each a 16-bit value. Each surrogate is expressed as a separate \uXXXX sequence. ES2015 introduced \u{XXXXX} syntax to express supplementary characters as a single escape.
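The UTF-16 view that JavaScript uses can be inspected from Python by encoding a string as big-endian UTF-16 and reading it back two bytes at a time. This is only an illustrative sketch; both function names are hypothetical.

```python
def utf16_code_units(text: str) -> list[int]:
    """Return the 16-bit UTF-16 code units of text."""
    data = text.encode("utf-16-be")   # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def combine_surrogates(high: int, low: int) -> int:
    """Rebuild the code point from a high and low surrogate."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


units = utf16_code_units("\U0001F600")
print([f"{u:04X}" for u in units])            # ['D83D', 'DE00']
print(f"U+{combine_surrogates(*units):X}")    # U+1F600
print(utf16_code_units("A"))                  # [65]
```

A BMP character such as "A" yields one code unit, while the emoji yields two, which is why it needs two \uXXXX sequences.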

How do Unicode escape sequences differ in Java versus JavaScript?

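Java Unicode escapes (\uXXXX) are translated by the Java compiler before the source is tokenized, meaning they can appear anywhere in Java source code, including comments and identifiers. JavaScript escapes are processed only inside string literals, template literals, regular expressions, and identifiers. This difference means that in Java, a \u002F (forward slash) in a comment is treated as a real slash before the comment is parsed, which can cause surprising behaviour. For example, a \u000A (line feed) inside a // comment ends the comment, and whatever follows it on that line is compiled as code.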

Can I use Unicode escapes in JSON?

Yes. JSON specifies \uXXXX as the escape sequence for Unicode characters. When escaped, a supplementary character must be written as a UTF-16 surrogate pair (two \uXXXX sequences). JSON parsers are required to handle these escape sequences, so they are a portable way to include any Unicode character in a JSON string.
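Python's standard json module offers a quick way to see this behaviour. By default json.dumps escapes every non-ASCII character (ensure_ascii=True) and writes supplementary characters as surrogate pairs in lowercase hex; json.loads turns the pair back into a single character.

```python
import json

emoji = "\U0001F600"

escaped = json.dumps(emoji)                    # ASCII-only output
literal = json.dumps(emoji, ensure_ascii=False)  # keeps the raw character

print(escaped)                        # "\ud83d\ude00"
print(escaped.isascii())              # True
print(literal == f'"{emoji}"')        # True

# Both forms decode to the same one-character string.
print(json.loads(escaped) == emoji)   # True
print(json.loads(literal) == emoji)   # True
```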

Do Unicode escape sequences affect how strings are compared?

No. In languages that support Unicode escapes, the escape sequence and the literal character are identical at the string level. 'A' and '\u0041' are the same string. Comparison, length, and all other string operations treat them identically.
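A short Python check of this claim, shown here only as an illustration: the escape is resolved when the source is parsed, so the resulting strings are indistinguishable at run time.

```python
literal = "A"
escaped = "\u0041"

print(literal == escaped)              # True
print(len(escaped))                    # 1
print(hash(literal) == hash(escaped))  # True

# A raw string keeps the backslash sequence as six separate characters.
raw = r"\u0041"
print(len(raw), raw == literal)        # 6 False
```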

Why would I encode ASCII characters like letters to \uXXXX sequences?

You generally would not, but it is sometimes done intentionally to obfuscate code (for example, malicious JavaScript) or to test how a system handles Unicode escape sequences. Legitimate uses include encoding characters that might be corrupted by specific text processing tools or build systems.
