A regular expression (regex) is a pattern for finding and matching text: \d{3}-\d{4} matches a local phone number such as 555-1234, and ^\w+@\w+ matches the start of an email address. This page covers the common syntax (character classes, quantifiers, anchors, groups and flags) shared by JavaScript, Python, PCRE and most other engines. Flag names and a few details differ between engines.
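
The following minimal sketch runs the two intro patterns with Python's standard `re` module, one of the engines named above. The sample strings are invented for illustration.

```python
import re

# \d{3}-\d{4}: three digits, a hyphen, then four digits, anywhere in the text
phone = re.search(r"\d{3}-\d{4}", "Call 555-1234 after six")
print(phone.group())  # 555-1234

# ^\w+@\w+: anchored at the start; \w stops at the dot, so only "user@example" matches
email = re.match(r"^\w+@\w+", "user@example.com")
print(email.group())  # user@example
```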

Regex tokens

Token      Matches
.          Any single character except newline (unless the s / dotall flag is set)
\d         A digit (0-9)
\D         A non-digit
\w         A word character (letter, digit, underscore)
\W         A non-word character
\s         Any whitespace (space, tab, newline)
\S         Any non-whitespace
^          Start of the string (or of each line with the m flag)
$          End of the string (or of each line with the m flag)
\b         A word boundary
\B         Not a word boundary
*          0 or more of the previous
+          1 or more of the previous
?          0 or 1 of the previous (optional)
{n}        Exactly n of the previous
{n,}       n or more of the previous
{n,m}      Between n and m of the previous
[abc]      Any one of a, b or c
[^abc]     Any character except a, b or c
[a-z]      Any character in the range a to z
(…)        A capturing group
(?:…)      A non-capturing group
a|b        a or b (alternation)
\1         Backreference to group 1
(?=…)      Lookahead: followed by …
(?!…)      Negative lookahead: not followed by …
(?<=…)     Lookbehind: preceded by …
\.         A literal dot (escape a special char)
flag g     Global: find all matches (JavaScript; Python uses re.findall instead)
flag i     Case-insensitive
flag m     Multiline: ^ and $ match at the start and end of each line
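
The sketch below exercises most rows of the table with Python's `re` module. Python spells the flags `re.IGNORECASE` and `re.MULTILINE` and has no g flag; `re.findall` plays that role. The sample strings are invented, and other engines such as JavaScript or PCRE write the same patterns with their own flag syntax.

```python
import re

text = "Cats: 3, Dogs: 12. cat and dog"

# Python has no g flag: re.search returns the first match, re.findall returns all
first_number = re.search(r"\d+", text).group()
all_numbers = re.findall(r"\d+", text)
print(first_number, all_numbers)  # 3 ['3', '12']

# \b word boundaries plus the i flag (re.IGNORECASE); "Cats" fails the trailing \b
whole_cat = re.findall(r"\bcat\b", text, re.IGNORECASE)
print(whole_cat)  # ['cat']

# Capturing groups are numbered from 1; (?:...) groups without capturing
m = re.search(r"(\w+): (\d+)", text)
print(m.group(1), m.group(2))  # Cats 3
counts = re.findall(r"(?:Cats|Dogs): (\d+)", text)
print(counts)  # ['3', '12']

# \1 repeats whatever group 1 matched: here, the first doubled letter
doubled = re.search(r"(\w)\1", "balloon").group()
print(doubled)  # ll

# Lookarounds test the surrounding text without consuming it
prices = "5 dollars, 7 euros"
dollars = re.findall(r"\d+(?= dollars)", prices)
not_dollars = re.findall(r"\b\d+\b(?! dollars)", prices)
after_sign = re.findall(r"(?<=\$)\d+", "$5 and 7")
print(dollars, not_dollars, after_sign)  # ['5'] ['7'] ['5']

# m flag (re.MULTILINE): ^ matches at the start of every line
lines = "one\ntwo\nthree"
print(re.findall(r"^\w+", lines))                # ['one']
print(re.findall(r"^\w+", lines, re.MULTILINE))  # ['one', 'two', 'three']
```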

A few useful patterns

  • Digits only: ^\d+$
  • Simple email: ^[\w.+-]+@[\w-]+\.[\w.-]+$
  • Whole word: \bword\b
  • Trim leading and trailing whitespace: replace ^\s+|\s+$ with an empty string (add the g flag in JavaScript so both ends are replaced)
  • Hex color: ^#[0-9a-fA-F]{6}$
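
One quick way to sanity-check the patterns above is to run them against a few sample inputs. This Python sketch uses invented test strings. Keep in mind that the "simple" email pattern accepts many strings that are not real addresses, and the hex pattern only covers the six-digit #RRGGBB form.

```python
import re

# The patterns from the list above, compiled once
DIGITS_ONLY = re.compile(r"^\d+$")
SIMPLE_EMAIL = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")
WHOLE_WORD = re.compile(r"\bword\b")
HEX_COLOR = re.compile(r"^#[0-9a-fA-F]{6}$")

print(bool(DIGITS_ONLY.match("12345")))                      # True
print(bool(DIGITS_ONLY.match("123a5")))                      # False
print(bool(SIMPLE_EMAIL.match("jo.doe+tag@example.co.uk")))  # True
print(bool(SIMPLE_EMAIL.match("not an email")))              # False
print(WHOLE_WORD.findall("word, password, words, word"))     # ['word', 'word']
print(bool(HEX_COLOR.match("#1a2B3c")))                      # True
print(bool(HEX_COLOR.match("#12345")))                       # False

# Trim: re.sub replaces every match by default, so both ends are stripped
trimmed = re.sub(r"^\s+|\s+$", "", "   padded text \t")
print(repr(trimmed))  # 'padded text'

# Python caveat: $ also matches just before a final newline
print(bool(DIGITS_ONLY.match("123\n")))  # True
print(re.fullmatch(r"\d+", "123\n"))     # None
```

The last two lines show why, in Python, `re.fullmatch` is often a safer way to say "the whole string and nothing else" than wrapping a pattern in ^ and $.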

Frequently asked questions

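What does \d mean in regex? \d matches any single digit (0-9), so \d{3} matches exactly three digits in a row. Its opposite, \D, matches any non-digit. In JavaScript \d is exactly [0-9]; in Python 3 it also matches other Unicode decimal digits unless the re.ASCII flag is set.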
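
A short Python sketch of \d, \D and \d{3}. It also shows that \d{3} will match three digits inside a longer run unless word boundaries are added, and how Python's Unicode-aware \d behaves with and without `re.ASCII`. The inputs are invented for illustration.

```python
import re

print(re.findall(r"\d", "a1b22"))   # ['1', '2', '2']
print(re.findall(r"\D+", "a1b22"))  # ['a', 'b']

# \d{3} is exactly three digits, but it also matches inside a longer run
print(re.findall(r"\d{3}", "12 345 6789"))      # ['345', '678']
# Word boundaries keep it to standalone three-digit numbers
print(re.findall(r"\b\d{3}\b", "12 345 6789"))  # ['345']

# Python 3 str patterns: \d also matches non-ASCII decimal digits
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
print(bool(re.match(r"\d", arabic_three)))            # True
print(bool(re.match(r"\d", arabic_three, re.ASCII)))  # False
```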

What is the difference between * and + in regex? * matches zero or more of the preceding item, so it can match nothing. + matches one or more, so it requires at least one. ? matches zero or one (optional).
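
Running the three quantifiers over the same invented input in Python makes the difference concrete: only * and ? accept an "a" with no "b" after it, and only * can match an empty string.

```python
import re

sample = "a ab abbb"
for pattern in (r"ab*", r"ab+", r"ab?"):
    print(pattern, re.findall(pattern, sample))
# ab* ['a', 'ab', 'abbb']
# ab+ ['ab', 'abbb']
# ab? ['a', 'ab', 'ab']

# * can match nothing at all; + needs at least one character
print(re.fullmatch(r"\d*", "") is not None)  # True
print(re.fullmatch(r"\d+", "") is not None)  # False
```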

What do ^ and $ mean in regex? ^ anchors the match to the start of the string or line, and $ anchors it to the end. With the multiline (m) flag they match the start and end of each line.
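
A small Python sketch of the anchors on a made-up three-line log, with and without the multiline flag (`re.MULTILINE` in Python).

```python
import re

log = "ok: start\nerror: disk full\nok: done"

# Without the m flag, ^ and $ only see the whole string's start and end
print(re.findall(r"^\w+", log))  # ['ok']
print(re.findall(r"\w+$", log))  # ['done']

# With re.MULTILINE they also match around every newline
print(re.findall(r"^\w+", log, re.MULTILINE))  # ['ok', 'error', 'ok']
print(re.findall(r"\w+$", log, re.MULTILINE))  # ['start', 'full', 'done']
```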

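How do I match a literal special character like a dot? Escape it with a backslash: \. matches a literal dot, \* a literal asterisk and \\ a literal backslash. Inside a character class such as [.*+?], most special characters are already literal; only ], \, ^ (at the start) and - (between two characters) usually need escaping.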
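
The Python sketch below shows escaping in practice. The patterns are written as raw strings (r"...") so each backslash reaches the regex engine untouched; without the r prefix, the \\ pattern would have to be typed as "\\\\". `re.escape` is Python's own helper for turning an arbitrary string into a literal pattern; other languages have their own equivalents. The inputs are invented.

```python
import re

print(re.findall(r"\.", "v1.2.3"))               # ['.', '.']
print(re.findall(r"3\*4", "3*4 = 12"))           # ['3*4']
print(len(re.findall(r"\\", r"C:\temp\logs")))  # 2

# Inside [...], . * + ? are literal without escaping
print(re.findall(r"[.*+?]", "a.b*c+d"))  # ['.', '*', '+']

# re.escape builds a pattern that matches a string literally
literal = "1+1=2?"
print(bool(re.fullmatch(re.escape(literal), literal)))  # True
```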