Using Claude as your regex generator and explainer
Every engineer has a regex moment: you need a pattern, you vaguely remember how lookaheads work, you spend twenty minutes on regex101.com, you end up with something that works on your test cases but breaks in production on an edge case you did not think of. I stopped doing that. Now I describe what I want in plain English and Claude gives me the pattern, explains it line by line, and generates test cases I would not have thought of. The whole process takes three minutes.
The generation prompt
The most useful thing about describing regex requirements in English is that it forces you to be precise about edge cases. Here is a prompt format that works well:
Write a Python regex pattern that matches a valid UK phone number.
Rules:
- Can start with +44 or 0
- After the country code / leading 0, must be followed by 10 digits total
- Can have spaces or hyphens as separators between digit groups
- Must NOT match if there are letters anywhere
- Must NOT match partial numbers embedded in longer strings (anchor it)
Provide:
1. The pattern
2. A plain-English explanation of each part
3. 10 test cases: 5 that should match and 5 that should not, with comments explaining why
Claude's output:
import re
# UK phone number pattern
pattern = r'^(?:\+44|0)\s?(?:\d[\s-]?){9}\d$'
# Explanation:
# ^ - start of string
# (?:\+44|0) - non-capturing group: either a literal "+44" (country code) or "0" (trunk prefix)
# \s? - optional single whitespace character
# (?:\d[\s-]?){9} - exactly 9 digits, each optionally followed by whitespace or a hyphen
# \d - one final digit (ensures exactly 10 digits after the prefix)
# $ - end of string
test_cases = [
    ("+44 7911 123456", True),    # International format with spaces
    ("+447911123456", True),      # E.164, no spaces
    ("07911 123456", True),       # UK local format with space
    ("07911-123-456", True),      # UK local format with hyphens
    ("0207 123 4567", True),      # London landline
    ("+44 7911 12345", False),    # Too short
    ("+44 7911 1234567", False),  # Too long
    ("07911 12345a", False),      # Contains a letter
    ("447911123456", False),      # Missing + before the country code (ambiguous; treated as invalid)
    ("123-456-7890", False),      # US format, not UK
]
for number, expected in test_cases:
    match = bool(re.fullmatch(pattern, number))
    status = "PASS" if match == expected else "FAIL"
    print(f"{status}: '{number}' -> {'match' if match else 'no match'} (expected {'match' if expected else 'no match'})")
Notice the test cases include edge cases I would not have put in my own test suite: the ambiguous "447911123456" (is the 44 a country code or part of the number?) and the too-long number. These are the cases that cause bugs in production.
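One way to keep the explanation attached to the pattern is Python's re.VERBOSE flag, which lets the comments live inside the regex itself. The following is a minimal sketch, assuming the prefix must be either a literal +44 or a single 0; the name UK_PHONE is illustrative only and not part of Claude's output.

```python
import re

# Hypothetical name; the pattern assumes the prefix is "+44" or "0".
UK_PHONE = re.compile(
    r"""
    (?:\+44|0)        # prefix: literal +44 or the trunk digit 0
    \s?               # optional single whitespace character
    (?:\d[\s-]?){9}   # nine digits, each optionally followed by whitespace or a hyphen
    \d                # tenth and final digit
    """,
    re.VERBOSE,
)

for number in ["+44 7911 123456", "07911-123-456", "447911123456", "07911 12345a"]:
    # fullmatch requires the whole string to match, so no explicit anchors are needed
    print(number, "->", bool(UK_PHONE.fullmatch(number)))
```

In verbose mode, whitespace outside character classes is ignored and # starts a comment, so the commentary cannot drift away from the pattern it describes.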
Explaining existing patterns
Almost as useful as generating patterns is explaining ones you have inherited. The regex below appeared in a legacy codebase with no comment:
Explain this regex in plain English. What does it match? What are its edge cases? Are there any bugs?
(?i)^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
Claude explains that it is an IPv4 address validator, breaks down each group (25[0-5] handles 250-255, 2[0-4][0-9] handles 200-249, [01]?[0-9][0-9]? handles 0-199), notes that (?i) at the start is unnecessary since there are no letters whose case could be ignored, and flags that it accepts octets with leading zeros (such as 010), which programming languages and tools do not treat consistently: 010 is valid in some contexts, invalid in others, and sometimes read as octal.
That last observation is a real bug in some IP validation contexts. Claude found it. I would not have.
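To make the leading-zero issue concrete, here is a small sketch that compares the legacy octet expression with a stricter one. The names LEGACY_OCTET, STRICT_OCTET, and ipv4_pattern are hypothetical helpers introduced for illustration, and the stricter octet is one possible fix, not necessarily the one Claude suggested.

```python
import re

# Octet as written in the legacy pattern: accepts "010", "001", "00", ...
LEGACY_OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
# Assumed stricter variant: no leading zeros, so only "0" may start with 0.
STRICT_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"


def ipv4_pattern(octet):
    """Build a dotted-quad pattern from a single-octet sub-pattern."""
    return re.compile(rf"(?:{octet}\.){{3}}{octet}")


legacy = ipv4_pattern(LEGACY_OCTET)
strict = ipv4_pattern(STRICT_OCTET)

for candidate in ["192.168.0.1", "010.001.1.1", "256.1.1.1"]:
    print(
        candidate,
        "legacy:", bool(legacy.fullmatch(candidate)),
        "strict:", bool(strict.fullmatch(candidate)),
    )
```

Whether leading zeros should be rejected depends on what consumes the address downstream, so the strict variant is a choice to make deliberately rather than a universal correction.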
Converting between flavours
Regex flavours differ between JavaScript, Python, Go, and PCRE. A pattern that works in Python may not work in JavaScript. Conversion prompt:
Convert this Python regex to JavaScript (ES2018+):
(?P<year>\d{4})-(?P<month>0[1-9]|1[0-2])-(?P<day>0[1-9]|[12]\d|3[01])
Notes:
- Python uses (?P<name>...) for named groups
- JavaScript uses (?<name>...) for named groups (ES2018+)
- Show how to access the named groups in JavaScript code after matching
Claude's output:
const pattern = /(?<year>\d{4})-(?<month>0[1-9]|1[0-2])-(?<day>0[1-9]|[12]\d|3[01])/;
const input = '2026-04-04';
const match = input.match(pattern);
if (match) {
  const { year, month, day } = match.groups;
  console.log(year, month, day); // 2026 04 04
}
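For comparison, the Python side of the same conversion can be sketched as follows. The constant name DATE is hypothetical, and fullmatch is used here on the assumption that the whole string should be a date.

```python
import re

# Same pattern in Python syntax: (?P<name>...) instead of (?<name>...)
DATE = re.compile(
    r"(?P<year>\d{4})-(?P<month>0[1-9]|1[0-2])-(?P<day>0[1-9]|[12]\d|3[01])"
)

match = DATE.fullmatch("2026-04-04")
if match:
    parts = match.groupdict()  # dict keyed by group name
    print(parts["year"], parts["month"], parts["day"])
```

In both languages the pattern only checks the shape of the date: a string such as 2026-02-31 still matches, so calendar validity needs a separate check.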
Generating patterns for specific data formats
Some data formats have subtle rules that are easy to get wrong. I keep a library of prompts for common ones:
Generate a regex for each of these. For each, explain any format-specific rules you are encoding:
1. AWS S3 bucket name (rules: 3-63 chars; lowercase letters, digits, hyphens, and periods only; must start and end with a letter or digit; no consecutive periods; not formatted as an IP address)
2. Semantic version string (major.minor.patch, optional pre-release like -alpha.1, optional build metadata like +001)
3. UUID v4 (case-insensitive, with or without hyphens)
4. JWT token (three base64url-encoded segments separated by dots)
Having Claude enumerate the rules it is encoding is important. If the rules are wrong, you catch it before the pattern is wrong.
The workflow I actually use
- Describe what I need in plain English, including explicit edge cases
- Ask for the pattern + explanation + test cases
- Run the test cases to confirm they all pass
- Add the explanation as a comment above the pattern in the codebase
- Add the test cases to the test file with a comment linking to this conversation
The comment linking to the conversation is the most valuable part. Six months later, when someone wants to understand why the pattern looks the way it does, or when a new edge case appears, the context is one click away.
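The last two steps of the workflow might look like this in a test file. This is a sketch using the standard library's unittest; the names UK_PHONE, CASES, and UKPhoneTest are hypothetical, the pattern assumes a +44 or 0 prefix, and the context comment is a placeholder to be replaced with a real link.

```python
import re
import unittest

# Context: <link to the conversation that produced this pattern>
# Prefix "+44" or "0", then ten digits with optional whitespace or hyphen separators.
UK_PHONE = re.compile(r"(?:\+44|0)\s?(?:\d[\s-]?){9}\d")

CASES = [
    ("+44 7911 123456", True),   # international format with spaces
    ("07911-123-456", True),     # local format with hyphens
    ("+44 7911 12345", False),   # too short
    ("447911123456", False),     # country code without the plus sign
    ("07911 12345a", False),     # contains a letter
]


class UKPhoneTest(unittest.TestCase):
    def test_generated_cases(self):
        for text, expected in CASES:
            # subTest reports each failing input separately
            with self.subTest(text=text):
                self.assertEqual(bool(UK_PHONE.fullmatch(text)), expected)
```

Running it with python -m unittest keeps the generated cases in the regular test suite, so a newly discovered edge case becomes one more tuple in CASES.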
Regex will never be readable. But with Claude generating it, explaining it, and stress-testing it, the pattern you ship to production will at least be correct.