If you’ve spent any time writing code, you’ve no doubt abused regular expressions until they were an inscrutable character jumble that could give a real parser a run for its money. Even so, I was still surprised when I learned that there are 3 different kinds of parentheses in regular expressions, not just 2. And no, the 2 aren’t left and right, wise guy.

The 3 types of parentheses are Literal, Capturing, and Non-Capturing. You probably know about capturing parentheses. You’ll recognize literal parentheses too. It’s the non-capturing parentheses that’ll throw most folks, along with the semantics around multiple and nested capturing parentheses. (True RegEx masters, please hold the “But wait, there’s more!” for the conclusion.)

Literal Parentheses

Literal parentheses are just that: literal text that you want to match. Suppose you want to match U.S. phone numbers of the form `(xxx)yyy-zzzz`. You could write the regular expression as `/\(\d{3}\)\d{3}-\d{4}/`. Notice that we had to type `\(` instead of just a naked `(`. That’s because a raw parenthesis starts a capturing or non-capturing group. If we want to match a literal parenthesis in the text, we have to escape it with `\`.

Capturing Parentheses

You’ve probably written some capturing parentheses too, whether you meant to capture or not. These parentheses aren’t used to match literal `()` in the text; instead, they group characters together in a regular expression so that we can apply other operators like `+`, `*`, `?`, or `{n}`.

For example, if we want to match just the strings `can` or `can't`, we can write `/can('t)?/`. We need the parentheses here because `/can't?/` would match only the strings `can'` and `can't`, which is not quite what we had in mind.

However, there’s something else going on here. These are called capturing parentheses for a reason: they capture anything that matches the expression they contain for later use by your program.
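To check the escaping and grouping rules described above, here is a minimal sketch. The `^` and `$` anchors and the sample strings are my additions for illustration (they are not part of the expressions in the text); the anchors force the whole string to match so the difference between the patterns is visible.

```javascript
// Literal parentheses: \( and \) match the characters "(" and ")".
// The ^ and $ anchors are added here so the whole string must match.
const phone = /^\(\d{3}\)\d{3}-\d{4}$/;
console.log(phone.test("(303)555-1212")); // true
console.log(phone.test("303-555-1212"));  // false, no literal parentheses

// With parentheses, ? makes the whole group 't optional.
const grouped = /^can('t)?$/;
// Without parentheses, ? only makes the final t optional.
const ungrouped = /^can't?$/;

console.log(grouped.test("can"));     // true
console.log(grouped.test("can't"));   // true
console.log(grouped.test("can'"));    // false
console.log(ungrouped.test("can"));   // false
console.log(ungrouped.test("can'"));  // true
console.log(ungrouped.test("can't")); // true
```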
Continuing the can/can’t example, in JavaScript we get:

    const match = /can('t)?/.exec("We can't do it!");
    console.log(match[0]); // prints the match "can't"
    console.log(match[1]); // prints captured "'t"

Here, `match[1]` contains the item captured by the parentheses. Now this is somewhat uninteresting because we really don’t care about the `'t` separately from the word `can't`.

The phone number example gets more interesting. In JavaScript, we can extract the area code of a U.S. style phone number as follows:

    const match = /\((\d{3})\)\d{3}-\d{4}/.exec("(303)555-1212");
    console.log(match[0]); // (303)555-1212
    console.log(match[1]); // 303

Let’s take a closer look at what is going on in that regular expression, `/\((\d{3})\)\d{3}-\d{4}/`. It is almost identical to the expression we used in the literal parentheses example, but this time I added a set of capturing parentheses inside the pair of literal parentheses. This tells the regular expression engine to remember the part of the match that is inside the capturing parentheses. This captured match is what we find in `match[1]`. Notice that the entire phone number match is in `match[0]`.

This little example shows the power of capturing parentheses. Above, we used them to extract an area code from a phone number. We can use them to extract all kinds of text: a poor man’s parser.

Comic: “Exploits of a Mom” from XKCD

As another quick example, we can use capturing parentheses to extract first name and last name via `/(\D+) (\D+)/`. `match[1]` will have the first name and `match[2]` will have the last name, assuming you’re not matching Bobby Tables’ given name (see comic), or have extra spaces to deal with.

Non-capturing Parentheses

Now we get to the third kind of parenthesis: non-capturing parentheses. There are times when you need to group things together in a regular expression, but you don’t want to capture the match, like in the can/can’t example above. To avoid capturing the `'t`, we write `/can(?:'t)?/`.
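One way to see what capturing does and does not put into the result is to look at the arrays that `exec` returns. The sketch below is illustrative only; the sample sentences and names are made up, and the variable names are mine.

```javascript
// exec returns the full match at index 0, then one slot per capturing group.
const capturing = /can('t)?/.exec("We can't do it!");
const nonCapturing = /can(?:'t)?/.exec("We can't do it!");
console.log(capturing.length);    // 2: full match plus one captured group
console.log(nonCapturing.length); // 1: full match only

// A capturing group that did not take part in the match is undefined.
const noSuffix = /can('t)?/.exec("We can do it!");
console.log(noSuffix[0]); // can
console.log(noSuffix[1]); // undefined

// Multiple groups are numbered left to right.
const simpleName = /(\D+) (\D+)/.exec("Ada Lovelace");
console.log(simpleName[1]); // Ada
console.log(simpleName[2]); // Lovelace

// The "extra spaces" caveat: \D+ is greedy and also matches spaces,
// so the first group swallows everything up to the last space.
const threeParts = /(\D+) (\D+)/.exec("Mary Ann Evans");
console.log(threeParts[1]); // Mary Ann
console.log(threeParts[2]); // Evans
```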
With `/can(?:'t)?/`, all we get is the full match, with no sub-matches. The `(?:` is a special sequence that starts a parenthesized group, just like `(`, but it tells the regular expression engine not to bother capturing the match in the group and to use it only for operator precedence. Let’s look at a more complex example where ignoring a parenthesized group is useful.

Let’s extend that phone number regular expression to allow a prefix of `mobile` or `office`. With only capturing parentheses, this looks like `match = /((mobile|office) )?\((\d{3})\)\d{3}-\d{4}/.exec(...)`. (Is this inscrutable yet?) The problem is that the area code we want to extract is in `match[3]`. (I’ll leave it as an exercise to the reader as to why.) This is confusing and unnecessary since we don’t care about the annotation or anything other than the area code in this example. To capture only the area code, we can do:

    const re = /(?:(?:mobile|office) )?\((\d{3})\)\d{3}-\d{4}/;
    const match = re.exec('mobile (303)555-1212');
    console.log(match[0]); // mobile (303)555-1212
    console.log(match[1]); // 303

Notice the two sets of non-capturing parentheses `(?:` around the annotation, but the use of regular capturing parentheses around the area code.

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems. -Jamie Zawinski

But Wait, There’s More!

And there you have it, 3 kinds of parentheses: literal, capturing, and non-capturing, written `\(`, `(`, and `(?:`. We should probably use `(?:` more than we do, but I find it hard to read, so as long as `(` doesn’t cause any performance issues or semantic changes to an existing regular expression (by changing the index needed to find relevant group matches), I’ll skip the extra `?:`. I’m not sure if this is the best practice, but let’s face it, regular expressions are hard enough to read as it is.

True RegEx masters know that there are other types of parentheses that use the `(?` syntax as well.
Alas, I’m not actually a RegEx master, so I’ll leave you to search other sources to learn about those, as support for them varies between regular expression libraries; JavaScript, for one, only gained named groups and lookbehind in ES2018. Named regular expression groups are among the most useful of these.
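As a rough sketch of why named groups help, the example below revisits the prefixed phone number. It assumes a JavaScript engine with ES2018 named groups (`(?<name>...)`); the group names `label` and `area` and the sample strings are hypothetical choices of mine. Numbered groups are counted by the position of their opening parenthesis, which is why adding parentheses shifts the index of the area code, while a named group can be read without counting.

```javascript
const sample = "mobile (303)555-1212";

// Numbered groups: counted by where each opening parenthesis appears.
const numbered = /((mobile|office) )?\((\d{3})\)\d{3}-\d{4}/.exec(sample);
console.log(JSON.stringify(numbered[1])); // "mobile " (outer group, with the space)
console.log(numbered[2]);                 // mobile   (inner group)
console.log(numbered[3]);                 // 303      (area code)

// Named groups (ES2018): the area code is read by name, not by index.
// "label" and "area" are illustrative names, not from the original article.
const namedRe = /(?:(?<label>mobile|office) )?\((?<area>\d{3})\)\d{3}-\d{4}/;
const named = namedRe.exec(sample);
console.log(named.groups.label); // mobile
console.log(named.groups.area);  // 303

// The name still works when the optional prefix is absent.
const bare = namedRe.exec("(720)555-0100");
console.log(bare.groups.label); // undefined
console.log(bare.groups.area);  // 720
```

Named groups still receive a number as well, so `named[2]` in this sketch also holds the area code; the name just spares you from counting parentheses.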

3 Kinds of Parentheses: Are you a RegEx Master?
