Regex Refresher: Named Groups and Backreferences

by Amr Hesham, November 28th, 2021
Too Long; Didn't Read

Named groups let you give names to the groups in your regular expressions and refer to those groups later in the same regex. The feature first appeared in Python's re module; Microsoft developers then supported it in .NET with a different syntax, and Java has supported it since JDK 7. It is now supported in most modern programming languages, such as Ruby, PHP, and R. To learn more about the history of this feature and more details, I recommend checking regular-expressions.info.

Hi, I am Amr Hesham, a software engineer interested in Android development and compiler design. In this article, I will talk about a very useful feature, regex named groups and backreferences, with examples.


Regex named groups and backreferences were first introduced in Python's re module. Microsoft developers then supported them in .NET with a different syntax, and Java has supported them since JDK 7. They are now supported in most modern programming languages, such as Ruby, PHP, and R.


This feature lets you give names to the groups in your regular expressions and refer to those groups later in the regex. To learn more about its history and details, I recommend checking the regular-expressions.info tutorials written by Jan Goyvaerts.


Let’s start with examples of how to define a named group and get a group by name and by index. I will use the Kotlin programming language. The concepts are the same elsewhere but, as I said before, some non-JVM languages use a different syntax for the same feature.


Suppose we want to parse the color attributes in our Android project's color.xml file and print each color's name and value. For example, here we have 3 colors:


<color name="black">#000000</color>
<color name="white">#ffffff</color>
<color name="grey">#cccccc</color>


And we want to print


black #000000
white #ffffff
grey #cccccc


We can do this task using many different techniques, but I will show you how to do it using regex named groups easily.

First, we need a regex that matches a whole attribute. Each attribute contains a type, a name, and a value, like this:


<type name="attribute_name">value</type>


So the plain regex, without any groups, will be


<\\w+ name=\"\\w+\">.+</\\w+>


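You can use Regex101.com to test your regex easily and understand it, but make sure you select the Java 8 flavor. Write single backslashes there; the doubled backslashes shown in this article are only Kotlin string escapes.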


Now that we have created our regex, we need to group the pieces of information that we want to extract.


What we need is the attribute name and value, so we just wrap their sub-patterns in ( ) like this


<\\w+ name=\"(\\w+)\">(.+)</\\w+>


Now the attribute name will be in group number 1 and the attribute value in group number 2, because group number 0 always contains the full text matched by the whole regex.


To extract the information, we first compile this pattern with java.util.regex.Pattern


val attributePattern = "<\\w+ name=\"(\\w+)\">(.+)</\\w+>"
val pattern = Pattern.compile(attributePattern)


Then we find every substring that matches our pattern and read groups 1 and 2 from each match


val matcher = pattern.matcher(text) 
while (matcher.find()) {
    val attributeName = matcher.group(1)
    val attributeValue = matcher.group(2)
    println("$attributeName $attributeValue")
}


That’s it! The output will be exactly what we want.
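For reference, the snippets above can be assembled into one runnable file. This is only a sketch: the function name parseColorsByIndex and the inline sample text are assumptions made for illustration, not part of the original project.

```kotlin
import java.util.regex.Pattern

// Hypothetical helper: returns "name value" for every <type name="...">value</type> entry.
fun parseColorsByIndex(text: String): List<String> {
    val attributePattern = "<\\w+ name=\"(\\w+)\">(.+)</\\w+>"
    val matcher = Pattern.compile(attributePattern).matcher(text)
    val result = mutableListOf<String>()
    while (matcher.find()) {
        // group(0) is the whole match, group(1) the name, group(2) the value
        result.add("${matcher.group(1)} ${matcher.group(2)}")
    }
    return result
}

fun main() {
    // Sample input copied from the three colors shown earlier
    val text = """
        <color name="black">#000000</color>
        <color name="white">#ffffff</color>
        <color name="grey">#cccccc</color>
    """.trimIndent()
    parseColorsByIndex(text).forEach(::println)
}
```

Because `.` does not match line breaks by default, each `<color>` line is matched on its own even though `.+` is greedy.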


To use grouping by name, all you need to do is give each group a name: just add ?<NAME> right after the opening parenthesis of the group, for example


<\\w+ name=\"(?<KEY>\\w+)\">(?<VALUE>.+)</\\w+>


The group name must be an alphanumeric sequence starting with a letter and you can’t name two groups with the same name.
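A quick way to see these naming rules in action is to try compiling a few patterns. The sketch below assumes the standard JVM java.util.regex engine; the helper name compiles is made up for this illustration.

```kotlin
import java.util.regex.Pattern
import java.util.regex.PatternSyntaxException

// Hypothetical helper: true if java.util.regex accepts the pattern.
fun compiles(regex: String): Boolean =
    try {
        Pattern.compile(regex)
        true
    } catch (e: PatternSyntaxException) {
        false
    }

fun main() {
    println(compiles("(?<KEY>\\w+)=(?<VALUE>\\w+)")) // two distinct, valid names
    println(compiles("(?<KEY>\\w+)=(?<KEY>\\w+)"))   // the same name used twice
    println(compiles("(?<1KEY>\\w+)"))               // name starts with a digit
}
```

On the JVM, the first pattern compiles and the other two are rejected with a PatternSyntaxException, so the program prints true, false, false.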


Now, instead of getting groups by index like 1 and 2, we will use the names KEY and VALUE:


while (matcher.find()) {
    val attributeName = matcher.group("KEY")
    val attributeValue = matcher.group("VALUE")
    println("$attributeName $attributeValue")
}

And you will get the same output :D.
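Here is a self-contained sketch of the named-group version. The function name parseColorsByName and the choice to return a map are assumptions for illustration. It also shows that named groups keep their numeric index.

```kotlin
import java.util.regex.Pattern

// Hypothetical helper: maps each attribute name to its value using named groups.
fun parseColorsByName(text: String): Map<String, String> {
    val pattern = Pattern.compile("<\\w+ name=\"(?<KEY>\\w+)\">(?<VALUE>.+)</\\w+>")
    val matcher = pattern.matcher(text)
    val colors = linkedMapOf<String, String>()
    while (matcher.find()) {
        // Named groups are still numbered, so both lookups return the same text
        check(matcher.group(1) == matcher.group("KEY"))
        colors[matcher.group("KEY")] = matcher.group("VALUE")
    }
    return colors
}

fun main() {
    val text = """
        <color name="black">#000000</color>
        <color name="white">#ffffff</color>
    """.trimIndent()
    for ((name, value) in parseColorsByName(text)) {
        println("$name $value")
    }
}
```

Using names means the extraction code keeps working if you later add another group in front of KEY, which would shift the numeric indexes.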


Now that we have learned what named groups are and how to use them, it’s time for backreferences.


Basically, backreferences are used to match the same text that was previously matched by a group. For example, suppose we want to check whether a number consists of a single repeated digit, like 1, 22, 333, or 4444. How can we do this using regex?


To use a backreference, we first need to define a group. In this case our group will be one digit, (?<DIGIT>\d), which matches the first digit. Then we use a backreference to check that all the other digits are the same as the text matched by that first group. To do this you can refer to the group by name with \k<DIGIT> or by index with \1.


Our full regex will be "(?<DIGIT>\d)\k<DIGIT>*" or "(\d)\1*". This means we expect one digit, captured in the group DIGIT, followed by zero or more repetitions of the same digit that was matched by this group. The full code will look like this


import java.util.regex.Pattern

fun main() {
    // One digit captured as DIGIT, then zero or more repeats of that same digit
    val repeatedDigitRegex = "(?<DIGIT>\\d)\\k<DIGIT>*"
    val pattern = Pattern.compile(repeatedDigitRegex)
    // matches() succeeds only if the whole input matches the pattern
    println(pattern.matcher("1").matches())
    println(pattern.matcher("22").matches())
    println(pattern.matcher("333").matches())
    println(pattern.matcher("4444").matches())
    println(pattern.matcher("10").matches())
    println(pattern.matcher("21").matches())
    println(pattern.matcher("101").matches())
}


And the output will be


true
true
true
true
false
false
false
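The check above depends on matches(), which requires the whole input to fit the pattern. The sketch below uses the index-based form (\d)\1* and contrasts matches() with find(). The function names are made up for illustration.

```kotlin
import java.util.regex.Pattern

// Index-based equivalent of (?<DIGIT>\d)\k<DIGIT>*
val repeatedDigit: Pattern = Pattern.compile("(\\d)\\1*")

// Hypothetical helper: the whole string must be one digit repeated.
fun isSingleRepeatedDigit(s: String): Boolean = repeatedDigit.matcher(s).matches()

// Hypothetical helper: any digit anywhere in the string is enough.
fun containsDigitRun(s: String): Boolean = repeatedDigit.matcher(s).find()

fun main() {
    println(isSingleRepeatedDigit("333")) // true
    println(isSingleRepeatedDigit("101")) // false: the whole string is not one repeated digit
    println(containsDigitRun("101"))      // true: find() is satisfied by the leading "1"
}
```

If you use find() for validation, anchor the pattern with ^ and $ or switch to matches().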


There are many things you can build with this feature, for example checking that an HTML element is opened and closed by the same tag, or checking whether any piece of text is repeated.
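As a rough sketch of the HTML idea, the pattern below captures the opening tag name and requires the closing tag to repeat it. It assumes a single element with no attributes and no nesting, and the function name hasMatchingTags is made up for this example. Use a real parser for actual HTML.

```kotlin
import java.util.regex.Pattern

// Hypothetical helper: true if the text is one element whose closing tag
// repeats the name captured from its opening tag.
fun hasMatchingTags(html: String): Boolean {
    val pattern = Pattern.compile("<(?<TAG>\\w+)>.*</\\k<TAG>>")
    return pattern.matcher(html).matches()
}

fun main() {
    println(hasMatchingTags("<b>bold</b>"))    // true
    println(hasMatchingTags("<b>bold</i>"))    // false: closing tag differs
    println(hasMatchingTags("<h1>Title</h1>")) // true
}
```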


There is another useful method called replaceAll in the Matcher class. It replaces every matched substring with a given string, and that string can include the text matched by a group through a reference. For example, if in the last example we want to replace each run of repeated digits with just one of them, we need to replace each match with its first group.


So instead of using matches we will use replaceAll and pass the group reference by number, which is $1. The code will look like this.


println(pattern.matcher("2222").replaceAll("$1"))


And the printed output will be 2
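The replacement string can also refer to a group by name using ${NAME}. In Kotlin the dollar sign has to be escaped so that it is not treated as a string template. The following is a small sketch, and the function name collapseRepeatedDigits is an assumption for illustration.

```kotlin
import java.util.regex.Pattern

// Hypothetical helper: collapses every run of a repeated digit into one digit.
fun collapseRepeatedDigits(input: String): String {
    val pattern = Pattern.compile("(?<DIGIT>\\d)\\k<DIGIT>*")
    // "\$" keeps Kotlin from expanding a template, so the regex engine
    // receives the literal replacement text ${DIGIT}
    return pattern.matcher(input).replaceAll("\${DIGIT}")
}

fun main() {
    println(collapseRepeatedDigits("2222"))    // 2
    println(collapseRepeatedDigits("1122333")) // 123
    println(collapseRepeatedDigits("a00b"))    // a0b
}
```

The numeric form "$1" needs no escaping in Kotlin because a digit cannot start a template identifier.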


I hope you enjoyed this article and if you want to learn more about this topic there are some useful resources.



You can find me on: GitHub, LinkedIn, Twitter.


Enjoy Programming 😋.