Everybody talks about regular expressions, and everybody hates regular expressions, yet everybody ends up using them! So what is a regular expression? Umm, we need to go deeper. So yeah, let's dive into the building blocks of regex with a short intro.

Regular expression: A regular expression (or rational expression) is an object that describes a pattern of characters. It lets us search for specific patterns of text, and it also helps us match, locate, and manage text. Regular expressions look pretty complicated, but they are very powerful: you can create a regex for almost any pattern of text you can think of.

Building blocks of regular expressions: Metacharacters are the building blocks of regular expressions. Every character in a regex is understood to be either a metacharacter with a special meaning or a regular character with a literal meaning.

Reserved metacharacters: The metacharacters that are reserved and need to be escaped are:
. [ { ( ) \ ^ $ | ? * +
We will see an example of escaping later. Other common metacharacters are:

Caret (^): Matches the start of the string, and in multiline mode also matches immediately after each newline.
Example: ^\d{3} will match patterns like "456" in "456-112-112".

Dollar ($): Matches the end of the string or just before the newline at the end of the string, and in multiline mode also matches before a newline.
Example: \d{3}$ will match patterns like "112" in "456-112-112".

\d: Matches a single digit (0 to 9). The number of \d determines the number of digits our regex will match, i.e. \d means a single digit, \d\d means double digits, and so on.
Example:
\d = 1
\d\d = 12
\d\d\d will match 327, 123, 787 but not 1223, as there are 4 digits in "1223" and our regex is a match for 3 digits.
\d\d\d ≠ 473847 as it returns 3 digits but 473847 contains 6 digits.
\d\d\d ≠ cat because it will match only digits but cat contains letters.
(In these examples the pattern has to cover the whole string. A plain search would still find the 3 digits 122 inside 1223.)

\D: The reverse of \d. Matches any character except a digit.
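To make the anchors and the digit classes concrete, here is a minimal Python sketch using the standard re module. The two-line string "456-112\n789-000" and the variable names are assumptions made up for illustration; everything else reuses the examples above.

```python
import re

text = "456-112-112"

# ^\d{3} ties the three digits to the start of the string.
start = re.search(r"^\d{3}", text).group()

# \d{3}$ ties the three digits to the end of the string.
end = re.search(r"\d{3}$", text).group()

# With re.MULTILINE, ^ also matches right after each newline.
# (The two-line string is a made-up example.)
line_starts = re.findall(r"^\d{3}", "456-112\n789-000", flags=re.MULTILINE)

# fullmatch needs the pattern to cover the whole string ...
whole_1223 = re.fullmatch(r"\d\d\d", "1223")
# ... while search is satisfied with the first three digits it finds.
part_1223 = re.search(r"\d\d\d", "1223").group()

# \D is the opposite of \d: it accepts letters and rejects digits.
letters_ok = re.fullmatch(r"\D\D", "AB") is not None
digits_ok = re.fullmatch(r"\D\D", "12") is not None

print(start, end, line_starts, whole_1223, part_1223, letters_ok, digits_ok)
```

The difference between fullmatch and search is the reason \d\d\d is said not to match 1223: it cannot cover the whole string, even though it finds three digits inside it.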
Example:
\D\D = AB
\D\D = xy
\D\D ≠ 12 as it won't match numeric characters.

\w: Matches any alphanumeric (word) character: letters, digits, and the underscore.
Example:
\w\w\w = 467
\w\w\w\w = Crow
\w\w\w ≠ python: \w\w\w doesn't return python because python contains 6 characters.

\W: Just as \D is the reverse of \d, \W is the reverse of \w, i.e. it matches anything but alphanumeric characters.
Example:
\W\W = ,, or !! or @#
\W\W\W = !@#
\W\W\W\W ≠ Titanic2 as every character is alphanumeric.

\s: Matches any whitespace character, such as a space or a tab. For example, in a text made of two words, \s will match only the space between the two words and ignore everything else.

\S: Matches any non-whitespace character, the opposite of \s.

Repeaters (*, + and { }): *, + and { } are called repeaters, as they denote how many times the preceding character may be repeated.

Asterisk symbol (*): The asterisk matches when the preceding character occurs 0 or more times, i.e. it tells the computer to match the preceding character (or set of characters) 0 or more times (with no upper limit).
Example:
Gre*n = Green (e is found 2 times), Grn (e is found 0 times), Greeeeen (e is found 5 times) and so on.
tre* ≠ trees as there is an "s" following the "ee".

Plus symbol (+): The + sign matches when the preceding character occurs at least once (with no upper limit).
Example:
Gre+n = Green, Greeeen, Gren and so on.
Gre+n ≠ Grn as "e" is absent here.

Dot (.): The period matches any single character, alphanumeric or symbol (in most regex flavors it skips only the newline by default). Interestingly, it can take the place of any other symbol, and for that reason it is called the wildcard.
Example:
Gre. = Gree, Gren, Gre1 and so on.
Gre. ≠ Green as . by itself will only match a single character, here in the 4th position of the term. n is the 5th character and is not accounted for in the regex.
Gre.* will match Green, as it tells the regex to match any character any number of times.

Alternation (|): Allows for alternate matches. | works like the Boolean OR.
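The repeaters, the wildcard and alternation can be tried out with a few lines of Python. This is only a sketch: the helper name matches, the extra words "Gran" and "C", and the pattern cat|dog are assumptions added for illustration, and the helper uses re.fullmatch so that a pattern has to cover the whole word.

```python
import re

def matches(pattern, word):
    """Return True when the pattern covers the whole word (hypothetical helper)."""
    return re.fullmatch(pattern, word) is not None

# * : the preceding "e" may appear 0 or more times.
star = [w for w in ["Grn", "Green", "Greeeeen", "Gran"] if matches(r"Gre*n", w)]

# + : the preceding "e" must appear at least once.
plus = [w for w in ["Grn", "Gren", "Green"] if matches(r"Gre+n", w)]

# . : exactly one character of any kind.
dot = [w for w in ["Gree", "Gre1", "Green"] if matches(r"Gre.", w)]

# .* : any character, any number of times.
dot_star = matches(r"Gre.*", "Green")

# | : either side of the bar is accepted, like a Boolean OR.
either = [w for w in ["cat", "dog", "C"] if matches(r"cat|dog", w)]

print(star, plus, dot, dot_star, either)
```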
Example:
A|B creates a regular expression that will match either A or B
H(i!|ey!) will match either Hi! or Hey!
M(s|r|rs)\.?\s[A-Z]\w+ will match any name starting with Ms, Mr or Mrs.

Question mark (?): Matches when the character preceding ? occurs 0 or 1 time only, making the character optional.
Example:
Favou?rite = Favourite (u is found 1 time)
Favou?rite = Favorite (u is found 0 times)

Character set ([]): [] is used to indicate a set of characters. In a set:
Characters can be listed individually, e.g. [cat] will match 'c', 'a', or 't'.
Ranges of characters can be indicated by giving two characters and separating them by a '-'. Example: [A-Z] will match any uppercase ASCII letter, [0-9] will match any digit from 0 to 9, [0-3][0-3] will match two-digit numbers in which each digit is between 0 and 3, such as 00, 13 or 33 (but not 04 or 29), and [0-9A-Fa-f] will match any hexadecimal digit.
If - is escaped (e.g. [A\-Z]) or if it's placed as the first or last character (e.g. [A-]), it will match a literal '-'.
The order of the characters does not matter.
Special characters lose their special meaning inside sets. For example, [(+*)] will match any of the literal characters '(', '+', '*', or ')'.
To match a literal ']' inside a set, precede it with a backslash, or place it at the beginning of the set. For example, in Python both [()[\]{}] and []()[{}] will match a right bracket, as well as a left bracket, braces, and parentheses.

Character group (): A character group is indicated by () and matches the characters in the exact order.
Example:
(abc) = abc, not acb
(123) = 123, not 321
https?://(www\.)?(\w+)(\.\w+) will match simple URLs such as https://www.google.com. There are 3 groups here.
1st group: the optional www.
2nd group: the domain name, e.g. google, facebook
3rd group: the top-level domain, e.g. .com, .net, .org
There is another implicit group, group 0. Group 0 is everything that we captured, in our case the entire URL.

Quantifiers: Regex uses quantifiers to indicate the scope of a search string. We can use multiple quantifiers in our search string.
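Before getting to the individual quantifiers, here is a minimal Python sketch of the optional marker, character sets and groups described above. The sample strings ("Favouurite", "f(x) = a+b*c", "Mrs. Smith" and the sentence around the URL) are assumptions made up for illustration.

```python
import re

# ? makes the "u" optional, so both spellings match, but not a doubled "u".
spellings = [w for w in ["Favourite", "Favorite", "Favouurite"]
             if re.fullmatch(r"Favou?rite", w)]

# [0-3][0-3]: each of the two digits must be between 0 and 3.
pairs = [s for s in ["00", "13", "33", "04", "29"]
         if re.fullmatch(r"[0-3][0-3]", s)]

# Inside a set, ( + * ) are plain literal characters.
literals = re.findall(r"[(+*)]", "f(x) = a+b*c")

# Alternation inside a group, followed by an optional dot.
title = re.fullmatch(r"M(s|r|rs)\.?\s[A-Z]\w+", "Mrs. Smith") is not None

# Groups: 1 = optional "www.", 2 = domain name, 3 = top-level domain.
url = re.search(r"https?://(www\.)?(\w+)(\.\w+)",
                "visit https://www.google.com today")
groups = (url.group(0), url.group(1), url.group(2), url.group(3))

print(spellings, pairs, literals, title, groups)
```

Group 0 holds the whole match, while groups 1 to 3 hold the pieces captured by each pair of parentheses, in the order the parentheses open.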
The quantifiers are:

{n}: Matches when the preceding character, or character group, occurs exactly n times.
Example:
\d{3} = 123
pand[ora]{2} = pandar, pandoo
pand[ora]{2} ≠ pandora as the quantifier {2} only allows for 2 letters from the character set [ora].

{n,m}: Matches when the preceding character, or character group, occurs at least n times and at most m times.
Example:
\d{2,6} = 430973, 4303, 38238
\d{2,6} ≠ 3: 3 does not match because it is 1 digit, so it is outside of the allowed range.

Escaping metacharacters: To search for a character that is a reserved metacharacter (any of . [ { ( ) \ ^ $ | ? * +), we can use the backslash \ to escape the character so it is recognized as a literal.
Example: The regex below will match most common email addresses. Here we've used \ to escape the reserved characters.
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
It is a simplification, so some valid addresses (for example, ones containing a + or ending in a top-level domain longer than 5 letters) are not matched.

Congratulations! Now you know the very basics of regex, and that's already plenty for one day! In my upcoming article we will practice regex with Python.
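As a small preview of that Python practice, here is a minimal sketch of the quantifiers and of escaping, using the standard re module. The addresses jane.doe@example.com and jane+news@example.com, the strings "3x14" and "1+1=2?", and the variable names are hypothetical, chosen only for illustration.

```python
import re

# {n}: exactly n characters from the preceding set.
exact = [w for w in ["pandar", "pandoo", "pandora"]
         if re.fullmatch(r"pand[ora]{2}", w)]

# {n,m}: between n and m repetitions (no space after the comma).
ranged = [s for s in ["3", "4303", "430973", "1234567"]
          if re.fullmatch(r"\d{2,6}", s)]

# An unescaped dot matches any character; an escaped dot matches only ".".
loose = re.fullmatch(r"3.14", "3x14") is not None
strict = re.fullmatch(r"3\.14", "3x14") is not None

# re.escape adds the backslashes for us, so the text matches itself literally.
literal_ok = re.fullmatch(re.escape("1+1=2?"), "1+1=2?") is not None

# The email pattern from the article: three groups split by "@" and the last ".".
EMAIL = re.compile(r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$")
parts = EMAIL.match("jane.doe@example.com").groups()

# A "+" is not in the first character set, so this address is rejected.
plus_address = EMAIL.match("jane+news@example.com")

print(exact, ranged, loose, strict, literal_ok, parts, plus_address)
```

The rejected plus address shows why the pattern is best treated as a teaching example and not as a complete email validator.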