Everything You Need to Know About Text String Manipulation

by @tom2


Rutkat

Front End Engineer + Blockchain Advocate

For those new to coding or even experienced coders, this guide details how to manipulate text strings, just like the pros. It is useful if you haven't worked with strings or user-facing web applications. You will quickly go from beginner to expert using JavaScript, built-in methods, and powerful regular expressions.

Have you wondered how censoring words on the internet occurs? Perhaps you want to know why your username on apps has to conform to specific rules? This is done through string manipulation using code such as JavaScript. A string is just a specific name used to label a piece of data that contains text, and it can consist of letters, numbers, symbols, and spaces.

Why is it important? Every software application with a presentation layer (web app) applies a form of string manipulation, and it is the foundation of algorithms. Think about how it applies to business ideas as well. Grammarly is an excellent example of a business that is all about string manipulation.

Text And Strings

The first thing to consider is how to engage text manipulation from a visual perspective. For example, if you're a non-coder or just a human being, you know you can write text on paper, on your smartphone, computer, and even rice. Okay, maybe not rice. The writing can occur from left-to-right, top-to-bottom, right-handed, left-handed, etc. Afterward, you can manipulate what you wrote with an eraser, scratching it out, or tapping the backspace key.

From a coder's perspective, it doesn't work the same way, except when writing the actual code. The code instructions for manipulating strings have restrictions and specific methods. You will learn these methods here but let's start with a visual approach to envision how code will do the magical transformations.

Direction

Like writing, strings can be manipulated from left-to-right and right-to-left. The length of a string can be as little as a single space to pages of text, but most commonly in code, a string will not be longer than a sentence. A string can be a username, phone number, a snippet of code, a poem etc. When working with a specific coding language, there are built-in methods to use, or you can create your own custom method. A combination of these methods can manipulate text to do virtually whatever you want. You can become a string master with the force of practice.

Besides processing a string from left-to-right or right-to-left, it can be broken down and manipulated to individual characters using the number representing the position of any character. This is known as the index value of the string. For example, the string "Hello!" contains 6 characters, so your code can directly access any letter by indicating a corresponding index number.

"Hello!"
 123456 (number represents position)

Traversing

Several coding methods will process the string in this ascending numerical order. However, since computers count from zero, the first item position is always 0. To be more accurate, I should state that the computer is traversing, not processing, strings. The difference is that "processing" indicates an effect happens, whereas "traversing" indicates a passage or travel across something. When dealing with code instructions, you should be conscious of the computing resources utilized: you may not need to process every character in a string, but rather traverse to the individual character you need to change.

For example, say your objective is to remove punctuation. You have several approaches to remove the "!" from "Hello!". You can use a method to find the position of "!" or you can access the last character of the string. These approaches include getting the length of the string, getting the index of "!", or traversing the string in reverse. If you use the length property, you have to remember to subtract 1 since counting starts at zero. Also, spaces count as part of the string and have an index position, thus increasing the length of the string.

The INDEX number represents the position of a character in a string.

"Hello!"
 012345 character positions

"Hello!".length - 1
Length is a property of a string.

Here are methods to get the position of a character in a string:

"Hello!".indexOf("!") 
Find the first position of a character searching from left-to-right.

"Hello!".lastIndexOf("!") 
Find the last position of a character searching from right-to-left.

"Hello!".length - 1
Find the position of the last character in a string.

All give 5 as the result. You can do the opposite with the charAt() method which returns the character from a string specified by the position.

"Hello!".charAt(5)
Result is "!"
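The approaches described above for removing the "!" from "Hello!" can be sketched roughly as follows. The helper names removeLastChar and removeFirstOccurrence are hypothetical, made up for this illustration, and the sketch leans on the slice method that is covered later in this guide.

```javascript
// Hypothetical helper: drop the final character of a string.
function removeLastChar(text) {
  // length - 1 is the index of the last character; slice stops just before it
  return text.slice(0, text.length - 1);
}

// Hypothetical helper: drop the first occurrence of a given character.
function removeFirstOccurrence(text, character) {
  const position = text.indexOf(character);
  if (position === -1) {
    return text; // character not found, nothing to remove
  }
  // Keep everything before the position and everything after it
  return text.slice(0, position) + text.slice(position + 1);
}

console.log(removeLastChar("Hello!"));             // "Hello"
console.log(removeFirstOccurrence("Hello!", "!")); // "Hello"
```

The first helper never looks at the characters it keeps, which is the "traverse, don't process" idea from earlier.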

One Character

Now you know the basics of traversing a string one character at a time, which are from the left, from the right, and from the end using index numbers. However, not all methods return the position of the character you seek. You may prefer a result as a boolean data type instead. Meaning your search is a test that returns true or false.

Boolean test methods: includes, startsWith, endsWith.

"Hello!".includes("!")
Returns true

"Hello!".startsWith("!")
Returns false

"Hello!".endsWith("!")
Returns true
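As a small illustration of where true/false checks fit, here is a sketch of a username rule check. The function name isValidHandle and the three rules are assumptions invented for this example, not rules from any real app.

```javascript
// Hypothetical rule set: a handle must start with "@", must not contain
// spaces, and must not end with a period.
function isValidHandle(handle) {
  return (
    handle.startsWith("@") &&
    !handle.includes(" ") &&
    !handle.endsWith(".")
  );
}

console.log(isValidHandle("@coder"));   // true
console.log(isValidHandle("coder"));    // false, missing the "@"
console.log(isValidHandle("@co der"));  // false, contains a space
console.log(isValidHandle("@coder."));  // false, ends with a period
```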

These character checks are not as useful as finding the position of a character because you cannot proceed with your algorithm if your purpose is to modify the string with the same search query. Besides, there are more powerful methods for true/false checks, which will be described later. Up to this point we have learned to traverse a string left-to-right and right-to-left, so what's the next step? Modification!

We can use several built-in methods or create our own for changing the text in a string. Let's start with the methods which don't require indicating a search query or index position. Since humans care more about uppercase and lowercase letters than computers, we can instantly transform an entire string using these two methods:

"Hello!".toUpperCase()
Result "HELLO!"

"Hello!".toLowerCase()
Result "hello!"

If you have seen a camel, then you know they have humps, and in programming, when code LooksLikeThis, it is called camel case. This is because it has humps and no spaces. (When the first letter is also capitalized, as it is here, the style is known as upper camel case or Pascal case.) You will have to traverse and recognize this type someday. We normalize letter case to make text easier to read for humans, because who likes to read "a sEnTEnCe liKE ThiS!?" Case conversion is also useful for web apps like blogs, which take an article title and create a URL segment known as a slug.

Example:
Article name "Mastering String Manipulation"
Slug url "domain.com/mastering-string-manipulation/"
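The title-to-slug conversion above could be sketched like this. The function name slugify is hypothetical, and the sketch assumes words are separated by spaces only; it uses the built-in split and join methods, which are not otherwise covered in this article.

```javascript
// Hypothetical helper: turn an article title into a URL slug.
function slugify(title) {
  return title
    .trim()                            // drop leading and trailing spaces
    .toLowerCase()                     // slugs are conventionally lowercase
    .split(" ")                        // break the title into words
    .filter((word) => word.length > 0) // ignore empty pieces from repeated spaces
    .join("-");                        // rejoin the words with dashes
}

console.log(slugify("Mastering String Manipulation"));
// "mastering-string-manipulation"
```

A production slug function would likely also strip punctuation, which is a job for the regular expressions described later.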

Since there are multiple methods to get the same result, let's begin with this example of combining strings into one. This is known as concatenation. You can use the + symbol or the concat method. Please note that JavaScript does not automatically enforce data types, so you should ensure that the values are strings, as opposed to arrays or booleans, when using +. That topic deserves an entire article of its own. With the lack of data type enforcement, erroneous output can occur as a result of type coercion, meaning the + sign can silently convert a number to a string.

"Hello" + "World"
Result "HelloWorld"

"Hello".concat("World")
Result "HelloWorld"

"12" + 12
Result "1212", not 24.
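One way to avoid the coercion surprise is to convert values explicitly before using +. The sketch below uses the built-in Number and String functions; the variable names are made up for the example.

```javascript
const quantityFromForm = "12"; // values typed into a form usually arrive as strings
const extra = 12;

// Without conversion, + concatenates because one operand is a string
console.log(quantityFromForm + extra);         // "1212"

// Convert explicitly when arithmetic is intended
console.log(Number(quantityFromForm) + extra); // 24

// Convert explicitly when concatenation is intended
console.log(quantityFromForm + String(extra)); // "1212"
```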

The newest way to concatenate strings is using template literals, which utilize the back-tick symbol ` and curly braces {} after the $ symbol. Yes, using those three symbols is required. You will see this in emails as well as websites to customize the writing output based on the user's information.

var myString = "Hello"
var string2 = "World"
console.log(`${myString} ${string2}`)
Result "Hello World"
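To illustrate the kind of personalized output mentioned above, here is a rough sketch. The function name greet and the message wording are invented for this example.

```javascript
// Hypothetical helper: build a personalized notification line.
function greet(firstName, unreadCount) {
  // Any expression can go inside ${}, including a conditional
  const plural = unreadCount === 1 ? "" : "s";
  return `Hi ${firstName}, you have ${unreadCount} new message${plural}.`;
}

console.log(greet("Ada", 1)); // "Hi Ada, you have 1 new message."
console.log(greet("Ada", 3)); // "Hi Ada, you have 3 new messages."
```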

Previously I stated that empty spaces count towards the length of a string. In other words, they occupy a space in a string and can be manipulated as well. Since we want to be efficient in saving data as well as making text easy to read, we want to prevent unnecessary blank space and this can be done with the trim method.

It removes empty spaces at the beginning and end of a string but not in the middle. If you want to remove empty space in the middle of a string, you have to utilize a more powerful method known as a "regular expression," which will be described later.

"  Hello World.  ".trim()
Result "Hello World."

There are methods to do the opposite. You can pad a string at the end or beginning with any character. Let's say your web app deals with sensitive information like credit cards, or you have ID numbers that have to conform to a specific length. You can use the padStart and padEnd methods for this. The first argument is the total length of the resulting string, not the number of characters to add. For example, a credit card number is saved in the app, but you only want to show the last four digits prefixed with the * symbol.

"4444".padStart(8, "*")
Result "****4444"

"1234".padStart(8, "0")
Result "00001234"
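The credit card scenario could be sketched as below. The function name maskCardNumber and the sample number are made up for illustration, and the sketch assumes the card number is already a string with no spaces.

```javascript
// Hypothetical helper: show only the last four characters of a card number.
function maskCardNumber(cardNumber) {
  const lastFour = cardNumber.slice(-4);
  // Pad back out to the original length using "*"
  return lastFour.padStart(cardNumber.length, "*");
}

console.log(maskCardNumber("4111111111114444")); // "************4444"
```

Passing the original length as the first argument keeps the masked value the same width as the real one.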

Besides concatenating strings, you can also repeat them with a multiplier. It's uncommon to repeat text, so the method will be more useful for symbols such as periods. For example, when you need to truncate a string and indicate to the reader that the string continues, you can use an ellipsis like this... It could also be useful for songs where lyrics are repeated. In practice, it's rare to see this method in code.

"Hello-".repeat(3)
Result "Hello-Hello-Hello-"
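The truncation idea mentioned above might look like this in practice. The function name truncate is hypothetical, and it borrows the slice method covered in the next section.

```javascript
// Hypothetical helper: shorten text and mark the cut with three periods.
function truncate(text, maxLength) {
  if (text.length <= maxLength) {
    return text; // short enough, leave untouched
  }
  return text.slice(0, maxLength) + ".".repeat(3);
}

console.log(truncate("Hello World", 5)); // "Hello..."
console.log(truncate("Hi", 5));          // "Hi"
```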

Pizza Slice

Let's expand our character searches!

Using the previous search methods, we are only able to retrieve one character at a time from a string. What if we want to select a word or a section of a string using an index range? Well, we can do that by slicing a pizza and eating the slice we want. Almost! The string method is called slice, so a pizza slice is a good metaphor. For this, you pass in the start position and, optionally, the end position of the section you want. The character at the end position is not included, and if you leave the end out, the slice runs to the end of the string. The start position can be a negative number, which counts backward from the end of the string. You may think, wouldn't it be easier to just match a word inside a string? Well, yes, but in some cases, coders may not be able to predict what strings they will encounter, or the string will be a pre-determined length.

"Hello World".slice(6)
Result "World"

"Hello World".slice(6, 8)
Result "Wo"

"Hello World".slice(-3)
Result "rld"
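Slicing by position is handy when a string has a pre-determined layout. As a sketch, the made-up sample below assumes a date that is always written as ten characters in year-month-day order.

```javascript
const isoDate = "2021-03-15"; // made-up sample value, always 10 characters

const year = isoDate.slice(0, 4);  // characters at index 0 through 3
const month = isoDate.slice(5, 7); // characters at index 5 and 6
const day = isoDate.slice(-2);     // the last two characters

console.log(year, month, day); // 2021 03 15
```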

Up to this point, you have learned to traverse strings from the left and from the right, get character positions, do boolean tests, transform character cases, concatenate strings, remove empty space, pad, repeat strings, and extract substrings. How about we learn how to revise our strings with the replace method? Scenarios for this can be removing explicit words, swapping the first name with the last name, or swapping "-" for an empty space " ".

The difference with the replace method compared to the previous methods in this article is that replace accepts both strings and regular expressions as search queries. It also accepts a function as a second parameter, but we won't go into custom functions at this time. With replace, you don't need to rely on index positions, but you need to be familiar with regular expressions (regexp for short) because that is how you can replace multiple instances of the search query. Note the usage of a regular expression, with forward slashes surrounding the search term.

"Very bad word".replace("bad", "good")
Result "Very good word"

"Very bad bad word".replace("bad", "good")
Result "Very good bad word"

"Very bad bad word".replace(/bad/, "good")
Result "Very good bad word"

"Very bad bad word".replace(/bad/g, "good")
Result "Very good good word"
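The scenarios listed earlier (swapping dashes for spaces, swapping names, hiding words) can be sketched with replace as follows. The sample strings are made up, and the $1 and $2 placeholders are a built-in feature of replace that refers back to parenthesized groups, which are explained further below.

```javascript
// Swap every dash for a space; the g flag replaces all instances
const title = "mastering-string-manipulation".replace(/-/g, " ");
console.log(title); // "mastering string manipulation"

// Swap first and last name using two groups, referenced as $1 and $2
const swapped = "Ada Lovelace".replace(/(\w+) (\w+)/, "$2 $1");
console.log(swapped); // "Lovelace Ada"

// Hide an unwanted word wherever it appears
const censored = "Very bad bad word".replace(/bad/g, "***");
console.log(censored); // "Very *** *** word"
```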

Cryptic Patterns

Are you beginning to feel the power of string manipulation? You are slowly becoming an expert. A regexp is denoted by a forward slash / on either side of the search word, and the letter g after the second slash indicates a global search, which will replace multiple instances of the word inside the string. Generally, when searching for one instance of a word, it's better to use indexOf() and replace() with a plain string for faster execution speed.

Otherwise, to understand regular expressions, you have to memorize the symbols on your keyboard: many symbols, including letter cases. In fact, there's nothing regular about "regular expressions". They should be called "cryptic patterns" because no human being can read them without looking up the meaning of the symbols used. In plain language, you can also say they are string-searching algorithms.

Magic Wand

Before I show you some of the characters used, I would like to paint you a picture of the traversing that happens using regexp. First, imagine a magic wand in your hand. Waving the magic wand releases magical stars onto the string which modify it to the desired string you want. Each star represents a symbol in the regular expression, and that is what you have to come up with as a search pattern.

Regular expressions are truly powerful search techniques. You can find a needle in a haystack instantly. Many input forms on the web use regular expressions to validate or convert text into specific formats such as zip codes, phone numbers, domain names, currency values, and the list can go on. Do note that there are different regular expression engines depending on the programming language, and the following is specific to JavaScript.

/term/
A regexp literal always has to be contained inside two forward slashes. "A/B/C" is not a regexp. Many characters and symbols between the slashes represent something other than the symbol itself.

/abc/
Plain alphabetical characters without symbols behave like a regular search string: the letters must appear consecutively in that order.

/\$/
An explicit search for a symbol has to be prefixed with a backward slash \, in this case, it's the dollar symbol. It's called escaping even though none of them will run away. The symbols still need to escape from the wrath of your cryptic search desires.

/^abc/ and /abc$/
These symbols don't have to be escaped. They are the caret ^ and dollar sign $. Their purpose is to restrict the search to the beginning and end of a string, respectively. This is also known as anchoring, so they can be called anchors. In this case, it means if "abc" is in the middle of "xyzabczyx", it will be ignored. ^ means the string must start with "abc" and $ means that the string must end with "abc". You can apply one or both.

What if you don't want to search for an alphabetical character or a symbol, but a formatting change in the string? Since I mentioned an empty space has meaning in code, so does a tab, a new line, and a carriage return. These can be searched using a combination of the backslash and one letter. For brevity, we've excluded the surrounding slashes.

\n find a newline
\t find a tab
\r find a carriage return

This is mind-blowing, right? You can manipulate empty space and look for invisible metacharacters which control formatting using regexp. Let's try a regexp example based on what we know so far. We want to match a specific dollar amount at the beginning of a string, $10.xx, with any cent amount.

/^\$10\.\d\d/

We are using:
- ^ to match the start of the string
- a backslash \ to escape the dollar sign $
- the number 10 followed by an escaped period \.
- \d, which represents any digit 0-9, so we have it twice
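To try the pattern out, one option is the built-in test method of a regexp, which returns true or false. The sample strings below are made up for illustration.

```javascript
const startsWithTenDollars = /^\$10\.\d\d/;

console.log(startsWithTenDollars.test("$10.99 per month")); // true
console.log(startsWithTenDollars.test("$10.5 per month"));  // false, only one digit after the period
console.log(startsWithTenDollars.test("Only $10.99"));      // false, the amount is not at the start
console.log(startsWithTenDollars.test("$100.99"));          // false, a "0" sits where the period is required
```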

As previously mentioned, adding a backslash to a letter changes the search pattern. Here are some search patterns with the backslash and letter combination.

\w matches any word character (a letter, digit, or underscore)
\d matches any digit
\s matches any whitespace (space, tab, or newline)

In addition to that, you can match the opposite with the capital letter equivalents.

\W matches any character that is not a word character
\D matches any character that is not a digit
\S matches any character that is not whitespace
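Earlier, trim was shown to leave empty space in the middle of a string alone. A sketch of handling that with these patterns follows; the sample strings are made up, and the + (one or more) quantifier and g flag it relies on are explained in the sections below.

```javascript
const messy = "Hello \t  World.\n";

// \s+ matches each run of whitespace; collapse each run to one space, then trim the ends
const tidy = messy.replace(/\s+/g, " ").trim();
console.log(tidy); // "Hello World."

// \D matches every non-digit, so removing those leaves only the digits
const digitsOnly = "(555) 010-9999".replace(/\D/g, "");
console.log(digitsOnly); // "5550109999"
```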

Globally Insensitive

Now that you are getting more comfortable with the possibilities of regular expressions, you need to be aware of the letters "g" and "i" at the end of the regexp term, right after the second forward slash. These are known as flags that modify your search. The "g" means global, so it will return more than one match if available, while the "i" means case-insensitive. Uppercase or lowercase will not matter using this flag.

/term/g Finds multiple instances, not just the first
/term/i Finds uppercase and lowercase characters
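The effect of each flag can be seen side by side in this sketch. It uses a made-up sentence and the built-in match method, which returns an array of every match when the g flag is present.

```javascript
const sentence = "Bad habits are bad, BAD news.";

// i alone: case no longer matters, but only the first match is replaced
console.log(sentence.replace(/bad/i, "good")); // "good habits are bad, BAD news."

// g alone: every match, but only the exact lowercase spelling
console.log(sentence.match(/bad/g));  // ["bad"]

// g and i combined: every match in any letter case
console.log(sentence.match(/bad/gi)); // ["Bad", "bad", "BAD"]
```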

To expand on your searches, here's the next addition of complexity. You may want to find a combination of letters, numbers, or symbols. You can do this by grouping inside parentheses () and brackets []. The brackets define a set of characters and can hold ranges such as 0-9, uppercase A-Z, or lowercase a-z.

You can use multiple dashes for multiple ranges inside a single set of brackets. The parentheses are not useful alone, only when you have additional search terms in one regexp. To throw in a monkey wrench, the caret ^ symbol at the start of a bracket set will negate the search.

/[abc]/ Matches any one of the letters a, b, or c.
/[0-7]/ Matches any digit from 0 to 7 anywhere in the string.
/[^0-7]/ Matches any character that is not a digit from 0 to 7.

[0-9] is identical to \d for digits, while \w is identical to [A-Za-z0-9_] for word characters.

Using parentheses () is useful when you want to search for more than one pattern, such as international phone numbers, while brackets [] are for searching sets. When using parentheses in your search, you may also include the pipe symbol | as an OR operator. This means your result can be the search pattern on either side of the pipe. This is known as alternation. Here are examples:

/[abc](123)/ matches a, b, or c, followed by 123
/gr[ae]y/ matches gray or grey
/(gray|grey)/ matches gray or grey
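Here is a short sketch of sets and alternation in action. The sample strings and the honorific pattern are made up for illustration.

```javascript
// A bracket set: the third letter may be "a" or "e"
const colorWord = /gr[ae]y/g;
console.log("gray or grey, never groy".match(colorWord)); // ["gray", "grey"]

// Alternation inside a group: "Mr" or "Ms", then an escaped period and a space
const honorific = /^(Mr|Ms)\. /;
console.log(honorific.test("Ms. Hopper")); // true
console.log(honorific.test("Dr. Hopper")); // false
```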

Quantity to Match

Do you want to match a specific amount of letters or numbers? Perhaps 0 or 1, 1 or many, only 4. It's all possible with regular expression quantifiers. Here are the quantifier symbols and how you can use them. We will use the letter "a" as part of the example.

/a*/ Match 0 or more letter a
/a+/ Match 1 or more letter a
/a?/ Match 0 or 1 letter a
/a{4}/ Match exactly 4 consecutive letters a.
/a{2,3}/ Match between 2-3 letters a.
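Quantifiers are what make format checks like zip codes possible. The sketch below assumes a US-style zip code of five digits with an optional four-digit extension; the pattern is an illustration, not an official validation rule.

```javascript
// Five digits, optionally followed by a dash and four more digits
const zipCode = /^\d{5}(-\d{4})?$/;

console.log(zipCode.test("90210"));      // true
console.log(zipCode.test("90210-1234")); // true
console.log(zipCode.test("9021"));       // false, only four digits
console.log(zipCode.test("90210-12"));   // false, extension too short

// + matches a run of one or more, so a stretched word can be squeezed back
console.log("Yaaaay".replace(/a+/, "a")); // "Yay"
```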

The possibilities don't stop here. This is why algorithms utilize regular expressions regularly, so becoming an expert in them is going to take you a long way. In total, there are 11 metacharacters available for regular expressions, counting each pair of brackets as one.
They are:

\ ^ $ . | ? * + () [] {}

Each one has a purpose.

Another practical example is to find HTML tags because they are the foundation of websites. Let's think this through before typing out the expression. We need at least one letter because all tags start with a letter, and while it should be lowercase, we may encounter legacy HTML that is capitalized. Next, we shall expect more letters or a number, such as in h1 tags. While the * will match zero or more characters, we can limit the amount using {} instead. The following will capture opening HTML tags without attributes:

/<[A-Za-z][A-Za-z0-9]*>/g Matches html tags
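Running the pattern against a small made-up snippet shows what it does and does not pick up. Closing tags are skipped because a slash, not a letter, follows the opening angle bracket.

```javascript
const html = "<h1>Title</h1><p>Some <b>bold</b> text</p>";

const openingTags = html.match(/<[A-Za-z][A-Za-z0-9]*>/g);
console.log(openingTags); // ["<h1>", "<p>", "<b>"]
```

This is fine for quick searches, though a pattern this simple is only a sketch and will miss tags that carry attributes.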

Finally, there is another advanced concept, if regular expressions weren't advanced enough. It is called the lookahead. There's a positive and a negative lookahead. It must be placed inside parentheses and begin with a question mark ?. Essentially, a positive lookahead checks that the search pattern is followed by something without capturing that something, while a negative lookahead matches only when the pattern is not followed by something. This is useful when making a combined search pattern by grouping. To demonstrate, let's search for a dollar value in a string that is followed by "USD", but we don't want to capture the "USD". The positive lookahead is written using (?= and the negative lookahead using (?! .

/\$30(?=USD)/ Matches $30 from "The product costs $30USD"
/\$30(?!USD)/ Matches $30 from "The USD value is $30"
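The two lookahead patterns can be checked with a short sketch like this one, using the same sample sentences as above. The variable names are made up for the example.

```javascript
const followedByUsd = /\$30(?=USD)/;
const notFollowedByUsd = /\$30(?!USD)/;

// The match contains only "$30"; the "USD" was checked but not captured
console.log("The product costs $30USD".match(followedByUsd)[0]); // "$30"

console.log(notFollowedByUsd.test("The USD value is $30"));      // true
console.log(notFollowedByUsd.test("The product costs $30USD"));  // false
```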

Begin Your Journey

Now you have gone through the fundamentals of querying, matching, and modifying the data primitives of JavaScript known as strings. Just reading this won't give you the ability to use these methods. You must put them into practice through code editors and internet browsers. You can test the examples provided in this article for yourself, and you should retype them instead of copying and pasting. So go forth and build up your skills in coding with JavaScript.

Article Photo credit https://unsplash.com/@agni11
