
A self-taught programmer, and a technical writer who authors tutorials, how-to guides, and more to help developers

**Cryptographic Hash Functions** are a class of hash functions that are cryptographically secure. From password authentication and integrity verification to blockchain—these functions are used in a multitude of applications.

In this tutorial, we'll start by reviewing the basics of blockchain, and the relevance of cryptographic hash functions in making a blockchain secure. We'll then go over what cryptographic hash functions are, and their properties.

Additionally, we'll also see how to write code to obtain hashes, in both Python and Bash.

For all this and more, let's get started!

- Blockchain Basics Revisited
- What is a Cryptographic Hash Function?
- Properties of Cryptographic Hash Functions
- Is 256-Bit Security Really Secure?
- How to Compute SHA256 Sum in Bash
- How to Obtain SHA256 Hash in Python

Let's start our discussion by answering the question: *"What's a Blockchain?"*

A blockchain is an **immutable**, **distributed** ledger system. It's essentially a **decentralized** peer-to-peer network in which transactions can happen between peers *without* the involvement of a central authority.

Each block in a blockchain consists of the following:

- data/details of the transaction
- its hash
- hash of the previous block

The first block is called the **genesis block**, and it's the only block that doesn't contain the previous block's hash.

In a blockchain, the transactions are hashed using secure hashing algorithms. And here's where **cryptographic hash functions** enter the discussion.

Cryptographic hash functions are used to generate the hashes that uniquely identify the blocks.

Whenever the data in a particular block changes, its hash changes drastically. Because each block also stores the hash of the previous block, the hash of the modified block has to be recomputed, and so do the hashes of all subsequent blocks.

This makes it next to impossible to tamper with the contents of a specific block. In some sense, the hashes not only uniquely identify the block but also facilitate immutability of the blockchain.
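The chaining described above can be sketched in a few lines of Python. This is only a toy model under simplifying assumptions: the helper names (`block_hash`, `build_chain`, `is_valid`) and the transaction strings are made up for illustration, and a real blockchain hashes far more than a data string.

```python
import hashlib

def block_hash(data: str, prev_hash: str) -> str:
    """Hash a block's data together with the previous block's hash."""
    return hashlib.sha256((prev_hash + data).encode()).hexdigest()

def build_chain(transactions):
    """Build a toy chain: each block stores its data, its hash, and the previous hash."""
    chain = []
    prev_hash = ""  # the genesis block has no previous hash
    for data in transactions:
        current = block_hash(data, prev_hash)
        chain.append({"data": data, "hash": current, "prev_hash": prev_hash})
        prev_hash = current
    return chain

def is_valid(chain):
    """Recompute every hash and check that the links between blocks still hold."""
    prev_hash = ""
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["data"], block["prev_hash"]):
            return False
        prev_hash = block["hash"]
    return True

chain = build_chain(["A pays B 5", "B pays C 2", "C pays A 1"])
print(is_valid(chain))  # True

chain[1]["data"] = "B pays C 200"  # tamper with the middle block
print(is_valid(chain))  # False
```

To hide the change, an attacker would have to recompute the hash of the tampered block and of every block after it.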

In addition, blockchains use consensus mechanisms such as Proof of Work and Proof of Stake to ensure that the transactions are indeed authentic, and do not come from a malicious entity trying to tamper with the network.

Now that you've learned the relevance of cryptographic hash functions in blockchain, let's learn about them in greater detail in the subsequent sections.

A cryptographic hash function takes in an input message and maps it to a **fixed length** output, called the **hash** or the **digest**.

Whatever the length of the input (a single character, a string, or even a large file), the output always has a fixed length, say **N** bits.
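As a quick check of the fixed-length property, here is a small sketch using SHA256 from Python's `hashlib` module, for which N is 256 bits. The three inputs are arbitrary choices for illustration.

```python
import hashlib

# A single character, a short string, and roughly 1 MB of data
inputs = [b"a", b"hello, world", b"x" * 1_000_000]

for data in inputs:
    digest = hashlib.sha256(data).hexdigest()
    # Each hexadecimal digit encodes 4 bits, so 64 digits = 256 bits
    print(f"input: {len(data):>7} bytes -> digest: {len(digest)} hex digits")
```

All three digests come out 64 hexadecimal digits long, regardless of the input size.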

In the following illustration, we pass two different inputs *'blockchain'* and *'hello'* of different lengths to the `SHA1` block. The `SHA1` block accepts inputs and maps them to an output hash that is 160 bits long, or equivalently 40 hexadecimal digits long, as shown below.

**Note: SHA** stands for **Secure Hash Algorithm**. Because of known collision attacks, `SHA1` isn't recommended for sensitive use cases anymore. However, the `SHA2` class of algorithms, including `SHA256` and `SHA512`, is still widely used.

Cryptographic hash functions have a few properties that make them secure for cryptographic applications.

**1. Deterministic**

A cryptographic hash function is deterministic. This means no matter how many times you feed in a *particular* input, you'll get the *same* output hash.
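A minimal sketch of this property, assuming SHA256 and an arbitrary message: hash the same input many times and collect the results in a set.

```python
import hashlib

message = "blockchain"

# A set keeps only distinct values, so repeated identical digests collapse into one
digests = {hashlib.sha256(message.encode()).hexdigest() for _ in range(1000)}

print(len(digests))  # 1: every run produced the same digest
```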

**2. Computationally Efficient**

The output hash values should be quick to compute—both when hashing transactions and during verification.

So a cryptographic hash function should be computationally efficient, allowing us to obtain the output hash in a short time.
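To get a rough feel for the speed, you can time a single hash over a larger input. This is only an informal measurement: the 10 MiB size is arbitrary, and the elapsed time depends entirely on your machine.

```python
import hashlib
import time

data = b"\x00" * (10 * 1024 * 1024)  # 10 MiB of arbitrary bytes

start = time.perf_counter()
digest = hashlib.sha256(data).hexdigest()
elapsed = time.perf_counter() - start

print(f"Hashed {len(data)} bytes in {elapsed:.4f} seconds")
```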

**3. Pre-Image Resistant or Non-Invertible**

This property is based on the concept of **one-way functions**.

Let's take an example. You have a function `f(x) = cube(x)`: the function `f` returns the cube of its input `x`. In this case, if the output is 27, you can conclude right away that the input is 3, which is the pre-image corresponding to the output 27. Therefore, such a function `f` is invertible, and *not* pre-image resistant.

However, cryptographic hash functions *should* be pre-image resistant.

This means that you can input a message to the hashing algorithm, and obtain the hash. But it should be *infeasible* to obtain the input message by looking at the hash.

This is illustrated as shown below.

*Pre-Image Resistance of Hash Functions (Image by the author)*
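Since the hash cannot be run backwards, the only generic way to find a pre-image is to guess inputs and hash each guess. The sketch below does that for a deliberately tiny search space. The function name `brute_force_preimage` and the restriction to lowercase letters are assumptions made for illustration.

```python
import hashlib
from itertools import product
from string import ascii_lowercase

def brute_force_preimage(target_hash, length):
    """Try every lowercase string of the given length until one hashes to the target."""
    for letters in product(ascii_lowercase, repeat=length):
        candidate = "".join(letters)
        if hashlib.sha256(candidate.encode()).hexdigest() == target_hash:
            return candidate
    return None

target = hashlib.sha256(b"cat").hexdigest()

print(brute_force_preimage(target, 3))  # cat
print(26 ** 3)   # number of 3-letter candidates: small enough to try them all
print(26 ** 20)  # number of 20-letter candidates: far too many to enumerate
```

Guessing works here only because there are so few 3-letter candidates. The search space grows exponentially with the input length, which is why this approach is infeasible for realistic inputs.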

**4. Collision Resistant**

A hash function should be resistant to collision.

*But what is a collision in the context of hash functions?*

Let's parse what *collision resistance* actually means.

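A collision is said to occur when two different input messages `M1` and `M2` map to the same output hash. For a collision-resistant hash function, it should be infeasible to find two different messages `M1` and `M2` that map to the same hash.

**5. Exhibits Avalanche Effect**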

Even a small change in the input should change the hash drastically.

For example, if we change only a single character in the input, say 'b' to 'B', the output hash changes completely!
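The avalanche effect can be checked with a short sketch that counts how many output bits differ between two nearly identical inputs. The strings `"blockchain"` and `"Blockchain"` and the helper `sha256_bits` are assumptions chosen for illustration.

```python
import hashlib

def sha256_bits(message: str) -> str:
    """Return the SHA256 digest of the message as a 256-character bit string."""
    digest = hashlib.sha256(message.encode()).digest()
    return "".join(f"{byte:08b}" for byte in digest)

bits_lower = sha256_bits("blockchain")
bits_upper = sha256_bits("Blockchain")

# Count the positions at which the two digests disagree
differing = sum(a != b for a, b in zip(bits_lower, bits_upper))
print(f"{differing} of {len(bits_lower)} bits differ")
```

For a well-behaved hash function you would expect roughly half of the 256 bits to flip, even though the inputs differ in just one character.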

To sum up, a cryptographic hash function generates a **fixed length** hash that is **deterministic** yet seemingly **random**, and is cryptographically secure.

So far, you've learned what cryptographic hash functions are, and their properties. And we've mentioned that the `SHA256` hashing function is widely used. However, is a 256-bit hash really secure? Head over to the next section to find out.

Suppose you have the desired output hash. Recall that a cryptographic hash function is deterministic. And it outputs the *same* hash for a *specific* input.

But it's also *non-invertible*. So the only way you can get back the input is by trying to generate the output hash at your end, through a series of random guesses.

If you can generate this output hash by randomly guessing inputs, it should be possible to eventually break the hash, right?

Well, it's not that simple!

The `SHA256` algorithm outputs a 256-bit hash, or equivalently 64 hexadecimal digits. A 256-bit hash is a sequence of 256 bits, each of which is either a 0 or a 1. So there are **2^256** possible hashes in all! This is an insanely large number, and breaking the hash by random guessing is computationally infeasible.

Grant Sanderson of 3Blue1Brown has an interesting video on this, in which he explains how complex the process is.

Put simply, *even if you had access to the most sophisticated computing resources in the world, and **time** equal to **37** times the **age of the universe**, you'll still have a **1 in 4 billion chance** of successfully guessing the input.*
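To get a feel for the size of 2^256, here is a back-of-the-envelope calculation in Python. The guessing rate of 10^18 hashes per second is a purely hypothetical figure picked for illustration, not a measured one.

```python
# Number of possible 256-bit hashes
total = 2 ** 256
print(total)
print(len(str(total)))  # 78 decimal digits

# Hypothetical attacker: a billion billion guesses every second
guesses_per_second = 10 ** 18
seconds_per_year = 60 * 60 * 24 * 365

# Years needed to try every possible value at that rate
years_to_exhaust = total // (guesses_per_second * seconds_per_year)
print(f"{years_to_exhaust:.2e} years")
```

Even at that rate, the result is dozens of orders of magnitude larger than the age of the universe, which is roughly 1.4 x 10^10 years.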


You can even use simple shell commands to compute hashes with the secure hashing algorithms.

If you're on Linux or macOS, open up your terminal and run the following line of code. (On macOS, where `sha256sum` may not be installed, `shasum -a 256` is the usual equivalent.) If you're on a Windows machine, consider using a shell environment such as Git Bash.

The `sha256sum` command returns the 256-bit hash, as shown below.

```
$ printf "I'm coding" | sha256sum
860f5cae6febaa6b9064a16d78553819de43cb1e4c5a87ab267bb1c35fb41a04  -
```

The trailing `-` indicates that the input was read from standard input.

Observe that the output hash is 64 hexadecimal digits long—each taking 4 bits.

To get the 160-bit long SHA1 hash, you can use the `sha1sum` command. Run the above code by replacing `sha256sum` with `sha1sum`.

```
$ printf "I'm coding" | sha1sum
cafc711fba6c8ccdcbb807e5a676e9810e5cce4c  -
```

In the next section, we'll see how to obtain the hash under the `SHA256` algorithm in Python.

Python ships with a built-in `hashlib` module. So you can just go ahead and import it like so: `import hashlib`.

Here are the steps to obtain the hash:

- Use the `sha256()` constructor to instantiate a hash object
- Encode the message string, *optionally* including an encoding format
- Call the `hexdigest()` method to obtain the hex equivalent of the 256-bit hash

The following code block shows how you can do it.

```
import hashlib
message = "I'm coding"
hash_obj = hashlib.sha256(message.encode())
hash_val = hash_obj.hexdigest()
print(hash_val)
# output: 860f5cae6febaa6b9064a16d78553819de43cb1e4c5a87ab267bb1c35fb41a04
print(len(hash_val))
# output: 64
# correct! 64 hexadecimal digits; total length = 64 * 4 = 256 bits
```

Notice how the `SHA256` sum in this case is the same as the one you obtained from Bash in the previous section. This illustrates the deterministic nature of cryptographic hash functions.

Now, let's try to obtain the hashes for a list of strings.

```
import hashlib

strings = ["hello", "sha256", "sensitive info"]

# enumerate() numbers the hashes from 1, and also works if the list has duplicates
for i, string in enumerate(strings, start=1):
    hash_obj = hashlib.sha256(string.encode())
    hash_val = hash_obj.hexdigest()
    print(f"Hash #{i}: {hash_val}")

# Output
# Hash #1: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
# Hash #2: 5d5b09f6dcb2d53a5fffc60c4ac0d55fabdf556069d6631545f42aa6e3500f2e
# Hash #3: 034fcc03d9332ee032b5815ef69b0f21926dd2da73f0fcfd65ff90ded1700892
```

See, that's how simple it is. ✅
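The same approach extends to files, which is how hashes are typically used for integrity verification. Here is a minimal sketch that feeds a file to the hash object in chunks through its `update()` method, so that large files need not be loaded into memory at once. The function name `sha256_of_file`, the chunk size, and the temporary file are assumptions made for illustration.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it one chunk at a time."""
    hash_obj = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hash_obj.update(chunk)
    return hash_obj.hexdigest()

# Write a small temporary file to hash, then clean it up
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"I'm coding")
    tmp_path = tmp.name

file_hash = sha256_of_file(tmp_path)
os.remove(tmp_path)

# Hashing the file gives the same digest as hashing its contents directly
print(file_hash == hashlib.sha256(b"I'm coding").hexdigest())  # True
```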

I hope you found this tutorial on cryptographic hash functions useful.

You've learned what cryptographic hash functions are and what their properties are. And you've also learned how to use commands like `sha1sum` and `sha256sum` to obtain the hash values. In addition, you've seen how to use Python's `hashlib` module to generate hashes for input strings.

Be sure to try out a few more examples. Keep coding!

Note: All images in the post have been created by the author.
