Deep Vs. Shallow Copying

Written by marvin | Published 2020/09/19

TLDR: The stack and the heap are two different places a program uses to store variables in memory. For compound values, the stack holds a pointer to the location where the actual data is stored, so we are just given a reference to it. It's about speed: instead of moving around a whole chunk of data (the heap) every time, the program just carries the reference to the data. Fixed-size types are added directly onto the stack, meaning we need no heap to store a simple 456.98 because the sizes of these types are already known.

Let's go back one moment. A little further down, to our data structures. The dear heaps and stacks of them. Quite literally.
What happens when I assign a variable? What about when I pass it as a parameter?
How does the program know this is what I'm talking about?
Before we go to a higher level of abstraction, where a variable is like a title on a page of notes that you flip to whenever you need its value, we should look at how the machine 'thinks' about it.
Close your eyes for a moment and declare a variable:
my_variable = "random";
See, when you do this, you're adding to the list of things the program has to remember. Picture 'cart' lists piled one on top of another. A stack.
As it stands, your program will store your string in two parts. The first part lives on the stack and holds a pointer (the location of the data in memory), len (the length of the string) and capacity (the amount of memory, in bytes, the machine has reserved for this variable). The second part lives on the heap and holds the characters themselves. This is, for example, how Rust lays out its String type; other languages differ in the details. Whatever we do with this variable, be it a mutation, concatenation, making a sub-string or whatever there is, goes through the stack entry.
In the stack, the pointer holds the location of where the actual data is stored. So we are just given a reference to it.
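To make the pointer, len and capacity picture concrete, here is a toy model in Python. It is only a sketch: HEAP, StringHandle and allocate are hypothetical names invented for illustration, and real runtimes manage memory very differently. The point it tries to show is that copying the handle copies three small fields while the bytes stay where they are.

```python
from dataclasses import dataclass, replace

# Toy model only: HEAP, StringHandle and allocate are hypothetical names.
HEAP = {}  # address -> bytearray, standing in for heap memory


@dataclass
class StringHandle:
    ptr: int       # where the bytes live in HEAP
    length: int    # bytes currently in use (the "len" field)
    capacity: int  # bytes reserved for this string


def allocate(text: str, address: int) -> StringHandle:
    """Store the bytes on the toy heap and return the small handle."""
    data = bytearray(text.encode("utf-8"))
    HEAP[address] = data
    return StringHandle(ptr=address, length=len(data), capacity=len(data))


my_variable = allocate("random", address=0x1000)
my_other_variable = replace(my_variable)  # copies the three fields only

print(my_other_variable.ptr == my_variable.ptr)  # True: same bytes behind both
print(len(HEAP))                                 # 1: still a single heap entry
```

The handle is a few integers no matter how long the string is, which is why passing it around is cheap.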
Why do we do this? Why do we store it in different data structures?
Simple. It's about the speed.
Working with the stack is faster than working with the heap. So instead of moving around a whole chunk of data (the heap) every time the variable is assigned or handed over, the program just carries the reference to it. I mean, the program is already doing its tasks (whether heavy or not), so there is no need to add overhead here.
A point to remember: not all data is stored this way. For fixed-size types, that is, booleans, integers, floats, and chars, values are added directly onto the stack. So we need no heap to store a simple 456.98, because the sizes of these types are already known to the program. The exception is data whose size is only known at run time, such as user input.
The size of these types, more so numbers (integers and floats), is fixed by their bit width. Whether a number type can be negative (signed) or is exclusively non-negative (unsigned) decides the range of values that fits into that width. This should remind you of how you write numbers in math: any number on your paper is positive unless it carries a sign, and a type that is allowed to carry one is what we call signed.
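A small sketch of the signed versus unsigned point, using Python's standard struct module. The helper value_range is a hypothetical name added for illustration. It assumes the usual two's complement layout, where a signed type gives up half of its range to negative numbers.

```python
import struct


def value_range(bits: int, signed: bool) -> tuple:
    """Smallest and largest integer that fit in the given width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


# Same size in memory, different ranges.
print(struct.calcsize("b"), struct.calcsize("B"))  # 1 1
print(value_range(8, signed=True))                 # (-128, 127)
print(value_range(8, signed=False))                # (0, 255)

# 200 fits in an unsigned byte but not in a signed one.
unsigned_bytes = struct.pack("B", 200)
try:
    struct.pack("b", 200)
    fits_signed = True
except struct.error:
    fits_signed = False
print(fits_signed)  # False
```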
So the pointer-plus-heap arrangement is for compound data types: the result of combining two or more fixed-size values.
Example:
  • string (a combination of chars)
  • arrays
  • tuples
... and so forth, depending on what your language of choice calls them, for instance, a Python dictionary vs a JavaScript object.
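The difference between fixed-size and compound values can be checked from Python itself. This is a rough sketch: struct reports the standard sizes of fixed-width types, while sys.getsizeof reports the size of a Python object, which is an implementation detail that includes overhead and varies between versions. The constant names are made up for illustration.

```python
import struct
import sys

# Fixed-width scalars: the size is known before any value exists.
# "<" selects struct's standard sizes, independent of the platform.
FLOAT64_SIZE = struct.calcsize("<d")  # 8 bytes
INT32_SIZE = struct.calcsize("<i")    # 4 bytes
BOOL_SIZE = struct.calcsize("<?")     # 1 byte
print(FLOAT64_SIZE, INT32_SIZE, BOOL_SIZE)

# Compound values: the size depends on what they contain.
short_text = "random"
long_text = "random" * 100
grows_with_content = sys.getsizeof(long_text) > sys.getsizeof(short_text)
print(grows_with_content)  # True
```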

Back to copying.

You want two variables to hold the same data, and you want to edit one of them without affecting the other.
I suppose you could assume a simple re-assignment, right?
my_variable = "random"

my_other_variable = my_variable
Whether it's console.log(), print() or printf() (choose your weapon and let's make the battle legendary!), anything that your muscle memory has right now, the two will show random in stdout. The caveat?
Yes, you've duplicated the stack entry, not the heap data. That is by design: the language is aiming for optimum performance, keeping things as tidy and efficient as possible. No overhead. Both variables point to the same memory space.
So what happens if I mutate one variable?
my_other_variable + 'ramblings'  # builds a new string; the result is discarded

print(f"My second variable: {my_other_variable}")
print(f"My variable: {my_variable}")
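In both cases, the output is the same string:
random
With a string this is the most you will see: Python strings cannot be changed in place, so the concatenation builds a new string and, since nothing keeps it, throws it away. The shared memory only becomes visible with a mutable compound type such as a list, where a change made through one variable shows up under the other.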
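A minimal sketch of that sharing, reusing the variable names from above with a list in place of the string. The second half repeats the experiment with a string and += to show the contrast; text and other_text are names added for illustration.

```python
# Mutable compound type: both names refer to one list.
my_variable = ["r", "a", "n"]
my_other_variable = my_variable          # copies the reference, not the list
my_other_variable.append("d")            # in-place change through the second name
print(my_variable)                       # ['r', 'a', 'n', 'd']
print(my_variable is my_other_variable)  # True

# Immutable string: += builds a new string and rebinds only one name.
text = "random"
other_text = text
other_text += " ramblings"
print(text)        # random
print(other_text)  # random ramblings
```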
To get two completely separate items with the same data, so that each can be mutated independently, you have to take a different approach: a deep copy.
A warning
As far as memory is concerned, deep copying is expensive: it has to take the pointer, follow it to where the data is stored, and then duplicate that heap data.
Depending on what language you are using, there is the built-in copy module in Python, library helpers in JavaScript, explicit copy or clone operations in lower-level languages, and so on and so forth. (We cannot simply list all the ways to deep copy across the multiverse.)
import copy

my_variable = "random"

# deepcopy duplicates mutable contents recursively; an immutable string is returned as-is
my_other_variable = copy.deepcopy(my_variable)
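The difference between a shallow and a deep copy only shows up once the data is nested. Below is a minimal sketch with a dictionary that holds a list; the dictionary contents are made up for illustration. copy.copy duplicates the outer container but keeps pointing at the same inner list, while copy.deepcopy duplicates everything it can reach.

```python
import copy

original = {"name": "random", "tags": ["stack", "heap"]}

shallow = copy.copy(original)    # new outer dict, shared inner list
deep = copy.deepcopy(original)   # new outer dict and a new inner list

original["tags"].append("pointer")  # mutate the nested list in place

print(shallow["tags"])  # ['stack', 'heap', 'pointer']
print(deep["tags"])     # ['stack', 'heap']
print(shallow["tags"] is original["tags"])  # True
print(deep["tags"] is original["tags"])     # False
```

The shallow copy is cheaper, which is fine when the nested data is never mutated; the deep copy is the one to reach for when both sides must be editable on their own.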
Love javascript much?
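let myVariable = "ramblings"

// The template literal builds a new string from the first one
let mySecondVariable = `${myVariable}`

// Reassigning the second variable leaves the first untouched
mySecondVariable = 'random ' + mySecondVariable

console.log('My first variable', myVariable)
console.log('My second variable', mySecondVariable)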
There are, of course, multiple other ways of doing this. It is, after all, javascript.
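A point to mark: for objects especially, lodash, dearest ramda or rfdc work perfectly. Custom method for your implementation? Go ahead, just not a JSON.stringify() round trip, which drops functions and undefined values, turns dates into strings, and throws on circular references.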
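The same trap exists outside javascript. As a hedged illustration in Python (not the javascript API itself), round-tripping through the standard json module with json.loads(json.dumps(...)) also gives back something that is not quite the original. The record below is made-up sample data.

```python
import json
from datetime import date

record = {"point": (1, 2), 1: "one"}

# Round trip through JSON text and back.
round_tripped = json.loads(json.dumps(record))
print(round_tripped)            # {'point': [1, 2], '1': 'one'}
print(round_tripped == record)  # False: the tuple became a list, the int key a str

# Types JSON does not know about cannot be serialized at all.
try:
    json.dumps({"published": date(2020, 9, 19)})
    date_survives = True
except TypeError:
    date_survives = False
print(date_survives)  # False
```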
The mad rustacean?
let my_variable = String::from("random");

// clone() duplicates the heap data, so the two Strings are independent
let my_other_variable = my_variable.clone();
Having done this, you can manipulate your new variables in any way you want. Go to the moon if need be. Just need a couple of dollars more.
It is this same principle that governs the passing of variables across functions and objects: passing a pointer to the original data and not the whole heap. Comprende? I sure hope so. So go forth and choose wisely.
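Here is what that looks like with a function call, as a small Python sketch. The function add_tag and the variable names are hypothetical, added only for illustration. The function receives a reference to the caller's list, so an in-place change is visible outside; handing it a deep copy keeps the original safe.

```python
import copy


def add_tag(tags, tag):
    """Append in place; the caller's list is the one being changed."""
    tags.append(tag)
    return tags


shared = ["stack"]
add_tag(shared, "heap")
print(shared)  # ['stack', 'heap']: the function changed the caller's data

protected = ["stack"]
result = add_tag(copy.deepcopy(protected), "heap")
print(protected)  # ['stack']: the original is untouched
print(result)     # ['stack', 'heap']
```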
Let's leave this piece at that, and chat in the comments if need be.
And yes, we can chat tech on Twitter too: marvinus_j

Written by marvin | Software engineer. I write code and stay weird.
Published by HackerNoon on 2020/09/19