Ethan Jarrell

@ethan.jarrell

The Magic of Array References in Perl

June 5th 2018

In the world of programming, you can’t get very far without coming across nested data structures. For example, a JavaScript Object might look like the following:

var cats = {
"name":"Mr. Tickles",
"owners": {
"owner1" : "Tom",
"owner2" : "Janet"
},
"color":"orange
};

In this example, we have multiple owner objects nested inside of a larger cats object. We could also have something similar with arrays:

var array1 = [["one", "two"], ["three", "seven", "five"]];

Here, we have a nested array, or an array of arrays.

In JavaScript, you might access the nested object using either dot notation or bracket notation. For the array, you might call array1[0] and your expected output of this call would be the first array in the array, which is [“one”, “two”].

This is a major difference in the way that we would access data in Perl. Although the syntax is very similar in both languages, accessing nested data can be a little more difficult in Perl. For example, if we had the same nested array above, and the variable holding the array of arrays was @array1, and we wanted to access the first array in $array1, and used the call, @array1[0], what we would actually get as output would be the first element, “one”.

The reason for this is because Perl flattens all nested data. If you have an array which contains multiple arrays, Perl interprets it as a single array, and mashes all the array data together. Perl also interprets Objects, or Hashes in a very similar way.

Enter array references

If you don’t intend to have all your data flattened when accessing it, then the best solution is to use array references when defining the array variables. This is done in the creating of the arrays, long before we intend to access the array. If the data is not stored as a reference, we won’t be able to access it as a reference.

In the above example, we could define it as follows to accomplish this:

my @array1 = ();

Here, we’re simply creating an empty array to hold our nested arrays. Next, we can push our new arrays into the array container.

my @nestedArray = ("one", "two");
my @nestedArray2 = ("five", "seven", "three");

At this point, we might be tempted to use the following:

push @array1, @nestedArray;

Which would accomplish exactly what we had previously, a flattened array. Instead, before pushing the array into our container array, we’ll create a reference of the array.

Creating a Reference:

my $nestedArray = \@nestedArray;

This creates a scalar variable which holds a nested array reference. The reference is actually created using the backslash symbol.

Now, we can replicate the above code, pushing the reference into the array:

push @array1, $nestedArray;

Dereferencing a referenced array:

Using this same format, we could push as many nested arrays into our array container. Later, we may want to access those references. Let’s first imagine that our nested data looks like this:

@array1 = [[1, 2, 4], [3, 4, 5], [8, 8, 9], [6, 7, 6]];

Here, we have four arrays nested inside of an array container, each nested array containing 3 integers. Let’s try to access the first integer in the last nested array.

The syntax for accessing a referenced array seems a little trickier at first. First, I’ll put the code here, and then explain how it works:

@{ $array1[3] }[0];

The @ symbol in front of the curly brackets tells Perl that we are accessing an array, and the $ (scalar) symbol in front of the array variable lets us access just a single array within the container. [3] lets us access the 4th element in the array, which is at index [3]. This gets us the last array in the series, and then at the end of this block, we have the familiar [0], which tells Perl we want the first element in that array. Our expected return here would be 6.

Now, lets look at a more complex problem, where we’ll have to use many references inside of a loop. This is a problem called the maximum path sum. Take the following triangle of numbers:

3,
 7, 4
 2, 4, 6
 8, 5, 9, 3

The object here is to go from the top to the bottom, and find the maximum sum path, moving only down the right or down to the left. For example, from the second row, if we chose 7, we could not access 6. In this example, the maximum sum path would be 3, 7, 4, 9, for a sum of 23.

Knowing the solution, let’s work backward to solve this problem using Perl and array references. The first thing to keep in mind is that with larger triangles, it would be impossible to solve this using brute force (trying every combination of numbers). The best way to solve this type of problem is actually working backwards. We would start at row 3, or example, and test each number against the sum of the two possible numbers under the current number. The first number, being 2, and our possible combinations are 8 and 5. This the biggest possible sum here is 10 ( 8 + 2). After we go through this process with each number, we would simply remove the last row, and replace the third row with only the highest sums. Our new triangle would look like the following:

3,
 7, 4
 10, 13, 15

Now, we would do the same thing, with the second row, which would give us:

3,
 20, 19

And then, the last row, which would give us 23.

Thinking about this process from a programming standpoint, instead of a triangle, our data input would likely be a single array:

my @triangle = (3,7,4,2,4,6,8,5,9,3);

Our process from here would probably be to break this up into rows. This is where the nested referenced arrays come into play. What we want to end up with, is something like this:

my @triangle = ([3],[7,4],[2,4,6],[8,5,9,3]);

The simplest way to do this would be to loop through our array, and create a reference of each nested array, and push it into a new array.

Here’s one way we could do that:

my @choppedTriangle = []; 
my @chopped = (); #Our new array container
my $start = 0;
my $stop = 0;
my $length = 1;
my $row = 1;
my $triangle = @triangle;
while (@choppedTriangle < @triangle) {
my @row = ();
for (my $i=$start; $i <= $stop; $i++) {
push @choppedTriangle, $triangle[$i];
push @row, $triangle[$i];
};
my $row = \@row;
push @chopped, $row;
$start = $stop + 1;
$length = $length + 1;
$stop = $stop + $length;
};

Here, we’re using a for loop inside of a while loop. In the loop, at each iteration, we’re pushing each @row into @choppedTriangle. This is not an array reference and that’s on purpose. We should only have 4 nested arrays in this case, and we don’t want to push anything more than 4 nested arrays. @triangle also has 4 nested arrays, so our while loop conditional statement will stop as soon as the two arrays are equal to each other. Once we’ve pushed all rows into @choppedtriangle, we want to end the loop.

That work will all be done in our for loop. Outside the for loop, but still in the while loop, we’ll create an array reference for each row, and you can see this in the line:

my $row = \@row;
push @chopped, $row;

Then we push that reference into our @chopped array. I’m only using @choppedtriangle as a conditional for my while loop, and once the loop is done, I won’t need to use it any more. But, all of the nested arrays that I will need, are contained in @chopped .

I’m also using $start, $stop and $length variables in the loop, to determine the beginning and end of each row, using those variables as indexes. For example, both start at 0, which would be an index of [$start]. Since the first element we want to extract is at index 0, this is perfect. After each loop, we’ll adjust the variables $start and $stop to let Perl know the length of the new row. The new row will start where the first row ended, thus:

$start = $stop + 1;

Then the length will also be $length = $length + 1, since we start at row 1, and each subsequent row is longer by one character. Then, whatever the new length is, we’ll use that value to set our new $stop variable, which will be just one character after the length, allowing us to get everything in between the new $start and $stop variables.

With that out of the way, we should now have an array with 4 nested arrays. What we want to do next is to loop through the array, and then loop through each nested array. We’ll want to do the same exact thing mentioned earlier, comparing the current row to the row below it, and replacing the values with the greatest sum. My thought here is this. Let’s say our current row is this:

[ 2, 4] and the row below this is [1, 2, 3].

I’ll create a temporary Array, and iterate over the first row. This loop will only contain 2 iterations since the length of the array is only 2. At each iteration, I’ll make 2 comparisons, 2+1 and 2+2, then 4+2 and 4+3. At each iteration, I’ll take the largest comparison sum, and push it into the temporary array. So my temporary array would now hold the values [4, 7].

Next, I’ll replace the previous row of [2, 4] with the new temporary array [4,7].

Finally, I’ll reset the row I’m working on, with the new row being the next row up the triangle, and I’ll reset the row I’m comparing, which would be the one we just set, [4, 7].

Let me put the code I’ll use here, and then I’ll explain how I came about it:

First, I’m going to create two variables to control which row I’m iterating over:

my $iter = 1;
my $iter2 = 2;

My idea here, is that I’ll change these variables at the end of each loop, which will allow me to move to the next row.

my @tempArray = ();
my $chopRows = @chopped-1;
for (my $i=0; $i < $chopRows; $i++) {
@tempArray = ();
for (my $i=0; $i < @{ $chopped[scalar @chopped-$iter2] }; $i++) {
if(@{ $chopped[scalar @chopped-$iter2] }[$i] + @{ $chopped[scalar @chopped-$iter] }[$i] > @{ $chopped[scalar @chopped-$iter2] }[$i] + @{ $chopped[scalar @chopped-$iter] }[$i+1])
{
push @tempArray, @{ $chopped[scalar @chopped-$iter2] }[$i] + @{ $chopped[scalar @chopped-$iter] }[$i];
}
else
{push @tempArray,@{ $chopped[scalar @chopped-$iter2] }[$i] + @{ $chopped[scalar @chopped-$iter] }[$i+1];
};
};
@{ $chopped[scalar @chopped-$iter2] } = @tempArray;
$iter = $iter + 1;
$iter2 = $iter2 + 1;
};

This first section might seem weird:

my @tempArray = ();
my $chopRows = @chopped-1;
for (my $i=0; $i < $chopRows; $i++) {
@tempArray = ();

Since I’m creating the temporary array, and then calling it again just inside the for loop. The reason is, when the for loop finishes, I should have new values in there, representing the sums. Once I go back to the beginning of the loop, I’ll need to empty the temporary array. Since I created the array outside of the loop, if I didn’t empty it’s contents at the beginning of each loop, then by the 4th iteration (row 4 of our triangle), we would have an array of all the sums from the entire triangle…when all we want is only the current row. Now, inside my second loop, we have this:

for (my $i=0; $i < @{ $chopped[scalar @chopped-$iter2] }; $i++) {

Now, this is just a very basic for loop, with a fancy length control in the middle. @{ $chopped[scalar @chopped-$iter2] };

Hopefully this looks familiar, as it is exactly what we used earlier to dereference an array and access it’s contents:

@{ $array1[3] }[0];

[scalar @chopped-$iter2] here, the $iter2 variable, which currently is “2”, points to the second to last row, basically working from the end of the @chopped array, and going backwards two places. This mean, we’re going to loop across the numbers in the second to last row, and compare them against the last row. After we’re done, we can change $iter2 to 3, or simply, $iter2 = $iter2 + 1. This will allow us to access the third row from the end, and then we can continue to repeat this process.

To make things simple, we only have one conditional statement inside the for loop. This compares the current integer against the two possible integers under it.

if(@{ $chopped[scalar @chopped-$iter2] }[$i] + @{ $chopped[scalar @chopped-$iter] }[$i] > @{ $chopped[scalar @chopped-$iter2] }[$i] + @{ $chopped[scalar @chopped-$iter] }[$i+1])

Here, we’re accessing the value of the dereferenced array @{ $chopped[scalar @chopped-$iter2] }[$i]. This represents the current iteration of the upper row, and we’re seeing if the sum of it and @{ $chopped[scalar @chopped-$iter] }[$i] is greater than @{ $chopped[scalar @chopped-$iter2] }[$i] + @{ $chopped[scalar @chopped-$iter] }[$i+1]). Here, we’re using both the variables $iter2 and $iter, giving us the value of each row, $iter being the bottom row, and $iter2 being the upper row.

If this conditional returns true, we push that current value:

@tempArray, @{ $chopped[scalar @chopped-$iter2] }[$i] + @{ $chopped[scalar @chopped-$iter] }[$i];

Else we push the other value. This way, we don’t need to compare if the values are equal. Because if the first isn’t larger, we push the other value regardless, because it doesn’t matter if they are the same.

Finally at the end of our loop, we reset the value of our current row, replacing it’s values with the temparray, and also resetting the values of both of our $iter and $iter2 variables:

@{ $chopped[scalar @chopped-$iter2] } = @tempArray;
$iter = $iter + 1;
$iter2 = $iter2 + 1;

This brings us back to the top of the loop where we reset our @temparray values, and begin with the next row. We can now print the value of @tempArray, which will hold the final value.

foreach my $x (@tempArray) {
print "$x \n";
}

To see the entire block of code together, here’s what I’ve got:

#!/usr/bin/perl
use strict;
use warnings;
# By starting at the top of the triangle below and moving to adjacent numbers on the row below, the maximum total from top to bottom is 23.
# 3,
# 7, 4
# 2, 4, 6
# 8, 5, 9, 3
# 23!
# starting triangle array
my @triangle = (3,7,4,2,4,6,8,5,9,3);
# array to hold
my @choppedTriangle = [];
my @chopped = ();
my $start = 0;
my $stop = 0;
my $length = 1;
my $row = 1;
my $triangle = @triangle;
while (@choppedTriangle < @triangle) {
my @row = ();
for (my $i=$start; $i <= $stop; $i++) {
push @choppedTriangle, $triangle[$i];
push @row, $triangle[$i];
};
my $row = \@row;
push @chopped, $row;
$start = $stop + 1;
$length = $length + 1;
$stop = $stop + $length;
};
my $iter = 1;
my $iter2 = 2;
my @tempArray = ();
my $chopRows = @chopped-1;
for (my $i=0; $i < $chopRows; $i++) {
@tempArray = ();
for (my $i=0; $i < @{ $chopped[scalar @chopped-$iter2] }; $i++) {
if(@{ $chopped[scalar @chopped-$iter2] }[$i] + @{ $chopped[scalar @chopped-$iter] }[$i] > @{ $chopped[scalar @chopped-$iter2] }[$i] + @{ $chopped[scalar @chopped-$iter] }[$i+1])
{
push @tempArray, @{ $chopped[scalar @chopped-$iter2] }[$i] + @{ $chopped[scalar @chopped-$iter] }[$i];
}
else
{push @tempArray,@{ $chopped[scalar @chopped-$iter2] }[$i] + @{ $chopped[scalar @chopped-$iter] }[$i+1];
};
};
@{ $chopped[scalar @chopped-$iter2] } = @tempArray;
$iter = $iter + 1;
$iter2 = $iter2 + 1;
};
foreach my $x (@tempArray) {
print "$x \n";
}

This is probably a little cumbersome to follow, but I’ve found that when getting used to referecing and dereferencing nested data in Perl, a convoluted scenario such as this helps my understand what’s happening, and how this can be so useful. Let me know if you have any feedback!

More by Ethan Jarrell

More Related Stories