
How to Flatten 2D Lists vs. 2D Tensors

by Harshit SharmaJuly 9th, 2022

Too Long; Didn't Read

It is very common to deploy the wrong strategy while flattening 2D data (lists or tensors).


It is very common to deploy the wrong strategy while flattening 2D data (lists or tensors). It's embarrassingly obvious, but the most important point to realize is that lists and tensors are different data structures, and each calls for a different flattening approach.

Hence:


Knowing your data before going for the approach is the very first step


Once you know your data, simply pick one of the approaches below accordingly. (We won't be going into the depths of these functions.)

(Note: for tensors, the approaches shown are NumPy-based, but every scientific computing library has its own similar tensor functions.)
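The original screenshots listing the approaches aren't reproduced here, so here is a rough sketch of the usual options in each category (variable names are mine):

```python
import itertools
from functools import reduce

import numpy as np

a = [[1, 2], [3, 4], [5, 6]]

# --- List-of-lists approaches ---
flat1 = sum(a, [])                              # concatenates the sublists
flat2 = list(itertools.chain.from_iterable(a))  # lazily chains the sublists
flat3 = reduce(lambda x, y: x + y, a)           # pairwise concatenation
flat4 = [x for row in a for x in row]           # nested comprehension

# --- Tensor approaches (NumPy) ---
t = np.array(a)
tf1 = t.ravel()      # returns a view when possible
tf2 = t.flatten()    # always returns a copy
tf3 = t.reshape(-1)  # returns a view when possible

print(flat1)           # [1, 2, 3, 4, 5, 6]
print(tf1)             # [1 2 3 4 5 6]
```

All of these produce the same flattened result for this small, well-formed input; as the article shows next, they differ in how they scale and in what they do when the input isn't what they expect.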


So we know the data type, but how big is it?


Not all the approaches scale efficiently with data size, as shown below (these are just the tensor-type approaches):

And here come the list-type approaches:

(Note: I used Perfplot for plotting the comparisons)

Hence, the approaches listed above are already in increasing order of complexity.
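The Perfplot charts aren't reproduced here, but you can get a rough feel for the scaling of the list-type approaches with a plain `timeit` sketch (the exact numbers depend on your machine; the point is the relative gap):

```python
import itertools
import timeit
from functools import reduce

# 5,000 sublists of length 2
a = [[i, i + 1] for i in range(5_000)]

candidates = {
    "itertools.chain": lambda: list(itertools.chain.from_iterable(a)),
    "comprehension":   lambda: [x for row in a for x in row],
    "reduce":          lambda: reduce(lambda x, y: x + y, a),  # quadratic
    "sum(a, [])":      lambda: sum(a, []),                     # quadratic
}

for name, fn in candidates.items():
    print(f"{name:>16}: {timeit.timeit(fn, number=5):.4f}s")
```

`sum(a, [])` and `reduce` build a new intermediate list on every step, so their cost grows quadratically with the number of sublists, while `itertools.chain` and the comprehension stay roughly linear.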


But wait, shouldn’t we merge the two plots above to compare ALL the approaches together?

No. The data types for both these plots are different. It was a tensor for the first one, and a list of lists for the second one.


If you don’t get an error, it doesn’t mean it is correct


Example 1: Using a Tensor approach on a 2D list

I once happened to apply np.ravel() to a list of lists and was happy that it worked. But did it?

In my mind, I was expecting this:

but in reality, it was like this:

Notice the difference in the data: the sublists in the second one are not all the same size. ravel() didn’t raise an error, but it didn’t work as expected.


Example 2: Using a List approach on a Tensor

This is how reduce() works on a 2D list.

But when given a Tensor, it simply adds the values along a particular axis:
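A minimal sketch of the contrast (my own toy data):

```python
from functools import reduce

import numpy as np

a = [[1, 2], [3, 4], [5, 6]]

# On a 2D list, reduce concatenates the sublists:
print(reduce(lambda x, y: x + y, a))  # [1, 2, 3, 4, 5, 6]

# On a tensor, + is element-wise, so reduce sums the rows instead:
t = np.array(a)
print(reduce(lambda x, y: x + y, t))  # [ 9 12]
```

Same code, same `+` operator, completely different semantics: list `+` means concatenation, while NumPy `+` means element-wise addition.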

Conclusion: Stay away from mixing approaches from different categories.

My preference:

  1. If it’s a 2D list, simply use sum(a, [ ]), as we don’t have to import anything. But stay away if the data is huge, as it will be painfully slow.

  2. If it’s a Tensor, go for ravel(), since it is faster than the others (it returns a view and doesn’t create a copy of the tensor, unlike flatten()).

  3. Remember, the approaches will differ if the data has more than 2 dimensions. So simply test the function before moving on.
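The view-vs-copy distinction behind preference 2 is easy to verify yourself. A small sketch (keep in mind that ravel() only returns a view when the array's memory layout allows it, and falls back to a copy otherwise):

```python
import numpy as np

t = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

v = t.ravel()    # view: shares memory with t (for a contiguous array)
c = t.flatten()  # copy: always allocates new memory

v[0] = 99
print(t[0, 0])   # 99 -- writing through the view is visible in t
print(c[0])      # 0  -- the copy is unaffected
```

This shared memory is exactly why ravel() is cheaper: it skips the allocation and copy. The flip side is that mutating its result mutates the original tensor, so use flatten() when you need an independent copy.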


Let me know if I missed any of your handy approaches :)

