Neural networks are at the forefront of Machine Learning (ML) today, and Python is undoubtedly the go-to programming language for any ML task, regardless of whether one intends to use Neural Networks to solve it or not. There is a vast array of Python libraries available that cover the entire spectrum of ML tasks, such as NumPy, Pandas, Keras, TensorFlow, PyTorch, and so on. These libraries usually rely on C or C++ implementations of ML algorithms and approaches under the hood because Python is too slow for them. However, Python is not the only programming language in existence, and it is not the one I use in my daily work.
This article is not a guide on how to write something in Swift; it is more of a thought piece about the current mindset of many developers, who view Python and its ML libraries as the ultimate solution for any problem or task they encounter, regardless of the language they are using. I would wager that most developers prefer to invest their time in finding ways to integrate Python libraries into their language or environment rather than considering alternative solutions without them. While this is not inherently bad – reuse has been a significant driver of progress in IT over the past few decades – I have started to sense that many developers no longer even consider alternatives. This mindset becomes even more entrenched with the current state and rapid advancement of Large Language Models.
The balance is gone: we rush to ask an LLM to resolve our issue, get some Python code back, copy it, and enjoy our productivity, along with the potentially significant overhead of unnecessary dependencies.
Let's explore an alternative approach to solving the task at hand using only Swift, mathematics, and no other tools.
When people start learning Neural Networks, there are two classic Hello World examples that you can find in most tutorials and introductory materials. The first is handwritten digit recognition. The second is data classification. I will focus on the second one in this article, but the solution I walk through will work for the first one as well.
A very good visual example of it can be found in the TensorFlow Playground, where you can play around with different neural network structures and visually observe how well the resulting model solves the task.
You might ask what the practical meaning of these differently coloured dots on an image is. The thing is that it's a visual representation of some data set. Many different types of data can be presented in exactly the same or a similar way, such as social groups of people who buy specific products, or music preferences. Since I primarily focus on mobile iOS development, I will also give an example of a real task I was solving that can be visually represented in a similar manner: finding electric wires inside walls using the gyroscope and magnetometer of a mobile phone. In this particular example, we have one set of parameters recorded when a wire was found and another set recorded when nothing is inside the wall.
Let's take a look at the data we'll be using.
We have two types of data here: red dots and blue dots. As I described above, this may be a visual representation of any kind of classified data. For example, let's take the red area as the one where the magnetometer and gyroscope signal indicates an electric wire in the wall, and the blue area as the one where it doesn't.
We can see that these dots are somehow grouped together, forming red and blue shapes. These dots were generated by taking random points from the following image:
We will use this picture as the source of randomness for our training process, taking some random points to train the model and other random points to test the trained model.
The original picture is 300 x 300 pixels, containing 90,000 dots (points). For training purposes, we will use only 0.2% of these dots, which is less than 100 points. To gain a better understanding of the model's performance, we will randomly select 3000 points and draw circles around them on the picture. This visual representation will provide us with a more comprehensive idea of the results. We can also measure the percentage of accuracy to verify the model's efficiency.
How are we going to build a model? If we look at these two images together and try to simplify our task, we find that the task, in fact, is to recreate the original picture from the data we have (a batch of red and blue dots). The closer the picture produced by our model is to the original, the more accurate our model is. We can also consider our training data as an extremely compressed version of the original image, with the goal of decompressing it back.
What we are going to do is transform our dots into mathematical functions, represented in code as arrays or vectors (I will use the term vector throughout, since it sits between a function from the maths world and an array from software development). Then we will use these vectors to challenge every test point and identify which vector it belongs to more.
To transform our data, I will try a Discrete Cosine Transform (DCT). I won't go into any mathematical explanations about what it is and how it works, as you can easily find that information if you wish. However, I can explain in simple terms how it can help us and why it's useful. The DCT is used in many areas, including image compression (such as JPEG format). It transforms the data into a more compact format by keeping only the important parts of the image while removing the unimportant details. If we apply the DCT to our 300x300 image containing only red dots, we will get a 300x300 matrix of values that can be transformed into an array (or vector) by taking each row separately.
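To get an intuition for that "keeping only the important parts" idea, here is a minimal one-dimensional sketch. The helper name dct below is my own, not part of the project code; it simply writes out the orthonormal DCT-II formula. For a smooth signal, almost all of the energy lands in the first few coefficients, and that is exactly the property we will exploit later.

```swift
import Foundation

// Orthonormal 1-D DCT-II, written out directly from the textbook formula.
func dct(_ signal: [Double]) -> [Double] {
    let n = signal.count
    return (0..<n).map { k -> Double in
        let scale = k == 0 ? sqrt(1.0 / Double(n)) : sqrt(2.0 / Double(n))
        var sum = 0.0
        for (i, value) in signal.enumerated() {
            sum += value * cos((Double(2 * i + 1) * Double(k) * .pi) / Double(2 * n))
        }
        return scale * sum
    }
}

// A smooth ramp signal: nearly all of its energy ends up in coefficient 0,
// while the high-frequency coefficients are close to zero.
let smooth = (0..<8).map { 1.0 + 0.1 * Double($0) }
let coefficients = dct(smooth)
```

Because the transform is orthonormal, the sum of squared coefficients equals the signal's energy, so dropping the near-zero tail loses almost nothing.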
Let's finally write some code for it. First, we need to create a simple object that will represent our point (dot).
enum Category {
    case red
    case blue
    case none
}

struct Point: Hashable {
    let x: Int
    let y: Int
    let category: Category
}
You may notice that we have an additional category called `none`. We will actually create three vectors in the end: one for `red` points, a second for `blue` points, and a third for anything else, represented by `none`. While we could have just two of them, having a trained vector for "not red and not blue" will make things a bit simpler.
`Point` conforms to the `Hashable` protocol so that we can use a `Set` to avoid having points with the same coordinates in our sampled vectors.
func randomPoints(from points: [Point], percentage: Double) -> [Point] {
    let count = Int(Double(points.count) * percentage)
    var result = Set<Point>()
    while result.count < count {
        let index = Int.random(in: 0 ..< points.count)
        result.insert(points[index])
    }
    return Array<Point>(result)
}
Now we can use it to take 0.2% of the random points from our original image for the red, blue, and none points.
redTrainPoints = randomPoints(from: redPoints, percentage: 0.002)
blueTrainPoints = randomPoints(from: bluePoints, percentage: 0.002)
noneTrainPoints = randomPoints(from: nonePoints, percentage: 0.002)
We are ready to transform this training data using the DCT. Here's an implementation of it:
final class CosTransform {
    private var sqrtWidthFactorForZero: Double = 0
    private var sqrtWidthFactorForNotZero: Double = 0
    private var sqrtHeightFactorForZero: Double = 0
    private var sqrtHeightFactorForNotZero: Double = 0
    private let cosLimit: Int

    init(cosLimit: Int) {
        self.cosLimit = cosLimit
    }

    func discreteCosTransform(for points: [Point], width: Int, height: Int) -> [[Double]] {
        if sqrtWidthFactorForZero == 0 {
            prepareSupportData(width: width, height: height)
        }
        var result = Array(repeating: Array(repeating: Double(0), count: width), count: height)
        for y in 0..<height {
            for x in 0..<width {
                let cos = cosSum(
                    points: points,
                    width: width,
                    height: height,
                    x: x,
                    y: y
                )
                result[y][x] = cFactorHeight(index: y) * cFactorWidth(index: x) * cos
            }
        }
        return result
    }

    func shortArray(matrix: [[Double]]) -> [Double] {
        let height = matrix.count
        guard let width = matrix.first?.count else { return [] }
        var array: [Double] = []
        for y in 0..<height {
            for x in 0..<width {
                if y + x <= cosLimit {
                    array.append(matrix[y][x])
                }
            }
        }
        return array
    }

    private func prepareSupportData(width: Int, height: Int) {
        sqrtWidthFactorForZero = sqrt(1.0 / Double(width))
        sqrtWidthFactorForNotZero = sqrt(2.0 / Double(width))
        sqrtHeightFactorForZero = sqrt(1.0 / Double(height))
        sqrtHeightFactorForNotZero = sqrt(2.0 / Double(height))
    }

    private func cFactorWidth(index: Int) -> Double {
        return index == 0 ? sqrtWidthFactorForZero : sqrtWidthFactorForNotZero
    }

    private func cFactorHeight(index: Int) -> Double {
        return index == 0 ? sqrtHeightFactorForZero : sqrtHeightFactorForNotZero
    }

    private func cosSum(
        points: [Point],
        width: Int,
        height: Int,
        x: Int,
        y: Int
    ) -> Double {
        var result: Double = 0
        for point in points {
            // The x coordinate pairs with the width and y with the height
            // (they happen to coincide for our square 300 x 300 image).
            result += cosItem(point.x, x, width) * cosItem(point.y, y, height)
        }
        return result
    }

    private func cosItem(
        _ firstParam: Int,
        _ secondParam: Int,
        _ length: Int
    ) -> Double {
        return cos((Double(2 * firstParam + 1) * Double(secondParam) * Double.pi) / Double(2 * length))
    }
}
Let's create an instance of the CosTransform object and test it.
let math = CosTransform(cosLimit: Int.max)
...
redCosArray = cosFunction(points: redTrainPoints)
blueCosArray = cosFunction(points: blueTrainPoints)
noneCosArray = cosFunction(points: noneTrainPoints)
We use some simple helper functions here:
func cosFunction(points: [Point]) -> [Double] {
    return math.shortArray(
        matrix: math.discreteCosTransform(
            for: points,
            width: 300,
            height: 300
        )
    )
}
There is a cosLimit parameter in CosTransform that is used inside the shortArray function. I will explain its purpose later; for now, let's ignore it and check the result of 3000 random points from the original image against our created vectors redCosArray, blueCosArray, and noneCosArray. To make this work, we need to create another DCT vector from a single point taken from the original image, in exactly the same way and using the same functions as for our red, blue, and none cos vectors. But how can we find which one this new vector belongs to? There is a very simple mathematical tool for it: the dot product. Since our task is to compare two vectors and find the most similar pair, the dot product will give us exactly this. If you apply the dot product to two identical vectors, it will give you a positive value greater than the dot product of that vector with any other vector of different values. And if you apply the dot product to orthogonal vectors (vectors that have nothing in common with each other), you will get 0 as a result. Taking this into consideration, we can come up with a simple algorithm:
1. Compute the dot product of the test vector with redCosArray, then with blueCosArray, and then with noneCosArray.
2. Pick the largest of the three results; the corresponding category is Red, Blue, or None.
The only missing functionality here is the dot product; let's write a simple function for it:
func dotProduct(_ first: [Double], _ second: [Double]) -> Double {
    guard first.count == second.count else { return 0 }
    var result: Double = 0
    for i in 0..<first.count {
        result += first[i] * second[i]
    }
    return result
}
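As a quick sanity check of the similarity intuition described above, here is a self-contained sketch (the three vectors are made-up illustration values, not real DCT output): the dot product of a vector with itself beats the dot product with a merely similar vector, and an orthogonal vector scores zero.

```swift
// Self-contained copy of the dot product helper, for the sanity check.
func dotProduct(_ first: [Double], _ second: [Double]) -> Double {
    guard first.count == second.count else { return 0 }
    var result: Double = 0
    for i in 0..<first.count {
        result += first[i] * second[i]
    }
    return result
}

let reference: [Double] = [0.9, 0.1, 0.4]    // made-up "trained" vector
let similar: [Double] = [0.8, 0.2, 0.5]      // points in roughly the same direction
let orthogonal: [Double] = [0.1, -0.9, 0.0]  // perpendicular to reference

let selfScore = dotProduct(reference, reference)       // highest score
let similarScore = dotProduct(reference, similar)      // high, but lower
let orthogonalScore = dotProduct(reference, orthogonal) // zero
```

This ranking (self, then similar, then orthogonal) is exactly what the classification step relies on.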
And here is an implementation of the algorithm:
var count = 0
while count < 3000 {
    let index = Int.random(in: 0 ..< allPoints.count)
    let point = allPoints[index]
    count += 1
    let testArray = math.shortArray(
        matrix: math.discreteCosTransform(
            for: [point],
            width: 300,
            height: 300
        )
    )
    let redResult = dotProduct(redCosArray, testArray)
    let blueResult = dotProduct(blueCosArray, testArray)
    let noneResult = dotProduct(noneCosArray, testArray)

    var maxValue = redResult
    var result: Category = .red
    if blueResult > maxValue {
        maxValue = blueResult
        result = .blue
    }
    if noneResult > maxValue {
        maxValue = noneResult
        result = .none
    }
    fillPoints.append(Point(x: point.x, y: point.y, category: result))
}
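Earlier I mentioned measuring the percentage of accuracy to verify the model's efficiency. Here is one hedged way to sketch that measurement; the Classified struct, the truth lookup table, and the accuracy helper are stand-ins of my own, not names from the project, since the real code would compare each classified point against the original image.

```swift
// Hypothetical types for the sketch; the real project would use its own
// Point/Category types and look categories up in the original image.
enum Category { case red, blue, none }

struct Classified {
    let x: Int
    let y: Int
    let category: Category
}

// Percentage of predictions whose category matches the ground truth.
func accuracy(predicted: [Classified], truth: [[Int]: Category]) -> Double {
    guard !predicted.isEmpty else { return 0 }
    let correct = predicted.filter { truth[[$0.x, $0.y]] == $0.category }.count
    return Double(correct) / Double(predicted.count) * 100
}

// Tiny made-up example: three predictions, two of them correct.
let truth: [[Int]: Category] = [[0, 0]: .red, [1, 0]: .blue, [2, 0]: .none]
let predicted = [
    Classified(x: 0, y: 0, category: .red),
    Classified(x: 1, y: 0, category: .blue),
    Classified(x: 2, y: 0, category: .red),
]
let percent = accuracy(predicted: predicted, truth: truth)
```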
All we need to do now is to draw an image from fillPoints. Let's take a look at the training points we've used, the DCT vectors we've created from our training data, and the end result we've got:
Well, it looks like random noise. But let's take a look at the visual representation of the vectors. You can see some spikes there; that's exactly the information we need to focus on to remove most of the noise from our DCT result. If we look at a simple visual representation of the DCT matrix, we will find that the most useful information (the part that describes the unique features of the image) is concentrated in the top left corner:
Now let's take a step back and check the shortArray function once again. We use the cosLimit parameter here precisely to take the top left corner of the DCT matrix and keep only the most active coefficients, the ones that make our vector unique.
func shortArray(matrix: [[Double]]) -> [Double] {
    let height = matrix.count
    guard let width = matrix.first?.count else { return [] }
    var array: [Double] = []
    for y in 0..<height {
        for x in 0..<width {
            if y + x <= cosLimit {
                array.append(matrix[y][x])
            }
        }
    }
    return array
}
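Since the y + x <= cosLimit condition keeps a triangular corner of the matrix, the exact number of surviving coefficients is a triangular number, somewhat more than the rough n²/2 estimate. A quick self-contained check (the keptCoefficients helper is mine, written just for this count):

```swift
// Count how many DCT coefficients survive the `y + x <= cosLimit` filter
// in a width x height matrix. The kept region is a triangle, so for small
// limits the count is (limit + 1)(limit + 2) / 2.
func keptCoefficients(cosLimit: Int, width: Int, height: Int) -> Int {
    var count = 0
    for y in 0..<height {
        for x in 0..<width where y + x <= cosLimit {
            count += 1
        }
    }
    return count
}

let keptFor30 = keptCoefficients(cosLimit: 30, width: 300, height: 300) // 496
let keptFor6 = keptCoefficients(cosLimit: 6, width: 300, height: 300)   // 28
```

Either way, we keep a few hundred coefficients at most out of the full 90,000.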
Let's create our math object with a different cosLimit:
let math = CosTransform(cosLimit: 30)
Now instead of using all 90,000 values, we will use only roughly 30 × 30 / 2 ≈ 450 of them (496, to be exact, since the kept region is a triangle) from the top left corner of the DCT matrix. Let's take a look at the result we've obtained:
As you can see, it's already better. We can also observe that most of the spikes that make the vectors unique are still located in the front part (selected in green in the picture). Let's try CosTransform(cosLimit: 6), which means we will use only roughly 6 × 6 / 2 = 18 values (28, counting the triangle exactly) out of 90,000, and check the result:
It's much better now, very close to the original image. However, there is one little problem: this implementation is slow. You don't need to be an expert in algorithmic complexity to realise that the DCT is a time-consuming operation, but even the dot product, which has linear time complexity, is not fast enough when working with large vectors backed by Swift arrays. The good news is that we can make it much faster, and simpler to implement, by using vDSP from Apple's Accelerate framework, which we already have as a standard library. You can read about vDSP here, but in simple terms, it's a set of methods for executing digital signal processing tasks very quickly. It has a lot of low-level optimisations under the hood that work perfectly with large data sets. Let's implement our dot product and DCT using vDSP:
import Accelerate

infix operator •
public func •(left: [Double], right: [Double]) -> Double {
    return vDSP.dot(left, right)
}

prefix operator ->>
public prefix func ->>(value: [Double]) -> [Double] {
    // vDSP.DCT(count:) returns nil for unsupported sizes, hence the guard.
    guard let setup = vDSP.DCT(count: value.count, transformType: .II) else { return [] }
    return setup.transform(value.map { Float($0) }).map { Double($0) }
}
To make it less tedious and more readable, I've wrapped these calls in custom operators. Now you can use them in the following way:
let cosRedArray = ->> redValues
let redResult = redCosArray • testArray
There is one problem with the new DCT implementation regarding our current matrix size: it won't work with our 300 x 300 image, as it's optimised for specific sizes that are powers of 2. Therefore, we need to put in some effort to scale the image before passing it to the new method.
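One minimal way to handle that restriction is to remap each point's coordinates onto a grid whose side is a supported size. This is only a sketch of the idea, assuming 256 as the target size; the rescale helper is my own name, not part of the project.

```swift
// Remap a coordinate from a source grid (e.g. 300 wide) onto a target grid
// (e.g. 256 wide), clamping to keep the result inside the target range.
func rescale(_ coordinate: Int, from source: Int, to target: Int) -> Int {
    let scaled = Int(Double(coordinate) * Double(target) / Double(source))
    return min(max(scaled, 0), target - 1)
}

let lastColumn = rescale(299, from: 300, to: 256) // last column stays last: 255
let midpoint = rescale(150, from: 300, to: 256)   // midpoint stays midway: 128
```

Nearby points may collapse onto the same target cell, but for this task that is harmless: duplicates merely reinforce the same DCT coefficients.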
Thanks to anyone who managed to read this far, or who was lazy enough to scroll through without reading. The purpose of this article was to show that many tasks people wouldn't consider solving with native instruments can in fact be solved with minimal effort. It's enjoyable to look for alternative solutions, so don't limit your mind to Python library integration as the only option for such tasks.