Percy Jaiswal

@percy.jaiswal

Defense Against Dark Arts: The Mirror of Erised

December 21st 2018
The Mirror of Erised from Harry Potter and the Sorcerer’s Stone

As in the Harry Potter saga, we are surrounded by unknown Death Eaters (read: hackers) who are trying to attack poor Potter (read: client / data) for one reason or another. Sometimes they want data (like the Sorcerer’s Stone, or the Prophecy in Order of the Phoenix), and sometimes they just want to kill / corrupt Potter to fend off a threat or competition. In either case, we need to protect our data and Deep Learning model from such Dark Attacks.

A regular Deep Learning model is very prone to attacks because it usually collects a huge amount of data on a centralized server, where the model is trained. This is called centralized learning, since learning occurs on a central server. With a technique called Federated Learning, instead of bringing the data to the model, we bring the model to the training data.
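To make "bringing the model to the data" concrete, here is a minimal sketch of one common federated scheme, federated averaging, in plain Python. It is not the PySyft API and not what the example later in this post does; the client names, the toy datasets and the functions `local_gradient` and `federated_round` are all made up for illustration. Each client computes a gradient on its own data, and only that gradient travels to the server.

```python
# Toy federated averaging in plain Python (illustrative only, not PySyft).
# Model: y = w * x, trained with mean squared error.

def local_gradient(w, data):
    """Gradient of the mean squared error for y = w * x on one client's data."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def federated_round(w, clients, lr=0.01):
    """Send w to every client, collect one gradient from each, average, update."""
    grads = [local_gradient(w, data) for data in clients.values()]
    return w - lr * sum(grads) / len(grads)


# Hypothetical private datasets; both follow y = 2x and stay with their client.
clients = {
    "bob": [(1.0, 2.0), (2.0, 4.0)],
    "alice": [(3.0, 6.0), (4.0, 8.0)],
}

w = 0.0
for _ in range(200):
    w = federated_round(w, clients)

print(round(w, 3))
```

Both toy datasets follow y = 2x, so `w` should settle close to 2 even though the server-side code only ever handles gradients, never an (x, y) pair.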

In this post, I will walk you through a small step in Federated Learning using an example from the PySyft library. Specifically, we will see how an end client called ‘Bob’ can have its data used by a centralized server without ever actually having to share it. This will become clearer as we walk through the example.

We will use two Jupyter Notebooks for this tutorial. The first belongs to ‘Bob’ (Bob.ipynb): it prepares Bob’s data, gives that data IDs through which the centralized server can obtain pointers to it, and finally opens a connection on which Bob can receive requests from the server. The second runs on the centralized server (Model.ipynb): it first connects to Bob’s socket, then gets pointers to the actual data, and finally runs its algorithm using only those pointer variables.

Bob.ipynb

##### Imports #####
import syft as sy
hook = sy.TorchHook()  # create the hook that lets PySyft work with PyTorch tensors
# Arguments passed to SocketWorker below:
#   id: name of this worker, "Bob"
#   port: port on which Bob listens, 3000
#   is_pointer: False here, since this worker is Bob itself; Model.ipynb
#     passes is_pointer=True when it creates its reference to Bob
#   is_client_worker (bool, optional): whether this worker is associated
#     with an end-user client. If so, the client is assumed to control when
#     tensors/variables/models are instantiated or deleted, instead of the
#     worker handling that lifecycle internally
#   verbose (bool, optional): whether to print events to stdout
hook.local_worker = sy.SocketWorker(hook=hook,
                                    id="Bob",
                                    port=3000,
                                    is_pointer=False,
                                    is_client_worker=False,
                                    verbose=True)
sy.local_worker = hook.local_worker
# Create two sample PyTorch tensors for our demonstration
import torch
x = torch.FloatTensor([1, 2, 3, 4, 5])
y = torch.FloatTensor([1, 1, 1, 1, 1])

Check the list of objects registered with Bob. This is an easy way to verify that the tensors we just created are properly registered to Bob.

hook.local_worker._objects

Output:
{38610448342: [_LocalTensor — id:38610448342 owner:Bob],
 51021230807: [_LocalTensor — id:51021230807 owner:Bob]}

The result shows that the x and y tensors we created are local tensors owned by Bob.

# Set the IDs by which a remote worker can request pointers to our tensors
x.set_id("#X")
y.set_id("#Y")
# Start listening for commands and wait for the magic to happen!
hook.local_worker.listen()
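
The bookkeeping Bob does above (register tensors under generated ids, rename them, then answer lookups) can be pictured with a small plain-Python sketch. `ToyWorker` and its methods are hypothetical stand-ins written for illustration, not PySyft classes, and a real worker does much more than this.

```python
import random


class ToyWorker:
    """Hypothetical stand-in for a worker; not the PySyft SocketWorker."""

    def __init__(self, name):
        self.name = name
        self._objects = {}  # id -> object owned by this worker

    def register(self, obj):
        """Store obj under a fresh random integer id and return that id."""
        obj_id = random.randint(0, 10**11)
        while obj_id in self._objects:
            obj_id = random.randint(0, 10**11)
        self._objects[obj_id] = obj
        return obj_id

    def set_id(self, old_id, new_id):
        """Re-key an object so remote peers can ask for it by a known name."""
        self._objects[new_id] = self._objects.pop(old_id)

    def search(self, ids):
        """Return the requested ids that this worker actually holds."""
        return [i for i in ids if i in self._objects]


bob = ToyWorker("Bob")
x_id = bob.register([1.0, 2.0, 3.0, 4.0, 5.0])
y_id = bob.register([1.0, 1.0, 1.0, 1.0, 1.0])
bob.set_id(x_id, "#X")
bob.set_id(y_id, "#Y")

print(sorted(bob._objects))      # ['#X', '#Y']
print(bob.search(["#X", "#Z"]))  # ['#X']
```

Note that `search` in this sketch hands back ids only; the lists themselves stay inside `bob._objects`.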

Model.ipynb

# Give the server's own worker an id and a port
import syft as sy
hook = sy.TorchHook(local_worker=sy.SocketWorker(id=0, port=3001))
# Connect to Bob
remote_client = sy.SocketWorker(hook=hook,
                                id="Bob",
                                port=3000,
                                is_pointer=True)
hook.local_worker.add_worker(remote_client)
# Get pointers to the actual data, using the IDs Bob assigned
x_set = remote_client.search(["#X"])
y_set = remote_client.search(["#Y"])

If you inspect either x_set or y_set, you will see output like

[FloatTensor[_PointerTensor — id:22003035565 owner:0 loc:Bob id@loc:#Y]]

This means that y_set is a list holding a “PointerTensor” (not the actual value of Y): you are the owner of this pointer (owner:0), but the “location” of the actual data is “Bob”, where its ID is #Y.
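
One way to picture what such a pointer carries is a small record with exactly the four fields shown in the output. The sketch below is plain Python with a hypothetical `ToyPointer` class; the field names are my own labels for the printed values, and this is not how PySyft implements `_PointerTensor`.

```python
from dataclasses import dataclass


@dataclass(frozen=True, repr=False)
class ToyPointer:
    """Hypothetical pointer record: it stores where data lives, never the data."""
    id: int               # id of this pointer on the owner's side
    owner: int            # worker that holds the pointer (the server, id 0)
    location: str         # worker that holds the real tensor
    id_at_location: str   # id of the real tensor on that worker

    def __repr__(self):
        return (f"[ToyPointer - id:{self.id} owner:{self.owner} "
                f"loc:{self.location} id@loc:{self.id_at_location}]")


y_ptr = ToyPointer(id=22003035565, owner=0, location="Bob", id_at_location="#Y")
print(y_ptr)
```

Nothing in the record contains the tensor's values, which is why printing it can only ever show metadata.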

# Perform Tensor Operation on actual data using pointer variables
import torch
z = torch.add(x_set[0],y_set[0])
z

Output:
FloatTensor[_PointerTensor — id:12443450818 owner:0 loc:Bob id@loc:30395812931]

Just as with y_set, inspecting z shows only a pointer: the result of the addition lives on Bob’s side (loc:Bob), and z merely points to it.

# To get actual value of z
z.get()

While you perform these operations on Bob’s data, Bob receives a notification such as “Received Command From: (‘127.0.0.1’, 54484)” on its end every time you access its data. A log file is also generated.
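
The whole round trip (add through pointers, get a new pointer back, call get(), and leave a trace on Bob's side) can be mimicked in a few lines of plain Python. `ToyOwner` and `ToyPointer` are hypothetical classes invented for this sketch, there is no networking involved, and dropping the value after `get()` is a choice made for this sketch rather than a statement about PySyft.

```python
import itertools


class ToyOwner:
    """Hypothetical data owner: holds real values and runs commands locally."""

    def __init__(self, name):
        self.name = name
        self._objects = {}
        self._ids = itertools.count(1)
        self.log = []  # every command received, kept for auditing

    def store(self, value, obj_id=None):
        obj_id = obj_id if obj_id is not None else next(self._ids)
        self._objects[obj_id] = value
        return obj_id

    def execute_add(self, id_a, id_b):
        """Add two stored vectors and return only the id of the result."""
        self.log.append(("add", id_a, id_b))
        a, b = self._objects[id_a], self._objects[id_b]
        return self.store([p + q for p, q in zip(a, b)])

    def send_back(self, obj_id):
        """Hand the value over and forget it (a choice made for this sketch)."""
        self.log.append(("get", obj_id))
        return self._objects.pop(obj_id)


class ToyPointer:
    """Hypothetical pointer: knows the owner and the remote id, not the value."""

    def __init__(self, owner, obj_id):
        self.owner, self.obj_id = owner, obj_id

    def __add__(self, other):
        # The addition runs on the owner; the caller only gets a new pointer.
        result_id = self.owner.execute_add(self.obj_id, other.obj_id)
        return ToyPointer(self.owner, result_id)

    def get(self):
        return self.owner.send_back(self.obj_id)


bob = ToyOwner("Bob")
x_ptr = ToyPointer(bob, bob.store([1.0, 2.0, 3.0, 4.0, 5.0], "#X"))
y_ptr = ToyPointer(bob, bob.store([1.0, 1.0, 1.0, 1.0, 1.0], "#Y"))

z_ptr = x_ptr + y_ptr  # still a pointer; no values have moved yet
z = z_ptr.get()        # only now does a value leave Bob
print(z)               # [2.0, 3.0, 4.0, 5.0, 6.0]
print(bob.log)         # [('add', '#X', '#Y'), ('get', 1)]
```

Bob's log records both the addition and the retrieval, so the data owner can see exactly which commands touched its data.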

And that’s it. In this small example, we saw how PySyft can be used to get pointer variables to Bob’s data, perform operations on it (in a real setting, running Deep Learning models) and get the results. This is a big step from a privacy standpoint: at no point in this exchange did the server handle Bob’s raw tensors x and y directly; it worked only with pointers and the final result.

In hindsight, our program above is much like the Mirror of Erised from the first movie: it will give you what you desire... as long as you don’t (mis)use it!

So that’s it for now. If you have stayed with me this far, I encourage you to go to the PySyft GitHub page and take a deep dive into it. You can find the complete code for both Jupyter notebooks at my GitHub repo here.

As usual, if you liked my article, show your appreciation with likes and comments. You can also find me and my other articles on Twitter.

Till next time….cheers!!
