My better half teaches for the (IB) program at a nearby school. I had a previous encounter with their Math club, on the topic of “Math at work”. Back then, was focused on the roll-out of Scrum at scale, so I touched on Fibonacci numbers (used for ) and scratched the surface of queuing theory, , to model service time in a work queue. further mathematics International Baccalaureate work effort estimation M/M/1 queues in particular Fast forward a month ago, the further mathematics class completed a healthy introduction to Graph Theory, including Dijkstra’s algorithm and the traveling salesman problem. Students remembered the “Math at work” session and asked for a sequel on “Graphs at work”. Based on conversations at home, I fully expected wickedly smart kids to wander in that class, so this was bound to be fun — beyond the refreshing change of scenery. I published the code-like content to , including a where it made sense. Gitlab CD pipeline To warm up and gauge the audience, I started with an introduction to the usage of trees in general and file system trees in particular. From mounting IaaS data volumes to accessing secrets in Kubernetes, this is the kind of tree that IT and Software people keep climbing up and down all day long. You have to give it to the Math folks: they like clear concepts and unambiguous definitions — you can tell that they have been at it for a while. As says, Wikipedia A tree is an undirected graph that satisfies any of the following equivalent conditions: is and (contains no cycles). - G connected acyclic is acyclic, and a simple cycle is formed if any is added to . - G edge G is connected, but would become if any single edge is removed from . - G disconnected G is connected and the 3-vertex _K_3 is not a of . - G complete graph minor G - Any two vertices in can be connected by a unique . G simple path The Unix filesystem being a very academic example of a tree, a fuller system (such as an entire cloud) can be seen as a , where mounting operations allow users to stitch trees together. And bind-mounting operations let parts of the tree be transparently mirrored in convenient places. forest We imagined a data volume provided as a service on a cloud provider, containing the home, www and log directories. The OS of a virtual machine connected to the volume would identify it as a device under /dev, for example as /dev/xvdh on AWS EC2. Per , the data volume would be mounted under /media/vol. Getting the subtrees home, www, log back where we expect them in the filesystem tree is achieved by bind-mounting them, as shown in the figure below. FHS The resulting graph remains a tree, because home is accessible both under / and under /media/vol. However, interacting with the files and children directories under /home has exactly the same effects as interacting with them under /media/vol/home. The value of this model is that the piece of software running against this tree does not need to (nor wants to) know where all the bits and pieces of tree are coming from. Only that the files and directories it relies on will be there, always in the place. separately right The same argument holds in the world of containerization. A container sees its (typically ) filesystem as a tree with always the right structure, no matter where the container is actually running. Container orchestration services such as Kubernetes take care of mounting data in the tree. The current best practice is to use the same mechanism to inject in the containers, using the filesystem — knowing that Kubernetes handles these “secret” objects as such, and not as regular data volumes. namespaced volumes secrets I then to Google’s 20-year-old page ranking algorithm, and went on their now famous In order to measure the relative importance of web pages, we propose PageRank, a method for computing a ranking for every web page based on the graph of the web. There is undeniable elegance in the concept of ranking a web page based on the rank of pages linking to it. And quite some power, as this simple equation made sense of the web as we know it. The paper first defines as the number of links from a web page . The graph of the web is captured in a square matrix , where is not far from an adjacency matrix of sorts ( if there is an edge from to ). PageRank is kept in , a vector of ranks over the web pages. is a normalization factor, so that the total rank of all web pages is constant. The PageRank inventors showed that is an eigenvector of with eigenvalue c: . If we want to inject a bit of Physics in this: we can see PageRank as the vector that describes the modes of vibration of the web — and as its level. N_u u A A A_{u,v}=1/N_u u v R c R A R=cAR 1/c energy Even though there are probably a number of additional algorithms in place to tweak these equations in Google Search, we spent some time pondering the kind of pages that such an algorithm will favor. How to best describe these top pages? Are they the most… pages? Or the most… pages? Or is it just some kind of beauty contest, where the winners are the most pages, whatever resonates with people, no matter if the pages are popular because they are or, a lot more problematic, because they are the center of attention, for the best or worst reasons we can imagine? important truthful popular accurate Switching gear and getting lower in the IP stack, I graphs as used in IP networking in general, and in OSPF in particular. With the advent of in the Linux kernel, it is possible to operate a large number of independent routing tables in a single host, therefore making it easy to simulate relatively complex network behavior. I borrowed the from Edwin Cordeiro and fixed it for recent versions of Ubuntu (18.04 LTS) and Mininet. introduced namespacing very good work OSPF Network A single VM can simulate a decent network with several hosts and routers , with a very limited memory and CPU footprint. For a limited investment, we can enter the brave world of Link State Advertisement and Shortest Path First (as known as Dijkstra). (h010_*) (r010_*) We can connect to the Zebra daemon in random routers, and start by investigating how routing works for directly-connected hosts. $ sudo nsenter -n -t $(cat /tmp/zebra-r010_1.pid) telnet localhost zebra
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

Hello, this is Quagga (version 1.2.4).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

User Access Verification

Password:
r010_1> enable
Password:
r010_1# show ip route connected
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, P - PIM, A - Babel, N - NHRP,
       > - selected route, * - FIB route

C>* 10.0.0.0/30 is directly connected, r010_1-eth4
C>* 10.0.0.4/30 is directly connected, r010_1-eth5
C>* 10.1.0.0/24 is directly connected, r010_1-eth3
C>* 10.10.0.1/32 is directly connected, lo
C>* 10.255.0.0/30 is directly connected, r010_1-eth1
C>* 10.255.0.16/30 is directly connected, r010_1-eth2
C>* 127.0.0.0/8 is directly connected, lo
r010_1# What OSPF allows is for each router to know how to reach each of the networks advertised, i.e. to compute the next router that should be used to reach a remote network. In the example below, r010_1 learned from OSPF that it should send packets destined to 10.2.0.0/24 (a network behind r010_2) via its neighbor 10.255.0.2 through its eth1 interface. r010_1# show ip route ospf
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, P - PIM, A - Babel, N - NHRP,
       > - selected route, * - FIB route

O>* 10.2.0.0/24 [110/20] via 10.255.0.2, r010_1-eth1, 00:44:36
O>* 10.3.0.0/24 [110/20] via 10.255.0.2, r010_1-eth1, 00:44:36
O>* 10.4.0.0/24 [110/20] via 10.255.0.17, r010_1-eth2, 00:44:36
O>* 10.5.0.0/24 [110/20] via 10.255.0.17, r010_1-eth2, 00:44:41
O   10.10.0.1/32 [110/0] is directly connected, lo, 00:45:27
O>* 10.10.0.2/32 [110/10] via 10.255.0.2, r010_1-eth1, 00:44:37
O>* 10.10.0.3/32 [110/20] via 10.255.0.2, r010_1-eth1, 00:44:37
O>* 10.10.0.4/32 [110/20] via 10.255.0.17, r010_1-eth2, 00:44:37
O>* 10.10.0.5/32 [110/10] via 10.255.0.17, r010_1-eth2, 00:44:42
O   10.255.0.0/30 [110/10] is directly connected, r010_1-eth1, 00:44:47
O>* 10.255.0.4/30 [110/20] via 10.255.0.2, r010_1-eth1, 00:44:37
O>* 10.255.0.8/30 [110/30] via 10.255.0.2, r010_1-eth1, 00:44:37
  *                        via 10.255.0.17, r010_1-eth2, 00:44:37
O>* 10.255.0.12/30 [110/20] via 10.255.0.17, r010_1-eth2, 00:44:42
O   10.255.0.16/30 [110/10] is directly connected, r010_1-eth2, 00:44:42 We end up with end-to-end connectivity, a key property of IP networks that is never overvalued. Yes, IPv4 NAT, I am targeting you. We can then examine end-to-end paths across the network: mininet> h010_51 traceroute h010_31
traceroute to 10.3.0.2 (10.3.0.2), 30 hops max, 60 byte packets
 1  10.5.0.1 (10.5.0.1)  0.060 ms  0.005 ms  0.004 ms
 2  10.255.0.13 (10.255.0.13)  0.015 ms  0.005 ms  0.005 ms
 3  10.255.0.9 (10.255.0.9)  0.017 ms  0.007 ms  0.006 ms
 4  10.3.0.2 (10.3.0.2)  0.024 ms  0.008 ms  0.007 ms Welcome in the magic of distributed systems, where nodes eventually reach a consensus on how they should behave individually so that the overall system performs the desired end-to-end routing function. As much as Dijkstra’s algorithm allows to find the shortest path in a finite search space, I could not resist and introduced the problem of in AI planning. The (hey, another kind of extremely important tree) contains a collection of tools for working with planning domains. Its very good on-line editor lets us import the tower of Hanoi domain, which contains a single action: searching a plan .domains Internet TLD (define (domain hanoi)
 (:requirements :strips)
 (:predicates (clear ?x)
              (on ?x ?y)
              (smaller ?x ?y))

 (:action move
  :parameters (?disc ?from ?to)
  :precondition (and
        (smaller ?to ?disc) 
        (on ?disc ?from) 
        (clear ?disc) 
        (clear ?to))
  :effect (and
        (clear ?from) 
        (on ?disc ?to) 
        (not (on ?disc ?from)) 
        (not (clear ?to)))
 )
) The domain is expressed in a -based language called (Planning Domain Definition Language). LISP PDDL The domain tells us that the objects we manipulate can be clear (they have no object on top of them), they can be placed on of top one of the other and that they are ordered via a smaller relation. The domain allows our world to be changed by moving discs, from a position to another one. There are four conditions to be allowed to move a disc from being on top of an object to being on top of another object: the disc needs to be smaller than the object it is moved to the disc needs to be currently on the object it is moving from the disc needs to be currently clear of any object the object it is moving to needs to be currently clear as well Solution to the tower of Hanoi problem with 4 discs (animation from Wolfram MathWorld) There are four effects of moving that disc: the disc we are moving from is now clear the disc is now on top of the disc we are moving to the disc is no longer on top of the disc we are moving from the disc we are moving to is no longer clear The problem definition lists all the objects we have in our world, along with their respective size and position. For example, with four discs: (define (problem hanoi-4)
 (:domain hanoi)
 (:objects peg1 peg2 peg3 d1 d2 d3 d4 )
 (:init
  (smaller peg1 d1)
  (smaller peg1 d2)
  (smaller peg1 d3)
  (smaller peg1 d4)
  (smaller peg2 d1)
  (smaller peg2 d2)
  (smaller peg2 d3)
  (smaller peg2 d4)
  (smaller peg3 d1)
  (smaller peg3 d2)
  (smaller peg3 d3)
  (smaller peg3 d4)
  (smaller d1 d1)
  (smaller d2 d1)
  (smaller d3 d1)
  (smaller d4 d1)
  (smaller d2 d2)
  (smaller d3 d2)
  (smaller d4 d2)
  (smaller d3 d3)
  (smaller d4 d3)
  (clear peg2)
  (clear peg3)
  (clear d1)
  (on d4 peg1)
  (on d3 d4)
  (on d2 d3)
  (on d1 d2)
 )
 (:goal
  (and 
   (on d4 peg3)
   (on d3 d4)
   (on d2 d3)
   (on d1 d2)
  )
 )
) Our four-discs problem starts by defining the three pegs and four discs, all of them being objects. It then goes on to define the initial state of the world. The pegs enter in the same order relation as discs: every disc is smaller than all pegs. Then d1 is smaller than all the discs included itself; d2 is smaller than itself, d3 and d4, d3. peg2 and peg3 are clear, while d4 is on peg1, d3 on d4, d2 on d3 and d1 on d2. The goal state is top have d4 on peg3, d3 on top of d4, d2 on d3 and d1 on d2. We can have a planner calculate the plan, i.e. the sequence of actions that will get us from the initial state to the goal state. (move d1 d2 peg2)
(move d2 d3 peg3)
(move d1 peg2 d2)
(move d3 d4 peg2)
(move d1 d2 d4)
(move d2 peg3 d3)
(move d1 d4 d2)
(move d4 peg1 peg3)
(move d1 d2 d4)
(move d2 d3 peg1)
(move d1 d4 d2)
(move d3 peg2 d4)
(move d1 d2 peg2)
(move d2 peg1 d3)
(move d1 peg2 d2) The planner invites us to move the disc d1 from being on top of d2 to being on top of peg2. Then, we should move the disk d2 from d3 to peg3, etc. Much of the power of this technology comes from its ability to replan if something goes wrong at some point during plan execution and we are left in-between the initial state and the goal state. The planner can reason based on a new initial state of the world, keep the previous goal, and devise a new plan that will let us reach the goal. , we can get the planner to show us the graph of the search space that it considered when looking for a solution. The links between the actions considered are called causal links. The plan, the sequence of actions that will get us from the initial state of the world to the goal state, can be derived without ambiguity from the set of causal links. By using a custom planner Search space We notice in particular that this planner starts by the goal state, and backtracks to the initial state. Beyond this toy example, AI planning is used to solve harder, real-world planning problems whose domain representation can include functions (floating-point numbers), require the planner to understand time or even reason about continuous change in functions. When the search state becomes large, visualizing it as a graph helps finding potential plateaus where the planner struggles to progress in its search. These plateaus can be eliminated by tweaking either the heuristic that the planner uses, or the domain definition. To cool down a bit, we with Directed Acyclic Graphs (DAG) as used in analytics workflows. Apache Airflow, or its Google-managed version Cloud Composer, allows to . Google’s example shows a very linear graph (create a Dataproc cluster, run a Hadoop job, delete the Dataproc cluster), so we expended it a bit: finished programmatically author, schedule and monitor workflows quickstart create_dataproc_cluster >> run_dataproc_hadoop >> delete_dataproc_cluster >> victory_bash
create_dataproc_cluster >> keep_busy >> victory_bash To a fairly large extent, the graph that we are specifying here manually as an input to Airflow can be compared to the output of an AI planner that we saw earlier. In our Airflow use case, the developer spells out the plan and Airflow executes it. In robotic terms, Airflow acts as a , i.e. executes the plan that some intelligence created to solve a problem. In the AI planning use case we saw earlier, the (human) domain modeler defines the individual actions (through their prerequisites and effects), the initial state of the world and its goal state, and the AI searches for the plan. See slide 12 of by Gerhard Wickler, U. Ediburgh to see where the two pieces fit in a robotic architecture. controller planner Automated Planning Getting back to our (robotic) : Airflow will draw the graph and walk through execution, specifying the execution status of each node. controller DAG for our simple workflow In order to be able to show the execution status, run after run, Airflow decomposes the DAG into a tree, walking backwards from the final operator: Tree representation Airflow can also represent the execution of a single run as a Gantt chart. The Gantt chart representation offers the ability to identify the critical path, i.e. the tasks that, if they were shorter, would reduce the total duration of the run. Gantt Chart Representation Full-fledged Gantt charts also typically specify constraints between tasks. They are relatively simple in the case of Airflow, with constraints in term of execution time and success of the preceding operator. In a robot, they can include invariants, failure conditions, all the way to operational constraints such as timeouts, minimum or maximum time between tasks starting up or ending. We reached the end of our quick through graphs. The mathematics were extremely light, but hopefully the students left convinced that graphs are an indispensable tool, in computer science as much as in Software Engineering or in Information Technology. <a href="https://medium.com/media/3c851dac986ab6dbb2d1aaa91205a8eb/href">https://medium.com/media/3c851dac986ab6dbb2d1aaa91205a8eb/href</a>

The Graph

Apache

Graphs at Work. At school. And in other places, too.

About Author

Comments

TOPICS

THIS ARTICLE WAS FEATURED IN

Related Stories

Shifting Containers Left: On the Quest for Reproducible Development Environments

Airflow, Meta Data Engineering, and a Data Platform for the World’s Largest Democracy

Apache Airflow And Its Contribution to Enterprise Data Integration

Apache Airflow: Is It a Good Tool for Data Quality Checks?

Automating a Machine Learning Workflow using Google BigQuery and Amazon Managed Apache Airflow

Data Pipelines: OpenWeatherMap-Airflow [A How-To Guide]

Shifting Containers Left: On the Quest for Reproducible Development Environments

Airflow, Meta Data Engineering, and a Data Platform for the World’s Largest Democracy

Apache Airflow And Its Contribution to Enterprise Data Integration

Apache Airflow: Is It a Good Tool for Data Quality Checks?

Automating a Machine Learning Workflow using Google BigQuery and Amazon Managed Apache Airflow

Data Pipelines: OpenWeatherMap-Airflow [A How-To Guide]

Light-Mode

Classic

Newspaper

Minty

Dark-Mode

Neon Noir

Minty

HN StartUps