Table of Links Abstract and 1. Introduction Abstract and 1. Introduction 1.1 Our Contribution 1.1 Our Contribution 1.2 Setting 1.2 Setting 1.3 The algorithm 1.3 The algorithm Related Work Algorithm 3.1 The Structural Decomposition Phase 3.2 The Routing Phase 3.3 Variants of WormHole Theoretical Analysis 4.1 Preliminaries 4.2 Sublinearity of Inner Ring 4.3 Approximation Error 4.4 Query Complexity Experimental Results 5.1 WormHole𝐸, WormHole𝐻 and BiBFS 5.2 Comparison with index-based methods 5.3 WormHole as a primitive: WormHole𝑀 Related Work Related Work Related Work Algorithm 3.1 The Structural Decomposition Phase 3.2 The Routing Phase 3.3 Variants of WormHole Algorithm 3.1 The Structural Decomposition Phase 3.1 The Structural Decomposition Phase 3.2 The Routing Phase 3.2 The Routing Phase 3.3 Variants of WormHole 3.3 Variants of WormHole Theoretical Analysis 4.1 Preliminaries 4.2 Sublinearity of Inner Ring 4.3 Approximation Error 4.4 Query Complexity Theoretical Analysis 4.1 Preliminaries 4.1 Preliminaries 4.2 Sublinearity of Inner Ring 4.2 Sublinearity of Inner Ring 4.3 Approximation Error 4.3 Approximation Error 4.4 Query Complexity 4.4 Query Complexity Experimental Results 5.1 WormHole𝐸, WormHole𝐻 and BiBFS 5.2 Comparison with index-based methods 5.3 WormHole as a primitive: WormHole𝑀 Experimental Results Experimental Results 5.1 WormHole𝐸, WormHole𝐻 and BiBFS 5.1 WormHole𝐸, WormHole𝐻 and BiBFS 5.2 Comparison with index-based methods 5.2 Comparison with index-based methods 5.3 WormHole as a primitive: WormHole𝑀 5.3 WormHole as a primitive: WormHole𝑀 References References 5.1 WormHole𝐸, WormHole𝐻 and BiBFS Query Cost. To examine query cost, we look at the number of vertices seen by the algorithm over the first 5000 inquiries and compare it to BiBFS. We direct the reader to Figure 2(b) for a summary of the query cost for our larger graphs, and Figure 5 for the same in the smaller ones. Consistently across all graphs, BiBFS quickly views between 70% and 100% of the vertices in just a few hundred inquiries. In comparison, the query cost of WormHole is quite small for all networks: in the smaller networks, we see less than 30% of the vertices even after 5000 inquiries, and in the larger ones this number is less than 10% (in the largest ones, wikipedia and soc-twitter, it is <2%). Query Cost. Accuracy. We consider two error measures, absolute (additive) and relative (multiplicative). Clearly, an additive error of, say, 2, is less preferable for shorter paths than for longer ones. The relative error 𝑟𝑒 (𝑠, 𝑡) is more refined, as it measures the error with respect to the actual distance of the pair at question. Formally, Accuracy The other key accuracy statistic is additive error: this is summarized in Table 3. For WormHole𝐸 across almost all networks, the vast majority of the pair inquiries are estimated perfectly. Our worst performance is on soc-pokec and soc-live. Even there, we have perfect estimates for 60% of vertices, and over 94% of vertices have an additive error of less than 1. In all networks, more than 99% of the pairs are estimated with absolute error lesser or equal to 2 in WormHole𝐸. The accuracy is poorer in WormHole𝐻 (recall that in this variant we do not compute all-pairs shortest paths in the core, resorting to an approximate heuristic instead). However, we note that even then, in most graphs, we have an additive error of at most 2 in over 99% of the queries, and over 90% for all graphs. 5.2 Comparison with index-based methods As discussed, index construction allows for much faster inquiry times, but setup times that can take up to several hours even for relatively small-sized graphs. We attempt to benchmark our algorithm against two state of the art methods, PLL and MLL. PLL solves the easier task of finding distances, while MLL does explicit shortest path construction (the authors note that PLL may be extended to output paths, but no code is publicly available). However, once the graphs hit a few million vertices, these methods either take too long to run the setup, or even if they succeed in index construction, they may be too large to load into memory. We summarize the results in Table 5, and expand on the discussion in the following paragraphs. Mean inquiry time. In the cases where index based methods do succeed (limited to graphs with fewer than 30 million edges), they have a clear advantage in per inquiry cost. Their typical inquiry time is in the microsecond range, where PLL is faster since it only computes distances, and MLL is about 3 times slower than PLL. Mean inquiry time. Setup cost. In terms of setup times, both WormHole𝐸 and WormHole𝐻 have a massive advantage over the index-based methods. We direct the reader to Table 4 and Table 5 for a comparison of setup times: we let all methods run for 12 hours, and terminate if they do not finish in that time. Both PLL and MLL failed to complete setup for any graph with more than 30 million vertices. In comparison, even for our largest graph, soc-twitter, of over a 1.5 billion edges, the setup time for WormHole𝐸 is just minutes. Even in the cases where these methods do terminate, the storage footprint is massive. We observed that if allowed to run, MLL completes index construction on soc-pokec in a little under 24 hours, but the constructed files are almost a combined 45 gigabytes in size; in comparison, the input COO file is only 250 megabytes! A detailed comparison Setup cost of the space footprint is given in Figure 2 in §1, and the time comparisons can be found in Figure 7, and in Table 5. When is indexing better? A running time comparison: Given the very high setup time of index-based methods, WormHole𝐸 has a head start.However,index-based solutions do catch up and outdo WormHole after sufficiently many shortest path inquiries. We quantify this threshold to give the reader a sense of when to favor each approach. We refer to the threshold as Breakeven (BE𝑋 for a competing method𝑋) and provideits values for different graphs in Table 5. Since the index-based methods have setup times between 106 and 109 orders of magnitude higher than the inquiry times, we look at how many inquiries are needed for them to have an average gain over WormHole𝐸 in the net time taken. Formally, we define breakeven for PLL(likewise MLL) as: When is indexing better? A running time comparison We can similarly compute the breakeven with respect to WormHole𝐻, but we note that it is already quite high even against the much slower version WormHole𝐸. This implies that even with the low time per inquiry, the setup cost is so prohibitively high that the gains take hundreds of thousands to even millions of inquiries to set in, even on the small networks. Authors: (1) Talya Eden, Bar-Ilan University (talyaa01@gmail.com); (2) Omri Ben-Eliezer, MIT (omrib@mit.edu); (3) C. Seshadhri, UC Santa Cruz (sesh@ucsc.edu). Authors: Authors: (1) Talya Eden, Bar-Ilan University (talyaa01@gmail.com); (2) Omri Ben-Eliezer, MIT (omrib@mit.edu); (3) C. Seshadhri, UC Santa Cruz (sesh@ucsc.edu). This paper is available on arxiv under CC BY 4.0 license. This paper is available on arxiv under CC BY 4.0 license. available on arxiv available on arxiv