Dynamic processes on networks

Suppose we have these ship trajectories, the first two ships are going back and forth between port A and port B. The third ship goes between port A and port C. We can construct both the first-order network and the higher-order network from these trajectories.

Then based on the network structure, we predict where the ship is going. On the first-order network, if a ship used to go between a and b, it is not guaranteed to go back to b again. But on the higher-order network, if a ship goes from b to a, according to the network structure, it will go back to b for sure.

We can compare the predicted trajectories with the testing set, and know how accurately movements are simulated on these two networks. If we use HON, the accuracy is doubled compared with the first order network when simulating the next step. In comparison, the fixed 2nd order network is not as accurate as HON, but the network size is already larger than that of HON. When simulating multiple steps (for example, PageRank can be seen as implicit random walkers with multiple steps), the advantage of HON is even larger.

Note that we are not attempting to perform "next step prediction"; rather, the accuracies for simulating the next step(s) is a measurement of the quality of representation, in terms of how well the network can help reproduce the true movement patterns in the raw data.

Clustering

For the control of species invasion through the global shipping network

The estimated damage and control costs of invasive species in the US are $120 billion per year, which is an invisible tax that everyone has to pay. One of the major vectors of species introduction is through global shipping; species may be taken up with ballast water at one location, and discharged at another.

The question is: can we discover the species flow patterns by studying the global shipping network, in order to inform the policy makers to devise effective invasive species control policies?

Clustering methods based on flow dynamics, such as Infomap [Rosvall et al.] and Walktrap [Pons], can help us identify clusters of ports that are loosely connected to each other by shipping traffic. Controlling the connections between clusters can prevent species propagation from one cluster of ports to another.

If we do clustering on the first-order network, the nodes shown appear to be equivalent. In reality, given ship movements, species are potentially more likely to be introduced to Los Angeles indirectly from Shanghai. Clustering on HON will automatically take this into account and cluster Tokyo and Seattle together.

 

Clustering on first-order network

If we cluster the ports on the first-order network, it shows reasonable results, where different colors represent different clusters. Ports geographically close to each other are likely to bring invasive species to each other.

We see that for Malta, a small country in the Mediterranean, these two ports belong to the same Mediterranean cluster.

Clustering on HON

On the higher-order network, however, clustering shows that Malta Freeport belongs to as many as five clusters, very different from Valletta, even they are geographically close to each other. The reason is that Valletta is a small local port, while Malta Freeport is an international port, with lots of traffic and is potentially susceptible to multiple sources of invasions.

Clustering on HON is able to reflect such differences, with no changes to the clustering algorithm itself. The ability to capture indirect species invasions is critical for devising effective species control policies.

Ranking

For modeling Web surfing behavior

Given the same clickstreams of Web pages, one could build a first-order network as shown. According to the network structure, after a user views or uploads a photo, the user is very likely to go back to the home page.

The higher-order network representation of the same data, however, shows additional information. It keeps the nodes and connections of the first-order network, but in addition, it adds to other higher-order nodes, representing the natural scenario that once a user uploads and views a photo, the user is very likely to keep uploading more photos rather than going back to the home page.

 

Given that PageRank is essentially random walks with resets on a network, the random walker is more likely to get trapped in the strong loop in HON, leading to differences in PR scores. More than 90% pages lose PageRank scores, while a few pages gain significant scores.

In the raw data, we see weather forecast and obituaries gain the most scores on HON, which is natural considering the data set is from a news company. By using the HON, PageRank is able to simulate complex user browsing behaviors, with no changes to the PageRank algorithm.

 

 

And more applications