How shall we build the network from big data,

without losing important information about higher-order dependencies?

Network-based representation has quickly emerged as the norm in representing rich interactions among the components of a complex system for analysis and modeling: movement of hundreds of thousands of ships form a global shipping network, powering the transportation and economy while inadvertently translocating invasive species; interactions of billions of people on social networks, facilitating the diffusion of information.

Given the ship trajectories, to construct a global shipping network, the conventional approach is to count the number of voyages between port pairs as edge weights in the network.

### First-order network

While ships from Shanghai to Singapore are in fact more likely to go to Los Angeles according to the raw data, the conventional first-order network captures only *pairwise* connections in the data and neglects *higher-order dependencies*, therefore subsequent network analyses that rely on the network structure can potentially lead to incorrect results or conclusions.

### Higher-order network

The higher-order network preserves higher-order dependencies in the network structure, therefore movements simulated on the network more accurately reflects the true movement patterns in the raw data.

### Methods in a nutshell

What are the advantages of using HON

instead of the conventional first-order network or the fixed second-order network?

### Accuracy

HON is more accurate in representing the true movement patterns in data in comparison with the conventional first-order network or the fixed second-order network, because the higher-order nodes and edges in HON can provide more detailed guidance for simulated movements.

### Scalability

HON is more compact than fixed-order networks by using variable orders of dependencies, embedding higher-order dependencies only when necessary. Thus, not only is HON scalable for big data, but also network analysis algorithms run faster on HON.

### Compatibility

The data structure of HON is consistent with the conventional network representation, so existing network analysis methods can be applied directly without being modified.

### Broad Influence

Existing network analyses either simply ignore higher-order patterns in the data and use the first-order network representation (which is inaccurate), or modify existing algorithms to incorporate higher-order movement patterns (which do not generalize to all algorithms).

The higher-order network resolves this dilemma, by fundamentally improve the network itself, such that network algorithms, without any modification, will be aware of higher-order dependencies.