The NAS research section invites MSc students to join our group, and do exciting research on the forefront of Network Science. If you are interested in one of these projects, contact us by e-mailing Maksim Kitsak at** **This email address is being protected from spambots. You need JavaScript enabled to view it.**.**

If you already have **your own project** in mind, **propose it to us**! We will advise you on **feasibility**, and judge whether there’s a **match**.

**See the instructions for the information on how to apply for a MSc thesis at the NAS Group.**

**Do NOT email Prof. Van Mieghem directly.**

## Linear Clustering Process on Networks

Graph clustering is a fundamental operation on networks, as it reveals network communities. Groups of nodes that dominantly share links while a minority of links are shared with other nodes in the network are called communities or clusters [1]. However, a formal definition of a cluster does not exist. In addition, the underlying clustering structure of a real-world network is not known. The most commonly used quality function for a given graph partition is the Newman modularity, proposed in [2].

We have proposed a linear clustering process on the network that embeds the nodes in a one-dimensional space and governs their position in time so that nodes from the same cluster are grouped on a line while being relatively far away from other nodes. The proposed clustering process outperforms the most popular clustering methods in the literature, such as the Louvain method [4] and the Newman method [5], while possessing a comparable computational complexity.

The student is expected to

- Understand the most popular clustering algorithms, such as those in [4,5]
- Understand the proposed Linear Clustering Process in [3]
- Implement various clustering algorithms in Matlab and/or Python
- Perform graph partitioning on various real-world networks
- Assess the clustering performance of the proposed linear clustering process by comparing it to the considered clustering algorithms from the literature
- Write a thesis report about the findings

This project is supervised by Ivan Jokić, a PhD researcher in the NAS group.

- Fortunato, Community detection in graphs, Physics reports 486, 75 (2010).
- E. Newman and M. Girvan, Finding and evaluating community structure in networks, Physical review E 69, 026113 (2004).
- Linear Clustering Process on Network, submitted (2022)
- D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, Fast unfolding of communities in large networks, Journal of statistical mechanics: theory and experiment 2008, P10008 (2008).
- E. Newman, Modularity and community structure in networks, Proceedings of the national academy of sciences 103, 8577 (2006).

## Forecasting the spread of COVID-19 using Bayesian Model Averaging

The ongoing COVID-19 pandemic, which originates from the Chinese province Wuhan in November 2019, has impacted the world severely. Of primary concern is a detailed investigation of the impact of certain measures to restrict the spread of the virus. We try to forecast, based on the available number of reported cases per day per region, the future number of infected cases. We developed the Network-Inference-based Prediction Algorithm (NIPA), which forecasts the number of cases in two steps [1]. First, it reconstructs the contact network based on the given daily number of reported cases. Second, it forecasts the number of future cases using an SIR model.

We have evaluated the performance of NIPA and several other forecast algorithms on COVID-19 in several countries [2]. Unfortunately, all considered prediction algorithms only consider point forecasts. In this project, we will use Bayesian Model Averaging (BMA) [3], which is a statistical method that considers the weighted average of several separate algorithms. The weight of each model is determined by looking at the previous forecast accuracy of that model. We compute BMA for two reasons: (i) to compute more accurate forecasts than the individual forecast algorithms and (ii) to provide prediction intervals rather than point predictions, which considerably improves the utility for policymakers. For example, if the models predict a very likely future wave of COVID, counter-measures must be taken. On the other hand, if the forecast is very uncertain, measures may be taken later or not at all. We finally hope to implement BMA in our COVID-19 prediction tool available at the NAS website: https://nas.ewi.tudelft.nl/nipa/covid-prediction/

The student is expected to

- Roughly understand all considered algorithms in [2]
- Study Bayesian Model Averaging (BMA) and implement BMA in Matlab and/or Python
- Assess the prediction accuracy of BMA by comparing it to the individual forecast algorithms
- Extend the NAS website to include the BMA forecast if the BMA accuracy appears reasonable
- Write a thesis report about the findings

This project is supervised by Massimo Achterberg, a PhD researcher in the NAS group.

[1] Network-inference-based prediction of the COVID-19 epidemic outbreak in the Chinese province Hubei, B. Prasse, M. A. Achterberg, L. Ma and P. Van Mieghem, Applied Network Science 5:35, Jul 2020.

[2] Comparing the accuracy of several network-based COVID-19 prediction algorithms, M. A. Achterberg, B. Prasse, L. Ma, S. Trajanovski, M. Kitsak and P. Van Mieghem, International Journal of Forecasting, Apr-Jun 2022.

[3] Bayesian Model Averaging: A Tutorial, J. A. Hoeting, D. Madigan, A. E. Raftery and C. T. Volinsky, Statistical Science, 1999.

## Mean-field analysis of the SIS model on adaptive networks

The Markovian SIS model describes the spread of an infectious disease in a heterogeneous population [1]. Each node represents a person or entity, and is either (I) infected or (S) susceptible at time t. We assume that the contact graph of each node changes over time, based on decisions of nodes to break or create links with their neighbours. We start from the NIMFA equations [2] and extend these equations to additionally consider adaptive networks, called the aNIMFA equations [3]. We look for quantitative solutions and qualitative properties of this model for various choices of the network dynamics.

The student is expected to

- Study the NIMFA differential equations from [2]
- Study the aNIMFA equations for adaptive networks from [3]
- Compute properties of the dynamical system, such as the number of steady-state solutions, their (linear) stability, etc.
- Construct implicit or explicit solutions and perform bifurcation analysis. We are particularly interested in the location of the epidemic threshold, which is similar to the famous basic reproduction number R_0, and the number of infected nodes in the steady state.
- Write a thesis report about the findings

This project is supervised by Massimo Achterberg, a PhD researcher in the NAS group.

[1] Performance Analysis of Complex Networks and Systems, Piet Van Mieghem, Cambridge University Press, 2014 (Chapter 17).

[2] The N -Intertwined SIS epidemic network model, P. Van Mieghem, Computing (Springer), Vol. 93, Issue 2, p. 147-169, 2011.

[3] Moment closure approximations of SIS epidemics on adaptive networks, M. A. Achterberg and P. Van Mieghem, *unpublished*

## A two-dimensional birth-and-death-approximation of the SIS epidemic model on adaptive networks

The Markovian SIS model describes the spread of an infectious disease in a heterogeneous population [1]. Each node represents a person or entity, and is either (I) infected or (S) susceptible at time t. Since each node has two possible viral states, the total Markov chain consists of 2^N states. Exact analysis is only feasible for small graphs or graphs with a lot of symmetry.

To overcome the large state space for general graphs, several approximation algorithms have been developed. One example is the birth-and-death (BDP) approximation, whereby the 2^N states are approximated with N+1 aggregated states. Each of the aggregated states represents a certain number of infected nodes. There are several ways to determine the transition rates of the BDP, either exact [2] or using simulations [3].

We intend to extend this formalism to adaptive networks, where the topology and the dynamics are coupled. The Generalised Adaptive SIS (G-ASIS) model [4] comprises several types of dynamics for the coupling between the topology and the dynamics. We intend to apply a two-dimensional birth-and-death approximation as an approximation for the 2^(N+L) states in the Adaptive Markov chain using the exact method [2] and from simulations [3], and decide under which criteria/parameters the approximation is reasonable and when it is not.

The student is expected to:

- Study and understand the G-ASIS model [1]
- Understand and implement the birth-and-death approximation from [2] and [3]
- Extend to the two-dimensional birth-and-death approximation for adaptive networks
- Determine under which conditions the approximation is reasonable
- Write a thesis report about the findings

This project is supervised by Massimo Achterberg, a PhD researcher in the NAS group.

[1] Performance Analysis of Complex Networks and Systems, Piet Van Mieghem, Cambridge University Press, 2014 (Chapter 17).

[2] Unified mean-field framework for SIS epidemics on networks, based on graph partitioning and the isoperimetric inequality , K. Devriendt and P. Van Mieghem, 2017, Physical Review E, Vol. 96, No. 5, p. 052314.

[3] PDE limits of stochastic SIS epidemics on networks, F. Di Lauro, J.-C. Croix, L. Berthouze and I.Z. Kiss, *Journal of Complex Networks*, 2020.

[4] Classification of link-breaking and link-creation updating rules in SIS epidemics on adaptive networks, M. A. Achterberg, J. L. A. Dubbeldam, C. J. Stam and P. Van Mieghem, 2020, Physical Review E 101(5), May, p.052302.

## Spectral clustering of the exact Markovian SIS model

The Markovian SIS model describes the spread of an infectious disease in a heterogeneous population [1]. Each node represents a person or entity, and is either (I) infected or (S) susceptible at time t. Since each node has two possible viral states, the total Markov chain consists of 2^N states. Exact analysis is only feasible for small graphs or graphs with a lot of symmetry.

To overcome the large state space for general graphs, several approximation algorithms have been developed. One example is the birth-and-death approximation, whereby the 2^N states are approximated with N+1 aggregated states. Each of the aggregated states represents a certain number of infected nodes [2]. Here, we take another approach and intend to approximate the 2^N states of the SIS Markov chain using spectral clustering, such as k-means [3].

Possibly, we can also look into a related question: Look at the probability of n<N nodes being infected, and comparing that with (n+1) nodes being infected. If the latter is much smaller, perhaps this feature can be used to select which states can be aggregated?

The student is expected to

- Understand the SIS Markov chain from [1]
- Apply spectral clustering or other clustering algorithms to the SIS Markov chain
- Determine and assess the approximation accuracy from spectral clustering
- Write a thesis report about the findings

This project is supervised by Massimo Achterberg, a PhD researcher in the NAS group.

[1] Performance Analysis of Complex Networks and Systems, Piet Van Mieghem, Cambridge University Press, 2014 (Chapter 17).

[2] PDE limits of stochastic SIS epidemics on networks, F. Di Lauro, J.-C. Croix, L. Berthouze and I.Z. Kiss, *Journal of Complex Networks*, 2020.

[3] Wikipedia, Spectral Clustering: https://en.wikipedia.org/wiki/Spectral_clustering

## Controller placement with optimal availability

**Introduction **

The facility location problem [1] is a well-known optimization problem and it appears in many contexts, such as minimizing firefighting response times, locating warehouses near factories, and optimizing the locations of content distribution nodes and proxy servers. We will look at the problem in terms of Software-Defined Networks (SDNs), which move the control logic off packet processing devices and onto external controllers. The resulting optimization problem, looks for the best placement of *k* controllers, such that a certain performance metric is optimized.

This problem has been studied by Heller at al. [2]. In particular they tackled the problem for two performance metrics, namely average delay and worst-case delay. These delay metrics are computed over all nodes in the network that do not have a controller in place. By using exhaustive search the impact controller placement on delay was determined for over 200 real-world telecom networks, that are available at the so-called Internet Topology Zoo [3].

**Project **

The aim of this project is to study optimal controller placement with respect to the performance metric *availability*. To this end, we will consider a (finite, undirected) graph *G* where nodes are always operational and edges are independently operational with probability *p* ∈ [0, 1]. For every sensor placement we want to compute, for every node, the probability that it can reach a controller. Although it is known that in general determining such probabilities is a NP-hard problem [4], methods based upon Binary Decision Diagrams or path-width are available, which can determine network availability for networks of moderate sizes [5]. Similar as for the delay metric, we will consider both the average and minimum availability over all nodes in the network that do not have a controller in place.

The supervisor of the project will be prof. dr. Robert Kooij. Duration of the project is nine months.

**Requirements **

The successful student is expected to have strong programming skills, and have some background in Network Science. The courses EE4C06 (Networking) and IN4341(Performance Analysis) are recommended.

**References**

[1] https://en.wikipedia.org/wiki/Facility_location_problem

[2] Brandon Heller, Rob Sherwood, Nick McKeown, The Controller Placement Problem, ACM SIGCOMM Computer Communication Review 42(4), 2012.

[3] http://www.topology-zoo.org/

[4] C.J. Colbourn, The combinatorics of network reliability, New York: Oxford University Press, 1987.

[5] Willem Pino, Teresa Gomes, and Robert Kooij, A Comparison between Two All-Terminal Reliability Algorithms, Journal of Advances in Computer Networks, vol. 3, no. 3, pp. 284-290, 2015.

## QUANTUM APPROACH FOR THE OPTIMAL ARBITRAGE PROBLEM

**Introduction**

Arbitrage strategies represent a wide range of ways to make a profit from differing prices for the same asset in different markets. A particular example hereof is cross-currency arbitrage: one might make a small profit when converting a sum of money from EUR to USD, then to GPB then back to EUR, because of the favorable transaction costs. This problem can be modeled in terms of a directed graph, where each node represents a currency and each weight marks the conversion cost. Finding the optimal arbitrage on such a graph is equivalent to finding the most profitable cycle (of to finding the negative cycle with the most negative weight). Checking if at least one arbitration opportunity exists can be solved in polynomial time with algorithms such as Bellman-Ford. However, finding the best arbitrage opportunity (the most negative cycle of arbitrary length) is NP-hard. This problem can be formulated as a Quadratic Unconstrained Binary Problem (QUBO) which can then be implemented on a quantum annealer such as D-Wave. An example can be found in [1].

**Project**

The goal of this project is to study whether quantum annealing can be a suitable method for solving the optimal arbitrage problem in comparison to classical algorithms. Typical questions to be answered include:

- can we expect better or faster solutions using quantum annealing?
- what is the scale of the problems that can be solved with quantum annealing?
- can the problem formulation be extended to non-fixed weights?
- in which other applications is this problem relevant?

You will perform this assignment at TNO (Netherlands Organization for Applied Scientific Research), where you will be supervised by an expert from the department of Cybersecurity and Robustness.

**Requirements**

You are in the final stages of your degree in artificial intelligence, computer science, mathematics, electrical engineering or a similar degree and have some background in the field of quantum. You have experience in programming, networks, etc. and an interest in artificial intelligence and optimisation. You are pragmatic and focused on making things work. Next to technical expertise we value high level of independence and ability to work remotely, communication skills and a results-driven attitude.

**To apply**

Part of the application is an interview at TNO. However, as a first step you need to contact the supervisor from NAS: Rob Kooij This email address is being protected from spambots. You need JavaScript enabled to view it.

**References**

[1] 1Qbit white-paper: Finding Optimal Arbitrage Opportunities Using a Quantum Annealer (2016)

## QUANTUM ALGORITHMS FOR THE SENSOR PLACEMENT PROBLEM

**Introduction**

The sensor placement problem, i.e., how many and where to place, in an optimal sense, a limited number of sensors in a network, occurs in several application domains, for instance in telecom networks [1] and in water distribution networks [2] Also in signal reconstruction, a sparse sensor placement problem arises. Here, the greedy method based on matrix QR factorization with column pivoting, see [3], [4], provides a particularly simple and effective sensor optimization method. However, for bigger problems and a high number of sensors to be placed, this can be computationally expensive. An alternative method is to use the quadratic programming feature selection (QPFS) approach of [5] and run this on the D-Wave quantum annealer.

**Project**

The goal of this project is to implement both the classical method and the quantum method and compare their performance for a variety of scenarios. Scenarios will include use cases from signal reconstruction, but it is also interesting whether the cases in the telecom and the water domain as well as analysis of synthetic networks can use the same methods.

You will perform this assignment at TNO (Netherlands Organization for Applied Scientific Research), where you will be supervised by an expert from the department of Cybersecurity and Robustness.

**Requirements**

You are in the final stages of your degree in artificial intelligence, computer science, mathematics, electrical engineering or a similar degree and have some background in the field of quantum. You have experience in programming, networks, etc. and an interest in artificial intelligence and optimisation. You are pragmatic and focused on making things work. Next to technical expertise we value high level of independence and ability to work remotely, communication skills and a results-driven attitude.

**To apply**

Part of the application is an interview at TNO. However, as a first step you need to contact the supervisor from NAS: Rob Kooij This email address is being protected from spambots. You need JavaScript enabled to view it.

**References**

[1] Brandon Heller, Rob Sherwood, Nick McKeown, The Controller Placement Problem, ACM SIGCOMM Computer Communication Review 42(4), 2012.

[2] Venkata Reddy Palleti, Shankar Narasimhan, R. Rengaswamy, Ravi Teja, S. Murty Bhallamudi, Sensor network design for contaminant detection and identification in water distribution networks, Computers & Chemical Engineering, Volume 87, 2016, pp. 246-256,

[3] Drmac, Zlatko, and Serkan Gugercin. "A new selection operator for the discrete empirical interpolation method---improved a priori error bound and extensions." SIAM Journal on Scientific Computing 38.2 (2016): A631-A648.

[4] Brunton, Steven L., and J. Nathan Kutz. Data-driven science and engineering: Machine learning, dynamical systems, and control. Cambridge University Press, 2019.

[5] Nguyen, Xuan Vinh, et al. "Effective global approaches for mutual information based feature selection." Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 2014.

## Computation of Laplacian eigenvalues by series expansion(s)

Recently, we found that some Laplacian eigenvalues can be computed by Euler summation of the classical matrix perturbation series of perturbed eigenvalues. An explicit four terms approximation is deduced (see [1]). Our discovery is acknowledged by experts in the field to be new and, hence, promising. The proposed program of the master thesis consists of: a) an elaborated accuracy test of the four terms approximation by comparing with the 'exact' eigenvalues, computed by eigenvalue solvers (either via Matlab or Mathematica). Find in which classes of graphs the four term approximation is reasonable and in which graph class it is worst. Compare the four term approximation with upper and lower bounds, available in the literature (also mentioned in [1]). b) Try to extend the theory to perturbation of multiple eigenvalues (it is possible!) c) Try to solve the other open questions in [1] in conclusion. We expect that better converging series for Laplacian eigenvalues can be deduced.

Obviously, the interested student should love mathematics and should be able to program well to run extensive simulations. The fundamental nature of this topic will lead, with high probability, to an international journal publication.

The thesis is guided by Professor Van Mieghem.

**References**

[1] Van Mieghem, P., 2021, "Some Laplacian eigenvalues can be computed by matrix perturbation", Delft University of Technology, report20211109

## Semantic Networks: Properties, Representations, and Applications

**Introduction**

Large tech companies (Facebook, Google) and telecom providers (KPN, Ziggo) each collect and process tons of textual data on what their users say online. For these companies, the automated analysis of the text is an important tool with applications in internet search, product recommendation, and sentiment analysis. A systematic analysis of unstructured textual data is, however, extremely non-trivial.

Examples of text analysis tasks that have been considered in the literature are analogy finding, sentiment analysis, and sentence completion. Unfortunately, existing algorithms do not perform optimally, and understanding text computationally remains a difficult task.

One way to analyze text is through the lens of a semantic network: Semantic network is, in most general terms, a network of words or phrases that represent nodes that are connected in case the nodes are semantically related (e.g., co-occur in text, are synonyms or antonyms, etc.). Semantic networks represent knowledge or ‘meaning’—and they could allow computers to reason like a real person would [1]. In general, semantic networks are expected to play an important role in creating better artificially intelligent systems.

**Track 1: Topological Properties of Semantic Networks**

While semantic networks are extensively used in text analysis, their general topological properties are poorly studied. Early results demonstrate that semantic networks are sparse (number of links is comparable to the number of nodes), heterogeneous (number of links per node varies widely), and possess hierarchical structure [2]. A better understanding of the structural properties of semantic networks will be instrumental in understanding their design principles and will ultimately lead to better performance of text analysis algorithms.

__Goal:__ In this project, you will learn to extract semantic network data from various sources and/or build semantic networks from raw text. You will catalog the extracted networks and analyze their structural properties.

__Prerequisites:__ The successful student is expected to have good programming and data extraction skills, background in *Probability and Statistics*, and *Network Science* (EE4C06, CS4195 (or analogous) courses are recommended).

**Track 2: Network Embedding Techniques**

Text analysis can be performed through network embedding: network nodes can be mapped to points in certain (latent) space, such that related words are mapped close to each other in the space.

Then, new relations between words can be uncovered through the identification of unconnected word pairs in the geometric vicinity of one another. Another network embedding application is analogy solving, e.g., find X, such that X to Germany is like Paris to France. A basic geometric idea to the analogy problem is to find a vector connecting France to Paris in the latent space, then draw the same vector from Germany. The resulting vector should point to X, which is expected to be Berlin.

Although in the past decade, researchers developed a lot of network embedding techniques, most of these techniques are ad-hoc and are not systematically evaluated.

__Goal:__ In this project, you will study existing network embedding techniques for semantic data analysis, learn basic principles of statistical inference, and conduct a systematic comparative analysis of the performance of network embedding techniques on different semantic tasks.

__Prerequisites:__ The successful student is expected to have strong programming skills, have a background in *Probability and Statistics* and *Network Science* (EE4C06, CS4195, or analogous courses are recommended).

**To apply: **

Contact Maksim Kitsak, This email address is being protected from spambots. You need JavaScript enabled to view it.

**References**

*[1] Sowa, J. F. (1987). Semantic networks. Wiley.*

*[2] Van Mieghem, P. (2014). Performance analysis of complex networks and systems. Cambridge University Press.*

## Attacks against AI/ML-based systems in communication networks

**Introduction**

AI/ML-based systems are considered to be an important component of the (future) fixed and mobile networks [1], [2]. They can help in detecting previously unknown security threats or make decisions about resource allocation, thus being a powerful tool in the network security, management and orchestration. At the same time, AI/ML-based systems can become a target on their own, potentially giving an attacker an enormous advantage due to a large span of control these systems are expected to have. Examples of attacks include inferring the model of the AI/ML system and fooling the threat detection system so that malicious traffic is classified as benign [3]; sending a series of (legitimate) requests to the resource management system resulting in resources exhaustion, and ultimately to a denial of service situation; poisoning the data used in the training of the AI/ML system so to be able to influence its behavior [4].

**Project**

The goal of this assignment is to investigate the weak spots of the AI/ML-based systems used in communication networks and possible mitigation means.

You will perform this assignment by TNO, as part of the cross-domain AINET/PROTECT project. The team is composed of researchers from the departments of Cybersecurity and Robustness, Networks as well as Data Science.

**Requirements**

You are in the final stages of your degree in artificial intelligence, computer science, mathematics, electrical engineering or a similar degree and have some track record in the field of cybersecurity. You have experience in programming, networks, Linux system etc. and an interest in machine learning and artificial intelligence. You are pragmatic and focused on making things work. Next to technical expertise we value high level of independence and ability to work remotely, communication skills and a results-driven attitude.

**How to apply**

Applications have to be submitted via TNO webpage, by attaching curriculum vitae and studiorum, and a motivational letter.

**References**

[2] F. Hussain, R. Hussain, S. A. Hassan and E. Hossain, "Machine Learning in IoT Security: Current Solutions and Future Challenges," in *IEEE Communications Surveys & Tutorials*, vol. 22, no. 3, pp. 1686-1721, thirdquarter 2020, doi: 10.1109/COMST.2020.2986444.

[3] Sagduyu, Yalin E., Yi Shi, and Tugba Erpek. "IoT network security from the perspective of adversarial deep learning." *2019 16th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON)*. IEEE, 2019.

[4] N. Baracaldo, B. Chen, H. Ludwig, A. Safavi and R. Zhang, "Detecting Poisoning Attacks on Machine Learning in IoT Environments," *2018 IEEE International Congress on Internet of Things (ICIOT)*, San Francisco, CA, 2018, pp. 57-64, doi: 10.1109/ICIOT.2018.00015.

## Hyperbolic Representations of Complex Networks

**Introduction**

Network embeddings or mappings of networks to latent geometric spaces are standard tools in the arsenal of data analysis, and are routinely used in machine learning, visualization, network science, and graph theory. In general, a procedure of network embedding is mapping network nodes to points in a suitable latent metric space, such that latent distances between connected node pairs are smaller than those between disconnected node pairs. Latent-geometric distances are often interpreted as generalized measures of similarities: similar nodes are separated by small distances.

Applications of network embeddings include prediction of missing links, network reconstruction, as well as the analysis of dynamic processes taking place on the network, e.g., routing or epidemic spreading.

Real networks are often characterized by heterogeneous distributions of number of connections per node (node degrees): while the majority of nodes in a network have only a handful connections to other nodes, there are also hubs, nodes with abnormally large number of connections to other nodes [1]. Networks with heterogeneous degree distributions are often referred to as scale-free.

It has recently been shown that scale-free networks are best represented/embedded in spaces with non-Euclidean geometries. One example, to this end, is embedding of networks into spaces with constant negative curvature, i.e., hyperbolic spaces [2, 3].

The biggest challenge/bottleneck of hyperbolic network representations is the lack of scalable embedding methods. Existing hyperbolic embedding methods are either (i) accurate and non-scalable or (ii) scalable but not very accurate.

**Project**

Rely on latest developments in Statistical Inference and Machine Learning to develop a SCALABLE and ACCURATE hyperbolic embedding algorithm.

**Deliverables**

Hyperbolic embedding algorithm.

**Requirements**

This is a very challenging project at the intersection of Machine Learning, Network Science and non-Euclidean geometry. Successful student will have excellent coding skills, prior experience with Euclidean network embeddings, and some knowledge of Statistical Machine Learning.

**References**

*[1] A-L. Barabási, Network Science, Cambridge University Press, (2016).**[2] D. Krioukov, F. Papadopoulos, M. Kitsak, A. Vahdat, and M. Boguná, Hyperbolic Geometry of Complex Networks, Physical Review E, 82 3 (2010).**[3] M. Boguná, F. Papadopoulos, D. Krioukov, Sustaining the internet with hyperbolic mapping, Nature communications, 1 1 (2010).*

# MSc Thesis Topics Currently Being Researched

## System identification: theory and applications

**Introduction**

Complex networks, in general, have two properties:

- The underlying topology, defined by a graph
- The processes/functions that happen in the network, determined by the dynamic equation

To analyse the dynamics of the entire network, both underlying topology and individual systems processes should be merged.

**Challenges**

- Given the measurements of input and output from a (partly) unknown system, to find the systems equations with the best descriptive/predictive power
- Extraction of the underlying topology
- How to steer the dynamics on a network towards some “desired” regime/state
- To apply this methodology to a real-world case and possibly extend/specialise the methods

**Reference**

*[1] L. Xiang, F. Chen, W. Ren and G. Chen, "Advances in Network Controllability," in IEEE Circuits and Systems Magazine, vol. 19, no. 2, pp. 8-32, Secondquarter 2019.*

## Effective Resistance versus Weight of Shortest Path

**Introduction**

In graph theory, two of the many metrics that are used to qualify a graph *G* are (i) Effective resistance [1,2,3] and (ii) weight of the Shortest path. An effective resistance *w** _{ij}* can be defined between each node pair (

*i*,

*j*) with

*i*,

*j*∈

*N*, where

*N*is the number of nodes in the graph. Likewise, the weight

*s*of the Shortest path can be defined between each node pair (

_{ij}*i*,

*j*), assuming that graph

*G*is connected. The set of

*w*

*for all (*

_{ij}*i*,

*j*) can be reflected in an Effective resistance matrix

*Ω*, with elements

*w*

*. Similarly, the set of all weights*

_{ij}*s*of the Shortest path can be reflected in a shortest path matrix

_{ij}*S*.

Shortest paths are used for *path based communication*, such as in data networks and communication networks, whereby information (data) will flow through the shortest path between two points in the network. Effective resistance, on the other hand, is used for *flow based communication*, such as in electrical circuits, whereby current will flow through all possible paths, in accordance with the resistance of the respective paths.

Both the effective resistance matrix *Ω* and the shortest path matrix *S* provide a description of the graph *G*. The distribution of the elements in *Ω* and the distribution of the elements in *S* provide insight in how effective transport in flow and path networks is.

**Challenges**

A research question is formed by the relation between *Ω* and *S* for a graph *G*. In other words, how are the effective resistance values, for all pairs of nodes, correlated with the corresponding shortest paths. We define hereto the difference matrix *C* = *S* – *Ω*. Each element *c _{ij}* indicates how much the shortest path and the effective resistance differ for node pair (

*i*,

*j*).

The challenge for this MSc thesis project is to quantify, visualize and explain the relation between *S* and *Ω* for different classes of graph.

**Approach**

The suggested approach is as follows:

- Graph simulation: generate graphs of various class, and determine
*Ω*,*S*and*C*for each generated graph. Study the corresponding probability distribution of their elements. - Analysis (I): apply general analysis of the
*C*matrix in relation to the graph class, e.g. using probability distribution of*c*._{ij} - Analysis (II): derive a possible cause of an unexpected high
*c*value or an unexpected low_{ij}*c*value for a node pair (_{ij}*i*,*j*). - Graph qualification: define a suitable qualification of a graph through the
*C*matrix; for example, the*C*matrix is a non-negative matrix and its spectrum may contain interesting information. - The effective graph resistance
*R*captures the entire_{G}*Ω*matrix in a single scalar value. Propose a similar metric*S*for the_{G}*S*matrix*.*

**References**

[1] D.J. Klein, M. Randic, Resistance distance, J. Math. Chem. 12 (1993) 81–95.

[2] D.J. Klein, Resistance-distance sum rules, Croatica Chemica Acta 73(2) (2002).

[3] P. Van Mieghem, K. Devriendt and H. Cetinay, 2017, "Pseudoinverse of the Laplacian and best spreader node in a network", Physical Review E, vol. 96, No. 3, p 032311.