Available Topics

The NAS research section invites MSc students to join our group, and do exciting research on the forefront of Network Science. If you are interested in one of these projects, contact us by e-mailing Maksim Kitsak at This email address is being protected from spambots. You need JavaScript enabled to view it..

If you already have your own project in mind, propose it to us! We will advise you on feasibility, and judge whether there’s a match.

See the  instructions for the information on how to apply for a MSc thesis at the NAS Group.

Do NOT email Prof. Van Mieghem directly.


Epidemic Walks

Introduction

Consider a walk Wp of length k of a person p in its contact graph G, which is a subgraph of the human population contact graph,

Wp = {n1(t1), n2(t2), ... , nk(tk)}

where nj (tj) is the node ID of person j at time tj of the encounter. Mobile apps can construct the walk Wp (by bluetooth interactions) and store the node ID nj (tj) in an encrypted and secure way (by encrypting the MAC-address of the smart phone), which is not directly traceable to the name of person j. In the ideal world, we also would like to have the corresponding infection state Xj of each encounterd person j, in which Xj = 0 if person j is healthy, else Xj = 1. Hence, to the walk Wp with nodal IDs, there corresponds an infection vector,

Ip = {X1 (t1) , X2 (t2), .... , Xk (tk)}

After a walk Wp of k contacts encounters, the person p is assumed to be infected (with high probability) if at least one the elements in his infection vector Ip equals Xj (tj) = 1. In practice, however, the infection vector Ip is not known, for three basic reasons:

  • if people also report their health condition in their app by specifying X, then the reported infection vector Ip;reported  would only contain 0 bits, because infected people should stay at home;
  • there is a non-negligible fraction of asymptomatic people with Xj = 1 who do not know that they are infected and infectious, and thus report a (wrong) healthy state Xj;reported = 0;
  • there are the exposed and already infectious, hence Xj (tj) = 1 and Xj;reported (tj) = 0, but they only experience infection after some delay Tj , implying that they update their state later into Xj;reported (t+Tj) = 1.

Challenges

  • Given these uncertainties, what is the probability that person p is infected at time t ≥ tk, after a series of k contacts?
  • Vice versa, if person p figures out to be infected, how many of its encountered persons in Wp are then infected by him?
  • Is it possible to deduce the infection tree over all walks that have met an infected person?

 


Measuring the Recoverability of Networks

Introduction

Network recoverability refers to the ability of a network to return to a desired performance level after suffering malicious attacks or random failures. In [1], we proposed a general topological approach and recoverability indicators to measure the network recoverability in two scenarios: 1) recovery of damaged connections and 2) any disconnected pair of nodes can be connected to each other. 

Challenges

  • Measure and compare the recoverability of different network types, such as random networks, scale free networks, regular networks as well as real-world networks, etc. Among each type of networks, how do the network parameters influence the recoverability of these networks? This work will give us insight into how to design networks with high recoverability.
  • How to analytically measure the recovery efficiency of different recovery strategies? Can we find the best strategy for all types of networks we are investigating? How good is this strategy compared with greedy recovery strategy?
  • You are encouraged to steer the master thesis topic towards your own interest and skills. For instance, you could propose analytical approximations to estimate the average failure and recovery process; you can use machine learning method to estimate the recoverability of networks and compare the results with simulation results.

Reference

[1] He, Z., Sun, P. and P. Van Mieghem, 2019, "Topological approach to measure network recoverability", best paper award, The 11th International Workshop on Resilient Networks Design and Modeling, 14-16 October, Nicosia, Cyprus.

 


Hyperbolic Representations of Complex Networks

Introduction

Network embeddings or mappings of networks to latent geometric spaces are standard tools in the arsenal of data analysis, and are routinely used in machine learning, visualization, network science, and graph theory. In general, a procedure of network embedding is mapping network nodes to points in a suitable latent metric space, such that latent distances between connected node pairs are smaller than those between disconnected node pairs. Latent-geometric distances are often interpreted as generalized measures of similarities: similar nodes are separated by small distances. 

Applications of network embeddings include prediction of missing links, network reconstruction, as well as the analysis of dynamic processes taking place on the network, e.g., routing or epidemic spreading.

Real networks are often characterized by heterogeneous distributions of number of connections per node (node degrees): while the majority of nodes in a network have only a handful connections to other nodes, there are also hubs, nodes with abnormally large number of connections to other nodes [1]. Networks with heterogeneous degree distributions are often referred to as scale-free.

It has recently been shown that scale-free networks are best represented/embedded in spaces with non-Euclidean geometries. One example, to this end, is embedding of networks into spaces with constant negative curvature, i.e., hyperbolic spaces [2, 3]. 

The biggest challenge/bottleneck of hyperbolic network representations is the lack of scalable embedding methods. Existing hyperbolic embedding methods are either (i) accurate and non-scalable or (ii) scalable but not very accurate.

Project

Rely on latest developments in Statistical Inference and Machine Learning to develop a SCALABLE and ACCURATE hyperbolic embedding algorithm. 

Deliverables

Hyperbolic embedding algorithm.

Requirements

This is a very challenging project at the intersection of Machine Learning, Network Science and non-Euclidean geometry. Successful student will have excellent coding skills, prior experience with Euclidean network embeddings, and some knowledge of Statistical Machine Learning. 

References

[1] A-L. Barabási, Network Science, Cambridge University Press, (2016).
[2] D. Krioukov, F. Papadopoulos, M. Kitsak, A. Vahdat, and M. Boguná, Hyperbolic Geometry of Complex Networks, Physical Review E, 82 3 (2010).
[3] M. Boguná, F. Papadopoulos, D. Krioukov, Sustaining the internet with hyperbolic mapping, Nature communications, 1 1 (2010).

 


Non-Consensus Opinion models with Byzantine Nodes

Introduction

Social dynamics has been studied extensively in recent years using concepts and methods based on ideas from statistical physics. An important approach is complex networks, where the nodes represent agents and the links represent the interactions between them. There is considerable current interest in the problem of how two competing opinions evolve in populations [1], [2].

Li et al. [3] proposed the so-called Non-Consensus Opinion (NCO) model. The models assume a random initialization where a fraction of nodes has one opinion (let us say Black) and all other nodes have the opinion White. Then at each iteration step considers the opinions of all its neighbors, also taking into account its own opinions. The opinion of the node is changed if and only if the number of opposite opinions is larger than its own opinion. It is shown in [3] that the NCO model allows for stable coexistence of two opinions by forming clusters of agents holding the same opinion.

Project

The aim of this project is to study the properties of the NCO model, under the assumption that a fraction of the nodes in the network are so-called Byzantine nodes. Every time such nodes communicate their opinion to their neighbors, they will lie about it. So far example, a Byzantine node with opinion Black will tell its neighbors it has opinion White.

The project seeks answers to the following questions:

  • What is typical steady-state behavior for the Byzantine-NCO model?

  • What is the relation between initial parameter settings and steady-state behavior?

  • How are the dynamics of the Byzantine-NCO model on toy network models and on real-world networks?

  • What is a realistic application of the Byzantine-NCO model?

The supervisor will be prof. dr. Robert Kooij. He already did some initial research on the topic and has some software available that can be used as starting point for the project. Duration of the project is nine months. 

Deliverables

    syste
  • User-friendly tool for the simulation of the Byzantine-NCO model

  • Report containing the results of the project

References

[1] C. Castellano et al., Rev. Mod. Phys. 81, 591 (2009)
[2] S. Galam, Europhys. Lett. 70, 705 (2005).
[3]  Qian Li, Lidia A Braunstein, Huijuan Wang, Jia Shao, H Eugene Stanley and Shlomo Havlin. 2013. Non-consensus opinion models on complex networks. Journal of Statistical Physics 151, 1-2 (2013), 92–112.

 


Properties of Node Reliability Polynomials

Introduction

The most common measure of robustness of a network to random failures of components is all-terminal reliability. For a (finite, undirected) graph G where nodes are always operational and edges are independently operational with probability p ∈ [0, 1], the all-terminal reliability of G is the probability that all of the nodes can communicate with one another. The all-terminal reliability of a graph gives rise to a polynomial in p, and algorithms for calculating and efficiently estimating this function have been a major focus in the area, see [1]. There is an analogous notion for robustness in a network with node rather than edge failures, see [2]. Given a graph G in which edges are always operational and nodes are independently operational with probability p, the node reliability of G, is the probability that at least one node is operational and that all of the operational nodes can communicate with one another. The node reliability of G also gives rise to a polynomial in p, the so called node reliability polynomial.

Project

The aim of this project is to study properties of node reliability polynomials. In particular, the project will look at node reliability from three perspectives: theoretical, computational and applied. 

For the theoretical perspective we will look for a method to construct two graphs, distinct, but with the same number of nodes N and links L, such that their node reliability polynomials intersect an arbitrary number of times. This result would generalize the results of [3], where a construction is given for pairs of graphs with an arbitrary number of crossing for the all-terminal reliability polynomials.

For the computational perspective we study how the exact method for the computation of all-terminal reliability suggested in [4], can be adjusted, such that it can be used to compute node reliability as well.

Note that the method in [4] depends on concepts such as decomposition and path-width.

Finally, for the applied perspective, we want to apply the concept of node reliability to a real-world network. In particular, we want to use the algorithm obtained from the computational perspective to some critical infrastructure, cf. the gas distribution network that was studied in [5].

The supervisor will be prof. dr. Robert Kooij. In addition, prof. Jason Brown, from the Dalhousie University in Canada, will be involved, as he is an expert of (node) reliability polynomials. The involvement of prof. Brown will be mainly through regular Skype sessions and e-mail contact. For the realization of the applied perspective, we will seek contact with owners of critical infrastructures, such as telecom networks, power grids or water distribution networks. Duration of the project is nine months. 

Deliverables

  • User-friendly tool for the computation of node reliability

  • Report containing the results of the project

References

[1] C.J. Colbourn, The combinatorics of network reliability, New York: Oxford University Press, 1987.
[2] Jason Brown, Lucas Mol, On the roots of the node reliability polynomial, On the roots of the node reliability polynomial. Networks. 68. 10.1002/net.21697, 2016.
[3] J.I. Brown, Y. Koç and R.E. Kooij, Reliability polynomials crossing more than twice, Proc. of 3rd Int. Workshop on Reliable Network Design and Modeling, Budapest, Hungary, Oct. 5-7, 2011, pp.135-140.
[4] Willem Pino, Teresa Gomes, and Robert Kooij, A Comparison between Two All-Terminal Reliability Algorithms, Journal of Advances in Computer Networks,  vol. 3, no. 3, pp. 284-290, 2015.
[5] W. Pino, D. Worm, R. van der Linden, R.E. Kooij, “The Reliability of a Gas Distribution Network: A Case Study”, Proc. of 2016 Int. Conf. on System Reliability and Science, Paris, France, Nov. 15-18, 2016.

 


Solving Differential Equations of Epidemic Models around the Epidemic Threshold

Introduction

Various epidemic models have been proposed to understand and mitigate the spread of infectious diseases, such as gonorrhoea, syphilis, or the flu. A common epidemic modelling approach is to split the total population into groups and describe the viral spread among these groups by non-linear differential equations. Then, each group has a viral state that equals the fraction of infected individuals in the respective group, and the contact network describes how likely an infection is transmitted from one group to another group. A crucial quantity in epidemic models is the epidemic threshold: Above the epidemic threshold, the virus dies out, and below the epidemic threshold, the virus reaches an endemic state. Recently, we have solved the non-linear NIMFA differential equations around the epidemic threshold, that have been proposed by Lajmanovich and Yorke [1] as an epidemic model for gonorrhoea.

Challenges

There are two possible directions for this master thesis:

  • Extend our results to directed contact networks: In our solution for the NIMFA differential equations, we considered that the underlying contact network is undirected. However, simulations indicate that the solution also holds (approximately?) for directed contact networks, but a proof is an open challenge.
  • Consider other epidemic models: We focussed on the NIMFA differential equations, whereby every group consists of individuals that are either infected or healthy. Other epidemic model consider that individuals could also be, for instance, removed from the population (by dying or gaining immunity). Can our solution approach be applied to differential equations from epidemic models other than NIMFA? 

Reference

[1] Lajmanovich A, Yorke JA (1976) A deterministic model for gonorrhea in a nonhomogeneous population. Mathematical Biosciences 28(3-4):221-236

 


System identification: theory and applications

Introduction

Complex networks, in general, have two properties:

  • The underlying topology, defined by a graph
  • The processes/functions that happen in the network, determined by the dynamic equation

To analyse the dynamics of the entire network, both underlying topology and individual systems processes should be merged.

Challenges

  • Given the measurements of input and output from a (partly) unknown system, to find the systems equations with the best descriptive/predictive power
  • Extraction of the underlying topology
  • How to steer the dynamics on a network towards some “desired” regime/state
  • To apply this methodology to a real-world case and possibly extend/specialise the methods

Reference

[1] L. Xiang, F. Chen, W. Ren and G. Chen, "Advances in Network Controllability," in IEEE Circuits and Systems Magazine, vol. 19, no. 2, pp. 8-32, Secondquarter 2019.

 


MSc Thesis Topics Currently Being Researched

 

Fair comparison among different network reconstruction methods

Introduction

Data based identification of complex networks has been an active area of research in many scientific disciplines including physical, biological, computer, and social sciences. The susceptible-infected-susceptible (SIS) model is one of the most classical epidemic models to study a variety of spreading behaviours in social and computer science. So far, many computationally efficient methods have been proposed to reconstruct networks using binary time series generated by SIS dynamics. Nevertheless, these methods have not been discussed with respect to their limitations. The goal is to have a comparative analysis of different existing network reconstruction methods. The relative results can be further applied to select the appropriate reconstruction algorithm according to the data characteristics. 

Challenges 

  • Many different algorithms require to be studied.
  • The influence of many parameters on reconstruction algorithms require to be studied.
  • Both the continuous-time SIS process and the discrete-time SIS process will be considered.

Reference

 [1] Peixoto, Tiago P. "Network reconstruction and community detection from dynamics." arXiv preprint arXiv:1903.10833 (2019).

 


Network Reconstruction from Observing an Epidemic Outbreak with Mobile Individuals

Introduction

Modern epidemiology is the study of general spreading phenomena, such as the spread of opinions, trends, fake news or posts on online social media. These phenomena have two defining characteristics. First, individuals can be either infected (with the disease, trend, etc.) or healthy. Second, an infection may occur from an infected to a healthy individual if the two individuals are in contact. For instance, the contact between two individuals could be a friendship, or a follower on online social media. However, in many real-world applications, we do not know which individuals have contact with another, which puts a hard constraint on controlling an epidemic outbreak. Network reconstruction methods aim to estimate the contact network, which describes whether any pair of two individuals is in contact or not. The particular focus of this master thesis is to develop a network reconstruction method that accounts for the mobility of individuals, for instance by considering that individuals commute between their work place and their home.

Challenges

  • Literature study: get acquainted with epidemic models on networks and state-of-the-art network reconstruction methods. What are their advantages and drawbacks?
  • Problem formulation: Define an epidemic process that accounts for the mobility of individuals, and formulate the network reconstruction problem (for instance, in the maximum likelihood sense).
  • Algorithmic design: Propose a (heuristic, approximate or optimal) network reconstruction method.
  • You are encouraged to steer the master thesis topic towards your personal interest and skills. For instance, you could focus on one of the two following:
    1. Apply the network reconstruction framework to real-world data (online social network data, or epidemic data)
    2. Explore theoretical aspects (such as reconstructability criteria). 

 


Measuring the robustness of network controllability

Introduction

A system is controllable if it can be driven from any arbitrary state to any desired state in finite time under the control of the driver nodes which are attached to external inputs. The minimum number of driver nodes which gain full controllability after failures or attacks have occurred, can be chosen as the robustness metric.

Challenge

  • Though the robustness of network controllability has been studied in the literature, existing approaches tend to study random failure in terms of average-case behavior, giving no idea of how badly network controllability can degrade purely by chance. How can we get the worst-case and best-case for the robustness metric under random failures? Can we use the machine learning method to get the approximations for the worst-case, the best-case as well as the average-case?
  • We have robustness envelope method to measure the robustness of networks. How to apply this method to evaluating the performance of the approximations we obtained?
  • There are some special attack strategies, which can be seen as the combination of random failure and targeted attack. What is the impact of these special strategies on the robustness of network controllability?
  1. Can we find which is the most dangerous or effective attack strategy in terms of their impact on the network controllability?

References

[1] Sun, P., R. E. Kooij, Z. He and P. Van Mieghem, 2019, "Quantifying the Robustness of Network Controllability", 4th International Conference on System Reliability and Safety (ICSRS), 20-22 November, Rome, Italy.

[2] Trajanovski, S., J. Martin-Hernandez, W. Winterbach and P. Van Mieghem, 2013, "Robustness Envelopes of Networks" , Journal of Complex Networks, Vol. 1, pp. 44-62.

 


Link weight tolerance in communication networks

Introduction

We consider a weighted graph G(N,L) that represents a communications network, comprising N nodes, connected through L weighted links. A data communication flow through the network causes load on each of the nodes and on each of the links. The load on a node or a link is defined as the Betweenness [1]. The betweenness is dependent on the shortest path between each node pair; the shortest paths are, in turn, dependent on the link set and on the link weights.

We define the shortest paths matrix: the shortest path between each node pair. This is needed for, among others, OSPF based communication networks. When a link weight changes, this may affect the shortest path matrix. Each link has a tolerance zone; the link weight may vary within the tolerance boundaries without affecting the shortest path matrix.

Challenges

A link weight change may affect the shortest path for one or more node pairs. So long as the link weight varies within its tolerance boundaries, the set of shortest paths is unaffected and so the betweenness (load) of the nodes and links is unaffected. We define a “tolerance vector of TG”, containing the tolerance (weight boundaries) of each link. TG forms an indication of the “criticality” of the weight of the individual links, in order to maintain a stable system.

TG is defined for a specific graph instance. When the weight of a link changes (within its tolerance boundaries), this affects the tolerance of other links. TG is hence a function of the process of changing link weights. Goal is to achieve a stable system (network). Changing the weight of one link may improve the link tolerance of other links.

The stability of a system may be expressed in a single metric, |TG|ave, e.g. the average link weight tolerance. When a link weight changes, |TG|ave varies.

  • What relation is there between a change in link weight and TG.
  • Links with a high betweenness value are generally “critical” links in a network. What relation is there between link betweenness and link tolerance.

Approach

Link weight tolerance is largely unexplored.

  • The student is expected to embark on rudimentary graph theoretic study, to describe and quantify the suggested metrics.
  • The student is further expected to conduct a set of experiments on a defined class of graphs, such as Erdős-Rényi random graph or Barabási-Albert scale free graph.

References

 [1] Freeman; A set of measures of centrality based on betweenness; Sociometry. 40 (1): 35–41 (1977)

Contact

Mekelweg 4, 9th Floor
2628 CD, Delft, The Netherlands
Tel : +31 (0)15 27 86111