In this paper we study a problem of determining when entities are active based
on their interactions with each other. More formally,
we consider a set of entities $V$
and a sequence of time-stamped edges $E$ among the entities.
Each edge $(u,v,t)\in E$ denotes an interaction between entities $u$ and $v$ that takes place at time $t$.
We view this input as a {\em temporal network}.
We then assume a simple {\em activity model} in which each entity is {\em active} during a short time interval.
An interaction $(u,v,t)$ can be explained if at least one of $u$ or $v$ is active at time $t$.
Our goal is to reconstruct the activity intervals, for all entities in the network,
so as to explain the observed interactions.
This problem, which we refer to as the {\em network-untangling problem},
can be applied to discover timelines of events from complex interactions among entities.
We provide two formulations for the network-untangling problem:
($i$)~minimizing the total interval length over all entities, and
($ii$)~minimizing the maximum interval length.
We show that the sum problem is \np-hard,
while, surprisingly, the max problem can be solved optimally in linear time,
using a mapping to \prbtwosat.
For the sum problem we provide efficient and effective algorithms based on realistic assumptions.
Furthermore, we complement our study with an
evaluation on synthetic and real-world datasets,
which demonstrates the validity of our concepts and the good performance of our algorithms.
\spara{Acknowledgements.}
This work was supported by
the Tekes project ``Re:Know,''
the Academy of Finland project ``Nestor'' (286211),
and the EC H2020 RIA project ``SoBigData'' (654024).
\section{Computational complexity and algorithms}
\label{sec:algorithm}
Surprisingly, while \prbsum is an \np-hard problem,
\prbmax can be solved both optimally and efficiently.
The optimal algorithm for \prbmax builds on the exact algorithm for \prbbudget presented in Section~\ref{sec:budget}.
In this section we establish the complexity of \prbsum,
and we present two efficient algorithms for \prbsum and \prbmax.
\begin{proposition}
\label{prop:nphard}
The decision version of the \prbsum problem is \np-complete.
Namely, given a temporal network $G = (V, E)$ and a budget $\ell$,
it is \np-complete to decide whether there is a timeline $\tl^*=\set{\aint{u}}_{u\in V}$
that covers $G$ and has $\spn{\tl^*}\le\ell$.
\end{proposition}
\begin{proof}
We will prove the hardness by reducing \prbvertex to \prbsum.
Assume that we are given a (static) network $H = (W, A)$ with $n$ vertices $W = \{ w_1, \ldots, w_n \}$ and a budget $\ell$.
In the \prbvertex problem we are asked to decide whether there exists a subset $U \subseteq W$
of at most $\ell$ vertices ($|U|\le\ell$) covering all edges in $A$.
We map an instance of \prbvertex to an instance of \prbsum by creating a temporal network $G = (V, E)$, as follows.
The vertex set $V$ consists of $2n$ vertices:
for each $w_i \in W$, we add two vertices, $v_i$ and $u_i$.
The edges are as follows:
For each edge $(w_i, w_j) \in A$, we add a temporal edge $(v_i, v_j, 0)$ to $E$.
For each vertex $w_i \in W$, we add two temporal edges $(v_i, u_i, 1)$ and $(v_i, u_i, 2n + 1)$ to $E$.
Let $\tl^*$ be an optimal timeline covering $G$.
We claim that $\spn{\tl^*} \leq \ell$ if and only if there is a vertex cover of $H$ with $\ell$ vertices.
To prove the \emph{if} direction, consider a vertex cover $U$ of $H$ with $\ell$ vertices.
Consider the following coverage:
cover each $u_i$ at $2n + 1$ and each $v_i$ at $1$;
in addition, for each $w_i \in U$, extend the interval of $v_i$ to cover $0$, that is, set $\aint{v_i} = [0, 1]$.
The resulting intervals indeed form a timeline covering $G$ with a total span of~$\ell$.
To prove the other direction,
first note that covering each $v_i$ by the interval $[0, 1]$ and each $u_i$ by the interval $[2n + 1, 2n + 1]$
yields a timeline covering $G$ with total span $n$.
Thus, $\spn{\tl^*} \leq n$.
Since an interval containing both $0$ and $2n + 1$ has length $2n + 1 > n$,
this guarantees that if $0 \in \aint{v_i}$, then $2n + 1 \notin \aint{v_i}$, so $2n + 1 \in \aint{u_i}$.
This in turn implies that $1 \notin \aint{u_i}$, as otherwise $\len{\aint{u_i}} = 2n > n$, and so $1 \in \aint{v_i}$.
In summary, if $0 \in \aint{v_i}$, then $\len{\aint{v_i}} = 1$.
This implies that if $\spn{\tl^*} \leq \ell$,
then at most $\ell$ vertices $v_i$ are covered at $0$.
Let $U$ be the set of the corresponding vertices $w_i$.
Since $\tl^*$ is a timeline covering $G$, every edge of $A$ is covered at time $0$, and hence $U$ is a vertex cover of $H$.
\qed
\end{proof}
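For concreteness, the construction used in the reduction can be written in a few lines of Python; this is a minimal sketch, and the vertex-naming scheme is our own convention.
\begin{verbatim}
def vertex_cover_to_timeline(A, n):
    # A: edges of the static graph H, as pairs (i, j) with 0 <= i, j < n.
    # Returns the temporal edge list E of the MinTimeline instance.
    E = []
    for (i, j) in A:                       # edges of H appear at time 0
        E.append(('v%d' % i, 'v%d' % j, 0))
    for i in range(n):                     # vertex gadget: two edges per w_i
        E.append(('v%d' % i, 'u%d' % i, 1))
        E.append(('v%d' % i, 'u%d' % i, 2 * n + 1))
    return E
\end{verbatim}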
\subsection{Iterative method based on inner points}
As we saw, \prbsum is an \np-hard problem.
The next logical question is whether we can approximate this problem.
Unfortunately, there is evidence that such an algorithm would be highly non-trivial:
we can show that if we extend the problem definition to hyperedges---where
covering an edge requires that at least one of its vertices is active at the corresponding time---then
the problem becomes inapproximable. This suggests that an approximation algorithm would have to rely
on the fact that we are dealing with edges and not hyperedges.
Luckily, we can consider meaningful subproblems.
Assume that we are given a temporal network
$G = (V, E)$, and, in addition, a set of time points $\set{m_v}_{v \in V}$,
i.e., one time point $m_v$ for each vertex $v\in V$,
and we are asked whether we can find an optimal activity timeline $\tl=\set{\aint{u}}_{u\in V}$
such that the interval $\aint{v}$ of each vertex $v$ contains the corresponding time point $m_v$,
i.e., $m_v\in\aint{v}$, for each $v \in V$.
Note that these inner points can be located \emph{anywhere} within the interval
(not just, say, in the center of the interval).
This problem definition is useful when we know one time point at which each vertex was active,
and we want to extend these points into an optimal timeline.
We refer to this problem as \prbint.
\begin{problem}(\prbint)
\label{problem:interior}
Given a temporal network $G = (V, E)$
and a set of inner time points $\set{m_v}_{v \in V}$,
find a timeline $\tl=\set{\aint{u}}_{u\in V}$
that covers $G$,
satisfies $m_v \in \aint{v}$ for each $v \in V$,
and minimizes the sum-span $\spn{\tl}$.
\end{problem}
Interestingly, we can show that the \prbint problem can be solved approximately, in {\em linear time},
within a factor of 2 of the optimal solution.
The 2-approximation algorithm is presented in Section~\ref{sec:middle}.
Being able to solve \prbint motivates the following algorithm for \prbsum,
which uses \prbint as a subroutine:
initialize $m_v = (\min\tst{v} + \max\tst{v}) / 2$
to be an inner time point for vertex~$v$;
recall that $\tst{v}$ are the time stamps of the edges containing~$v$.
We then use our approximation algorithm for \prbint to obtain a set of intervals $\set{\aint{v}} = \set{ [\sint{v},\eint{v}]}_{v\in V}$.
We use these intervals to set the new inner points, $m_v = (\sint{v} + \eint{v}) / 2$,
and repeat until the score no longer improves.
We call this algorithm \alginterior.
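A minimal Python sketch of \alginterior follows, assuming a subroutine
\texttt{solve\_inner} implementing the 2-approximation algorithm for \prbint of
Section~\ref{sec:middle}; the subroutine name and the representation of the network
as a list of triples $(u, v, t)$ are our own conventions.
\begin{verbatim}
def inner_iteration(E, solve_inner):
    # E: list of temporal edges (u, v, t).
    # solve_inner(E, m): 2-approximation for MinTimeline_m; returns
    # intervals I[v] = (s_v, e_v) such that m[v] lies in [s_v, e_v].
    T = {}                        # T[v] = time stamps of edges containing v
    for (u, v, t) in E:
        T.setdefault(u, []).append(t)
        T.setdefault(v, []).append(t)
    # initialize each inner point at the midpoint of the vertex's time range
    m = {v: (min(ts) + max(ts)) / 2.0 for v, ts in T.items()}
    best, best_I = float('inf'), None
    while True:
        I = solve_inner(E, m)
        span = sum(e - s for (s, e) in I.values())
        if span >= best:          # stop when the score no longer improves
            return best_I
        best, best_I = span, I
        # use the midpoints of the current intervals as new inner points
        m = {v: (s + e) / 2.0 for v, (s, e) in I.items()}
\end{verbatim}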
\subsection{Iterative method based on budgets}
Our algorithm for \prbmax also relies on the idea of using a subproblem that is easier to solve.
%and then use
%and then obtain a solution for \prbsum by iteratively refining the solution of the subproblem.
In this case, we consider as subproblem an instance in which,
in addition to the temporal network $G$,
we are also given a set of budgets $\set{\budget{v}}$ on interval durations,
one budget $\budget{v}$ for each vertex $v$.
The goal is to find a timeline $\tl=\set{\aint{u}}_{u\in V}$ that covers the temporal network $G$
and in which the length of each activity interval $\aint{v}$ is at most $\budget{v}$.
We refer to this problem as \prbbudget.
\begin{problem}(\prbbudget)
\label{problem:budget}
Given a temporal network $G = (V, E)$
and a set of budgets $\set{\budget{v}}_{v \in V}$,
find a timeline $\tl=\set{\aint{u}}_{u\in V}$
that covers $G$ and
satisfies $\len{\aint{v}} \leq \budget{v}$ for each $v \in V$.
\end{problem}
Surprisingly, the \prbbudget problem can be solved {\em optimally} in {\em linear time}.
The algorithm is presented in Section~\ref{sec:budget}.
Note that this result is compatible with the \np-hardness of \prbsum:
here the budget of each \emph{individual} interval is given, whereas in \prbsum
there is an exponential number of ways to distribute the total budget among the individual intervals.
We can now find the optimal value $\diam{\tl}$ for \prbmax via binary search:
we set a common budget $\budget{v} = b$ for all vertices $v$ and search for the smallest $b$ for which \prbbudget is feasible.
We call this algorithm \algbudget.
To guarantee a small number of binary-search steps, some attention is required.
Let $T = t_1, \ldots, t_m$ be all the time stamps, sorted.
Assume that we know $L$, the largest known infeasible budget, and $U$, the smallest known feasible budget.
To define a new candidate budget, we first define $W(i) = \set{t_j - t_i \mid L < t_j - t_i < U}$.
The optimal budget is either $U$ or one of the numbers in some $W(i)$; if every $W(i)$ is empty, then the answer is $U$.
Otherwise, we compute $m(i)$, the median of $W(i)$, ignoring any empty $W(i)$. Finally, we test the weighted median
of all $m(i)$, weighted by $\abs{W(i)}$, as the new candidate budget. We can show that each iteration reduces $\sum_i \abs{W(i)}$
by at least a quarter, so only $\bigO{\log m}$ iterations are needed. We can determine the medians $m(i)$ and the sizes $\abs{W(i)}$
in linear time since $T$ is sorted, and we can determine the weighted median in linear time using a modified median-of-medians
algorithm. This leads to an $\bigO{m \log m}$ running time.
However, in our experimental evaluation, we use a straightforward binary search by testing $(U + L) / 2$ as a budget.
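For reference, this straightforward variant can be sketched in Python as follows, assuming integer time stamps and a feasibility oracle \texttt{feasible} that runs the linear-time algorithm of Section~\ref{sec:budget} with a uniform budget (the oracle name and the edge-list representation are our own conventions).
\begin{verbatim}
def min_max_budget(E, feasible):
    # E: temporal edges (u, v, t) with integer time stamps.
    # feasible(E, b): True iff MinTimeline_b with the uniform
    # budget b_v = b for all v admits a covering timeline.
    ts = sorted(t for (_, _, t) in E)
    L, U = -1, ts[-1] - ts[0]   # largest infeasible / smallest feasible
    while U - L > 1:
        b = (U + L) // 2        # test the midpoint budget
        if feasible(E, b):
            U = b               # feasible: tighten the upper bound
        else:
            L = b               # infeasible: raise the lower bound
    return U
\end{verbatim}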
%If $\abs{W(i)} \leq 1$, and all non-empty sets have only one value, say $b$, then we can show that $b$ is the optimal budget.
%Otherwise, we compute $m(i)$, the median of $W(i)$, and define $q(i)$ as an index of $T$, sorted using $m(i)$.
%We then define $j$ to be the smallest index such that $\sum_{i = 1}^j \abs{W(i)} \geq \sum_{i = 1}^m \abs{W(i)} / 2$,
%and test $m(j)$ as a new budget. We can show that after every iteration the quantity $\sum_{i = 1}^m \abs{W(i)}$
%reduces by a constant factor. This leads to at most $\bigO{\log m}$ iterations. A single iteration can be
%done in $\bigO{m \log m}$ time (we do not instantiate $W(i)$ explicitly), leading to a total time of
%$\bigO{m \log^2 m}$. However, in practice, we use a straightforward binary search
%by testing $(U + L) / 2$ as a budget.
%an algorithm for determining a new middle point is given in Algorithm~\ref{alg:newmiddle}.
%This algorithm runs in $\bigO{m \log m}$ time. We can show that if $W = 0$, then
%$U$, the smallest feasible budget, is the correct answer. On the other hand, if $W > 0$, then
%we can show that the next call will have it reduced by at least $1/4$. This leads to a running
%time of $\bigO{m \log^2 m}$. However, in practice, we use a straightforward binary search
%by testing $(U + L) / 2$ as a budget.
%The number of binary steps needed is $\bigO{\log \rho}$, where $\rho$
%is the ratio of whole interval divided by the ratio of the smallest non-zero interval.
%In practice, $\bigO{\log \rho}$ is small; however it is possible to construct a theoretical case
%where the number of calls is significant. Such a case can be solved by considering all possible
%intervals---there are $\bigO{m^2}$ of them---sort them based on duration, and do binary search
%on the list of interval durations.
%In practice, however it is better to use
%the normal binary search, which leads to $\bigO{m \log \rho}$ running time.
%\begin{algorithm}
%\caption{
%Input: $T = \set{t_i}$ are the sorted timestamps, $U$ and $L$ are upper and lower bounds for the budget}
%\label{alg:newmiddle}
%$e(i) \define \max \set{j \mid t_j - t_i \leq U}$\;
%$b(i) \define \min \set{j \mid t_j - t_i \geq L} \cup \set{e(i)}$\;
%$m(i) \define $ median of $t_{b(i)} , \ldots, t_{e(i)}$; \quad
%$q(i) \define $ indices of $T$ sorted by $m(i) - t_i$\;
%$w(i) \define \max (e(i) - b(i) - 1, 0)$;
%$W \define \sum_i w(i)$\;
%\Return $m(j) - t_j$, where $j$ is the smallest index such that $\sum_{i = 1}^j w(q(i)) \geq W / 2$\;
%\end{algorithm}
%On the other hand, the optimality of \prbbudget yields the optimality of \prbmax.
%Using the optimal algorithm of \prbbudget as subproblem,
%our second iterative algorithm for \prbsum works as follows.
%...{\bf XXX}...
\section{Exact algorithm for \prbbudget}
\label{sec:budget}
In this section we develop a linear-time algorithm for the problem \prbbudget.
Here we are given a temporal network $G$
together with a set of budgets $\set{\budget{v}}$ on interval durations,
and all activity intervals must satisfy $\len{\aint{v}}\le\budget{v}$.
%As mentioned before,
%$\prbbudget can be solved optimally by a linear-time algorithm.
The idea for this optimal algorithm is to map \prbbudget into \prbtwosat.
To do that we introduce a boolean variable $x_{vt}$ for each vertex $v$ and for each timestamp $t \in \tst{v}$.
To guarantee that the solution covers each edge $(u, v, t)$, we add a clause $(x_{vt} \lor x_{ut})$.
To make sure that we do not exceed the budget we require that
for each vertex $v$ and each pair of time stamps $s, t \in \tst{v}$
such that $\abs{s - t} > b_v$ either $x_{vs}$ is false or $x_{vt}$ is false, that is,
we add a clause $(\neg x_{vs} \lor \neg x_{vt})$.
It follows immediately that \prbbudget has
a solution if and only if the resulting \prbtwosat instance has a solution.
A solution for \prbbudget can be obtained from the \prbtwosat solution
by setting, for each vertex $v$, the interval $\aint{v}$ to be the smallest interval containing all time stamps $t \in \tst{v}$ for which $x_{vt}$ is set to true.
Since \prbtwosat is solvable in polynomial time~\cite{aspvall1982linear}, we have the following.
\begin{proposition}
\prbbudget can be solved in polynomial time.
\end{proposition}
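The following Python sketch illustrates the mapping; it constructs the clause set naively, in time quadratic in the number of time stamps per vertex, before the speed-up described below. The representation of a literal as a pair (variable, polarity) and of the budgets as a dictionary \texttt{b} are our own conventions.
\begin{verbatim}
def budget_to_2sat(E, b):
    # E: temporal edges (u, v, t); b[v]: budget of vertex v.
    # Variable (v, t) means "v is active at time t"; a literal is a
    # pair (variable, polarity). Returns the list of 2-SAT clauses.
    T = {}
    for (u, v, t) in E:
        T.setdefault(u, set()).add(t)
        T.setdefault(v, set()).add(t)
    clauses = []
    # coverage: each edge (u, v, t) requires x_{vt} or x_{ut}
    for (u, v, t) in E:
        clauses.append((((v, t), True), ((u, t), True)))
    # budget: x_{vs} and x_{vt} cannot both hold when |s - t| > b[v]
    for v, ts in T.items():
        ts = sorted(ts)
        for i in range(len(ts)):
            for j in range(i + 1, len(ts)):
                if ts[j] - ts[i] > b[v]:
                    clauses.append((((v, ts[i]), False),
                                    ((v, ts[j]), False)))
    return clauses
\end{verbatim}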
Solving \prbtwosat can be done in linear time with respect to the number of clauses~\cite{aspvall1982linear}.
However, in our case we may have $\bigO{m^2}$ clauses.
Fortunately, the \prbtwosat instances created with our mapping
have enough structure to be solvable in $\bigO{m}$ time.
This speed-up is described in the remainder of the section.
Let us first review the algorithm by~\citet{aspvall1982linear} for solving \prbtwosat.
The algorithm starts with constructing an \emph{implication graph} $H = (W, A)$.
The graph $H$ is \emph{directed} and its vertex set $W = P \cup Q$
has a vertex $p_{i}$ in $P$ and a vertex $q_{i}$ in $Q$ for each boolean variable $x_{i}$.
Then, for each clause $(x_i \lor x_j)$, there are two edges in $A$:
$(q_i \rightarrow p_j)$ and $(q_j \rightarrow p_i)$; the negations are handled similarly.
In our case, the edges $A$ are divided into two groups, $A_1$ and $A_2$.
The set $A_1$ contains
two directed edges $(q_{vt} \rightarrow p_{ut})$ and $(q_{ut} \rightarrow p_{vt})$
for each edge $e = (u, v, t) \in E$.
The set $A_2$ contains
two directed edges $(p_{vt} \rightarrow q_{vs})$ and $(p_{vs} \rightarrow q_{vt})$
for each vertex $v$ and each pair of time stamps $s, t \in \tst{v}$
such that $\abs{s - t} > b_v$.
Note that $A_1$ goes from $Q$ to $P$ and $A_2$ goes from $P$ to $Q$.
Moreover, $\abs{A_1} \in \bigO{m}$ and $\abs{A_2} \in \bigO{m^2}$.
Next, we decompose $H$ into strongly connected components (SCCs),
and order them topologically. % children first.
If any strongly connected component contains both $p_{vt}$ and $q_{vt}$,
then we know that \prbtwosat is not solvable.
Otherwise, to obtain the solution, we enumerate over the components,
children first: if the boolean variables corresponding to the vertices
in the component do not yet have a truth assignment,\!\footnote{Due to the properties of the
implication graph, either all or none of the variables in a component will be set.}
then we set $x_{vt}$ to true if $p_{vt}$ is in the component,
and $x_{vt}$ to false if $q_{vt}$ is in the component.
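For reference, a compact (non-optimized) Python sketch of this procedure, using Kosaraju's two-pass SCC decomposition with recursive DFS, could look as follows; combined with the clause construction above it yields a naive solver for \prbbudget.
\begin{verbatim}
import sys
from collections import defaultdict

def solve_2sat(clauses):
    # clauses: pairs of literals (var, polarity).
    # Returns {var: bool} or None if unsatisfiable.
    sys.setrecursionlimit(1 << 20)    # recursive DFS, for brevity only
    def neg(l): return (l[0], not l[1])
    graph, rgraph, lits = defaultdict(list), defaultdict(list), set()
    for (a, b) in clauses:
        lits.update([a, b, neg(a), neg(b)])
        for (x, y) in ((neg(a), b), (neg(b), a)):  # ~a => b and ~b => a
            graph[x].append(y)
            rgraph[y].append(x)
    order, visited = [], set()
    def dfs1(u):                      # first pass: post-order on graph
        visited.add(u)
        for w in graph[u]:
            if w not in visited:
                dfs1(w)
        order.append(u)
    for u in lits:
        if u not in visited:
            dfs1(u)
    comp = {}
    def dfs2(u, c):                   # second pass: SCCs on reverse graph
        comp[u] = c
        for w in rgraph[u]:
            if w not in comp:
                dfs2(w, c)
    c = 0
    for u in reversed(order):         # components found in topological order
        if u not in comp:
            dfs2(u, c)
            c += 1
    assign = {}
    for (var, pol) in lits:
        if comp[(var, pol)] == comp[(var, not pol)]:
            return None               # x and ~x in one SCC: unsatisfiable
        # set x true iff its positive literal lies later in topological order
        assign[var] = comp[(var, True)] > comp[(var, False)]
    return assign
\end{verbatim}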
The bottleneck of this method is the SCC decomposition, which requires $\bigO{\abs{W} + \abs{A}}$
time; the remaining steps can be done in $\bigO{\abs{W}}$ time. Since $\abs{W} \in \bigO{m}$,
we need to optimize the SCC decomposition to run in $\bigO{m}$ time.
We will use Kosaraju's algorithm (see~\citep{hopcroft1983data}) for the SCC decomposition. This algorithm
consists of two depth-first searches, performing constant-time operations on each visited node.
Thus, we only need to optimize the DFS.
To speed up the DFS, we need to design an oracle
that, given a vertex $p \in P$, returns an \emph{unvisited} neighboring vertex $q \in Q$ in \emph{constant} time.
Since $\abs{Q} \in \bigO{m}$, this guarantees that DFS spends at most $\bigO{m}$ time processing vertices $p \in P$.
On the other hand, if we are at $q \in Q$, then we can use the standard DFS to find the neighboring vertex $p \in P$.
Since $\abs{A_1} \in \bigO{m}$, this guarantees that DFS spends at most $\bigO{m}$ time processing vertices $q \in Q$.
Next, we describe the oracle: first, we keep the unvisited vertices of $Q$ in lists
$\ell[v] = (q_{vt} \in Q; q_{vt} \text{ is not visited})$, sorted chronologically.
Assume that we are at $p_{vt} \in P$. We retrieve the first vertex in $\ell[v]$,
say $q_{vs}$, and check whether $\abs{s - t} > b_v$. If so, then $q_{vs}$ is a neighbor
of $p_{vt}$, and we return $q_{vs}$. Naturally, we delete $q_{vs}$ from $\ell[v]$ the moment
we visit $q_{vs}$. If $\abs{s - t} \leq b_v$, then we test similarly the \emph{last}
vertex in $\ell[v]$, say $q_{vs'}$. If both $q_{vs}$ and $q_{vs'}$ are non-neighbors
of $p_{vt}$, then, since $\ell[v]$ is sorted chronologically, $\ell[v]$ contains no unvisited neighbors
of $p_{vt}$. And since $p_{vt}$ has no neighbors outside $\ell[v]$, we conclude
that $p_{vt}$ has no unvisited neighbors.
%This concludes the description of the oracle.
Using this oracle we can perform the DFS in $\bigO{m}$ time, which in turn
allows us to do the SCC decomposition in $\bigO{m}$ time, and hence
to solve \prbbudget in $\bigO{m}$ time.
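A minimal sketch of the oracle's bookkeeping in Python, with a double-ended queue standing in for a doubly-linked list (the function names are our own conventions):
\begin{verbatim}
from collections import deque

def make_oracle(T, b):
    # T[v]: sorted time stamps of v; b[v]: budget of v.
    # ell[v] keeps the unvisited vertices q_{vs} in chronological order.
    ell = {v: deque(ts) for v, ts in T.items()}

    def unvisited_neighbor(v, t):
        # Given p_{vt}, return s with |s - t| > b[v] such that q_{vs}
        # is unvisited, or None if p_{vt} has no unvisited neighbor.
        q = ell[v]
        if q and abs(q[0] - t) > b[v]:   # earliest unvisited q_{vs}
            return q[0]
        if q and abs(q[-1] - t) > b[v]:  # latest unvisited q_{vs'}
            return q[-1]
        return None   # everything in ell[v] lies within the budget

    def mark_visited(v, s):
        # q_{vs} is deleted the moment it is visited; the oracle only
        # ever returns an endpoint of ell[v], so popping an end suffices.
        if ell[v] and ell[v][0] == s:
            ell[v].popleft()
        elif ell[v] and ell[v][-1] == s:
            ell[v].pop()

    return unvisited_neighbor, mark_visited
\end{verbatim}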
\iffalse
\begin{algorithm}
%$\ell[v] \define -\infty$\;
%$r[v] \define \infty$\;
$O[v] \define $ edges adjacent to $v$ in chronological order\;
$S \define$ empty stack\;
\lForEach {unvisited $(v, t)$} {
DFS$(v, t, S)$
}
$Q[v] \define \tst{v}$ sorted\;
$R \define$ empty stack\;
\lForEach {unvisited $(v, t) \in S$ in reverse visit order} {
Reverse$(v, t, R)$
}
$p_1[v] \define -\infty$;
$p_2[v] \define \infty$;
$i_1[v] \define \infty$;
$i_2[v] \define -\infty$\;
\ForEach {$(v, t) \in S$ in reverse visit order} {
\If {$p_1[v] \leq v \leq p_2[v]$} {
$p_1[v] \define \max (p_1[v], t - b_v)$\;
$p_2[v] \define \min (p_2[v], t + b_v)$\;
$i_1[v] \define \min (i_1[v], t)$\;
$i_2[v] \define \max (i_2[v], t)$\;
}
}
construct $\mathcal{S} = \set{S_v}$ with $S_v = [i_1[v], i_2[v]]$\;
\uIf {$\mathcal{S}$ covers all edges} {
\Return $\mathcal{S}$\;
}
\Else {
\Return \Null;
}
\end{algorithm}
\begin{algorithm}
\caption{DFS$(v, t, S)$}
push $(v, t)$ to stack $S$\;
mark $(v, t)$ as visited\;
\While{the timestamp of $top(O[v]) < t - b_v$} {
$e = (u, v, s) \define top(O[v])$; delete $e$ from $O[v]$\;
\lIf {$(u, s)$ is not visited} {
DFS$(u, s, S)$
}
%$s \in \tst{v}$, $\ell[v] < s < t - b_v$ in increasing order} {
%$\ell[v] \define \max(s, \ell[v]) $\;
%\lForEach {unvisited $(u, s)$ such that $(u, v, s) \in E$} {
%DFS$(u, s, S)$
%}
}
\While{the timestamp of $bottom(O[v]) > t + b_v$} {
$e = (u, v, s) \define bottom(O[v])$; delete $e$ from $O[v]$\;
\lIf {$(u, s)$ is not visited} {
DFS$(u, s, S)$
}
}
%\ForEach{$s \in \tst{v}$, $t + b_v < s < r[v]$ in decreasing order} {
%$r[v] \define \min(s, r[v]) $\;
%\lForEach {unvisited $(u, s)$ such that $(u, v, s) \in E$} {
%DFS$(u, s, S)$
%}
%}
\end{algorithm}
\begin{algorithm}
\caption{Reverse$(v, t, R)$}
push $(v, t)$ to stack $R$\;
mark $(v, t)$ as visited\;
\ForEach{$(u, v, t) \in E$} {
\While {$top(Q[v]) < t - b_u$} {
$s \define top(Q[v])$; delete $s$ from $O[v]$\;
\lIf {$(u, s)$ is not visited} {
Reverse$(u, s, S)$
}
}
\While {$bottom(Q[v]) > t + b_u$} {
$s \define bottom(Q[v])$; delete $s$ from $O[v]$\;
\lIf {$(u, s)$ is not visited} {
Reverse$(u, s, S)$
}
}
%\ForEach{$s \in T_u$, $t + b_u < s < r[u]$ in decreasing order} {
%$r[u] \define \min(s, r[u]) $\;
%\lIf {$(u, s)$ is not visited} {
%Reverse$(u, s, S)$
%}
%}
}
\end{algorithm}
\fi
\spara{Case study.}
Next we present our results on the \twitter dataset.
In Figure~\ref{fig:nov2013} we show a subset of hashtags from tweets posted in November 2013.
We also depict the activity intervals for those hashtags, as discovered by algorithm \algmaxgreedy.
Note that, to avoid cluttering the image, we depict only a subset of the relevant hashtags.
In particular, we pick three ``seed'' hashtags, {\tt \#slush13}, {\tt \#mtvema}, and
{\tt \#nokiaemg}, together with the set of hashtags that co-occur with the seeds.
Each of the seeds corresponds to a known event:
{\tt \#slush13} corresponds to Slush'13,
a leading startup and technology event, organized
in Helsinki on November 13--14, 2013;
{\tt \#mtvema} is dedicated to the MTV Europe Music Awards, held on November 10, 2013;
and {\tt \#nokiaemg} refers to the Extraordinary General Meeting (EGM) of Nokia Corporation, held
in Helsinki on November 19, 2013.
\begin{figure}[t]
\begin{center}
\includegraphics[width=\textwidth]{"figures/twitter/slush_nokia_mtv_g"}
\end{center}
\caption{Part of the output of the \algmaxgreedy algorithm on the \twitter dataset
for November 2013. Activity intervals of co-occurring hashtags, seeded from
{\tt \#slush13}, {\tt \#mtvema}, and {\tt \#nokiaemg}.}
\label{fig:nov2013}
\end{figure}
For each hashtag we plot its entire interval with a light color,
and the discovered activity interval with a dark color.
For each selected hashtag,
we draw interactions (co-occurrence) with other selected hashtags using black vertical lines,
while we mark interactions with non-selected hashtags by ticks.
Figure~\ref{fig:nov2013} shows that the tag {\tt \#slush13} becomes active exactly
at the starting date of the event. During its activity this tag covers many technology-related
tags, e.g., {\tt \#zenrobotics} (a Helsinki-based automation company), {\tt
\#younited} (a personal cloud service by a local company), and {\tt \#walkbase}
(a local software company). Then, on November 19, the tag {\tt \#nokiaemg} becomes active:
this event is very narrow and covers mentions of Microsoft executive Stephen
Elop. Another large event occurs around November 10, with active tags {\tt
\#emazing}, {\tt \#ema2013}, and {\tt \#mtvema}, which cover {\tt \#bestpop},
{\tt \#bestvideo}, and other related tags.
%\begin{figure}[t]
% \begin{center}
% \includegraphics[width=\textwidth]{"figures/twitter/slush13_graph"}
% \end{center}
% \caption{Part of the output of middle point algorithm on Twitter dataset for November'13. Domination graph of tags, seeded from tag slush13.}
% \label{fig:slush_graph}
%\end{figure}
\section{Conclusions}
\label{sec:conclusions}
In this paper we introduced and studied a new problem, which we called network untangling.
Given a set of temporal undirected interactions, our goal is to discover activity time intervals
for the network entities, so as to explain the observed interactions.
We consider two settings:
\prbsum, where we aim to minimize the total sum of activity-interval lengths, and \prbmax, where
we aim to minimize the maximum interval length. We show that the former problem is
\np-hard, and we develop efficient iterative algorithms for it,
while the latter problem is
solvable in polynomial time.
There are several natural open questions. First, it is not known whether there is an
approximation algorithm for \prbsum or whether the problem is inapproximable.
Second, our model uses one activity interval for each entity;
a natural extension of the problem is to consider $k$ intervals per entity,
and/or different activity levels.
%% Note
\newcommand{\spara}[1]{{\smallskip\noindent{\bf {#1}}}}
\newcommand{\mpara}[1]{{\medskip\noindent{\bf {#1}}}}
%% Standard Tatti Stuff
\newcommand{\set}[1]{\left\{#1\right\}}
\newcommand{\pr}[1]{\left(#1\right)}
\newcommand{\fpr}[1]{\mathopen{}\left(#1\right)}
\newcommand{\spr}[1]{\left[#1\right]}
\newcommand{\fspr}[1]{\mathopen{}\left[#1\right]}
\newcommand{\brak}[1]{\left<#1\right>}
\newcommand{\abs}[1]{{\left|#1\right|}}
\newcommand{\norm}[1]{\left\|#1\right\|}
\newcommand{\enset}[2]{\left\{#1 ,\ldots , #2\right\}}
\newcommand{\enpr}[2]{\pr{#1 ,\ldots , #2}}
\newcommand{\enlst}[2]{{#1} ,\ldots , {#2}}
\newcommand{\vect}[1]{\spr{#1}}
\newcommand{\envec}[2]{\vect{#1 ,\ldots , #2}}
\newcommand{\real}{\mathbb{R}}
\newcommand{\np}{\textbf{NP}}
\newcommand{\poly}{\textbf{P}}
\newcommand{\apx}{\textbf{APX}}
\newcommand{\naturals}{\mathbb{N}}
\newcommand{\integers}{\mathbb{Z}}
\newcommand{\funcdef}[3]{{#1}:{#2} \to {#3}}
\newcommand{\define}{\leftarrow}
\newcommand{\reals}{{\mathbb{R}}}
\DeclareRobustCommand{\dispfunc}[2]{%
\ensuremath{%
\ifthenelse{\equal{#2}{}}%
{\mathit{#1}}%
{\mathit{#1}\fpr{#2}}}}
\newcommand{\bigO}[1]{\dispfunc{\mathcal{O}}{#1}}
\newcommand{\diam}[1]{\dispfunc{\Delta}{#1}}
\newcommand{\len}[1]{\dispfunc{\sigma}{#1}}
\newcommand{\spn}[1]{\dispfunc{S}{#1}}
\newcommand{\peri}[1]{\dispfunc{p}{#1}}
\newcommand{\adj}[1]{\dispfunc{h}{#1}}
\newcommand{\edges}[1]{\dispfunc{E}{#1}}
\newcommand{\nedges}[1]{\dispfunc{NE}{#1}}
\newcommand{\neigh}[1]{\dispfunc{N}{#1}}
\newcommand{\te}[1]{\dispfunc{t}{#1}}
\newcommand{\tst}[1]{\dispfunc{T}{#1}}
\newcommand{\aint}[1]{\ensuremath{I_{#1}}}
\newcommand{\sint}[1]{\ensuremath{s_{#1}}}
\newcommand{\eint}[1]{\ensuremath{e_{#1}}}
\newcommand{\tl}{\ensuremath{\mathcal{T}}}
\newcommand{\tlopt}{\ensuremath{\mathcal{T}^*}}
\newcommand{\budget}[1]{\ensuremath{b_{#1}}}
\newcommand{\prbsum}{\textsc{Min\-Time\-line}\xspace}
\newcommand{\prbmax}{\textsc{Min\-Time\-line$_\infty$}\xspace}
\newcommand{\prbbudget}{\textsc{Min\-Time\-line$_b$}\xspace}
\newcommand{\prbint}{\textsc{Min\-Time\-line$_m$}\xspace}
% \newcommand{\prbcover}{\textsc{Cover}\xspace}
% \newcommand{\prbinterior}{\textsc{Interior}\xspace}
% \newcommand{\prbbudget}{\textsc{Budget}\xspace}
\newcommand{\prbtwosat}{\textsc{2-SAT}\xspace}
\newcommand{\prbvertex}{\textsc{Vertex\-Cover}\xspace}
\newcommand{\algmaxgreedy}{{\tt Maximal}\xspace}
\newcommand{\alginterior}{{\tt Inner}\xspace}
\newcommand{\algbudget}{{\tt Budget}\xspace}
\newcommand{\fm}[1]{\mathcal{#1}}
\newcommand{\prob}[1]{p\pr{#1}}
\newcommand{\mean}[2]{\operatorname{E}_{#1}\fspr{#2}}
\newcommand{\efrac}[2]{\scriptscriptstyle\frac{#1}{#2}}
\newcommand{\choosetwo}[1]{{\ensuremath{{#1} \choose 2}}}
\newcommand{\degree}[1]{\dispfunc{\mathrm{deg}}{#1}}
\newcommand{\dtname}[1]{\textsl{#1}}
%\newtheorem{theorem}{Theorem}
%\newtheorem{lemma}[theorem]{Lemma}
%\newtheorem{proposition}[theorem]{Proposition}
%\newtheorem{corollary}[theorem]{Corollary}
%\newtheorem{definition}[theorem]{Definition}
%\newtheorem{example}[theorem]{Example}
%\newtheorem{problem}{Problem}
\newcommand{\dataset}[1]{\textsl{#1}\xspace}
\newcommand{\synth}{\dataset{Synthetic}}
\newcommand{\twitter}{\dataset{Twitter}}
%% PGF stuff
\SetKwComment{tcpas}{\{}{\}}
\SetCommentSty{textnormal}
\SetArgSty{textnormal}
\SetKw{False}{false}
\SetKw{True}{true}
\SetKw{Null}{null}
\SetKwInOut{Output}{output}
\SetKwInOut{Input}{input}
\SetKw{AND}{and}
\SetKw{OR}{or}
\SetKw{Break}{break}
\pgfdeclarelayer{background}
\pgfdeclarelayer{foreground}