Commit eb37acb5 authored by Sachith Pai

fixed the distance skewness issue and added LDA words

parent 445f5bf6
@@ -138,6 +138,8 @@ The abstract should briefly summarize the contents of the paper in
% BibTeX users should specify bibliography style 'splncs04'.
% References will then be sorted and formatted in the correct style.
%
\clearpage
\bibliographystyle{splncs04}
\bibliography{references_simple}
%
@@ -7,7 +7,19 @@
\item In this section we investigate, for a given experimental setup, how much random noise we can add to the user-level features before our model's performance degrades to that of the ContIndModel.
\end{itemize}
\color{black}
We evaluate the performance of our model against the baselines on generated synthetic datasets. For our experiments we generate random networks of 1000 nodes with 3 edges per node on average. We draw the edge-specific parameters $\alpha$ from a uniform distribution on the range $[0.1,1]$. The user-level features are drawn from a uniform distribution on the range $[-1,1]$. Cascades are generated for each baseline using its model-specific generative process.
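The synthetic setup above can be sketched as follows. This is a minimal illustration in plain Python: the function name and the dictionary-based graph representation are ours, and the actual experiments may construct the networks differently.

```python
import random

def generate_synthetic_network(n_nodes=1000, avg_out_edges=3,
                               alpha_range=(0.1, 1.0),
                               feature_range=(-1.0, 1.0), seed=0):
    """Random directed network with edge parameters alpha ~ Uni(0.1, 1)
    and scalar user-level features ~ Uni(-1, 1)."""
    rng = random.Random(seed)
    nodes = list(range(n_nodes))
    # Roughly `avg_out_edges` outgoing edges per node, each with its
    # own edge-specific transmission parameter alpha.
    edges = {}
    for u in nodes:
        for v in rng.sample(nodes, avg_out_edges):
            if u != v:
                edges[(u, v)] = rng.uniform(*alpha_range)
    # User-level (opinion) features, one scalar per node.
    features = {u: rng.uniform(*feature_range) for u in nodes}
    return edges, features

edges, features = generate_synthetic_network()
```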
\subsection{Distance functions}
The choice of the distance function plays an important role in the process. For our experiments we use the absolute difference in opinion as the distance and scale it to the range $[1-\epsilon_{low}, 1+\epsilon_{high}]$. For our synthetic datasets we chose a skewed split, setting $\epsilon_{low} = \frac{1}{3}\epsilon$ and $\epsilon_{high} = \frac{2}{3}\epsilon$.
This choice is an attempt at a fair construction of the datasets for comparing the baselines: we split at the mean of the distribution shown in Fig.~\ref{distance skew problems}, rather than splitting the range in half as in the earlier version. Since the absolute difference of two uniform random variables on $[-1,1]$ follows a triangular distribution, its mean is located at $\frac{2}{3}$.
\begin{figure}[ht!]
\includegraphics[width=\columnwidth]{images/distanceSkew.jpg}
\caption{The pdf of $|X-Y|$ for $X,Y \sim \mathrm{Uni}(-1,1)$ is a triangular distribution. By splitting the range in half, the earlier simulations put TopicContInd at a disadvantage, as the generated cascades would be of smaller size.}
\label{distance skew problems}
\end{figure}
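The claim about the mean can be checked with a quick Monte Carlo simulation; the value of $\epsilon$ below is illustrative, not the one used in the experiments.

```python
import random

# Monte Carlo check: |X - Y| for X, Y ~ Uni(-1, 1) follows a triangular
# distribution on [0, 2] with mode 0, whose mean is 2/3.
rng = random.Random(42)
n = 200_000
samples = [abs(rng.uniform(-1, 1) - rng.uniform(-1, 1)) for _ in range(n)]
mean = sum(samples) / n  # should be close to 2/3

# The skewed split from the text, for an illustrative epsilon.
eps = 0.3
eps_low, eps_high = eps / 3.0, 2.0 * eps / 3.0
```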
@@ -17,10 +29,17 @@ To observe the effects of adding topic awareness to information propagation we c
\begin{figure}[ht!]
\includegraphics[width=\columnwidth]{images/errors_cvxpy.png}
\caption{The accuracy of the models as a function of cascade count. \color{red}Inspecting the reason for such similarity; see Fig.~\ref{accuracyVsCasCount_improved}.\color{black}}
\label{accuracyVsCasCount}
\end{figure}
\begin{figure}[ht!]
\includegraphics[width=\columnwidth]{images/errors_cvxpy_V2_V3_comb.png}
\caption{Accuracy rate vs.\ cascade count. Each data point is an average over 40 different experiments.}
\label{accuracyVsCasCount_improved}
\end{figure}
\subsection{Learning in case of noisy user level features}
User-level features are hard to infer and are often subject to considerable noise. As our model uses user-level features to model the information diffusion, its accuracy is also subject to the noise present in these inferred features.
\todo{Accuracy vs noise curve (of only topic aware model)}
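The noise-injection step for such a curve can be sketched as follows. The function name, the Gaussian noise model, and the noise levels are assumptions for illustration; the text only specifies "random noise". Fitting and scoring the topic-aware model on each noisy copy is omitted.

```python
import random

def add_feature_noise(features, noise_std, seed=0):
    """Return a copy of the user-level features with i.i.d. Gaussian
    noise of standard deviation `noise_std` added to each feature."""
    rng = random.Random(seed)
    return {u: f + rng.gauss(0.0, noise_std) for u, f in features.items()}

# Sweep noise levels to produce the data for an accuracy-vs-noise curve.
rng = random.Random(1)
features = {u: rng.uniform(-1.0, 1.0) for u in range(1000)}
noisy_copies = {s: add_feature_noise(features, s)
                for s in [0.0, 0.1, 0.2, 0.5, 1.0]}
```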
@@ -29,4 +48,39 @@ User level features are hard to infer and are often subject to a lot of noise. A
\color{blue}
\subsection{Exploring various Distance functions}
The distance function used to induce topic awareness is another design choice. We could explore the effects various distance functions have on the generated cascades, inference accuracy, etc.
\color{black}
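A sketch of how alternative distance functions could be plugged in. The candidates other than the absolute difference are illustrative alternatives, not choices made or evaluated in this work.

```python
import math

# Candidate scalar distance functions on opinions x, y in [-1, 1].
# "absolute" is the one used in the experiments above; the others are
# hypothetical alternatives one could compare against it.
distance_functions = {
    "absolute": lambda x, y: abs(x - y),
    "squared": lambda x, y: (x - y) ** 2,
    "saturating": lambda x, y: 1.0 - math.exp(-abs(x - y)),
}

# Each candidate maps identical opinions to distance 0 and grows with
# disagreement, but with a different shape, which changes how strongly
# topic awareness modulates the generated cascades.
for name, dist in distance_functions.items():
    assert dist(0.5, 0.5) == 0.0
```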
\section*{MemeTracker dataset}
The real-world experiments were performed on the MemeTracker dataset. Each entry is of the form (page URL, timestamp, extracted text, URLs of related articles). Entries were grouped by the domain name of the URL, which is used as a node. All domain names with fewer than 500 events, together with their corresponding events, were dropped. The topics extracted with LDA from the processed text documents are shown in Table \ref{LDA_words}.
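The grouping and filtering step above can be sketched as follows; the entry format and function name are illustrative, and the actual pipeline (including the LDA stage, omitted here) may differ.

```python
from collections import Counter
from urllib.parse import urlparse

def group_by_domain(entries, min_events=500):
    """Group entries of the form (url, timestamp, text, links) by the
    domain of the page URL, dropping domains with fewer than
    `min_events` events."""
    counts = Counter(urlparse(url).netloc for url, _, _, _ in entries)
    kept = {d for d, c in counts.items() if c >= min_events}
    grouped = {}
    for url, ts, text, links in entries:
        domain = urlparse(url).netloc
        if domain in kept:
            grouped.setdefault(domain, []).append((ts, text, links))
    return grouped

# Tiny illustrative input in the tuple format described above.
demo = [("http://example.com/a", 1, "some text", []),
        ("http://example.com/b", 2, "more text", []),
        ("http://rare.org/x", 3, "text", [])]
grouped = group_by_domain(demo, min_events=2)  # keeps only example.com
```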
\begin{table}[ht!]
\caption{Top keywords of the 20 topics extracted with LDA from the processed MemeTracker text.}
\label{LDA_words}
\centering
\begin{tabular}{ |c|c| }
\hline
Topic & Keywords \\
\hline
0 & health , patient , studi , care , drug , medic , cancer , diseas , treatment , effect \\
1 & go , think , like , know , want , good , time , peopl , thing , come \\
2 & nicht , ein , dass , sind , auch , sich , werden , haben , all , wird \\
3 & state , peopl , obama , presid , govern , american , right , polit , nation , say \\
4 & year , market , econom , economi , money , financi , bank , need , time , compani \\
5 & user , inform , time , file , window , site , search , page , post , work \\
6 & jedi , pentru , care , est , blog , fost , sunt , jest , trebui , mult \\
7 & citi , water , black , island , night , food , hous , drink , park , white \\
8 & love , life , famili , live , friend , home , girl , beauti , person , heart \\
9 & pour , nou , dan , plu , tout , avec , mai , sont , ell , fait \\
10 & school , children , book , world , peopl , student , women , write , educ , kid \\
11 & kill , forc , death , know , world , power , come , attack , order , peopl \\
12 & busi , custom , servic , product , provid , develop , technolog , commun , compani , market \\
13 & report , comment , yang , http , news , inappropri , headlin , escap , submiss , saya \\
14 & david , internet , michael , angel , list , roll , newspap , secret , award , fett \\
15 & music , film , star , sound , movi , song , charact , rock , video , sith \\
16 & church , religion , muslim , christian , religi , islam , europ , hiss , steaua , cathol \\
17 & para , como , todo , pero , porqu , esta , est , tica , estado , sobr \\
18 & della , int , dell , sono , alla , come , till , anch , questo , perch \\
19 & niet , zijn , voor , maar , hebben , grand , viva , heeft , geen , meer \\
\hline
\end{tabular}
\end{table}