[
{
"content": "CS 224W Project Proposal\n",
"children": [
{
"content": "###Introduction\n"
},
{
"content": "###Survey"
},
{
"content": "###Reaction",
"children": [
{
"content": "\n\n\n\n\n\n\n\n\n\n###Facebook Location Paper\n Notes:\n * problems with rank \n * prediction is for house address not realtime. \n * someone who has recently moved/ on travel.\n * using max likelihood assumes independence of friendships ? what happened to triads ?? \n\n###Jure's Contagion paper: \nNotes and Thoughts:\n * modelling contagions as being conditionally independent, loosing some info that way\n - Instead why not have two sets of contagions; one is like a max heap where the ones that have had most influence over you exist and the other set models the recency effect where contagions you have interacted with most recently have a bigger weight. \n * a highly infectious URL can help a less infectious but highly correlated url go viral while it can suppress a slightly less infectious but unrelated URL. \n\n\n####Understanding the model they have used\n\n *Consider probability of a user adopting a piece of content based on what they have seen before\n * Estimate the prob of a user being infected by a contagion given the sequence of contagions she has been exposed to so far.\n * Can we make an improvement in the model by modeling the dependence of the contagions. We are saying X is conditioned on Y1 to Yk but a time step ago Y1 was X. Pre computing or some sort of sampling can help us find the joint probability distribution rather than assuming independence. This seems to be a standard hand waving that is going on\n *Contagion interaction function models the effect contagion uj has on our current contagion ui when ui was exposed to uj k exposures ago. \n *A way to model this will be in a k different WXW matrices where rows represent different possible ui's and columns uj. \n * So the best way to do this is to identify classes or clusters to which each contagion belongs and model the interactions between these classes. 
\n *A way to set up this class is via a Membership matrix where which is of size W X T where W is number of contagions and T is the number of clusters(classes) , here M(i,j) is the probability that contagion ui is a member of cluster cj. \n\nThis new matrix gives rise to a new interaction function. Instead of modelling the interaction between two contagions we are now modeling the `interaction between two clusters` scaled by the probablities.\n * Therefore now to find the interaction between any two clusters: For any contagions ui and uj we find the probability that ui belongs to any class/cluster t and for any such case we find the probability that uj belongs to any class/cluster s and inside a double summation we multiply the above two probabilities with the interaction function for these two clusters s and t. \n\u0001**Didn't understand how their clustering matrix is a low rank **\n\nQuestions:\n- How do they find the source(first person who posted a URL) [Start from time zero and work forwards ? BFS ? ] of the url ? \n- For example: if some one copies a URL instead of retweeting is he a source or an infected party: \n- An interesting thing would simply be to find the source of the contagion? Why has this not been done.\n- This could help in finding people who are trying to game the system.\n- Pretty sure that in the log likelihood we are trying to find the parameters such that we match the observations in D as closely as possible ? \n- `95% for training and only 5% for testing ?`isn't that kinda bad. You could overfit the model and since your test and cross-validation are so small it will likely not matter.\n- The paper claims that user bias made no difference and incorporating it made their prediction worse. This leads to the question: Is targeted ads useless ? or (more likely) did they model user bias in a wrong manner ? \n\n"
},
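{
"content": "The cluster-scaled interaction function described above can be sketched in a few lines. This is a minimal illustration, assuming a W x T membership matrix `M` and a hypothetical T x T cluster-interaction matrix `A` (both made up here):\n\n```python\n# M[i][t] = P(contagion u_i belongs to cluster c_t)  (W x T membership matrix)\n# A[t][s] = hypothetical interaction strength between clusters c_t and c_s (T x T)\n\ndef interaction(M, A, i, j):\n    # Double summation from the notes: sum over clusters t (for u_i)\n    # and s (for u_j), weighting A[t][s] by both membership probabilities.\n    T = len(A)\n    total = 0.0\n    for t in range(T):\n        for s in range(T):\n            total += M[i][t] * M[j][s] * A[t][s]\n    return total\n\n# Toy example: 3 contagions, 2 clusters.\nM = [[1.0, 0.0],\n     [0.0, 1.0],\n     [0.5, 0.5]]\nA = [[2.0, -1.0],\n     [-1.0, 2.0]]\nprint(interaction(M, A, 0, 1))  # pure cluster 0 vs pure cluster 1 -> -1.0\n```\n"
},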
{
"content": "###Uncovering structure of nested/Overlapping Communities\nNotes:\n* Existing methods of finding communities work well if they are disjoint but in real life we have intersections\n* Each node `i` can belong to `m_i communities` \n* Further, for communities `A and B` the number of overlapping nodes is `S_AB`\n* Community degree: number of edges **alpha** that overlap with another community\n* size of the community is simply the number of nodes it has\n* To begin characterizing these communities we introduce CDF's `P(m)`, `P(S_ov)`, `P(deg_comm)` and `P(size_comm)`\n* Example **P(S_ov)** refers to the proportion of overlaps that are greater than the current one we are looking at right now. \n* *Basic Intuition:* Communites are made of several complete graphs. i:e several k-clique graphs who share a lot of nodes are a community \n* *More formal definition*: A k-clique community is one which refers to the nodes we can reach by starting of from a k-clique graph and moving to adjacent cliques. `Adjacent Cliques` are ones which share atleast k-1 nodes. \n* Here we model that the way we reach a community is important , we want deep ties to be exhibited. \n* Keep in mind that a single node can belong to multiple communities\n* Less strict norms for communities \n - reduce k (or equivalently say need not be a complete clique) \n - no cut links or cut nodes (that is cutting a edge or node ) will result in a disjoint community) \n - density of edges\n - `Allows overlaps`\n* Their algo used on unweighted undirected graph. \n* Reducing k is like reducing the resolution of the graph. \n* For weighted graph only follow edges above a certain threshold w* \n\n###Conclusion: \n\n* This paper gives an algorithm to find clusters of people in a way that represents the real world the most and allows significant overlaps. \n\n\n"
},
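{
"content": "The k-clique community definition above (start from a k-clique, move through adjacent cliques sharing at least k-1 nodes) can be sketched directly. This is a brute-force illustration for tiny unweighted, undirected graphs, not the paper's actual algorithm:\n\n```python\nfrom itertools import combinations\n\ndef k_cliques(nodes, edges, k):\n    # All k-node subsets whose pairs are all edges (edges: set of frozensets).\n    return [frozenset(c) for c in combinations(sorted(nodes), k)\n            if all(frozenset(p) in edges for p in combinations(c, 2))]\n\ndef k_clique_communities(nodes, edges, k):\n    cliques = k_cliques(nodes, edges, k)\n    parent = list(range(len(cliques)))\n    def find(x):\n        while parent[x] != x:\n            parent[x] = parent[parent[x]]\n            x = parent[x]\n        return x\n    # Merge cliques that are adjacent, i.e. share at least k-1 nodes.\n    for a, b in combinations(range(len(cliques)), 2):\n        if len(cliques[a] & cliques[b]) >= k - 1:\n            parent[find(a)] = find(b)\n    groups = {}\n    for i, c in enumerate(cliques):\n        groups.setdefault(find(i), set()).update(c)\n    return list(groups.values())\n\n# Two triangles sharing an edge, plus a separate triangle -> two communities.\nedges = {frozenset(e) for e in [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),\n                                (4, 5), (4, 6), (5, 6)]}\nprint(k_clique_communities(range(7), edges, 3))  # -> [{0, 1, 2, 3}, {4, 5, 6}]\n```\n\nNote how node sharing alone is not enough: the two merged triangles share k-1 = 2 nodes, while a triangle sharing only one node would stay a separate community.\n"
},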
{
"content": "##MemeTracking and the Dynamics of the News Cycle\n\nNotes:\n* track and \"see\" how certain phrases travel relatively intact through the internet. \n* utility to us: find the mutated versions of similar hashtags and group them together. \n* Terminology:\n - **item** - blog post or news article\n - **phrase** - quote or text whose variants we want to track \n - **phrase graph**: nodes are phrases and edges point from one phrase to another variation of the same \n* We want to construct the phrase graph and partition it in such a way that the components of a partition are related to each other. \n* Edge from p to q such that p is strictly shorter than \n* **Dead end ** didn't find it to be very useful to our project\n"
},
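{
"content": "The phrase-graph construction above can be sketched with a simplified inclusion test. Here an edge p -> q requires p to be strictly shorter than q and p's words to appear in q in order; this exact word-subsequence test is a stand-in for the paper's approximate matching:\n\n```python\ndef is_subsequence(short, long):\n    # True if the words of `short` appear in `long` in order.\n    it = iter(long)\n    return all(w in it for w in short)\n\ndef phrase_graph(phrases):\n    toks = {p: p.split() for p in phrases}\n    return [(p, q) for p in phrases for q in phrases\n            if len(toks[p]) < len(toks[q]) and is_subsequence(toks[p], toks[q])]\n\nphrases = ['lipstick on a pig',\n           'you can put lipstick on a pig',\n           'no way no how']\nprint(phrase_graph(phrases))\n# -> [('lipstick on a pig', 'you can put lipstick on a pig')]\n```\n\nBecause every edge goes from a strictly shorter phrase to a longer one, the resulting graph is a DAG.\n"
},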
{
"content": "###Romero's Paper\n* `Stickiness` and `Persistence` both involved in adoption of hashtag. **Stickiness** could be considered the prior probability of infection while **Persistence** is can be quantified by the marginal benefit that is achieved from repeated exposure to the same hashtag.\n* Build a network based on X using the `@twitter-user` name. This indicates X is paying attention to Y, something that is not obvious from the follows relationship.\n* For a given user `X` we call the set of nodes that `X` has an edge to (under the above specification) as the `neighbour set = N(X) ` of `X`\n* As members of *N(X)* start tweeting a particular hashTag *H* we start looking for when **X** will adopting the hashtag. We are looking for the effect of repeated exposure to a particular topic on X. \n"
},
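{
"content": "The exposure-tracking setup above can be sketched as a simple count: how many distinct members of N(X) used hashtag H before X first adopted it. The `(user, hashtag, time)` tuple layout is made up for illustration:\n\n```python\ndef exposures_before_adoption(tweets, neighbours, user, tag):\n    # tweets: list of (user, hashtag, time) tuples.\n    adoption = min((t for u, h, t in tweets if u == user and h == tag),\n                   default=None)\n    if adoption is None:\n        return None  # the user never adopted the hashtag\n    # Distinct neighbours who tweeted the tag before the adoption time.\n    return len({u for u, h, t in tweets\n                if h == tag and u in neighbours and t < adoption})\n\ntweets = [('a', '#sxsw', 1), ('b', '#sxsw', 2), ('x', '#sxsw', 3), ('c', '#sxsw', 4)]\nprint(exposures_before_adoption(tweets, {'a', 'b', 'c'}, 'x', '#sxsw'))  # -> 2\n```\n"
},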
{
"content": "### Patterns of Temporal Variation in Online Media\n\n"
}
]
},
{
"content": "###Project Proposal"
},
{
"content": "###References",
"children": [
{
"content": "###Experiments"
},
{
"content": "Papers Read:\n- paper1 \n- paper2 \n- paper3"
}
]
}
]
},
{
"content": "CS 229 Project Readings\n",
"children": [
{
"content": "###Evaluation and Analysis of Personalised searches.\n\nNotes:\n- Personalisation is not always useful\n- Long term and short term user history is relevant at different times. \n- Personalization works on queries which have large entropy\n- Also, click based methods work on repeated queries the best. \n- To effectively combine different rankings use Borda's ranking fusion method. (unable to get relavance score from MSN) \n- Click approach for repeated query, (about 1/3rd of queries are repeats) \n- user profile ; weighted vector on 67 pre-decided topics. \n- KNN for group based personalization\n\n"
}
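,
{
"content": "Borda's rank-fusion method mentioned above can be sketched as a points scheme: each ranked list awards a document (list length - position) points, and the fused ranking sorts by total points. A minimal illustration:\n\n```python\ndef borda_fuse(rankings):\n    scores = {}\n    for ranking in rankings:\n        n = len(ranking)\n        for pos, doc in enumerate(ranking):\n            # Top of an n-item list earns n points, bottom earns 1.\n            scores[doc] = scores.get(doc, 0) + (n - pos)\n    return sorted(scores, key=lambda d: -scores[d])\n\nr1 = ['d1', 'd2', 'd3']\nr2 = ['d2', 'd3', 'd1']\nprint(borda_fuse([r1, r2]))  # d2: 5, d1: 4, d3: 3 -> ['d2', 'd1', 'd3']\n```\n\nThis only needs ranks, not relevance scores, which fits the note about being unable to get relevance scores from MSN.\n"
}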
]
}
]