• # Event History & Projections

The goal is to provide a simple & clear process such that:

1. Given the history of your system (number of visitors/signups/conversions each day), you can see a best guess of the conversion/cancellation rates and how they’ve changed over time.

2. Given hypothetical future conversion/cancellation rates, you can see what your system will look like at some point $t^\prime$ in the future
(including the number of free/paid/cancelled users, revenue totals and rates, etc.).

For the first goal, we will use statistics. For the second, we will use simulations.

• ## Event History Analysis

It’s usually easy to get a listing of events (“user subscribed to plan A”, “user cancelled from plan C”). We wish to take this list of events and produce an estimate of the transition rates between “buckets” at any given time in the past or present.
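As a first step toward those estimates, the event list can be tallied into per-day transition counts. The sketch below assumes each event is a `(day, from_bucket, to_bucket)` tuple; that record layout, the bucket names, and the function name `daily_transition_counts` are illustrative assumptions, not a fixed format.

```python
from collections import Counter
from datetime import date

# Hypothetical event records: (day, from_bucket, to_bucket).
events = [
    (date(2013, 1, 1), "visitor", "free"),
    (date(2013, 1, 1), "visitor", "free"),
    (date(2013, 1, 1), "free", "paid"),
    (date(2013, 1, 2), "paid", "cancelled"),
]

def daily_transition_counts(events):
    """Count how many times each (from, to) transition occurred on each day."""
    counts = Counter()
    for day, src, dst in events:
        counts[(day, src, dst)] += 1
    return counts

counts = daily_transition_counts(events)
```

Dividing each count by the number of users sitting in the source bucket that day would turn these counts into raw daily rates, which the statistical methods below are meant to smooth.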

• ## Discrete Event Simulation

This method, DES, can be used to simulate the movement of people through an online business, or the changing concentration of chemicals in a reaction.

“Discrete” means the objects jump from one type to another, but can’t have values in between. For example, a customer is either a “free trial” customer or a “monthly subscription” customer, nothing in between.

The method works even when the number of objects (people/molecules/etc) is very small, and when their change from one type to another happens randomly over time.

In general, it can simulate any system whose “state” (a list of numbers that completely describes the system) can only change discretely (by whole numbers).
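One common way to run such a simulation is a Gillespie-style loop: draw a random waiting time until the next jump, pick which jump happened, and move one object between buckets. The sketch below is only an illustration under assumed inputs: the bucket names, the per-user rates, and the function name `simulate` are hypothetical, and rates are held constant over time.

```python
import random

def simulate(state, rates, t_end, seed=0):
    """Gillespie-style simulation of users hopping between buckets.

    state -- dict mapping bucket name to a whole-number count
    rates -- dict mapping (from_bucket, to_bucket) to a per-user rate per unit time
    """
    rng = random.Random(seed)
    state = dict(state)  # work on a copy so the caller's dict is untouched
    t = 0.0
    while True:
        # Each transition fires at (per-user rate) x (users in the source bucket).
        propensities = {tr: r * state[tr[0]] for tr, r in rates.items()}
        total = sum(propensities.values())
        if total == 0:
            break  # nobody left who can move
        t += rng.expovariate(total)  # random waiting time until the next event
        if t > t_end:
            break
        # Choose which transition happened, weighted by its propensity.
        transitions = list(propensities)
        weights = [propensities[tr] for tr in transitions]
        src, dst = rng.choices(transitions, weights=weights)[0]
        state[src] -= 1
        state[dst] += 1
    return state

# Hypothetical starting state and rates (per user, per day).
start = {"free": 50, "paid": 0, "cancelled": 0}
per_user_rates = {("free", "paid"): 0.05, ("paid", "cancelled"): 0.01}
final = simulate(start, per_user_rates, t_end=30.0)
```

Because the state only ever changes by moving one whole user at a time, the counts stay whole numbers and the total number of users is conserved.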

• ### Rough Visual of the Goal

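Given event data, get a plot like the following for the rate of events as a function of time:

![](http://2.bp.blogspot.com/-q9iFL2ANIa0/UEEt_kZYrdI/AAAAAAAABFE/oJ7oKqX1qTQ/s1600/visually_weighted_fixed_ink_smoothed_spaghetti_CI.jpg)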

• ## Some possible avenues for this solution

• Nonparametric regression
• (Locally adaptive) kernel density estimation
• Smoothing estimators
• Bayesian statistics of time series
• Inhomogeneous Poisson process (or “time-varying Poisson process”)
• Spike rate estimation (requires multiple “runs”?)

• Notes:

• We are interested in how these rates change over time (e.g. improvements lead to higher conversion, or introducing a bug suddenly increases cancellations).
• We probably need to select a time interval over which we assume the rate doesn’t change. For instance, the “visitor → free” conversion rate changes over the course of weeks or months, but we don’t expect it to change every minute.
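The simplest version of the second note is to pool daily counts into fixed windows and compute one rate per window. The sketch below assumes we already have, for each day, the number of users who could have converted and the number who did; the function name `windowed_rates`, the sample numbers, and the 4-day window are illustrative assumptions.

```python
def windowed_rates(exposed, converted, window):
    """Pool daily counts into consecutive windows and return one rate per window.

    exposed[i]   -- number of users who could have converted on day i
    converted[i] -- number who did convert on day i
    window       -- number of days over which the rate is assumed constant
    """
    if len(exposed) != len(converted):
        raise ValueError("exposed and converted must have the same length")
    rates = []
    for start in range(0, len(exposed), window):
        n = sum(exposed[start:start + window])
        k = sum(converted[start:start + window])
        rates.append(k / n if n > 0 else None)  # None: no data in this window
    return rates

# Hypothetical daily visitor and signup counts for eight days.
visitors = [100, 120, 80, 100, 90, 110, 100, 100]
signups = [5, 6, 4, 5, 9, 11, 10, 10]
rates = windowed_rates(visitors, signups, window=4)
```

A wider window gives a steadier estimate but reacts more slowly to a real change in the rate, which is the trade-off the methods listed above try to handle more gracefully.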

• ### Bayesian Probability

Bayesian probability is a way of updating your best-guess distribution for an uncertain quantity when you come across new data. There are two main steps:

1. Choose a starting guess for the distribution (the prior) based on what you know about the possible values the quantity could take.
2. When you obtain new data, use Bayes’ Theorem to update this distribution into your new best guess (the posterior).
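For a conversion rate, the two steps can be sketched with a Beta distribution as the starting guess and conversion counts as the data. This is one convenient choice, not the only one: the Beta prior, the sample numbers, and the function names `update_beta` and `beta_mean` are assumptions made for illustration.

```python
def update_beta(alpha, beta, conversions, trials):
    """Bayes' theorem for a Beta prior with binomial data.

    With this pairing the posterior is again a Beta distribution,
    so updating only means adding counts to the two parameters.
    """
    if not 0 <= conversions <= trials:
        raise ValueError("conversions must be between 0 and trials")
    return alpha + conversions, beta + (trials - conversions)

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Step 1: starting guess. Beta(1, 1) is uniform: every rate in [0, 1] is equally plausible.
prior = (1, 1)
# Step 2: new data arrives (hypothetical: 12 conversions out of 200 visitors).
posterior = update_beta(*prior, conversions=12, trials=200)
best_guess = beta_mean(*posterior)
```

The same update can be repeated as each new batch of data arrives, using the previous posterior as the next prior.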
• ### Nonhomogeneous Poisson Process

So far, this is the procedure that seems most applicable and most straightforward.
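To make the idea concrete, here is a minimal sketch of sampling event times from a Poisson process whose rate varies with time, using the thinning technique: generate candidate events at a constant upper-bound rate, then keep each one with probability proportional to the true rate at that moment. The weekly-cycle rate function, the bound `rate_max`, and the name `sample_nhpp` are hypothetical choices for illustration.

```python
import math
import random

def sample_nhpp(rate, rate_max, t_end, seed=0):
    """Sample event times on [0, t_end) from a Poisson process with rate(t).

    Thinning: generate candidates at the constant rate rate_max, then keep
    the candidate at time t with probability rate(t) / rate_max.
    Requires 0 <= rate(t) <= rate_max for all t in [0, t_end).
    """
    rng = random.Random(seed)
    times = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_max)  # gap to the next candidate event
        if t >= t_end:
            return times
        if rng.random() * rate_max < rate(t):
            times.append(t)  # candidate accepted

# Hypothetical signup rate (events per day) with a weekly cycle.
def signup_rate(t):
    return 5.0 + 3.0 * math.sin(2 * math.pi * t / 7.0)

signup_times = sample_nhpp(signup_rate, rate_max=8.0, t_end=28.0)
```

Estimation runs in the opposite direction: given observed event times like `signup_times`, the task is to recover a plausible `rate(t)`.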

• #### Distribution

A probability distribution is a function that tells you how likely a given result is.

![](http://upload.wikimedia.org/wikipedia/commons/thumb/7/74/Normal_Distribution_PDF.svg/350px-Normal_Distribution_PDF.svg.png)

Here are several example distributions for the value $x$. In the blue distribution, $x$ can be anywhere, but is most likely to be near $0$ (the center peak). In the green one, $x$ could still be anywhere, but is most likely to be near $-2$.
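Curves like these can be evaluated directly. The sketch below assumes the blue and green curves are normal (bell-curve) densities peaked at $0$ and $-2$ respectively; the variances used here are assumed values chosen for illustration, and `normal_pdf` is a hypothetical helper name.

```python
import math

def normal_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def blue(x):
    # Assumed parameters: peak at 0, narrow spread.
    return normal_pdf(x, mean=0.0, variance=0.2)

def green(x):
    # Assumed parameters: peak at -2, somewhat wider spread.
    return normal_pdf(x, mean=-2.0, variance=0.5)

# Compare how plausible two candidate values are under each curve.
blue_at_peak, blue_far = blue(0.0), blue(-2.0)
green_at_peak, green_far = green(-2.0), green(0.0)
```

Under each curve the value at its own peak is far more likely than the value at the other curve's peak, yet neither density is ever exactly zero, which is what "can be anywhere" means.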
