Don Dini (AT&T, USA) opened his talk on Predictive Analytics with a comment on how machine learning was the second best solution to every problem. He made a self-contained pun on the word TERRIFYING being equivalent to:
[‘T’, ‘EH1’, ‘R’, ‘AH0’, ‘F’, ‘AY2’, ‘IH0’, ‘NG’]
He continued with a generalization: if the general answer was complicated, data science was generally the answer.
If you believe your servers are being attacked, perform hypothesis testing using a null distribution (a KDE formed from the last month of data). Then propose a model capable of addressing the inference problem. Finally, utilize a method such as k-nearest neighbors, linear regression, or support vector machines.
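The hypothesis-testing step can be sketched as follows. This is a minimal illustration, not the method shown in the talk: the traffic numbers are simulated, the function name `kde_tail_probability` is hypothetical, and the bandwidth uses a common normal-reference rule of thumb chosen here as an assumption.

```python
import random
from statistics import NormalDist, stdev


def kde_tail_probability(history, observed):
    """One-sided p-value: P(X >= observed) under a Gaussian KDE of `history`."""
    n = len(history)
    # Rule-of-thumb bandwidth (an assumption, not taken from the talk).
    bandwidth = 1.06 * stdev(history) * n ** (-1 / 5)
    std_normal = NormalDist()
    # A Gaussian KDE is an equal-weight mixture of normals centred on each
    # data point, so its upper tail is the average of the kernel tails.
    tails = (1.0 - std_normal.cdf((observed - x) / bandwidth) for x in history)
    return sum(tails) / n


rng = random.Random(0)
# Hypothetical "last month" of hourly request counts (30 days * 24 hours).
baseline = [rng.gauss(1000, 50) for _ in range(720)]

p_normal = kde_tail_probability(baseline, 1010)  # an ordinary hour
p_spike = kde_tail_probability(baseline, 1400)   # a suspicious hour

print(f"p-value for 1010 requests: {p_normal:.3f}")
print(f"p-value for 1400 requests: {p_spike:.3g}")
```

A small p-value says the observation is unlikely under normal traffic; it does not by itself say the cause is an attack.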
As an example, Mr. Dini gave us a simple “Predict the next number” problem. To solve this, suppose
xi ~ N(µ, σ²)
a. Then µ̂ = (x1 + x2 + … + xn) / n
b. Then the coder needs to evaluate how well the model did: how sure is the model about what it recommends?
c. Then determine computational confidence intervals
To perform computational confidence intervals, “bootstrapping” was a helpful method. Consider having 3 similar data sets:
a. x1, x2, …, xn → r(x)
b. x1, x2, …, xn → r(x)
c. x1, x2, …, xn → r(x)
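The resampling idea above can be sketched in a few lines. This is an illustrative percentile bootstrap for the sample mean, assuming simulated normal data; the function name `bootstrap_mean_interval`, the 2000 resamples, and the 95% level are choices made for this example rather than details from the talk.

```python
import random
from statistics import fmean


def bootstrap_mean_interval(data, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample draws n points from the data with replacement and
    # recomputes the statistic r(x), here the mean.
    estimates = sorted(fmean(rng.choices(data, k=n)) for _ in range(n_resamples))
    lower_idx = int((1 - level) / 2 * n_resamples)
    upper_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lower_idx], estimates[upper_idx]


rng = random.Random(1)
# Hypothetical observations drawn from N(10, 2^2).
sample = [rng.gauss(10, 2) for _ in range(50)]

mu_hat = fmean(sample)  # point prediction for the next number
low, high = bootstrap_mean_interval(sample)

print(f"estimate: {mu_hat:.2f}, 95% interval: ({low:.2f}, {high:.2f})")
```

The spread of the resampled estimates is what stands in for the model's uncertainty about its own prediction.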
Ideally, if a coder had a perfect model, and the data the coder had was perfect, the outcome should be the same for all models. Moreover, the amount of data present was a direct (and non-removable) cause of uncertainty in the prediction. In any prediction made, it was vitally important to communicate uncertainty. For example, Don tried to show us how he answered the question “What are the things that influence communication on social media?”
For example, if thirty seconds had elapsed from a user’s Tweet, how much longer until that same user would Tweet again? Don’s point was that uncertainty, here, made a real prediction less meaningful.
Continuing on to the next point, Don highlighted the next relevant problem: “How do we know if two variables have anything to do with each other?”
Consider: y = x²
x: x1, x2, ..., xn
y: y1, y2, …, yn
In the above instance, the covariance between x and a polynomial in x decreased as odd-degree polynomials increased in degree. By contrast, the result was the same for all even-degree polynomials.
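One way to read this observation is sketched below, under the assumption (not stated in the notes) that x is spread symmetrically about zero on [-1, 1]. In that case the covariance between x and x^k is zero for every even k, even though y is fully determined by x, and it shrinks as the odd degree k grows.

```python
from math import fsum


def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    return fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Assumed input: an evenly spaced grid that is symmetric about zero.
xs = [i / 100 for i in range(-100, 101)]

# Covariance between x and y = x**k for degrees 1 through 6.
cov_by_degree = {k: covariance(xs, [x ** k for x in xs]) for k in range(1, 7)}

for k, cov in cov_by_degree.items():
    print(f"degree {k}: covariance = {cov:.4f}")
```

The even-degree rows show why covariance alone can miss a strong relationship between two variables.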
The next topic Don shared with the Microsoft Hack Reactor Office was Entropy (a principle that explained how decision trees worked).
X took on a series of values
x: x1, x2, …, xk
Pr(x1), Pr(x2), …, Pr(xk)
H(X) = -[Pr(x1) * log(Pr(x1)) + Pr(x2) * log(Pr(x2)) + … + Pr(xk) * log(Pr(xk))]
H(X) took on a minimum value of 0 when the distribution was most skewed:
[0, 0, …, 0, 1, 0, …0, 0]
H(X) took on its maximum when the distribution was least skewed:
[1/k, 1/k, … 1/k]
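The two extremes can be checked with a short sketch. It uses the standard Shannon entropy (the negated sum of Pr(x) * log(Pr(x)) over all values) and assumes base-2 logarithms, since the notes do not say which base was used; terms with zero probability are skipped, following the usual convention that 0 * log(0) counts as 0.

```python
from math import log2


def entropy(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    # Zero-probability outcomes contribute nothing to the sum.
    return sum(-p * log2(p) for p in probabilities if p > 0)


k = 4
most_skewed = [0, 0, 1, 0]   # all mass on one outcome
least_skewed = [1 / k] * k   # uniform over k outcomes
in_between = [0.7, 0.1, 0.1, 0.1]

print(entropy(most_skewed))   # minimum: no uncertainty
print(entropy(least_skewed))  # maximum: log2(k) bits
print(entropy(in_between))    # somewhere between the two
```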
Don explained that we could sort variables by how much they could cause X’s entropy to decrease. The goal was to find the variable that was going to make him maximally certain. This feat required defining conditional entropy as:
H(X|Y) = Pr(Y = y1) * H(X|Y = y1) + Pr(Y = y2) * H(X|Y = y2) + …
If Don had a collection of variables: A, B, C, …, he could sort them out by how much they caused entropy to drop:
a. H(X) – H(X|A)
b. H(X) – H(X|B)
c. H(X) – H(X|C)
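The ranking above can be sketched on a toy data set. The variables A, B, and C below are made-up columns chosen so that A determines X completely, B says nothing about X, and C is partly informative; entropies are estimated from observed frequencies with base-2 logarithms, which is an assumption of this example.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(values):
    """Entropy in bits of the empirical distribution of `values`."""
    n = len(values)
    return sum(-(c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(xs, ys):
    """H(X|Y): entropy of X within each value of Y, weighted by Pr(Y = y)."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    n = len(xs)
    return sum(len(group) / n * entropy(group) for group in groups.values())


def entropy_drop(xs, ys):
    """H(X) - H(X|Y): how much knowing Y reduces uncertainty about X."""
    return entropy(xs) - conditional_entropy(xs, ys)


# Hypothetical observations of the target X and three candidate variables.
x = [0, 0, 1, 1, 0, 0, 1, 1]
candidates = {
    "A": [0, 0, 1, 1, 0, 0, 1, 1],  # identical to X
    "B": [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to X
    "C": [0, 0, 1, 1, 0, 0, 1, 0],  # agrees with X most of the time
}

ranked = sorted(candidates, key=lambda name: entropy_drop(x, candidates[name]), reverse=True)
for name in ranked:
    print(name, round(entropy_drop(x, candidates[name]), 3))
```

A decision tree applies the same comparison at each node, splitting on whichever variable gives the largest drop.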
This brought Don to his next principle, “Mention Distance,” that is, the number of seconds that had elapsed since someone had Tweeted. With mention distance, Don was trying to answer the following question: “If I knew how long it had been since someone in this front network had mentioned them, did it influence whether that person would respond?”
Don closed with a group activity to have us practice the bootstrapping method.