Don Dini (AT&T, USA) opened his talk on Predictive
Analytics with a comment that machine learning was the second-best solution
to every problem. He made a self-contained pun on the word TERRIFYING, which
is phonetically equivalent to:

[‘T’, ‘EH1’, ‘R’, ‘AH0’, ‘F’, ‘AY2’, ‘IH0’, ‘NG’]

He continued with a generalization: while the specific
answer to any given problem was complicated, data science was generally the answer.

If you believe your servers are being attacked, perform
hypothesis testing using a null-distribution KDE formed from the last month of
data. Then propose a model capable of addressing the inference problem.
Finally, apply methods such as k-nearest neighbors, linear regression, or support vector machines.
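The hypothesis-testing step above can be sketched in a few lines. Everything here is illustrative: the baseline data are simulated (the assumed mean of 1000 and spread of 50 are not from the talk), and `today` is a hypothetical suspicious observation.

```python
# Sketch of the idea: build a KDE null distribution from a month of
# baseline data, then ask how surprising today's observation is.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
requests_last_month = rng.normal(1000, 50, size=30)  # assumed baseline data
today = 1180.0                                       # hypothetical suspicious count

# Kernel density estimate of the null distribution.
kde = gaussian_kde(requests_last_month)

# One-sided p-value: probability under the null of a count at least
# this large, via the KDE's upper-tail integral.
p_value = kde.integrate_box_1d(today, np.inf)
print(p_value)  # a small value suggests the traffic is anomalous
```

A small p-value here would justify moving on to the modeling step.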

As an example, Mr. Dini gave us a simple “Predict the next
number” problem. To solve this, suppose

x_{i} ~ N(mu, sigma^{2})

a. Then µ-hat = (x_{1} + x_{2} + … + x_{n}) / n

b. Then the coder needs to evaluate how well the model did: how sure is the model about what it recommends?

c. Then determine computational confidence intervals
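The first two steps can be sketched as follows, on simulated data (the true mu = 5.0 and sigma = 2.0 are assumptions for illustration, not values from the talk):

```python
# Step (a): if x_i ~ N(mu, sigma^2), the estimator mu-hat is the sample
# mean, and the "next number" prediction is simply mu-hat.
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=5.0, scale=2.0, size=100)  # simulated observations

mu_hat = x.sum() / len(x)   # (x_1 + x_2 + ... + x_n) / n
prediction = mu_hat         # predict the next number

# Step (b): a rough measure of how sure the model is -- the standard
# error of the mean, which shrinks as more data arrives.
std_error = x.std(ddof=1) / np.sqrt(len(x))
print(prediction, std_error)
```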

To construct computational confidence intervals,
“bootstrapping” was a helpful method. Consider having three similar data sets, each mapped to the same statistic r(x):

a. x_{1}, x_{2}, …, x_{n} → r(x)

b. x_{1}, x_{2}, …, x_{n} → r(x)

c. x_{1}, x_{2}, …, x_{n} → r(x)
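The bootstrapping idea can be sketched like this: resample the data with replacement many times, recompute r(x) (here taken to be the sample mean) for each resample, and read a confidence interval off the percentiles. The data set is simulated; mu = 5.0 and sigma = 2.0 are assumptions.

```python
# Minimal bootstrap sketch for a computational confidence interval.
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(loc=5.0, scale=2.0, size=100)  # assumed data set

n_boot = 2000
boot_means = np.empty(n_boot)
for b in range(n_boot):
    # Each resample plays the role of one "similar data set" x_1..x_n -> r(x).
    resample = rng.choice(x, size=len(x), replace=True)
    boot_means[b] = resample.mean()

# 95% computational confidence interval for the mean.
low, high = np.percentile(boot_means, [2.5, 97.5])
print(low, high)
```

The width of the interval is exactly the uncertainty the next paragraph says we must communicate.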

Ideally, if a coder had a perfect model, and the data the
coder had was perfect, the outcome r(x) should be the same for all three sets; the spread among them reflects uncertainty. Moreover,
the finite amount of data present was a direct (and non-removable) cause of
uncertainty in the prediction. In any prediction made, it was vitally important
to communicate that uncertainty. For example, Don tried to show us how he answered
the question “What are the things that influence communication on social
media?”

For example, if thirty seconds had elapsed from a user’s
Tweet, how much longer until that same user would Tweet again? Don’s point was
that uncertainty, here, made a real prediction less meaningful.

Continuing, Don highlighted the next
relevant problem: “How do we know if two variables have anything to do with each
other?”

Consider: y = x^{2}

x: x_{1}, x_{2}, …, x_{n}

y: y_{1}, y_{2}, …, y_{n}

In the above instance, even though y is completely determined by x, the covariance between x and y can be (near) zero when x is symmetric about zero: covariance detects only linear relationships. For symmetric x, covariance with even-degree polynomials of x vanishes, while odd-degree polynomials retain non-zero covariance.
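This point is easy to demonstrate. The symmetric grid of x values below is an assumption chosen to make the effect exact:

```python
# With x symmetric about zero, y = x**2 is perfectly dependent on x,
# yet their covariance is (numerically) zero.
import numpy as np

x = np.linspace(-1.0, 1.0, 201)  # symmetric grid of x values
y = x ** 2                       # even-degree relationship

cov_even = np.cov(x, y)[0, 1]
print(cov_even)  # essentially zero despite perfect dependence

# Contrast with an odd-degree relationship, where covariance survives.
cov_odd = np.cov(x, x ** 3)[0, 1]
print(cov_odd)
```

This is exactly why the talk moves next to entropy, which can detect non-linear dependence that covariance misses.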

The next topic Don shared with the Microsoft Hack Reactor
Office was Entropy (a principle that explained how decision trees worked).

X took on a series of values

x: x_{1}, x_{2}, …, x_{k}

with probabilities Pr(x_{1}), Pr(x_{2}), …, Pr(x_{k}), and entropy

H(x) = -[Pr(x_{1}) * log(Pr(x_{1})) + Pr(x_{2}) * log(Pr(x_{2})) + … + Pr(x_{k}) * log(Pr(x_{k}))]

H(x) took on its minimum value of 0 when the distribution was most
skewed:

[0, 0, …, 0, 1, 0, …, 0, 0]

H(x) took on its maximum value, log(k), when the distribution was least skewed (i.e., uniform):

[1/k, 1/k, …, 1/k]
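The two extremes can be checked directly. Base-2 logarithms are an assumption here (the talk did not specify a base); with them, the uniform case gives log2(k) exactly.

```python
# Entropy of a discrete distribution, and its two extreme cases.
import math

def entropy(probs):
    """H(X) = -sum_i Pr(x_i) * log2(Pr(x_i)), skipping zero-probability terms."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

k = 8
skewed = [0, 0, 0, 1, 0, 0, 0, 0]  # all mass on a single value
uniform = [1 / k] * k              # mass spread evenly

print(entropy(skewed))   # minimum: 0.0
print(entropy(uniform))  # maximum: log2(8) = 3.0
```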

Don explained that we could sort variables by how much they
could cause x’s entropy to decrease. The goal was to find the variable that
would make him maximally certain. This feat required defining conditional
entropy as:

H(X|Y) = Pr(Y = y_{1}) * H(X|Y = y_{1}) + Pr(Y = y_{2}) * H(X|Y = y_{2}) + …

If Don had a collection of variables A, B, C, …, he could
sort them by how much they caused entropy to drop:

a. H(X) – H(X|A)

b. H(X) – H(X|B)

c. H(X) – H(X|C)
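The ranking above (information gain, the quantity decision trees split on) can be sketched as follows. The tiny joint data set is invented for illustration; A is built to determine X perfectly, while B is only weakly related.

```python
# Rank candidate variables by how much they drop X's entropy:
# gain(V) = H(X) - H(X|V).
import math
from collections import Counter

def entropy(values):
    n = len(values)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(x, y):
    """H(X|Y) = sum_j Pr(Y = y_j) * H(X | Y = y_j)."""
    n = len(x)
    h = 0.0
    for y_val, count in Counter(y).items():
        subset = [xi for xi, yi in zip(x, y) if yi == y_val]
        h += (count / n) * entropy(subset)
    return h

# Target X and two candidate predictors (assumed toy data).
X = [0, 0, 1, 1, 0, 1, 0, 1]
A = [0, 0, 1, 1, 0, 1, 0, 1]  # perfectly informative about X
B = [0, 1, 0, 1, 0, 1, 0, 1]  # tells us little about X

gains = {name: entropy(X) - conditional_entropy(X, var)
         for name, var in {"A": A, "B": B}.items()}
ranked = sorted(gains, key=gains.get, reverse=True)
print(ranked)  # A drops X's entropy the most
```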

This brought Don to his next principle, “Mention Distance”:
the number of seconds that had elapsed since someone had been mentioned in a Tweet. With
mention distance, Don was trying to answer the following question: “If I knew
how long it had been since someone in a user’s friend network had mentioned them,
did it influence whether that person would respond?”

Don closed with a group activity to have us practice the
bootstrapping method.