Commit 52573324 authored by Jung Alex

Merge branch 'master' of version.aalto.fi:junga1/MLBP2017Public

parents b0cbe14b efc36997
@@ -75,11 +75,11 @@ for the gradient $\nabla f(\mathbf{w}^{(k)})$ in terms of the current iterate $\
\section{Bayes' Classifier - I}
Consider a binary classification problem where the goal is to classify or label a webcam snapshot as ``winter'' ($y=-1$) or ``summer'' ($y=1$) based on the feature vector
-$\vx=(x_{\rm g},1)^{T} \in \mathbb{R}^{2}$ with the image greenness $x_{\rm g}$. We might interpret
+$\vx=(x_{\rm g},x_{\rm r})^{T} \in \mathbb{R}^{2}$ with the image greenness $x_{\rm g}$ and redness $x_{\rm r}$. We might interpret
the feature vector and label as realizations of random variables whose statistics are specified by a joint distribution $p(\vx,y)$. This joint distribution factors as $p(\vx,y) = p(\vx| y) p(y)$
with the conditional distribution $p(\vx| y)$ of the feature vector given the true label $y$ and the prior distribution $p(y)$ of the label values. The prior probability $p(y=1)$ is the overall fraction of
summer snapshots. Assume that we know the distributions $p(\vx| y)$ and $p(y)$ and that we want to construct a classifier $h(\vx)$ that classifies a snapshot with feature vector $\vx$ as $\hat{y}=h(\vx) \in \{-1,1\}$.
-Which classifier map $h(\cdot): \vx \mapsto \hat{y}=h(\vx)$, mapping the feature vector $\vx$ to a predicted label $\hat{y}$, yields the smallest error probability (which is $p( y \neq h(\vx))$) ?
+Which classifier map $h(\cdot): \vx \mapsto \hat{y}=h(\vx)$, mapping the feature vector $\vx$ to a predicted label $\hat{y}$, yields the smallest error probability (which is $p( y \!\neq\! h(\vx))$)?
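For concreteness, the error probability to be minimized can, by the law of total probability, be written as an average of the two conditional error probabilities weighted by the prior $p(y)$ (here $y'$ is just a summation variable):
\begin{equation}
p\big( y \!\neq\! h(\vx) \big) \;=\; \sum_{y' \in \{-1,1\}} p(y\!=\!y')\, p\big( h(\vx) \neq y' \,\big|\, y = y' \big).
\end{equation}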
\noindent {\bf Answer.}
@@ -87,7 +87,7 @@ Which classifier map $h(\cdot): \vx \mapsto \hat{y}=h(\vx)$, mapping the feature
\section{Bayes' Classifier - II}
Reconsider the binary classification problem of Problem 3, where the goal is to classify or label a webcam snapshot as ``winter'' ($y=-1$) or ``summer'' ($y=1$) based on the feature vector
-$\vx=(x_{\rm g},1)^{T} \in \mathbb{R}^{2}$ with the image greenness $x_{\rm g}$. While in Problem 3 we assumed perfect knowledge
+$\vx=(x_{\rm g},x_{\rm r})^{T} \in \mathbb{R}^{2}$ with the image greenness $x_{\rm g}$ and redness $x_{\rm r}$. While in Problem 3 we assumed perfect knowledge
of the joint distribution $p(\vx,y)$ of features $\vx$ and label $y$ (which are modelled as random variables), we now assume knowledge only of the prior probability $P(y=1)$, which we denote by $P_{1}$.
A useful ``guess'' for the distribution of the features $\vx$, given the label $y$, is a Gaussian distribution. Thus, we assume
\begin{equation}
......
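As an illustrative numerical sketch only (not part of the exercise; the means, covariance, and prior value below are made-up placeholders rather than quantities from the assignment), a classifier built from Gaussian ``guesses'' for $p(\vx|y)$ together with the prior $P_{1}$ could be evaluated like this:
\begin{verbatim}
# Hypothetical sketch: compare p(x|y) p(y) under Gaussian "guesses" for
# p(x|y) and a known prior P1.  All parameter values are placeholders.
import numpy as np
from scipy.stats import multivariate_normal

P1 = 0.5                              # assumed prior P(y = 1) ("summer")
mean_summer = np.array([0.8, 0.3])    # placeholder mean of (greenness, redness) in summer
mean_winter = np.array([0.2, 0.4])    # placeholder mean in winter
cov = 0.05 * np.eye(2)                # placeholder (shared) covariance matrix

def classify(x):
    """Return +1 ("summer") or -1 ("winter") by comparing p(x|y) p(y)."""
    score_summer = P1 * multivariate_normal.pdf(x, mean=mean_summer, cov=cov)
    score_winter = (1 - P1) * multivariate_normal.pdf(x, mean=mean_winter, cov=cov)
    return 1 if score_summer >= score_winter else -1

print(classify(np.array([0.75, 0.35])))   # -> 1, i.e. "summer"
\end{verbatim}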