
\section{Bayes' Classifier - I}

Consider a binary classification problem where the goal is to classify or label a webcam snapshot as ``winter'' ($y=-1$) or ``summer'' ($y=1$) based on the feature vector

$\vx=(x_{\rm g},x_{\rm r})^{T}\in\mathbb{R}^{2}$ with the image greenness $x_{\rm g}$ and redness $x_{\rm r}$. We might interpret

the feature vector and label as realizations of random variables, whose statistics are specified by a joint distribution $p(\vx,y)$. This joint distribution factors as $p(\vx,y)= p(\vx| y) p(y)$

with the conditional distribution $p(\vx| y)$ of the feature vector given the true label $y$ and the prior distribution $p(y)$ of the label values. The prior probability $p(y=1)$ is the overall fraction of
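For later use, note that combining this factorization with Bayes' rule yields the posterior probability of a label value given the observed features:
\begin{equation}
p(y \mid \vx) = \frac{p(\vx \mid y)\, p(y)}{p(\vx)} = \frac{p(\vx \mid y)\, p(y)}{\sum_{y' \in \{-1,1\}} p(\vx \mid y')\, p(y')}.
\end{equation}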

summer snapshots. Assume that we know the distributions $p(\vx| y)$ and $p(y)$ and we want to construct a classifier $h(\vx)$, which classifies a snapshot with feature vector $\vx$ as $\hat{y}=h(\vx)\in\{-1,1\}$.

Which classifier map $h(\cdot): \vx\mapsto\hat{y}=h(\vx)$, mapping the feature vector $\vx$ to a predicted label $\hat{y}$, yields the smallest error probability (which is $p( y \!\neq\! h(\vx))$)?
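As a hint, the error probability can be written as an expectation of an indicator function over the joint distribution (using the notation $\mathcal{I}\{\cdot\}$ for the indicator):
\begin{equation}
p\big( y \neq h(\vx) \big) = \mathbb{E}\big[ \mathcal{I}\{ y \neq h(\vx) \} \big] = \sum_{y \in \{-1,1\}} p(y) \int \mathcal{I}\{ h(\vx) \neq y \}\, p(\vx \mid y)\, d\vx.
\end{equation}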

\noindent{\bf Answer.}

...

...


\section{Bayes' Classifier - II}

Reconsider the binary classification problem of Problem 3, where the goal is to classify or label a webcam snapshot as ``winter'' ($y=-1$) or ``summer'' ($y=1$) based on the feature vector

$\vx=(x_{\rm g},x_{\rm r})^{T}\in\mathbb{R}^{2}$ with the image greenness $x_{\rm g}$ and redness $x_{\rm r}$. While in Problem 3 we assumed perfect knowledge

of the joint distribution $p(\vx,y)$ of features $\vx$ and label $y$ (which are modelled as random variables), now we consider only knowledge of the prior probability $P(y=1)$, which we denote $P_{1}$.

A useful ``guess'' for the distribution of the features $\vx$, given the label $y$, is a Gaussian distribution. Thus, we assume