Assignment 2 (400 Points)
Note: Written problems must be typed and submitted as a Portable Document Format (PDF) file. All major word processors provide equation editing capabilities; therefore, all mathematics must also be typed. Here is a small list of equation editing software (there are many others):
-
Study the following Open Source Computer Vision (OpenCV) Machine Learning Library (MLL) code:
This code is an example of how to read a Comma-Separated Values (CSV) file into an OpenCV data structure, eliminate irrelevant columns, create a training and a testing data set, and then perform k-Nearest Neighbors classification using those data sets.
As you go through this file, ask yourself the following questions:
-
What is happening?
-
What do we want to happen?
-
Why is it in that order?
-
How does this relate to Evaluating and Choosing the Best Hypothesis (Section 18.4 on Page 708)?
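The train-then-test flow described above can be sketched without OpenCV. The following is a minimal illustration in plain C++11, not the MLL code the assignment refers to: `Sample`, `knn_predict`, and `accuracy` are hypothetical names, labels are assumed to be binary (0 or 1), and Euclidean distance with majority voting is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One labeled sample: numeric features plus a binary class label (0 or 1).
struct Sample {
    std::vector<double> features;
    int label;
};

// Squared Euclidean distance between two feature vectors of equal length.
double squared_distance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Predict a label by majority vote among the k nearest training samples.
int knn_predict(const std::vector<Sample>& train, const std::vector<double>& query, std::size_t k) {
    assert(!train.empty() && k > 0);
    std::vector<std::pair<double, int>> neighbors;  // (distance, label)
    for (const Sample& s : train) {
        neighbors.emplace_back(squared_distance(s.features, query), s.label);
    }
    k = std::min(k, neighbors.size());
    // Only the k smallest distances need to be in sorted order.
    std::partial_sort(neighbors.begin(), neighbors.begin() + k, neighbors.end());
    std::size_t votes_for_one = 0;
    for (std::size_t i = 0; i < k; ++i) {
        if (neighbors[i].second == 1) ++votes_for_one;
    }
    return (2 * votes_for_one > k) ? 1 : 0;
}

// Fraction of held-out test samples whose predicted label matches the true label.
double accuracy(const std::vector<Sample>& train, const std::vector<Sample>& test, std::size_t k) {
    if (test.empty()) return 0.0;
    std::size_t correct = 0;
    for (const Sample& s : test) {
        if (knn_predict(train, s.features, k) == s.label) ++correct;
    }
    return static_cast<double>(correct) / static_cast<double>(test.size());
}
```

Scoring `accuracy` on samples that were held out of `train` is the point of the split: it estimates how the classifier behaves on data it has not seen, which is the concern of the hypothesis-evaluation question above.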
Download the ZIP file containing the files below for testing: assignment_02_mldata.zip
Note: This is not an example of good code. It is a flat, uncommented code file that exists solely to immerse you in the MLL documentation at OpenCV.
Note: The g++ version 4.7.2 and the OpenCV version 2.4.4 libraries only exist on linprog4.cs.fsu.edu. You will need to make this on linprog4.cs.fsu.edu.
Reference: Machine Learning Library (MLL) from Open Source Computer Vision (OpenCV).
Reference: Breast Cancer Wisconsin (Diagnostic) Data Set from the University of California, Irvine (UCI) Machine Learning Repository.
-
For convenience only, Assignment 02 is provided in the following formats:
Note: These files are provided as is and may be incorrect. The authoritative version of Assignment 02 is this web page. If there is a difference between this web page and the files provided above, this web page is considered correct and should be preferred.
-
Complete the following written problems:
-
(10 Points) Given that a loaded coin has the following probability for coming up heads:
. What is the probability that the loaded coin will come up tails? In other words, what is
?
-
(20 Points) Given loaded coin tosses are independent events and that a different loaded coin has the following probability for coming up heads twice in a row:
. What is the probability that the loaded coin will come up tails twice in a row? In other words, what is
?
-
(40 Points) Given a fair coin with
and a loaded coin with
, if we pick a coin at random (i.e.
) and flip it, what is the probability that it is the loaded coin given that we observe heads? In other words, what is
?
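The three coin problems above rest on the complement rule, independence of tosses, and Bayes' rule. As a reminder, here are those identities in symbolic form. The event names H (heads), T (tails), "loaded", and "fair" are chosen here for illustration and may not match the assignment's own notation.

```latex
\begin{align*}
P(T) &= 1 - P(H) \\
P(H, H) &= P(H)^2 \quad \text{(independent tosses)}, \qquad P(T, T) = \bigl(1 - P(H)\bigr)^2 \\
P(\text{loaded} \mid H) &=
  \frac{P(H \mid \text{loaded})\, P(\text{loaded})}
       {P(H \mid \text{loaded})\, P(\text{loaded}) + P(H \mid \text{fair})\, P(\text{fair})}
\end{align*}
```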
-
(40 Points) Given the following Bayes Network:

With the following probabilities:

-
What is
?
-
What is
?
-
(130 Points) Using the data below, construct a Naïve Bayesian Network that does NOT use Laplacian Smoothing to predict whether an Iris is an Iris versicolor based on whether its sepals are long or wide.
Note: Sepals are the green petal-like objects surrounding a flower.
Long Sepals | Wide Sepals | Iris versicolor |
false | true | false |
false | false | false |
false | true | false |
false | true | false |
false | true | false |
false | true | false |
false | true | false |
false | true | false |
false | false | false |
false | true | false |
true | true | true |
true | true | true |
true | true | true |
true | false | true |
true | false | true |
true | false | true |
true | true | true |
false | false | true |
true | false | true |
false | false | true |
Do the following:
-
Draw the graph of the Naïve Bayesian Network.
-
Given the data above, answer the following questions:
-
What is the probability that an Iris is an Iris versicolor? In other words, what is
?
-
Given an Iris versicolor, what is the probability that its sepals are long? In other words, what is
?
-
Given an Iris versicolor, what is the probability that its sepals are wide? In other words, what is
?
-
Given an Iris versicolor, what is the probability that its sepals are both long and wide? In other words, what is
?
-
Given an Iris that is not an Iris versicolor, what is the probability that its sepals are both long and wide? In other words, what is
?
-
Given an Iris with both long and wide sepals, what is the probability that it's an Iris versicolor? In other words, what is
?
-
(60 Points) Using the data from the previous problem, construct a Naïve Bayesian Network that DOES use Laplacian Smoothing to predict whether an Iris is an Iris versicolor based on whether its sepals are long or wide.
For Laplacian Smoothing, use
.
Given the data above, answer the following questions:
-
What is the probability that an Iris is an Iris versicolor? In other words, what is
?
-
Given an Iris versicolor, what is the probability that its sepals are long? In other words, what is
?
-
Given an Iris versicolor, what is the probability that its sepals are wide? In other words, what is
?
-
Given an Iris versicolor, what is the probability that its sepals are both long and wide? In other words, what is
?
-
Given an Iris that is not an Iris versicolor, what is the probability that its sepals are both long and wide? In other words, what is
?
-
Given an Iris with both long and wide sepals, what is the probability that it's an Iris versicolor? In other words, what is
?
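To see how smoothing changes the estimates in a two-feature Naïve Bayes model, here is a minimal C++11 sketch. It assumes the common convention that a smoothed estimate is (count + k) / (total + k × number of possible values), with k = 0 giving the unsmoothed ratio. All names (`Row`, `NaiveBayes`, `fit`, `smoothed`, `posterior_both_true`) are hypothetical, and the sketch does not use the assignment's data or its value of k.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One observation: two boolean features and a boolean class label (names hypothetical).
struct Row {
    bool a;
    bool b;
    bool label;
};

// Learned parameters; index 0 = label false, index 1 = label true.
struct NaiveBayes {
    double p_label;       // P(label = true)
    double p_a_given[2];  // P(a = true | label)
    double p_b_given[2];  // P(b = true | label)
};

// (count + k) / (total + k * num_values). With k = 0 this is the plain frequency ratio.
double smoothed(double count, double total, double k, double num_values) {
    const double denominator = total + k * num_values;
    assert(denominator > 0.0);  // fails if k = 0 and a class never occurs
    return (count + k) / denominator;
}

// Count occurrences per class, then turn the counts into (optionally smoothed) probabilities.
NaiveBayes fit(const std::vector<Row>& rows, double k) {
    double n_label[2] = {0.0, 0.0};
    double n_a[2] = {0.0, 0.0};
    double n_b[2] = {0.0, 0.0};
    for (const Row& r : rows) {
        const int c = r.label ? 1 : 0;
        n_label[c] += 1.0;
        if (r.a) n_a[c] += 1.0;
        if (r.b) n_b[c] += 1.0;
    }
    NaiveBayes m;
    m.p_label = smoothed(n_label[1], static_cast<double>(rows.size()), k, 2.0);
    for (int c = 0; c < 2; ++c) {
        m.p_a_given[c] = smoothed(n_a[c], n_label[c], k, 2.0);
        m.p_b_given[c] = smoothed(n_b[c], n_label[c], k, 2.0);
    }
    return m;
}

// P(label = true | a = true, b = true) under the naive independence assumption.
double posterior_both_true(const NaiveBayes& m) {
    const double pos = m.p_label * m.p_a_given[1] * m.p_b_given[1];
    const double neg = (1.0 - m.p_label) * m.p_a_given[0] * m.p_b_given[0];
    assert(pos + neg > 0.0);
    return pos / (pos + neg);
}
```

For example, on four made-up rows `{true, true, true}`, `{true, false, true}`, `{false, true, false}`, `{false, false, false}`, calling `fit(rows, 0.0)` gives P(a = true | label = false) = 0 and a posterior of 1, while `fit(rows, 1.0)` gives 0.25 and 0.75. Smoothing removes the zero estimate that would otherwise make the posterior certain.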
-
Complete the following programming problem on linprog4.cs.fsu.edu:
Download the ZIP file containing the directory structure and files for this programming problem: assignment_02.zip
-
(100 Points) Use the method of gradient descent to find the minimum of a function.
Given the function: 
With the following plots:




Use the method of gradient descent to find the
value that produces the minimum
when we start gradient descent from
and use the learning rate
.
The gradient descent update formula is:
; therefore, it is
for this problem.
Stop the gradient descent when either:
-
The difference in consecutive
values is less than
. In other words, when
.
-
The number of full gradient descent iterations exceeds 1024. In other words, don't do more than 1024 updates of gradient descent.
Use the following files:
-
print.hpp: The file containing operator << to print arrays and std::vector.
-
main.cpp: The file for editing.
-
makefile: The makefile for linprog4.cs.fsu.edu.
Do not make changes to the makefile. Only make changes to main.cpp.
Use std::cout to output information exactly in the following format:
1: ( 0, 0 )
2: ( 0.622222, -0.577778 )
3: ( 1.12395, -1.03605 )
.
.
.
Note: The ellipses above should not be included in your output. The ellipses represent the rest of your properly formatted output for this gradient descent problem.
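The loop structure described above (update, check the change between consecutive points, cap the number of updates) can be sketched as follows. This is an illustration under stated assumptions, not a solution: the objective f(x, y) = (x - 1)^2 + (y + 2)^2 is hypothetical and is NOT the assignment's function, the learning rate and tolerance are passed in rather than taken from the assignment, and "difference in consecutive values" is assumed to mean the Euclidean distance between consecutive points. `Point`, `gradient`, `descend`, and `print_path` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <ostream>
#include <vector>

struct Point {
    double x;
    double y;
};

// Gradient of the hypothetical objective f(x, y) = (x - 1)^2 + (y + 2)^2,
// whose minimum is at (1, -2). Replace with the gradient of the real function.
Point gradient(const Point& p) {
    return Point{2.0 * (p.x - 1.0), 2.0 * (p.y + 2.0)};
}

// Run gradient descent from `start` and return every visited point, start included.
// Stops when consecutive points are closer than `tolerance` (assumed Euclidean distance)
// or after `max_updates` updates, whichever comes first.
std::vector<Point> descend(Point start, double learning_rate, double tolerance,
                           std::size_t max_updates) {
    assert(learning_rate > 0.0 && tolerance > 0.0);
    std::vector<Point> path;
    path.push_back(start);
    Point current = start;
    for (std::size_t update = 0; update < max_updates; ++update) {
        const Point g = gradient(current);
        const Point next{current.x - learning_rate * g.x, current.y - learning_rate * g.y};
        const double dx = next.x - current.x;
        const double dy = next.y - current.y;
        current = next;
        path.push_back(current);
        if (std::sqrt(dx * dx + dy * dy) < tolerance) break;
    }
    return path;
}

// Write the path as "1: ( x, y )", "2: ( x, y )", ... one point per line.
void print_path(const std::vector<Point>& path, std::ostream& out) {
    for (std::size_t i = 0; i < path.size(); ++i) {
        out << (i + 1) << ": ( " << path[i].x << ", " << path[i].y << " )\n";
    }
}
```

Calling `print_path(descend(Point{0.0, 0.0}, 0.25, 1e-6, 1024), std::cout)` from a `main` that includes `<iostream>` prints one numbered line per point. Returning the path separately from printing it keeps the stopping logic easy to check on its own.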
-
After completing Assignment 02, create an assignment_02_lastname.pdf file for your written assignment and an assignment_02_lastname.zip file for your programming assignment (where lastname is your last name). Ensure that your assignment_02_lastname.zip retains the directory structure of the original zip file. In other words, ensure your zip file has the following directory structure:
-
/
-
gradient_descent/
-
print.hpp
-
main.cpp
-
makefile
Upload both your assignment_02_lastname.pdf file for your written assignment and your assignment_02_lastname.zip file for your programming assignment to the Assignment 02 location on the BlackBoard site: https://campus.fsu.edu.
-
Note: Questions derived from the following sources: