Matlab tasks

Your tasks
1. We recommend that you use the Matlab “Statistics and Machine Learning”
Toolbox or similar for calculation and visualization.
a. You may use Excel, but we do not believe it has the capability to address
all elements of this assessment.
b. You may manually edit the graphics that you generate for Questions 2 and 3;
sometimes a precise layout or effect may be difficult to achieve directly from a
plotting package.
2. Edit your responses into this document.
a. Replace the text in blue italics with your responses.
b. Associated data and original graphs are provided separately in an Excel
spreadsheet.
3. You will submit your responses to this assessment as a PDF or Word document, which
you will upload to Turnitin. Your document should include:
a. This template with responses for each question included in the corresponding
section.
b. Your original source code (i.e. Matlab scripts) which you generated to complete
the relevant questions, included either after each question or at the end of the
document as an appendix (copy and paste the code into the document, or, if using
live scripts, merge the document PDF with the exported live script PDF). Please
note that your Matlab code will not be run except under exceptional circumstances,
but it may be used to evaluate the process that you used to answer the question.
Question 3: Recommending an Automatic Security System Provider
Your firm has a contract to provide an intelligent surveillance platform to detect events of
interest at a local airport. The airport has two environments in which it wants to use the
system, each with different requirements:
1. Outside the terminal around the passenger pick-up and drop-off areas, where false alarms
need to be minimized.
2. On the tarmac, where missed events need to be minimized.
For the first environment, the cost of a false alarm is 5 times that of a missed event. For the
second, the cost for a missed event is 8 times that of a false alarm.
Based on an initial study, your firm has narrowed the choice of security platform to those
provided by Alert’n’Alarmed Security, BigBrother Solutions, and Captive Industries.
You are to recommend:
Which security product is best suited to each of the sites;
If only one product could be purchased, which one provides the best trade-off between
the two use-cases (assume the performance of both sites is equally important).
You have been provided with evaluation results for each of the systems. The evaluation results
consist of likelihood scores and ground truth for each of the anomaly detection systems.
Using the data available in AlertAlarmed.csv, BigBrother.csv and Captive.csv, assess the
performance of each system under the operating conditions specified and recommend the
optimal system for each use case, and the single system that best meets the needs of both.
You should draw on the unit content presented regarding measuring performance and ROC
curves to answer this question, and should:
Compute the decision cost for each scenario;
Consider how to decide on an optimal overall system.
Note that recommendations should be based on more than visual inspection of graphs.
(Provide any code or visualizations you use to justify your response.)
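The decision-cost computation the brief asks for can be sketched directly from likelihood scores and ground truth: sweep candidate thresholds and keep the one with the lowest weighted cost. This is a minimal illustration in Python (the Matlab work below uses perfcurve); the scores and labels here are invented toy values, not the contents of the provided CSV files:

```python
# Sweep candidate thresholds over likelihood scores and pick the one that
# minimises the weighted decision cost (rates, so datasets of different
# sizes are comparable). Toy data, not the provided evaluation results.
def best_threshold(scores, labels, c_fa, c_miss):
    pos = sum(labels)                  # number of true events
    neg = len(labels) - pos            # number of non-events
    best = (float("inf"), None)
    for t in sorted(set(scores)):
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = c_fa * fp / neg + c_miss * fn / pos
        best = min(best, (cost, t))
    return best  # (minimum cost, threshold achieving it)

scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
labels = [0,   0,   1,    1,   1,   0  ]
print(best_threshold(scores, labels, c_fa=5, c_miss=1))
```

The same sweep with `c_fa=1, c_miss=8` gives the tarmac scenario; applying it to each system's CSV yields the per-scenario costs the recommendation should rest on.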
Outside the terminal around the passenger pick-up and drop-off areas
Based on the ROC curves and the cost-weighted operating points, Alert’n’Alarmed
Security is best suited to the area outside the terminal, where false alarms must be minimized:
at the operating point chosen for a false-alarm cost five times that of a missed event, it achieves
the lowest decision cost, keeping the false-positive rate low while maintaining a high true-positive
rate. As a general reading of a ROC curve, the closer it bends toward the top-left corner, the
better the detector; equivalently, a larger area under the curve (AUC) indicates better overall
accuracy. The curves also illustrate the usual trade-off: raising sensitivity (the true-positive
rate) comes at the expense of specificity, i.e. a higher false-positive rate.
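The AUC comparison above can be checked numerically rather than by eye, using the trapezoidal rule over the ROC points. A minimal Python sketch; the (FPR, TPR) points are illustrative, not values from the evaluation data:

```python
# Area under a ROC curve by the trapezoidal rule, so "larger AUC means a
# better detector" can be verified numerically. Illustrative points only.
def auc(fpr, tpr):
    pts = sorted(zip(fpr, tpr))        # order points by false-positive rate
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

fpr = [0.0, 0.1, 0.3, 1.0]
tpr = [0.0, 0.6, 0.9, 1.0]
print(round(auc(fpr, tpr), 3))
```

A random-guess detector (the 45-degree diagonal) scores 0.5 by this measure; anything meaningfully above that is informative.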
On the tarmac
On the tarmac, where missed events must be minimized, BigBrother Solutions is the
best-suited platform: at the operating point chosen for a missed-event cost eight times that of a
false alarm, it achieves the highest true-positive rate for a comparable false-positive rate, and
therefore the lowest decision cost. This is consistent with the general ROC summary that the
further a curve lies above the 45-degree diagonal (the random-guess line), the better the detector.
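The "lowest decision cost" reasoning used for both sites reduces to a one-line formula over an operating point. A hedged sketch in Python, assuming a prior event probability `p_event` (the brief does not state one, so 0.5 is a placeholder) and illustrative operating points:

```python
# Expected decision cost at one operating point on a ROC curve.
# fpr/tpr: rates at some threshold; p_event: assumed prior probability of
# an event (placeholder, not given in the brief); c_fa/c_miss: unit costs.
def decision_cost(fpr, tpr, p_event, c_fa, c_miss):
    fnr = 1.0 - tpr                                  # miss rate
    return c_fa * fpr * (1 - p_event) + c_miss * fnr * p_event

# Outside the terminal: a false alarm costs 5x a missed event.
outside = decision_cost(fpr=0.10, tpr=0.80, p_event=0.5, c_fa=5, c_miss=1)
# On the tarmac: a missed event costs 8x a false alarm.
tarmac = decision_cost(fpr=0.10, tpr=0.80, p_event=0.5, c_fa=1, c_miss=8)
print(round(outside, 3), round(tarmac, 3))
```

The same operating point scores very differently under the two cost models, which is why each site can favour a different system.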
Optimal Overall System
Considering both sites with equal weight, Alert’n’Alarmed Security is the best single
choice: it has the lowest decision cost outside the terminal and is only slightly behind
BigBrother on the tarmac, so its average cost across the two scenarios is the lowest of the three
platforms. This is also reflected in the overall ROC comparison, where Alert’n’Alarmed performs
best at its cost-optimal threshold among the three surveillance platforms.
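One way to make "best overall" concrete, assuming the two sites are weighted equally as the brief states, is to average each system's minimum per-site decision cost. The per-site numbers below are placeholders to show the mechanics, not computed results from the CSV files:

```python
# Equal-weight combination of per-site minimum decision costs.
# The cost values are illustrative placeholders, not real results.
site_costs = {            # {system: (cost outside terminal, cost on tarmac)}
    "AlertAlarmed": (0.20, 0.40),
    "BigBrother":   (0.35, 0.30),
    "Captive":      (0.30, 0.45),
}
overall = {name: sum(costs) / 2 for name, costs in site_costs.items()}
best = min(overall, key=overall.get)   # lowest combined cost wins
print(best, round(overall[best], 3))
```

If the sites were not equally important, the mean would simply become a weighted mean.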
CODE:
%Question 3
% Read evaluation data (likelihood scores and ground truth) for each system
AlertAlarmed = readtable('AlertAlarmed.csv');
BigBrother = readtable('BigBrother.csv');
Captive = readtable('Captive.csv');
% Quick look at the likelihood scores against the ground truth labels
figure
plot(AlertAlarmed.Likelihood, AlertAlarmed.GroundTruth, 'bo');
xlabel('Likelihood score'), ylabel('Ground truth');
%% ROC curves for all three surveillance platforms
% (assumes GroundTruth == 1 marks an event of interest)
[false_positives_1, true_positives_1, threshold_1] = perfcurve(AlertAlarmed.GroundTruth, AlertAlarmed.Likelihood, 1);
[false_positives_2, true_positives_2, threshold_2] = perfcurve(BigBrother.GroundTruth, BigBrother.Likelihood, 1);
[false_positives_3, true_positives_3, threshold_3] = perfcurve(Captive.GroundTruth, Captive.Likelihood, 1);
% Plot all three ROC curves on one set of axes
figure
plot(false_positives_1, true_positives_1, 'LineWidth', 2); hold on
plot(false_positives_2, true_positives_2, 'LineWidth', 2);
plot(false_positives_3, true_positives_3, 'LineWidth', 2);
title('ROC curves: the three security platforms')
legend('Alert''n''Alarmed', 'BigBrother', 'Captive', 'Location', 'southeast')
xlabel('False positive rate'), ylabel('True positive rate');
%% Outside the terminal: a false alarm costs 5x a missed event
% perfcurve's 'Cost' matrix is [C(P|P) C(N|P); C(P|N) C(N|N)],
% i.e. [0 cost_of_missed_event; cost_of_false_alarm 0]
cost_outside = [0 1; 5 0];
[fp1, tp1, t1, auc1, opt1] = perfcurve(AlertAlarmed.GroundTruth, AlertAlarmed.Likelihood, 1, 'Cost', cost_outside);
[fp2, tp2, t2, auc2, opt2] = perfcurve(BigBrother.GroundTruth, BigBrother.Likelihood, 1, 'Cost', cost_outside);
[fp3, tp3, t3, auc3, opt3] = perfcurve(Captive.GroundTruth, Captive.Likelihood, 1, 'Cost', cost_outside);
figure
plot(fp1, tp1, 'LineWidth', 2); hold on
plot(fp2, tp2, 'LineWidth', 2);
plot(fp3, tp3, 'LineWidth', 2);
% Mark each system's cost-optimal operating point (OOP)
plot(opt1(1), opt1(2), 'gx', 'MarkerSize', 15, 'LineWidth', 2);
plot(opt2(1), opt2(2), 'bx', 'MarkerSize', 15, 'LineWidth', 2);
plot(opt3(1), opt3(2), 'rx', 'MarkerSize', 15, 'LineWidth', 2);
title('Location 1: Outside the terminal')
legend('Alert''n''Alarmed', 'BigBrother', 'Captive', 'Alert''n''Alarmed OOP', 'BigBrother OOP', 'Captive OOP', 'Location', 'southeast')
xlabel('False positive rate'), ylabel('True positive rate');
%% On the tarmac: a missed event costs 8x a false alarm
cost_tarmac = [0 8; 1 0];
[fp1, tp1, t1, auc1, opt1] = perfcurve(AlertAlarmed.GroundTruth, AlertAlarmed.Likelihood, 1, 'Cost', cost_tarmac);
[fp2, tp2, t2, auc2, opt2] = perfcurve(BigBrother.GroundTruth, BigBrother.Likelihood, 1, 'Cost', cost_tarmac);
[fp3, tp3, t3, auc3, opt3] = perfcurve(Captive.GroundTruth, Captive.Likelihood, 1, 'Cost', cost_tarmac);
figure
plot(fp1, tp1, 'LineWidth', 2); hold on
plot(fp2, tp2, 'LineWidth', 2);
plot(fp3, tp3, 'LineWidth', 2);
% Mark each system's cost-optimal operating point (OOP)
plot(opt1(1), opt1(2), 'gx', 'MarkerSize', 15, 'LineWidth', 2);
plot(opt2(1), opt2(2), 'bx', 'MarkerSize', 15, 'LineWidth', 2);
plot(opt3(1), opt3(2), 'rx', 'MarkerSize', 15, 'LineWidth', 2);
title('Location 2: On the tarmac')
legend('Alert''n''Alarmed', 'BigBrother', 'Captive', 'Alert''n''Alarmed OOP', 'BigBrother OOP', 'Captive OOP', 'Location', 'southeast')
xlabel('False positive rate'), ylabel('True positive rate');
Question 4: Activity Recognition from Wearable Sensors
You work for a biomedical engineering firm that is interested in the use of wearable sensors for
activity recognition, specifically for monitoring a person's behavior in an office environment.
You have been tasked with investigating how feasible it is to use a single chest-mounted
accelerometer to recognize the type of activity being performed. You have been provided with data
for 15 subjects (see Q6-ActivityRecognition.zip), each of whom performs seven activities:
1. Working at Computer
2. Standing Up, Walking and Going up/down stairs
3. Standing
4. Walking
5. Going Up/Down Stairs
6. Walking and Talking with Someone
7. Talking while Standing
The data consists of five columns that are (in order):
1. A sequential index
2. X acceleration
3. Y acceleration
4. Z acceleration
5. Activity type
Using this data, your supervisor has asked you to determine:
How well can we recognize the activities of a single individual (i.e. train and test on the
same subject)? What activities are most likely to be mixed up?
Do the models generalize well between subjects? Can training a model on data from multiple
subjects help improve performance on unseen subjects?
You should draw on the unit content regarding classification and visualization to answer this
question, and should consider:
Are there any errors in the data, or any data series (columns) that have no benefit?
Are there any other characteristics of the data that may impact the ability to learn a
model?
Are trends consistent across multiple subjects?
(Provide any code or worksheets you use to justify your response.)
How well can we recognize the activities of a single individual (i.e. train and test on the
same subject)? What activities are most likely to be mixed up?
The activities of a single individual can be recognized well when the model is trained
and tested on the same subject, because that subject's movement patterns are consistent
and distinctive.
The activities most likely to be mixed up are going up/down stairs, walking and talking
with someone, and talking while standing. These can produce similar accelerometer
values, since most people move in much the same way when climbing a
staircase, and stand similarly still whether or not they are talking.
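The "mixed up" claim is best quantified with a confusion matrix, where off-diagonal cells count misclassified samples. A minimal Python sketch; the labels here are toy values using the dataset's activity codes, not real classifier output:

```python
# Build a confusion matrix: rows are actual activities, columns are
# predicted activities; off-diagonal cells reveal which classes mix.
from collections import Counter

def confusion(actual, predicted, labels):
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Toy labels using activity codes 4 (Walking), 5 (Stairs),
# 6 (Walking and Talking), 7 (Talking while Standing).
actual    = [4, 4, 5, 5, 7, 7, 7]
predicted = [4, 5, 5, 4, 7, 7, 6]
m = confusion(actual, predicted, labels=[4, 5, 6, 7])
print(m)
```

Summing each row gives the per-activity sample count, so the matrix also doubles as a check on class balance.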
Do the models generalize well between subjects? Can training a model on data from
multiple subjects help improve performance on unseen subjects?
The models generalize between subjects only partially: different people perform the
same activities with their own individual styles, so a model trained on one subject
transfers imperfectly to another. Training on data from multiple subjects does help
performance on unseen subjects, but only to a limited degree, because while some
activities produce similar sensor values across people, others remain strongly
subject-specific.
Are there any errors in the data, or any data series (columns) that have no benefit?
The sequential index column provides no benefit for classification, as it merely orders
the samples, and can be dropped before training. Beyond that, no obvious errors were
found: each row maps accelerometer values to an activity label.
Are there any other characteristics of the data that may impact the ability to learn a
model?
Yes. The data is a time series, so neighbouring samples are strongly correlated; a purely
random train/test split can therefore overstate accuracy compared with splitting by time
or by subject. If some activities occupy many more samples than others, that class
imbalance may also bias the learned model toward the more common activities.
Are trends consistent across multiple subjects?
Broadly, yes: the same activities produce similar patterns of acceleration for most
subjects, although the exact magnitudes and variability differ from person to person,
which is what limits cross-subject generalization.
CODE:
% Import the data for all 15 subjects into a cell array
subjects = cell(15, 1);
for s = 1:15
    subjects{s} = readtable(sprintf('ActivityRecognition/%d.csv', s));
end
%% 70/30 train/test split for the 1st subject
P = 0.70;                       % proportion of rows used for training
M = height(subjects{1});
idx = randperm(M);              % shuffle the rows before splitting
Training_1 = subjects{1}(idx(1:round(P*M)), :);
Testing_1 = subjects{1}(idx(round(P*M)+1:end), :);
% Train/test split for the 5th subject
M5 = height(subjects{5});
idx = randperm(M5);
Training_5 = subjects{5}(idx(1:round(P*M5)), :);
Testing_5 = subjects{5}(idx(round(P*M5)+1:end), :);
% Train/test split for the 10th subject
M10 = height(subjects{10});
idx = randperm(M10);
Training_10 = subjects{10}(idx(1:round(P*M10)), :);
Testing_10 = subjects{10}(idx(round(P*M10)+1:end), :);
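The 70/30 per-subject split used above can be expressed compactly in any language. Here is an equivalent sketch in Python, with toy rows standing in for the (x, y, z, activity) data; the fixed seed makes the split repeatable:

```python
# 70/30 random train/test split, mirroring the Matlab randperm approach.
# Rows are toy (x, y, z, activity) tuples, not the real CSV contents.
import random

def split(rows, train_frac=0.70, seed=0):
    rng = random.Random(seed)        # fixed seed so the split is repeatable
    idx = list(range(len(rows)))
    rng.shuffle(idx)                 # shuffle indices, not the rows in place
    cut = round(train_frac * len(rows))
    train = [rows[i] for i in idx[:cut]]
    test = [rows[i] for i in idx[cut:]]
    return train, test

rows = [(0.1 * i, 0.2 * i, 0.3 * i, 1 + i % 7) for i in range(10)]
train, test = split(rows)
print(len(train), len(test))
```

Note that for time-series data a chronological split is often fairer than a random one, for the correlation reasons discussed in the answers above.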
Question 5: Comparing Taxi and Uber Usage
Your firm is investigating the impact of ride sharing services such as Uber on transport. In
particular, they wish to see if the geographic distribution of ride sharing users is the same as taxi
users, or if different regions have a preference for different services.
You have been provided with one week’s data for New York City, split into three files:
yellow_tripdata_2014_09_reduced.csv
green_tripdata_2014_09_reduced.csv
uber-raw-data-sep14_reduced.csv
Using this data, cluster the data based on pickup location (use 20 clusters) and investigate:
Is the overall distribution of users the same between taxi and Uber users?
Do any differences become more or less pronounced at different times of the day? In
particular, consider the time windows:
o 7 am-9 am
o 8 pm-midnight.
You should draw on the unit content regarding clustering and visualization to answer this
question, and should consider:
Does the data need to be filtered in any way to remove noise and/or outliers?
How should data be clustered such that the clustering results for different datasets can be
compared?
What plots/visualizations are suitable for comparing distributions?
Students should refer to examples presented as part of the online content, workshop, and tutorials
for further ideas.
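The comparability question above comes down to clustering once on the combined data and then comparing each service's proportion of pickups per cluster. A minimal Python sketch with toy cluster assignments (not real k-means output); total variation distance summarises how far apart the two distributions are:

```python
# Compare two services' usage across shared clusters by converting raw
# per-cluster counts into proportions. Toy assignments, not k-means output.
from collections import Counter

def proportions(assignments, k):
    counts = Counter(assignments)
    n = len(assignments)
    return [counts[c] / n for c in range(k)]

taxi_clusters = [0, 0, 1, 2, 2, 2]   # cluster index of each taxi pickup
uber_clusters = [0, 1, 1, 1, 2, 2]   # cluster index of each Uber pickup
k = 3
taxi_p = proportions(taxi_clusters, k)
uber_p = proportions(uber_clusters, k)
# Total variation distance: 0 means identical distributions, 1 disjoint.
tvd = 0.5 * sum(abs(a - b) for a, b in zip(taxi_p, uber_p))
print([round(x, 3) for x in taxi_p], round(tvd, 3))
```

Using proportions rather than raw counts matters because the taxi and Uber datasets contain different numbers of trips.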
(Provide any code or worksheets you use to justify your response.)
Is the overall distribution of users the same between taxi and Uber users?
No. Around (longitude -74±0.1, latitude 40.7±0.2) the Uber pickups are noticeably more
concentrated than the taxi pickups, while away from that region the Uber data is more evenly
scattered across the latitude-longitude range. This concentration is evidence of more Uber users
than taxi users in that region, so the two distributions are not the same.
Do any differences become more or less pronounced at different times of the day?
7 am-9 am
Yes, the differences become more pronounced at this time of day: more people use taxis than
Ubers in the 7 am-9 am window. This is consistent with commuters travelling to work at that hour
and relying on the established taxi services.
Does the data need to be filtered in any way to remove noise and/or outliers?
Yes, the data needs to be filtered to remove noise and outliers. Comparing the unfiltered and
filtered scatter plots (for example, the raw Uber pickup locations against the filtered version)
shows pickups recorded well outside the New York City area; left in place, these outliers would
distort the clustering. After filtering, the data conforms to the geographic bounds of the city.
How should data be clustered such that the clustering results for different datasets can
be compared?
The taxi and Uber data should be clustered together: fitting a single k-means model (k = 20)
to the combined pickup locations yields one shared set of cluster centres, so the per-cluster
counts for the two services are directly comparable. Clustering each dataset separately would
produce different centres and make the results incomparable.
What plots/visualizations are suitable for comparing distributions?
Histograms, bar charts, and box plots are all suitable. Normalized histograms or bar charts of
the per-cluster counts let the taxi and Uber distributions be compared side by side, while box
plots additionally expose outliers and the spread of each variable.
CODE
%Question 5
clear
clc
% Import data from the csv files
yellow_tripdata = readtable('yellow_tripdata_2014_09_reduced.csv');
green_tripdata = readtable('green_tripdata_2014_09_reduced.csv');
uber_data = readtable('uber-raw-data-sep14_reduced.csv');
% Merge the yellow and green taxi data into one table
taxi_tripdata = [yellow_tripdata; green_tripdata];
%% Filter the Uber data: keep only pickups inside the New York City
% bounding box (longitude -74.259090 to -73.700272, latitude 40.477399 to 40.917577)
in_nyc_uber = (uber_data.pickup_longitude >= -74.259090) & ...
    (uber_data.pickup_longitude <= -73.700272) & ...
    (uber_data.pickup_latitude >= 40.477399) & ...
    (uber_data.pickup_latitude <= 40.917577);
filtered_uber_data = uber_data(in_nyc_uber, :);
figure
scatter(uber_data.pickup_longitude, uber_data.pickup_latitude);
title('Uber locations (unfiltered)'), xlabel('longitude'), ylabel('latitude');
figure
scatter(filtered_uber_data.pickup_longitude, filtered_uber_data.pickup_latitude);
title('Uber locations (filtered)'), xlabel('longitude'), ylabel('latitude');
%% Filter the taxi data with the same New York City bounding box
in_nyc_taxi = (taxi_tripdata.pickup_longitude >= -74.259090) & ...
    (taxi_tripdata.pickup_longitude <= -73.700272) & ...
    (taxi_tripdata.pickup_latitude >= 40.477399) & ...
    (taxi_tripdata.pickup_latitude <= 40.917577);
filtered_taxi_data = taxi_tripdata(in_nyc_taxi, :);
figure
scatter(taxi_tripdata.pickup_longitude, taxi_tripdata.pickup_latitude, 'mo', 'MarkerFaceColor', 'm', 'MarkerEdgeColor', 'k');
title('Taxi locations (unfiltered)'), xlabel('longitude'), ylabel('latitude');
figure
scatter(filtered_taxi_data.pickup_longitude, filtered_taxi_data.pickup_latitude, 'mo', 'MarkerFaceColor', 'm', 'MarkerEdgeColor', 'k');
title('Taxi locations (filtered)'), xlabel('longitude'), ylabel('latitude');
%% Cluster the combined taxi and Uber pickups into 20 shared clusters
rng(1);   % fixed seed so the clustering is reproducible
merged_all_filtered_data = [filtered_uber_data; filtered_taxi_data];
[idx, c] = kmeans([merged_all_filtered_data.pickup_longitude, merged_all_filtered_data.pickup_latitude], 20);
figure
set(gcf, 'Position', [1, 1, 1500, 1500])
hold on
for n = 1:20
    scatter(merged_all_filtered_data.pickup_longitude(idx == n), merged_all_filtered_data.pickup_latitude(idx == n), 'o', 'MarkerEdgeColor', 'k')
end
xlabel('Longitude'), ylabel('Latitude'), title('20 clusters for Taxi and Uber');
% Split the cluster assignments back out; Uber rows come first in the merge
n_uber = height(filtered_uber_data);
uber_clusters = idx(1:n_uber);
taxi_clusters = idx(n_uber+1:end);
figure
set(gcf, 'Position', [1, 1, 1500, 1500]);
subplot(2, 1, 1)
histogram(uber_clusters, 20), title('Uber'), xlabel('Cluster'), xlim([0, 20]);
subplot(2, 1, 2)
histogram(taxi_clusters, 20), title('Taxi'), xlabel('Cluster'), xlim([0, 20]);
%% Compare the two time windows
% Uber, 7 am-9 am (hours 7 and 8)
uber_7am_9am = (filtered_uber_data.pickup_datetime.Hour >= 7) & (filtered_uber_data.pickup_datetime.Hour < 9);
uber_morning = filtered_uber_data(uber_7am_9am, :);
% Uber, 8 pm-midnight (Hour runs from 0 to 23)
uber_8pm_midnight = (filtered_uber_data.pickup_datetime.Hour >= 20) & (filtered_uber_data.pickup_datetime.Hour <= 23);
uber_evening = filtered_uber_data(uber_8pm_midnight, :);
% Taxi, 7 am-9 am
taxi_7am_9am = (filtered_taxi_data.pickup_datetime.Hour >= 7) & (filtered_taxi_data.pickup_datetime.Hour < 9);
taxi_morning = filtered_taxi_data(taxi_7am_9am, :);
% Taxi, 8 pm-midnight
taxi_8pm_midnight = (filtered_taxi_data.pickup_datetime.Hour >= 20) & (filtered_taxi_data.pickup_datetime.Hour <= 23);
taxi_evening = filtered_taxi_data(taxi_8pm_midnight, :);
% Morning histograms: index the cluster assignments with the time masks
figure
set(gcf, 'Position', [1, 1, 1500, 1500]);
subplot(2, 1, 1)
histogram(uber_clusters(uber_7am_9am), 20), title('Uber 7am-9am'), xlabel('Cluster'), xlim([0, 20]);
subplot(2, 1, 2)
histogram(taxi_clusters(taxi_7am_9am), 20), title('Taxi 7am-9am'), xlabel('Cluster'), xlim([0, 20]);
% Evening histograms: use the cluster assignments, not the logical masks
figure
set(gcf, 'Position', [1, 1, 1500, 1500]);
subplot(2, 1, 1)
histogram(uber_clusters(uber_8pm_midnight), 20), title('Uber 8pm-midnight'), xlabel('Cluster'), xlim([0, 20]);
subplot(2, 1, 2)
histogram(taxi_clusters(taxi_8pm_midnight), 20), title('Taxi 8pm-midnight'), xlabel('Cluster'), xlim([0, 20]);
