Matlab Tasks

Your tasks

(1) We recommend that you use Matlab “Statistics and Machine Learning”

Toolbox or similar for calculation and visualization.

(2) You may use Excel but we do not believe it has the capability to address

all elements of this Assessment.

b. You may manually edit the graphics that you generate for Questions 2 and 3—

sometimes a precise layout or effect may be difficult to achieve directly from a

plotting package

2. Edit your responses into this document

a. Replace the text in blue italics with your responses

b. Associated data and original graphs are provided separately in an Excel

spreadsheet.

3. You will submit your responses to this assessment as a PDF or word document which

you will upload to Turnitin. Your document should include:

a. This template with responses for each question included in the corresponding

section.

b. Your original source code (ie. Matlab scripts) which you generated to complete

relevant questions, included either after each question or at the end of the

document as an appendix (copy and paste the code into the document, or if using

live scripts merge the document PDF with the exported live script PDF). Please

note your Matlab code will not be run except under exceptional circumstances but

may be used to evaluate the process that you used to answer the question.

Question 3: Recommending an Automatic Security System Provider

Your firm has a contract to provide an intelligent surveillance platform to detect events of

interest at a local airport. They have two different environments in which they want to use the

system, both of which have different requirements:

1. Outside the terminal around the passenger pick-up and drop-off areas, where false alarms

need to be minimized.

2. On the tarmac, where missed events need to be minimized.

For the first environment, the cost of a false alarm is 5 times that of a missed event. For the

second, the cost for a missed event is 8 times that of a false alarm.

Based on an initial study, your firm has narrowed the choice of security platform to those

provided by Alert’n’Alarmed Security, BigBrother Solutions, and Captive Industries.

You are to recommend:

 Which security product is best suited to each of the sites;

 If only one product could be purchased, which one provides the best trade-off between

the two use-cases (assume the performance of both sites is equally important).

You have been provided with evaluation results for each of the systems. The evaluation results

consist of likelihood scores and ground truth for each of the anomaly detection systems.

Using the data available in AlertAlarmed.csv, BigBrother.csv and Captive.csv, assess the

performance of each system under the operating conditions specified and recommend the

optimal system for each use case, and the single system that best meets the needs of both.

You should draw on the unit content presented regarding measuring performance and ROC

curves to answer this question, and should:

 Compute the decision cost for each scenario;

 Consider how to decide on an optimal overall system.

Note that recommendations should be based on more than visual inspection of graphs.

(Provide any code or visualizations you use to justify your response.)

Outside the terminal around the passenger pick-up and drop-off areas

Outside the terminal based on the results obtained from the ROC curve, it indicates that

Alert’n’Alarmed security is best suited for minimizing false alarms outside the terminal this is

because it provides best truth value and false value that justifies its claim and accuracy of

detecting events outside. The basic interpretation of a ROC curve is that the more it leans to the

left and to the top, the more accurate the test. This is similar to the area under curve where the

larger it is, the more accurate it is. The ROC curve also demonstrates that sensitivity is inversely

proportional to specificity.

On the tarmac

On the Tarmac, the best security platform suited for minimizing events is BigBrother this

is because it offers a higher percentage for its true positive value and a lower false positive value

to the same. It provides a reasonable conclusion and it is rather good from the curve standpoint.

The curve agrees to the ROC curve summary that the further the curve is from 45 degrees

heading towards 90 degrees, the more accurate the result obtained is.

Optimal Overall System

The best overall system is Alert’n’Alarmed security which performs better outside the

terminal and is below BigBrother on the Tarmac, this shows that its overall performance is best.

This result is also reflected in the Overall of the best system from the ROC curve. Its

interpretation is that at its maximum threshold is better among all other three surveillance

platforms.

CODE:

%Question 3

% read data from csv files

AlertAlarmed = readtable('AlertAlarmed.csv');

BigBrother = readtable('BigBrother.csv');

Captive= readtable('Captive.csv');

% plot Likelihood and GroundTruth of AlertAlarmed figure

plot(AlertAlarmed.Likelihood,AlertAlarmed.GroundTruth,'bo');

%% For all 3 surveillance platforms

[false_positives_1, true_positives_1, threshold_1] =

perfcurve(AlertAlarmed.GroundTruth, AlertAlarmed.Likelihood, 0);

[false_positives_2, true_positives_2, threshold_2] =

perfcurve(BigBrother.GroundTruth, BigBrother.Likelihood, 0);

[false_positives_3, true_positives_3, threshold_3] =

perfcurve(Captive.GroundTruth, Captive.Likelihood, 0);

%ploting all three security figure

plot(false_positives_1, true_positives_1,'LineWidth',2); hold on

plot(false_positives_2, true_positives_2,'LineWidth',2);

plot(false_positives_3, true_positives_3,'LineWidth',2);

title('The 3 Securities at Airport '),legend('Alert Alarmed','Big

Brother','Captive'),xlabel('False positive rate'),ylabel('True positive

rate');

%% Outside the terminal

[false_positives_1, true_positives_1, threshold_1] =

perfcurve(AlertAlarmed.GroundTruth, AlertAlarmed.Likelihood, 0);

[false_positives_2, true_positives_2, threshold_2] =

perfcurve(BigBrother.GroundTruth, BigBrother.Likelihood, 0);

[false_positives_3, true_positives_3, threshold_3] =

perfcurve(Captive.GroundTruth, Captive.Likelihood, 0);

%using perfcurve function to calculate ROC false_positive_cost=5;

[x1, y1, cost1, idx1] = perfcurve(false_positives_1, true_positives_1,1);

[x2, y2, cost2, idx2] = perfcurve(false_positives_2, true_positives_2,1);

[x3, y3, cost3, idx3] = perfcurve(false_positives_3, true_positives_3,1);

%%figure,clf

plot(false_positives_1, true_positives_1,'LineWidth',2); hold on

plot(false_positives_2, true_positives_2,'LineWidth',2); hold on

plot(false_positives_3, true_positives_3,'LineWidth',2);

plot(x1, y1,'gx','MarkerSize',15,'LineWidth',2);hold on

plot(x2, y2,'bx','MarkerSize',15,'LineWidth',2); hold on

plot(x3, y3,'rx','MarkerSize',15,'LineWidth',2);

title('Location-1: Outside Terminal'),legend('Alert Alarmed','Big

Brother','Captive','Alert Alarmed OOP', 'Big Brother OOP','Captive

OOP'),xlabel('False positive rate'),ylabel('True positive rate');

%% On the Tarmac

[false_positives_1, true_positives_1, threshold_1] =

perfcurve(AlertAlarmed.GroundTruth, AlertAlarmed.Likelihood, 0);

[false_positives_2, true_positives_2, threshold_2] =

perfcurve(BigBrother.GroundTruth, BigBrother.Likelihood, 0);

[false_positives_3, true_positives_3, threshold_3] =

perfcurve(Captive.GroundTruth, Captive.Likelihood, 0);

false_positive_cost=8;

%using perfcurve function to calculate ROC

[x1, y1, cost1, idx1] = perfcurve(false_positives_1, true_positives_1,1);

[x2, y2, cost2, idx2] = perfcurve(false_positives_2, true_positives_2,1);

[x3, y3, cost3, idx3] = perfcurve(false_positives_3, true_positives_3,1);

% ploting on Tarmac figure,clf

plot(false_positives_1, true_positives_1,'LineWidth',2); hold on

plot(false_positives_2, true_positives_2,'LineWidth',2); hold on

plot(false_positives_3, true_positives_3,'LineWidth',2); plot(x1,

y1,'gx','MarkerSize',15,'LineWidth',2);hold on

plot(x2, y2,'bx','MarkerSize',15,'LineWidth',2); hold on

plot(x3, y3,'rx','MarkerSize',15,'LineWidth',2); title('Location-2: Tarmac')

legend('Alert Alarmed','Big Brother','Captive','Alert Alarmed OOP', 'Big

Brother OOP','Captive OOP'),xlabel('False positive rate'),ylabel('True

positive rate');

Question 4: Activity Recognition from Wearable Sensors

You work for a biomedical engineering firm who are interested in the use of wearable sensors for

activity recognition for monitoring person behavior in an office environment. You have been

tasked with investigating how feasible it is to use a single chest mounted accelerometer to

recognize the type of activity being performed. You have been provided with data for 15 subjects

(see Q6-ActivityRecognition.zip), each of which performs seven activities:

1. Working at Computer

2. Standing Up, Walking and Going up/down stairs

3. Standing

4. Walking

5. Going Up/Down Stairs

6. Walking and Talking with Someone

7. Talking while Standing

The data consists of five columns that are (in order):

1. A sequential index

2. X acceleration

3. Y acceleration

4. Z acceleration

5. Activity type

Using this data, your supervisor has asked you to determine:

 How well can we recognize the activities of a single individual (i.e. train and test on the

same subject)? What activities are most likely to be mixed up?

 Do the models generalize well between subjects? Can train a model on data from multiple

subjects help improve performance on unseen subjects?

You should draw on the unit content regarding classification and visualization to answer this

question, and should consider:

 Are there any errors in the data, or any data series (columns) that have no benefit?

 Are there any other characteristics of the data that may impact the ability to learn a

model?

 Are trends consistent across multiple subjects?

(Provide any code or worksheets you use to justify your response.)

 How well can we recognize the activities of a single individual (i.e. train and test on the

same subject)? What activities are most likely to be mixed up?

The activities of a single individual are well distinguished from those of multiple

individuals. An individual’s activities are well defined and unique such that it is easy to

identify who performs in this manner.

The activities that are likely to mix up are the going up and down a stair, walking and

talking, and talking while standing. These activities can provide the same accelerometer

value because we everyone is likely to behave the same when going up and down a

staircase and talking while standing.

 Do the models generalize well between subjects? Can training a model on data from

multiple subjects help improve performance on unseen subjects?

Yes, the values between the models for similar subjects are well guided and therefore

they provide the well generalize values between the models. Training data can help

improve performance but to a measurable level because not all activities are dependent

on each other. Some of these activities can portray similar sensor values but not all.

 Are there any errors in the data, or any data series (columns) that have no benefit?

There are no errors in the data; each identifier can map the accelerometer value for each

type of activity.

 Are there any other characteristics of the data that may impact the ability to learn a

model?

Yes, the uniqueness of the data may affect a model’s learning ability such that the model

exponentially grows to some extent some data values are obsolete while others are not.

 Are trends consistent across multiple subjects?

Yes, this is because most models can be trained and tested across the same models.

#CODE

% Importing all the files in .csv format

number1=readtable('ActivityRecognition/1.csv');

number2=readtable('ActivityRecognition/2.csv');

number3=readtable('ActivityRecognition/3.csv');

number4=readtable('ActivityRecognition/4.csv');

number5=readtable('ActivityRecognition/5.csv');

number6=readtable('ActivityRecognition/6.csv');

number7=readtable('ActivityRecognition/7.csv');

number8=readtable('ActivityRecognition/8.csv');

number9=readtable('ActivityRecognition/9.csv');

number10=readtable('ActivityRecognition/10.csv');

number11=readtable('ActivityRecognition/11.csv');

number12=readtable('ActivityRecognition/12.csv');

number13=readtable('ActivityRecognition/13.csv');

number14=readtable('ActivityRecognition/14.csv');

number15=readtable('ActivityRecognition/15.csv');

%% training and testing the 1st subject

[M,N] = size(number1) ;

P = 0.70 ;

idx = randperm(M) ;

Training_1 = number1(idx(1:round(P*M)),:) ; Testing_1 =

number1(idx(round(P*M)+1:end),:) ;

% training and testing the fifth subject

[M5,N5] = size(number5) ;

idx = randperm(M5) ;

Training_5 = number5(idx(1:round(P*M5)),:) ;

Testing_5 = number5(idx(round(P*M5)+1:end),:) ;

% training and testing the 10th subject

[M10,N10] = size(number10) ;

idx = randperm(M10)

Question 5: Comparing Taxi and Uber Usage

Your firm is investigating the impact of ride sharing services such as Uber on transport. In

particular, they wish to see if the geographic distribution of ride sharing users is the same as taxi

users, or if different regions have a preference for different services.

You have been provided with one week’s data for New York City, split into three files:

 yellow_tripdata_2014_09_reduced.csv

 green_tripdata_2014_09_reduced.csv

 uber-raw-data-sep14_reduced.csv

Using this data, cluster the data based on pickup location (use 20 clusters) and investigate:

 Is the overall distribution of users the same between a taxi and uber users?

 Do any differences become more or less pronounced at different times of the day? In

particular consider the time windows:

o 7 am – 9 am

o 8 pm-midnight.

You should draw on the unit content regarding clustering and visualization to answer this

question, and should consider:

 Does the data need to be filtered in any way to remove noise and/or outliers?

 How should data be clustered such that the clustering results for different datasets can be

compared?

 What plots/visualizations are suitable for comparing distributions?

Students should refer to examples presented as part of the online content, workshop, and tutorials

for further ideas.

(Provide any code or worksheets you use to justify your response.)

Is the overall distribution of users the same between a taxi and uber users?

No, this is because we have fewer taxi users than uber users in longitudes (- 74±0.1,

40.7±0.2) than we have in the same region of uber users. The uber data is evenly scattered along

the latitudes and longitudes. The uber data is more concentrated in the location (- 74±0.1,

40.7±0.2) which shows evidence of more uber users than taxi users in the same region.

Do any differences become more or less pronounced at different times of the day?

7 am – 9 am

Yes, the difference becomes more announced at the time of day such that we have more people

using taxis than Ubers’.This is because more people tend to go to work during that time and taxis

can be relied on.

Does the data need to be filtered in any way to remove noise and/or outliers?

Yes, the data needs to be filtered to remove noise because comparing the filtered data to the

unfiltered data, such as the uber locations below and the uber-filtered data observed before, we

see the difference between filtered and unfiltered data. Filtered data conforms neatly to the

specifications required as compared to unfiltered data.

How should data be clustered such that the clustering results for different datasets can

be compared?

The data should be clustered in a density model because it isolates various density regions

based on the different densities available making the datasets easily comparable.

What plots/visualizations are suitable for comparing distributions?

Histograms, box plots, bar charts. Box plots provide a necessary distribution of the outliers

and the distribution of the variables.

CODE

%Question 5

clear

clc

%Importing data from csv files

yellow_tripdata = readtable('yellow_tripdata_2014_09_reduced.csv');

green_tripdata= readtable('green_tripdata_reduced.csv');

uber_data = readtable('uber-raw-data-sep14_reduced.csv');

%merging the green yellow data in one matrix

taxi_tripdata = [yellow_tripdata; green_tripdata];

% Filtered Uber Data

median_pickup_long_uber = median(uber_data.pickup_latitude);

mad_pickup_long_uber = mad(uber_data.pickup_latitude);

median_pickup_lat_uber = median(uber_data.pickup_longitude);

mad_pickup_lat_uber = mad(uber_data.pickup_longitude);

% outliers_uber = (abs(uber_raw_data_sep14_reduced.pickup_latitude -

median_pickup_long_uber)

outliers_uber=(abs(uber_data.pickup_longitude - median_pickup_lat_uber));

filtered_uber_data = uber_data(~outliers_uber, :);

outliers_uber=((uber_data.pickup_longitude >= -74.259090)

&(uber_data.pickup_longitude <= -73.700272)) & ((uber_data.pickup_latitude

>=40.477399));

filtered_uber_data = uber_data(outliers_uber, :);

%figure,clf

scatter(uber_data.pickup_longitude, uber_data.pickup_latitude);

title('Uber locations'),xlabel('longitude'), ylabel('latitude');

scatter(filtered_uber_data.pickup_longitude,

filtered_uber_data.pickup_latitude);

title('Uber Filtered data'),xlabel('longitude'),ylabel('latitude');

% Taxi Filtered data

median_pickup_long_taxi = median(taxi_tripdata.pickup_latitude);

mad_pickup_long_taxi = mad(taxi_tripdata.pickup_latitude);

median_pickup_lat_taxi = median(taxi_tripdata.pickup_longitude);

mad_pickup_lat_taxi = mad(taxi_tripdata.pickup_longitude);

outliers_taxi = (abs(taxi_tripdata.pickup_latitude -

median_pickup_long_taxi)) | (abs(taxi_tripdata.pickup_longitude -

median_pickup_lat_taxi) > mad_pickup_lat_taxi);

filtered_taxi_data = taxi_tripdata(~outliers_taxi, :); outliers_taxi =

((taxi_tripdata.pickup_longitude >= -

74.259090)&(taxi_tripdata.pickup_longitude <= -73.700272)) &

((taxi_tripdata.pickup_latitude>=40.477399) & (taxi_tripdata.pickup_latitude

<= 40.917577));

filtered_taxi_data = taxi_tripdata(outliers_taxi, :);

%figure,clf

scatter(taxi_tripdata.pickup_longitude, taxi_tripdata.pickup_latitude,

'mo','MarkerFaceColor','m','MarkerEdgeColor', 'k');

title('Taxi locations'),xlabel('longitude'),ylabel('latitude');

scatter(filtered_taxi_data.pickup_longitude,

filtered_taxi_data.pickup_latitude,'mo','MarkerFaceColor',

'm','MarkerEdgeColor', 'k');

title('Taxi filtered data'),xlabel('longitude'),ylabel('latitude');

% 20 clusters for taxi and uber data togather rng(1);

merged_all_filtered_data = [filtered_uber_data; filtered_taxi_data];

% clustering the data

[idx,

c]=kmeans([merged_all_filtered_data.pickup_longitude,merged_all_filtered_data

.pickup_latitude], 20);

x_axis_limits = xlim; y_axis_limits = ylim; figure,clf

set(gcf, 'position', [1, 1, 1500, 1500])

for n = 1:20

scatter(merged_all_filtered_data.pickup_longitude(idx == n,

:),merged_all_filtered_data.pickup_latitude(idx == n,:),

'ro','MarkerFaceColor', 'b','MarkerEdgeColor', 'k')

scatter(merged_all_filtered_data.pickup_longitude(idx == n, :),

merged_all_filtered_data.pickup_latitude(idx == n, :),

'ro','MarkerFaceColor', 'b','MarkerEdgeColor', 'k')

xlabel('Longitude'),ylabel('Latitude'),xlim(x_axis_limits),ylim(y_axis_limits

), title('20 clusters for Taxi and Uber');

end

%idx's for every sperated data

taxi_clusters = idx(1:size(filtered_taxi_data)); uber_clusters =

idx(size(filtered_taxi_data) + 1:end);

%figure,clf

set(gcf, 'Position', [1, 1, 1500, 1500]);

histogram(uber_clusters, 20), title('Uber'),xlabel('Number of

clusters'),xlim([0,20]);

histogram(taxi_clusters, 20),title('Taxi'),xlabel('Number of

clusters'),xlim([0,20]);

%comparing the differenct times

%Uber morning

uber_7am_9am = (filtered_uber_data.pickup_datetime.Hour >= 7) &

(filtered_uber_data.pickup_datetime.Hour <= 9);

uber_morning=filtered_uber_data(uber_7am_9am,:);

%Uber night

uber_8pm_midnight = (filtered_uber_data.pickup_datetime.Hour >= 20) &

(filtered_uber_data.pickup_datetime.Hour <= 24);

uber_evening=filtered_uber_data(uber_8pm_midnight,:);

%taxi morning

taxi_7am_9am = (filtered_taxi_data.pickup_datetime.Hour >= 7) &

(filtered_taxi_data.pickup_datetime.Hour <= 9);

taxi_morning=filtered_taxi_data(taxi_7am_9am,:);

%taxi night

taxi_8pm_midnight = (filtered_taxi_data.pickup_datetime.Hour >= 20) &

(filtered_taxi_data.pickup_datetime.Hour <= 24);

taxi_evening=filtered_taxi_data(taxi_8pm_midnight,:);

%Figure,clf

set(gcf, 'Position', [1, 1, 1500, 1500]);

histogram(uber_clusters(uber_7am_9am), 20),title('Uber 7am-

9am'),xlabel('Number of clusters'),xlim([0,20]);

histogram(taxi_clusters(taxi_7am_9am), 20),title('Taxi 7am-

9am'),xlabel('Number of clusters'),xlim([0,20]);

%Figure, 2

set(gcf, 'Position', [1, 1, 1500, 1500]);

histogram(uber_8pm_midnight),title('uber_8pm_midnight'),xlabel('Number of

clusters'),xlim([0,20]);

histogram(taxi_8pm_midnight),title('Taxi_8pm_midnight'),xlabel('Number of

clusters'),xlim([0,20]);

Place new order. It's free, fast and safe

Paper type

Deadline

Page count

550 words

Total price: $ 15.00

Our customers say

Jeff Curtis

USA, Student

"I'm fully satisfied with the essay I've just received. When I read it, I felt like it was exactly what I wanted to say, but couldn’t find the necessary words. Thank you!"

Ian McGregor

UK, Student

"I don’t know what I would do without your assistance! With your help, I met my deadline just in time and the work was very professional. I will be back in several days with another assignment!"

Shannon Williams

Canada, Student

"It was the perfect experience! I enjoyed working with my writer, he delivered my work on time and followed all the guidelines about the referencing and contents."

Place new order. It's free, fast and safe

Our customers say

Jeff Curtis

Ian McGregor

Shannon Williams