class: middle center hide-slide-number monash-bg-gray80 .info-box.w-50.bg-white[ These slides are best viewed in Chrome or Firefox and occasionally need to be refreshed if elements do not load properly. See <a href=lecture-04A.pdf>here for the PDF <i class="fas fa-file-pdf"></i></a>. ] <br> .white[Press the **right arrow** to progress to the next slide!] --- class: title-slide count: false background-image: url("images/bg-01.png") # .monash-blue[ETC5521: Exploratory Data Analysis] <h1 class="monash-blue" style="font-size: 30pt!important;"></h1> <br> <h2 style="font-weight:900!important;">Working with a single variable, making transformations, detecting outliers, using robust statistics</h2> .bottom_abs.width100[ Lecturer: *Emi Tanaka* <i class="fas fa-envelope"></i> ETC5521.Clayton-x@monash.edu <i class="fas fa-calendar-alt"></i> Week 4 - Session 1 <br> ] --- class: transition middle # Continuous variables This lecture is partly based on Chapter 3 of Unwin (2015) Graphical Data Analysis with R --- # Possible features of a single continuous variable <table class=" lightable-classic" style='font-family: "Arial Narrow", "Source Sans Pro", sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:left;"> Feature </th> <th style="text-align:left;"> Example </th> <th style="text-align:left;"> Description </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Asymmetry </td> <td style="text-align:left;"> <img src="images/week4A/plots-1.png" height="54px"> </td> <td style="text-align:left;"> The distribution is not symmetrical. </td> </tr> <tr> <td style="text-align:left;"> Outliers </td> <td style="text-align:left;"> <img src="images/week4A/plots-2.png" height="54px"> </td> <td style="text-align:left;"> Some observations are far from the rest.
</td> </tr> <tr> <td style="text-align:left;"> Multimodality </td> <td style="text-align:left;"> <img src="images/week4A/plots-3.png" height="54px"> </td> <td style="text-align:left;"> There is more than one "peak" in the observations. </td> </tr> <tr> <td style="text-align:left;"> Gaps </td> <td style="text-align:left;"> <img src="images/week4A/plots-4.png" height="54px"> </td> <td style="text-align:left;"> Some continuous intervals within the range contain no observations. </td> </tr> <tr> <td style="text-align:left;"> Heaping </td> <td style="text-align:left;"> <img src="images/week4A/plots-5.png" height="54px"> </td> <td style="text-align:left;"> Some values occur unexpectedly often. </td> </tr> <tr> <td style="text-align:left;"> Discretized </td> <td style="text-align:left;"> <img src="images/week4A/plots-6.png" height="54px"> </td> <td style="text-align:left;"> Only certain values are found, e.g. due to rounding. </td> </tr> <tr> <td style="text-align:left;"> Implausible </td> <td style="text-align:left;"> <img src="images/week4A/plots-7.png" height="54px"> </td> <td style="text-align:left;"> Values outside the plausible or likely range. </td> </tr> </tbody> </table> --- # Numerical features of a single continuous variable <img src="images/week4A/example-plot-1.png" width="432" style="display: block; margin: auto;" /> * A measure of .monash-blue[**_central tendency_**], e.g. mean, median and mode -- * A measure of .monash-blue[**_dispersion_**] (also called variability or spread), e.g. variance, standard deviation and interquartile range -- * There are other measures, e.g. .monash-blue[**_skewness_**], which measures asymmetry, and .monash-blue[**_kurtosis_**], which measures "tailedness", but these are less common than the first two -- * The mean is the _first moment_ and the variance is the _second central moment_; skewness and kurtosis are the _third and fourth central moments_, standardised by the corresponding power of the standard deviation -- **Significance tests** or **hypothesis tests** * Testing for `\(H_0: \mu = \mu_0\)` vs.
`\(H_1: \mu \neq \mu_0\)` (often `\(\mu_0 = 0\)`) * The `\(t\)`-test is commonly used if the underlying data are believed to be (approximately) normally distributed --- # .orange[Case study] .circle.bg-orange.white[1] 2019 Australian Federal Election .f4[Part 1/8] .flex[ .w-70[ **Context** * There are 151 seats in the House of Representatives for the 2019 Australian federal election * The major parties in Australia are: * the .monash-blue[**Coalition**], comprising the: * **Liberal**, * **Liberal National** <span class="f6">(Qld)</span>, * **National**, and * **Country Liberal** <span class="f6">(NT)</span> parties, and * the Australian .monash-blue[**Labor**] party * The .green[**Greens**] are a small but notable party ] .w-30.center[ <img src="https://upload.wikimedia.org/wikipedia/commons/3/39/Scott_Morrison_2014_%28cropped_2%29.jpg" class="w-50 ba" alt="Scott Morrison"> <img src="https://upload.wikimedia.org/wikipedia/commons/7/7d/Bill_Shorten-crop.jpg" class="w-50 ba" alt="Bill Shorten"> ] ] --- # .orange[Case study] .circle.bg-orange.white[1] 2019 Australian Federal Election .f4[Part 2/8] .f5[<i class="fas fa-database"></i> https://results.aec.gov.au/24310/Website/Downloads/HouseFirstPrefsByCandidateByVoteTypeDownload-24310.csv]
Given this data, what questions would you ask?
.footnote.f5[ Data source: Australian Electoral Commission. (2019). Federal Elections (website), accessed August 2021. URL: https://results.aec.gov.au/ ] --- # .orange[Case study] .circle.bg-orange.white[1] 2019 Australian Federal Election .f4[Part 3/8] .question-box[ How many seats did each party win in the House of Representatives? ] -- .panelset[ .panel[.panel-name[📊] .flex[ .w-50[ <table class=" lightable-classic" style='font-size: 20px; font-family: "Arial Narrow", "Source Sans Pro", sans-serif; width: auto !important; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:left;"> Party </th> <th style="text-align:right;"> # of seats </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Coalition </td> <td style="text-align:right;"> 77 </td> </tr> <tr> <td style="text-align:left;padding-left: 2em;color: #C8C8C8 !important;" indentlevel="1"> Liberal </td> <td style="text-align:right;color: #C8C8C8 !important;"> 44 </td> </tr> <tr> <td style="text-align:left;padding-left: 2em;color: #C8C8C8 !important;" indentlevel="1"> Liberal National Party Of Queensland </td> <td style="text-align:right;color: #C8C8C8 !important;"> 23 </td> </tr> <tr> <td style="text-align:left;padding-left: 2em;color: #C8C8C8 !important;" indentlevel="1"> The Nationals </td> <td style="text-align:right;color: #C8C8C8 !important;"> 10 </td> </tr> <tr> <td style="text-align:left;"> Australian Labor Party </td> <td style="text-align:right;"> 68 </td> </tr> <tr> <td style="text-align:left;"> The Greens </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:left;"> Centre Alliance </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:left;"> Katter's Australian Party (Kap) </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:left;"> Independent </td> <td style="text-align:right;"> 3 </td> </tr> </tbody> </table> ] .w-50[ **What does this table tell you?** {{content}} ] ]] 
.panel[.panel-name[data] .scroll-sign[ .f5.s400[ ```r df1 <- read_csv(here::here("data/HouseFirstPrefsByCandidateByVoteTypeDownload-24310.csv"), skip = 1, col_types = cols( .default = col_character(), OrdinaryVotes = col_double(), AbsentVotes = col_double(), ProvisionalVotes = col_double(), PrePollVotes = col_double(), PostalVotes = col_double(), TotalVotes = col_double(), Swing = col_double())) ``` ```r skimr::skim(df1) ``` ``` ## ── Data Summary ──────────────────────── ## Values ## Name df1 ## Number of rows 1207 ## Number of columns 18 ## _______________________ ## Column type frequency: ## character 11 ## numeric 7 ## ________________________ ## Group variables None ## ## ── Variable type: character ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## skim_variable n_missing complete_rate min max empty n_unique whitespace ## 1 StateAb 0 1 2 3 0 8 0 ## 2 DivisionID 0 1 3 3 0 151 0 ## 3 DivisionNm 0 1 4 15 0 151 0 ## 4 CandidateID 0 1 3 5 0 1057 0 ## 5 Surname 0 1 2 18 0 890 0 ## 6 GivenNm 0 1 1 25 0 613 0 ## 7 BallotPosition 0 1 1 3 0 14 0 ## 8 Elected 0 1 1 1 0 2 0 ## 9 HistoricElected 0 1 1 1 0 2 0 ## 10 PartyAb 151 0.875 2 4 0 40 0 ## 11 PartyNm 2 0.998 5 61 0 45 0 ## ## ── Variable type: numeric ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist ## 1 OrdinaryVotes 0 1 10401. 12446. 167 1867 4317 14768. 54535 ▇▁▁▁▁ ## 2 AbsentVotes 0 1 511. 569. 13 117 246 711 3287 ▇▂▁▁▁ ## 3 ProvisionalVotes 0 1 41.4 51.7 0 8 20 56 444 ▇▁▁▁▁ ## 4 PrePollVotes 0 1 514. 607. 11 108. 211 761 5248 ▇▂▁▁▁ ## 5 PostalVotes 0 1 1033. 1476. 14 181 317 1216. 9837 ▇▁▁▁▁ ## 6 TotalVotes 0 1 12501. 14860. 
250 2348 5196 18142 61202 ▇▁▁▁▁ ## 7 Swing 0 1 1.07 4.26 -28.1 -0.73 1.21 2.75 43.5 ▁▆▇▁▁ ``` ```r recode_party_names <- c("Australian Labor Party (Northern Territory) Branch" = "Australian Labor Party", "Labor" = "Australian Labor Party", "The Greens (Vic)" = "The Greens", "The Greens (Wa)" = "The Greens", "Katter's Australian Party (KAP)" = "Katter's Australian Party", "Country Liberals (Nt)" = "Country Liberals (NT)") ``` ```r tdf1 <- df1 %>% filter(Elected == "Y") %>% mutate(PartyNm = str_to_title(PartyNm), PartyNm = recode(PartyNm, !!!recode_party_names)) %>% count(PartyNm, sort = TRUE) %>% slice(2:4, 1, 8, 6, 7, 5) ``` ] ]] .panel[.panel-name[R] .f5[ <i class="fas fa-pencil-alt"></i> Note: `tidyverse` is expected to be loaded already. ```r data.frame(PartyNm = "Coalition", n = sum(tdf1$n[1:3])) %>% rbind(tdf1) %>% knitr::kable(col.names = c("Party", "# of seats")) %>% kableExtra::add_indent(2:4) %>% kableExtra::row_spec(2:4, color = "#C8C8C8") %>% kableExtra::kable_classic(full_width = FALSE, font_size = 20) ``` ]]] -- * The Coalition won a majority of the seats (77 of 151) and formed government * Together, Labor and the Coalition hold the vast majority of the seats in the House of Representatives (lower house) * Parties such as The Greens, Centre Alliance and Katter's Australian Party (KAP) won _only_ a single seat {{content}} -- Only? {{content}} -- Wait... **Did the parties compete in all electoral districts?** --- # .orange[Case study] .circle.bg-orange.white[1] 2019 Australian Federal Election .f4[Part 4/8] .panelset[ .panel[.panel-name[📊] .flex[ .w-50[
] .w-50[ **What do you notice from this table?** {{content}} ] ]] .panel[.panel-name[data] .f5[ ```r tdf2 <- df1 %>% mutate(PartyNm = str_to_title(PartyNm), PartyNm = recode(PartyNm, !!!recode_party_names)) %>% count(PartyNm, sort = TRUE) ``` ]] .panel[.panel-name[R] .f5[ You can omit `table_options` and `toggle_select`, or look at the source Rmd to find out what they are ```r tdf2 %>% DT::datatable(rownames = FALSE, escape = FALSE, width = "500px", options = table_options(scrollY = "400px", title = "Australian Federal Election 2019 - Party Distribution", csv = "aus-election-2019-party-dist"), elementId = "tab1B", colnames = c("Party", "# of electorates"), callback = toggle_select) ``` ]]] -- * The Greens are represented in every electoral district * The United Australia Party is the only other non-major party represented in every electoral district * KAP is represented in 7 electoral districts * Centre Alliance is represented in only 3 electoral districts! {{content}} -- Let's have a closer look at the Greens party... 
--- # .orange[Case study] .circle.bg-orange.white[1] 2019 Australian Federal Election .f4[Part 5/8] .panelset[ .panel[.panel-name[📊] .flex[ .w-70[ <img src="images/week4A/aus-election-plot1-1.png" width="720" style="display: block; margin: auto;" /> ] .w-30[ **What does this graph tell you?** {{content}} ] ]] .panel[.panel-name[data] .scroll-sign[ .f5.s500[ ```r tdf3 <- df1 %>% group_by(DivisionID) %>% summarise(DivisionNm = unique(DivisionNm), State = unique(StateAb), votes_GRN = TotalVotes[which(PartyAb=="GRN")], votes_total = sum(TotalVotes)) %>% mutate(perc_GRN = votes_GRN / votes_total * 100) ``` ```r skimr::skim(tdf3) ``` ``` ## ── Data Summary ──────────────────────── ## Values ## Name tdf3 ## Number of rows 151 ## Number of columns 6 ## _______________________ ## Column type frequency: ## character 3 ## numeric 3 ## ________________________ ## Group variables None ## ## ── Variable type: character ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## skim_variable n_missing complete_rate min max empty n_unique whitespace ## 1 DivisionID 0 1 3 3 0 151 0 ## 2 DivisionNm 0 1 4 15 0 151 0 ## 3 State 0 1 2 3 0 8 0 ## ## ── Variable type: numeric ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist ## 1 votes_GRN 0 1 9821. 5581. 2744 6555 8676 11532. 45876 ▇▂▁▁▁ ## 2 votes_total 0 1 99925. 9801. 51009 96372. 
100936 105588 116216 ▁▁▁▇▅ ## 3 perc_GRN 0 1 9.87 5.63 2.89 6.43 8.55 11.4 47.8 ▇▂▁▁▁ ``` ]]] .panel[.panel-name[R] .f5[ ```r tdf3 %>% ggplot(aes(perc_GRN)) + geom_histogram(color = "white", fill = "#00843D") + labs(x = "Percentage of first preference votes per division", y = "Count", title = "First preference votes for the Greens party") ``` ] ]] ??? * Australia uses full-preference instant-runoff voting in single-member seats * Following the full allocation of preferences, it is possible to derive a two-party-preferred figure, where the votes have been allocated between the two main candidates in the election. * In Australia, this is usually between the candidates from the Coalition parties and the Australian Labor Party. -- <ul> <li>In most divisions, only a small percentage of voters gave their first preference to the Greens</li> <li>Some divisions are considerably more supportive than others</li> </ul> {{content}} -- **What further questions would you ask?**
--- # Formulating questions for EDA * Ask .monash-blue[**broad (open-ended) questions**] that promote discussion and divergent thinking -- * Polar questions (i.e. questions with a yes or no answer) restrict how far the data are explored -- * For example, .flex[ .w-50[ .center[ <div class="question-box w-80 tl"> Is the outlying observation the electoral district that won the seat? </div> {{content}} ] ] .w-50[ <img src="images/week4A/aus-election-plot1-1.png" width="432" style="display: block; margin: auto;" /> ] ] -- <div class="center"> <div class="question-box w-80 tl"> What characterises the distribution of the percentage of first preference votes for the Greens party? </div> </div> -- * Which question promotes a higher level of exploration? --- # .orange[Case study] .circle.bg-orange.white[1] 2019 Australian Federal Election .f4[Part 6/8] .panelset[ .panel[.panel-name[📊] .flex[ .w-50[ <table class=" lightable-classic" style='font-family: "Arial Narrow", "Source Sans Pro", sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="empty-cells: hide;" colspan="1"></th> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="4"><div style="border-bottom: 1px solid #111111; margin-bottom: -1px; ">% of first preference for the Greens</div></th> <th style="empty-cells: hide;" colspan="2"></th> </tr> <tr> <th style="text-align:left;"> State </th> <th style="text-align:right;"> Mean </th> <th style="text-align:right;"> Median </th> <th style="text-align:right;"> SD </th> <th style="text-align:right;"> IQR </th> <th style="text-align:right;"> Skewness </th> <th style="text-align:right;"> Kurtosis </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> ACT </td> <td style="text-align:right;"> 16.406 </td> <td style="text-align:right;"> 13.988 </td> <td style="text-align:right;"> 5.602 </td> <td style="text-align:right;"> 5.196 </td> <td style="text-align:right;"> 0.645 </td> <td style="text-align:right;"> 1.500 </td>
</tr> <tr> <td style="text-align:left;"> VIC </td> <td style="text-align:right;"> 11.400 </td> <td style="text-align:right;"> 8.570 </td> <td style="text-align:right;"> 8.210 </td> <td style="text-align:right;"> 6.717 </td> <td style="text-align:right;"> 2.603 </td> <td style="text-align:right;"> 11.360 </td> </tr> <tr> <td style="text-align:left;"> WA </td> <td style="text-align:right;"> 10.993 </td> <td style="text-align:right;"> 10.756 </td> <td style="text-align:right;"> 3.018 </td> <td style="text-align:right;"> 3.116 </td> <td style="text-align:right;"> 0.802 </td> <td style="text-align:right;"> 3.026 </td> </tr> <tr> <td style="text-align:left;"> QLD </td> <td style="text-align:right;"> 9.764 </td> <td style="text-align:right;"> 8.808 </td> <td style="text-align:right;"> 5.096 </td> <td style="text-align:right;"> 4.753 </td> <td style="text-align:right;"> 1.092 </td> <td style="text-align:right;"> 3.886 </td> </tr> <tr> <td style="text-align:left;"> TAS </td> <td style="text-align:right;"> 9.721 </td> <td style="text-align:right;"> 9.339 </td> <td style="text-align:right;"> 4.009 </td> <td style="text-align:right;"> 0.985 </td> <td style="text-align:right;"> 0.326 </td> <td style="text-align:right;"> 2.493 </td> </tr> <tr> <td style="text-align:left;"> NT </td> <td style="text-align:right;"> 9.572 </td> <td style="text-align:right;"> 9.572 </td> <td style="text-align:right;"> 2.473 </td> <td style="text-align:right;"> 1.748 </td> <td style="text-align:right;"> 0.000 </td> <td style="text-align:right;"> 1.000 </td> </tr> <tr> <td style="text-align:left;"> SA </td> <td style="text-align:right;"> 9.120 </td> <td style="text-align:right;"> 8.903 </td> <td style="text-align:right;"> 3.024 </td> <td style="text-align:right;"> 3.412 </td> <td style="text-align:right;"> 0.384 </td> <td style="text-align:right;"> 2.920 </td> </tr> <tr> <td style="text-align:left;"> NSW </td> <td style="text-align:right;"> 8.101 </td> <td style="text-align:right;"> 6.635 </td> <td 
style="text-align:right;"> 4.087 </td> <td style="text-align:right;"> 3.948 </td> <td style="text-align:right;"> 1.502 </td> <td style="text-align:right;"> 4.859 </td> </tr> <tr> <td style="text-align:left;border-top: 2px solid black;"> National </td> <td style="text-align:right;border-top: 2px solid black;"> 9.874 </td> <td style="text-align:right;border-top: 2px solid black;"> 8.547 </td> <td style="text-align:right;border-top: 2px solid black;"> 5.632 </td> <td style="text-align:right;border-top: 2px solid black;"> 5.001 </td> <td style="text-align:right;border-top: 2px solid black;"> 2.671 </td> <td style="text-align:right;border-top: 2px solid black;"> 15.798 </td> </tr> </tbody> </table> ] .w-50.pl3[ {{content}} ]]] .panel[.panel-name[data] .f5[ ```r tdf3 <- df1 %>% group_by(DivisionID) %>% summarise(DivisionNm = unique(DivisionNm), State = unique(StateAb), votes_GRN = TotalVotes[which(PartyAb=="GRN")], votes_total = sum(TotalVotes)) %>% mutate(perc_GRN = votes_GRN / votes_total * 100) ``` ]] .panel[.panel-name[R] .f5[ ```r tdf3 %>% group_by(State) %>% summarise(mean = mean(perc_GRN), median = median(perc_GRN), sd = sd(perc_GRN), iqr = IQR(perc_GRN), skewness = moments::skewness(perc_GRN), kurtosis = moments::kurtosis(perc_GRN)) %>% arrange(desc(mean)) %>% rbind(data.frame(State = "National", mean = mean(tdf3$perc_GRN), median = median(tdf3$perc_GRN), sd = sd(tdf3$perc_GRN), iqr = IQR(tdf3$perc_GRN), skewness = moments::skewness(tdf3$perc_GRN), kurtosis = moments::kurtosis(tdf3$perc_GRN))) %>% knitr::kable(col.names = c("State", "Mean", "Median", "SD", "IQR", "Skewness", "Kurtosis"), digits = 3) %>% kableExtra::kable_classic() %>% kableExtra::add_header_above(c(" ", "% of first preference for the Greens" = 4, " " = 2)) %>% kableExtra::row_spec(9, extra_css = "border-top: 2px solid black;") ``` ]]] -- * Why are the means and the medians different? * How are the standard deviations and the interquartile ranges similar or different? 
* Are there some other numerical statistics we should show?
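---

# Moment-based summaries by hand

The Skewness and Kurtosis columns above come from `moments::skewness()` and `moments::kurtosis()`. Below is a minimal base-R sketch of the same quantities, assuming the usual moment definitions (which we believe the `moments` package uses): with central moments `\(m_k = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^k\)`, skewness is `\(m_3 / m_2^{3/2}\)` and kurtosis is `\(m_4 / m_2^2\)`. This is *not* excess kurtosis, so a normal distribution has kurtosis close to 3. The helpers `central_moment()`, `sample_skewness()` and `sample_kurtosis()` are hypothetical names used only for illustration.

```r
# Hypothetical helper: k-th central moment, using divisor n (not n - 1)
central_moment <- function(x, k) {
  mean((x - mean(x))^k)
}

# Skewness: third central moment standardised by m2^(3/2);
# 0 for perfectly symmetric data
sample_skewness <- function(x) {
  central_moment(x, 3) / central_moment(x, 2)^(3 / 2)
}

# Kurtosis: fourth central moment standardised by m2^2;
# close to 3 for normally distributed data (not excess kurtosis)
sample_kurtosis <- function(x) {
  central_moment(x, 4) / central_moment(x, 2)^2
}

# Toy data with a long right tail
x <- c(1, 2, 3, 4, 20)
c(mean     = mean(x),
  variance = var(x),            # var() uses divisor n - 1
  skewness = sample_skewness(x),
  kurtosis = sample_kurtosis(x))
```

With only two observations (as in NT), the deviations from the mean are `\(\pm d\)`, so the skewness is always 0 and the kurtosis is always 1, which matches the NT row in the table.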
--- # Robust measures of central tendency .flex[ .w-40[ * <span style="color:#D81B60">**Mean**</span> is a non-robust measure of location: a single extreme value can shift it substantially. * <span style="color:#1E88E5">**Median**</span> is the 50% quantile (the middle value) of the observations. * <span style="color:#FFC107">**Trimmed mean**</span> is the sample mean after discarding observations at the tails. * <span style="color:#004D40">**Winsorized mean**</span> is the sample mean after replacing observations at the tails with the minimum or maximum of the observations that remain. ] .w-60[ <img src='images/week4A/robust-mean-1.png' class='ba pl2' height ='150px'/> <img src='images/week4A/robust-mean-2.png' class='ba pl2' height ='150px'/> <img src='images/week4A/robust-mean-3.png' class='ba pl2' height ='150px'/> <img src='images/week4A/robust-mean-4.png' class='ba pl2' height ='150px'/> <img src='images/week4A/robust-mean-5.png' class='ba pl2' height ='150px'/> <img src='images/week4A/robust-mean-6.png' class='ba pl2' height ='150px'/> <table class=" lightable-classic" style='font-size: 12px; font-family: "Arial Narrow", "Source Sans Pro", sans-serif; width: auto !important; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:right;"> Plot </th> <th style="text-align:right;"> Mean </th> <th style="text-align:right;"> Median </th> <th style="text-align:right;"> Trimmed Mean<sup>*</sup> </th> <th style="text-align:right;"> Winsorized Mean<sup>*</sup> </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;color: #D81B60 !important;"> 0.109 </td> <td style="text-align:right;color: #1E88E5 !important;"> 0.114 </td> <td style="text-align:right;color: #FFC107 !important;"> 0.120 </td> <td style="text-align:right;color: #004D40 !important;"> 0.103 </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;color: #D81B60 !important;"> 0.054 </td> <td style="text-align:right;color: #1E88E5 !important;"> -0.045 </td> <td 
style="text-align:right;color: #FFC107 !important;"> -0.016 </td> <td style="text-align:right;color: #004D40 !important;"> -0.029 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;color: #D81B60 !important;"> 1.177 </td> <td style="text-align:right;color: #1E88E5 !important;"> 0.729 </td> <td style="text-align:right;color: #FFC107 !important;"> 0.820 </td> <td style="text-align:right;color: #004D40 !important;"> 0.888 </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;color: #D81B60 !important;"> 0.523 </td> <td style="text-align:right;color: #1E88E5 !important;"> 0.552 </td> <td style="text-align:right;color: #FFC107 !important;"> 0.531 </td> <td style="text-align:right;color: #004D40 !important;"> 0.530 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;color: #D81B60 !important;"> 0.432 </td> <td style="text-align:right;color: #1E88E5 !important;"> 0.285 </td> <td style="text-align:right;color: #FFC107 !important;"> 0.337 </td> <td style="text-align:right;color: #004D40 !important;"> 0.361 </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:right;color: #D81B60 !important;"> 2.972 </td> <td style="text-align:right;color: #1E88E5 !important;"> 2.477 </td> <td style="text-align:right;color: #FFC107 !important;"> 2.624 </td> <td style="text-align:right;color: #004D40 !important;"> 2.721 </td> </tr> </tbody> </table> .f5[ <sup>*</sup> Both the trimmed and the Winsorized means are computed with 20% trimming at the tails.
] ] ] --- # Robust measures of dispersion .flex[ .w-50[ * <span style="color:#648FFF">**Standard deviation**</span> or its square, **variance**, is a popular measure of dispersion but is not robust to outliers * Standard deviation for sample `\(x_1, ..., x_n\)` is calculated as `$$\sqrt{\sum_{i=1}^n \frac{(x_i - \bar{x})^2}{n - 1}}$$` * <span style="color:#785EF0">**Interquartile range**</span> is the difference between the 3rd and 1st quartiles and is a more robust measure of spread * <span style="color:#FE6100">**Median absolute deviation**</span> (MAD) is also more robust and is defined as `$$\text{median}\left(|x_1 - m|, \dots, |x_n - m|\right), \quad m = \text{median}(x_1, \dots, x_n)$$` ] .w-50.pl3[ <img src='images/week4A/robust-mean-1.png' class='ba pl2' height ='150px'/> <img src='images/week4A/robust-mean-2.png' class='ba pl2' height ='150px'/> <img src='images/week4A/robust-mean-3.png' class='ba pl2' height ='150px'/> <img src='images/week4A/robust-mean-4.png' class='ba pl2' height ='150px'/> <img src='images/week4A/robust-mean-5.png' class='ba pl2' height ='150px'/> <img src='images/week4A/robust-mean-6.png' class='ba pl2' height ='150px'/> <table class=" lightable-classic" style='font-size: 12px; font-family: "Arial Narrow", "Source Sans Pro", sans-serif; width: auto !important; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="empty-cells: hide;" colspan="1"></th> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #111111; margin-bottom: -1px; ">Measure of dispersion</div></th> <th style="empty-cells: hide;" colspan="2"></th> </tr> <tr> <th style="text-align:right;"> Plot </th> <th style="text-align:right;"> SD </th> <th style="text-align:right;"> IQR </th> <th style="text-align:right;"> MAD </th> <th style="text-align:right;"> Skewness </th> <th style="text-align:right;"> Kurtosis </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;color: #648FFF
!important;"> 0.898 </td> <td style="text-align:right;color: #785EF0 !important;"> 1.186 </td> <td style="text-align:right;color: #FE6100 !important;"> 0.870 </td> <td style="text-align:right;"> -0.072 </td> <td style="text-align:right;"> 3.008 </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;color: #648FFF !important;"> 0.986 </td> <td style="text-align:right;color: #785EF0 !important;"> 1.411 </td> <td style="text-align:right;color: #FE6100 !important;"> 1.077 </td> <td style="text-align:right;"> 0.358 </td> <td style="text-align:right;"> 2.212 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;color: #648FFF !important;"> 1.326 </td> <td style="text-align:right;color: #785EF0 !important;"> 1.176 </td> <td style="text-align:right;color: #FE6100 !important;"> 0.793 </td> <td style="text-align:right;"> 1.944 </td> <td style="text-align:right;"> 7.184 </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;color: #648FFF !important;"> 0.288 </td> <td style="text-align:right;color: #785EF0 !important;"> 0.450 </td> <td style="text-align:right;color: #FE6100 !important;"> 0.335 </td> <td style="text-align:right;"> -0.126 </td> <td style="text-align:right;"> 1.837 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;color: #648FFF !important;"> 0.468 </td> <td style="text-align:right;color: #785EF0 !important;"> 0.499 </td> <td style="text-align:right;color: #FE6100 !important;"> 0.343 </td> <td style="text-align:right;"> 1.691 </td> <td style="text-align:right;"> 6.372 </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:right;color: #648FFF !important;"> 2.784 </td> <td style="text-align:right;color: #785EF0 !important;"> 5.362 </td> <td style="text-align:right;color: #FE6100 !important;"> 2.984 </td> <td style="text-align:right;"> -0.351 </td> <td style="text-align:right;"> 1.678 </td> </tr> </tbody> </table> ]] --- 
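# Robust summaries in base R

A minimal sketch of how the summaries on the two previous slides might be computed in base R; we have not confirmed which settings produced the tables above. `mean(x, trim = 0.2)` discards 20% of the observations from each end before averaging. Base R has no Winsorized mean, so `winsorized_mean()` below is a hypothetical helper written to follow the definition given earlier. Note that `mad()` multiplies the median absolute deviation by 1.4826 by default (so that it estimates the standard deviation for normally distributed data); `constant = 1` gives the unscaled formula.

```r
# Hypothetical helper: Winsorized mean following the definition above.
# The k = floor(n * trim) smallest values are replaced by the smallest
# remaining value, and the k largest by the largest remaining value.
winsorized_mean <- function(x, trim = 0.2) {
  stopifnot(trim >= 0, trim < 0.5)
  x <- sort(x)
  n <- length(x)
  k <- floor(n * trim)
  if (k > 0) {
    x[seq_len(k)] <- x[k + 1]       # pull in the lower tail
    x[(n - k + 1):n] <- x[n - k]    # pull in the upper tail
  }
  mean(x)
}

# Toy data: nine moderate values and one large value
x <- c(-1.2, -0.5, 0.1, 0.3, 0.4, 0.6, 0.9, 1.1, 1.5, 9.0)

# Measures of central tendency
c(mean       = mean(x),
  median     = median(x),
  trimmed    = mean(x, trim = 0.2),  # drops floor(0.2 * n) values from each end
  winsorized = winsorized_mean(x, trim = 0.2))

# Measures of dispersion
c(sd           = sd(x),
  iqr          = IQR(x),
  mad_unscaled = mad(x, constant = 1),  # median(|x - median(x)|)
  mad_default  = mad(x))                # 1.4826 * median(|x - median(x)|)
```

Here the mean (1.22) is pulled up by the single large value, while the median (0.5), the trimmed mean (about 0.567) and the Winsorized mean (0.58) stay close to the bulk of the data.

---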
# .orange[Case study] .circle.bg-orange.white[1] 2019 Australian Federal Election .f4[Part 7/8] .panelset[ .panel[.panel-name[📊] .flex[ .w-50[ <img src="images/week4A/aus-election-plot2-1.png" width="432" style="display: block; margin: auto;" /> ] .w-50[ **We should plot the data!** * The width of each box is proportional to the square root of the number of electoral districts in the corresponding state (the number of districts is roughly proportional to the state's population) **What do you notice from this graph?**
] ]] .panel[.panel-name[data] .f5[ ```r tdf3 <- df1 %>% group_by(DivisionID) %>% summarise(DivisionNm = unique(DivisionNm), State = unique(StateAb), votes_GRN = TotalVotes[which(PartyAb=="GRN")], votes_total = sum(TotalVotes)) %>% mutate(perc_GRN = votes_GRN / votes_total * 100) ``` ]] .panel[.panel-name[R] .f5[ ```r tdf3 %>% mutate(State = fct_reorder(State, perc_GRN)) %>% ggplot(aes(perc_GRN, State)) + geom_boxplot(varwidth = TRUE) + labs(x = "Percentage of first preference votes per division", y = "State", title = "First preference votes for the Greens party") ``` ]]] --- # Outliers .info-box.w-60[ **Outliers** are *observations* that are markedly different from the majority. ] <br> .flex[ .w-50[ * Outliers can _**occur by chance in almost all distributions**_, but could be indicative of: * a measurement error, * a different population, or * an issue with the sampling process. ] .w-50[ <img src="images/week4A/aus-election-plot2-1.png" width="432" style="display: block; margin: auto;" /> ] ] --- # Closer look at the _boxplot_ <img src="images/week4A/annotated-boxplot-1.png" width="432" style="display: block; margin: auto;" /> * Observations outside the range from the lower to the upper threshold (conventionally `\(Q_1 - 1.5 \times \text{IQR}\)` and `\(Q_3 + 1.5 \times \text{IQR}\)`) are sometimes referred to as .monash-blue[outliers] * Boxplots of data from a skewed distribution will almost always show these "outliers", but they are not necessarily outliers * Some definitions of outliers assume a symmetrical population distribution (e.g.
in boxplots, or observations more than a certain number of standard deviations away from the mean), and these definitions are ill-suited to asymmetrical distributions -- .center[ **But are there some things we .red[*cannot*] see from boxplots?** ] --- # .orange[Case study] .circle.bg-orange.white[1] 2019 Australian Federal Election .f4[Part 8/8] .panelset[ .panel[.panel-name[📊] .flex[ .w-50[ <img src="images/week4A/aus-election-2019-plot3-1.png" width="432" style="display: block; margin: auto;" /> ] .w-50[ {{content}} ] ]] .panel[.panel-name[data] .f5[ ```r tdf3 <- df1 %>% group_by(DivisionID) %>% summarise(DivisionNm = unique(DivisionNm), State = unique(StateAb), votes_GRN = TotalVotes[which(PartyAb=="GRN")], votes_total = sum(TotalVotes)) %>% mutate(perc_GRN = votes_GRN / votes_total * 100) ``` ]] .panel[.panel-name[R] .f5[ ```r tdf3 %>% mutate(State = fct_reorder(State, perc_GRN)) %>% ggplot(aes(perc_GRN, State)) + ggbeeswarm::geom_quasirandom(groupOnX = FALSE, varwidth = TRUE) + labs(x = "Percentage of first preference votes per division", y = "State", title = "First preference votes for the Greens party") ``` ]]] -- **Now what do you notice from this graph that you didn't notice before?** {{content}} -- * There are only two electoral districts in NT! * And only 3 and 5 electoral districts in ACT and TAS, respectively! {{content}} -- * We have _not_ computed the number of electoral districts for each state so far!
{{content}} -- <div class="info-box"> <i class="fas fa-book-reader"></i> Both numerical and graphical summaries can <i>reveal</i> some aspects of the data and <i>hide</i> others </div> --- class: transition # Transformations --- # .orange[Case study] .bg-orange.circle.white[2] Melbourne Housing Prices .f4[Part 1/5] .flex[ .w-50[ <table class=" lightable-classic" style='font-size: 12px; font-family: "Arial Narrow", "Source Sans Pro", sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:left;"> Suburb </th> <th style="text-align:right;"> Rooms </th> <th style="text-align:left;"> Type </th> <th style="text-align:right;"> Price ($) </th> <th style="text-align:left;"> Date </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Abbotsford </td> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Home </td> <td style="text-align:right;"> 1,490,000 </td> <td style="text-align:left;"> 2017-04-01 </td> </tr> <tr> <td style="text-align:left;"> Abbotsford </td> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Home </td> <td style="text-align:right;"> 1,220,000 </td> <td style="text-align:left;"> 2017-04-01 </td> </tr> <tr> <td style="text-align:left;"> Abbotsford </td> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Home </td> <td style="text-align:right;"> 1,420,000 </td> <td style="text-align:left;"> 2017-04-01 </td> </tr> <tr> <td style="text-align:left;"> Aberfeldie </td> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Home </td> <td style="text-align:right;"> 1,515,000 </td> <td style="text-align:left;"> 2017-04-01 </td> </tr> <tr> <td style="text-align:left;"> Airport West </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Home </td> <td style="text-align:right;"> 670,000 </td> <td style="text-align:left;"> 2017-04-01 </td> </tr> <tr> <td style="text-align:left;"> Airport West </td> <td style="text-align:right;"> 2 </td> <td
style="text-align:left;"> Townhouse </td> <td style="text-align:right;"> 530,000 </td> <td style="text-align:left;"> 2017-04-01 </td> </tr> <tr> <td style="text-align:left;"> Airport West </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Unit </td> <td style="text-align:right;"> 540,000 </td> <td style="text-align:left;"> 2017-04-01 </td> </tr> <tr> <td style="text-align:left;"> Airport West </td> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Home </td> <td style="text-align:right;"> 715,000 </td> <td style="text-align:left;"> 2017-04-01 </td> </tr> <tr> <td style="text-align:left;"> Albanvale </td> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Home </td> <td style="text-align:right;"> NA </td> <td style="text-align:left;"> 2017-04-01 </td> </tr> <tr> <td style="text-align:left;"> Albert Park </td> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Home </td> <td style="text-align:right;"> 1,925,000 </td> <td style="text-align:left;"> 2017-04-01 </td> </tr> <tr> <td style="text-align:left;"> Albion </td> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Unit </td> <td style="text-align:right;"> 515,000 </td> <td style="text-align:left;"> 2017-04-01 </td> </tr> <tr> <td style="text-align:left;"> Albion </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> Home </td> <td style="text-align:right;"> 717,000 </td> <td style="text-align:left;"> 2017-04-01 </td> </tr> <tr> <td style="text-align:left;"> Alphington </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Home </td> <td style="text-align:right;"> 1,675,000 </td> <td style="text-align:left;"> 2017-04-01 </td> </tr> <tr> <td style="text-align:left;"> Alphington </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> Home </td> <td style="text-align:right;"> 2,008,000 </td> <td style="text-align:left;"> 2017-04-01 </td> </tr> <tr> <td 
style="text-align:left;"> Altona </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Home </td> <td style="text-align:right;"> 860,000 </td> <td style="text-align:left;"> 2017-04-01 </td> </tr> <tr> <td style="text-align:left;"> Altona Meadows </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> Home </td> <td style="text-align:right;"> NA </td> <td style="text-align:left;"> 2017-04-01 </td> </tr> <tr> <td style="text-align:left;"> Altona North </td> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Home </td> <td style="text-align:right;"> 720,000 </td> <td style="text-align:left;"> 2017-04-01 </td> </tr> <tr> <td style="text-align:left;"> Armadale </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Unit </td> <td style="text-align:right;"> 836,000 </td> <td style="text-align:left;"> 2017-04-01 </td> </tr> <tr> <td style="text-align:left;"> Armadale </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Home </td> <td style="text-align:right;"> 2,110,000 </td> <td style="text-align:left;"> 2017-04-01 </td> </tr> <tr> <td style="text-align:left;"> Armadale </td> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Home </td> <td style="text-align:right;"> 1,386,000 </td> <td style="text-align:left;"> 2017-04-01 </td> </tr> </tbody> </table> ] .w-50.pl3[ * This data was scraped each week from domain.com.au from 2016-01-28 to 2018-10-13 * In total there are **63,023** observations * All variables shown .f5[(there are more variables not shown here)], except price, have complete records * There are **48,433** recorded property prices across Melbourne (roughly 23% missing) {{content}} ]] .footnote.f5[ Data source: Tony Pino (2018) Melbourne Housing Market, Version 27. Retrieved August 2021 from https://www.kaggle.com/anthonypino/melbourne-housing-market. ] -- **How would you explore this data first?**
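
The "roughly 23% missing" figure follows from the two counts above. A small base-R sketch that redoes the arithmetic and wraps the general computation in a hypothetical helper `prop_missing()`; the toy vector passed to it is made up. On the real data, `prop_missing(df2$Price)` should agree with one minus skimr's `complete_rate` for `Price`.

```r
# Counts reported on this slide
n_total    <- 63023
n_recorded <- 48433

# Missing prices and their percentage
n_missing   <- n_total - n_recorded          # 14590
pct_missing <- 100 * n_missing / n_total     # about 23.15

# Hypothetical helper: the mean of is.na() is the proportion missing
prop_missing <- function(x) mean(is.na(x))

# Made-up toy vector with two of four prices missing
prop_missing(c(850000, NA, 1200000, NA))     # 0.5
```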
--- # .orange[Case study] .bg-orange.circle.white[2] Melbourne Housing Prices .f4[Part 2/5] .panelset[ .panel[.panel-name[📊] .flex[ .w-50[ Observations arranged by Suburb and Date: <img src="images/week4A/melb-house-plot-miss-1.png" width="432" style="display: block; margin: auto;" /> ] .w-50[ Comparing distribution of room number for observations with missing and non-missing price records: <img src="images/week4A/melb-house-plot-room-miss-1.png" width="576" style="display: block; margin: auto;" /> {{content}} ]]] .panel[.panel-name[data] .scroll-sign[ .f5.s400[ ```r df2 <- read_csv(here::here("data/MELBOURNE_HOUSE_PRICES_LESS.csv"), col_types = cols( .default = col_character(), Rooms = col_double(), Price = col_double(), Date = col_date(format = "%d/%m/%Y"), Propertycount = col_double(), Distance = col_double())) ``` ```r skimr::skim(df2) ``` ``` ## ── Data Summary ──────────────────────── ## Values ## Name df2 ## Number of rows 63023 ## Number of columns 13 ## _______________________ ## Column type frequency: ## character 8 ## Date 1 ## numeric 4 ## ________________________ ## Group variables None ## ## ── Variable type: character ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## skim_variable n_missing complete_rate min max empty n_unique whitespace ## 1 Suburb 0 1 3 18 0 380 0 ## 2 Address 0 1 7 27 0 57754 0 ## 3 Type 0 1 1 1 0 3 0 ## 4 Method 0 1 1 2 0 9 0 ## 5 SellerG 0 1 1 27 0 476 0 ## 6 Postcode 0 1 4 4 0 225 0 ## 7 Regionname 0 1 16 26 0 8 0 ## 8 CouncilArea 0 1 17 30 0 34 0 ## ## ── Variable type: Date ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## skim_variable n_missing complete_rate min max median n_unique ## 1 Date 0 1 2016-01-28 2018-10-13 2017-09-03 112 ## ## ── Variable type: numeric 
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist ## 1 Rooms 0 1 3.11 0.958 1 3 3 4 31 ▇▁▁▁▁ ## 2 Price 14590 0.768 997898. 593499. 85000 620000 830000 1220000 11200000 ▇▁▁▁▁ ## 3 Propertycount 0 1 7618. 4424. 39 4380 6795 10412 21650 ▅▇▅▂▁ ## 4 Distance 0 1 12.7 7.59 0 7 11.4 16.7 64.1 ▇▆▁▁▁ ``` ]]] .panel[.panel-name[R] .f5[ ```r df2 %>% select(Suburb, Rooms, Type, Price, Date) %>% arrange(Suburb, Date) %>% visdat::vis_miss() df2 %>% mutate(miss = ifelse(is.na(Price), "Missing", "Recorded")) %>% count(Rooms, miss) %>% group_by(miss) %>% mutate(perc = n / sum(n) * 100) %>% ggplot(aes(as.factor(Rooms), perc, fill = miss)) + geom_col(position = "dodge") + scale_fill_viridis_d() + labs(x = "Rooms", y = "Percentage", fill = "Price") ``` ]]] -- * Okay nothing notable as far as I can see * What next? --- # .orange[Case study] .bg-orange.circle.white[2] Melbourne Housing Prices .f4[Part 3/5] .panelset[ .panel[.panel-name[📊] .flex[ .w-50[ <img src="images/week4A/melb-house-price-plot1-1.png" width="432" style="display: block; margin: auto;" /> ] .w-50[ **What can we say from this plot?** {{content}} ]]] .panel[.panel-name[data] .f5[ ```r df2 <- read_csv(here::here("data/MELBOURNE_HOUSE_PRICES_LESS.csv"), col_types = cols( .default = col_character(), Rooms = col_double(), Price = col_double(), Date = col_date(format = "%d/%m/%Y"), Propertycount = col_double(), Distance = col_double())) ``` ]] .panel[.panel-name[R] .f5[ ```r df2 %>% ggplot(aes(Price/1e6)) + geom_histogram(color = "white") + labs(x = "Price ($1,000,000)", y = "Count") ``` ]]] -- * The housing prices are right-skewed {{content}} -- * There appears to be a lot of outlying housing prices (how can we tell?) 
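
One common answer to "how can we tell?" is the boxplot rule: flag values more than 1.5 × IQR beyond the quartiles. A minimal base-R sketch with a hypothetical helper `flag_outliers()` and a made-up price vector (in $1,000,000, not taken from `df2`). It uses `quantile()`'s default quartiles, as ggplot2's boxplot statistics do; base `boxplot()` uses Tukey's hinges instead, so borderline points may differ. Re-applying the rule on the `\(\log_{10}\)` scale shows how reducing right skew changes what counts as "outlying".

```r
# Hypothetical helper: flag values outside Tukey's fences
# [Q1 - k * IQR, Q3 + k * IQR]; k = 1.5 is the usual boxplot choice
flag_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - k * iqr | x > q[2] + k * iqr   # NA inputs stay NA
}

# Made-up right-skewed prices in $1,000,000 (not taken from df2)
price <- c(0.5, 0.6, 0.7, 0.8, 0.8, 0.9, 1.0, 1.2, 1.5, 2.5, 9.0)

price[flag_outliers(price)]          # 2.5 and 9.0 are flagged
price[flag_outliers(log10(price))]   # only 9.0 is flagged on the log scale
```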
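
The central tendency tables for this case study compare the mean, median, trimmed mean and winsorised mean, on the original and log scales. A minimal base-R sketch of how each can be computed, using hypothetical helpers `winsorised_mean()` and `back_transformed()` on a made-up vector. The 10% trimming fraction is an assumption, since the slides do not state the one used for the tables. The log-scale mean, back-transformed with `10^`, is the geometric mean, which never exceeds the arithmetic mean for positive values. The median is unchanged by the monotone log transform whenever n is odd, as it is for the 48,433 recorded prices, which is presumably why both tables report $830,000.

```r
# Trimmed mean: base R drops floor(n * trim) values from each end
trimmed_mean <- function(x, trim = 0.1) mean(x, trim = trim, na.rm = TRUE)

# Hypothetical helper: winsorised mean replaces (rather than drops) the
# g = floor(n * trim) smallest and largest values with the nearest kept value
winsorised_mean <- function(x, trim = 0.1) {
  x <- sort(x)                     # sort() also drops NAs
  n <- length(x)
  g <- floor(n * trim)
  if (g > 0) {
    x[seq_len(g)] <- x[g + 1]      # pull up the lowest g values
    x[(n - g + 1):n] <- x[n - g]   # pull down the highest g values
  }
  mean(x)
}

# Hypothetical helper: summarise on the log10 scale, then back-transform
back_transformed <- function(x, f) 10^f(log10(x))

# Made-up right-skewed sample (not taken from df2)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 20, 100)
c(mean = mean(x), median = median(x),
  trimmed = trimmed_mean(x), winsorised = winsorised_mean(x))
# mean 15.6, median 5.5, trimmed 6.875, winsorised 7.7

back_transformed(x, mean)                 # geometric mean, below mean(x)
back_transformed(c(1, 10, 100), median)   # 10, same as median() for odd n
```

As in the slide's tables, the ordering here is mean > winsorised > trimmed > median: the more weight the long right tail receives, the larger the summary.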
--- # .orange[Case study] .bg-orange.circle.white[2] Melbourne Housing Prices .f4[Part 4/5] .panelset[ .panel[.panel-name[📊] .flex[ .w-50[ <img src="images/week4A/melb-house-price-plot2-1.png" width="432" style="display: block; margin: auto;" /> ] .w-50[ {{content}} ]]] .panel[.panel-name[data] .f5[ ```r df2 <- read_csv(here::here("data/MELBOURNE_HOUSE_PRICES_LESS.csv"), col_types = cols( .default = col_character(), Rooms = col_double(), Price = col_double(), Date = col_date(format = "%d/%m/%Y"), Propertycount = col_double(), Distance = col_double())) ``` ]] .panel[.panel-name[R] .f5[ ```r df2 %>% ggplot(aes(Price/1e6)) + geom_histogram(color = "white") + labs(x = "Price ($1,000,000)", y = "Count") + scale_x_log10() ``` ]]] -- * The x-axis has been `\(\log_{10}\)` transformed in this plot {{content}} -- * The plot appears more symmetrical now {{content}} -- * What is a suitable measure of central tendency here? <span class='f4'>With no transformation:</span> <table class=" lightable-classic" style='font-family: "Arial Narrow", "Source Sans Pro", sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:right;"> Mean </th> <th style="text-align:right;"> Median </th> <th style="text-align:right;"> Trimmed Mean </th> <th style="text-align:right;"> Winsorised Mean </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> $997,898 </td> <td style="text-align:right;"> $830,000 </td> <td style="text-align:right;"> $871,375 </td> <td style="text-align:right;"> $903,823 </td> </tr> </tbody> </table> <span class='f4'>With log transformation (and back-transformed to the original scale):</span> <table class=" lightable-classic" style='font-family: "Arial Narrow", "Source Sans Pro", sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:right;"> Mean </th> <th style="text-align:right;"> Median </th> <th style="text-align:right;"> Trimmed Mean </th> <th style="text-align:right;"> Winsorised Mean </th> </tr> </thead>
<tbody> <tr> <td style="text-align:right;"> $874,166 </td> <td style="text-align:right;"> $830,000 </td> <td style="text-align:right;"> $847,973 </td> <td style="text-align:right;"> $859,325 </td> </tr> </tbody> </table> --- class: transition # Multi-modality --- # .orange[Case study] .bg-orange.circle.white[2] Melbourne Housing Prices .f4[Part 5/5] .panelset[ .panel[.panel-name[📊] .flex[ .w-50[ <img src="images/week4A/melb-house-by-room-1.png" width="432" style="display: block; margin: auto;" /> ] .w-50[ {{content}} ]]] .panel[.panel-name[data] .f5.s500[ ```r df2 <- read_csv(here::here("data/MELBOURNE_HOUSE_PRICES_LESS.csv"), col_types = cols( .default = col_character(), Rooms = col_double(), Price = col_double(), Date = col_date(format = "%d/%m/%Y"), Propertycount = col_double(), Distance = col_double())) ``` ]] .panel[.panel-name[R] .f5[ ```r df2 %>% ggplot(aes(Price/1e6, as.factor(Rooms))) + geom_violin() + geom_boxplot(width = 0.1) + scale_x_log10() + labs(x = "Price ($1,000,000)", y = "# of Rooms") ``` ]]] -- * Drawing separate violin and box plots for each number of rooms shows that properties with more rooms are generally pricier * You cannot see this, however, when the data are combined <img src="images/week4A/melb-house-price-plot2-1.png" width="432" style="display: block; margin: auto;" /> --- # Take away messages .flex[ .w-70.f2[ <ul class="fa-ul"> {{content}} </ul> ] ] -- <li><span class="fa-li"><i class="fas fa-paper-plane"></i></span>Numerical and graphical summaries can reveal, but also hide, aspects of data</li> {{content}} -- <li><span class="fa-li"><i class="fas fa-paper-plane"></i></span><b>Do many numerical and graphical summaries of the data!</b></li> --- background-size: cover class: title-slide background-image: url("images/bg-01.png") <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br
/>This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>. .bottom_abs.width100[ Lecturer: *Emi Tanaka* <i class="fas fa-envelope"></i> ETC5521.Clayton-x@monash.edu <i class="fas fa-calendar-alt"></i> Week 4 - Session 1 <br> ]