ETC5512: Wild Caught Data

Australian census

Lecturer: Emi Tanaka

Department of Econometrics and Business Statistics

ETC5512.Clayton-x@monash.edu

Week 4

Population data

Recall from lecture 2:

Collecting data on the entire population is normally too expensive or infeasible! (If we can, it is called a census.)

We therefore collect data only on a subset of the population.

There are exceptions to this and one such example, as mentioned, is the census.

When was the last time that the Australian census was run?
How often is the census conducted in Australia?
Why do we run the census?
What data does the Australian census collect?


Sample survey
- Advantages
  - Reduces cost
  - Timely collection of data
- Disadvantages
  - Lack of data on sub-populations (particularly minorities) or small geographical areas
  - Requires careful construction of the sampling design
  - Estimates are subject to sampling error
  - The estimates may not be accurate or reliable
  - Estimating and communicating the precision of estimates is difficult

Census
- Advantages
  - Data available, even for small geographical areas or sub-populations
  - Statistics are not subject to sampling error
  - Better accuracy and detail
- Disadvantages
  - Expensive or infeasible
  - Time consuming to collect all data
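
To make "sampling error" concrete, here is a minimal Python sketch. The population of weekly incomes is made up (synthetic numbers, not ABS data), and the sample size of 200 is an arbitrary choice for illustration. A census uses everyone, so its mean is fixed, while each sample survey gives a slightly different estimate.

```python
import random
import statistics

random.seed(5512)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 made-up weekly incomes (not ABS data).
population = [random.lognormvariate(6.8, 0.6) for _ in range(10_000)]

# A census uses every unit, so this mean carries no sampling error.
census_mean = statistics.fmean(population)


def survey_mean(sample_size):
    """Estimate the mean from a simple random sample without replacement."""
    return statistics.fmean(random.sample(population, sample_size))


# Five independent surveys of 200 people each give five different estimates.
estimates = [survey_mean(200) for _ in range(5)]

print(f"census mean: {census_mean:.1f}")
for i, estimate in enumerate(estimates, start=1):
    print(f"survey {i}: {estimate:.1f} (error {estimate - census_mean:+.1f})")
```

The spread of the survey estimates around the census mean is the sampling error that a survey has to estimate and communicate.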

Australian Bureau of Statistics (ABS)

ABS is the independent statistical agency of the Government of Australia.
If you are from outside Australia, find the government statistical agency in your country, e.g.
- in 🇯🇵 Japan, this is the Statistics Bureau of Japan,
- in 🇨🇳 China, the National Bureau of Statistics of China,
- in 🇮🇳 India, the Ministry of Statistics and Programme Implementation, and
- in 🇳🇿 New Zealand, Statistics New Zealand.

ABS provides key statistics on a wide range of economic, population, environmental and social issues, to assist and encourage informed decision making, research and discussion within governments and the community.

ABS Census Data

The first Australian census was held in 1911.
Since 1961, the census has been held every 5 years in Australia.
The 2016 census was run at a cost of $440 million.
The next census will be held in 2026!
The ABS is legislated to collect and disseminate census data under the ABS Act 1975 and the Census and Statistics Act 1905.
Similar legislation is in place in many countries.


Getting the ABS Census Data

https://www.abs.gov.au/census/find-census-data

There are two main types of data that you can download:

DataPacks https://datapacks.censusdata.abs.gov.au/datapacks/
GeoPackages https://datapacks.censusdata.abs.gov.au/geopackages/

Navigating ABS Census data

DataPacks are available only for the 2011 and 2016 censuses.
There are slight differences in the available profiles between years, e.g. the General Community Profile in 2016 replaces the Basic and Expanded Community Profiles from 2011.
Information about the census is detailed on the ABS website.
Note: data corrections are sometimes issued at a later date.

Navigating data and working out what it contains often requires some "detective work" 🕵️‍♀️

Much like real detective work, just locating the data and understanding the variables can take a long time; the work is often not glamorous; and far more attention goes to "catching criminals" (the discoveries from statistical analysis).


Today,

We'll navigate the personal income data from the 2016 census together so that you get some "detective" experience.
You'll learn to manipulate strings and a bit about regular expressions to deal with string data.
You'll learn about tidy data.

DataPack directory structure

2016_GCP_ALL_for_Vic_short-header
- 2016 Census GCP All Geographies for VIC
  - CED
    - VIC
      - 2016Census_G01_VIC_CED.csv
      - 2016Census_G02_VIC_CED.csv
      - ...
  - GCCSA
    - VIC
      - 2016Census_G01_VIC_GCCSA.csv
      - 2016Census_G02_VIC_GCCSA.csv
      - ...
  - LGA
    - VIC
      - 2016Census_G01_VIC_LGA.csv
      - 2016Census_G02_VIC_LGA.csv
      - ...
  - POA
    - VIC
      - 2016Census_G01_VIC_POA.csv
      - 2016Census_G02_VIC_POA.csv
      - ...
  - RA
    - VIC
      - 2016Census_G01_VIC_RA.csv
      - 2016Census_G02_VIC_RA.csv
      - ...
  - SA1
    - VIC
      - 2016Census_G01_VIC_SA1.csv
      - 2016Census_G02_VIC_SA1.csv
      - ...
  - SA2
    - VIC
      - 2016Census_G01_VIC_SA2.csv
      - 2016Census_G02_VIC_SA2.csv
      - ...
  - SA3
    - VIC
      - 2016Census_G01_VIC_SA3.csv
      - 2016Census_G02_VIC_SA3.csv
      - ...
  - SA4
    - VIC
      - 2016Census_G01_VIC_SA4.csv
      - 2016Census_G02_VIC_SA4.csv
      - ...
  - SED
    - VIC
      - 2016Census_G01_VIC_SED.csv
      - 2016Census_G02_VIC_SED.csv
      - ...
  - SOS
    - VIC
      - 2016Census_G01_VIC_SOS.csv
      - 2016Census_G02_VIC_SOS.csv
      - ...
  - SOSR
    - VIC
      - 2016Census_G01_VIC_SOSR.csv
      - 2016Census_G02_VIC_SOSR.csv
      - ...
  - SSC
    - VIC
      - 2016Census_G01_VIC_SSC.csv
      - 2016Census_G02_VIC_SSC.csv
      - ...
  - STE
    - VIC
      - 2016Census_G01_VIC_STE.csv
      - 2016Census_G02_VIC_STE.csv
      - ...
  - SUA
    - VIC
      - 2016Census_G01_VIC_SUA.csv
      - 2016Census_G02_VIC_SUA.csv
      - ...
  - UCL
    - VIC
      - 2016Census_G01_VIC_UCL.csv
      - 2016Census_G02_VIC_UCL.csv
      - ...
- Metadata
  - 2016_GCP_Sequential_Template.xlsx
  - 2016Census_geog_desc_1st_2nd_3rd_release.xlsx
  - Metadata_2016_GCP_DataPack.xlsx
- Readme
  - 2016POA_readme.txt
  - AboutDatapacks_readme.txt
  - CreativeCommons_Licensing_readme.txt
  - esri_arcmap_readme.txt
  - Formats_readme.txt
  - mapinfo_readme.txt
  - Summary_of_Changes.txt

The data is nested within folders.
Preserve the data in its original structure as much as you can! That is, don't modify the data!
Where do we get started?
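
The file names in the tree above follow a regular pattern, so their parts can be pulled out with a regular expression without touching the files themselves. The sketch below is a minimal Python example. The pattern is inferred from names such as `2016Census_G01_VIC_CED.csv`, and the labels `year`, `table`, `state` and `geography` are our own assumptions, not ABS terminology.

```python
import re
from collections import defaultdict

# Pattern inferred from names like "2016Census_G01_VIC_CED.csv" (an assumption).
FILE_PATTERN = re.compile(
    r"^(?P<year>\d{4})Census_"
    r"(?P<table>G\d+[A-Z]?)_"
    r"(?P<state>[A-Z]+)_"
    r"(?P<geography>[A-Z0-9]+)\.csv$"
)


def parse_datapack_name(filename):
    """Split a DataPack CSV file name into its parts; return None if it does not match."""
    match = FILE_PATTERN.match(filename)
    return match.groupdict() if match else None


# A few names taken from the directory listing, plus one non-matching file.
names = [
    "2016Census_G01_VIC_CED.csv",
    "2016Census_G02_VIC_CED.csv",
    "2016Census_G01_VIC_SA1.csv",
    "Formats_readme.txt",
]

# Group the table codes by geography level.
tables_by_geography = defaultdict(list)
for name in names:
    parts = parse_datapack_name(name)
    if parts is not None:
        tables_by_geography[parts["geography"]].append(parts["table"])

print(dict(tables_by_geography))
```

Reading the names this way leaves the downloaded folder untouched, in keeping with the advice to preserve the original structure.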

Getting started

First, pray hard that there is some description!
Without some description or understanding of the variables, it will be near impossible to extract meaningful information from the data.
2016_GCP_ALL_for_Vic_short-header
- Metadata
  - 2016_GCP_Sequential_Template.xlsx
  - 2016Census_geog_desc_1st_2nd_3rd_release.xlsx
  - Metadata_2016_GCP_DataPack.xlsx
- Readme
  - 2016POA_readme.txt
  - AboutDatapacks_readme.txt
  - CreativeCommons_Licensing_readme.txt
  - esri_arcmap_readme.txt
  - Formats_readme.txt
  - mapinfo_readme.txt
  - Summary_of_Changes.txt

- Readme is a good place to start here (phew!)
  "About DataPacks_readme.md - "Read Me" documentation containing helpful information for users about the data and how it is structured (.md)"
- But there is no `DataPacks_readme.md`??
- We go through the other files in the Readme folder.

Meta-data

2016_GCP_ALL_for_Vic_short-header
- Metadata
  - 2016_GCP_Sequential_Template.xlsx
  - 2016Census_geog_desc_1st_2nd_3rd_release.xlsx
  - Metadata_2016_GCP_DataPack.xlsx
- Readme

We could also try going through the meta-data.

Metadata_2016_GCP_DataPack.xlsx

Table number	Table name	Table population
G17	Total Personal Income (Weekly) by Age by Sex	Persons aged 15 years and over
G28	Total Family Income (Weekly) by Family Composition	Families in family households
G29	Total Household Income (Weekly) by Household Composition	Occupied private dwellings
...	...	...
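
Scanning the table list by eye works for three rows, but a keyword search scales better. The sketch below is a minimal Python example that uses only the three rows shown above, pasted as tab-separated text. Reading the `.xlsx` file directly would need a third-party package, which is not assumed here. The helper name `find_tables` is hypothetical.

```python
import csv
import io

# Rows copied from the metadata excerpt above (tab-separated).
METADATA_TSV = (
    "Table number\tTable name\tTable population\n"
    "G17\tTotal Personal Income (Weekly) by Age by Sex\tPersons aged 15 years and over\n"
    "G28\tTotal Family Income (Weekly) by Family Composition\tFamilies in family households\n"
    "G29\tTotal Household Income (Weekly) by Household Composition\tOccupied private dwellings\n"
)


def find_tables(keyword, tsv_text=METADATA_TSV):
    """Return (table number, table name) pairs whose name contains the keyword, ignoring case."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        (row["Table number"], row["Table name"])
        for row in reader
        if keyword.lower() in row["Table name"].lower()
    ]


print(find_tables("personal income"))
```

Searching for "personal income" points to table G17, which is the table followed in the rest of this walkthrough.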


Finding Table G17

2016_GCP_ALL_for_Vic_short-header
- 2016 Census GCP All Geographies for VIC
  - CED
    - VIC
      - ...
      - 2016Census_G17A_VIC_CED.csv
      - 2016Census_G17B_VIC_CED.csv
      - 2016Census_G17C_VIC_CED.csv
      - ...
  - GCCSA
    - VIC
      - ...
      - 2016Census_G17A_VIC_GCCSA.csv
      - 2016Census_G17B_VIC_GCCSA.csv
      - 2016Census_G17C_VIC_GCCSA.csv
      - ...
  - LGA
    - VIC
      - ...
      - 2016Census_G17A_VIC_LGA.csv
      - 2016Census_G17B_VIC_LGA.csv
      - 2016Census_G17C_VIC_LGA.csv
      - ...
  - POA
    - VIC
      - ...
      - 2016Census_G17A_VIC_POA.csv
      - 2016Census_G17B_VIC_POA.csv
      - 2016Census_G17C_VIC_POA.csv
      - ...
  - RA
    - VIC
      - ...
      - 2016Census_G17A_VIC_RA.csv
      - 2016Census_G17B_VIC_RA.csv
      - 2016Census_G17C_VIC_RA.csv
      - ...
  - SA1
    - VIC
      - ...
      - 2016Census_G17A_VIC_SA1.csv
      - 2016Census_G17B_VIC_SA1.csv
      - 2016Census_G17C_VIC_SA1.csv
      - ...
  - SA2
    - VIC
      - ...
      - 2016Census_G17A_VIC_SA2.csv
      - 2016Census_G17B_VIC_SA2.csv
      - 2016Census_G17C_VIC_SA2.csv
      - ...
  - SA3
    - VIC
      - ...
      - 2016Census_G17A_VIC_SA3.csv
      - 2016Census_G17B_VIC_SA3.csv
      - 2016Census_G17C_VIC_SA3.csv
      - ...
  - SA4
    - VIC
      - ...
      - 2016Census_G17A_VIC_SA4.csv
      - 2016Census_G17B_VIC_SA4.csv
      - 2016Census_G17C_VIC_SA4.csv
      - ...
  - SED
    - VIC
      - ...
      - 2016Census_G17A_VIC_SED.csv
      - 2016Census_G17B_VIC_SED.csv
      - 2016Census_G17C_VIC_SED.csv
      - ...
  - SOS
    - VIC
      - ...
      - 2016Census_G17A_VIC_SOS.csv
      - 2016Census_G17B_VIC_SOS.csv
      - 2016Census_G17C_VIC_SOS.csv
      - ...
  - SOSR
    - VIC
      - ...
      - 2016Census_G17A_VIC_SOSR.csv
      - 2016Census_G17B_VIC_SOSR.csv
      - 2016Census_G17C_VIC_SOSR.csv
      - ...
  - SSC
    - VIC
      - ...
      - 2016Census_G17A_VIC_SSC.csv
      - 2016Census_G17B_VIC_SSC.csv
      - 2016Census_G17C_VIC_SSC.csv
      - ...
  - STE
    - VIC
      - ...
      - 2016Census_G17A_VIC_STE.csv
      - 2016Census_G17B_VIC_STE.csv
      - 2016Census_G17C_VIC_STE.csv
      - ...
  - SUA
    - VIC
      - ...
      - 2016Census_G17A_VIC_SUA.csv
      - 2016Census_G17B_VIC_SUA.csv
      - 2016Census_G17C_VIC_SUA.csv
      - ...
  - UCL
    - VIC
      - ...
      - 2016Census_G17A_VIC_UCL.csv
      - 2016Census_G17B_VIC_UCL.csv
      - 2016Census_G17C_VIC_UCL.csv
      - ...
- Metadata
- Readme

Where is Table G17?

Which Table G17?

Back to metadata

Metadata
- 2016_GCP_Sequential_Template.xlsx
- 2016Census_geog_desc_1st_2nd_3rd_release.xlsx
- Metadata_2016_GCP_DataPack.xlsx

Let's open 2016Census_geog_desc_1st_2nd_3rd_release.xlsx

... and it lists the region names for each geographical code.

Let's go with the easy one: STE (state/territory), Victoria.


Found Table G17?

2016_GCP_ALL_for_Vic_short-header

2016 Census GCP All Geographies for VIC
- ...
- STE
  - VIC
    - ...
    - 2016Census_G17A_VIC_STE.csv
    - 2016Census_G17B_VIC_STE.csv
    - 2016Census_G17C_VIC_STE.csv
    - ...
- ...

G17A, G17B, G17C?

Why is the table organised like this?


Tables G17A-G17C

2016Census_G17A_VIC_STE.csv

STE_CODE_2016	M_Neg_Nil_income_15_19_yrs	M_Neg_Nil_income_20_24_yrs	M_Neg_Nil_income_25_34_yrs	M_Neg_Nil_income_35_44_yrs	M_Neg_Nil_income_45_54_yrs	M_Neg_Nil_income_55_64_yrs	M_Neg_Nil_income_65_74_yrs	M_Neg_Nil_income_75_84_yrs	M_Negtve_Nil_incme_85_yrs_ovr	M_Neg_Nil_income_Tot	M_1_149_15_19_yrs	M_1_149_20_24_yrs	M_1_149_25_34_yrs	M_1_149_35_44_yrs	M_1_149_45_54_yrs	M_1_149_55_64_yrs	M_1_149_65_74_yrs	M_1_149_75_84_yrs	M_1_149_85ov	M_1_149_Tot	M_150_299_15_19_yrs	M_150_299_20_24_yrs	M_150_299_25_34_yrs	M_150_299_35_44_yrs	M_150_299_45_54_yrs	M_150_299_55_64_yrs	M_150_299_65_74_yrs	M_150_299_75_84_yrs	M_150_299_85ov	M_150_299_Tot	M_300_399_15_19_yrs	M_300_399_20_24_yrs	M_300_399_25_34_yrs	M_300_399_35_44_yrs	M_300_399_45_54_yrs	M_300_399_55_64_yrs	M_300_399_65_74_yrs	M_300_399_75_84_yrs	M_300_399_85ov	M_300_399_Tot	M_400_499_15_19_yrs	M_400_499_20_24_yrs	M_400_499_25_34_yrs	M_400_499_35_44_yrs	M_400_499_45_54_yrs	M_400_499_55_64_yrs	M_400_499_65_74_yrs	M_400_499_75_84_yrs	M_400_499_85ov	M_400_499_Tot	M_500_649_15_19_yrs	M_500_649_20_24_yrs	M_500_649_25_34_yrs	M_500_649_35_44_yrs	M_500_649_45_54_yrs	M_500_649_55_64_yrs	M_500_649_65_74_yrs	M_500_649_75_84_yrs	M_500_649_85ov	M_500_649_Tot	M_650_799_15_19_yrs	M_650_799_20_24_yrs	M_650_799_25_34_yrs	M_650_799_35_44_yrs	M_650_799_45_54_yrs	M_650_799_55_64_yrs	M_650_799_65_74_yrs	M_650_799_75_84_yrs	M_650_799_85ov	M_650_799_Tot	M_800_999_15_19_yrs	M_800_999_20_24_yrs	M_800_999_25_34_yrs	M_800_999_35_44_yrs	M_800_999_45_54_yrs	M_800_999_55_64_yrs	M_800_999_65_74_yrs	M_800_999_75_84_yrs	M_800_999_85ov	M_800_999_Tot	M_1000_1249_15_19_yrs	M_1000_1249_20_24_yrs	M_1000_1249_25_34_yrs	M_1000_1249_35_44_yrs	M_1000_1249_45_54_yrs	M_1000_1249_55_64_yrs	M_1000_1249_65_74_yrs	M_1000_1249_75_84_yrs	M_1000_1249_85ov	M_1000_1249_Tot	M_1250_1499_15_19_yrs	M_1250_1499_20_24_yrs	M_1250_1499_25_34_yrs	M_1250_1499_35_44_yrs	M_1250_1499_45_54_yrs	M_1250_1499_55_64_yrs	M_1250_1499_65_74_yrs	M_1250_1499_75_84_yrs	M_1250_1499_85ov	
M_1250_1499_Tot	M_1500_1749_15_19_yrs	M_1500_1749_20_24_yrs	M_1500_1749_25_34_yrs	M_1500_1749_35_44_yrs	M_1500_1749_45_54_yrs	M_1500_1749_55_64_yrs	M_1500_1749_65_74_yrs	M_1500_1749_75_84_yrs	M_1500_1749_85ov	M_1500_1749_Tot	M_1750_1999_15_19_yrs	M_1750_1999_20_24_yrs	M_1750_1999_25_34_yrs	M_1750_1999_35_44_yrs	M_1750_1999_45_54_yrs	M_1750_1999_55_64_yrs	M_1750_1999_65_74_yrs	M_1750_1999_75_84_yrs	M_1750_1999_85ov	M_1750_1999_Tot	M_2000_2999_15_19_yrs	M_2000_2999_20_24_yrs	M_2000_2999_25_34_yrs	M_2000_2999_35_44_yrs	M_2000_2999_45_54_yrs	M_2000_2999_55_64_yrs	M_2000_2999_65_74_yrs	M_2000_2999_75_84_yrs	M_2000_2999_85ov	M_2000_2999_Tot	M_3000_more_15_19_yrs	M_3000_more_20_24_yrs	M_3000_more_25_34_yrs	M_3000_more_35_44_yrs	M_3000_more_45_54_yrs	M_3000_more_55_64_yrs	M_3000_more_65_74_yrs	M_3000_more_75_84_yrs	M_3000_more_85ov	M_3000_more_Tot	M_PI_NS_15_19_yrs	M_PI_NS_ns_20_24_yrs	M_PI_NS_ns_25_34_yrs	M_PI_NS_ns_35_44_yrs	M_PI_NS_ns_45_54_yrs	M_PI_NS_ns_55_64_yrs	M_PI_NS_ns_65_74_yrs	M_PI_NS_ns_75_84_yrs	M_PI_NS_ns_85_yrs_ovr	M_PI_NS_ns_Tot	M_Tot_15_19_yrs	M_Tot_20_24_yrs	M_Tot_25_34_yrs	M_Tot_35_44_yrs	M_Tot_45_54_yrs	M_Tot_55_64_yrs	M_Tot_65_74_yrs	M_Tot_75_84_yrs	M_Tot_85ov	M_Tot_Tot	F_Neg_Nil_income_15_19_yrs	F_Neg_Nil_income_20_24_yrs	F_Neg_Nil_income_25_34_yrs	F_Neg_Nil_income_35_44_yrs	F_Neg_Nil_income_45_54_yrs	F_Neg_Nil_income_55_64_yrs	F_Neg_Nil_income_65_74_yrs	F_Neg_Nil_income_75_84_yrs	F_Neg_Nil_incme_85_yrs_ovr	F_Neg_Nil_income_Tot	F_1_149_15_19_yrs	F_1_149_20_24_yrs	F_1_149_25_34_yrs	F_1_149_35_44_yrs	F_1_149_45_54_yrs	F_1_149_55_64_yrs	F_1_149_65_74_yrs	F_1_149_75_84_yrs	F_1_149_85ov	F_1_149_Tot	F_150_299_15_19_yrs	F_150_299_20_24_yrs	F_150_299_25_34_yrs	F_150_299_35_44_yrs	F_150_299_45_54_yrs	F_150_299_55_64_yrs	F_150_299_65_74_yrs	F_150_299_75_84_yrs	F_150_299_85ov	F_150_299_Tot	F_300_399_15_19_yrs	F_300_399_20_24_yrs	F_300_399_25_34_yrs	F_300_399_35_44_yrs	F_300_399_45_54_yrs	F_300_399_55_64_yrs	F_300_399_65_74_yrs	F_300_399_75_84_yrs	F_300_399_85ov	
F_300_399_Tot
2	88338	31685	21321	12176	12700	16883	11502	4864	1736	201199	38027	15443	5314	3872	4598	6578	6248	2831	947	83859	14404	24502	18377	13035	14432	19362	21286	12944	4000	142347	6041	16083	15153	11440	14479	20680	43541	32914	10052	170390	6633	16767	17420	12871	15611	19490	31744	20256	8549	149345	5249	20317	23775	15826	16990	19775	24533	12661	4180	143307	2890	21927	38051	25091	24766	24018	18372	8652	2893	166657	1600	20837	56308	38378	37087	31737	16527	5393	1990	209859	672	14079	63881	47236	43346	35021	15012	4330	1529	225102	214	5767	45712	38351	33054	24126	8626	2113	683	158640	138	2598	34901	36477	31611	21816	6398	1549	538	136031	63	1085	21647	29005	24774	16129	4212	960	380	98252	116	951	28713	49459	41738	25319	6466	1558	535	154852	201	671	9675	31944	34203	20247	6749	1903	724	106312	17255	17031	36907	30837	29984	26386	23609	16507	8828	207345	181849	209733	437167	395979	379374	327567	244826	129451	47567	2353499	77647	31317	47176	39001	32724	39129	16906	6789	3159	293852	46359	17240	14080	15564	13261	14321	8753	3584	1493	134653	18099	28026	28760	27562	25839	32391	26881	15068	4998	207614	5983	18708	24559	25164	26693	33671	54764	34779	12164	236491
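
Each column name in the header above appears to pack three pieces of information: a sex prefix, an income bracket and an age group. The sketch below is a minimal Python example that separates them with a regular expression and reshapes a few cells into one row per combination, a tidier layout. The reading of `M`, `F` and `P` as males, females and persons is an assumption, as are the alias tables that reconcile spelling variants such as `Negtve_Nil_incme` and `PI_NS_ns`. The three values are copied from the data row above.

```python
import re

# sex prefix, then income bracket, then age group (pattern inferred from the header).
COLUMN_PATTERN = re.compile(
    r"^(?P<sex>[MFP])_(?P<income>.+?)_(?P<age>\d+_\d+_yrs|85ov|85_yrs_ovr|Tot)$"
)

# Spelling variants seen in the header, mapped to one label each (our choice of label).
INCOME_ALIASES = {
    "Negtve_Nil_incme": "Neg_Nil_income",
    "Neg_Nil_incme": "Neg_Nil_income",
    "PI_NS_ns": "PI_NS",
}
AGE_ALIASES = {"85_yrs_ovr": "85ov"}


def parse_column(name):
    """Return the sex, income and age parts of a column name, or None if it does not match."""
    match = COLUMN_PATTERN.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    parts["income"] = INCOME_ALIASES.get(parts["income"], parts["income"])
    parts["age"] = AGE_ALIASES.get(parts["age"], parts["age"])
    return parts


# A few cells copied from the wide row above.
wide_row = {
    "STE_CODE_2016": "2",
    "M_Neg_Nil_income_15_19_yrs": 88338,
    "M_Negtve_Nil_incme_85_yrs_ovr": 1736,
    "M_1_149_15_19_yrs": 38027,
}

# One output row per (sex, income, age) combination.
tidy_rows = []
for name, value in wide_row.items():
    parts = parse_column(name)
    if parts is None:  # skips the identifier column STE_CODE_2016
        continue
    tidy_rows.append({"STE_CODE_2016": wide_row["STE_CODE_2016"], **parts, "count": value})

for row in tidy_rows:
    print(row)
```

The inconsistent spellings are part of the "detective work": a pattern that fits most names still needs checking against every column before it is trusted.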

2016Census_G17B_VIC_STE.csv

STE_CODE_2016	F_400_499_15_19_yrs	F_400_499_20_24_yrs	F_400_499_25_34_yrs	F_400_499_35_44_yrs	F_400_499_45_54_yrs	F_400_499_55_64_yrs	F_400_499_65_74_yrs	F_400_499_75_84_yrs	F_400_499_85ov	F_400_499_Tot	F_500_649_15_19_yrs	F_500_649_20_24_yrs	F_500_649_25_34_yrs	F_500_649_35_44_yrs	F_500_649_45_54_yrs	F_500_649_55_64_yrs	F_500_649_65_74_yrs	F_500_649_75_84_yrs	F_500_649_85ov	F_500_649_Tot	F_650_799_15_19_yrs	F_650_799_20_24_yrs	F_650_799_25_34_yrs	F_650_799_35_44_yrs	F_650_799_45_54_yrs	F_650_799_55_64_yrs	F_650_799_65_74_yrs	F_650_799_75_84_yrs	F_650_799_85ov	F_650_799_Tot	F_800_999_15_19_yrs	F_800_999_20_24_yrs	F_800_999_25_34_yrs	F_800_999_35_44_yrs	F_800_999_45_54_yrs	F_800_999_55_64_yrs	F_800_999_65_74_yrs	F_800_999_75_84_yrs	F_800_999_85ov	F_800_999_Tot	F_1000_1249_15_19_yrs	F_1000_1249_20_24_yrs	F_1000_1249_25_34_yrs	F_1000_1249_35_44_yrs	F_1000_1249_45_54_yrs	F_1000_1249_55_64_yrs	F_1000_1249_65_74_yrs	F_1000_1249_75_84_yrs	F_1000_1249_85ov	F_1000_1249_Tot	F_1250_1499_15_19_yrs	F_1250_1499_20_24_yrs	F_1250_1499_25_34_yrs	F_1250_1499_35_44_yrs	F_1250_1499_45_54_yrs	F_1250_1499_55_64_yrs	F_1250_1499_65_74_yrs	F_1250_1499_75_84_yrs	F_1250_1499_85ov	F_1250_1499_Tot	F_1500_1749_15_19_yrs	F_1500_1749_20_24_yrs	F_1500_1749_25_34_yrs	F_1500_1749_35_44_yrs	F_1500_1749_45_54_yrs	F_1500_1749_55_64_yrs	F_1500_1749_65_74_yrs	F_1500_1749_75_84_yrs	F_1500_1749_85ov	F_1500_1749_Tot	F_1750_1999_15_19_yrs	F_1750_1999_20_24_yrs	F_1750_1999_25_34_yrs	F_1750_1999_35_44_yrs	F_1750_1999_45_54_yrs	F_1750_1999_55_64_yrs	F_1750_1999_65_74_yrs	F_1750_1999_75_84_yrs	F_1750_1999_85ov	F_1750_1999_Tot	F_2000_2999_15_19_yrs	F_2000_2999_20_24_yrs	F_2000_2999_25_34_yrs	F_2000_2999_35_44_yrs	F_2000_2999_45_54_yrs	F_2000_2999_55_64_yrs	F_2000_2999_65_74_yrs	F_2000_2999_75_84_yrs	F_2000_2999_85ov	F_2000_2999_Tot	F_3000_more_15_19_yrs	F_3000_more_20_24_yrs	F_3000_more_25_34_yrs	F_3000_more_35_44_yrs	F_3000_more_45_54_yrs	F_3000_more_55_64_yrs	F_3000_more_65_74_yrs	F_3000_more_75_84_yrs	
F_3000_more_85ov	F_3000_more_Tot	F_PI_NS_15_19_yrs	F_PI_NS_ns_20_24_yrs	F_PI_NS_ns_25_34_yrs	F_PI_NS_ns_35_44_yrs	F_PI_NS_ns_45_54_yrs	F_PI_NS_ns_55_64_yrs	F_PI_NS_ns_65_74_yrs	F_PI_NS_ns_75_84_yrs	F_PI_NS_ns_85_yrs_ovr	F_PI_NS_ns_Tot	F_Tot_15_19_yrs	F_Tot_20_24_yrs	F_Tot_25_34_yrs	F_Tot_35_44_yrs	F_Tot_45_54_yrs	F_Tot_55_64_yrs	F_Tot_65_74_yrs	F_Tot_75_84_yrs	F_Tot_85ov	F_Tot_Tot	P_Neg_Nil_income_15_19_yrs	P_Neg_Nil_income_20_24_yrs	P_Neg_Nil_income_25_34_yrs	P_Neg_Nil_income_35_44_yrs	P_Neg_Nil_income_45_54_yrs	P_Neg_Nil_income_55_64_yrs	P_Neg_Nil_income_65_74_yrs	P_Neg_Nil_income_75_84_yrs	P_Negtve_Nil_incme_85_yrs_ovr	P_Neg_Nil_income_Tot	P_1_149_15_19_yrs	P_1_149_20_24_yrs	P_1_149_25_34_yrs	P_1_149_35_44_yrs	P_1_149_45_54_yrs	P_1_149_55_64_yrs	P_1_149_65_74_yrs	P_1_149_75_84_yrs	P_1_149_85ov	P_1_149_Tot	P_150_299_15_19_yrs	P_150_299_20_24_yrs	P_150_299_25_34_yrs	P_150_299_35_44_yrs	P_150_299_45_54_yrs	P_150_299_55_64_yrs	P_150_299_65_74_yrs	P_150_299_75_84_yrs	P_150_299_85ov	P_150_299_Tot	P_300_399_15_19_yrs	P_300_399_20_24_yrs	P_300_399_25_34_yrs	P_300_399_35_44_yrs	P_300_399_45_54_yrs	P_300_399_55_64_yrs	P_300_399_65_74_yrs	P_300_399_75_84_yrs	P_300_399_85ov	P_300_399_Tot	P_400_499_15_19_yrs	P_400_499_20_24_yrs	P_400_499_25_34_yrs	P_400_499_35_44_yrs	P_400_499_45_54_yrs	P_400_499_55_64_yrs	P_400_499_65_74_yrs	P_400_499_75_84_yrs	P_400_499_85ov	P_400_499_Tot	P_500_649_15_19_yrs	P_500_649_20_24_yrs	P_500_649_25_34_yrs	P_500_649_35_44_yrs	P_500_649_45_54_yrs	P_500_649_55_64_yrs	P_500_649_65_74_yrs	P_500_649_75_84_yrs	P_500_649_85ov	P_500_649_Tot	P_650_799_15_19_yrs	P_650_799_20_24_yrs	P_650_799_25_34_yrs	P_650_799_35_44_yrs	P_650_799_45_54_yrs	P_650_799_55_64_yrs	P_650_799_65_74_yrs	P_650_799_75_84_yrs	P_650_799_85ov	P_650_799_Tot	P_800_999_15_19_yrs	P_800_999_20_24_yrs	P_800_999_25_34_yrs	P_800_999_35_44_yrs	P_800_999_45_54_yrs	P_800_999_55_64_yrs	P_800_999_65_74_yrs	P_800_999_75_84_yrs	P_800_999_85ov	P_800_999_Tot
2	4020	17474	26607	26466	29789	31568	47499	37154	21386	241956	3205	20235	37882	36319	37225	31226	28445	14558	8426	217522	1810	20111	42785	37279	38889	29372	17729	7881	3589	199446	867	17972	49988	37838	39970	28577	12049	4675	2420	194360	392	11564	54077	36905	37977	26801	9151	3324	1761	181940	96	3658	38495	27640	25747	16926	4799	1356	740	119460	71	1125	24977	23872	21781	13900	3405	1079	577	90793	40	395	11895	15600	14463	9176	2189	731	364	54847	63	328	13073	21020	17329	9786	2806	1014	525	65948	183	375	3511	11690	11238	5823	2728	1318	691	37568	15667	15527	34151	28009	28119	27218	26679	22253	18139	215766	174492	204065	452031	409936	401040	349886	264772	155554	80427	2492203	165978	63007	68500	51179	45422	56016	28406	11651	4892	495052	84384	32685	19394	19436	17858	20896	14999	6416	2440	218506	32505	52526	47141	40596	40273	51755	48162	28014	8996	349958	12023	34790	39718	36611	41179	54351	98309	67688	22219	406878	10653	34240	44035	39330	45405	51060	79246	57407	29935	391308	8450	40552	61664	52138	54212	51004	52981	27216	12610	360838	4702	42035	80835	62373	63652	53390	36103	16532	6478	366105	2466	38808	106298	76216	77055	60313	28575	10077	4407	404215

2016Census_G17C_VIC_STE.csv

STE_CODE_2016	P_1000_1249_15_19_yrs	P_1000_1249_20_24_yrs	P_1000_1249_25_34_yrs	P_1000_1249_35_44_yrs	P_1000_1249_45_54_yrs	P_1000_1249_55_64_yrs	P_1000_1249_65_74_yrs	P_1000_1249_75_84_yrs	P_1000_1249_85ov	P_1000_1249_Tot	P_1250_1499_15_19_yrs	P_1250_1499_20_24_yrs	P_1250_1499_25_34_yrs	P_1250_1499_35_44_yrs	P_1250_1499_45_54_yrs	P_1250_1499_55_64_yrs	P_1250_1499_65_74_yrs	P_1250_1499_75_84_yrs	P_1250_1499_85ov	P_1250_1499_Tot	P_1500_1749_15_19_yrs	P_1500_1749_20_24_yrs	P_1500_1749_25_34_yrs	P_1500_1749_35_44_yrs	P_1500_1749_45_54_yrs	P_1500_1749_55_64_yrs	P_1500_1749_65_74_yrs	P_1500_1749_75_84_yrs	P_1500_1749_85ov	P_1500_1749_Tot	P_1750_1999_15_19_yrs	P_1750_1999_20_24_yrs	P_1750_1999_25_34_yrs	P_1750_1999_35_44_yrs	P_1750_1999_45_54_yrs	P_1750_1999_55_64_yrs	P_1750_1999_65_74_yrs	P_1750_1999_75_84_yrs	P_1750_1999_85ov	P_1750_1999_Tot	P_2000_2999_15_19_yrs	P_2000_2999_20_24_yrs	P_2000_2999_25_34_yrs	P_2000_2999_35_44_yrs	P_2000_2999_45_54_yrs	P_2000_2999_55_64_yrs	P_2000_2999_65_74_yrs	P_2000_2999_75_84_yrs	P_2000_2999_85ov	P_2000_2999_Tot	P_3000_more_15_19_yrs	P_3000_more_20_24_yrs	P_3000_more_25_34_yrs	P_3000_more_35_44_yrs	P_3000_more_45_54_yrs	P_3000_more_55_64_yrs	P_3000_more_65_74_yrs	P_3000_more_75_84_yrs	P_3000_more_85ov	P_3000_more_Tot	P_PI_NS_15_19_yrs	P_PI_NS_ns_20_24_yrs	P_PI_NS_ns_25_34_yrs	P_PI_NS_ns_35_44_yrs	P_PI_NS_ns_45_54_yrs	P_PI_NS_ns_55_64_yrs	P_PI_NS_ns_65_74_yrs	P_PI_NS_ns_75_84_yrs	P_PI_NS_ns_85_yrs_ovr	P_PI_NS_ns_Tot	P_Tot_15_19_yrs	P_Tot_20_24_yrs	P_Tot_25_34_yrs	P_Tot_35_44_yrs	P_Tot_45_54_yrs	P_Tot_55_64_yrs	P_Tot_65_74_yrs	P_Tot_75_84_yrs	P_Tot_85ov	P_Tot_Tot
2	1061	25642	117956	84132	81324	61821	24164	7657	3287	407041	311	9424	84206	65993	58799	41049	13420	3469	1422	278098	210	3720	59880	60349	53396	35718	9803	2624	1115	226824	103	1480	33544	44600	39237	25306	6403	1687	741	153095	174	1279	41788	70485	59071	35105	9266	2575	1061	220801	382	1044	13185	43637	45438	26071	9480	3222	1417	143877	32924	32554	71062	58843	58102	53603	50289	38765	26966	423108	356340	413792	889190	805920	780420	677453	509599	285006	127993	4845710

15/39

Table G17

There are a few things to note:

There are 201 columns in G17A and G17B and 81 columns in G17C.
Perhaps there is an export limit for data with more than 200 columns, so the table is split across several CSV files.
This means that you have to join the tables G17A, G17B and G17C into one (you'll do this in the tutorial).
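
A minimal sketch of such a join, assuming each file has been read into a data frame that shares the STE_CODE_2016 key. The toy tables below keep only one column per file: the G17B and G17C values are copied from the rows shown earlier, while the G17A column name M_Tot_Tot and its missing value are placeholders, since G17A is not shown here.

```r
library(dplyr)

# Toy stand-ins for the three CSV files (one data column each).
g17a <- tibble(STE_CODE_2016 = 2, M_Tot_Tot = NA_real_)        # placeholder column, assumed name
g17b <- tibble(STE_CODE_2016 = 2, F_400_499_15_19_yrs = 4020)  # value from the G17B row
g17c <- tibble(STE_CODE_2016 = 2, P_Tot_Tot = 4845710)         # value from the G17C row

# Join on the shared geography code so each area stays on one row.
g17 <- g17a %>%
  left_join(g17b, by = "STE_CODE_2016") %>%
  left_join(g17c, by = "STE_CODE_2016")

ncol(g17)  # 1 key column + 3 data columns
```

With the full files, the same join would give 1 + 200 + 200 + 80 = 481 columns.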

But what does the data show?

16/39

What is Tidy Data?

Tidy Data Principles

Each variable must have its own column
Each observation must have its own row
Each value must have its own cell

So what about the ABS 2016 Census Data?

The table header in fact contains information!
E.g. F_400_499_15_19_yrs is the number of females aged 15-19 years who earn $400-499 per week (in Victoria).
The numbers in the cells are the counts.
Is the data tidy?
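
To see how much information one header carries, here is a small sketch that decodes a single column name with stringr. The regular expression is an assumption that fits names with both an income range and an age range. It will not match open-ended or total columns.

```r
library(stringr)

header <- "F_400_499_15_19_yrs"

# One capture group per piece of information stored in the name.
parts <- str_match(header, "^([A-Z])_(\\d+)_(\\d+)_(\\d+)_(\\d+)_yrs$")

gender     <- parts[, 2]              # "F"
income_min <- as.numeric(parts[, 3])  # 400
income_max <- as.numeric(parts[, 4])  # 499
age_min    <- as.numeric(parts[, 5])  # 15
age_max    <- as.numeric(parts[, 6])  # 19
```

Five variables are packed into one column name, which is a hint that the table is not in tidy form.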

Wickham (2014) Tidy Data. Journal of Statistical Software 59

17/39

Tidying the ABS 2016 Census Data

Ideally we want the data to look like:

age_min	age_max	gender	income_min	income_max	count
15	19	female	400	499	4020

You can include other information, e.g. a geography code (useful if combining with other geographical areas) or average age/income.
Note that some categories don't have upper bounds, e.g. F_3000_more_85ov (weekly income of 3000 or more, aged 85 and over). In R, -Inf and Inf are used to represent $-\infty$ and $\infty$, respectively.
You'll wrangle the data into the tidy form in the tutorial.
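
One possible way to get there, sketched on three columns copied from the G17B row above. The names_pattern and the gender labels (including "M" for male, which does not appear in the columns shown here) are assumptions. Open-ended categories such as 85ov or 3000_more and the Tot columns need their own handling.

```r
library(dplyr)
library(tidyr)

# Three real cells from the G17B row; the full table has 200 such columns.
g17_toy <- tibble(
  STE_CODE_2016       = 2,
  F_400_499_15_19_yrs = 4020,
  F_400_499_20_24_yrs = 17474,
  F_500_649_15_19_yrs = 3205
)

g17_tidy <- g17_toy %>%
  # Move the information in the column names into columns of their own.
  pivot_longer(
    cols          = -STE_CODE_2016,
    names_to      = c("gender", "income_min", "income_max", "age_min", "age_max"),
    names_pattern = "^([FMP])_(\\d+)_(\\d+)_(\\d+)_(\\d+)_yrs$",
    values_to     = "count"
  ) %>%
  mutate(
    gender = case_when(
      gender == "F" ~ "female",
      gender == "M" ~ "male",     # assumed label
      gender == "P" ~ "persons"   # assumed label
    ),
    across(c(income_min, income_max, age_min, age_max), as.numeric)
  ) %>%
  select(age_min, age_max, gender, income_min, income_max, count)

g17_tidy
```

The first row of g17_tidy corresponds to the target row shown above: 15, 19, female, 400, 499, 4020.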

18/39

Manipulating strings

19/39

Manipulating strings

The stringr package is powered by the stringi package, which in turn uses the ICU C library to provide fast performance for string manipulation.

library(tidyverse) # includes `stringr`

Hadley Wickham (2019). stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.4.0.

Gagolewski M. and others (2020). R package stringi: Character string processing facilities.

The main functions in stringr are prefixed with str_ (those in stringi with stri_), and their first argument is a string (or a vector of strings).
What do you think str_trim and str_squish do?

str_trim(c("    Apple ", "  Goji    Berry     "))

## [1] "Apple"         "Goji    Berry"

str_squish(c("    Apple ", "  Goji    Berry     "))

## [1] "Apple"      "Goji Berry"

20/39

Base R and `stringr`

Base R	stringr
gregexpr(pattern, x)	str_locate_all(x, pattern)
grep(pattern, x, value = TRUE)	str_subset(x, pattern)
grep(pattern, x)	str_which(x, pattern)
grepl(pattern, x)	str_detect(x, pattern)
gsub(pattern, replacement, x)	str_replace_all(x, pattern, replacement)
nchar(x)	str_length(x)
order(x)	str_order(x)
regexec(pattern, x) + regmatches()	str_match(x, pattern)
regexpr(pattern, x) + regmatches()	str_extract(x, pattern)
regexpr(pattern, x)	str_locate(x, pattern)

See more at https://stringr.tidyverse.org/articles/from-base.html
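
A quick check of a few rows of the table on a made-up vector. Note the argument order: base R takes the pattern first, while stringr takes the string first.

```r
library(stringr)

x <- c("apple", "banana", "cherry")  # illustrative data

# Detect a pattern
base_detect    <- grepl("an", x)
stringr_detect <- str_detect(x, "an")

# Keep only the matching elements
base_subset    <- grep("an", x, value = TRUE)
stringr_subset <- str_subset(x, "an")

# Replace every match
base_replace    <- gsub("a", "A", x)
stringr_replace <- str_replace_all(x, "a", "A")

stringr_detect   # FALSE  TRUE FALSE
stringr_subset   # "banana"
stringr_replace  # "Apple"  "bAnAnA" "cherry"
```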

21/39

Why use `stringr`?

There are a number of considerations to ensure there is consistency in syntax and user expectation (both for input and output)

For example, let's consider combining multiple strings into one.

Base R

paste0("Area", "1", c("A", "B"))

## [1] "Area1A" "Area1B"

paste0("Area", "1", c("A", NA, "C"))

## [1] "Area1A"  "Area1NA" "Area1C"

stringr

str_c("Area", "1", c("A", "B"))

## [1] "Area1A" "Area1B"

str_c("Area", "1", c("A", NA, "C"))

## [1] "Area1A" NA       "Area1C"

If the Base R result is preferable, then NA can first be replaced with the string "NA" using str_replace_na(c("A", NA, "C")).
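
A short sketch of that workaround, reusing the vector from the example above:

```r
library(stringr)

x <- c("A", NA, "C")

# Turn the missing value into the literal string "NA" before combining.
res <- str_c("Area", "1", str_replace_na(x))
res
## [1] "Area1A"  "Area1NA" "Area1C"
```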

22/39

Case study Aussie Local Government Area

LGA <- ozmaps::abs_lga %>% pull(NAME)
LGA[1:7]

## [1] "Broken Hill (C)" "Waroona (S)"     "Toowoomba (R)"   "West Arthur (S)"
## [5] "Moreton Bay (R)" "Etheridge (S)"   "Cleve (DC)"

C = Cities	A = Areas	RC = Rural Cities
B = Boroughs	S = Shires	DC = District Councils
M = Municipalities	T = Towns	AC = Aboriginal Councils
RegC = Regional Councils

🎯 Extract the LGA status from the LGA names

How?

Michael Sumner (2020). ozmaps: Australia Maps. R package version 0.3.6.

23/39

Extracting the string

str_extract(LGA, "\\(.+\\)")

##   [1] "(C)"        "(S)"        "(R)"        "(S)"        "(R)"       
##   [6] "(S)"        "(DC)"       "(R)"        "(DC)"       "(C)"       
##  [11] "(DC)"       "(S)"        "(S)"        "(S)"        "(DC)"      
##  [16] "(A)"        "(C)"        "(A)"        "(T)"        "(RC)"      
##  [21] "(A)"        "(S)"        "(S)"        "(S)"        "(C)"       
##  [26] "(DC)"       "(R)"        "(A)"        "(C)"        "(DC)"      
##  [31] "(S)"        "(S)"        "(A)"        "(S)"        "(S)"       
##  [36] "(R)"        "(M)"        "(A)"        "(C)"        "(S)"       
##  [41] "(S)"        "(C)"        "(A)"        "(S)"        "(C)"       
##  [46] "(AC)"       "(A)"        "(S)"        "(A)"        "(C)"       
##  [51] "(A)"        "(R)"        "(S)"        "(T)"        "(C)"       
##  [56] "(S)"        "(S)"        "(R)"        "(C)"        "(T)"       
##  [61] "(C)"        "(S)"        "(C)"        "(C)"        "(C)"       
##  [66] "(C)"        "(S)"        "(DC)"       "(DC)"       "(S)"       
##  [71] "(R)"        "(R)"        "(S)"        "(B)"        "(DC)"      
##  [76] "(M)"        "(A)"        "(C)"        "(S)"        "(S)"       
##  [81] "(S)"        "(S)"        "(S)"        "(S)"        "(S)"       
##  [86] "(C)"        "(A)"        "(C)"        "(A)"        "(S)"       
##  [91] "(C)"        "(A)"        "(S)"        "(S)"        "(S)"       
##  [96] "(S)"        "(DC)"       "(S)"        "(S)"        "(S)"       
## [101] "(C)"        "(C)"        "(DC)"       "(S)"        "(S)"       
## [106] "(C)"        "(S)"        "(DC)"       "(C)"        "(C)"       
## [111] "(S)"        "(S)"        "(S)"        "(S)"        "(S)"       
## [116] "(S)"        "(A)"        "(DC)"       "(S)"        "(A)"       
## [121] "(C)"        "(A)"        "(S)"        "(A)"        "(DC)"      
## [126] "(S)"        "(C)"        "(S)"        "(A)"        "(S)"       
## [131] "(M)"        "(S)"        "(DC)"       "(R)"        "(C)"       
## [136] "(C)"        "(S)"        "(C)"        "(S)"        "(T)"       
## [141] "(S)"        "(S)"        "(DC)"       "(S)"        "(T)"       
## [146] "(C)"        "(S)"        "(M)"        "(S)"        "(DC)"      
## [151] "(C)"        "(S)"        "(M)"        "(C)"        "(S)"       
## [156] "(C)"        "(C)"        "(R)"        "(S)"        "(C)"       
## [161] "(C)"        "(R)"        "(S)"        "(C)"        "(A)"       
## [166] "(T)"        "(S)"        "(RC)"       "(C)"        "(A)"       
## [171] "(A)"        "(A)"        "(S)"        "(A)"        "(S)"       
## [176] "(S)"        "(T)"        "(S)"        "(S)"        "(S)"       
## [181] "(A)"        "(DC)"       "(M)"        "(C)"        "(S)"       
## [186] "(A)"        "(T)"        "(A)"        "(C)"        "(S)"       
## [191] "(C)"        "(R)"        "(C)"        "(S)"        "(S)"       
## [196] "(S)"        "(S)"        "(R)"        "(C)"        "(DC)"      
## [201] "(A)"        "(DC)"       "(R)"        "(C)"        "(S)"       
## [206] "(S)"        "(C)"        "(C)"        "(R)"        "(S)"       
## [211] "(S)"        "(C)"        "(A)"        "(S)"        "(S)"       
## [216] "(C)"        "(DC)"       "(S)"        "(M) (Tas.)" "(M) (Tas.)"
## [221] "(C) (Vic.)" "(C) (Vic.)" "(S)"        "(DC)"       "(S)"       
## [226] "(RC)"       "(S)"        "(DC)"       "(S)"        "(S)"       
## [231] "(R)"        "(S)"        "(A)"        "(C)"        "(C)"       
## [236] "(A)"        "(A)"        "(RC)"       "(S)"        "(C)"       
## [241] "(S)"        "(S)"        "(S)"        "(C)"        "(C)"       
## [246] "(S)"        "(C)"        "(C)"        "(C)"        "(A)"       
## [251] "(C)"        "(S)"        "(S)"        "(S)"        "(S)"       
## [256] "(S)"        "(A)"        "(A)"        "(A)"        "(S)"       
## [261] "(A)"        "(A)"        "(S)"        "(S)"        "(C)"       
## [266] "(A)"        "(M)"        "(S)"        "(S)"        "(C)"       
## [271] "(R)"        "(S)"        "(R)"        "(DC)"       "(R)"       
## [276] "(C)"        "(S)"        "(S)"        "(C)"        "(S)"       
## [281] "(A)"        "(R)"        "(DC)"       "(A)"        "(C)"       
## [286] "(A)"        "(S)"        "(S)"        "(A)"        "(C)"       
## [291] "(C)"        "(A)"        "(T)"        "(S)"        "(C)"       
## [296] "(A)"        "(A)"        "(S)"        "(S)"        "(T)"       
## [301] "(C)"        "(A)"        "(A)"        "(DC)"       "(A)"       
## [306] "(C)"        "(M)"        "(M)"        "(S)"        "(A)"       
## [311] "(A)"        "(C)"        "(C)"        "(S)"        "(DC)"      
## [316] "(S)"        "(C)"        "(S)"        "(S)"        "(DC)"      
## [321] "(RegC)"     "(C)"        "(S)"        "(S)"        NA          
## [326] "(A)"        "(S)"        "(A)"        "(S)"        "(A)"       
## [331] "(S)"        "(C)"        "(R)"        "(C)"        "(S)"       
## [336] "(A)"        "(DC)"       "(S)"        "(A)"        "(R)"       
## [341] "(S)"        "(S)"        "(RC)"       "(T)"        "(A)"       
## [346] "(M)"        "(A)"        "(S)"        "(S)"        "(S)"       
## [351] "(S)"        "(A)"        "(RC)"       "(S)"        "(A)"       
## [356] "(R)"        "(S)"        "(S)"        "(C)"        "(S)"       
## [361] "(DC)"       "(M)"        "(M)"        "(AC)"       "(DC)"      
## [366] "(A)"        "(A)"        "(S)"        "(S)"        "(A)"       
## [371] "(C)"        "(S)"        "(S)"        "(C)"        "(R)"       
## [376] "(S)"        "(S)"        NA           "(A)"        "(T)"       
## [381] "(S)"        "(A)"        "(C)"        "(C)"        "(A)"       
## [386] "(C)"        "(DC)"       "(C)"        "(A)"        "(A)"       
## [391] "(A)"        "(S)"        "(DC)"       "(DC)"       "(S)"       
## [396] "(M)"        "(R)"        "(DC)"       "(C)"        "(S)"       
## [401] "(S)"        "(C)"        "(C)"        "(C)"        "(C)"       
## [406] "(C)"        "(S)"        "(A)"        NA           "(S)"       
## [411] "(C)"        "(S)"        "(M)"        "(C)"        "(S)"       
## [416] "(S)"        NA           "(C)"        "(S)"        "(C)"       
## [421] "(DC)"       "(S)"        "(C)"        "(S)"        "(C)"       
## [426] "(M)"        "(A)"        "(A)"        "(A)"        "(S)"       
## [431] "(C)"        "(S)"        "(S)"        "(S)"        "(A)"       
## [436] "(A)"        "(A)"        "(S)"        "(S)"        "(S)"       
## [441] "(C)"        "(S)"        "(C)"        "(C)"        "(C)"       
## [446] "(C) (NSW)"  "(S) (Qld)"  "(R) (Qld)"  "(DC) (SA)"  "(C) (SA)"  
## [451] "(M) (Tas.)" "(M) (Tas.)" "(C)"        "(R)"        "(M)"       
## [456] "(C)"        "(R)"        "(S)"        "(RC)"       "(S)"       
## [461] "(M)"        "(C)"        "(R)"        "(C)"        "(DC)"      
## [466] "(C)"        "(C)"        "(M)"        "(C)"        "(S)"       
## [471] "(C)"        "(DC)"       "(M)"        "(S)"        "(C)"       
## [476] "(C)"        "(A)"        "(DC)"       "(R)"        "(C)"       
## [481] "(C)"        "(A)"        "(M)"        "(C)"        "(C)"       
## [486] "(S)"        "(S)"        "(S)"        "(A)"        "(R)"       
## [491] "(M)"        "(A)"        "(R)"        "(A)"        "(A)"       
## [496] "(R)"        "(R)"        "(R)"        "(S)"        "(C)"       
## [501] "(C)"        "(S)"        "(A)"        "(S)"        "(M)"       
## [506] "(M)"        "(S)"        "(A)"        "(A)"        "(S)"       
## [511] "(A)"        "(C)"        "(DC)"       "(S)"        "(S)"       
## [516] NA           "(A)"        NA           "(R)"        "(C)"       
## [521] "(S)"        "(C)"        "(S)"        "(A)"        "(A)"       
## [526] "(A)"        "(A)"        "(C)"        "(A)"        "(A)"       
## [531] "(A)"        "(A)"        "(C) (NSW)"  "(A)"        "(C)"       
## [536] "(R)"        "(S)"        "(A)"        "(R)"        "(C)"       
## [541] "(A)"        "(S)"        "(A)"        "(A)"

What is "\\(.+\\)"???
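
A small sketch to unpack the pattern, using a few made-up names in the same style as the LGA vector. In the R string "\\(.+\\)", each \\( or \\) is an escaped literal parenthesis, . matches any character and + means one or more. Because + is greedy, the match runs to the last closing parenthesis, which explains results like "(M) (Tas.)". The lazy variant .+? is shown as one possible alternative.

```r
library(stringr)

# Illustrative names only, not taken from ozmaps
lga_toy <- c("Broken Hill (C)", "Cleve (DC)", "Central Coast (M) (Tas.)", "Unincorporated")

# Greedy: from the first "(" to the last ")"
greedy <- str_extract(lga_toy, "\\(.+\\)")
greedy
## [1] "(C)"        "(DC)"       "(M) (Tas.)" NA

# Lazy: from the first "(" to the nearest ")"
lazy <- str_extract(lga_toy, "\\(.+?\\)")
lazy
## [1] "(C)"  "(DC)" "(M)"  NA
```

Names without any parentheses have no match, so str_extract returns NA for them.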

24/39

Extracting the string

str_extract(LGA, "\\(.+\\)")

##   [1] "(C)"        "(S)"        "(R)"        "(S)"        "(R)"       
##   [6] "(S)"        "(DC)"       "(R)"        "(DC)"       "(C)"       
##  [11] "(DC)"       "(S)"        "(S)"        "(S)"        "(DC)"      
##  [16] "(A)"        "(C)"        "(A)"        "(T)"        "(RC)"      
##  [21] "(A)"        "(S)"        "(S)"        "(S)"        "(C)"       
##  [26] "(DC)"       "(R)"        "(A)"        "(C)"        "(DC)"      
##  [31] "(S)"        "(S)"        "(A)"        "(S)"        "(S)"       
##  [36] "(R)"        "(M)"        "(A)"        "(C)"        "(S)"       
##  [41] "(S)"        "(C)"        "(A)"        "(S)"        "(C)"       
##  [46] "(AC)"       "(A)"        "(S)"        "(A)"        "(C)"       
##  [51] "(A)"        "(R)"        "(S)"        "(T)"        "(C)"       
##  [56] "(S)"        "(S)"        "(R)"        "(C)"        "(T)"       
##  [61] "(C)"        "(S)"        "(C)"        "(C)"        "(C)"       
##  [66] "(C)"        "(S)"        "(DC)"       "(DC)"       "(S)"       
##  [71] "(R)"        "(R)"        "(S)"        "(B)"        "(DC)"      
##  [76] "(M)"        "(A)"        "(C)"        "(S)"        "(S)"       
##  [81] "(S)"        "(S)"        "(S)"        "(S)"        "(S)"       
##  [86] "(C)"        "(A)"        "(C)"        "(A)"        "(S)"       
##  [91] "(C)"        "(A)"        "(S)"        "(S)"        "(S)"       
##  [96] "(S)"        "(DC)"       "(S)"        "(S)"        "(S)"       
## [101] "(C)"        "(C)"        "(DC)"       "(S)"        "(S)"       
## [106] "(C)"        "(S)"        "(DC)"       "(C)"        "(C)"       
## [111] "(S)"        "(S)"        "(S)"        "(S)"        "(S)"       
## [116] "(S)"        "(A)"        "(DC)"       "(S)"        "(A)"       
## [121] "(C)"        "(A)"        "(S)"        "(A)"        "(DC)"      
## [126] "(S)"        "(C)"        "(S)"        "(A)"        "(S)"       
## [131] "(M)"        "(S)"        "(DC)"       "(R)"        "(C)"       
## [136] "(C)"        "(S)"        "(C)"        "(S)"        "(T)"       
## [141] "(S)"        "(S)"        "(DC)"       "(S)"        "(T)"       
## [146] "(C)"        "(S)"        "(M)"        "(S)"        "(DC)"      
## [151] "(C)"        "(S)"        "(M)"        "(C)"        "(S)"       
## [156] "(C)"        "(C)"        "(R)"        "(S)"        "(C)"       
## [161] "(C)"        "(R)"        "(S)"        "(C)"        "(A)"       
## [166] "(T)"        "(S)"        "(RC)"       "(C)"        "(A)"       
## [171] "(A)"        "(A)"        "(S)"        "(A)"        "(S)"       
## [176] "(S)"        "(T)"        "(S)"        "(S)"        "(S)"       
## [181] "(A)"        "(DC)"       "(M)"        "(C)"        "(S)"       
## [186] "(A)"        "(T)"        "(A)"        "(C)"        "(S)"       
## [191] "(C)"        "(R)"        "(C)"        "(S)"        "(S)"       
## [196] "(S)"        "(S)"        "(R)"        "(C)"        "(DC)"      
## [201] "(A)"        "(DC)"       "(R)"        "(C)"        "(S)"       
## [206] "(S)"        "(C)"        "(C)"        "(R)"        "(S)"       
## [211] "(S)"        "(C)"        "(A)"        "(S)"        "(S)"       
## [216] "(C)"        "(DC)"       "(S)"        "(M) (Tas.)" "(M) (Tas.)"
## [221] "(C) (Vic.)" "(C) (Vic.)" "(S)"        "(DC)"       "(S)"       
## [226] "(RC)"       "(S)"        "(DC)"       "(S)"        "(S)"       
## [231] "(R)"        "(S)"        "(A)"        "(C)"        "(C)"       
## [236] "(A)"        "(A)"        "(RC)"       "(S)"        "(C)"       
## [241] "(S)"        "(S)"        "(S)"        "(C)"        "(C)"       
## [246] "(S)"        "(C)"        "(C)"        "(C)"        "(A)"       
## [251] "(C)"        "(S)"        "(S)"        "(S)"        "(S)"       
## [256] "(S)"        "(A)"        "(A)"        "(A)"        "(S)"       
## [261] "(A)"        "(A)"        "(S)"        "(S)"        "(C)"       
## [266] "(A)"        "(M)"        "(S)"        "(S)"        "(C)"       
## [271] "(R)"        "(S)"        "(R)"        "(DC)"       "(R)"       
## [276] "(C)"        "(S)"        "(S)"        "(C)"        "(S)"       
## [281] "(A)"        "(R)"        "(DC)"       "(A)"        "(C)"       
## [286] "(A)"        "(S)"        "(S)"        "(A)"        "(C)"       
## [291] "(C)"        "(A)"        "(T)"        "(S)"        "(C)"       
## [296] "(A)"        "(A)"        "(S)"        "(S)"        "(T)"       
## [301] "(C)"        "(A)"        "(A)"        "(DC)"       "(A)"       
## [306] "(C)"        "(M)"        "(M)"        "(S)"        "(A)"       
## [311] "(A)"        "(C)"        "(C)"        "(S)"        "(DC)"      
## [316] "(S)"        "(C)"        "(S)"        "(S)"        "(DC)"      
## [321] "(RegC)"     "(C)"        "(S)"        "(S)"        NA          
## [326] "(A)"        "(S)"        "(A)"        "(S)"        "(A)"       
## [331] "(S)"        "(C)"        "(R)"        "(C)"        "(S)"       
## [336] "(A)"        "(DC)"       "(S)"        "(A)"        "(R)"       
## [341] "(S)"        "(S)"        "(RC)"       "(T)"        "(A)"       
## [346] "(M)"        "(A)"        "(S)"        "(S)"        "(S)"       
## [351] "(S)"        "(A)"        "(RC)"       "(S)"        "(A)"       
## [356] "(R)"        "(S)"        "(S)"        "(C)"        "(S)"       
## [361] "(DC)"       "(M)"        "(M)"        "(AC)"       "(DC)"      
## [366] "(A)"        "(A)"        "(S)"        "(S)"        "(A)"       
## [371] "(C)"        "(S)"        "(S)"        "(C)"        "(R)"       
## [376] "(S)"        "(S)"        NA           "(A)"        "(T)"       
## [381] "(S)"        "(A)"        "(C)"        "(C)"        "(A)"       
## [386] "(C)"        "(DC)"       "(C)"        "(A)"        "(A)"       
## [391] "(A)"        "(S)"        "(DC)"       "(DC)"       "(S)"       
## [396] "(M)"        "(R)"        "(DC)"       "(C)"        "(S)"       
## [401] "(S)"        "(C)"        "(C)"        "(C)"        "(C)"       
## [406] "(C)"        "(S)"        "(A)"        NA           "(S)"       
## [411] "(C)"        "(S)"        "(M)"        "(C)"        "(S)"       
## [416] "(S)"        NA           "(C)"        "(S)"        "(C)"       
## [421] "(DC)"       "(S)"        "(C)"        "(S)"        "(C)"       
## [426] "(M)"        "(A)"        "(A)"        "(A)"        "(S)"       
## [431] "(C)"        "(S)"        "(S)"        "(S)"        "(A)"       
## [436] "(A)"        "(A)"        "(S)"        "(S)"        "(S)"       
## [441] "(C)"        "(S)"        "(C)"        "(C)"        "(C)"       
## [446] "(C) (NSW)"  "(S) (Qld)"  "(R) (Qld)"  "(DC) (SA)"  "(C) (SA)"  
## [451] "(M) (Tas.)" "(M) (Tas.)" "(C)"        "(R)"        "(M)"       
## [456] "(C)"        "(R)"        "(S)"        "(RC)"       "(S)"       
## [461] "(M)"        "(C)"        "(R)"        "(C)"        "(DC)"      
## [466] "(C)"        "(C)"        "(M)"        "(C)"        "(S)"       
## [471] "(C)"        "(DC)"       "(M)"        "(S)"        "(C)"       
## [476] "(C)"        "(A)"        "(DC)"       "(R)"        "(C)"       
## [481] "(C)"        "(A)"        "(M)"        "(C)"        "(C)"       
## [486] "(S)"        "(S)"        "(S)"        "(A)"        "(R)"       
## [491] "(M)"        "(A)"        "(R)"        "(A)"        "(A)"       
## [496] "(R)"        "(R)"        "(R)"        "(S)"        "(C)"       
## [501] "(C)"        "(S)"        "(A)"        "(S)"        "(M)"       
## [506] "(M)"        "(S)"        "(A)"        "(A)"        "(S)"       
## [511] "(A)"        "(C)"        "(DC)"       "(S)"        "(S)"       
## [516] NA           "(A)"        NA           "(R)"        "(C)"       
## [521] "(S)"        "(C)"        "(S)"        "(A)"        "(A)"       
## [526] "(A)"        "(A)"        "(C)"        "(A)"        "(A)"       
## [531] "(A)"        "(A)"        "(C) (NSW)"  "(A)"        "(C)"       
## [536] "(R)"        "(S)"        "(A)"        "(R)"        "(C)"       
## [541] "(A)"        "(S)"        "(A)"        "(A)"

What is "\$.+\$"???
This is a pattern expressed as regular expression or regex for short

24/39

Extracting the string

str_extract(LGA, "\\(.+\\)")

##   [1] "(C)"        "(S)"        "(R)"        "(S)"        "(R)"       
##   [6] "(S)"        "(DC)"       "(R)"        "(DC)"       "(C)"       
##  [11] "(DC)"       "(S)"        "(S)"        "(S)"        "(DC)"      
##  [16] "(A)"        "(C)"        "(A)"        "(T)"        "(RC)"      
##  [21] "(A)"        "(S)"        "(S)"        "(S)"        "(C)"       
##  [26] "(DC)"       "(R)"        "(A)"        "(C)"        "(DC)"      
##  [31] "(S)"        "(S)"        "(A)"        "(S)"        "(S)"       
##  [36] "(R)"        "(M)"        "(A)"        "(C)"        "(S)"       
##  [41] "(S)"        "(C)"        "(A)"        "(S)"        "(C)"       
##  [46] "(AC)"       "(A)"        "(S)"        "(A)"        "(C)"       
##  [51] "(A)"        "(R)"        "(S)"        "(T)"        "(C)"       
##  [56] "(S)"        "(S)"        "(R)"        "(C)"        "(T)"       
##  [61] "(C)"        "(S)"        "(C)"        "(C)"        "(C)"       
##  [66] "(C)"        "(S)"        "(DC)"       "(DC)"       "(S)"       
##  [71] "(R)"        "(R)"        "(S)"        "(B)"        "(DC)"      
##  [76] "(M)"        "(A)"        "(C)"        "(S)"        "(S)"       
##  [81] "(S)"        "(S)"        "(S)"        "(S)"        "(S)"       
##  [86] "(C)"        "(A)"        "(C)"        "(A)"        "(S)"       
##  [91] "(C)"        "(A)"        "(S)"        "(S)"        "(S)"       
##  [96] "(S)"        "(DC)"       "(S)"        "(S)"        "(S)"       
## [101] "(C)"        "(C)"        "(DC)"       "(S)"        "(S)"       
## [106] "(C)"        "(S)"        "(DC)"       "(C)"        "(C)"       
## [111] "(S)"        "(S)"        "(S)"        "(S)"        "(S)"       
## [116] "(S)"        "(A)"        "(DC)"       "(S)"        "(A)"       
## [121] "(C)"        "(A)"        "(S)"        "(A)"        "(DC)"      
## [126] "(S)"        "(C)"        "(S)"        "(A)"        "(S)"       
## [131] "(M)"        "(S)"        "(DC)"       "(R)"        "(C)"       
## [136] "(C)"        "(S)"        "(C)"        "(S)"        "(T)"       
## [141] "(S)"        "(S)"        "(DC)"       "(S)"        "(T)"       
## [146] "(C)"        "(S)"        "(M)"        "(S)"        "(DC)"      
## [151] "(C)"        "(S)"        "(M)"        "(C)"        "(S)"       
## [156] "(C)"        "(C)"        "(R)"        "(S)"        "(C)"       
## [161] "(C)"        "(R)"        "(S)"        "(C)"        "(A)"       
## [166] "(T)"        "(S)"        "(RC)"       "(C)"        "(A)"       
## [171] "(A)"        "(A)"        "(S)"        "(A)"        "(S)"       
## [176] "(S)"        "(T)"        "(S)"        "(S)"        "(S)"       
## [181] "(A)"        "(DC)"       "(M)"        "(C)"        "(S)"       
## [186] "(A)"        "(T)"        "(A)"        "(C)"        "(S)"       
## [191] "(C)"        "(R)"        "(C)"        "(S)"        "(S)"       
## [196] "(S)"        "(S)"        "(R)"        "(C)"        "(DC)"      
## [201] "(A)"        "(DC)"       "(R)"        "(C)"        "(S)"       
## [206] "(S)"        "(C)"        "(C)"        "(R)"        "(S)"       
## [211] "(S)"        "(C)"        "(A)"        "(S)"        "(S)"       
## [216] "(C)"        "(DC)"       "(S)"        "(M) (Tas.)" "(M) (Tas.)"
## [221] "(C) (Vic.)" "(C) (Vic.)" "(S)"        "(DC)"       "(S)"       
## [226] "(RC)"       "(S)"        "(DC)"       "(S)"        "(S)"       
## [231] "(R)"        "(S)"        "(A)"        "(C)"        "(C)"       
## [236] "(A)"        "(A)"        "(RC)"       "(S)"        "(C)"       
## [241] "(S)"        "(S)"        "(S)"        "(C)"        "(C)"       
## [246] "(S)"        "(C)"        "(C)"        "(C)"        "(A)"       
## [251] "(C)"        "(S)"        "(S)"        "(S)"        "(S)"       
## [256] "(S)"        "(A)"        "(A)"        "(A)"        "(S)"       
## [261] "(A)"        "(A)"        "(S)"        "(S)"        "(C)"       
## [266] "(A)"        "(M)"        "(S)"        "(S)"        "(C)"       
## [271] "(R)"        "(S)"        "(R)"        "(DC)"       "(R)"       
## [276] "(C)"        "(S)"        "(S)"        "(C)"        "(S)"       
## [281] "(A)"        "(R)"        "(DC)"       "(A)"        "(C)"       
## [286] "(A)"        "(S)"        "(S)"        "(A)"        "(C)"       
## [291] "(C)"        "(A)"        "(T)"        "(S)"        "(C)"       
## [296] "(A)"        "(A)"        "(S)"        "(S)"        "(T)"       
## [301] "(C)"        "(A)"        "(A)"        "(DC)"       "(A)"       
## [306] "(C)"        "(M)"        "(M)"        "(S)"        "(A)"       
## [311] "(A)"        "(C)"        "(C)"        "(S)"        "(DC)"      
## [316] "(S)"        "(C)"        "(S)"        "(S)"        "(DC)"      
## [321] "(RegC)"     "(C)"        "(S)"        "(S)"        NA          
## [326] "(A)"        "(S)"        "(A)"        "(S)"        "(A)"       
## [331] "(S)"        "(C)"        "(R)"        "(C)"        "(S)"       
## [336] "(A)"        "(DC)"       "(S)"        "(A)"        "(R)"       
## [341] "(S)"        "(S)"        "(RC)"       "(T)"        "(A)"       
## [346] "(M)"        "(A)"        "(S)"        "(S)"        "(S)"       
## [351] "(S)"        "(A)"        "(RC)"       "(S)"        "(A)"       
## [356] "(R)"        "(S)"        "(S)"        "(C)"        "(S)"       
## [361] "(DC)"       "(M)"        "(M)"        "(AC)"       "(DC)"      
## [366] "(A)"        "(A)"        "(S)"        "(S)"        "(A)"       
## [371] "(C)"        "(S)"        "(S)"        "(C)"        "(R)"       
## [376] "(S)"        "(S)"        NA           "(A)"        "(T)"       
## [381] "(S)"        "(A)"        "(C)"        "(C)"        "(A)"       
## [386] "(C)"        "(DC)"       "(C)"        "(A)"        "(A)"       
## [391] "(A)"        "(S)"        "(DC)"       "(DC)"       "(S)"       
## [396] "(M)"        "(R)"        "(DC)"       "(C)"        "(S)"       
## [401] "(S)"        "(C)"        "(C)"        "(C)"        "(C)"       
## [406] "(C)"        "(S)"        "(A)"        NA           "(S)"       
## [411] "(C)"        "(S)"        "(M)"        "(C)"        "(S)"       
## [416] "(S)"        NA           "(C)"        "(S)"        "(C)"       
## [421] "(DC)"       "(S)"        "(C)"        "(S)"        "(C)"       
## [426] "(M)"        "(A)"        "(A)"        "(A)"        "(S)"       
## [431] "(C)"        "(S)"        "(S)"        "(S)"        "(A)"       
## [436] "(A)"        "(A)"        "(S)"        "(S)"        "(S)"       
## [441] "(C)"        "(S)"        "(C)"        "(C)"        "(C)"       
## [446] "(C) (NSW)"  "(S) (Qld)"  "(R) (Qld)"  "(DC) (SA)"  "(C) (SA)"  
## [451] "(M) (Tas.)" "(M) (Tas.)" "(C)"        "(R)"        "(M)"       
## [456] "(C)"        "(R)"        "(S)"        "(RC)"       "(S)"       
## [461] "(M)"        "(C)"        "(R)"        "(C)"        "(DC)"      
## [466] "(C)"        "(C)"        "(M)"        "(C)"        "(S)"       
## [471] "(C)"        "(DC)"       "(M)"        "(S)"        "(C)"       
## [476] "(C)"        "(A)"        "(DC)"       "(R)"        "(C)"       
## [481] "(C)"        "(A)"        "(M)"        "(C)"        "(C)"       
## [486] "(S)"        "(S)"        "(S)"        "(A)"        "(R)"       
## [491] "(M)"        "(A)"        "(R)"        "(A)"        "(A)"       
## [496] "(R)"        "(R)"        "(R)"        "(S)"        "(C)"       
## [501] "(C)"        "(S)"        "(A)"        "(S)"        "(M)"       
## [506] "(M)"        "(S)"        "(A)"        "(A)"        "(S)"       
## [511] "(A)"        "(C)"        "(DC)"       "(S)"        "(S)"       
## [516] NA           "(A)"        NA           "(R)"        "(C)"       
## [521] "(S)"        "(C)"        "(S)"        "(A)"        "(A)"       
## [526] "(A)"        "(A)"        "(C)"        "(A)"        "(A)"       
## [531] "(A)"        "(A)"        "(C) (NSW)"  "(A)"        "(C)"       
## [536] "(R)"        "(S)"        "(A)"        "(R)"        "(C)"       
## [541] "(A)"        "(S)"        "(A)"        "(A)"

What is "\$.+\$"???
This is a pattern expressed as regular expression or regex for short
Note in R, you have to add an extra \ when \ is included in the pattern (yes this means that you can have a lot of backslashes... just keep adding \ until it works! Enjoy this xkcd comic.)

24/39

Extracting the string

str_extract(LGA, "\\(.+\\)")

##   [1] "(C)"        "(S)"        "(R)"        "(S)"        "(R)"       
##   [6] "(S)"        "(DC)"       "(R)"        "(DC)"       "(C)"       
##  [11] "(DC)"       "(S)"        "(S)"        "(S)"        "(DC)"      
##  [16] "(A)"        "(C)"        "(A)"        "(T)"        "(RC)"      
##  [21] "(A)"        "(S)"        "(S)"        "(S)"        "(C)"       
##  [26] "(DC)"       "(R)"        "(A)"        "(C)"        "(DC)"      
##  [31] "(S)"        "(S)"        "(A)"        "(S)"        "(S)"       
##  [36] "(R)"        "(M)"        "(A)"        "(C)"        "(S)"       
##  [41] "(S)"        "(C)"        "(A)"        "(S)"        "(C)"       
##  [46] "(AC)"       "(A)"        "(S)"        "(A)"        "(C)"       
##  [51] "(A)"        "(R)"        "(S)"        "(T)"        "(C)"       
##  [56] "(S)"        "(S)"        "(R)"        "(C)"        "(T)"       
##  [61] "(C)"        "(S)"        "(C)"        "(C)"        "(C)"       
##  [66] "(C)"        "(S)"        "(DC)"       "(DC)"       "(S)"       
##  [71] "(R)"        "(R)"        "(S)"        "(B)"        "(DC)"      
##  [76] "(M)"        "(A)"        "(C)"        "(S)"        "(S)"       
##  [81] "(S)"        "(S)"        "(S)"        "(S)"        "(S)"       
##  [86] "(C)"        "(A)"        "(C)"        "(A)"        "(S)"       
##  [91] "(C)"        "(A)"        "(S)"        "(S)"        "(S)"       
##  [96] "(S)"        "(DC)"       "(S)"        "(S)"        "(S)"       
## [101] "(C)"        "(C)"        "(DC)"       "(S)"        "(S)"       
## [106] "(C)"        "(S)"        "(DC)"       "(C)"        "(C)"       
## [111] "(S)"        "(S)"        "(S)"        "(S)"        "(S)"       
## [116] "(S)"        "(A)"        "(DC)"       "(S)"        "(A)"       
## [121] "(C)"        "(A)"        "(S)"        "(A)"        "(DC)"      
## [126] "(S)"        "(C)"        "(S)"        "(A)"        "(S)"       
## [131] "(M)"        "(S)"        "(DC)"       "(R)"        "(C)"       
## [136] "(C)"        "(S)"        "(C)"        "(S)"        "(T)"       
## [141] "(S)"        "(S)"        "(DC)"       "(S)"        "(T)"       
## [146] "(C)"        "(S)"        "(M)"        "(S)"        "(DC)"      
## [151] "(C)"        "(S)"        "(M)"        "(C)"        "(S)"       
## [156] "(C)"        "(C)"        "(R)"        "(S)"        "(C)"       
## [161] "(C)"        "(R)"        "(S)"        "(C)"        "(A)"       
## [166] "(T)"        "(S)"        "(RC)"       "(C)"        "(A)"       
## [171] "(A)"        "(A)"        "(S)"        "(A)"        "(S)"       
## [176] "(S)"        "(T)"        "(S)"        "(S)"        "(S)"       
## [181] "(A)"        "(DC)"       "(M)"        "(C)"        "(S)"       
## [186] "(A)"        "(T)"        "(A)"        "(C)"        "(S)"       
## [191] "(C)"        "(R)"        "(C)"        "(S)"        "(S)"       
## [196] "(S)"        "(S)"        "(R)"        "(C)"        "(DC)"      
## [201] "(A)"        "(DC)"       "(R)"        "(C)"        "(S)"       
## [206] "(S)"        "(C)"        "(C)"        "(R)"        "(S)"       
## [211] "(S)"        "(C)"        "(A)"        "(S)"        "(S)"       
## [216] "(C)"        "(DC)"       "(S)"        "(M) (Tas.)" "(M) (Tas.)"
## [221] "(C) (Vic.)" "(C) (Vic.)" "(S)"        "(DC)"       "(S)"       
## [226] "(RC)"       "(S)"        "(DC)"       "(S)"        "(S)"       
## [231] "(R)"        "(S)"        "(A)"        "(C)"        "(C)"       
## [236] "(A)"        "(A)"        "(RC)"       "(S)"        "(C)"       
## [241] "(S)"        "(S)"        "(S)"        "(C)"        "(C)"       
## [246] "(S)"        "(C)"        "(C)"        "(C)"        "(A)"       
## [251] "(C)"        "(S)"        "(S)"        "(S)"        "(S)"       
## [256] "(S)"        "(A)"        "(A)"        "(A)"        "(S)"       
## [261] "(A)"        "(A)"        "(S)"        "(S)"        "(C)"       
## [266] "(A)"        "(M)"        "(S)"        "(S)"        "(C)"       
## [271] "(R)"        "(S)"        "(R)"        "(DC)"       "(R)"       
## [276] "(C)"        "(S)"        "(S)"        "(C)"        "(S)"       
## [281] "(A)"        "(R)"        "(DC)"       "(A)"        "(C)"       
## [286] "(A)"        "(S)"        "(S)"        "(A)"        "(C)"       
## [291] "(C)"        "(A)"        "(T)"        "(S)"        "(C)"       
## [296] "(A)"        "(A)"        "(S)"        "(S)"        "(T)"       
## [301] "(C)"        "(A)"        "(A)"        "(DC)"       "(A)"       
## [306] "(C)"        "(M)"        "(M)"        "(S)"        "(A)"       
## [311] "(A)"        "(C)"        "(C)"        "(S)"        "(DC)"      
## [316] "(S)"        "(C)"        "(S)"        "(S)"        "(DC)"      
## [321] "(RegC)"     "(C)"        "(S)"        "(S)"        NA          
## [326] "(A)"        "(S)"        "(A)"        "(S)"        "(A)"       
## [331] "(S)"        "(C)"        "(R)"        "(C)"        "(S)"       
## [336] "(A)"        "(DC)"       "(S)"        "(A)"        "(R)"       
## [341] "(S)"        "(S)"        "(RC)"       "(T)"        "(A)"       
## [346] "(M)"        "(A)"        "(S)"        "(S)"        "(S)"       
## [351] "(S)"        "(A)"        "(RC)"       "(S)"        "(A)"       
## [356] "(R)"        "(S)"        "(S)"        "(C)"        "(S)"       
## [361] "(DC)"       "(M)"        "(M)"        "(AC)"       "(DC)"      
## [366] "(A)"        "(A)"        "(S)"        "(S)"        "(A)"       
## [371] "(C)"        "(S)"        "(S)"        "(C)"        "(R)"       
## [376] "(S)"        "(S)"        NA           "(A)"        "(T)"       
## [381] "(S)"        "(A)"        "(C)"        "(C)"        "(A)"       
## [386] "(C)"        "(DC)"       "(C)"        "(A)"        "(A)"       
## [391] "(A)"        "(S)"        "(DC)"       "(DC)"       "(S)"       
## [396] "(M)"        "(R)"        "(DC)"       "(C)"        "(S)"       
## [401] "(S)"        "(C)"        "(C)"        "(C)"        "(C)"       
## [406] "(C)"        "(S)"        "(A)"        NA           "(S)"       
## [411] "(C)"        "(S)"        "(M)"        "(C)"        "(S)"       
## [416] "(S)"        NA           "(C)"        "(S)"        "(C)"       
## [421] "(DC)"       "(S)"        "(C)"        "(S)"        "(C)"       
## [426] "(M)"        "(A)"        "(A)"        "(A)"        "(S)"       
## [431] "(C)"        "(S)"        "(S)"        "(S)"        "(A)"       
## [436] "(A)"        "(A)"        "(S)"        "(S)"        "(S)"       
## [441] "(C)"        "(S)"        "(C)"        "(C)"        "(C)"       
## [446] "(C) (NSW)"  "(S) (Qld)"  "(R) (Qld)"  "(DC) (SA)"  "(C) (SA)"  
## [451] "(M) (Tas.)" "(M) (Tas.)" "(C)"        "(R)"        "(M)"       
## [456] "(C)"        "(R)"        "(S)"        "(RC)"       "(S)"       
## [461] "(M)"        "(C)"        "(R)"        "(C)"        "(DC)"      
## [466] "(C)"        "(C)"        "(M)"        "(C)"        "(S)"       
## [471] "(C)"        "(DC)"       "(M)"        "(S)"        "(C)"       
## [476] "(C)"        "(A)"        "(DC)"       "(R)"        "(C)"       
## [481] "(C)"        "(A)"        "(M)"        "(C)"        "(C)"       
## [486] "(S)"        "(S)"        "(S)"        "(A)"        "(R)"       
## [491] "(M)"        "(A)"        "(R)"        "(A)"        "(A)"       
## [496] "(R)"        "(R)"        "(R)"        "(S)"        "(C)"       
## [501] "(C)"        "(S)"        "(A)"        "(S)"        "(M)"       
## [506] "(M)"        "(S)"        "(A)"        "(A)"        "(S)"       
## [511] "(A)"        "(C)"        "(DC)"       "(S)"        "(S)"       
## [516] NA           "(A)"        NA           "(R)"        "(C)"       
## [521] "(S)"        "(C)"        "(S)"        "(A)"        "(A)"       
## [526] "(A)"        "(A)"        "(C)"        "(A)"        "(A)"       
## [531] "(A)"        "(A)"        "(C) (NSW)"  "(A)"        "(C)"       
## [536] "(R)"        "(S)"        "(A)"        "(R)"        "(C)"       
## [541] "(A)"        "(S)"        "(A)"        "(A)"

What is "\\(.+\\)"???
This is a pattern expressed as a regular expression, or regex for short.
Note that in R, you have to add an extra \ whenever \ appears in the pattern (yes, this means that you can end up with a lot of backslashes... just keep adding \ until it works! Enjoy this xkcd comic.)
From R v4.0.0 onwards, you can use a raw string to eliminate all the extra \, e.g. r"(\(.+\))" is the same as "\\(.+\\)"
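
The doubled backslashes are only R string syntax; the regex itself contains single ones. The sketch below compares the escaped and raw-string spellings and applies the pattern to a few made-up LGA-style names. The `lga_demo` vector is illustrative only and is not the `LGA` object used on these slides; the code assumes R 4.0.0 or later with stringr installed.

```r
library(stringr)

# Two spellings of the same six-character regex: \(.+\)
pattern_escaped <- "\\(.+\\)"
pattern_raw <- r"(\(.+\))"    # raw string, needs R >= 4.0.0

identical(pattern_escaped, pattern_raw)
writeLines(pattern_escaped)   # prints the pattern without R's string escaping

# Illustrative values only (a hypothetical stand-in for the LGA vector)
lga_demo <- c("Albury (C)", "Campbelltown (C) (NSW)", "No usual address")

# .+ is greedy, so the match runs from the first "(" to the last ")"
suffix <- str_extract(lga_demo, pattern_escaped)
suffix
```

Here `suffix` should be "(C)", "(C) (NSW)" and NA, which is why names with two bracketed parts come back as a single string in the output above.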

24/39

Regular expressions Part 1

A regular expression, or regex, is a string of characters that defines a search pattern for text
Regular expressions are... hard, but they come up often enough that they're worth learning

ozanimals <- c("koala", "kangaroo", "kookaburra", "numbat")

= Basic match

str_detect(ozanimals, "oo")

## [1] FALSE  TRUE  TRUE FALSE

str_extract(ozanimals, "oo")

## [1] NA   "oo" "oo" NA

str_match(ozanimals, "oo")

##      [,1]
## [1,] NA  
## [2,] "oo"
## [3,] "oo"
## [4,] NA
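
The three functions differ mainly in what they return: a logical vector, the matched text, or a matrix with one column per capture group. As a rough sketch of that difference, the pattern below adds a capture group, `(.)oo`, which is my own variation and not from the slides.

```r
library(stringr)

ozanimals <- c("koala", "kangaroo", "kookaburra", "numbat")

# "(.)oo" = any one character (captured in a group) followed by "oo"
has_match <- str_detect(ozanimals, "(.)oo")   # logical: is there a match?
matched <- str_extract(ozanimals, "(.)oo")    # the matched text, or NA
groups <- str_match(ozanimals, "(.)oo")       # matrix: full match, then group 1

has_match
matched
groups
```

With a capture group in the pattern, `str_match()` returns two columns: the whole match ("roo", "koo") and the captured character ("r", "k"). That extra column is what sets it apart from `str_extract()`.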

25/39

Regular expressions Part 2

= Meta-characters

"." a wildcard to match any character except a new line

str_starts(c("color", "colouur", "colour", "red-column"), "col...")

## [1] FALSE  TRUE  TRUE FALSE

"(.|.)" a marked subexpression with alternative possibilities separated by |

str_replace(c("lovelove", "move", "stove", "drove"), "(l|dr|st)o", "ha")

## [1] "havelove" "move"     "have"     "have"

"[...]" matches a single character contained in the bracket

str_replace_all(c("cake", "cookie", "lamington"), "[aeiou]", "_")

## [1] "c_k_"      "c__k__"    "l_m_ngt_n"
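
Because these characters carry special meaning, matching one of them literally requires escaping it with a backslash. A minimal sketch on made-up file names (the `files` vector is hypothetical):

```r
library(stringr)

files <- c("census.csv", "census_csv", "notes.txt", "lga.CSV")

# Unescaped "." matches any character, so "_csv" also counts as a match
loose <- str_detect(files, ".csv")

# "\\." matches a literal dot only
strict <- str_detect(files, "\\.csv")

# A group with alternatives, one of which uses a bracket for "c" or "C"
either <- str_detect(files, "\\.([cC]sv|CSV|txt)")

loose
strict
either
```

Comparing `loose` with `strict` shows the effect of the escape: only the second pattern rejects "census_csv".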

26/39

Regular expressions Part 3

= Meta-character quantifiers

"?" zero or one occurrence of the preceding element

str_extract(c("color", "colouur", "colour", "red"), "colou?r")

## [1] "color"  NA       "colour" NA

"*" zero or more occurrences of the preceding element

str_extract(c("color", "colouur", "colour", "red"), "colou*r")

## [1] "color"   "colouur" "colour"  NA

"+" one or more occurrences of the preceding element

str_extract(c("color", "colouur", "colour", "red"), "colou+r")

## [1] NA        "colouur" "colour"  NA
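
The "preceding element" can be a single character or a whole group in parentheses. The sketch below, on a made-up vector, contrasts the two readings; the examples are my own and only illustrate the quantifiers described above.

```r
library(stringr)

laughs <- c("ha", "hahaha", "haa", "h")

# The quantifier applies to the group "ha" as a unit
group_plus <- str_extract(laughs, "(ha)+")

# Without parentheses the quantifier applies to the "a" only
char_plus <- str_extract(laughs, "ha+")
char_star <- str_extract(laughs, "ha*")
char_opt <- str_extract(laughs, "ha?")

group_plus
char_plus
char_star
char_opt
```

Only `(ha)+` picks up the full "hahaha", while `ha+` stops after the first "ha" because the repetition covers just the letter "a".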

27/39

Regular expressions Part 4

"{n}" preceding element is matched exactly n times

str_replace(c("banana", "bananana", "bana", "banananana"), "ba(na){2}", "-")

## [1] "-"     "-na"   "bana"  "-nana"

"{min,}" preceding element is matched min times or more

str_replace(c("banana", "bananana", "bana", "banananana"), "ba(na){2,}", "-")

## [1] "-"    "-"    "bana" "-"

"{min,max}" preceding element is matched at least min times but no more than max times

str_replace(c("banana", "bananana", "bana", "banananana"), "ba(na){1,2}", "-")

## [1] "-"     "-na"   "-"     "-nana"
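
Counted quantifiers are handy for fixed-width codes. As a sketch under assumed inputs (the `places` vector is made up, and `[0-9]` is the digit class), the following pulls out runs of digits of a given length:

```r
library(stringr)

places <- c("Clayton 3168", "Sydney 2000", "code 12")

# Exactly four digits in a row
four <- str_extract(places, "[0-9]{4}")

# Three or more digits in a row
three_plus <- str_extract(places, "[0-9]{3,}")

# Between one and three digits: greedy, so it takes as many as allowed
up_to_three <- str_extract(places, "[0-9]{1,3}")

four
three_plus
up_to_three
```

Note that `{1,3}` returns "316" for "Clayton 3168": the quantifier caps the match at three digits but does not check what follows it.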

28/39

Regular expressions Part 5

= Character classes

[:alpha:] or [A-Za-z] to match alphabetic characters
[:alnum:] or [A-Za-z0-9] to match alphanumeric characters
[:digit:] or [0-9] or \\d to match a digit
[^0-9] to match non-digits
[a-c] to match a, b or c
[A-Z] to match uppercase letters
[a-z] to match lowercase letters
[:space:] or [ \t\r\n\v\f] to match whitespace characters
and more...
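
A short sketch of a few of these classes in use, on a made-up string. It writes the named classes in the doubled-bracket form (`[[:digit:]]`, `[[:space:]]`), which base R's regex functions also accept; treat the choice of examples as illustrative only.

```r
library(stringr)

unit <- "ETC5512 Week 4"

# Replace every digit with "#"
masked <- str_replace_all(unit, "[[:digit:]]", "#")

# Drop everything that is not a digit
digits_only <- str_replace_all(unit, "[^0-9]", "")

# Runs of one or more uppercase letters
upper_runs <- str_extract_all(unit, "[A-Z]+")[[1]]

# Split on runs of whitespace
words <- str_split(unit, "[[:space:]]+")[[1]]

masked
digits_only
upper_runs
words
```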

29/39

View matches with regular expressions

str_view(c("banana", "bananana", "bana", "banabanana"), "ba(na){1,2}")

banana
bananana
bana
banabanana

str_view_all(c("banana", "bananana", "bana", "banabanana"), "ba(na){1,2}")

banana
bananana
bana
banabanana

When a function in stringr ends with _all, all matches of the pattern are considered
The one without _all only considers the first match
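
The same first-match versus all-matches distinction can be checked without the viewer. A minimal sketch using the extract and replace pairs on the last string from the example:

```r
library(stringr)

x <- "banabanana"

# First match only
first_match <- str_extract(x, "ba(na){1,2}")
first_replaced <- str_replace(x, "ba(na){1,2}", "-")

# Every non-overlapping match
all_matches <- str_extract_all(x, "ba(na){1,2}")[[1]]
all_replaced <- str_replace_all(x, "ba(na){1,2}", "-")

first_match
all_matches
first_replaced
all_replaced
```

The string contains two matches, "bana" followed by "banana", so the `_all` versions return both and replace both, while the plain versions stop after "bana".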

30/39

Back to Extracting the string

str_extract(LGA, "\\(.+\\)")

##   [1] "(C)"        "(S)"        "(R)"        "(S)"        "(R)"       
##   [6] "(S)"        "(DC)"       "(R)"        "(DC)"       "(C)"       
##  [11] "(DC)"       "(S)"        "(S)"        "(S)"        "(DC)"      
##  [16] "(A)"        "(C)"        "(A)"        "(T)"        "(RC)"      
##  [21] "(A)"        "(S)"        "(S)"        "(S)"        "(C)"       
##  [26] "(DC)"       "(R)"        "(A)"        "(C)"        "(DC)"      
##  [31] "(S)"        "(S)"        "(A)"        "(S)"        "(S)"       
##  [36] "(R)"        "(M)"        "(A)"        "(C)"        "(S)"       
##  [41] "(S)"        "(C)"        "(A)"        "(S)"        "(C)"       
##  [46] "(AC)"       "(A)"        "(S)"        "(A)"        "(C)"       
##  [51] "(A)"        "(R)"        "(S)"        "(T)"        "(C)"       
##  [56] "(S)"        "(S)"        "(R)"        "(C)"        "(T)"       
##  [61] "(C)"        "(S)"        "(C)"        "(C)"        "(C)"       
##  [66] "(C)"        "(S)"        "(DC)"       "(DC)"       "(S)"       
##  [71] "(R)"        "(R)"        "(S)"        "(B)"        "(DC)"      
##  [76] "(M)"        "(A)"        "(C)"        "(S)"        "(S)"       
##  [81] "(S)"        "(S)"        "(S)"        "(S)"        "(S)"       
##  [86] "(C)"        "(A)"        "(C)"        "(A)"        "(S)"       
##  [91] "(C)"        "(A)"        "(S)"        "(S)"        "(S)"       
##  [96] "(S)"        "(DC)"       "(S)"        "(S)"        "(S)"       
## [101] "(C)"        "(C)"        "(DC)"       "(S)"        "(S)"       
## [106] "(C)"        "(S)"        "(DC)"       "(C)"        "(C)"       
## [111] "(S)"        "(S)"        "(S)"        "(S)"        "(S)"       
## [116] "(S)"        "(A)"        "(DC)"       "(S)"        "(A)"       
## [121] "(C)"        "(A)"        "(S)"        "(A)"        "(DC)"      
## [126] "(S)"        "(C)"        "(S)"        "(A)"        "(S)"       
## [131] "(M)"        "(S)"        "(DC)"       "(R)"        "(C)"       
## [136] "(C)"        "(S)"        "(C)"        "(S)"        "(T)"       
## [141] "(S)"        "(S)"        "(DC)"       "(S)"        "(T)"       
## [146] "(C)"        "(S)"        "(M)"        "(S)"        "(DC)"      
## [151] "(C)"        "(S)"        "(M)"        "(C)"        "(S)"       
## [156] "(C)"        "(C)"        "(R)"        "(S)"        "(C)"       
## [161] "(C)"        "(R)"        "(S)"        "(C)"        "(A)"       
## [166] "(T)"        "(S)"        "(RC)"       "(C)"        "(A)"       
## [171] "(A)"        "(A)"        "(S)"        "(A)"        "(S)"       
## [176] "(S)"        "(T)"        "(S)"        "(S)"        "(S)"       
## [181] "(A)"        "(DC)"       "(M)"        "(C)"        "(S)"       
## [186] "(A)"        "(T)"        "(A)"        "(C)"        "(S)"       
## [191] "(C)"        "(R)"        "(C)"        "(S)"        "(S)"       
## [196] "(S)"        "(S)"        "(R)"        "(C)"        "(DC)"      
## [201] "(A)"        "(DC)"       "(R)"        "(C)"        "(S)"       
## [206] "(S)"        "(C)"        "(C)"        "(R)"        "(S)"       
## [211] "(S)"        "(C)"        "(A)"        "(S)"        "(S)"       
## [216] "(C)"        "(DC)"       "(S)"        "(M) (Tas.)" "(M) (Tas.)"
## [221] "(C) (Vic.)" "(C) (Vic.)" "(S)"        "(DC)"       "(S)"       
## [226] "(RC)"       "(S)"        "(DC)"       "(S)"        "(S)"       
## [231] "(R)"        "(S)"        "(A)"        "(C)"        "(C)"       
## [236] "(A)"        "(A)"        "(RC)"       "(S)"        "(C)"       
## [241] "(S)"        "(S)"        "(S)"        "(C)"        "(C)"       
## [246] "(S)"        "(C)"        "(C)"        "(C)"        "(A)"       
## [251] "(C)"        "(S)"        "(S)"        "(S)"        "(S)"       
## [256] "(S)"        "(A)"        "(A)"        "(A)"        "(S)"       
## [261] "(A)"        "(A)"        "(S)"        "(S)"        "(C)"       
## [266] "(A)"        "(M)"        "(S)"        "(S)"        "(C)"       
## [271] "(R)"        "(S)"        "(R)"        "(DC)"       "(R)"       
## [276] "(C)"        "(S)"        "(S)"        "(C)"        "(S)"       
## [281] "(A)"        "(R)"        "(DC)"       "(A)"        "(C)"       
## [286] "(A)"        "(S)"        "(S)"        "(A)"        "(C)"       
## [291] "(C)"        "(A)"        "(T)"        "(S)"        "(C)"       
## [296] "(A)"        "(A)"        "(S)"        "(S)"        "(T)"       
## [301] "(C)"        "(A)"        "(A)"        "(DC)"       "(A)"       
## [306] "(C)"        "(M)"        "(M)"        "(S)"        "(A)"       
## [311] "(A)"        "(C)"        "(C)"        "(S)"        "(DC)"      
## [316] "(S)"        "(C)"        "(S)"        "(S)"        "(DC)"      
## [321] "(RegC)"     "(C)"        "(S)"        "(S)"        NA          
## [326] "(A)"        "(S)"        "(A)"        "(S)"        "(A)"       
## [331] "(S)"        "(C)"        "(R)"        "(C)"        "(S)"       
## [336] "(A)"        "(DC)"       "(S)"        "(A)"        "(R)"       
## [341] "(S)"        "(S)"        "(RC)"       "(T)"        "(A)"       
## [346] "(M)"        "(A)"        "(S)"        "(S)"        "(S)"       
## [351] "(S)"        "(A)"        "(RC)"       "(S)"        "(A)"       
## [356] "(R)"        "(S)"        "(S)"        "(C)"        "(S)"       
## [361] "(DC)"       "(M)"        "(M)"        "(AC)"       "(DC)"      
## [366] "(A)"        "(A)"        "(S)"        "(S)"        "(A)"       
## [371] "(C)"        "(S)"        "(S)"        "(C)"        "(R)"       
## [376] "(S)"        "(S)"        NA           "(A)"        "(T)"       
## [381] "(S)"        "(A)"        "(C)"        "(C)"        "(A)"       
## [386] "(C)"        "(DC)"       "(C)"        "(A)"        "(A)"       
## [391] "(A)"        "(S)"        "(DC)"       "(DC)"       "(S)"       
## [396] "(M)"        "(R)"        "(DC)"       "(C)"        "(S)"       
## [401] "(S)"        "(C)"        "(C)"        "(C)"        "(C)"       
## [406] "(C)"        "(S)"        "(A)"        NA           "(S)"       
## [411] "(C)"        "(S)"        "(M)"        "(C)"        "(S)"       
## [416] "(S)"        NA           "(C)"        "(S)"        "(C)"       
## [421] "(DC)"       "(S)"        "(C)"        "(S)"        "(C)"       
## [426] "(M)"        "(A)"        "(A)"        "(A)"        "(S)"       
## [431] "(C)"        "(S)"        "(S)"        "(S)"        "(A)"       
## [436] "(A)"        "(A)"        "(S)"        "(S)"        "(S)"       
## [441] "(C)"        "(S)"        "(C)"        "(C)"        "(C)"       
## [446] "(C) (NSW)"  "(S) (Qld)"  "(R) (Qld)"  "(DC) (SA)"  "(C) (SA)"  
## [451] "(M) (Tas.)" "(M) (Tas.)" "(C)"        "(R)"        "(M)"       
## [456] "(C)"        "(R)"        "(S)"        "(RC)"       "(S)"       
## [461] "(M)"        "(C)"        "(R)"        "(C)"        "(DC)"      
## [466] "(C)"        "(C)"        "(M)"        "(C)"        "(S)"       
## [471] "(C)"        "(DC)"       "(M)"        "(S)"        "(C)"       
## [476] "(C)"        "(A)"        "(DC)"       "(R)"        "(C)"       
## [481] "(C)"        "(A)"        "(M)"        "(C)"        "(C)"       
## [486] "(S)"        "(S)"        "(S)"        "(A)"        "(R)"       
## [491] "(M)"        "(A)"        "(R)"        "(A)"        "(A)"       
## [496] "(R)"        "(R)"        "(R)"        "(S)"        "(C)"       
## [501] "(C)"        "(S)"        "(A)"        "(S)"        "(M)"       
## [506] "(M)"        "(S)"        "(A)"        "(A)"        "(S)"       
## [511] "(A)"        "(C)"        "(DC)"       "(S)"        "(S)"       
## [516] NA           "(A)"        NA           "(R)"        "(C)"       
## [521] "(S)"        "(C)"        "(S)"        "(A)"        "(A)"       
## [526] "(A)"        "(A)"        "(C)"        "(A)"        "(A)"       
## [531] "(A)"        "(A)"        "(C) (NSW)"  "(A)"        "(C)"       
## [536] "(R)"        "(S)"        "(A)"        "(R)"        "(C)"       
## [541] "(A)"        "(S)"        "(A)"        "(A)"

31/39

Back to Extracting the string

str_extract(LGA, "\\(.+\\)") %>% 
  table()

## .
##        (A)       (AC)        (B)        (C)  (C) (NSW)   (C) (SA) (C) (Vic.) 
##        100          2          1        120          2          1          2 
##       (DC)  (DC) (SA)        (M) (M) (Tas.)        (R)  (R) (Qld)       (RC) 
##         40          1         23          4         38          1          7 
##     (RegC)        (S)  (S) (Qld)        (T) 
##          1        182          1         12

Where the same Local Government Area name appears in different States or Territories, the State or Territory abbreviation appears in parenthesis after the name. Local Government Area names are therefore unique.
-Australian Bureau of Statistics

31/39

Retry Extracting the string

str_extract(LGA, "\\([^)]+\\)") %>% 
  table()

## .
##    (A)   (AC)    (B)    (C)   (DC)    (M)    (R)   (RC) (RegC)    (S)    (T) 
##    100      2      1    125     41     27     39      7      1    183     12
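
The difference between the two patterns comes down to greediness: `.+` consumes as much as it can, so it runs through to the last closing bracket, whereas `[^)]+` cannot cross a closing bracket. A minimal sketch on three example names (the short vector `lga_demo` is for illustration and stands in for the full `LGA` vector); the lazy quantifier `.+?` is shown as an alternative that is not used in these slides.

```r
library(stringr)

lga_demo <- c("Albury (C)", "Campbelltown (C) (NSW)", "Unincorporated ACT")

# Greedy: .+ extends to the LAST ")" in the string
greedy <- str_extract(lga_demo, "\\(.+\\)")
greedy
# "(C)" "(C) (NSW)" NA

# Negated class: [^)]+ stops before the FIRST ")"
negated <- str_extract(lga_demo, "\\([^)]+\\)")
negated
# "(C)" "(C)" NA

# Lazy quantifier: .+? matches as little as possible, giving the same result here
lazy <- str_extract(lga_demo, "\\(.+?\\)")
```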

32/39

Retry Extracting the string

str_extract(LGA, "\\([^)]+\\)") %>% 
  # remove the brackets
  str_replace_all("[\\(\\)]", "") %>% 
  table()

## .
##    A   AC    B    C   DC    M    R   RC RegC    S    T 
##  100    2    1  125   41   27   39    7    1  183   12

"[]" matches any single character listed inside the brackets
We want to match ( and ) but these are meta-characters
So we need to escape them to have them as literals: \( and \)
But we must escape the escape character... so it's actually \\( and \\)
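
The double escaping can be checked directly: the R string written with two backslashes contains only one, which is what the regular expression engine receives. A small sketch, where the example name is illustrative; fixed() is shown as an alternative for matching a literal character without any regular expression.

```r
library(stringr)

pattern <- "\\("      # typed with two backslashes in R source

writeLines(pattern)   # prints \(  (the text the regex engine actually sees)
n_chars <- nchar(pattern)
n_chars
# 2

# As a regular expression, \( matches a literal opening bracket
has_bracket <- str_detect("Albury (C)", pattern)

# fixed() treats the pattern as plain text, so no escaping is needed
has_bracket_fixed <- str_detect("Albury (C)", fixed("("))
```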

32/39

R v4.0.0 Extracting the string


str_extract(LGA, r"(\([^)]+\))") %>% 
  # remove the brackets
  str_replace_all(r"([\(\)])", "") %>% 
  table()

## .
##    A   AC    B    C   DC    M    R   RC RegC    S    T 
##  100    2    1  125   41   27   39    7    1  183   12

If using R v4.0.0 onwards, you can use the raw string version instead
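
To confirm that the raw string is only a different way of typing the same pattern, the two versions can be compared. This sketch assumes R 4.0.0 or later and will not parse on earlier versions.

```r
# The escaped string and the raw string hold exactly the same characters
escaped     <- "\\([^)]+\\)"
raw_version <- r"(\([^)]+\))"

same <- identical(escaped, raw_version)
same
# TRUE

# Raw strings also accept [] or {} as delimiters, which can be easier
# to read when the pattern itself is full of round brackets
alt <- r"[\(]"
alt_same <- identical(alt, "\\(")
```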

33/39

Back to Census

34/39

Raw Data vs. Aggregated Data

Although the data were collected from individual households, surveying each person in the household (see sample form here), the downloaded data are aggregated.
Aggregated data present summary statistics from the raw data. When the only summary statistics are counts, the result is generally called frequency data.
The raw data collected would look similar to the following:

household_id	person	gender	age	marital_status	income_per_week
1	John Smith	F	40	Married	400-499
1	Jane Smith	M	39	Married	300-399
1	David Smith	M	10	Never married	Nil
1	Mary Smith	F	8	Never married	Nil
2	John Citizen	M	32	Never married	400-499
2	Jane Citizen	F	33	Never married	1750-1999
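
To make the step from raw data to frequency data concrete, the sketch below enters some columns of the dummy records above as a data frame and collapses them into counts with table(). It illustrates the idea only and is not how the ABS produces its tables.

```r
# Dummy records from the table above (person names omitted)
raw <- data.frame(
  household_id    = c(1, 1, 1, 1, 2, 2),
  gender          = c("F", "M", "M", "F", "M", "F"),
  age             = c(40, 39, 10, 8, 32, 33),
  income_per_week = c("400-499", "300-399", "Nil", "Nil",
                      "400-499", "1750-1999")
)

# Frequency data: one count per income bracket, with the individuals gone
income_freq <- table(raw$income_per_week)
income_freq

# A separate frequency table for gender
gender_freq <- table(raw$gender)
gender_freq
```

Once only `income_freq` and `gender_freq` are published, the link between a person's gender and their income can no longer be recovered.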

35/39

What you lose in aggregate data

For aggregate data, there is less scope to draw insights conditioned on other variables.
E.g. based on frequency data alone, you cannot answer questions like: how many middle-income families have 2 children?
Raw data are desirable if you can get hold of them!
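
A sketch of why the raw records matter, using the dummy data from the previous slide: a question about households with 2 children needs the link between people and households, which a frequency table does not keep. Treating anyone under 18 as a child is an assumption made for this example.

```r
# Household and age columns from the dummy records
raw <- data.frame(
  household_id = c(1, 1, 1, 1, 2, 2),
  age          = c(40, 39, 10, 8, 32, 33)
)

# With raw data: count the children (assumed: age < 18) in each household
children_per_household <- tapply(raw$age < 18, raw$household_id, sum)
n_two_children <- sum(children_per_household == 2)
n_two_children
# 1

# With frequency data only: we know there are 2 children in total,
# but not whether they live in one household or in two
age_freq <- table(child = raw$age < 18)
age_freq
```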

Trust and skepticism

By the way, did you notice anything odd about the dummy data presented in the last slide?
John Smith was recorded as female and Jane Smith as male. Data may have been incorrectly recorded.
How much do you trust the aggregate data?
Have a healthy dose of skepticism about your data.
36/39

Data Confidentiality

The data are not just aggregated but also anonymised.
E.g. in 2016_GCP_Sequential_Template.xlsx, Sheet "G 17a", the footnote says "Please note that there are small random adjustments made to all cell values to protect the confidentiality of data. These adjustments may cause the sum of rows or columns to differ by small amounts from table totals."

Do you think that you'll get the same numbers if you use the data for different geographical codes, e.g. SA1 and STE?

You can check this in the tutorial 🔧
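
As a toy illustration of the footnote, the sketch below adds a small random adjustment to each cell and, separately, to the total. The counts, the adjustment range of -2 to 2 and the helper `perturb()` are all made up for this example; the actual ABS method is not described in these slides.

```r
set.seed(1)

true_counts <- c(12, 7, 30, 5)   # hypothetical counts for four small areas

# Hypothetical adjustment: shift each value by a random integer in -2..2,
# never letting a count drop below zero
perturb <- function(x) {
  pmax(0, x + sample(-2:2, length(x), replace = TRUE))
}

area_cells   <- perturb(true_counts)        # each small area adjusted on its own
region_total <- perturb(sum(true_counts))   # the larger region adjusted on its own

# The adjusted cells need not add up to the adjusted total
sum(area_cells) - region_total
```

Because each value is adjusted independently, summing small areas such as SA1 need not reproduce the figure published for a larger area such as STE.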

37/39

Summary
We went through how to locate and understand the data variables for the personal income data from the 2016 Australian census.
We know some limitations of this data.
We learnt how to manipulate strings and a little about regular expressions.
We learnt about what tidy data is.

38/39

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

39/39
