7 Fundamentals

📖 Status of the book

Hi there! This book is a work-in-progress. You may like to come back later when it’s closer to a complete state. If you would like to raise issues or leave feedback, please feel free to do so.

Chapter 1 describes the basic terminology used in the field of experimental design. We can, however, describe some terms more fundamentally by considering every categorised entity (physical or otherwise) involved in the experiment to be a factor in the design. The two primary roles of a factor are treatment and unit. Under this categorisation, blocks, experimental units and observational units are all just units, and the specific role of a unit is determined implicitly by its relationship with other factors. For example, if a treatment factor is allocated to a plot factor, then the plot is an experimental unit. Another special type of factor is the response or, more broadly, an observational record of a unit; these factors are often, though not always, used in the design of experiments.

Rule 1: Primary roles are encoded as a class

All experimental factors must be encoded as objects of the factor class (e.g. edbl_fct), with an additional class for their primary role: unit (edbl_unit), treatment (edbl_trt), record (edbl_rcrd) or other. Roles that are determined implicitly by the relationship with other factors should not be encoded as a class.
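
As a rough illustration of Rule 1, the Python sketch below keeps the primary role in the type of the object. edibble itself is an R package, and the class names here (EdblFct, EdblUnit, EdblTrt, EdblRcrd) are hypothetical mirrors of the edbl_* labels, not its implementation. Implicit roles such as "experimental unit" deliberately get no class.

```python
class EdblFct:
    """Base class for every experimental factor (hypothetical mirror of edbl_fct)."""

    ROLE = "other"  # factors with no unit/treatment/record role

    def __init__(self, name):
        self.name = name

    @property
    def role(self):
        # The primary role is read from the class, not stored as a free-form string.
        return type(self).ROLE


class EdblUnit(EdblFct):
    ROLE = "unit"


class EdblTrt(EdblFct):
    ROLE = "treatment"


class EdblRcrd(EdblFct):
    ROLE = "record"


# Implicit roles ("experimental unit", "observational unit", "nested unit")
# are NOT classes: they are derived from the links between factors.
plot = EdblUnit("plot")
vaccine = EdblTrt("vaccine")
print(plot.role, vaccine.role)  # unit treatment
```

Because roles live in the class hierarchy, a generic function can accept any EdblFct and still dispatch on the specific role when it needs to.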

In the grammar, a factor is an experimental variable manifested as an object with a dual structure: the “factor graph” and the “level graph”. The factor graph characterizes the holistic attributes of the factor (e.g. the label “vaccine” for the treatment factor). The level graph is the constitution of the discrete factor levels: it captures the idiosyncratic semantic features that distinguish each level from the other levels (e.g. the labels “type A” and “type B” for the two types of vaccine treatment). A factor can be physical (e.g. person, plot, animal and drug), metaphysical (e.g. gender, time and space) or unobserved (e.g. intended response measures).

In the system, every level graph co-exists with a factor graph in a reciprocal relationship, i.e. any change in the factor graph may be reflected as a change in the level graph (and vice versa).

Rule 2: Dual structure of factors

All factors must have a dual structure that describes the factor graph (holistic attributes of the factor) and level graph (constitution of the levels).

Every factor in the system is given an explicit role, stored as a class in the factor graph. The main roles are unit, treatment and record. The relationship between factors assigns an implicit role. For example, a treatment factor linked by a directed edge to a unit factor means that the unit is an experimental unit. The implicit roles are described in Table 7.1.

Table 7.1: Implicit role based on the relationship between factors

A           B      \(A \rightarrow B\) relationship   Implicit role for B
unit        unit   B is nested in A                   Nested unit
treatment   unit   A is applied to B                  Experimental unit
record      unit   A is measured on B                 Observational unit
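
Table 7.1 amounts to a lookup keyed by the primary roles at the two ends of a directed edge A → B. The Python sketch below shows that lookup. The function name implicit_role is hypothetical and not part of edibble. Any pair of roles that is not in the table is treated as having no direct relationship.

```python
# (role of A, role of B) for a directed edge A -> B, mapped to B's implicit role.
IMPLICIT_ROLE = {
    ("unit", "unit"): "nested unit",
    ("treatment", "unit"): "experimental unit",
    ("record", "unit"): "observational unit",
}


def implicit_role(role_a, role_b):
    """Return the implicit role that an edge A -> B assigns to B.

    Raises ValueError for role pairs that are not listed in Table 7.1,
    because such factors cannot be linked directly.
    """
    try:
        return IMPLICIT_ROLE[(role_a, role_b)]
    except KeyError:
        raise ValueError(f"no direct relationship allowed: {role_a} -> {role_b}") from None


print(implicit_role("treatment", "unit"))  # experimental unit
```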

There are two orders of relationships in the system: the higher-order links between the factor nodes in the factor graph and the lower-order links between the level nodes in the level graph. A higher-order link is created only when the user specifies it explicitly. Depending on how that link is defined, the system then creates the lower-order links between the level nodes, subject to constraints prescribed by the factors' explicit roles and other user inputs.

Rule 3: Rules for relationships between factors
  1. A higher-order link between any two factor nodes must be explicitly specified by the user.
  2. A lower-order link between any two level nodes cannot be made if there is no higher-order link between the factor nodes to which the two level nodes belong.
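
The two parts of Rule 3 can be sketched as guards on a toy graph object. In this Python sketch, higher-order links exist only after an explicit call, and a lower-order link is refused unless the owning factors are already linked. The class, its method names and the subject labels S1 to S4 are hypothetical and not edibble's API.

```python
class EdibbleGraphSketch:
    """Toy factor graph plus level graph that enforces Rule 3 (illustration only)."""

    def __init__(self):
        self.levels = {}           # factor name -> list of its level labels
        self.factor_edges = set()  # higher-order links: (factor_a, factor_b)
        self.level_edges = set()   # lower-order links: ((factor_a, level_a), (factor_b, level_b))

    def add_factor(self, name, levels):
        self.levels[name] = list(levels)

    def link_factors(self, a, b):
        # Rule 3.1: a higher-order link exists only if the user asks for it here.
        if a not in self.levels or b not in self.levels:
            raise KeyError("both factors must exist before they can be linked")
        self.factor_edges.add((a, b))

    def link_levels(self, a, level_a, b, level_b):
        # Rule 3.2: a lower-order link needs a higher-order link between the
        # factors that the two levels belong to.
        if (a, b) not in self.factor_edges:
            raise ValueError(f"no higher-order link {a} -> {b}")
        if level_a not in self.levels[a] or level_b not in self.levels[b]:
            raise KeyError("unknown level")
        self.level_edges.add(((a, level_a), (b, level_b)))


g = EdibbleGraphSketch()
g.add_factor("diet", ["Atkins", "Keto", "Vegan", "Paleo"])
g.add_factor("subject", ["S1", "S2", "S3", "S4"])
try:
    g.link_levels("diet", "Keto", "subject", "S1")  # rejected: no factor link yet
except ValueError as err:
    print(err)  # no higher-order link diet -> subject
g.link_factors("diet", "subject")                   # explicit user specification
g.link_levels("diet", "Keto", "subject", "S1")      # now allowed
```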

Figure 7.1: An example of the full schema of an experimental design with 3 factors: diet, subject and day. There are four types of diet (Atkins, Keto, Vegan and Paleo) and 4 different subjects per day over a total of two days.

An intermediate construct of the experimental design is stored as an object that contains two directed graphs, \(G_F = (V(G_F), E(G_F))\) and \(G_L = (V(G_L), E(G_L))\), where \(V(G_F)\) and \(V(G_L)\) are sets of vertices and \(E(G_F)\) and \(E(G_L)\) are sets of edges. We refer to \(G_F\) and \(G_L\) as the factor graph and the level graph, respectively. In the factor graph, every factor is represented as a single vertex. Every factor has a finite number of levels, and each of these levels is represented as a single vertex in the level graph.
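
The pair of graphs can be written down directly as vertex and edge sets. The Python sketch below uses a hypothetical DesignGraphs container, not edibble's own structure. \(V(G_F)\) holds one vertex per factor. \(V(G_L)\) holds one (factor, level) vertex per level, so equal labels in different factors stay distinct. The fertilizer and plot factors are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class DesignGraphs:
    """Hypothetical container for the pair (G_F, G_L); not edibble's own structure."""

    levels: dict                                    # factor name -> tuple of level labels
    factor_edges: set = field(default_factory=set)  # E(G_F): (factor_a, factor_b)
    level_edges: set = field(default_factory=set)   # E(G_L): ((factor_a, level_a), (factor_b, level_b))

    @property
    def factor_vertices(self):
        # V(G_F): exactly one vertex per factor.
        return set(self.levels)

    @property
    def level_vertices(self):
        # V(G_L): one vertex per level, tagged with its factor so that the same
        # label used by two different factors gives two distinct vertices.
        return {(f, lvl) for f, lvls in self.levels.items() for lvl in lvls}


d = DesignGraphs(levels={"fertilizer": ("A", "B"), "plot": (1, 2, 3, 4)})
d.factor_edges.add(("fertilizer", "plot"))
# Alternate A and B across the four plots (a fixed, non-random allocation).
d.level_edges |= {(("fertilizer", "AB"[i % 2]), ("plot", p))
                  for i, p in enumerate(d.levels["plot"])}
print(len(d.factor_vertices), len(d.level_vertices), len(d.level_edges))  # 2 6 4
```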

A factor graph shows a high-level view of the relationships between the factors in the experiment. In the above figure, there are two treatment factors: irrigation and fertilizer. The irrigation treatment is applied to the mainplot and the fertilizer treatment is applied to the subplot. The arrow from the mainplot node to the subplot node implies that the subplot is nested in the mainplot. The shape and color of a node correspond to the class of the factor (e.g. unit, treatment).

The above figure shows the level graph. Nodes of the same color are levels of the same factor (e.g. all yellow nodes correspond to the levels of the unit, subplot). The shape of a node corresponds to the class of its factor.

Rule 4: All factors must be given an identity

All nodes in the factor graph must be explicitly named.

Table 7.2 shows an illustrative experiment that tests the effect of exam time allocation (morning or afternoon) on exam score for two different subjects. In such an experiment, the observational unit may be specified as the combination of Subject-Student, i.e. an observational unit can only be uniquely identified by using information across multiple factors. In the grammar, a factor cannot be implicitly assumed from other factors. This restriction means, for example, that a new factor that uniquely identifies every Subject-Student combination, such as Exam Booklet, must be specified in the system. The restriction is not only for the purpose of the internal graph representation; it also forces the user to confront what the observational units actually are. Naming things is hard, but without naming things it can be hard to create a shared understanding of the experimental structure.

Table 7.2: Exam time allocation for every subject-student combination.

Exam Booklet   Subject   Student   Exam Time   Score
1              Math      1         Morning     58
2              Science   1         Afternoon   90
3              Math      2         Afternoon   39
4              Science   2         Morning     80

Rule 5: All factor levels must map to a single observational unit

A viable experimental design is only specified if the relationship between the factors can be reconciled to a single observational unit.
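
Using the data in Table 7.2, the Python sketch below names the observational unit explicitly. It assigns one Exam Booklet per Subject-Student pair and then checks that each score reconciles to exactly one booklet. This is only one simple reading of Rule 5, written in plain Python rather than with edibble.

```python
# Rows of Table 7.2: (subject, student, exam time, score).
rows = [
    ("Math", 1, "Morning", 58),
    ("Science", 1, "Afternoon", 90),
    ("Math", 2, "Afternoon", 39),
    ("Science", 2, "Morning", 80),
]

# Name the observational unit explicitly: one Exam Booklet per Subject-Student pair,
# numbered in order of first appearance.
booklet_of = {}
for subject, student, _, _ in rows:
    key = (subject, student)
    if key not in booklet_of:
        booklet_of[key] = len(booklet_of) + 1

# Rule 5 check (sketch): every score belongs to exactly one booklet and no
# booklet receives two scores.
score_by_booklet = {}
for subject, student, _, score in rows:
    booklet = booklet_of[(subject, student)]
    if booklet in score_by_booklet:
        raise ValueError(f"booklet {booklet} has more than one score")
    score_by_booklet[booklet] = score

print(booklet_of)
# {('Math', 1): 1, ('Science', 1): 2, ('Math', 2): 3, ('Science', 2): 4}
```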

Rule 6: The unit and treatment mapping
  • In the factor graph, the mapping is from treatment to unit. In other words, each treatment factor can only be applied to a single unit factor.
  • In the level graph, the mapping is from unit to treatment. Each level of a unit factor can only be mapped to a single level of a given treatment factor.

Rule 7: Bijective mapping of unit and record
  • Each record factor is measured on only a single unit factor.
  • Every level of that unit factor has a single measurement.
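
Rules 6 and 7 can be checked mechanically on the edge sets. The hypothetical Python function below returns a list of violations. In the factor graph, it checks that every treatment or record factor points to exactly one unit factor. In the level graph, it checks that no unit level receives more than one level of the same treatment or record factor. As a simplification, it does not check that every unit level is actually measured.

```python
from collections import defaultdict


def check_rules_6_and_7(factor_edges, roles, level_edges):
    """Sketch of the Rule 6-7 checks (hypothetical helper, not edibble's API).

    factor_edges: set of (factor_a, factor_b) links, directed A -> B as in Table 7.1.
    roles: factor name -> "unit" | "treatment" | "record".
    level_edges: set of ((factor_a, level_a), (factor_b, level_b)) links.
    Returns a list of human-readable violations (empty if none were found).
    """
    problems = []
    # Factor graph: each treatment/record factor must point to exactly one unit factor.
    targets = defaultdict(set)
    for a, b in factor_edges:
        if roles[a] in ("treatment", "record") and roles[b] == "unit":
            targets[a].add(b)
    for f, role in roles.items():
        if role in ("treatment", "record") and len(targets[f]) != 1:
            problems.append(f"{role} factor {f!r} is linked to {len(targets[f])} unit factors")
    # Level graph: a unit level may receive at most one level of each source factor.
    received = defaultdict(list)
    for (fa, la), (fb, lb) in level_edges:
        if roles[fa] in ("treatment", "record") and roles[fb] == "unit":
            received[(fb, lb, fa)].append(la)
    for (unit, level, src), got in received.items():
        if len(got) > 1:
            problems.append(f"{unit} level {level!r} has {len(got)} levels of {src!r}")
    return problems


roles = {"plot": "unit", "fert": "treatment", "yield": "record"}
factor_edges = {("fert", "plot"), ("yield", "plot")}
level_edges = {(("fert", "A"), ("plot", 1)), (("fert", "B"), ("plot", 1))}
print(check_rules_6_and_7(factor_edges, roles, level_edges))
# ["plot level 1 has 2 levels of 'fert'"]
```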

7.1 An edibble graph

An edibble graph, or edbl_graph object, is a special type of directed graph. This form is used to represent intermediate constructs of the experimental design.

In a factor graph:

  • a vertex is a variable,
  • an edge is a high-level connection between two variables, and
  • the direction of an edge defines a relationship that depends on which two variables it connects. For two nodes A and B with a directed edge from A to B, the meaning of the relationship follows from Table 7.1. If the combination is not listed in Table 7.1, the nodes cannot have a direct relationship.

As an example, consider a split-plot design that contains 4 mainplots with 2 subplots within each mainplot (so 8 subplots in total). There are 2 treatment factors: fertilizer (with levels A and B) and variety (with levels V1 and V2). Each level of fertilizer is randomly applied to two mainplots. Each level of variety is randomly applied to one subplot within each mainplot. Two responses are planned to be measured on the subplots: yield and height.
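
The factor graph of this split-plot example has six vertices and five directed edges. In the Python sketch below, every edge follows the A → B directions of Table 7.1, and the implicit role of each unit is derived from its incoming edges. Plain dictionaries stand in for an edbl_graph; this is not edibble's representation.

```python
# Primary role of every factor in the split-plot example.
roles = {
    "mainplot": "unit", "subplot": "unit",
    "fertilizer": "treatment", "variety": "treatment",
    "yield": "record", "height": "record",
}

# Directed higher-order links A -> B, following the directions in Table 7.1.
factor_edges = [
    ("mainplot", "subplot"),     # subplot is nested in mainplot
    ("fertilizer", "mainplot"),  # fertilizer is applied to mainplots
    ("variety", "subplot"),      # variety is applied to subplots
    ("yield", "subplot"),        # yield is measured on subplots
    ("height", "subplot"),       # height is measured on subplots
]

IMPLICIT = {
    ("unit", "unit"): "nested unit",
    ("treatment", "unit"): "experimental unit",
    ("record", "unit"): "observational unit",
}

# Collect every implicit role that incoming edges assign to each target factor.
implicit_roles = {}
for a, b in factor_edges:
    implicit_roles.setdefault(b, set()).add(IMPLICIT[(roles[a], roles[b])])

print(sorted(implicit_roles["mainplot"]))  # ['experimental unit']
print(sorted(implicit_roles["subplot"]))
# ['experimental unit', 'nested unit', 'observational unit']
```

The subplot ends up with three implicit roles at once. This is why implicit roles are derived from edges rather than stored as a single class.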

In a level graph:

  • a vertex is a level,
  • an edge is a direct connection between two levels, and
  • the direction of an edge defines the same relationship as in the high-level view, except that an edge connecting two levels of the same unit variable represents the sequence order of those levels.

The whole edibble graph object contains all the nodes and edges from the high- and low-level views. Even when the number of units is small, the whole edibble graph can have so many nodes and edges that its visualisation would be too cluttered to be useful. Consequently, when these intermediate constructs of the experimental design are visualised, only the high- or low-level view is shown to the user, but the object contains the information from both views.
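
To see how quickly the level graph grows, the Python sketch below builds the level-graph edges for the split-plot example with a seeded random allocation. The level labels (M1 to M4, M1S1 and so on) are hypothetical. The sketch also leaves out record levels and sequence-order edges, so a full edibble graph would be larger still.

```python
import random

rng = random.Random(1)  # fixed seed so the allocation is reproducible

mainplots = [f"M{i}" for i in range(1, 5)]
subplots = {m: [f"{m}S{j}" for j in (1, 2)] for m in mainplots}  # 2 subplots each

edges = []
# Nesting: mainplot level -> subplot level.
for m in mainplots:
    for s in subplots[m]:
        edges.append((m, s))

# Fertilizer: each level is randomly applied to two mainplots.
fert = ["A", "A", "B", "B"]
rng.shuffle(fert)
for f, m in zip(fert, mainplots):
    edges.append((f, m))

# Variety: within each mainplot, V1 and V2 go to one subplot each, in random order.
for m in mainplots:
    varieties = ["V1", "V2"]
    rng.shuffle(varieties)
    for v, s in zip(varieties, subplots[m]):
        edges.append((v, s))

nodes = {n for e in edges for n in e}
print(len(nodes), len(edges))  # 16 20
```

Even this reduced version has 16 nodes and 20 edges, compared with 4 nodes and 3 edges among the unit and treatment factors in the factor graph.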

7.2 An edibble table (or data frame)

An edibble, or edbl_table object, is a special class of tibble. The word “edibble” already implies a table, so appending “table” or “data frame” to it may seem superfluous. However, edibble can refer to the package or the object, or be used as an adjective for other objects. Appending “table” or “data frame” therefore makes it explicit that the edbl_table object is meant; otherwise, the reader is expected to infer the meaning from context.

An edbl_table was originally called edbl_df, following the convention of tibble, but I decided to break away from this. The other edibble components are named graph and design, so a two-letter suffix felt too short in contrast.

An edibble data frame is produced when the variables can be laid out in a tidy data format. An edibble can be constructed in two ways:

  1. converting an edibble graph to an edibble using serve_table, and
  2. converting an existing data frame to an edibble using edibble.
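
The idea behind the first route is that the graph is laid out as tidy data, with one row per observational unit and one column per factor. In the R package, this is what serve_table does. The Python sketch below uses a hypothetical serve_rows function that only imitates the layout. It takes a reduced split-plot with two mainplots and leaves record columns empty until they are measured.

```python
def serve_rows(units, parent, allocation, records):
    """Lay out a tiny split-plot as tidy rows, one per observational unit.

    units: observational-unit levels (here subplots).
    parent: subplot -> mainplot (the nesting links).
    allocation: treatment name -> {unit level -> treatment level}; a treatment
        applied to mainplots is looked up through the subplot's parent.
    records: names of record factors; filled with None until measured.
    """
    rows = []
    for s in units:
        m = parent[s]
        row = {"mainplot": m, "subplot": s}
        for trt, alloc in allocation.items():
            # Use the subplot's own allocation if present, else its mainplot's.
            row[trt] = alloc.get(s, alloc.get(m))
        for r in records:
            row[r] = None
        rows.append(row)
    return rows


parent = {"M1S1": "M1", "M1S2": "M1", "M2S1": "M2", "M2S2": "M2"}
allocation = {
    "fertilizer": {"M1": "A", "M2": "B"},
    "variety": {"M1S1": "V2", "M1S2": "V1", "M2S1": "V1", "M2S2": "V2"},
}
for row in serve_rows(list(parent), parent, allocation, ["yield", "height"]):
    print(row)
```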

7.3 Record factor