A factor graph shows a high-order view of the relationships between the factors in the experiment. In the above figure, there are two treatment factors: irrigation and fertilizer. The irrigation treatment is applied to the mainplot and the fertilizer treatment is applied to the subplot. The arrow from the mainplot node to the subplot node indicates that the subplot is nested in the mainplot. The shape and color of each node correspond to the class of the factor (e.g. unit, treatment).
7 Fundamentals
Chapter 1 describes the basic terminology used in the field of experimental design. We can, however, describe some terms more fundamentally by considering every categorised entity (physical or otherwise) involved in the experiment to be a factor in the design. The two primary roles of a factor are treatment and unit. Under this categorisation, blocks, experimental units and observational units are all just units, and the specific role of each unit is determined implicitly by the relationships between the factors. For example, if a treatment factor is allocated to a plot factor, then the plot is an experimental unit. Another special type of factor is the response or, more broadly, an observational record of a unit; these factors are often (although not always) used in the design of experiments.
In the grammar, a factor is an experimental variable manifested as an object with a dual structure: the “factor graph” and the “level graph”. The factor graph characterizes the holistic attributes of the factor (e.g. the label “vaccine” for the treatment factor). The level graph is made up of the discrete factor levels and captures the idiosyncratic semantic features that distinguish each level from the other levels (e.g. the labels “type A” and “type B” for the two types of vaccine treatment). A factor can be physical (e.g. person, plot, animal and drug), metaphysical (e.g. gender, time and space) or unobserved (e.g. intended response measures).
In the system, every level graph co-exists with a factor graph in a reciprocal relationship, i.e. any change in the factor graph may be reflected as a change in the level graph (and vice versa).
Every factor in the system is given an explicit role, stored as a class in the factor graph. The main roles are unit, treatment and record. The relationships between factors assign implicit roles. For example, a treatment linked by a directed edge to a unit means that the unit is an experimental unit. The implicit roles are described in Table 7.1.
A | B | \(A \rightarrow B\) relationship | Implicit role for B |
---|---|---|---|
unit | unit | B is nested in A | Nested unit |
treatment | unit | A is applied to B | Experimental unit |
record | unit | A is measured on B | Observational unit |
There are two orders of relationships in the system: the higher-order links between the factor nodes in the factor graph, and the lower-order links between the level nodes in the level graph. A higher-order link is created through explicit specification by the user. Depending on how that link is defined, the system then creates the lower-order links between the level nodes, subject to constraints prescribed by the factors' explicit roles and other user inputs.
An intermediate construct of the experimental design is stored as an object that contains two directed graphs, \(G_F = (V(G_F), E(G_F))\) and \(G_L = (V(G_L), E(G_L))\), where \(V(G_F)\) and \(V(G_L)\) are sets of vertices and \(E(G_F)\) and \(E(G_L)\) are sets of edges. We refer to \(G_F\) and \(G_L\) as the factor graph and the level graph, respectively. In the factor graph, every factor is represented as a single vertex. Every factor has a finite number of levels, and each of these levels is represented as a single vertex in the level graph.
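To make the dual structure concrete, here is a minimal Python sketch of an object holding \(G_F\) and \(G_L\) as vertex and edge sets. It is not edibble's internal representation; edibble itself is an R package. The class `DesignGraphs` and its methods are hypothetical. One assumption is that each level vertex is a `(factor, level)` pair, so that every level remembers which factor it belongs to. The `remove_factor` method illustrates the reciprocal relationship: a change to the factor graph propagates to the level graph.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DesignGraphs:
    """Hypothetical container for the factor graph G_F and the level graph G_L.

    Each graph is a vertex set V(G) plus a set of directed edges E(G).
    Level vertices are (factor, level) pairs, so each level knows its factor.
    """
    factor_vertices: set[str] = field(default_factory=set)
    factor_edges: set[tuple[str, str]] = field(default_factory=set)
    level_vertices: set[tuple[str, str]] = field(default_factory=set)
    level_edges: set[tuple[tuple[str, str], tuple[str, str]]] = field(default_factory=set)

    def add_factor(self, name: str, levels: list[str]) -> None:
        """Add one vertex to G_F and one vertex per level to G_L."""
        self.factor_vertices.add(name)
        self.level_vertices.update((name, lvl) for lvl in levels)

    def remove_factor(self, name: str) -> None:
        """Remove a factor from G_F together with its levels and edges in G_L."""
        self.factor_vertices.discard(name)
        self.factor_edges = {e for e in self.factor_edges if name not in e}
        self.level_vertices = {v for v in self.level_vertices if v[0] != name}
        self.level_edges = {
            (a, b) for a, b in self.level_edges if a[0] != name and b[0] != name
        }

    def levels_of(self, name: str) -> list[str]:
        """Return the level labels of one factor, sorted for display."""
        return sorted(lvl for factor, lvl in self.level_vertices if factor == name)


g = DesignGraphs()
g.add_factor("vaccine", ["type A", "type B"])
```

With this sketch, `g.levels_of("vaccine")` returns `['type A', 'type B']`, while `g.factor_vertices` holds the single vertex `"vaccine"`.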
The above figure shows the level graph. Nodes of the same color are levels of the same factor (e.g. all yellow nodes correspond to the levels of the unit subplot). The shape of a node corresponds to the class of its factor.
Table 7.2 shows an illustrative experiment that compares exam scores under different exam time allocations for two subjects. In such an experiment, the observational unit may be specified as the Subject-Student combination, i.e. an observational unit can only be uniquely identified by using information across multiple factors. In the grammar, a factor cannot be implicitly assumed from other factors. This restriction means, for example, that a new factor such as Exam Booklet, which uniquely identifies every Subject-Student combination, must be specified in the system. The restriction is not only for the purpose of internal graph representation; it also forces the user to confront what the observational units actually are. Naming things is hard, but without naming things, it can be hard to create a shared understanding of the experimental structure.
Exam Booklet | Subject | Student | Exam Time | Score |
---|---|---|---|---|
1 | Math | 1 | Morning | 58 |
2 | Science | 1 | Afternoon | 90 |
3 | Math | 2 | Afternoon | 39 |
4 | Science | 2 | Morning | 80 |
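The identification argument can be checked directly on the rows of Table 7.2. The following Python sketch tests which columns, alone or combined, take distinct values on every row. It is only an illustration of the reasoning; the helper `identifies_uniquely` is hypothetical and not part of edibble.

```python
# Table 7.2 as rows: (exam booklet, subject, student, exam time, score)
rows = [
    (1, "Math", 1, "Morning", 58),
    (2, "Science", 1, "Afternoon", 90),
    (3, "Math", 2, "Afternoon", 39),
    (4, "Science", 2, "Morning", 80),
]
BOOKLET, SUBJECT, STUDENT = 0, 1, 2  # column positions


def identifies_uniquely(rows, cols):
    """True if the combined values in `cols` differ on every row."""
    keys = [tuple(row[c] for c in cols) for row in rows]
    return len(set(keys)) == len(keys)


# Neither Subject nor Student alone identifies an observational unit...
subject_alone = identifies_uniquely(rows, [SUBJECT])             # False
student_alone = identifies_uniquely(rows, [STUDENT])             # False
# ...but the Subject-Student combination does, and so does Exam Booklet.
subject_student = identifies_uniquely(rows, [SUBJECT, STUDENT])  # True
booklet = identifies_uniquely(rows, [BOOKLET])                   # True

# Exam Booklet is an explicit, named factor standing in for each combination.
booklet_to_pair = {row[BOOKLET]: (row[SUBJECT], row[STUDENT]) for row in rows}
```

Naming the Exam Booklet factor turns the implicit two-column key into a single explicit factor. That single factor is what the grammar requires.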
7.1 An edibble graph
An edibble graph, or `edbl_graph` object, is a special type of directed graph. This form is used to represent intermediate constructs of the experimental design.
In a factor graph:
- a vertex is a variable,
- an edge is a high-level connection between two variables, and
- the direction of an edge defines a relationship that depends on which two variables it connects. Suppose we have two nodes, A and B, with a directed edge from A to B; the meaning of the relationship then follows from Table 7.1. If the combination is not listed in Table 7.1, then the nodes cannot have a direct relationship.
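The rule that an edge's meaning comes from Table 7.1, and that unlisted combinations are disallowed, amounts to a lookup with rejection. The Python sketch below shows this. `IMPLICIT_ROLES` and `implicit_role` are hypothetical names and do not come from edibble's API.

```python
from __future__ import annotations

# Table 7.1: (class of A, class of B) -> implicit role of B for an edge A -> B
IMPLICIT_ROLES = {
    ("unit", "unit"): "Nested unit",
    ("treatment", "unit"): "Experimental unit",
    ("record", "unit"): "Observational unit",
}


def implicit_role(class_a: str, class_b: str) -> str:
    """Return B's implicit role for a directed edge A -> B.

    Combinations missing from Table 7.1 cannot be linked directly,
    so they raise ValueError instead of returning a role.
    """
    role = IMPLICIT_ROLES.get((class_a, class_b))
    if role is None:
        raise ValueError(f"no direct relationship allowed from {class_a} to {class_b}")
    return role
```

For example, `implicit_role("treatment", "unit")` returns `"Experimental unit"`, while `implicit_role("unit", "treatment")` raises `ValueError`.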
As an example, consider a split-plot design that contains 4 main plots with 2 subplots within each main plot (so 8 subplots in total). There are 2 treatment factors: fertilizer (with levels A and B) and variety (with levels V1 and V2). Each level of fertilizer is randomly applied to two main plots. Each level of variety is randomly applied to one subplot within each main plot. Two responses are planned to be measured on the subplots: yield and height.
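The factor graph of this split-plot example can be written down as explicit classes plus directed edges, and the implicit roles then follow from Table 7.1. Below is a hypothetical Python sketch, not edibble's API. The factor names `mainplot` and `subplot` follow the earlier figure.

```python
from __future__ import annotations

# Table 7.1: (class of A, class of B) -> implicit role given to B by an edge A -> B
IMPLICIT_ROLES = {
    ("unit", "unit"): "Nested unit",
    ("treatment", "unit"): "Experimental unit",
    ("record", "unit"): "Observational unit",
}

# Explicit role (class) of every factor in the split-plot example
classes = {
    "mainplot": "unit",
    "subplot": "unit",
    "fertilizer": "treatment",
    "variety": "treatment",
    "yield": "record",
    "height": "record",
}

# High-level (factor graph) edges A -> B
edges = [
    ("mainplot", "subplot"),     # subplot is nested in mainplot
    ("fertilizer", "mainplot"),  # fertilizer is applied to mainplot
    ("variety", "subplot"),      # variety is applied to subplot
    ("yield", "subplot"),        # yield is measured on subplot
    ("height", "subplot"),       # height is measured on subplot
]


def implicit_roles(classes: dict[str, str], edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Collect the implicit roles that the edges assign to each factor."""
    roles: dict[str, set[str]] = {}
    for a, b in edges:
        key = (classes[a], classes[b])
        if key not in IMPLICIT_ROLES:
            raise ValueError(f"{a} -> {b}: no direct relationship allowed")
        roles.setdefault(b, set()).add(IMPLICIT_ROLES[key])
    return roles


roles = implicit_roles(classes, edges)
# roles["mainplot"] == {"Experimental unit"}
# roles["subplot"] == {"Nested unit", "Experimental unit", "Observational unit"}
```

The subplot picks up three implicit roles at once: it is nested, it is an experimental unit for variety, and it is an observational unit for the records. None of these roles has to be declared explicitly.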
In a level graph:
- a vertex is a level,
- an edge is a direct connection between two levels, and
- the direction of an edge defines the same relationship as in the high-level view, except when the edge connects two levels of the same unit variable, in which case it represents the sequence order of the levels.
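A level graph for the split-plot example can be sketched in Python as follows. Level vertices are `(factor, level)` pairs, and the edges cover four things: nesting between levels, sequence order within a unit factor, and the randomised allocation of fertilizer and variety. The function name, the level labels such as `mainplot1`, and the choice to leave out the record factors are all assumptions for illustration; this is not edibble's algorithm.

```python
from __future__ import annotations

import random


def split_plot_level_graph(seed: int = 1):
    """Hypothetical level graph G_L for the split-plot example.

    Vertices are (factor, level) pairs; edges are directed (from, to) pairs.
    Record factors (yield, height) are left out to keep the sketch short.
    """
    rng = random.Random(seed)
    mainplots = [("mainplot", f"mainplot{i}") for i in range(1, 5)]
    subplots = [("subplot", f"subplot{j}") for j in range(1, 9)]
    fertilizer = [("fertilizer", "A"), ("fertilizer", "B")]
    variety = [("variety", "V1"), ("variety", "V2")]
    vertices = set(mainplots + subplots + fertilizer + variety)
    edges = set()

    # Nesting: main plot i -> its two subplots.
    for i, mp in enumerate(mainplots):
        edges.add((mp, subplots[2 * i]))
        edges.add((mp, subplots[2 * i + 1]))

    # Sequence order: consecutive levels of the same unit factor.
    for unit_levels in (mainplots, subplots):
        edges.update(zip(unit_levels, unit_levels[1:]))

    # Fertilizer: each level randomly applied to two main plots.
    fert_alloc = [fertilizer[0], fertilizer[0], fertilizer[1], fertilizer[1]]
    rng.shuffle(fert_alloc)
    for mp, f in zip(mainplots, fert_alloc):
        edges.add((f, mp))

    # Variety: each level randomly applied to one subplot within each main plot.
    for i in range(len(mainplots)):
        pair = [subplots[2 * i], subplots[2 * i + 1]]
        rng.shuffle(pair)
        edges.add((variety[0], pair[0]))
        edges.add((variety[1], pair[1]))

    return vertices, edges


vertices, edges = split_plot_level_graph(seed=1)
```

Even this small design gives 16 level vertices and 30 edges. The edges break down as 8 nesting, 10 sequence, 4 fertilizer and 8 variety, and that is before any record levels are added.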
The whole edibble graph object contains all the nodes and edges from the high- and low-level views. Even when the number of units is small, the whole edibble graph can have so many nodes and edges that visualising it at once would be too cluttered to be useful. Consequently, when these intermediate constructs of the experimental design are visualised, only the high- or the low-level view is shown to the user, but the object contains the information from both views.
7.2 An edibble table (or data frame)
An edibble, or `edbl_table` object, is a special class of `tibble`. The word “edibble” itself already implies a table, so appending “table” or “data frame” to it seems superfluous. However, edibble can refer to the package or an object, or be used as an adjective for other objects. The phrase “edibble table” (or “edibble data frame”) therefore makes explicit that the `edbl_table` object is meant; otherwise the reader is expected to infer the meaning from context.
An `edbl_table` was originally called `edbl_df`, following the convention from tibble. I decided to break away from this because the other edibble components use the full words graph and design, and a two-letter abbreviation felt too short in contrast.
An edibble data frame is produced when the variables can be laid out in a tidy data format. An edibble can be constructed in two ways:
- converting an edibble graph to an edibble using `serve_table`, and
- converting an existing data frame to an edibble using `edibble`.
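The Python sketch below shows one way a level graph could be laid out as a tidy table. This is an assumption about the idea, not how edibble's R function `serve_table` is implemented. The function `serve_table_sketch` is hypothetical. It produces one row per level of the observational unit. The other columns are filled by walking up the nesting and allocation edges while skipping sequence edges, and record columns are left empty until measured. The allocation in the usage code is fixed rather than randomised, purely for illustration.

```python
from __future__ import annotations


def serve_table_sketch(levels, edges, observational_unit, record_names):
    """Hypothetical layout of a level graph as a tidy table (list of dicts).

    levels: dict of factor -> list of level labels (in sequence order)
    edges: set of ((factor, level), (factor, level)) directed edges
    """
    # parents[vertex] -> vertices with an edge pointing into it
    parents = {}
    for a, b in edges:
        parents.setdefault(b, []).append(a)

    def ancestors(vertex):
        # Walk up nesting/allocation edges; skip sequence edges, which
        # connect two levels of the same factor.
        found, stack = {}, [vertex]
        while stack:
            v = stack.pop()
            for p in parents.get(v, []):
                if p[0] != v[0] and p[0] not in found:
                    found[p[0]] = p[1]
                    stack.append(p)
        return found

    rows = []
    for lvl in levels[observational_unit]:
        row = {observational_unit: lvl}
        row.update(ancestors((observational_unit, lvl)))
        row.update({rec: None for rec in record_names})  # to be measured later
        rows.append(row)
    return rows


# Usage: split plot with a fixed (not randomised) allocation, for illustration.
mainplots = [f"mp{i}" for i in range(1, 5)]
subplots = [f"sp{j}" for j in range(1, 9)]
levels = {"mainplot": mainplots, "subplot": subplots}
edges = set()
for i, mp in enumerate(mainplots):
    for sp in subplots[2 * i:2 * i + 2]:
        edges.add((("mainplot", mp), ("subplot", sp)))
    edges.add((("fertilizer", "AB"[i % 2]), ("mainplot", mp)))
for j, sp in enumerate(subplots):
    edges.add((("variety", ("V1", "V2")[j % 2]), ("subplot", sp)))
for name, lv in levels.items():  # sequence edges within each unit factor
    edges.update(((name, a), (name, b)) for a, b in zip(lv, lv[1:]))

table = serve_table_sketch(levels, edges, "subplot", ["yield", "height"])
```

Each of the 8 rows describes one subplot together with its main plot, fertilizer and variety, plus empty `yield` and `height` columns. This matches the tidy layout described above.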