In this example, we want to examine the effect of supplements on weight gain in pigs. We have 82 pigs, belonging to 22 litters.
At the beginning of the study, the pigs are given either 0mg of vitamin E, 100mg of vitamin E, or 200mg of vitamin E. Additionally, the pigs are either give 0mg of copper, 35mg of copper, or 175mg of copper.
Over 12 weeks, the pigs are weighed each week, and the amount that they eat is recorded.
import tisane as ts
import pandas as pd
df = pd.read_csv("pigs.csv")
First, we create an environmental variable using tisane.SetUp
. The concept of time is somewhat outside of any other unit, so we create a variable for it using tisane.SetUp
instead of tisane.Unit
.
There were 12 weeks in the study. The weeks are ordered, so we note the order of weeks order=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
and that cardinality=12
.
week = ts.SetUp("Time", order=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], cardinality=12)
Now, we add our variables for the pigs. Each pig is an individual, and it also has many attributes, so we create a variable pig
using tisane.Unit
.
Additionally, there were 82 pigs in the study, so we have cardinality=82
.
pig = ts.Unit("Pig", cardinality=72) # 72 pigs
Pigs are also grouped into litters, which are another level, so we also create a litter
unit. The 82 pigs were grouped in 22 litters.
litter = ts.Unit("Litter", cardinality=21) # 21 litters
Now we add the attributes that were recorded for each pig. We can attach them to the pig
unit using pig.ordinal
and pig.numeric
.
Ordinals are categorical variables with an ordering to them. Here, there are two ordinals: vitamin_e
and copper
. Each pig got separate amounts of each vitamin and mineral.
# Each pig has 1 instance of an ordinal Evit measure
vitamin_e = pig.ordinal(
"Evit", order=["Evit000", "Evit100", "Evit200"], number_of_instances=1
)
# Each pig has 1 instance of an ordinal Cu measure
copper = pig.ordinal("Cu", order=["Cu000", "Cu035", "Cu175"], number_of_instances=1)
Numerics are either integers or floats, and aren't bounded. Here, the pig's weight
and the amount of feed
it consumes are measured each week.
# Each pig has for each value of week 1 instance of a numeric Weight measure
# Also: Each pig has 1 instance of a Weight measure corresponding to each week
weight = pig.numeric("Weight", number_of_instances=week)
# Each pig has for each value of week 1 instance of a numeric Feed consumption measure
feed = pig.numeric("Feed consumption", number_of_instances=week)
week.causes(weight)
The pigs were also organized into litters, and we might expect that pigs in the same litter, sharing the same parents and genetics, will be more similar. So we add this relationship with nests_within
.
pig.nests_within(litter)
We can construct our study design using tisane.Design
. We want to show that week
, our independent variable predicts weight
.
We also attach the data we collected to the design.
design = ts.Design(dv=weight, ivs=[week]).assign_data(df)
Using the design
specification, tisane.infer_statistical_model_from_design
will find candidate statistical models to choose from. You can make your choices in the Tisane GUI, which will appear in the output of the following cell.
(You can also launch the Tisane GUI in a regular browser window. Setting jupyter=True
makes the GUI appear in the output of the cell.)
ts.infer_statistical_model_from_design(design=design, jupyter=True)
Look through each of the tabs. The tabs allow you to add additional independent variables, if Tisane detected they might be useful, as well as add clustering. On the fourth tab, after you pick the data distribution and link function that best fits your data, you can click on the Generate Code
button, which will output a Python script called model.py
.
The script model.py
should be in this folder! You can import it:
import model
Note that if you re-generate the model in the GUI after you've already imported the model in the above cell, you will need to re-start the kernel by going to the main menu and choosing Kernel > Restart
.
The function model.fit_model
returns the fitted model, and also outputs statistics about the model's fit.
m = model.fit_model()
Formula: Weight~Time+(1|Litter)+(1|Pig)+(1|Time) Family: gaussian Inference: parametric Number of observations: 861 Groups: {'Pig': 72.0, 'Litter': 21.0, 'Time': 12.0} Log-likelihood: -2355.822 AIC: 4711.644 Random effects: Name Var Std Pig (Intercept) 31.315 5.596 Litter (Intercept) 9.528 3.087 Time (Intercept) 1.764 1.328 Residual 9.757 3.124 No random effect correlations specified Fixed effects: Estimate 2.5_ci 97.5_ci SE DF T-stat P-val Sig (Intercept) 15.686 13.195 18.177 1.271 27.926 12.342 0.0 *** Time 6.943 6.718 7.169 0.115 10.013 60.228 0.0 ***
To create a residuals plot, use the function model.show_model_diagnostics
.
model.show_model_diagnostics(m)