In the last tutorial, we discussed how to estimate quantities using experience sampling data that may be sensitive and therefore private. This time we are going to talk about how to do Bayesian inference, that is, how to test hypotheses. We’ll start by considering inference in general and then explore an example using experience sampling data. We’ll finish up by contrasting within and between subject analyses.

Just a reminder that you can access Private at https://private.mall-lab.com/.

There are multiple approaches to inference in the Bayesian world. We are going to focus on a method which conceives of inference as just a form of estimation, so it will look quite like our previous exercises.

In frequentist statistics, a common tool for statistical inference is the t-test. In a t-test, we have two distributions and we are interested in determining whether their means are the same. If the two means were identical and we subtracted one from the other, we would get zero, indicating no difference between the two conditions. Of course, we are unlikely to get exactly zero, because there will be noise in our estimates of both means. The Bayesian approach allows us to quantify our uncertainty about the means in both the experimental condition and the control condition and to build a difference distribution. If the probability of zero in that difference distribution is low, then we can conclude that the distributions do not have identical means.
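To make the logic concrete, here is a toy sketch in Python with NumPy (not Private syntax; the "posterior samples" here are simply simulated numbers for illustration, and all names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are posterior samples of the two condition means
# (in a real analysis they would come from the fitted model).
mu_experimental = rng.normal(5.0, 0.5, size=100_000)
mu_control = rng.normal(3.0, 0.5, size=100_000)

# The difference distribution: one sample-wise subtraction.
diff = mu_experimental - mu_control

# If zero is far out in the tails, the means are credibly different.
lower, upper = np.percentile(diff, [2.5, 97.5])
print(lower, upper)  # an interval centered near 2.0 that excludes zero
```

The key move is that once you have samples from each posterior, the difference distribution costs nothing extra: it is just an element-wise subtraction.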

**Inference with coins**

To see how we can form inferences this way, let’s return to our coin example. Suppose we have two coins which may or may not be biased to the same extent. To figure out whether they are the same, we flip each of them 100 times with the following results:

coin1 = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
coin2 = [1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1]

Navigate to the Private window and enter the data (you may copy and paste the data straight into the Private window). Then set up models of each of the coins, by typing the following:

coin1 ~ Bernoulli(r1)
coin2 ~ Bernoulli(r2)
r1 ~ Uniform(0, 1)
r2 ~ Uniform(0, 1)

r1 and r2 are the rates that we have estimated for coin1 and coin2, respectively. To determine if they are the same we can create a variable which is the difference of the r1 and r2 samples:

diff = r1 - r2

To plot the difference between the two rates:

diffplot = distplot(diff)

Your plot should look like this:

Notice that the plot does not overlap with zero, suggesting that the two rates are not the same. The diff values are negative, suggesting that the rate of coin1 is lower than the rate of coin2. We can confirm this directly by calculating the mean of r1 (0.28) and the mean of r2 (0.61).

*Exercise 1: Suppose we have two more coins with data as follows:*

coin3 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
coin4 = [0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0]

*Compare each of these two new coins against coin1. Which of these coins can you say with confidence is different from coin1?*

In the previous exercise, we eyeballed the plot of diff to determine whether zero was in the distribution. To make this exercise more precise, we can calculate the **credible interval** (actually the highest probability density interval, but these terms are often used interchangeably) of diff and indicate whether zero falls within it. The 95% credible interval is the region between the 2.5th percentile and the 97.5th percentile of the distribution. In Private, we can calculate the credible interval of the difference between the rate of coin1 and the rate of coin2 using the percentile function:

percentile(diff, 2.5)
-0.4452904244825719
percentile(diff, 97.5)
-0.1974235531125321

Now we can say that zero does not fall within the credible interval of the difference of the means and therefore reject the hypothesis that the distributions are the same.
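Private handles the sampling for us, but because a Uniform(0, 1) prior on a Bernoulli rate yields a Beta posterior, the whole pipeline can be sketched outside Private with plain NumPy. This is a minimal analogue, not Private's actual implementation; the head counts (28 for coin1, 61 for coin2) come from summing the data lists above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                      # flips per coin
heads1, heads2 = 28, 61      # sum(coin1), sum(coin2)

# Uniform(0, 1) prior + Bernoulli likelihood => Beta(heads + 1, tails + 1) posterior.
r1 = rng.beta(heads1 + 1, n - heads1 + 1, size=100_000)
r2 = rng.beta(heads2 + 1, n - heads2 + 1, size=100_000)

# Difference distribution and its 95% credible interval.
diff = r1 - r2
lower, upper = np.percentile(diff, [2.5, 97.5])
print(lower, upper)  # roughly (-0.45, -0.20), in line with the Private output
```

Because the whole interval sits below zero, this sketch reaches the same conclusion: the two coins have different rates.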

*Exercise 2: Calculate the credible intervals for the differences between coin1 and coin3 and coin1 and coin4. Does zero fall in these intervals?*

**Inferences on heights**

Now that we have mastered how to do inference on rate variables, let’s try an example with Normal variables.

*Exercise 3: Are men or women taller? Using the data below of men's and women's heights, calculate the credible interval of the difference-of-means variable. Were you able to conclude that they are different?*

maleheights = [176, 181, 164, 176, 176, 181, 180, 191, 152, 182, 169, 188, 182, 182, 190, 180, 189, 169, 190, 194, 173, 179, 155, 183, 191, 186, 178, 174, 179, 182, 188, 169, 186, 169, 182, 173, 171, 172, 174, 169, 179, 170, 174, 184, 201, 180, 183, 184, 196, 180, 168, 194, 174, 161, 189, 174, 186, 168, 176, 193, 177, 176, 187, 170, 175, 181, 168, 179, 159, 168, 186, 182, 162, 177, 176, 187, 172, 178, 178, 197, 175, 170, 170, 181, 186, 207, 187, 175, 163, 174, 169, 173, 170, 169, 164, 178, 201, 178, 198, 164]
femaleheights = [161, 152, 163, 168, 166, 155, 166, 185, 161, 173, 161, 164, 184, 158, 138, 145, 160, 164, 173, 176, 182, 174, 178, 163, 153, 156, 165, 166, 158, 173, 147, 150, 171, 155, 167, 167, 161, 183, 156, 154, 162, 165, 169, 165, 171, 151, 176, 169, 179, 167, 165, 169, 162, 177, 169, 163, 154, 170, 199, 153, 169, 154, 156, 169, 162, 172, 153, 178, 168, 156, 169, 146, 162, 163, 167, 175, 174, 174, 161, 160, 172, 164, 184, 167, 166, 173, 159, 172, 153, 159, 152, 163, 143, 157, 154, 155, 172, 164, 154, 168]

*Exercise 4: If you only have the first three male heights and the first three female heights, can you conclude that males are taller than females? Explain how decreasing the amount of data changes the result.*

**Inferences on temperatures from experience sampling data**

The principles for doing inference with our experience sampling data are just the same. Let’s try making inferences about our fake experience sampling data.

*Exercise 5: Using DemoEvents, calculate the credible interval of the difference between winter and summer temperatures. Can you conclude that winter is cooler than summer? Hint: You can extract winter temperatures as follows:*

[e.Temperature for e in DemoEvents if "Winter" in e.Keywords and e.hasField("Temperature")]

**Inferences about anxious dogs using a repeated measures design**

The analyses that we have considered so far have been between subjects. That is, we have had two independent samples to compare. When data involve measurements taken from the same individual in different conditions, we can improve our ability to draw conclusions by employing a repeated measures analysis.

To illustrate, let's talk about anxious dogs. In particular, we are going to consider data collected by Beata, Beaumont-Graff, Diaz, Marion, Massal, Marlois, Muller and Lefranc (2007) about the effects of administering Zylkene on dogs' anxiety levels. On the anxiety scale they used, low numbers correspond to lower anxiety levels. They administered the drug on five occasions, but we will just consider the anxiety levels at time points one and five. The data are as follows:

before = array([20, 22, 28, 27, 22, 28, 20, 24, 28, 29, 26, 25, 23, 23, 23, 21, 22, 27, 22])
after = array([20, 14, 20, 24, 19, 21, 22, 20, 14, 19, 15, 18, 16, 12, 12, 15, 11, 13, 11])

Instead of estimating means for each condition and then taking the difference, we are going to take the difference between the data values and then estimate the distribution as follows:

diff = before - after

Now we are going to model diff as a Normal distribution with a mean mu and a standard deviation sigma. For the prior on mu we will use a Normal with mean 0 and standard deviation 20, and for sigma a HalfNormal with standard deviation 20. We will then calculate the lower and upper bounds of the credible interval of mu and decide whether the treatment is having an effect.

diff ~ Normal(mu, sigma)
mu ~ Normal(0, 20)
sigma ~ HalfNormal(20)
CILower = percentile(mu, 2.5)
CIUpper = percentile(mu, 97.5)

The estimate of CILower is 5.322, which is above 0, so we conclude that the treatment is having an effect.
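Outside Private, this posterior can be approximated numerically. The sketch below (plain NumPy, using the same priors as above, but a brute-force grid approximation rather than Private's sampler) evaluates the posterior over a grid of mu and sigma values, marginalizes out sigma, and reads off the 95% credible interval for mu:

```python
import numpy as np

before = np.array([20, 22, 28, 27, 22, 28, 20, 24, 28, 29, 26, 25, 23, 23, 23, 21, 22, 27, 22])
after = np.array([20, 14, 20, 24, 19, 21, 22, 20, 14, 19, 15, 18, 16, 12, 12, 15, 11, 13, 11])
d = before - after  # one difference score per dog

# Grid over the parameters of diff ~ Normal(mu, sigma).
mus = np.linspace(-10.0, 25.0, 701)
sigmas = np.linspace(0.1, 20.0, 400)
M, S = np.meshgrid(mus, sigmas, indexing="ij")

# Log-likelihood of the difference scores, plus log-priors
# mu ~ Normal(0, 20) and sigma ~ HalfNormal(20), dropping constants.
loglik = (-len(d) * np.log(S)
          - ((d[None, None, :] - M[:, :, None]) ** 2).sum(axis=-1) / (2 * S**2))
logpost = loglik - M**2 / (2 * 20**2) - S**2 / (2 * 20**2)

# Marginalize out sigma, normalize, and find the 2.5th/97.5th percentiles of mu.
post_mu = np.exp(logpost - logpost.max()).sum(axis=1)
cdf = np.cumsum(post_mu) / post_mu.sum()
ci_lower = mus[np.searchsorted(cdf, 0.025)]
ci_upper = mus[np.searchsorted(cdf, 0.975)]
print(ci_lower, ci_upper)  # the lower bound is well above zero
```

Grid approximation only works because this model has just two parameters; sampling (as Private does) scales to much larger models.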

*Exercise 6 – Repeat the previous analysis assuming that we are talking about two different sets of dogs (i.e., it is a between-subjects analysis). How does this impact your confidence in your response?*

**Food for thought**

The standard approach to inference in the psychological sciences is to compare means as we have above. The inferences that we make, however, are quite often about individuals, not means of groups. For instance, if we were to run an experiment testing memory for long and short lists of words, we would typically test the means. If we found that the mean for short lists was reliably higher than the mean for long lists, we might conclude that people are better at short lists than long lists, by which we would infer that all people are better at short lists than long lists. But this may not be the case. The mean will be different provided a consistent proportion of people are different, but that proportion can be arbitrarily small. There is an argument that in many circumstances we ought to be testing what proportion of people are different rather than whether the means are different (see Dennis, Lee and Kinnell, 2008).

*Challenge Exercise: Define a model that estimates the proportion of dogs who showed an effect of the anxiety treatment (hint: you will need to use a Bernoulli variable, and you may find more hints in the reading above).*

**Summary**

In this tutorial, we have focused our attention on drawing inferences from data including experience sampling data. While there are many ways that Bayesians do hypothesis testing, thinking of inference as another kind of estimation makes results more robust and more intuitive.

**References**

Beata, C., Beaumont-Graff, E., Diaz, C., Marion, M., Massal, N., Marlois, N., Muller, G., & Lefranc, C. (2007). Effects of alpha-casozepine (Zylkene) versus selegiline hydrochloride (Selgian, Anipryl) on anxiety disorders in dogs. *Journal of Veterinary Behavior, 2*, 175-183.

Dennis, S., Lee, M. D., & Kinnell, A. (2008). Bayesian analysis of recognition memory: The case of the list-length effect. *Journal of Memory and Language, 59*(3), 361-376.

**Solutions to Exercises**

*Exercise 1: Suppose we have two more coins with data as follows:*

coin3 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
coin4 = [0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0]

*Compare each of these two new coins against coin1. Which of these coins can you say with confidence is different from coin1?*

coin1 ~ Bernoulli(r1)
coin3 ~ Bernoulli(r3)
coin4 ~ Bernoulli(r4)
r1 ~ Uniform(0, 1)
r3 ~ Uniform(0, 1)
r4 ~ Uniform(0, 1)
diff13 = r1 - r3
diff14 = r1 - r4
plot13 = distplot(diff13)
plot14 = distplot(diff14)

We can say with confidence that coin4 is different from coin1 (right image), as 0 does not lie within the bulk of the difference distribution of these two coins. We cannot say with confidence that coin3 is different from coin1 (left image), as 0 does lie within that difference distribution.

*Exercise 2: Calculate the credible intervals for the differences between coin1 and coin3 and coin1 and coin4. Does zero fall in these intervals?*

Using the variables created in Exercise 1:

percentile(diff13, 2.5)
-0.13354425865397213
percentile(diff13, 97.5)
0.10459568268559756

Yes, 0 falls within the 95% credible interval of diff13.

percentile(diff14, 2.5)
-0.3107069093813402
percentile(diff14, 97.5)
-0.0560365592397889

No, 0 does not fall within the 95% credible interval of diff14.
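As a cross-check outside Private, the same intervals can be approximated with NumPy via the Beta-posterior shortcut (a Uniform prior on a Bernoulli rate gives a Beta posterior). This is a sketch, not Private's implementation; the head counts below come from summing the data lists (28, 29, and 47 heads for coin1, coin3, and coin4 respectively):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
heads = {"coin1": 28, "coin3": 29, "coin4": 47}  # sums of the data lists

# Uniform prior + Bernoulli likelihood => Beta(heads + 1, tails + 1) posterior.
samples = {name: rng.beta(h + 1, n - h + 1, size=100_000)
           for name, h in heads.items()}

diff13 = samples["coin1"] - samples["coin3"]
diff14 = samples["coin1"] - samples["coin4"]

print(np.percentile(diff13, [2.5, 97.5]))  # interval straddles zero
print(np.percentile(diff14, [2.5, 97.5]))  # interval entirely below zero
```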

*Exercise 3: Are men or women taller? Using the data below of men's and women's heights, calculate the credible interval of the difference-of-means variable. Were you able to conclude that they are different?*

maleheights = [176, 181, 164, 176, 176, 181, 180, 191, 152, 182, 169, 188, 182, 182, 190, 180, 189, 169, 190, 194, 173, 179, 155, 183, 191, 186, 178, 174, 179, 182, 188, 169, 186, 169, 182, 173, 171, 172, 174, 169, 179, 170, 174, 184, 201, 180, 183, 184, 196, 180, 168, 194, 174, 161, 189, 174, 186, 168, 176, 193, 177, 176, 187, 170, 175, 181, 168, 179, 159, 168, 186, 182, 162, 177, 176, 187, 172, 178, 178, 197, 175, 170, 170, 181, 186, 207, 187, 175, 163, 174, 169, 173, 170, 169, 164, 178, 201, 178, 198, 164]
femaleheights = [161, 152, 163, 168, 166, 155, 166, 185, 161, 173, 161, 164, 184, 158, 138, 145, 160, 164, 173, 176, 182, 174, 178, 163, 153, 156, 165, 166, 158, 173, 147, 150, 171, 155, 167, 167, 161, 183, 156, 154, 162, 165, 169, 165, 171, 151, 176, 169, 179, 167, 165, 169, 162, 177, 169, 163, 154, 170, 199, 153, 169, 154, 156, 169, 162, 172, 153, 178, 168, 156, 169, 146, 162, 163, 167, 175, 174, 174, 161, 160, 172, 164, 184, 167, 166, 173, 159, 172, 153, 159, 152, 163, 143, 157, 154, 155, 172, 164, 154, 168]

maleheights ~ Normal(muMale, sdMale)
muMale ~ HalfNormal(100)
sdMale ~ HalfNormal(100)
femaleheights ~ Normal(muFemale, sdFemale)
muFemale ~ HalfNormal(100)
sdFemale ~ HalfNormal(100)
diffheights = muMale - muFemale
CILower = percentile(diffheights, 2.5)
CIUpper = percentile(diffheights, 97.5)

CILower
10.88390868781887
CIUpper
16.76501134944898

As 0 doesn’t lie within the credible interval, we conclude that men are taller than women.
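The same conclusion can be reached outside Private with a quick approximation. Under weak priors and this much data, the posterior of each mean is approximately Normal with the sample mean as its center and the sample standard deviation divided by sqrt(n) as its spread. The sketch below (NumPy, a large-sample approximation rather than Private's actual sampler) builds the difference distribution from that:

```python
import numpy as np

maleheights = np.array([176, 181, 164, 176, 176, 181, 180, 191, 152, 182, 169, 188, 182, 182,
                        190, 180, 189, 169, 190, 194, 173, 179, 155, 183, 191, 186, 178, 174,
                        179, 182, 188, 169, 186, 169, 182, 173, 171, 172, 174, 169, 179, 170,
                        174, 184, 201, 180, 183, 184, 196, 180, 168, 194, 174, 161, 189, 174,
                        186, 168, 176, 193, 177, 176, 187, 170, 175, 181, 168, 179, 159, 168,
                        186, 182, 162, 177, 176, 187, 172, 178, 178, 197, 175, 170, 170, 181,
                        186, 207, 187, 175, 163, 174, 169, 173, 170, 169, 164, 178, 201, 178,
                        198, 164])
femaleheights = np.array([161, 152, 163, 168, 166, 155, 166, 185, 161, 173, 161, 164, 184, 158,
                          138, 145, 160, 164, 173, 176, 182, 174, 178, 163, 153, 156, 165, 166,
                          158, 173, 147, 150, 171, 155, 167, 167, 161, 183, 156, 154, 162, 165,
                          169, 165, 171, 151, 176, 169, 179, 167, 165, 169, 162, 177, 169, 163,
                          154, 170, 199, 153, 169, 154, 156, 169, 162, 172, 153, 178, 168, 156,
                          169, 146, 162, 163, 167, 175, 174, 174, 161, 160, 172, 164, 184, 167,
                          166, 173, 159, 172, 153, 159, 152, 163, 143, 157, 154, 155, 172, 164,
                          154, 168])

rng = np.random.default_rng(2)

def approx_posterior_mean(x, size=100_000):
    # Large-sample Normal approximation to the posterior of a mean
    # under a weak prior: Normal(sample mean, sample SD / sqrt(n)).
    return rng.normal(x.mean(), x.std(ddof=1) / np.sqrt(len(x)), size=size)

diffheights = approx_posterior_mean(maleheights) - approx_posterior_mean(femaleheights)
lower, upper = np.percentile(diffheights, [2.5, 97.5])
print(lower, upper)  # roughly (11, 17), in line with the Private interval
```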

*Exercise 4: If you only have the first three male heights and the first three female heights, can you conclude that males are taller than females? Explain how decreasing the amount of data changes the result.*

maleheights3 = [176, 181, 164]
femaleheights3 = [161, 152, 163]
maleheights3 ~ Normal(muMale3, sdMale3)
muMale3 ~ HalfNormal(100)
sdMale3 ~ HalfNormal(100)
femaleheights3 ~ Normal(muFemale3, sdFemale3)
muFemale3 ~ HalfNormal(100)
sdFemale3 ~ HalfNormal(100)
diffheights = muMale3 - muFemale3
CILower = percentile(diffheights, 2.5)
CIUpper = percentile(diffheights, 97.5)
CILower
-52.79999819224508
CIUpper
79.67569036106708

With only three observations per condition we have much less precise estimates of the mean heights and therefore cannot conclude that males are taller than females.

*Exercise 5: Using DemoEvents, calculate the credible interval of the difference between winter and summer temperatures. Can you conclude that winter is cooler than summer? Hint: You can extract winter temperatures as follows:*

[e.Temperature for e in DemoEvents if "Winter" in e.Keywords and e.hasField("Temperature")]

winter = [e.Temperature for e in DemoEvents if "Winter" in e.Keywords and e.hasField("Temperature")]
summer = [e.Temperature for e in DemoEvents if "Summer" in e.Keywords and e.hasField("Temperature")]
winter ~ Normal(muWinter, sigmaWinter)
summer ~ Normal(muSummer, sigmaSummer)
muWinter ~ Normal(0, 100)
muSummer ~ Normal(0, 100)
sigmaWinter ~ HalfNormal(100)
sigmaSummer ~ HalfNormal(100)
diff = muSummer - muWinter
CILower = percentile(diff, 2.5)
CIUpper = percentile(diff, 97.5)

If we ask Private for the values of CILower and CIUpper, we get:

CILower
12.12602467197314
CIUpper
13.729209350524874

As 0 does not lie between CILower and CIUpper, we are able to say with confidence that the two means are different. Note that we estimated separate sigmas for summer and winter; if we had reason to believe that the variability of summer temperatures was the same as the variability of winter temperatures, we could have used a single sigma instead.

*Exercise 6 – Repeat the analysis of anxious dogs assuming that we are talking about two different sets of dogs (i.e., it is a between-subjects analysis). How does this impact your confidence in your response?*

before = array([20, 22, 28, 27, 22, 28, 20, 24, 28, 29, 26, 25, 23, 23, 23, 21, 22, 27, 22])
after = array([20, 14, 20, 24, 19, 21, 22, 20, 14, 19, 15, 18, 16, 12, 12, 15, 11, 13, 11])
before ~ Normal(muBefore, sigma)
after ~ Normal(muAfter, sigma)
muBefore ~ Normal(0, 100)
muAfter ~ Normal(0, 100)
sigma ~ HalfNormal(100)
diff = muBefore - muAfter
CILower = percentile(diff, 2.5)
CIUpper = percentile(diff, 97.5)

CILower
5.244105145990384
CIUpper
9.946180129536534

Notice that CILower is smaller now than it was. We have switched from a within-subjects analysis to a between-subjects analysis, so we are less certain about the result. Within-subjects designs eliminate between-subjects variability and are therefore more powerful.
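The power difference can be seen directly in the data. A quick NumPy check (a rough illustration based on standard errors, not Private code) compares the standard error of the paired difference scores with the standard error of the difference of two independent means:

```python
import numpy as np

before = np.array([20, 22, 28, 27, 22, 28, 20, 24, 28, 29, 26, 25, 23, 23, 23, 21, 22, 27, 22])
after = np.array([20, 14, 20, 24, 19, 21, 22, 20, 14, 19, 15, 18, 16, 12, 12, 15, 11, 13, 11])
n = len(before)

# Within-subjects: uncertainty comes from the spread of the difference scores,
# which removes the stable dog-to-dog differences shared by both measurements.
se_within = (before - after).std(ddof=1) / np.sqrt(n)

# Between-subjects: the two group variances simply add.
se_between = np.sqrt(before.var(ddof=1) / n + after.var(ddof=1) / n)

print(se_within, se_between)  # the within-subjects standard error is smaller
```

The within-subjects standard error is smaller because the before and after scores are positively correlated across dogs, and pairing subtracts that shared variability out.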