An Introduction to the Private Language

In this post, I am going to provide a tutorial-style introduction to the Private programming language. Private is a privacy-preserving probabilistic scripting language. This is the first in a series of three tutorials covering different aspects of the language. I’ll link to the others here when they are complete.

The tutorial assumes you have a basic understanding of Bayes’ theorem and Bayesian data analysis. If not, it would be good to take a look at some of these resources before attempting the tutorial:

If you want to try the examples out for yourself, you can access the Private language at https://private.mall-lab.com.

Estimating the bias of a coin

We are going to begin our exploration of the Private language by considering how one would go about estimating the degree of bias in a coin. If a coin is fair, it has a 0.5 probability of landing on heads and a 0.5 probability of landing on tails. However, it is also possible that we have been handed a ‘trick coin’ which is weighted to favour either heads or tails. In this exercise, we are going to be estimating the probability (or rate) of our weighted coin landing on heads.

We’ll start by entering our data. Let’s assume that we flipped our coin 10 times and saw the following pattern:

Head, Tail, Head, Tail, Tail, Tail, Tail, Head, Tail, Tail

We are going to code the data using a one for the heads and a zero for the tails. To enter the data into Private we type:

flips = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]

So, we have the data (3 heads and 7 tails) that we are going to model. Now we want to define the Bayesian model that we will use to estimate the rate parameter. 

To define a Bayesian model, we need to specify the likelihood of the data given the parameters and the priors of the parameters. We are going to assume that the model that generated the data is a Bernoulli distribution. The Bernoulli distribution describes a random variable that has two possible outcomes given a parameter that determines the probability of each. So, we can go ahead and define this as follows:

flips ~ Bernoulli(r)

where r is the unknown rate variable. Note that we have used the “~” character here rather than “=” because we are now providing a probabilistic definition of the variable, rather than setting its value. We are saying “we believe flips is a Bernoulli random variable”.
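To build intuition for what the likelihood means, here is a small sketch in Python (not Private code; the function name is just for illustration) that evaluates the Bernoulli likelihood of our ten flips at a few candidate rates. It is highest near the observed proportion of heads.

```python
# Bernoulli likelihood of a data set at a candidate rate r:
# P(data | r) = r^heads * (1 - r)^tails
flips = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]

def likelihood(r, data):
    heads = sum(data)
    tails = len(data) - heads
    return r ** heads * (1 - r) ** tails

# The likelihood peaks near the observed proportion of heads (3/10 = 0.3).
for r in (0.1, 0.3, 0.5, 0.7):
    print(r, likelihood(r, flips))
```

Bayesian inference combines this likelihood with the prior to produce the posterior samples of r that Private generates.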

Before we can estimate r, we also need to specify the prior. We know that r is a probability, so it has to lie between 0 and 1, but we don’t know anything else about it, so we will define it as a uniform distribution as follows:

r ~ Uniform(0,1)

Notice that the Uniform distribution takes the lower and upper bounds as input parameters, in this case 0 and 1.

To see all the variables, type:

sv

sv stands for “show variables.” 

flips = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0] [1, 0, 1, 0, 0, 0, ...]
flips ~ Bernoulli(r)                   [1, 0, 1, 0, 0, 0, ...]
r ~ Uniform(0, 1)                      Computing

You can now see everything you have defined so far. On the left-hand side is the code that you typed in, and on the right-hand side are the values of the variables. When interacting with the shells of languages like Python or R, the values of the variables are retained, but the code that generated those values is discarded (which is why a separate code editor is necessary). In Private, the code is retained so that it can be automatically rerun in the future if necessary. In the first line, the list of values appears on the left-hand side because it was included in the code, and it is repeated on the right-hand side because that is the value of the variable flips.

Note r is now ‘Computing’ because all definitions have been provided, but the computer is still generating posterior samples of the random variable r. This means that the computer is generating many estimates of the true rate, given what it knows from the data, the likelihood, and the prior. There is some randomness in the generation of these estimates, but more likely estimates will be generated more often.

While we wait for the computer to finish, we can define some other variables that will be useful. Let’s compute the mean and the standard deviation of r as follows:

meanr = mean(r)
stdr = std(r)

Now type sv. If Private has finished estimating r you’ll see the following:

flips = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]  [1, 0, 1, 0, 0, 0, …]
meanr = mean(r)                         0.334    
stdr = std(r)                           0.130    
flips ~ Bernoulli(r)                    [1, 0, 1, 0, 0, 0, …]    
r ~ Uniform(0, 1)                       [0.256 … 0.350]        

Inside the brackets next to r are the samples (i.e. estimates) that have been generated. The display is truncated to save space, so only the first and last values are shown. If you want to see all of the samples, type the name of the variable (r):

[0.256 0.376 0.434 0.508 0.279 0.345 0.149 0.248 0.328 0.093 0.096 0.46
     0.531 0.281 0.453 0.3   0.287 0.526 0.458 0.364 0.264 0.445 0.275 0.343
     0.401 0.128 0.272 0.314 0.338 0.15  0.182 0.574 0.568 0.291 0.27  0.107
     0.302 0.474 0.224 0.565 0.345 0.42  0.646 0.209 0.504 0.226 0.444 0.357
     0.514 0.117 0.204 0.281 0.438 0.352 0.441 0.408 0.49  0.397 0.243 0.108
     0.271 0.255 0.294 0.51  0.194 0.475 0.449 0.224 0.551 0.235 0.42  0.385
     0.484 0.259 0.191 0.343 0.127 0.106 0.306 0.319 0.353 0.479 0.061 0.275
     0.29  0.19  0.173 0.422 0.369 0.389 0.263 0.218 0.176 0.357 0.407 0.378
     0.349 0.206 0.34  0.342 0.264 0.328 0.276 0.205 0.508 0.318 0.058 0.306
     0.296 0.329 0.37  0.256 0.374 0.248 0.27  0.251 0.375 0.332 0.376 0.326
     0.311 0.486 0.334 0.425 0.26  0.365 0.168 0.242 0.457 0.525 0.339 0.392
     0.56  0.461 0.462 0.155 0.204 0.443 0.251 0.204 0.209 0.175 0.415 0.416
     0.201 0.176 0.343 0.12  0.299 0.285 0.49  0.247 0.209 0.388 0.295 0.372
     0.43  0.448 0.173 0.295 0.328 0.279 0.455 0.4   0.435 0.487 0.31  0.134
     0.454 0.114 0.272 0.315 0.341 0.477 0.308 0.419 0.323 0.237 0.322 0.303
     0.518 0.594 0.457 0.457 0.394 0.321 0.389 0.389 0.188 0.253 0.394 0.224
     0.428 0.442 0.192 0.278 0.369 0.517 0.35  0.349 0.404 0.456 0.372 0.209
     0.361 0.254 0.269 0.26  0.344 0.252 0.239 0.328 0.219 0.167 0.445 0.278
     0.158 0.282 0.442 0.204 0.291 0.23  0.453 0.171 0.203 0.694 0.246 0.247
     0.275 0.113 0.257 0.268 0.415 0.381 0.161 0.421 0.568 0.337 0.276 0.31
     0.161 0.511 0.237 0.157 0.192 0.175 0.082 0.654 0.208 0.281 0.189 0.192
     0.096 0.264 0.169 0.261 0.04  0.487 0.188 0.316 0.468 0.293 0.191 0.595
     0.296 0.216 0.108 0.375 0.243 0.306 0.146 0.244 0.314 0.118 0.419 0.42
     0.328 0.477 0.641 0.041 0.272 0.218 0.462 0.204 0.508 0.358 0.42  0.46
     0.377 0.446 0.288 0.251 0.372 0.111 0.206 0.54  0.474 0.322 0.31  0.456
     0.529 0.382 0.212 0.336 0.203 0.421 0.476 0.394 0.155 0.15  0.371 0.465
     0.503 0.178 0.192 0.274 0.467 0.444 0.173 0.324 0.452 0.238 0.628 0.431
     0.459 0.126 0.35  0.213 0.282 0.348 0.241 0.371 0.253 0.405 0.536 0.156
     0.213 0.408 0.307 0.265 0.445 0.123 0.381 0.456 0.172 0.521 0.492 0.454
     0.312 0.487 0.158 0.478 0.479 0.119 0.365 0.428 0.408 0.179 0.171 0.409
     0.252 0.423 0.469 0.428 0.278 0.397 0.361 0.685 0.203 0.223 0.318 0.334
     0.496 0.139 0.411 0.176 0.373 0.327 0.381 0.227 0.169 0.3   0.317 0.342
     0.457 0.394 0.468 0.277 0.472 0.129 0.202 0.681 0.258 0.408 0.453 0.421
     0.14  0.34  0.298 0.201 0.422 0.504 0.339 0.379 0.337 0.114 0.215 0.179
     0.471 0.436 0.418 0.444 0.58  0.269 0.108 0.558 0.42  0.202 0.336 0.211
     0.34  0.267 0.242 0.383 0.397 0.373 0.735 0.3   0.42  0.431 0.233 0.39
     0.265 0.575 0.339 0.151 0.522 0.511 0.287 0.48  0.182 0.352 0.138 0.558
     0.284 0.374 0.236 0.167 0.458 0.541 0.169 0.394 0.297 0.327 0.579 0.298
     0.381 0.453 0.159 0.215 0.546 0.201 0.173 0.262 0.588 0.664 0.388 0.687
     0.205 0.728 0.065 0.338 0.499 0.345 0.128 0.27  0.285 0.363 0.608 0.558
     0.329 0.274 0.45  0.395 0.398 0.463 0.368 0.268 0.04  0.288 0.152 0.367
     0.209 0.332 0.449 0.259 0.484 0.528 0.604 0.429 0.695 0.413 0.581 0.601
     0.334 0.528 0.418 0.243 0.332 0.234 0.194 0.398 0.46  0.504 0.23  0.043
     0.226 0.456 0.259 0.435 0.351 0.414 0.506 0.314 0.46  0.357 0.419 0.161
     0.242 0.329 0.304 0.195 0.164 0.198 0.388 0.549 0.36  0.438 0.302 0.176
     0.158 0.373 0.362 0.299 0.245 0.276 0.183 0.393 0.621 0.513 0.241 0.351
     0.335 0.347 0.226 0.269 0.339 0.427 0.339 0.288 0.395 0.085 0.097 0.345
     0.556 0.444 0.28  0.343 0.565 0.291 0.203 0.118 0.175 0.296 0.151 0.371
     0.324 0.284 0.504 0.464 0.158 0.528 0.434 0.384 0.379 0.563 0.37  0.119
     0.187 0.369 0.132 0.117 0.282 0.34  0.398 0.409 0.281 0.273 0.451 0.368
     0.64  0.535 0.368 0.235 0.574 0.392 0.473 0.537 0.122 0.448 0.302 0.319
     0.23  0.141 0.233 0.12  0.277 0.378 0.168 0.306 0.374 0.123 0.304 0.261
     0.394 0.373 0.537 0.265 0.342 0.332 0.441 0.4   0.329 0.424 0.086 0.327
     0.277 0.404 0.251 0.366 0.323 0.309 0.453 0.251 0.313 0.232 0.215 0.457
     0.292 0.384 0.451 0.519 0.245 0.275 0.319 0.096 0.209 0.277 0.186 0.224
     0.2   0.071 0.454 0.189 0.244 0.274 0.261 0.201 0.455 0.47  0.528 0.24
     0.537 0.465 0.493 0.62  0.24  0.321 0.377 0.209 0.226 0.373 0.235 0.302
     0.489 0.439 0.187 0.341 0.385 0.575 0.275 0.214 0.174 0.392 0.58  0.434
     0.28  0.177 0.393 0.133 0.343 0.624 0.265 0.228 0.221 0.162 0.288 0.256
     0.073 0.367 0.331 0.329 0.551 0.193 0.377 0.511 0.176 0.442 0.113 0.075
     0.387 0.387 0.202 0.232 0.31  0.285 0.466 0.365 0.16  0.366 0.267 0.18
     0.466 0.18  0.47  0.524 0.289 0.504 0.274 0.21  0.4   0.233 0.409 0.428
     0.557 0.256 0.15  0.318 0.273 0.114 0.167 0.259 0.213 0.392 0.18  0.655
     0.377 0.385 0.222 0.588 0.119 0.777 0.312 0.176 0.255 0.55  0.344 0.254
     0.374 0.385 0.32  0.398 0.221 0.296 0.321 0.503 0.619 0.343 0.357 0.384
     0.58  0.288 0.408 0.106 0.255 0.313 0.519 0.542 0.238 0.531 0.215 0.421
     0.276 0.483 0.331 0.361 0.459 0.178 0.385 0.401 0.233 0.473 0.555 0.538
     0.562 0.096 0.376 0.228 0.274 0.253 0.356 0.295 0.205 0.282 0.436 0.192
     0.392 0.432 0.57  0.425 0.114 0.352 0.369 0.407 0.493 0.272 0.36  0.671
     0.526 0.393 0.055 0.336 0.457 0.363 0.345 0.702 0.222 0.428 0.369 0.243
     0.399 0.105 0.211 0.446 0.436 0.358 0.163 0.368 0.32  0.284 0.438 0.374
     0.187 0.379 0.119 0.243 0.201 0.29  0.045 0.192 0.372 0.217 0.32  0.497
     0.262 0.318 0.559 0.394 0.457 0.286 0.17  0.36  0.187 0.405 0.308 0.396
     0.483 0.301 0.276 0.277 0.221 0.398 0.649 0.146 0.28  0.49  0.145 0.452
     0.246 0.332 0.343 0.251 0.212 0.224 0.165 0.372 0.249 0.329 0.388 0.268
     0.221 0.212 0.275 0.282 0.145 0.281 0.363 0.318 0.199 0.512 0.45  0.529
     0.693 0.379 0.301 0.246 0.331 0.344 0.451 0.331 0.579 0.196 0.437 0.394
     0.46  0.223 0.449 0.21  0.284 0.611 0.628 0.401 0.245 0.596 0.23  0.698
     0.446 0.494 0.107 0.145 0.364 0.529 0.347 0.301 0.419 0.159 0.117 0.468
     0.344 0.503 0.068 0.259 0.706 0.459 0.694 0.173 0.238 0.285 0.397 0.487
     0.354 0.313 0.316 0.197 0.277 0.375 0.613 0.264 0.653 0.395 0.167 0.3
     0.376 0.287 0.427 0.283 0.42  0.407 0.225 0.51  0.365 0.183 0.315 0.388
     0.181 0.379 0.358 0.436 0.365 0.461 0.242 0.488 0.497 0.378 0.208 0.186
     0.25  0.435 0.33  0.543 0.286 0.431 0.155 0.302 0.168 0.483 0.409 0.282
     0.32  0.359 0.272 0.487 0.146 0.408 0.392 0.39  0.365 0.145 0.499 0.23
     0.226 0.204 0.378 0.269 0.2   0.158 0.287 0.385 0.408 0.05  0.234 0.381
     0.281 0.105 0.398 0.314 0.269 0.218 0.226 0.34  0.407 0.241 0.351 0.214
     0.391 0.344 0.284 0.247 0.526 0.4   0.167 0.347 0.164 0.341 0.241 0.399
     0.399 0.223 0.656 0.237 0.106 0.352 0.276 0.631 0.464 0.218 0.398 0.381
     0.307 0.252 0.445 0.441 0.342 0.369 0.177 0.316 0.158 0.446 0.314 0.559
     0.373 0.258 0.343 0.35  0.119 0.391 0.233 0.26  0.531 0.559 0.298 0.326
     0.29  0.461 0.334 0.4   0.292 0.149 0.37  0.189 0.245 0.453 0.304 0.064
     0.318 0.474 0.374 0.469 0.156 0.434 0.345 0.256 0.192 0.52  0.475 0.475
     0.42  0.174 0.26  0.457 0.207 0.674 0.394 0.36  0.219 0.254 0.602 0.398
     0.328 0.516 0.451 0.17  0.185 0.411 0.265 0.455 0.665 0.437 0.537 0.385
     0.373 0.183 0.434 0.132 0.215 0.191 0.405 0.535 0.509 0.45  0.179 0.138
     0.427 0.37  0.526 0.368 0.32  0.194 0.254 0.263 0.481 0.456 0.402 0.421
     0.504 0.365 0.281 0.208 0.217 0.528 0.299 0.337 0.242 0.403 0.35  0.375
     0.45  0.313 0.127 0.321 0.315 0.331 0.556 0.134 0.438 0.381 0.262 0.538
     0.377 0.424 0.318 0.251 0.468 0.224 0.334 0.459 0.363 0.422 0.282 0.288
     0.365 0.235 0.462 0.353 0.481 0.365 0.399 0.252 0.571 0.279 0.38  0.096
     0.163 0.272 0.334 0.07  0.401 0.443 0.211 0.341 0.394 0.371 0.482 0.149
     0.225 0.157 0.719 0.138 0.396 0.296 0.399 0.338 0.32  0.428 0.3   0.127
     0.45  0.208 0.262 0.42  0.184 0.328 0.452 0.11  0.185 0.415 0.396 0.205
     0.168 0.734 0.355 0.411 0.294 0.246 0.186 0.596 0.659 0.452 0.431 0.343
     0.317 0.409 0.426 0.468 0.352 0.376 0.415 0.129 0.286 0.154 0.47  0.322
     0.373 0.33  0.29  0.318 0.207 0.385 0.44  0.12  0.347 0.443 0.186 0.327
     0.412 0.28  0.353 0.321 0.258 0.361 0.44  0.249 0.23  0.269 0.513 0.294
     0.208 0.213 0.515 0.157 0.238 0.109 0.27  0.5   0.383 0.426 0.231 0.203
     0.374 0.218 0.401 0.138 0.165 0.473 0.14  0.246 0.587 0.394 0.328 0.601
     0.646 0.28  0.272 0.347 0.317 0.404 0.317 0.441 0.258 0.041 0.516 0.064
     0.422 0.645 0.469 0.344 0.269 0.353 0.189 0.393 0.251 0.372 0.446 0.193
     0.427 0.141 0.186 0.123 0.076 0.243 0.517 0.46  0.34  0.203 0.079 0.29
     0.346 0.237 0.272 0.425 0.643 0.417 0.305 0.34  0.325 0.086 0.311 0.397
     0.488 0.409 0.675 0.321 0.207 0.233 0.387 0.363 0.387 0.362 0.266 0.208
     0.125 0.352 0.288 0.383 0.296 0.251 0.382 0.298 0.408 0.121 0.092 0.389
     0.437 0.452 0.482 0.317 0.266 0.352 0.455 0.305 0.341 0.491 0.064 0.42
     0.281 0.346 0.327 0.244 0.252 0.602 0.34  0.294 0.242 0.239 0.538 0.409
     0.228 0.254 0.312 0.234 0.558 0.474 0.353 0.239 0.284 0.579 0.132 0.327
     0.379 0.221 0.192 0.281 0.159 0.31  0.327 0.356 0.446 0.363 0.384 0.208
     0.478 0.516 0.466 0.228 0.462 0.283 0.278 0.293 0.309 0.223 0.341 0.296
     0.182 0.062 0.35  0.359 0.185 0.142 0.306 0.082 0.215 0.393 0.35  0.256
     0.153 0.24  0.218 0.29  0.524 0.246 0.283 0.549 0.132 0.307 0.574 0.468
     0.326 0.493 0.458 0.406 0.592 0.316 0.627 0.441 0.458 0.221 0.398 0.373
     0.466 0.189 0.262 0.331 0.379 0.394 0.098 0.358 0.142 0.198 0.331 0.196
     0.355 0.422 0.194 0.41  0.317 0.599 0.252 0.188 0.457 0.384 0.415 0.424
     0.371 0.216 0.328 0.255 0.355 0.435 0.246 0.112 0.238 0.17  0.069 0.237
     0.351 0.255 0.419 0.374 0.28  0.425 0.564 0.456 0.338 0.375 0.37  0.242
     0.448 0.415 0.62  0.416 0.393 0.288 0.34  0.207 0.398 0.276 0.392 0.113
     0.443 0.493 0.218 0.354 0.537 0.193 0.393 0.252 0.198 0.367 0.214 0.358
     0.415 0.204 0.457 0.331 0.514 0.291 0.363 0.297 0.489 0.334 0.332 0.47
     0.515 0.269 0.355 0.228 0.464 0.23  0.396 0.286 0.158 0.368 0.279 0.356
     0.194 0.254 0.377 0.366 0.295 0.277 0.539 0.422 0.392 0.247 0.352 0.208
     0.36  0.204 0.308 0.386 0.172 0.257 0.669 0.16  0.347 0.441 0.366 0.322
     0.218 0.646 0.243 0.276 0.34  0.457 0.569 0.511 0.505 0.189 0.278 0.529
     0.45  0.398 0.484 0.557 0.324 0.508 0.172 0.271 0.526 0.474 0.234 0.166
     0.355 0.11  0.334 0.279 0.403 0.356 0.239 0.077 0.086 0.387 0.497 0.187
     0.444 0.272 0.711 0.426 0.42  0.306 0.388 0.391 0.183 0.22  0.328 0.373
     0.505 0.175 0.496 0.302 0.389 0.361 0.353 0.236 0.465 0.278 0.126 0.189
     0.271 0.313 0.27  0.3   0.329 0.163 0.258 0.226 0.399 0.399 0.174 0.556
     0.259 0.488 0.592 0.329 0.171 0.572 0.23  0.458 0.516 0.416 0.296 0.347
     0.44  0.42  0.103 0.478 0.329 0.533 0.333 0.131 0.467 0.236 0.421 0.164
     0.079 0.41  0.358 0.278 0.501 0.292 0.153 0.257 0.405 0.325 0.309 0.416
     0.182 0.265 0.352 0.237 0.332 0.117 0.104 0.392 0.243 0.63  0.245 0.202
     0.365 0.561 0.204 0.401 0.171 0.406 0.265 0.135 0.419 0.356 0.361 0.118
     0.378 0.44  0.286 0.542 0.142 0.473 0.406 0.135 0.368 0.459 0.39  0.302
     0.264 0.128 0.355 0.424 0.358 0.31  0.486 0.37  0.133 0.169 0.154 0.187
     0.413 0.307 0.37  0.416 0.403 0.461 0.124 0.274 0.27  0.608 0.416 0.417
     0.338 0.659 0.305 0.322 0.376 0.491 0.376 0.229 0.428 0.147 0.443 0.181
     0.299 0.33  0.358 0.178 0.372 0.188 0.342 0.374 0.314 0.126 0.348 0.354
     0.444 0.226 0.181 0.415 0.493 0.274 0.217 0.252 0.351 0.275 0.24  0.314
     0.186 0.276 0.353 0.431 0.361 0.351 0.306 0.296 0.263 0.204 0.213 0.281
     0.15  0.221 0.544 0.399 0.42  0.136 0.216 0.558 0.131 0.463 0.246 0.246
     0.236 0.28  0.595 0.23  0.261 0.352 0.356 0.165 0.344 0.251 0.216 0.187
     0.252 0.318 0.367 0.425 0.4   0.286 0.309 0.348 0.363 0.08  0.264 0.419
     0.225 0.382 0.405 0.342 0.312 0.675 0.384 0.421 0.308 0.35  0.155 0.365
     0.275 0.226 0.436 0.384 0.178 0.464 0.343 0.361 0.452 0.439 0.244 0.142
     0.335 0.732 0.172 0.36  0.616 0.314 0.378 0.287 0.255 0.359 0.264 0.093
     0.329 0.366 0.523 0.402 0.31  0.295 0.352 0.414 0.416 0.293 0.364 0.272
     0.4   0.414 0.596 0.26  0.229 0.18  0.454 0.394 0.145 0.339 0.367 0.423
     0.404 0.366 0.293 0.32  0.177 0.358 0.281 0.232 0.272 0.326 0.308 0.28
     0.363 0.174 0.249 0.117 0.295 0.249 0.522 0.333 0.509 0.343 0.26  0.49
     0.437 0.183 0.212 0.203 0.649 0.343 0.398 0.449 0.439 0.523 0.195 0.477
     0.174 0.451 0.29  0.562 0.27  0.039 0.653 0.406 0.46  0.515 0.317 0.415
     0.396 0.326 0.42  0.291 0.541 0.235 0.284 0.335 0.456 0.386 0.328 0.125
     0.112 0.476 0.527 0.207 0.291 0.256 0.371 0.352 0.491 0.239 0.415 0.266
     0.549 0.229 0.219 0.378 0.226 0.269 0.179 0.35 ]

Because the samples of r are now available, meanr and stdr have been calculated. Note that meanr is 0.334, which makes sense: there were 3 heads and 7 tails in our data, so we would expect the rate to be around 0.3.
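This estimate can also be checked analytically. A Uniform(0, 1) prior is the same as a Beta(1, 1) distribution, and the Beta prior is conjugate to the Bernoulli likelihood, so the exact posterior over r is Beta(1 + heads, 1 + tails). A quick Python sketch (not Private code):

```python
import math

# Conjugate check: a Uniform(0, 1) prior is Beta(1, 1), and Beta is
# conjugate to the Bernoulli likelihood, so with 3 heads and 7 tails
# the posterior over r is Beta(1 + 3, 1 + 7) = Beta(4, 8).
heads, tails = 3, 7
a, b = 1 + heads, 1 + tails

post_mean = a / (a + b)                                     # about 0.333
post_std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))  # about 0.131

print(post_mean, post_std)
```

These agree with Private’s sampled estimates (0.334 and 0.130) up to Monte Carlo error.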

To make it easier to try out different data sets, we are going to generate fake data with a known probability of the coin landing on heads, so that we can see how well we are able to recover the correct value. To generate 100 flips of a coin that has a heads probability of 0.3, type:

realr = 0.3
flips = Bernoulli(realr, 100)

This says that the variable “flips” contains 100 samples of a Bernoulli variable with a rate parameter of 0.3.
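Under the hood, this kind of generation is just repeated weighted coin flipping. A Python analogue (not Private code), using only the standard library:

```python
import random

random.seed(0)  # fix the seed so the run is reproducible

# 100 draws from a Bernoulli distribution with rate 0.3: each draw is
# 1 (heads) with probability 0.3 and 0 (tails) otherwise.
realr = 0.3
flips = [1 if random.random() < realr else 0 for _ in range(100)]

print(sum(flips) / len(flips))  # proportion of heads, roughly 0.3
```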

To view the 100 samples that were created, type in the variable name – in this case flips:

[0 0 1 1 0 0 1 0 1 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 1 1 1 1 1 0 1 1 0 0 1 1 0
 1 1 1 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0
 1 1 1 0 1 1 1 0 0 1 0 0 0 0 1 0 1 0 1 0 1 0 1 0 0 0]

Now if you type sv you’ll see:

flips = Bernoulli(realr, 100)  [0.000 … 0.000]    
meanr = mean(r)                0.390    
stdr = std(r)                  0.047    
realr = 0.3                    0.300    
flips ~ Bernoulli(r)           [0.000 … 0.000]    
r ~ Uniform(0, 1)              [0.417 … 0.475]    

Note that meanr is close to the realr value that we used to generate the data.

One of the nice features of Private is that if we change the value of a variable then the other variables change accordingly. For instance, if we now type:

realr = 0.9

And then sv, we will get:

flips = Bernoulli(realr, 100)  [1.000 … 1.000]    
meanr = mean(r)                0.844    
stdr = std(r)                  0.036    
realr = 0.9                    0.900    
flips ~ Bernoulli(r)           [1.000 … 1.000]    
r ~ Uniform(0, 1)              [0.845 … 0.882]  

So again meanr is a reasonable approximation to the rate we used to generate the data.

You can now generate a histogram of the samples of r like this:

rplot = distplot(r)

If you show the variables using sv you will see:

flips = Bernoulli(realr, 100)  [1.000 … 1.000]    
meanr = mean(r)                0.844    
stdr = std(r)                  0.036    
realr = 0.9                    0.900    
rplot = distplot(r)            [PNG Image]    
flips ~ Bernoulli(r)           [1.000 … 1.000]    
r ~ Uniform(0, 1)              [0.845 … 0.882]

Notice that rplot appears as a variable. Type rplot and you will also see the histogram of the values:

Most of the values of r are around 0.9. The blue line is called the kernel density estimator (kde) and is a smooth representation of the histogram. 

If you change the data again

realr = 0.5

The values will change:

flips = Bernoulli(realr, 100)  [0.000 … 1.000]    
meanr = mean(r)                0.599    
stdr = std(r)                  0.048    
realr = 0.5                    0.500    
rplot = distplot(r)            [PNG Image]    
flips ~ Bernoulli(r)           [0.000 … 1.000]    
r ~ Uniform(0, 1)              [0.523 … 0.591]   

and the plot will update automatically. Type rplot now to see the updated histogram.

Exercise 1: Look at the standard deviations of the rate variable when the real rate was 0.3, 0.5 and 0.9. Notice that it is greatest for 0.5 and smallest for 0.9. Why is this the case? (Solutions to exercises are provided at the end of the tutorial).

We have now generated code for a model that estimates the bias of the coin. We have also shown that the estimate of coin bias will change appropriately if we alter the data that we give to the model we have created.

Estimating the bias of multiple coins

Now that we have mastered the estimation of the rate of a single coin, let’s spread our wings and consider the case of estimating the shared rate of multiple identical coins when we flip them all at once. To do this, we are going to use the binomial distribution. The binomial distribution describes the probability of observing a given number of heads, given a rate parameter and the number of coins.

To begin, remove all of the variables we set up for the first example:

clear
All variables removed.

sv should return nothing. Again we are going to cheat (I mean facilitate our understanding) by using fake data for which we know the correct rate parameter. We can generate binomial values as follows:

NumberOfHeads = Binomial(10, 0.3, 100)

which means: flip 10 coins, each with a probability of 0.3 of landing on heads (so weighted towards tails), record the number of heads, and repeat this 100 times. Now type sv:

NumberOfHeads = Binomial(10, 0.3, 100)  [2.000 … 3.000]

To see all of the values type NumberOfHeads:

[2 2 4 4 3 1 5 3 5 4 2 8 4 3 2 2 4 1 4 4 3 3 2 3 5 6 5 4 4 0 7 4 1 3 4 6 4
 4 6 6 3 4 2 2 4 4 2 3 1 2 3 2 3 4 3 0 2 4 6 2 2 4 5 3 2 2 1 5 2 2 1 3 3 1
 6 5 4 3 4 5 5 1 4 4 3 2 3 1 5 3 6 3 5 3 4 2 4 1 3 3]
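A binomial draw is just the sum of independent Bernoulli flips. A Python analogue (not Private code) of Binomial(10, 0.3, 100), with an illustrative helper function:

```python
import random

random.seed(0)  # fix the seed so the run is reproducible

def binomial_draw(n_coins, r):
    # Number of heads when flipping n_coins coins, each with P(heads) = r.
    return sum(1 for _ in range(n_coins) if random.random() < r)

# 100 samples, each counting the heads among 10 coins with rate 0.3.
number_of_heads = [binomial_draw(10, 0.3) for _ in range(100)]

print(number_of_heads[:10])
```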

We can generate the histogram of the samples using distplot:

hplot = distplot(NumberOfHeads, bins=[0,1,2,3,4,5,6,7,8,9], kde=False)

Note how we used “kde=False” to stop distplot from adding the smoothing line. (kde is an optional parameter of the distplot function; if it is omitted, kde defaults to true and a smoothing line is added automatically. Recall that kde stands for kernel density estimator.) We have also used the bins parameter to indicate the values on the x axis of the plot.

Most of the values are around 3 with a spread from 0 to 8. 

As in the case of one coin, we now need to define the likelihood and the prior. We know that the data was drawn from a binomial distribution, so we can define the likelihood as:

NumberOfHeads ~ Binomial(10, r)

where 10 is the number of coins we are flipping and r is the probability that each of them will land on heads (r being the parameter we are trying to estimate).

Again, we will assume a priori that the rate is drawn from a uniform distribution:

r ~ Uniform(0,1)

Type sv and you should see:

NumberOfHeads = Binomial(10, 0.3, 100)              [2.000 … 3.000]    
hplot = distplot(NumberOfHeads, bins = [0, 1, 2, 3  [PNG Image]    
NumberOfHeads ~ Binomial(10, r)                     [2.000 … 3.000]    
r ~ Uniform(0, 1)                                   [0.330 … 0.341]   

To see the mean we can type:

mean(r)
0.3310302490288849

Again, our model has generated a pretty good estimate of the rate parameter, 0.3. (Reminder: we know the actual rate parameter is 0.3 because we generated the samples using a rate of 0.3. When performing this kind of analysis in a real-life setting, we typically wouldn’t know the true rate that we are trying to estimate.)

Exercise 2: Repeat the Bernoulli experiment but with 1000 data points instead of 100. What happens to the mean of r? What happens to the standard deviation of r? Why?

Exercise 3: Change the number of coins in the binomial example to 100. What happens to the mean rate? What happens to the standard deviation of the rate? Why?

In the last two sections, we have investigated the use of the Bernoulli and Binomial distributions. These distributions describe processes that have two outcomes, typically True and False, and are useful for modelling quantities like button responses (e.g. Did the participant press “Happy” during this period?) or the occurrence of events (e.g. Did it rain during this period?). In the next section, we will model continuous quantities using the Normal distribution. 

Estimating the mean and standard deviation of male heights

The average male over the age of 18 is about 178 centimeters tall. There is also, of course, quite a bit of variation: the standard deviation of the distribution of male heights is about 10 centimeters. In this section, we will estimate these values from data. Below is a data set of male heights. Clear your workspace and copy and paste this dataset into Private:

heights = [176, 181, 164, 176, 176, 181, 180, 191, 152, 182, 169, 188, 182, 182, 190, 180, 189, 169, 190, 194, 173, 179, 155, 183, 191, 186, 178, 174, 179, 182, 188, 169, 186, 169, 182, 173, 171, 172, 174, 169, 179, 170, 174, 184, 201, 180, 183, 184, 196, 180, 168, 194, 174, 161, 189, 174, 186, 168, 176, 193, 177, 176, 187, 170, 175, 181, 168, 179, 159, 168, 186, 182, 162, 177, 176, 187, 172, 178, 178, 197, 175, 170, 170, 181, 186, 207, 187, 175, 163, 174, 169, 173, 170, 169, 164, 178, 201, 178, 198, 164]

You can check the length of the data using the len function:

len(heights)
100

You can also plot the data:

heightsplot = distplot(heights)

Now we need to specify the likelihood and priors of the model we will use. In this case, we are going to use the Normal distribution:

heights ~ Normal(mu, sigma)

mu is the parameter that will represent the mean, and sigma is the parameter that will represent the standard deviation. Now we need to specify the priors. We expect heights to be around 150 centimeters, so we specify a Normal distribution with a mean of 150 and a large standard deviation (100) to indicate that we don’t want our preconceptions to influence the result too much. The standard deviation of the heights must be positive, so we will use a prior distribution that can only produce positive values. There are many to choose from, but a good one is the HalfNormal. The HalfNormal is created by taking the absolute value of draws from a Normal distribution with mean 0.

To see what the HalfNormal distribution looks like we can generate 1000 samples from a HalfNormal distribution with a standard deviation of 100, and then plot them:

hn = HalfNormal(100, 1000)
hnplot = distplot(hn, kde=False)
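In plain Python (not Private code), the same construction looks like this:

```python
import random

random.seed(0)  # fix the seed so the run is reproducible

# HalfNormal(100): take the absolute value of Normal(0, 100) draws,
# folding the negative half of the bell curve onto the positive side.
hn = [abs(random.gauss(0, 100)) for _ in range(1000)]

print(min(hn))  # every sample is non-negative
```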

Specify the priors as follows:

mu ~ Normal(150, 100)
sigma ~ HalfNormal(100)

Exercise 4: Now let’s see how we did. Calculate the mean of mu. How close did we get to the real value? Now calculate the mean of sigma. How close did we get to the real value of the standard deviation?

Exercise 5: Generate a plot of the samples of mu and another of the samples of sigma.

You may be thinking to yourself that we had to do a lot of work just to estimate the bias of a coin or the mean of some heights. In the case of the bias, you could just divide the number of heads by the total number of flips. In the case of the heights, you could just add the heights and divide by 100. There are two main reasons that we go to this trouble. Firstly, by thinking about these simple operations as Bayesian models, it is easier to see how we can extend them to more complicated models that would not be so easy to estimate. Secondly, simple statistics like the rate or the mean can be used to reverse-engineer people’s private data. In the next section, we will demonstrate the first point by showing how to extend our model of heights to a regression model in which we want to determine the impact of a predictor on height. We will leave the second point until subsequent tutorials.

How does gender impact height?

Women tend to be shorter than men. On average, a woman over the age of 18 is about 165 centimeters tall. The standard deviation of the distribution of female heights is slightly smaller than for men, at around 9 centimeters. In this section, we are going to define a linear regression model that predicts height based on gender.

Start by clearing your workspace and entering the data:

heights = [176, 181, 164, 176, 176, 181, 180, 191, 152, 182, 169, 188, 182, 182, 190, 180, 189, 169, 190, 194, 173, 179, 155, 183, 191, 186, 178, 174, 179, 182, 188, 169, 186, 169, 182, 173, 171, 172, 174, 169, 179, 170, 174, 184, 201, 180, 183, 184, 196, 180, 168, 194, 174, 161, 189, 174, 186, 168, 176, 193, 177, 176, 187, 170, 175, 181, 168, 179, 159, 168, 186, 182, 162, 177, 176, 187, 172, 178, 178, 197, 175, 170, 170, 181, 186, 207, 187, 175, 163, 174, 169, 173, 170, 169, 164, 178, 201, 178, 198, 164, 161, 152, 163, 168, 166, 155, 166, 185, 161, 173, 161, 164, 184, 158, 138, 145, 160, 164, 173, 176, 182, 174, 178, 163, 153, 156, 165, 166, 158, 173, 147, 150, 171, 155, 167, 167, 161, 183, 156, 154, 162, 165, 169, 165, 171, 151, 176, 169, 179, 167, 165, 169, 162, 177, 169, 163, 154, 170, 199, 153, 169, 154, 156, 169, 162, 172, 153, 178, 168, 156, 169, 146, 162, 163, 167, 175, 174, 174, 161, 160, 172, 164, 184, 167, 166, 173, 159, 172, 153, 159, 152, 163, 143, 157, 154, 155, 172, 164, 154, 168]

The first 100 values are male heights and the second 100 values are female heights. We need to generate a predictor variable to indicate which data are for females and which data are for males. We can do that as follows:

gender = [1] * 100 + [0] * 100

We are using a 1 to indicate a male and a 0 to indicate a female. [1] is a list containing a single 1, and multiplying it by 100 creates a list with 100 copies of that 1. We do the same for [0] and then add the two lists together to concatenate them into one long list. If you type gender now you should see:
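As it happens, this list syntax is also valid Python, so you can sanity-check the construction there:

```python
# [1] * 100 repeats the one-element list 100 times, and + concatenates
# the two lists, giving 100 ones followed by 100 zeros.
gender = [1] * 100 + [0] * 100

print(len(gender), sum(gender))  # 200 entries, of which 100 are 1s
```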

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Now we are ready to specify our regression model. We start by specifying heights as a Normal variable as we did before:

heights ~ Normal(mu, sigma)

We’ll define sigma in the same way:

sigma ~ HalfNormal(100)

To determine mu, we will assume that we have an intercept called muFemale (which will effectively be an estimate of the mean female height) and a parameter “diff” that indicates how many centimeters must be added to capture the male heights. Our equation for mu is then:

mu ~ muFemale + gender * diff

Now we need to add the priors. As far as we know a priori, muFemale could lie in the same broad range as the male mean, so we will reuse the Normal prior with mean 150 and standard deviation 100. The difference in height between men and women will be a smaller number, but still positive, so we will use a HalfNormal with a standard deviation of 30:

muFemale ~ Normal(150, 100)
diff ~ HalfNormal(30)

When you type sv you should now see something like:

heights = [176, 181, 164, 176, 176, 181, 180, 191,  [176, 181, 164, 176, 176, 181, …]    
gender = [1] * 100 + [0] * 100                      [1, 1, 1, 1, 1, 1, …]    
heights ~ Normal(mu, sigma)                         [176, 181, 164, 176, 176, 181, …]    
sigma ~ HalfNormal(100)                             [9.757 … 9.898]    
mu ~ muFemale + gender * diff                       'Not retained.'    
muFemale ~ Normal(150, 100)                         [162.953 … 163.623]    
diff ~ HalfNormal(30)                               [14.383 … 12.909]

Private has generated samples of muFemale and diff, but has not kept the samples of mu. In many cases, there can be very large numbers of these internally calculated values and so to avoid storing all that data, Private makes a strategic decision to throw them away. 

Exercise 6: Calculate the mean of muFemale. Does it correspond to the real value? 

Exercise 7: Calculate samples of the mean of the male heights. (Hint: You will need to use muFemale and diff.) Do they correspond to reality?

Exercise 8: If we wanted to allow for the possibility that females are taller than males, how would we need to change the code?

Summary

That concludes our introduction to the Private language. You should now understand how to construct models of both binary and continuous data and how to calculate statistics of and plot the samples of the parameters. In the next tutorial, we will talk about the estimation of parameters that might depend on private information.

Solutions to Exercises

Exercise 1: Look at the standard deviations of the rate variable when the real rate was 0.3, 0.5 and 0.9. Notice that it is greatest for 0.5 and smallest for 0.9. Why is this the case?

When the true rate is 0.5, there are many likely estimates on both sides of 0.5 which the model needs to account for. When the rate is 0.9, however, the distribution still peaks at 0.9, but because there is an upper bound of 1 there is a much smaller pool of possible values greater than 0.9 that the estimate can take. While the distribution has a tail to the left, the smaller amount of variance to the right (above 0.9) decreases the total variance. A rate of 0.3 is similar to 0.9, except that it is constrained by the lower bound of 0.
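You can see the same pattern analytically: with a uniform prior, k heads in n flips gives a Beta(k + 1, n − k + 1) posterior, whose standard deviation has a closed form. A sketch, assuming n = 100 flips at observed rates of 0.3, 0.5 and 0.9:

```python
from math import sqrt

def beta_sd(k, n):
    # Standard deviation of the Beta(k + 1, n - k + 1) posterior
    # that arises from k heads in n flips under a uniform prior
    a, b = k + 1, n - k + 1
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

for k in (30, 50, 90):
    print(k / 100, round(beta_sd(k, 100), 4))
```

The printed standard deviations are largest at a rate of 0.5 and smallest at 0.9, matching the explanation above.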

Exercise 2: Repeat the Bernoulli experiment but with 1000 data points instead of 100. What happens to the mean of r? What happens to the standard deviation of r? Why?

You will notice that the mean stays roughly the same, but the standard deviation decreases as more data points are added, because the posterior distribution is becoming more precise. Comparing the plots for the two cases, the posterior with more information (i.e., higher n) is more peaked. This means you are more certain about which values are plausible and which are not.
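The shrinkage can again be checked analytically with the Beta-posterior standard deviation under a uniform prior, holding the observed rate at 0.5 while multiplying the data tenfold (a sketch):

```python
from math import sqrt

def beta_sd(k, n):
    # Standard deviation of the Beta(k + 1, n - k + 1) posterior (uniform prior)
    a, b = k + 1, n - k + 1
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# Same observed rate, ten times the data: the sd shrinks by roughly sqrt(10)
print(round(beta_sd(50, 100), 4))
print(round(beta_sd(500, 1000), 4))
```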

Exercise 3: Change the number of coins in the binomial example to 100. What happens to the mean rate? What happens to the standard deviation of the rate? Why?

The standard deviation is much smaller when 100 coins are flipped. This is because each data point is providing more information about the rate.

Troubleshooting: To update your code, you will need to change the number of coins in both the samples you are generating and the likelihood distribution. If you only update the number of coins in the samples you generate, the parameters for the likelihood will not match the data and you will get an error message.

Exercise 4: Now let’s see how we did. Calculate the mean of mu. How close did we get to the real value? Now calculate the mean of sigma. How close did we get to the real value of the standard deviation?

Type:

mean(mu)

and you should get:

178.2784329068591

which is pretty close to 178 centimetres and:

mean(sigma)

which should give:

10.247529701800103

which is pretty close to 10 centimetres. So we have recovered the values quite well.

Exercise 5: Generate a plot of the samples of mu and another of the samples of sigma.

Type:

muPlot = distplot(mu)
sigmaPlot = distplot(sigma)

Exercise 6: Calculate the mean of muFemale. Does it correspond to the real value? 

meanMuFemale = mean(muFemale)

The mean is 164.483, which is very close to the true value.

Exercise 7: Calculate samples of the mean of the male heights. (Hint: You will need to use muFemale and diff.) Do they correspond to reality?

muMale = muFemale + diff
meanMuMale = mean(muMale)

Alternative Solution to Exercise 7

meanMuMale = mean(muFemale + diff)

They correspond quite well (the mean is 178.253).
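The addition works because Private keeps the samples of muFemale and diff aligned draw-for-draw, so adding them elementwise yields one sample of the male mean per posterior draw. A quick NumPy sketch with made-up samples standing in for Private's output (the means and spreads below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for Private's posterior samples of muFemale and diff
muFemale = rng.normal(164.5, 1.0, size=1000)
diff = rng.normal(13.8, 1.5, size=1000)

muMale = muFemale + diff      # elementwise: one male-mean sample per draw
meanMuMale = muMale.mean()
print(round(meanMuMale, 1))   # close to 164.5 + 13.8 = 178.3
```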

Exercise 8: If we wanted to allow for the possibility that females are taller than males, how would we need to change the code?

diff would have to be specified with a prior that can take on negative values, such as a Normal distribution centred on zero, for example (reusing the scale of 30 from before):

diff ~ Normal(0, 30)
