Well, I already gave one formula for penis size prediction, but I realized the dataset I had included many other significant variables as well, so I might as well use them. Because the formulas are more complex, I had to switch from Excel to SAS, but the general idea is the same. In both cases, we choose the coefficients that minimize the sum of the squared differences between the actual values and the values our formula predicts. My statistics class actually came in handy, as we had recently gone over how to turn a categorical variable, such as hair color, into numeric variables that the formula can process.
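To make the "minimize the sum of squares" idea concrete, here is a minimal sketch of ordinary least squares for a single predictor in plain Python. It is not the SAS setup used for the models below, which have several predictors. It only illustrates the criterion being optimized. `fit_line` and `sum_squared_errors` are hypothetical helper names, and the data in the usage block is made up.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form OLS: slope = cov(x, y) / var(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """The quantity the fit minimizes: sum of (actual - predicted)^2."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    xs = [0, 1, 2, 3]
    ys = [1, 3, 4, 8]  # made-up toy data, for illustration only
    a, b = fit_line(xs, ys)
    print(a, b)
    # Nudging either coefficient away from the fit raises the sum of squares.
    print(sum_squared_errors(xs, ys, a, b) < sum_squared_errors(xs, ys, a, b + 0.1))
```

With several predictors, the same criterion is minimized over all coefficients at once. That is what SAS (or Excel's Solver) does.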
The basic idea is to split the category into a series of binary variables. For example, if the hair color options are blonde, brown, black, and red, you create three (n-1) binary variables that track whether the individual has blonde hair, brown hair, or black hair. Each one is 1 if so and 0 otherwise. Red, the category left out, is the baseline, so a redhead has all three set to 0. The coefficient attached to each variable is then just a simple addition to the formula, since A*1 = A and A*0 = 0. Interesting and useful.
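A minimal sketch of the n-1 binary (dummy) coding described above, using the four hair colors from the example and treating red as the baseline. The function names and the `is_<color>` naming scheme are hypothetical. The coefficients in the usage block are copied from the hair color table for the self-estimate formula.

```python
# Hair colors from the example in the text; "red" is the left-out baseline.
HAIR_COLORS = ["blonde", "brown", "black", "red"]


def dummy_code(value, categories, baseline):
    """Turn one categorical value into n-1 binary (0/1) indicator variables."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {f"is_{c}": int(value == c) for c in categories if c != baseline}


def categorical_term(indicators, coefficients):
    """Sum of coefficient * indicator; at most one indicator is 1."""
    return sum(coefficients[name] * flag for name, flag in indicators.items())


if __name__ == "__main__":
    # Hair coefficients from the self-estimate table; red is the baseline (0).
    hair_coefs = {"is_blonde": 0.37735, "is_brown": 0.49832, "is_black": 0.61258}
    for color in HAIR_COLORS:
        flags = dummy_code(color, HAIR_COLORS, baseline="red")
        print(color, flags, categorical_term(flags, hair_coefs))
```

Because exactly one flag (or none, for the baseline) is 1, the categorical term collapses to a single table lookup. This is why the tables can list one value per category.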
By adding other variables, I was able to make the formula more accurate (R², the proportion of the variability in the data explained by the model, went from 0.5787 to 0.7118). I was also able to eliminate the need for an initial estimate altogether, although at the cost of accuracy, which I find interesting. Since the initial estimate is such a good predictor, though (as you'd expect: if you want to predict something, you could do worse than asking the owner), the model changes quite a bit without that variable. So really, there are two separate formulas: one for when you have an initial self-reported estimate of penis size, and another for when you don't.
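The R² figures above come from SAS. For reference, here is a minimal sketch of the standard definition, 1 − SS_res / SS_tot. The function name `r_squared` is hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: share of the variance in `actual` explained."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total, around the mean
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    print(r_squared([1, 2, 3], [1, 2, 4]))
```

One caveat: on the data a least-squares model was fit to, adding predictors never lowers R². This is why adjusted R² is often reported alongside it when comparing models of different sizes.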
Now, the formula gets a bit more complicated, but I'll do my best to explain it. It's a combination of categorical additions (for example, if the guy has black hair, add so-and-so centimeters to the estimate) and products (for example, the guy's age times so-and-so centimeters per year). The predicted size and the table values are all in centimeters.
The variables used are:
Height (in meters)
Age (in years)
Hair Color
Eye Color
Ethnicity
Self-Measured Penis Length (in centimeters)
Now then, the formula is:
Predicted Penis Size = -5.56387 + (Age * -0.00895) + (Height * 6.57043) + (Self-Measured Penis Length * 0.45639) + (Eye Color Value) + (Hair Color Value) + (Ethnicity Value)
The values for hair color, eye color, and ethnicity come from the tables below.
| Eye Color | Eye Color Value (cm) |
|---|---|
| Green | 0.4041 |
| Brown | 0.48353 |
| Blue | -0.2685 |
| Gray | 0 |
| Other | 0.3803764 |
| Hair Color | Hair Color Value (cm) |
|---|---|
| Brown | 0.49832 |
| Blonde | 0.37735 |
| Black | 0.61258 |
| Red | 0 |
| Other | 0.511714767 |
| Ethnicity | Ethnicity Value (cm) |
|---|---|
| Caucasian | 0.09476 |
| Arab | -0.59318 |
| East Asian | -0.91096 |
| Black | 1.30666 |
| Latino | 1.16791 |
| Mediterranean | -0.16643 |
| Mixed | 0.34564 |
| Central Asian | -0.31054 |
| Indian | -1.12532 |
| Australoid | 0 |
| Other/Unknown | 0.10610367 |
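(The classes with a value of 0 aren't "normal" or average in any sense. Each is simply the baseline category that the others in its table are measured against, a by-product of how the categorical variables were coded for the linear regression.)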
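The self-estimate formula and its three tables can be transcribed directly into a function, which makes it easy to try out. This is a minimal sketch. The constant and function names and the lowercase string keys are hypothetical choices. Following the variable list, height is in meters, and the self-measured length and the result are in centimeters. The inputs in the usage block are hypothetical.

```python
# Values copied from the self-estimate tables (baseline categories are 0).
EYE = {"green": 0.4041, "brown": 0.48353, "blue": -0.2685, "gray": 0.0,
       "other": 0.3803764}
HAIR = {"brown": 0.49832, "blonde": 0.37735, "black": 0.61258, "red": 0.0,
        "other": 0.511714767}
ETHNICITY = {
    "caucasian": 0.09476, "arab": -0.59318, "east asian": -0.91096,
    "black": 1.30666, "latino": 1.16791, "mediterranean": -0.16643,
    "mixed": 0.34564, "central asian": -0.31054, "indian": -1.12532,
    "australoid": 0.0, "other/unknown": 0.10610367,
}


def predict_with_estimate(age_years, height_m, self_measured_cm, eye, hair, ethnicity):
    """Predicted size (cm) from the formula that uses a self-reported estimate."""
    return (
        -5.56387
        + age_years * -0.00895
        + height_m * 6.57043
        + self_measured_cm * 0.45639
        + EYE[eye.lower()]            # categorical additions: one lookup each
        + HAIR[hair.lower()]
        + ETHNICITY[ethnicity.lower()]
    )


if __name__ == "__main__":
    # Hypothetical 30-year-old, 1.80 m tall, self-reporting 15 cm.
    print(round(predict_with_estimate(30, 1.80, 15, "Brown", "Black", "Caucasian"), 2))  # prints 14.03
```

The self-reported length enters with a coefficient of about 0.46, so each reported centimeter moves the prediction by less than half a centimeter.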
Using this model, the typical error (the standard deviation of the residuals) is about a centimeter. Assuming the errors are normally distributed, that means you're about 68% likely to be within a centimeter of the true size. Often, though, men don't just blurt out an estimate of their penis size, which calls for a different formula. The variables are the same, except that there is no self-measured penis length. This time, the formula and value tables are:
Predicted Penis Size = -8.54628 + (Age * -0.00711) + (Height * 11.59279) + (Eye Color Value) + (Hair Color Value) + (Ethnicity Value)
| Eye Color | Eye Color Value (cm) |
|---|---|
| Green | 1.16689 |
| Brown | 1.26317 |
| Blue | 0.29298 |
| Gray | 0 |
| Other | 1.1251647 |
| Hair Color | Hair Color Value (cm) |
|---|---|
| Brown | 0.76874 |
| Blonde | 0.54502 |
| Black | 1.14517 |
| Red | 0 |
| Other | 0.8795126 |
| Ethnicity | Ethnicity Value (cm) |
|---|---|
| Caucasian | 1.08656 |
| Arab | 0.53232 |
| East Asian | -1.47795 |
| Black | 2.63534 |
| Latino | 2.1038 |
| Mediterranean | 0.64598 |
| Mixed | 1.25143 |
| Central Asian | 0.20177 |
| Indian | -1.33676 |
| Australoid | 0 |
| Other/Unknown | 0.99191133 |
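The formula without a self-reported estimate can be transcribed the same way. Again, this is a minimal sketch. The names and lowercase keys are hypothetical, height is in meters, the result is in centimeters, and the inputs in the usage block are hypothetical.

```python
# Values copied from the no-estimate tables (baseline categories are 0).
EYE_NO_EST = {"green": 1.16689, "brown": 1.26317, "blue": 0.29298, "gray": 0.0,
              "other": 1.1251647}
HAIR_NO_EST = {"brown": 0.76874, "blonde": 0.54502, "black": 1.14517, "red": 0.0,
               "other": 0.8795126}
ETHNICITY_NO_EST = {
    "caucasian": 1.08656, "arab": 0.53232, "east asian": -1.47795,
    "black": 2.63534, "latino": 2.1038, "mediterranean": 0.64598,
    "mixed": 1.25143, "central asian": 0.20177, "indian": -1.33676,
    "australoid": 0.0, "other/unknown": 0.99191133,
}


def predict_without_estimate(age_years, height_m, eye, hair, ethnicity):
    """Predicted size (cm) from the formula with no self-reported estimate."""
    return (
        -8.54628
        + age_years * -0.00711
        + height_m * 11.59279
        + EYE_NO_EST[eye.lower()]
        + HAIR_NO_EST[hair.lower()]
        + ETHNICITY_NO_EST[ethnicity.lower()]
    )


if __name__ == "__main__":
    # Hypothetical 30-year-old, 1.80 m tall.
    print(round(predict_without_estimate(30, 1.80, "Brown", "Black", "Caucasian"), 2))  # prints 15.6
```

Without the self-report, the height coefficient is much larger (11.59 vs. 6.57 cm per meter). Height is picking up some of the information the estimate used to carry.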
This formula is less exact, with a typical error of about 1.7 cm, which means you're only about 44% likely to be within a centimeter of the true size. That's not awful (it's almost even odds), but it's certainly much better to get an estimate if you can. The best thing about the self-report formula is that the expectation of a lie is built in: the average exaggeration in the data is 2.65 cm, or a little over an inch.
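The 68% and 44% figures follow from treating each model's error as normally distributed with mean zero and a standard deviation of 1 cm or 1.7 cm. That is an assumption, but it is the one that makes the quoted numbers work. A minimal sketch using Python's `math.erf`. The function name `prob_within` is hypothetical.

```python
import math


def prob_within(tolerance_cm, error_sd_cm):
    """P(|error| < tolerance) when error ~ Normal(0, error_sd)."""
    return math.erf(tolerance_cm / (error_sd_cm * math.sqrt(2)))


if __name__ == "__main__":
    print(round(prob_within(1.0, 1.0), 2))  # with a self-reported estimate
    print(round(prob_within(1.0, 1.7), 2))  # without one
    print(round(2.65 / 2.54, 2))            # average exaggeration in inches
```

This prints 0.68, 0.44 and 1.04, matching the percentages and the "a little over an inch" in the text.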
So what does this data indicate? First, age matters very, very little to penis size: less than a millimeter per decade in either formula. All of our observations are at least 18, though, so the fair statement is that once you reach 18, age matters little.
Second, height does matter. Without a self-reported estimate, every decimeter of extra height (about four inches) correlates with an average increase of about a centimeter in penis size.
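Both effects are just a coefficient times a change in the input. Worked out from the two formulas' coefficients:

```latex
\begin{aligned}
\Delta_{\text{height}} &= 11.59279\ \tfrac{\text{cm}}{\text{m}} \times 0.1\ \text{m} \approx 1.16\ \text{cm} && \text{(no self-estimate)}\\
\Delta_{\text{height}} &= 6.57043\ \tfrac{\text{cm}}{\text{m}} \times 0.1\ \text{m} \approx 0.66\ \text{cm} && \text{(with self-estimate)}\\
\Delta_{\text{age}} &= -0.00711\ \tfrac{\text{cm}}{\text{yr}} \times 10\ \text{yr} \approx -0.07\ \text{cm} && \text{(per decade, no self-estimate)}
\end{aligned}
```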
As for eye color, brown-eyed men tend to be largest, with blue- and gray-eyed men being the smallest. With hair color, black-haired men tend to be largest and redheads the smallest. This incidentally supports an anecdote I once heard from a summer camp coworker, that Irish men tend to be smaller than average. With regard to ethnicity, men of African and Latino descent tend to be largest, while East Asians and Indians tend to be smallest, which aligns with the most common stereotypes and prejudices.
Honestly, I kind of wish I could turn this into an app or something. Might be useful, yeah? Interesting, at least. You could probably get a lot more observations, too, if you had a way to verify the true size later.
Dear John,
Perhaps you could sell this framework to a software company that can make a ‘killer app’ out of it, and you can laugh all the way to the bank! Meanwhile, I thought this was a rather convoluted way of saying ‘Balls to statistics’? I hated statistics classes too! Cheers 🙂
Well, I like most of them. I just currently have one awful professor. She assigned a project on Thanksgiving! Ugh…
Pingback: An Oddity in My Search Terms | John Kutensky
−5.56387 + 13 + 1.70688 + 0.13208 + (−0.2865) + 0.37735 + 0.09476 = 9.4607
IS THIS RIGHT??
Which variables are you using?