Random Forest 
Random forest (or random forests) is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. The algorithm for inducing a random forest was developed by Leo Breiman and Adele Cutler, and “Random Forests” is their trademark. The term came from random decision forests, which were first proposed by Tin Kam Ho of Bell Labs in 1995. The method combines Breiman’s “bagging” idea with the random selection of features, introduced independently by Ho and by Amit and Geman, in order to construct a collection of decision trees with controlled variation.
Learning Algorithm
Each tree is constructed using the following algorithm:
- Let the number of training cases be N, and the number of variables in the classifier be M.
- We are told the number m of input variables to be used to determine the decision at a node of the tree; m should be much less than M.
- Choose a training set for this tree by choosing N times with replacement from all N available training cases (i.e. take a bootstrap sample). Use the rest of the cases to estimate the error of the tree, by predicting their classes.
- For each node of the tree, randomly choose m variables on which to base the decision at that node. Calculate the best split based on these m variables in the training set.
- Each tree is fully grown and not pruned (as may be done in constructing a normal tree classifier).
For prediction, a new sample is pushed down the tree. It is assigned the label of the training samples in the terminal node it ends up in. This procedure is repeated over all trees in the ensemble, and the majority vote (the mode) of all trees is reported as the random forest prediction.
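The steps above can be sketched in plain Python. This is a minimal illustration, not Breiman and Cutler's implementation: it assumes numeric features, uses Gini impurity as the split criterion (the text does not name one), stops growing a branch when none of the m sampled variables improves purity, and omits the out-of-bag error estimate. The helper names (`grow_tree`, `grow_forest`, `predict_forest`) and the toy data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow_tree(rows, labels, m, rng):
    """Grow an unpruned tree; each node considers m randomly chosen variables."""
    if len(set(labels)) == 1:
        return labels[0]                                # pure leaf
    features = rng.sample(range(len(rows[0])), m)       # random subset of the M variables
    split = best_split(rows, labels, features)
    if split is None:                                   # no sampled variable helps
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow_tree([r for r, _ in left], [y for _, y in left], m, rng),
            grow_tree([r for r, _ in right], [y for _, y in right], m, rng))

def predict_tree(node, row):
    """Push a sample down the tree until it reaches a leaf label."""
    while isinstance(node, tuple):                      # internal nodes are tuples
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def grow_forest(rows, labels, n_trees, m, seed=0):
    """Grow each tree on a bootstrap sample of N cases drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx], m, rng))
    return forest

def predict_forest(forest, row):
    """Report the mode of the individual tree votes."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]

# Toy data: two well-separated classes plus one uninformative third variable.
rows = [(i * 0.05, 1 - i * 0.05, i % 5) for i in range(20)]
rows += [(5 + i * 0.05, 6 - i * 0.05, i % 5) for i in range(20)]
labels = ["a"] * 20 + ["b"] * 20
forest = grow_forest(rows, labels, n_trees=25, m=2)
print(predict_forest(forest, (0.0, 0.0, 2)), predict_forest(forest, (9.0, 9.0, 2)))
```

Here M = 3 and m = 2, so every node sees at least one informative variable; on real data m is usually much smaller than M, as the text says.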
Features and Advantages
- It is one of the most accurate learning algorithms available. For many data sets, it produces a highly accurate classifier.
- It runs efficiently on large databases.
- It can handle thousands of input variables without variable deletion.
- It gives estimates of what variables are important in the classification.
- It generates an internal unbiased estimate of the generalization error as the forest building progresses.
- It has an effective method for estimating missing data and maintains accuracy when a large proportion of the data are missing.
- It has methods for balancing error in data sets with unbalanced class populations.
- Prototypes are computed that give information about the relation between the variables and the classification.
- It computes proximities between pairs of cases that can be used in clustering, locating outliers, or (by scaling) give interesting views of the data.
- The capabilities of the above can be extended to unlabeled data, leading to unsupervised clustering, data views and outlier detection.
- It offers an experimental method for detecting variable interactions.
Disadvantages
- Random forests have been observed to overfit for some datasets with noisy classification/regression tasks.
- Unlike decision trees, the classifications made by Random Forests are difficult for humans to interpret.
Mega Millions Math: Probability of No Jackpot Winner
You would like to figure out the approximate probability that there are no jackpot winners for the Mega Millions. On March 27, 2012, a total of 4,715,569 tickets from the pool of tickets were winners. Approximately 1 out of 40 randomly chosen tickets is a winner. Therefore we can approximate that 40 * 4,715,569 = 188,622,760 total tickets were sold.
But wait, you have a 1 in 175,711,536 chance of hitting the jackpot. Approximately 188,622,760 tickets were sold. So does that mean there’s a 100% chance of one of those tickets being the jackpot winner? No. Many tickets carry the same numbers, so the tickets sold cover fewer distinct combinations than the ticket count suggests, and some combinations are not covered at all.
What is the probability that no one hit the Mega Millions jackpot last night? You have to know a little about Bernoulli trials. A Bernoulli trial is an experiment whose outcome is random and can be either of two possible outcomes, “success” and “failure”. So in our case “success” is winning the jackpot, “failure” is not winning the jackpot.
Say a gambler plays a slot machine that pays out with a probability of one in n and plays it n times. Then, for large n the probability that the gambler will lose every bet is approximately 1/e or about 37%. This is a Bernoulli trials process and you can approximate probabilities with respect to this process using the constant e.
We can apply this situation to Mega Millions. The Mega Millions is sort of like a slot machine that pays out the jackpot with a probability of 1/175,711,536. So according to our experiment, if you play 175,711,536 tickets there’s still a 37% chance that there would be no winner!
It’s not intuitive but the math is there and you can’t argue against it. Because it’s math.
Last night there were 188,622,760 tickets played. Use e again to approximate the probability of no jackpot. The Mega Millions went through a period where there was an expectation of 1.0735 jackpot winners (188,622,760/175,711,536 = 1.0735). The approximate probability that there were no jackpots in 1.0735 “cycles” is e^(-1.0735) or approximately 34%.
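The 1/e rule and the 34% figure can be checked numerically. The sketch below assumes every ticket is an independent, uniformly random pick (as with Quick Picks), which is the same simplification the slot-machine argument makes; the variable names are illustrative.

```python
import math

ODDS = 175_711_536                 # number of possible jackpot combinations
tickets = 40 * 4_715_569           # estimated tickets sold: 188,622,760

def p_no_winner(n_tickets, odds=ODDS):
    """Exact probability that n independent random tickets all miss the jackpot."""
    # (1 - 1/odds) ** n_tickets, computed via log1p for numerical accuracy
    return math.exp(n_tickets * math.log1p(-1 / odds))

one_cycle = p_no_winner(ODDS)               # play exactly `odds` tickets
last_night = p_no_winner(tickets)           # play 188,622,760 tickets
approx = math.exp(-tickets / ODDS)          # the e^(-1.0735) shortcut

print(f"one cycle:  {one_cycle:.4f} (1/e = {1 / math.e:.4f})")
print(f"last night: {last_night:.4f} (shortcut: {approx:.4f})")
print(f"at least one winner: {1 - last_night:.4f}")
```

For numbers this large the exact value and the shortcut agree to many decimal places, which is why the constant e is all you need.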
Cliffs:
-Lotteries and games of chance are usually not intuitive.
-The probability that there would be no jackpot winner last night is 34%. Consequently, the probability that there should’ve been AT LEAST ONE jackpot winner is 66%. Think about how I came up with that number.
-The mathematical constant e is awesome.
-They carded me last night when I tried to buy Quick Picks.
James R. Murphy’s La Guardia students with their string figures.
What are the chances of you coming into being? 
By: Ali Binazir
June 15th, 2011
A little while ago I had the privilege of attending TEDx San Francisco, organized by the incomparable Christine Mason McCaull. One of the talks was by Mel Robbins, a riotously funny self-help author and life coach with a syndicated radio show. In it, she mentioned that scientists calculate the probability of your existing as you, today, at about one in 400 trillion (4×10^14).
“That’s a pretty big number,” I thought to myself. If I had 400 trillion pennies to my name, I could probably retire.
Previously, I had heard the Buddhist version of the probability of ‘this precious incarnation’. Imagine there was one life preserver thrown somewhere in some ocean and there is exactly one turtle in all of these oceans, swimming underwater somewhere. The probability that you came about and exist today is the same as that turtle sticking its head out of the water — in the middle of that life preserver. On one try.
So I got curious: are either of these numbers correct? Which one’s bigger? Are they gross exaggerations? Or is it possible that they underestimate the true number?
First, let us figure out the probability of one turtle sticking its head out of the one life preserver we toss out somewhere in the ocean. That’s a pretty straightforward calculation.
According to WolframAlpha, the total area of oceans in the world is 3.409×10^8 square kilometers, or 340,900,000 km^2 (131.6 million square miles, for those benighted souls who still cling to user-hostile British measures). Let’s say a life preserver’s hole is about 80 cm in diameter, which would make the area inside
3.14 × (0.4)^2 = 0.5024 m^2
which we will conveniently round to 0.5 square meters. If one square kilometer is a million square meters, then the probability of Mr Turtle sticking his head out of that life preserver is simply the area inside the life preserver divided by the total area of all oceans, or
0.5 m^2 / (3.409×10^8 × 10^6 m^2) = 1.47×10^-15
or one in 6.82×10^14, or about 1 in 700 trillion.
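For anyone who wants to redo the turtle arithmetic, here is a short sketch using the same inputs (ocean area from the text, an 80 cm hole rounded to 0.5 square meters); the variable names are just for illustration.

```python
import math

ocean_km2 = 3.409e8                      # total ocean area, from the text
hole_radius_m = 0.4                      # 80 cm diameter life preserver hole

hole_area_m2 = math.pi * hole_radius_m ** 2   # about 0.50 m^2
ocean_m2 = ocean_km2 * 1e6                    # 1 km^2 = 10^6 m^2

p_turtle = 0.5 / ocean_m2                # using the rounded 0.5 m^2, as in the text
print(f"hole area: {hole_area_m2:.4f} m^2")
print(f"probability: {p_turtle:.3g}, or one in {1 / p_turtle:.3g}")
```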
One in 400 trillion vs one in 700 trillion? I gotta say, the two numbers are pretty darn close, for such a farfetched notion from two completely different sources: old-time Buddhist scholars and present-day scientists. They agree to within a factor of two!
So to the second question: how accurate is this number? What would we come up with ourselves starting with first principles, making some reasonable assumptions and putting them all together? That is, instead of making one big hand-waving gesture and pronouncing, “The answer is five hundred bazillion squintillion,” we make a series of sequentially-reasoned, smaller hand-waving gestures so as to make it all seem scientific. (This is also known as ‘consulting’ – especially if you show it all in a PowerPoint deck.)
Oh, this is going to be fun.
First, let’s talk about the probability of your parents meeting. If they met one new person of the opposite sex every day from age 15 to 40, that would be about 10,000 people. Let’s confine the pool of possible people they could meet to 1/10 of the world’s population twenty years ago (one tenth of 4 billion = 400 million) so it considers not just the population of the US but that of the places they could have visited. Half of those people, or 200 million, will be of the opposite sex. So let’s say the probability of your parents meeting, ever, is 10,000 divided by 200 million:
10^4 / (2×10^8) = 5×10^-5, or one in 20,000.
Probability of boy meeting girl: 1 in 20,000.
So far, so unlikely.
Now let’s say the chances of them actually talking to one another is one in 10. And the chances of that turning into another meeting is about one in 10 also. And the chances of that turning into a long-term relationship is also one in 10. And the chances of that lasting long enough to result in offspring is one in 2. So the probability of your parents’ chance meeting resulting in kids is about 1 in 2000.
Probability of same boy knocking up same girl: 1 in 2000.
So the combined probability is already around 1 in 40 million — long but not insurmountable odds. Now things start getting interesting. Why? Because we’re about to deal with eggs and sperm, which come in large numbers.
Each sperm and each egg is genetically unique because of the process of meiosis; you are the result of the fusion of one particular egg with one particular sperm. A fertile woman has 100,000 viable eggs on average. A man will produce about 12 trillion sperm over the course of his reproductive lifetime. Let’s say a third of those (4 trillion) are relevant to our calculation, since the sperm created after your mom hits menopause don’t count. So the probability of that one sperm with half your name on it hitting that one egg with the other half of your name on it is
1 / ((100,000)(4 trillion)) = 1 / ((10^5)(4×10^12)) = 1 in 4×10^17, or one in 400 quadrillion.
Probability of right sperm meeting right egg: 1 in 400 quadrillion.
But we’re just getting started.
Because the existence of you here now on planet earth presupposes another supremely unlikely and utterly undeniable chain of events. Namely, that every one of your ancestors lived to reproductive age – going all the way back not just to the first Homo sapiens, first Homo erectus and Homo habilis, but all the way back to the first single-celled organism. You are a representative of an unbroken lineage of life going back 4 billion years.
Let’s not get carried away here; we’ll just deal with the human lineage. Say humans or humanoids have been around for about 3 million years, and that a generation is about 20 years. That’s 150,000 generations. Say that over the course of all human existence, the likelihood of any one human offspring to survive childhood and live to reproductive age and have at least one kid is 50:50 – 1 in 2. Then what would be the chance of your particular lineage to have remained unbroken for 150,000 generations?
Well then, that would be one in 2^150,000, which is about 1 in 10^45,000 – a number so staggeringly large that my head hurts just writing it down. That number is not just larger than all of the particles in the universe – it’s larger than all the particles in the universe if each particle were itself a universe.
Probability of every one of your ancestors reproducing successfully: 1 in 10^45,000
But let’s think about this some more. Remember the sperm-meeting-egg argument for the creation of you, since each gamete is unique? Well, the right sperm also had to meet the right egg to create your grandparents. Otherwise they’d be different people, and so would their children, who would then have had children who were similar to you but not quite you. This is also true of your grandparents’ parents, and their grandparents, and so on till the beginning of time. If even once the wrong sperm met the wrong egg, you would not be sitting here noodling online reading fascinating articles like this one. It would be your cousin Jethro, and you never really liked him anyway.
That means in every step of your lineage, the probability of the right sperm meeting the right egg such that the exact right ancestor would be created that would end up creating you is one in 400 quadrillion, the same figure computed above for your own conception.
So now we must account for that for 150,000 generations by raising 400 quadrillion to the 150,000th power:
(4×10^17)^150,000 ≈ 10^2,640,000
That’s a one followed by 2,640,000 zeroes, which would fill 11 volumes of a book the size of The Tao of Dating with zeroes.
To get the final answer, technically we need to multiply that by the 10^45,000, 2000 and 20,000 up there, but those numbers are so shrimpy in comparison that it almost doesn’t matter. For the sake of completeness:
(10^2,640,000)(10^45,000)(2000)(20,000) = 4×10^2,685,007 ≈ 10^2,685,000
Probability of your existing at all: 1 in 10^2,685,000
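Numbers this size overflow ordinary floating point, but the chain of estimates is easy to follow in base-10 logarithms, where multiplying odds becomes adding exponents. The sketch below uses the article's own inputs; the variable names are illustrative. It keeps the exponents unrounded, so it lands a few hundred orders of magnitude above the rounded figures in the text, which changes nothing at this scale.

```python
import math

generations = 3_000_000 // 20                 # 150,000 generations of 20 years

log10_meet = math.log10(20_000)               # parents ever meeting
log10_kids = math.log10(2_000)                # meeting leads to offspring
log10_survive = generations * math.log10(2)   # each ancestor reproduces: 2^150,000
log10_gametes = generations * math.log10(4e17)  # right sperm, right egg, every generation

log10_total = log10_meet + log10_kids + log10_survive + log10_gametes

print(f"lineage unbroken:  1 in 10^{log10_survive:,.0f}")
print(f"right gametes:     1 in 10^{log10_gametes:,.0f}")
print(f"existing at all:   1 in 10^{log10_total:,.0f}")
```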
As a comparison, the number of atoms in the body of an average male (80kg, 175 lb) is 10^27. The number of atoms making up the earth is about 10^50. The number of atoms in the known universe is estimated at 10^80.
So what’s the probability of your existing? It’s the probability of 2 million people getting together – about the population of San Diego – each to play a game of dice with trillion-sided dice. They each roll the dice, and they all come up the exact same number – say, 550,343,279,001.
A miracle is an event so unlikely as to be almost impossible. By that definition, I’ve just shown that you are a miracle.
Now go forth and feel and act like the miracle that you are.
Love Among the Equations 
Real math isn’t some cold, dead set of rules to be memorized and blindly followed. The act of devising a calculus problem from your observations of the world around you – and then solving it – is as much a creative endeavor as writing a novel or composing a symphony. It isn’t easy, but there is genuine pleasure to be found in making the effort.
As with mathematics, so with love. There are no hard and fast rules to be blindly followed, no matter what the self-help gurus may tell you. Sometimes you just need to take a Fourier transform of yourself, shatter the walls and break everything down into the component parts. Once you’ve analyzed the full spectrum, you can rebuild, this time with just the right mix of ingredients that will enable you finally to combine your waveform with that of another person. Does mathematically analyzing a sunset, or the ocean waves, make either any less romantic? Not to me. It only enhances my sense of wonder. When we listen to the rhythmic cycle of waves crashing on the shore, we can hear those waves because our brains break apart that signal to identify the basic “ingredients.” And every time we gaze at a sunset —a spectacular orange-red, or a soft pinkish glow—our brain has taken a Fourier transform so we can fully appreciate those hues.
Packing problem: different ways to pack oranges in two dimensions (left and right); according to Johannes Kepler, the most efficient way to pack spheres in three dimensions is the face-centred cubic arrangement (centre).
Imagine filling a large container with small equal-sized spheres. The density of the arrangement is the proportion of the volume of the container that is taken up by the spheres. In order to maximize the number of spheres in the container, you need to find an arrangement with the highest possible density, so that the spheres are packed together as closely as possible.
Experiment shows that dropping the spheres in randomly will achieve a density of around 65%. However, a higher density can be achieved by carefully arranging the spheres as follows. Start with a layer of spheres in a hexagonal lattice, then put the next layer of spheres in the lowest points you can find above the first layer, and so on – this is just the way you see oranges stacked in a shop. At each step there are two choices of where to put the next layer, so this natural method of stacking the spheres creates an uncountably infinite number of equally dense packings, the best known of which are called cubic close packing and hexagonal close packing. Each of these arrangements has an average density of
π / (3√2) ≈ 0.74048
The Kepler conjecture says that this is the best that can be done—no other arrangement of spheres has a higher average density.
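The density of the face-centred cubic arrangement can be checked from its unit cell. The sketch below relies on two standard facts about that lattice that the text does not spell out: a cubic cell holds the equivalent of 4 whole spheres, and spheres touch along a face diagonal, so the cell edge is 2√2 times the sphere radius.

```python
import math

r = 1.0                                   # sphere radius (any value gives the same density)
edge = 2 * math.sqrt(2) * r               # face diagonal of the cell = 4r
spheres_per_cell = 4                      # 8 corners * 1/8 + 6 faces * 1/2

sphere_volume = 4 / 3 * math.pi * r ** 3
density = spheres_per_cell * sphere_volume / edge ** 3

closed_form = math.pi / (3 * math.sqrt(2))
print(f"unit-cell density: {density:.5f}")
print(f"pi / (3*sqrt(2)):  {closed_form:.5f}")
```

Both come out at roughly 74%, comfortably above the 65% or so achieved by dropping the spheres in at random.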
Welcome to another edition of Secretly Judging You
48÷2(9+3) = ????
A sum proved impossible by Fermat’s Last Theorem appears in an episode of The Simpsons, “Treehouse of Horror VI”. In the three-dimensional world of “Homer^3”, the equation 1782^12 + 1841^12 = 1922^12 is visible just as the dimension begins to collapse. The joke is that the twelfth root of the left-hand side does evaluate to 1922 when entered into most handheld calculators, because of rounding; but the left-hand side is odd (an even number plus an odd number), while 1922^12 is even, so the equality cannot hold. Instead of 1922, the twelfth root is actually about 1921.99999996 (via Fermat’s Last Theorem in fiction)
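Python integers have arbitrary precision, so the near miss is easy to inspect exactly. This sketch compares the two sides as integers, checks their parity, and then mimics a calculator by formatting the floating-point twelfth root to ten significant digits (the ten-digit display is an assumption about a typical handheld calculator).

```python
lhs = 1782 ** 12 + 1841 ** 12     # exact integer arithmetic
rhs = 1922 ** 12

print("equal?", lhs == rhs)
print("parity:", lhs % 2, rhs % 2)            # odd versus even

root = lhs ** (1 / 12)                        # floating-point twelfth root
print("twelfth root:", repr(root))
print("ten-digit display:", f"{root:.10g}")   # what a 10-digit calculator would show
```

The two sides agree in their leading digits only, and the parity check alone is enough to rule out equality without computing anything large.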
Because of Paul Erdős’ prolific output, friends created the Erdős number as a humorous tribute; Erdős alone was assigned the Erdős number of 0 (for being himself), while his immediate collaborators could claim an Erdős number of 1, their collaborators have Erdős number at most 2, and so on. Approximately 200,000 mathematicians have an assigned Erdős number, and some have estimated that 90 percent of the world’s active mathematicians have an Erdős number smaller than 8 (not surprising in light of the small world phenomenon). Due to collaborations with mathematicians, many scientists in fields such as physics, engineering, biology, and economics have Erdős numbers as well.
The following table shows the number of people with Erdős number 1, 2, 3, …, according to the electronic data. Note that there are slightly fewer people shown here with Erdős numbers 1 and 2 than in our lists, since our lists are compiled by hand from various sources in addition to MathSciNet. In addition to these people with finite Erdős number, there are about 45,000 mathematicians who have collaborated but have an infinite Erdős number, and 84,000 who have never published joint works (and therefore of course also have an infinite Erdős number).
Erdős number 0 --- 1 person
Erdős number 1 --- 502 people
Erdős number 2 --- 5713 people
Erdős number 3 --- 26422 people
Erdős number 4 --- 62136 people
Erdős number 5 --- 66157 people
Erdős number 6 --- 32280 people
Erdős number 7 --- 10431 people
Erdős number 8 --- 3214 people
Erdős number 9 --- 953 people
Erdős number 10 --- 262 people
Erdős number 11 --- 94 people
Erdős number 12 --- 23 people
Erdős number 13 --- 4 people
Erdős number 14 --- 7 people
Erdős number 15 --- 1 person
Erdős number 16 --- 0 people
(Editor’s note: Hank Aaron has an Erdős number of 1 because he and Erdős autographed a baseball when Emory University awarded them both honorary degrees on the same day.)
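The table is small enough to summarize by hand, but a few lines of Python give the total, the median and the mean among people with a finite number. The counts are copied from the table above; the variable names are illustrative, and people with an infinite number are left out.

```python
# counts[k] = number of people with that number, k = 0..16, from the table
counts = [1, 502, 5713, 26422, 62136, 66157, 32280, 10431,
          3214, 953, 262, 94, 23, 4, 7, 1, 0]

total = sum(counts)
mean = sum(k * c for k, c in enumerate(counts)) / total

# Median: smallest k whose cumulative count reaches half of the total.
running = 0
for k, c in enumerate(counts):
    running += c
    if running >= total / 2:
        median = k
        break

print(f"people with a finite number: {total:,}")
print(f"median: {median}, mean: {mean:.2f}")
```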




