
Tuesday, July 26, 2011

What Might A Harvard University 150,000-Subject Study Be Missing?

They studied foods that make people gain weight.

If you want to gain weight, you should eat: Potato chips, sugar-sweetened beverages, unprocessed meats, etc.

If you want to lose weight, you should eat: Nuts, fruits, vegetables, yogurt, etc.

The study says that "adjustments were made for lifestyle". No doubt they did, but I wonder how complete those adjustments were. Look again at that list of foods. Is there no implied lifestyle associated with a diet of Lay's and Dr. Pepper vs. cashews and sprouts?

Thursday, May 10, 2007

Is Racial Profiling Morally Acceptable at Airports?

If anything, "racial profiling" is morally required at airports.

Of course, there are a couple of caveats here:

1. Since "race" is just a cultural abstraction, what we really mean is that profiling based on appearances should be required.

2. "Profiling" does not mean "arresting" or "punishing" or "guilty!" or any other emotionally-charged term that is intended to make a point by lies and/or hyperbole.

3. "Profiling" refers to "give extra scrutiny".

All inspected people at airports fall into one of four categories:

1. True Positives. These are people who are correctly deemed dangerous, and are therefore kept from flying. All security precautions, machinery, and procedures are intended to catch these people -- but have any true positives actually been caught? If any were, they were not publicized.

2. True Negatives. These are people like you (I hope), me, and almost everyone else: innocent, and deemed innocent. Walk through the metal detector, and go to your gate.

3. False Positives: These are people who are flagged as suspicious, but are really innocent. They are (we are told) flagged because of their suspicious behavior; e.g., buying a one-way ticket with cash and no bags. Theoretically, this group also includes the "flying imams" of Minneapolis, whose behavior was consistent with terrorism, but who actually posed no threat.

4. False Negatives: People who clear security and then crash planes into skyscrapers; Mohammed Atta is a famous false negative.

The problems with airport security and profiling are related to Categories (3) and (4). Specifically, false positives and false negatives are errors that, in an ideal world, would be zero. That is, in our perfectly-calibrated world, only Categories (1) and (2) would exist.

The trouble is that (3) and (4) cannot be eliminated together; in fact, when one category is reduced, the other will need to increase. Specifically, the best way to eliminate (3) is to clear everyone through security. Put another way, if no one is stopped, then no innocent people will be stopped.

And the best way to eliminate (4) is to stop everyone. If every last passenger is carefully screened, then we know (by definition) that bombers and hijackers will be screened, too.

However, neither of the above is practical. Waving everyone through invites disaster, and carefully screening everyone would cost passengers dearly in both time and money.

Therefore, a balance has to be found between minimizing Errors (3) and (4). But no matter where the balance is struck, there will be problems: Either too many innocent people will be screened, or not enough bombers and hijackers will be. So, the best approach is to minimize Error (4) to the point where any additional reduction would create a disproportionate rise in Error (3). That is, a 1% error rate in allowing hijackers on planes might be better than a 0.9% error rate, if reaching the lower rate means gumming up the whole system. A toy model of this tradeoff is sketched below.
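Here is that toy model in Python. The numbers and functional forms are entirely invented for illustration -- real rates would come from security data no one publishes -- but the shape of the problem is the same: as screening intensity rises, false negatives fall and false positives climb, and the job is to find the intensity that minimizes the combined cost.

```python
# A toy model of the screening tradeoff. All numbers are invented
# for illustration; only the shape of the tradeoff matters.

def false_negative_rate(intensity):
    # Hijackers slip through less often as screening tightens.
    return 0.01 * (1 - intensity) ** 2

def false_positive_rate(intensity):
    # More screening means more innocent people stopped.
    return 0.5 * intensity

def total_cost(intensity, miss_weight=1000.0, hassle_weight=1.0):
    # A missed hijacker is weighted far more heavily than a delay.
    return (miss_weight * false_negative_rate(intensity)
            + hassle_weight * false_positive_rate(intensity))

# Search a grid of screening intensities for the cheapest balance.
best_cost, best_intensity = min(
    (total_cost(i / 100), i / 100) for i in range(101))
print(f"Lowest combined cost {best_cost:.3f} at intensity {best_intensity:.2f}")
```

Note that the optimal intensity comes out high but not 1.0: past a certain point, stopping still more innocent people buys almost no additional safety.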

So...how do we decrease the Error (4) rate without making thousands of travelers even more upset over airport delays? The answer (as you might have guessed) is to pre-screen people based on their likelihood of trying to blow up a plane. Put another way, at a given level of "passenger inconvenience", the probability of a disaster is lessened by profiling. Or, put yet another way, if profiling were discontinued, one of the error rates would have to go up: Either all passengers would face more arduous security delays (without any added security), or more planes would be blown up. Take your pick.

The people in Category (3) are apparently less concerned about either of the above choices than they are about being singled out as false positives. They say: "Profiling should cease, and other people should be singled out, too." Never mind that they will still be singled out -- what matters to them is that they will no longer have to envy the people in other groups who are waved through. That is, to lessen their envy, everyone else must also be inconvenienced and/or more planes must crash. Adding huge costs to flying to ameliorate envy: Is that defensible? Sacrificing lives to ameliorate envy: Is that the moral solution?

Incidentally, profiling does not require that everyone from one group (say, people in Islamic garb) be singled out to the exclusion of everyone else. In fact, a (non-random) mix of checks would be better; if "looking Muslim" is the only way to get stopped, then hijackers would learn that they can get a free pass onto a plane by not "looking Muslim" -- as they did on 9/11. A probabilistic approach would be best; a completely random screening procedure, as stated above, is not only dangerous, but also ridiculous.

Sunday, April 8, 2007

Can Average Test Scores Increase Without Students Scoring Any Higher?

You bet.

If you're in charge of the school district, you can increase average scores (and probably your salary) by simply shuffling students from one school to another. With absolutely no change in individual test scores, the averages will increase.


Illustration:

Let's say that there are two schools, one with low-scoring students, and another with high-scoring students. In fact, here are their grades:

School "A"
95
95
90
90
85
85
Average = 90.0

School "B"
90
85
85
80
75
65
Average = 80.0


What you need to do is utter some morally superior platitude about how the School "B" students are suffering from segregation, underfunding, discrimination, etc., and then transfer the worst School "A" students to School "B".

In this example, let's transfer two School "A" students. The new distributions are:

School "A"
95
95
90
90
Average = 92.5, an increase of 2.5 points!

School "B"
90
85
85
85
85
80
75
65
Average = 81.25, an increase of 1.25 points!

Now you can report that the average scores in both schools have increased, and can look like the highly-respected public servant that you are.

This process has a name, the Will Rogers Phenomenon, and it is already known to distort medical statistics -- specifically, cancer survival rates, where reclassifying patients between disease stages can raise the average survival of both stages.
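Here is the arithmetic as a quick Python check (just verifying the example above):

```python
# Verify the Will Rogers Phenomenon using the scores from the example.
school_a = [95, 95, 90, 90, 85, 85]
school_b = [90, 85, 85, 80, 75, 65]

def average(scores):
    return sum(scores) / len(scores)

print(average(school_a), average(school_b))  # 90.0 80.0

# Transfer School A's two weakest students (the 85s) to School B.
school_a, transferred = school_a[:4], school_a[4:]
school_b = transferred + school_b

print(average(school_a), average(school_b))  # 92.5 81.25 -- both rise!
```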

Must be careful with those numbers...

Sunday, April 1, 2007

Does Milk Go Bad at Exactly Midnight of The Expiration Date?

Or, if you prefer, here are similar questions:

- Are people suddenly responsible enough to drink at midnight of their 21st birthday?

- Does driving become much more dangerous at 65.001 mph?

- Are you obese if your BMI is 30.0, but not if your BMI is 29.999?

These specific definitions are intended to address the problem of vagueness by pretending that there is precision where there is none. They're forms of the continuum fallacy, which is illustrated by trying to figure out how many grains of sand it takes to make a sand pile. If you have some sand that is smaller than a sand pile, adding one grain will never convert it to a "pile". But that implies that building a sand pile is impossible if you only add one grain at a time.

So, how can vagueness be addressed? Or, more accurately, how can vagueness be managed? Mathematically, it cannot be addressed; it will remain a paradox. Practically, though, there are several options:

A) Minimize one error, and ignore the other -- which seems to be the usual solution. That is, set an expiration date that ensures that only 5% of milk will go bad -- and accept the negative consequence that lots of otherwise good milk will be discarded. Or, set a speed limit that reduces fatal accidents to 5% of unrestricted-speed fatalities, and accept that many people will pay the price of wasting lots of time by driving too slowly. There's nothing magic about 5% in these cases; in fact, one would need to balance the two types of error to find the "correct" solution. But as long as the solution is "one size fits all", there will be inefficiencies and equity concerns. ("Why should that inept person be allowed a driver's license when it is denied to me because I am under the cutoff age? I'm a better driver; I should be driving him!")

B) Redefine these terms to have more categories; e.g., milk can have a "fresh" date, a "probably fresh" date, a "little curdling" date, and a "foul" date. This provides more useful information, though it can be unnecessarily confusing. Also, it does not address the vagueness issue, because each of these new categories would be defined by artificially precise dates.

C) Assign probabilities to freshness; i.e., develop a thermometer-like scale from 0% freshness to 100% freshness. However, this doesn't address the problem; it ignores it. By analogy, this would be like replacing the vague word "fever" with nothing but a numerical temperature reading.

D) Evaluate every product and person individually. This addresses equity issues, but not vagueness. That is, it doesn't explain exactly when someone becomes obese or bald.

E) Use plain-English, common-sense intuition to override or complement the numerical data, and express the evaluation in plain English. Milk sitting at room temperature, someone driving 75 mph with no traffic nearby, and a psychopathic 35-year-old reaching for a case of beer are all examples where unanticipated factors invalidate the original "rules" and would make us say, respectively, that the milk might be getting old, that there is minimal added danger in driving faster, and that the psychopath should probably not drink too much, regardless of age. This is related to fuzzy logic, and though it doesn't resolve the vagueness paradox, it does help solve the problem. A toy version of this idea is sketched below.
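For what it's worth, here is what option (E) might look like in code. The breakpoints are invented, and a real version would derive them from data, but it shows the flavor of a fuzzy, context-aware evaluation:

```python
# Toy fuzzy "freshness" score for milk, by days past the sell-by date.
# All breakpoints are invented for illustration.

def freshness(days_past_date, stored_cold=True):
    if not stored_cold:
        days_past_date += 3      # room temperature ages milk faster
    if days_past_date <= 0:
        return 1.0               # fully "fresh"
    if days_past_date >= 7:
        return 0.0               # fully "foul"
    return 1.0 - days_past_date / 7  # partial membership in between

print(freshness(-1))                     # 1.0   -- fresh
print(freshness(2))                      # ~0.71 -- probably fine
print(freshness(2, stored_cold=False))   # ~0.29 -- sniff it first
```

Unlike a hard expiration date, the score degrades gradually and shifts with context -- which is exactly what our common-sense intuition does.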

Friday, March 30, 2007

Can Doctors Calculate Statistics?

Here's a simple problem. Let's say that there's a disease that strikes one person in a thousand. And let's also say that there's a test for the disease that, on average, mistakenly indicates that fifty healthy people in a thousand have this disease. Now you take this medical test, and the result is "positive".

Now for the question: What is the probability that you have this disease?

Well, assuming (as the problem implies) that the test catches everyone who actually has the disease, our population of one thousand will produce a "positive" for 51 people, of whom only one has the disease. So, the answer is that the probability is one out of 51, or just under 2%.

These are conditional probabilities (or, if you prefer, Bayesian reasoning), and if you understand this concept, then you probably know more than your doctor:

Hoffrage and Gigerenzer (1998; Gigerenzer, 1996) tested 48 physicians on four standard diagnostic problems, including mammography. When information was presented in terms of probabilities, only 10% of the physicians reasoned consistently with Bayes' rule.

For instance, Eddy (1982) asked physicians to estimate the probability that a woman with a positive mammogram actually has breast cancer, given a base rate of 1% for breast cancer, a hit rate of about 80%, and a false-alarm rate of about 10%. He reported that 95 of 100 physicians estimated the probability that she actually has breast cancer to be between 70% and 80%, whereas Bayes’ rule gives a value of about 7.5%.
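Both results drop out of a few lines of arithmetic. Here is a sketch in Python (assuming, as the first problem implies, that every true case tests positive):

```python
# Bayes' rule: P(disease | positive test).

def p_disease_given_positive(base_rate, hit_rate, false_alarm_rate):
    true_positives = base_rate * hit_rate
    false_positives = (1 - base_rate) * false_alarm_rate
    return true_positives / (true_positives + false_positives)

# The 1-in-1000 example: perfect hit rate, 50-in-1000 false alarms.
print(p_disease_given_positive(0.001, 1.00, 0.050))  # ~0.0196, i.e. 1 in 51

# Eddy's mammography numbers: 1% base rate, 80% hit rate, 10% false alarms.
print(p_disease_given_positive(0.01, 0.80, 0.10))    # ~0.075, about 7.5%
```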


One obvious (to me, at least) question is this: With these sorts of medical tests, it seems like the verdict is "probably healthy" regardless of the test result. So, what's the point of the test?

Thursday, March 22, 2007

What Should We Do About Global Warming?

Nothing.

The popular assumption that global warming is real, is caused by human activity, and will culminate in disaster, requires many leaps of faith and many poor decisions.

Accepting the hypothesis that something must be done about global warming requires the following assumptions:

1. Temperatures are rising. Maybe they are, and maybe they aren't. There seems to be some dispute over this, but the government says that temperatures over the last 25 years have increased, on average, about four-tenths of one degree F. (Apparently, this average is the net of simultaneous warming and cooling on different parts of the planet.) For argument's sake, let's assume that temperatures are rising.

2. Rising temperatures will lead to catastrophe. The magnitude of global warming's impact (if any) is speculative, and utterly unreliable when one considers that A) property values are not declining in coastal areas, and B) people (when not stopped by their government) are generally able to adapt to all sorts of changing conditions. From food preservation to flying to central heating to sunscreens to migrating populations to automobile design to agricultural methods, people have used their heads to adapt to environmental changes that were more sudden and more severe than the long-term, gradual effect of global warming. But for argument's sake, let's assume that global warming will lead to catastrophe.

3. Global warming is caused by human activity. Earth's temperature has been changing since Day One. Repeated ice ages and hot spells have happened long before people were making "carbon footprints", and for that matter, long before there were people, period. The trouble with attributing global warming to human activity is that it's impossible to do a controlled experiment. We can't switch human activity on and off to see if it indeed has an effect on temperatures. So, as a substitute, we accept as fact that:

A) Human activity increases carbon dioxide, and

B) Atmospheric carbon dioxide increases air temperatures.

By transitivity, we then use logic to conclude that:

C) Human activity causes air temperatures to rise.

All of which, so far, seems appropriate. But then comes the logical flaw of affirming the consequent:

D) If human activity causes air temperatures to rise, then rising air temperatures were caused by increased human activity.

The problem here, which ought to be clear, is that rising temperatures could be the result of any number of things that might dwarf the influence of human activity. Some of these influences are known, such as the variation in the sun's energy and in the orientation and orbit of Earth. Other influences are not known (and might not even exist), but it would be a fallacious argument from ignorance to say something like, "Because we can't figure out the other causes, we'll assume that one possible cause, human activity, is responsible for global warming."

There is no way of determining the influence of human activity on possible global warming. All the forecasters have are mathematical models, produced by vague historical associations and guesswork colored by politics and biases -- which produce dubious results with large margins of error.

But still, for argument's sake, let's assume that global warming is dangerous, and is caused by human activity.

4. Diminishing human activity is a smart investment. What exactly shall we give up to obtain less global warming? How much of global warming's destructiveness will be diminished if we use fluorescent light bulbs? What if we bought more fuel-efficient cars? What if we drove less? What if we stopped driving altogether? What if multiple families shared apartments, as was the case in the (heavily-polluted) Soviet Union?

These questions require an answer of the form: "If we discarded all air-conditioners, then the probability of coastal erosion due to rising sea levels will be diminished by...what?...percent." Or: "If we banned all air travel, then the frequency of hurricanes will be diminished by...what?...percent." Obviously, these questions cannot be answered, because no one has any idea what the benefits will be. Instead, the "environmental" advice is of the form, "It couldn't hurt if we used less energy, so let's do it."

But ignoring the cost of diminished human activity does not make this cost disappear. In fact, less human activity -- less exchange, less production -- results in lower economic growth. And economic growth is precisely what separates the living standards of the USA from Haiti, Mexico, Liberia, etc., etc., etc.

And, ironically, economic growth is what explains the difference in environmental cleanliness between the USA and the aforementioned nations. If you're not rich, you can't afford a catalytic converter.

5. The actions of a few western countries will make a difference. If, say, western governments taxed energy use to lower the quantity demanded by people in the West, then there would be more energy available for the rest of the world. For example, if Saudi Arabia cannot sell as much petroleum to Americans, then it will need to lower its price -- and thereby increase the quantity demanded by the Chinese. Think of it this way: If you needed to sell your home, and there was a sudden drop in demand, what would you do? You would, as any homeowner knows, drop the price. In the end, your home would still get sold. Similarly, the petroleum will still be sold -- and used.




Wednesday, March 21, 2007

Am I Discriminating Against Minorities?

Let's say that you own a building with ten apartments in a city where 20% of the population is "minority", and that none of your apartments are rented to minorities. Does that make you a "racist"?

In fact, if you were to randomly select tenants for your building, there is an 11% chance of that outcome. And, for that matter, there is about a 38% chance that your building would be under-represented with minorities; i.e., a situation where they occupy either none (0%) or only one (10%) of the apartments. This is not conclusive evidence of discrimination.

[If you like probabilities, then you have probably recognized this as a binomial calculation; any online binomial calculator will do the work for you. A short sketch of the math appears below.]
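Here's that calculation in Python, using only the standard library:

```python
from math import comb

def binomial_pmf(n, k, p):
    # Probability of exactly k "successes" in n independent trials.
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.20  # 10 apartments; 20% of the population is "minority"

p_zero = binomial_pmf(n, 0, p)
p_at_most_one = p_zero + binomial_pmf(n, 1, p)

print(f"P(no minority tenants)         = {p_zero:.3f}")         # ~0.107
print(f"P(zero or one minority tenant) = {p_at_most_one:.3f}")  # ~0.376
```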

Now, let's say that 50% of landlords in this city have no minority tenants; the probability of that happening by chance is about zero. Since the expected frequency of such an outcome is only 11%, that means that something is "wrong" with roughly 80% of those landlords ((50 - 11) / 50 ≈ 78%). And all you have to do is figure out which 80% might be discriminating -- or, as is typically the case:

A) Punish 100% of landlords, innocent or guilty, and then,

B) Compel all landlords to have renting quotas.

And it is at that point when you will be discriminating -- against the majority. Is there any reason why that is better than discriminating against the minority?

And worse, the above example assumes that the lack of minority tenants is due to active discrimination. But would active discrimination also explain the lack of male kindergarten teachers? Or young people in hospitals?

If you want to see whether you are discriminating, look at the neighbors you chose to live near and look at the spouse you selected -- and then look in the mirror and ask yourself if you are guilty.

Monday, March 19, 2007

Which is More Useful: Empirical Data or Human Judgment?

Let's say you're a physician and you prescribe a medication to a patient. This medication has known side effects, but the patient reports a side-effect that seems impossible to ascribe to the medication.

Which would you initially feel more comfortable believing:

A) The patient is reporting a symptom unrelated to the medication (your bias: empirical data), or

B) The patient is suffering from a heretofore-unknown side effect (your bias: human judgment).

Now let's say that you just charted a driving route with a mapping program, and a person familiar with the area sees the computer-generated route, and says, "I drive around there all the time and know of a faster route."

Which would you initially feel more comfortable believing:

A) The program, using mathematical algorithms free of human bias, is correct (your bias: empirical data), or

B) The person, with knowledge that the computer doesn't have, is correct (your bias: human judgment).

In both cases, the empirical data -- derived from the agglomeration of large quantities of objective measurements -- is free of human biases; e.g., the patient might be a hypochondriac, and your motoring friend might be avoiding the best route because of one or two bad experiences.

But...in both cases, the empirical data was also generated by humans -- humans who can easily overlook critical factors when assembling data. And the empirical data was processed by a quick-calculating, but nevertheless very dumb, piece of electronic equipment that cannot consider any factors beyond what humans fed it.

So, the answer is: There is no simple answer; just consider the quality of your sources.

Tuesday, March 13, 2007

How Can I Look Like an Expert?

An expert knows how to establish a reputation; here's how to do it:

1) Find an event whose outcome is almost assured, and loudly predict that outcome every time. For example, find the names of your incumbent political representatives and predict that they will win. If there is a 90% chance that incumbents win, then you will have a winning percentage of 90% -- and bragging rights! IMPORTANT: Don't be a smart-ass and randomly pick incumbents 90% of the time and challengers 10% of the time; if you do, your winning percentage will drop to 82%. (In that case, you will correctly pick an incumbent 0.9 x 0.9 = 81% of the time, and you will correctly pick a challenger 0.1 x 0.1 = 1% of the time; see the simulation after this list.)

2) Make lots of unlikely predictions. Once in a while (like the proverbial clock that's right twice a day), you will be correct. At that point, you should constantly remind others of your uncanny forecasting ability -- and they will frequently return to tap your expertise. No one will remember your losing predictions anyway -- and even if they do remember, those incorrect predictions will be dwarfed by your spectacular long-shot insights.

3) Use lots of jargon (preferably in Latin), name-drop, boast about your academic credentials and your years of experience, and condescend to your audience. If you have a European accent, use it! Never acknowledge that other views might be correct, never express any doubt, never display humor (though sarcasm can be OK) -- and if anyone questions you, feign disgust (unless disgust comes naturally). Be dismissive of others and make your impatience known when they speak; talk over them if you need to. And utilize Points #1 and #2!
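Here is the simulation promised in Point #1 (a Python sketch; the 90% incumbent win rate is assumed, as above):

```python
import random

TRIALS = 100_000
P_INCUMBENT_WINS = 0.90

def winning_percentage(p_pick_incumbent):
    # Pick the incumbent with the given probability; count correct calls.
    correct = 0
    for _ in range(TRIALS):
        incumbent_wins = random.random() < P_INCUMBENT_WINS
        picked_incumbent = random.random() < p_pick_incumbent
        correct += (incumbent_wins == picked_incumbent)
    return correct / TRIALS

print(winning_percentage(1.0))  # always pick the incumbent: ~0.90
print(winning_percentage(0.9))  # "probability matching":    ~0.82
```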

Tuesday, March 6, 2007

Does Proximity to a Lake Cause People to Have Colonoscopies?


There is a weak, though statistically significant, correlation between the two. Specifically, an increase in a state's area under water is associated with an increase in the percentage of people over the age of fifty who have colonoscopies.

Therefore, in order to encourage more people to have colonoscopies, we should increase the size of lakes -- or somehow submerge more land under water. Artificial flooding would probably do the trick -- or perhaps increased carbon emissions would lead to health-improving global-warming floods.

But then, maybe we have cause-and-effect reversed. Perhaps an increase in colonoscopies results in more land-under-water; colonoscopies might be a bigger threat to the environment than the aforementioned carbon emissions!

Or, could this all be coincidence? In fact, I went to StateMaster and arbitrarily selected the "Percentage of Area Water" statistic. StateMaster then spit out a long list of correlations with related (and seemingly unrelated) variables, and "Colonoscopy Testing" just happened to be at the top of the list.
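Part of what's going on is a multiple-comparisons problem: scan enough variables and some will correlate with anything. Here's a sketch in Python using pure noise -- no real data at all:

```python
import random

random.seed(1)
N_STATES, N_VARIABLES = 50, 200

def correlation(xs, ys):
    # Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# One "target" variable (think: colonoscopy rates) and many unrelated ones.
target = [random.gauss(0, 1) for _ in range(N_STATES)]

spurious_hits = 0
for _ in range(N_VARIABLES):
    noise = [random.gauss(0, 1) for _ in range(N_STATES)]
    if abs(correlation(target, noise)) > 0.28:  # roughly p < 0.05 at n = 50
        spurious_hits += 1

print(f"{spurious_hits} of {N_VARIABLES} random variables look 'significant'")
```

Roughly 5% of the pure-noise variables will clear that bar on any given run.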

No matter how often we hear "correlation is not causation", it never quite sinks in.

Maybe this should be called the "correlation-is-causation" bias.

Makes you think about the relationship between human activity and global warming, too.