Don’t be so sure of yourself

August 6, 2010

Try as I might, most statistics is beyond my understanding.  However, I do know one thing that puts me ahead of most people on this front:

Don’t be so sure of yourself

The spurious precision of a test statistic is a great comfort – you are trying to understand something complicated, and LOOK!  A number, which tells us everything!  Except it doesn’t.  It tells us something about a parameter of a model which may or may not approximate the data.  The data may be misleading (intentionally or unintentionally on the part of whoever generated the dataset), or the tests may have been chosen specifically to produce the magical “statistical significance”.
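
That last point is easy to put a number on: run enough tests on data where nothing is going on and something will eventually come up “significant”.  A minimal sketch, assuming the tests are independent and each uses the conventional 0.05 threshold (both assumptions made for illustration):

```python
def chance_of_spurious_hit(n_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent tests comes out
    'significant' at level alpha when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


for n in (1, 5, 20):
    print(n, round(chance_of_spurious_hit(n), 3))
```

With 20 independent looks at pure noise, the chance of at least one “significant” result is about 64%.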

When you read an article about how such-and-such has been “proved” or “shown”, ignore the spurious confidence and head for the data.  What AREN’T you told?  How much data was collected to draw this conclusion?  On this final point, I can’t put it better than this excerpt from a contaminated land email list I’m on:

For samples with an inhomogeneous matrix, (most made ground in this country) the single result taken from the 0.5 kg sample submitted, generated on a 5g sample that is actually analysed is therefore questionable. In fact the statistics of taking a 0.5kg sample on a 10m x 10m grid at 0.5m depths is the same as going into Leeds city centre, randomly tapping a person on the shoulder, asking them if they are male or female and then asking their age and using this data to extrapolate the average age and sex of the population. Not many census or market research firms would view this as credible statistics.
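
The scale of the problem in that quote is easy to check.  A minimal sketch, assuming a bulk density of 1,600 kg/m³ for the made ground (an illustrative figure only; real made ground varies considerably):

```python
def fraction_analysed(grid_side_m: float, depth_m: float,
                      bulk_density_kg_m3: float, analysed_mass_kg: float) -> float:
    """Fraction of the soil in one sampling cell that actually reaches the instrument."""
    cell_volume_m3 = grid_side_m * grid_side_m * depth_m
    cell_mass_kg = cell_volume_m3 * bulk_density_kg_m3
    return analysed_mass_kg / cell_mass_kg


# 10 m x 10 m grid, 0.5 m depth and 5 g analysed are from the quote above;
# 1,600 kg/m^3 is an ASSUMED bulk density, not a measured one.
f = fraction_analysed(10, 0.5, 1600, 0.005)
print(f"{f:.2e} of the cell, i.e. about 1 part in {1 / f:,.0f}")
```

On those assumptions, the 5 g that is actually analysed stands in for roughly 80 tonnes of ground, about one part in sixteen million.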


One-stop-shop for risk screening values

July 16, 2010

I’ve just had ANOTHER brilliant thought (quote at 5 mins)!  Well, my colleague did, but she said I was welcome to it, and if I made any money out of it I said I would buy her a drink.  Seems like a good deal to me.

For those of you who are not involved in environmental consultancy and risk assessment, you might assume that by now it would be a simple thing to determine if the level of a contaminant in something (soil, water, food etc) is dangerous or not.  It isn’t.  The problem is that the available screening values (which can be enshrined in legislation, or simply the firm opinion of the relevant agency, which almost amounts to the same thing) are smeared thinly and irregularly across the internet.  If you come across a non-standard contaminant in a non-standard situation, it can be a nightmare even to find out whether there ARE any statutory limits, let alone what they are or if they apply to your particular situation.

There could well be an opportunity for the brave soul(s) who decide to collate ALL of this information into one easily searchable database.  You could charge quite handsomely for access, so long as you could guarantee the currency of the information, because for the consultants who need to deal with the figures, time is money.  One graduate could spend a whole day searching for information for a single project and still not necessarily find the correct figures.  That’s £250 right there, gone.  Multiply that by a lot of projects and it could save consultancies serious money, as well as making their services more saleable, because they could guarantee they were using the most up-to-date and relevant information (this is a problem more often than you would think).
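
The back-of-envelope sum is simple enough to sketch.  The £250 graduate-day is the figure above; the workload of 40 projects a year is purely hypothetical:

```python
def search_cost_gbp(projects: int, days_per_project: float = 1.0,
                    day_rate_gbp: float = 250.0) -> float:
    """Staff cost of hunting down screening values across a year's projects."""
    return projects * days_per_project * day_rate_gbp


# HYPOTHETICAL workload: 40 projects a year, one graduate-day each.
annual = search_cost_gbp(40)
print(f"£{annual:,.0f} a year")  # £10,000 a year
```

On those assumptions, a subscription priced below that figure would pay for itself before you even count the cost of working from out-of-date values.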

Seth Godin and I are obviously thinking along the same lines.


Risk – and splitting hairs

May 5, 2010

As society becomes more and more risk averse and “compliance” (i.e. box-ticking) based, what once was probably a sensible attempt at quantifying risk becomes increasingly silly.  Contaminated land – ground contaminated by former land uses and now presenting a risk to “receptors” (usually people or watercourses) – is one such area.

Old-school industry frequently left a polluting legacy – there were few if any rules on how you needed to control the polluting output from your processes.  It is not that uncommon to find significant pools of oil on or beneath the surface of derelict land that have persisted for decades.  It is undeniable that this can present a very serious health risk, so we are obliged to deal with the problem, and rightly so.

A large pool of oil (“free product” in the jargon) is clearly a problem which needs to be addressed.  But what about contamination that isn’t immediately visible?  Soil on an old industrial site can look like normal old soil but actually be chock-full of nasties; you don’t need chunks of asbestos and pools of oil to present a risk.  In these circumstances, we need to analyse the soil and determine the level of contaminants present.  But what do we compare it to?  What level denotes an unacceptable risk?

We can try and work that out.  We can make some assumptions, based on what we know of the chemical in question – how it behaves, how mobile it is, how quickly it breaks down.  But soil is highly variable, and differences in the properties of the soil can have a very large effect on the behaviour of the contamination.  Despite this, we can at least take measurements and determine how the contaminant ought to behave based on past experience and laboratory data.  Highly imperfect, but reasonably good.

The problems start to arrive when you consider how a human being using the site will be affected.  Will the site be a car park, with a small fringe of green around the outside?  If so, even quite high levels of contamination will probably not cause a risk as people tend not to spend much time on the small grass verges next to car parks.  But what if the site will become family homes with gardens?  This time, we need to be worrying about little children running around their gardens all summer, getting mucky and generally being exposed to contamination.  Clearly, the risk is greater.

But we still need to quantify this.  And once we have made the very sensible decision that we need to differentiate between people using a site as a car park and it being a family garden, we create a whole heap of trouble.  How do you QUANTIFY the difference in risk?  By making quantitative assumptions about behaviour.  You need to decide on some sensible assumptions for how often the typical person will be on the site, how long each visit will be, how much soil-derived dust they are likely to be inhaling (say the site is a sports field – the heavy breathing caused by exertion will increase this – but precisely how much?), how much dust they will “track back” to their homes, and so on.
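
Those behavioural assumptions usually end up as inputs to an intake equation.  A minimal sketch of a simplified generic form, intake = C × IR × EF × ED / (BW × AT), of the kind found in, for example, US EPA risk assessment guidance (this is NOT the UK CLEA model, which is considerably more detailed).  Every number is hypothetical, and the only thing that differs between the two scenarios is how many days a year a child is assumed to be on the site:

```python
def average_daily_intake(conc_mg_per_kg: float, soil_kg_per_day: float,
                         days_per_year: float, years: float,
                         body_weight_kg: float, averaging_days: float) -> float:
    """Chronic intake from swallowing soil, in mg per kg body weight per day.

    Simplified generic form: C x IR x EF x ED / (BW x AT).
    """
    total_mg = conc_mg_per_kg * soil_kg_per_day * days_per_year * years
    return total_mg / (body_weight_kg * averaging_days)


# HYPOTHETICAL inputs for a young child; not taken from any guidance.
CONC = 100.0          # contaminant in soil, mg/kg
SOIL_PER_DAY = 1e-4   # 100 mg of soil swallowed per day on site, in kg
YEARS = 6             # years of exposure
BODY_WEIGHT = 15.0    # kg
AVERAGING = YEARS * 365  # averaging time in days

garden = average_daily_intake(CONC, SOIL_PER_DAY, 200, YEARS, BODY_WEIGHT, AVERAGING)
car_park = average_daily_intake(CONC, SOIL_PER_DAY, 5, YEARS, BODY_WEIGHT, AVERAGING)
print(f"garden: {garden:.2e}, car park verge: {car_park:.2e} mg/kg bw/day")
print(f"ratio: {garden / car_park:.0f}x")
```

Each input (days per year, soil swallowed per day, years on site, body weight) is an assumption, and the answer scales directly with every one of them.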

But making all these assumptions makes the model you have used highly specific.  Take the sports field for example.  A teacher regularly taking PE lessons there could be at risk from contamination.  But how much is acceptable?  Part of answering that question comes from making an assumption about how long they will be doing it for (as in years of their life).  To be on the safe side, most of these types of calculations assume that the adult will work at the site their whole working life.  But how long is that?!  I once had to rerun some calculations because it was decided that a teacher would be taking a one-year PGCE (teaching) course after they went to college, so their lifetime exposure to the playing field of death would be one year shorter.  When you get to that level of detailed assumption it becomes slightly absurd.  Unfortunately, this is the rod we make for our own backs when we take the perfectly sensible decision to distinguish between different types of risk.
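
The effect of that PGCE year is easy to estimate.  A minimal sketch, assuming a hypothetical 45-year career, 190 days a year on the field, and the common convention of averaging dose over a 70-year lifetime (all assumptions, not the figures used in that job):

```python
def lifetime_averaged_dose(dose_per_site_day: float, days_per_year: float,
                           years_on_site: float, lifetime_years: float = 70.0) -> float:
    """Dose received on site days, spread over a whole lifetime.

    Averaging over a lifetime, rather than over the exposure period, is the
    usual convention for carcinogens; under it, every year on site counts.
    """
    return dose_per_site_day * days_per_year * years_on_site / (lifetime_years * 365)


# HYPOTHETICAL inputs: relative dose of 1.0 per day on the field,
# 190 days a year, a 45-year career versus 44 years after a PGCE year.
full_career = lifetime_averaged_dose(1.0, 190, 45)
after_pgce = lifetime_averaged_dose(1.0, 190, 44)
print(f"Lifetime dose falls by {1 - after_pgce / full_career:.1%}")
```

A change of one year in 45 (about 2%) is likely to be swamped by the uncertainty in the other inputs.  Under the other common convention, where the averaging time equals the exposure period (typical for non-carcinogens), the extra year cancels out of the intake equation altogether.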

So next time you read in the local paper that a site is “contaminated”, it may not be that simple.  Someone has made a very long list of assumptions that may or may not be an accurate reflection of what goes on.  And I haven’t even started on sampling error yet.
