Paul Bradshaw

Savile extracted

In Uncategorized on February 23, 2013 at 11:47 am

On Friday the BBC released documents from the Pollard Report into the Savile inquiry.

These were published as scanned PDFs, making it impossible to search text or count mentions of particular terms.

We’ve used the document extraction service DocumentCloud to convert the two key documents, appendices 10 (statements) and 12 (emails and documents), into text. Both are linked below. If you use them, please let us know so that we can keep doing this.

Savile Transcript appendix 10 (PDF)
Savile Transcript appendix 10 (Text)


Savile appendix 12 (PDF)
Savile appendix 12 (Text)
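Once the appendices are plain text, counting mentions of a term takes only a few lines. Below is a minimal Python sketch, assuming you have downloaded one of the text versions above to a local file; the script name, file name and search terms in the usage comment are hypothetical examples, and the counts depend on the quality of the OCR, so misread words will be missed.

```python
import re
import sys
from collections import Counter


def count_terms(text: str, terms: list[str]) -> Counter:
    """Count case-insensitive, whole-word mentions of each term in text."""
    counts = Counter()
    for term in terms:
        # \b word boundaries stop a term matching inside a longer word
        # ("Savile" does not match "Saviles"); re.escape stops punctuation
        # in a term being treated as regex syntax.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = len(pattern.findall(text))
    return counts


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python count_terms.py appendix10.txt Savile Newsnight
    path, *terms = sys.argv[1:]
    # errors="replace" keeps the script running if the OCR text has stray bytes
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    for term, n in count_terms(text, terms).most_common():
        print(f"{term}: {n}")
```

Whole-word matching is a deliberate choice: it avoids inflating counts with partial matches, at the cost of missing variants (plurals, misspellings) that you would need to add as separate terms.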



BBC College of Journalism teams up with Help Me Investigate for health reporting event

In Uncategorized on February 19, 2013 at 7:08 pm

We’ve teamed up with the BBC College of Journalism for an event on reporting the new health system that comes into force this year.

From April, the powers to control health spending, and to hold that spending to account, will shift. More than 200 new groups of GPs and other local representatives will take on responsibility for commissioning health services, while local councils will gain new spending powers and new responsibilities of their own.

‘Journalists and the new health system’ brings together the people who will be scrutinising the new clinical commissioning system (journalists, bloggers and councillors) with the new players making key decisions.

It will discuss the issues likely to matter most, and will provide an opportunity to build new contacts with health bodies, hyperlocal bloggers and health experts.

The event is being held at Birmingham’s Margaret Street on March 26. Sign up and get more details at

Too big for Excel? What to do with big datasets

In Uncategorized on February 12, 2013 at 10:00 pm

Recently the NICAR mailing list (for journalists who use computer-assisted reporting) discussed how its members deal with datasets that are ‘too big for Excel’. With their permission, I’m reproducing a digest of the highlights.

How much is too much

Different versions of Excel can handle different amounts of data. The limits range from just 16,384 rows by 256 columns in Excel 5 to just over a million rows in Excel 2010; Office Watch gives a good rundown of the various versions.
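Before trying to open a large file, it can help to check its size against those limits. Below is a minimal Python sketch, assuming the data is a CSV file; it streams the file rather than loading it into memory, so it works on files far bigger than Excel can open. The per-version limits in the table are the standard worksheet sizes (Excel 5: 16,384 × 256; Excel 97–2003: 65,536 × 256; Excel 2007 and 2010: 1,048,576 × 16,384), and the script name in the usage comment is hypothetical.

```python
import csv
import sys

# Single-worksheet limits (rows, columns) per Excel version.
EXCEL_LIMITS = {
    "Excel 5": (16_384, 256),
    "Excel 97-2003": (65_536, 256),
    "Excel 2007/2010": (1_048_576, 16_384),
}


def csv_shape(path):
    """Return (rows, widest_row) for a CSV file, reading one record at a time."""
    rows = 0
    widest = 0
    # newline="" lets the csv module handle quoted fields containing line breaks
    with open(path, newline="", encoding="utf-8", errors="replace") as f:
        for record in csv.reader(f):
            rows += 1  # the header row counts towards Excel's row limit too
            widest = max(widest, len(record))
    return rows, widest


def versions_that_fit(rows, cols):
    """List the Excel versions whose single-sheet limits can hold rows x cols."""
    return [name for name, (max_rows, max_cols) in EXCEL_LIMITS.items()
            if rows <= max_rows and cols <= max_cols]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_size.py big_dataset.csv
    rows, cols = csv_shape(sys.argv[1])
    fits = versions_that_fit(rows, cols)
    print(f"{rows:,} rows x {cols} columns")
    print("Fits on one sheet in: " + (", ".join(fits) if fits else "none of these versions"))
```

If nothing fits, that is the point at which the options discussed on the list (splitting the data across sheets, or moving to a database) come into play.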

Tom Torok points out that Excel 2007’s million-row limit is per sheet, rather than per workbook (spreadsheet), so if you have