Big Data Disaster

Access to lots of fast-moving data without the thoughtful and ethical involvement of human beings spells disaster. A poignant example of this is the use of so-called Big Data by the three major credit agencies in America. Experian, Equifax, and TransUnion have done great harm to the lives of millions of Americans through their irresponsible use of data. According to a new FTC report published today, mistakes exist on the credit reports of 20% of Americans. Credit reports are used to grant or deny access to loans and other services. Not only are these agencies getting the facts wrong far too often, but they are making it virtually impossible for people to get these mistakes corrected.

If you want to understand the horror faced by millions of Americans, watch the report that was aired by 60 Minutes last night. It will chill you to the bone and make you angry. Credit agencies are hiding behind walls of Big Data, using it as an impenetrable barrier to the fair treatment that people deserve. The algorithms that these agencies use for credit scoring are enshrined in mystery, hidden from public scrutiny.

Data, no matter what its size, speed, or source, must pass through the hands and minds of thoughtful and ethical people. It is necessary that people who understand data and are committed to using it ethically are part of the process; otherwise, Big Data can become an oppressor much like Big Brother, holding senseless sway over our lives. You can’t fight data! You can only appeal to human beings. When human beings are removed from the process, when human brains and empathy are circumvented, to whom will you turn for reason and justice?

Take care,

11 Comments on “Big Data Disaster”


By Rob Meredith. February 11th, 2013 at 7:27 pm

Hi Stephen,

The ethics of big data, business intelligence and technology-based decision support is an area that lacks significant work. I wrote an academic paper with a colleague a few years ago looking at the topic - one of the sources we cited deals precisely with the problem of taking humans out of the decision-making loop (in this case, in the context of medical decision making). The source reference was:

Fox, J. (1993). Decision Support Systems as Safety-Critical Components: Towards a Safety Culture for Medical Informatics. Methods of Information in Medicine, 32(5), 345-348.

When you take people out of the loop, it’s not just an issue of who you turn to for “reason and justice”, it’s a problem of who bears the moral responsibility for decisions made by (or even significantly influenced by) machines and algorithms. Who gets held accountable?

In the case of the credit scoring agencies you refer to, it seems that the answer might be ‘no-one’.

By Colin Michael. February 11th, 2013 at 7:50 pm

Only 20%? That seems low. I can’t think of anyone telling me they found their credit report to be 100% correct unless it is something they watch constantly. I’ve had accounts reported open that were long closed, items that were removed under the bankruptcy laws return several years later, other people’s accounts on my reports, etc. One time when I pulled all three reports each agency had different erroneous items. How are people supposed to count on such a system when they go to apply for a job or to rent an apartment, never mind buy a house? Is it any wonder people fall for “clean your credit report” scams? What other recourse is there? I’ve managed to get a number of things removed, but they usually come back in a few months.

On the other hand, my wife has no credit report. Nothing comes up. She was happy to find nothing wrong, but a blank credit report could be a real problem when she needs to re-enter the work force. They have you coming and going.

By Jo Bryce. February 12th, 2013 at 2:51 am

Has anyone read “Socrates Reloaded: the case for ethics in business and technology” by Frank Buytendijk?

By Stephen Few. February 12th, 2013 at 10:00 am

Colin,

I agree that the 20% figure seems low. As you pointed out, what’s worse is the fact that the credit bureaus make it almost impossible to get errors corrected, even when the errors are obvious. They’ve put a bureaucracy in place to discourage people from making the attempt. This shows how little they care about the integrity of the data or the great harm that they cause to innocent people. I suppose we shouldn’t be surprised. Corporations are soulless unless there are leaders at the helm who infuse them with humanity (intelligence and empathy).

By Varius Madan. February 23rd, 2013 at 3:42 pm

As someone who spent nearly three years in pointless litigation with the big 3, I can personally attest that they couldn’t care less about correcting problems.

1) these companies deal with their “customers” like banks: the higher your net worth or social status, the more attention they will give you and more than likely they will correct a problem (my understanding is that there are two different legal teams, one for the high profile clients and one for the working joe).

2) the FCRA is a piece of legislation that is poorly written and long overdue for an overhaul. Case in point: the reporting agencies are only obligated to correct errors and not omissions. I.e. if you can prove in a court of law that information is being provided to them that their algorithms aren’t picking up, they can say “so?” and you can’t do a damn thing.

By Howard. February 25th, 2013 at 4:34 am

Stephen,

Before I begin, I’d like to butter you up by saying I love your books.

I work in the UK for Callcredit (a Credit reference Agency, CRA) and would like to explain a few things. I’m not looking to get absolution, but just to highlight a few constraints the CRA industry has to deal with.

No company has 100% data quality; it’s impossible. I understand that in this industry data quality is highly important, since we are dealing with people’s lives. 80% accuracy is not great and, as someone has already pointed out, the quality of data differs from agency to agency.

The reason for this is that CRAs don’t all get their data from the same suppliers, meaning we are at the mercy of our suppliers’ data quality. If your bank says you’ve changed address but your insurance company says you haven’t, how does the CRA know which to believe? Now imagine being confronted with millions of conflicting pieces of data on a daily basis; humans can’t handle it in a timely fashion. We have an algorithm which decides on how to update our system, removing the human interaction.

Also, in the UK, the CRAs are not allowed to change any person’s data without confirmation from the initial supplier of the data. If someone calls to complain about a court judgement, is it ethical that we change the data without confirming it with the courts? Is it ethical to allow any of our employees to change any person’s data without confirmation from the primary source? What if your neighbour calls a CRA and impersonates you to get your address changed or says you’ve changed your last name? How would you feel if a CRA actioned that request?

What happens with data quality disputes is that the CRA contacts the primary source to confirm the truth (from their point of view). The CRA then has to wait until there is a response. This period differs in length depending on the data supplier (up to 28 days in the UK); then the CRA responds to the person who raised the dispute. All the CRA is able to do is report on what the data supplier has said. E.g. if your phone supplier says you didn’t pay your phone bill, when you really did, who is the CRA to argue that case? The CRA has to trust the data supplier. It’s unfortunate but true.

Not that it will mean much to some of your readers, but the quality of a CRA’s data set is paramount to selling their services. CRAs continuously compete for business based on how much better their data is than their competitors’, so it is in their best interest to have it as accurate as possible.

I don’t work with disputes personally but I have been on the sticky end of one. Until I joined Callcredit I had a similar frame of mind, but now that I understand the complexity of handling millions of people’s data and the restrictions the CRAs deal with, I’ve come to appreciate their side of things.

I’m not saying the process is perfect or fast. And I’m not saying it can’t be improved, but hopefully you’ll understand some of the reasons why it is sometimes hard to put data quality right.

By Peter Knight. March 7th, 2013 at 6:13 pm

Hi Stephen,

While I agree with the overall tone of your article, I do have a bone to pick with this line - “The algorithms that these agencies use for credit scoring are enshrined in mystery, hidden from public scrutiny”.

Well maybe they are… but if the algorithms were open to public scrutiny, how many people in the public would understand them anyway? Try explaining a meta-model of Random Forests, Neural Nets, Support Vector Machines and Gradient Boosted Trees to a lay person… you could take bets on how long it would take for their eyes to glaze over! Hell, try explaining just one of those modeling concepts.

If I was one of these agencies, why would I use a simple, easily explainable model instead of a more complex and more accurate model?

Of course if the data is bad that’s another story :)

By Stephen Few. March 7th, 2013 at 6:39 pm

Hi Peter,

I wouldn’t expect the general public to understand these algorithms. I do think the algorithms should be inspected by experts, however, to make sure that they assign credit scores in a reasonable manner that serves the best interests of the public. Do you agree that this would be appropriate and useful? Our credit scores have too great an influence on our lives to remain hidden from scrutiny. As I understand it, the credit bureaus will not even provide a plain English description of the criteria that they use.

By grasshopper. March 15th, 2013 at 11:55 am

I agree that it’s hard to get mistakes corrected - I’ve run my annual free credit report in the past, and found a few small errors (spelling of my name, and a ‘previous address’ that I did not live at). I went through their form to send in a correction, and never heard anything back, and the items were not corrected the next year when I checked them. These were small errors that wouldn’t affect my credit, and I can imagine it would be even harder to fix a ‘real’ problem.

There needs to be more transparency in their process, and responsiveness to questions/corrections.

By Howard. March 21st, 2013 at 1:49 am

Each credit agency uses its own algorithm. For them this is a point of differentiation for selling their products, which is one of the reasons for secrecy. Working out an individual’s creditworthiness is not straightforward; there are a lot of variables and some need to be weighted differently, e.g. a court judgement is more important than missing a mobile phone contract payment. Missing 3 or 4 mobile contract payments in a row may raise more concerns than a single unpaid electric bill, etc. Making it basic may well make it harder to get credit, or open the lender to a greater risk of bad debt.

By Stephen Few. March 21st, 2013 at 3:08 am

Howard,

I’m aware of the fact that the credit bureaus use different algorithms. Given the fact that they keep them secret, how could it be otherwise? It isn’t necessary for their algorithms to be kept secret as a means of protection. Isn’t this what patent law was designed to do?