Is unpredictable data making it difficult to run queries? Clean data is always best, but since data is rarely clean to begin with, you may need to use error handling in BigQuery. In this video, find out when to use it, and how it can help.
Well today, I want to talk about error handling in BigQuery and some things that we can do to mitigate inconsistent data. So we’ve got, over here, is a data set that we’ve been given that has some data in different formats, but each one of these columns is a string. So we’re going to import it into BigQuery, which you’ve already done here. And we’re going to try and put some structure and get those columns into the proper format. And then we’re going to introduce some bad data into the table, and we’re going to see what we can do to deal with that bad data. So, first thing we’re going to do is get all the columns in by name. So this one… For the start date, we’re going to wrap it in the timestamp function and we’ll keep the same name. Start, date and score.
All right, let’s go ahead and run that and just make sure that it still works. Okay. All right. Oh, and has start date. Okay. So first thing we want to do is we’re going to introduce some bad data. Let’s say, we’ll take this one out and we’re just going to pass, let’s say string to it and see what happens when we run this function now. So we’re going to get an error because the timestamp does not know what to do when it gets a string, when it gets something that it can’t process into a date. So one thing we can do is add the safe prefix to the timestamp function. So there’s a number of functions that BigQuery has where you add the safe prefix. And instead of sending an error, it’s just going to return no and it’s going to go on processing the query. So what does it say free fix?
It is just adding safe in front of the function and see what does that look like? All right. So instead of sort of erroring out, now we just have NULL there. It does not work for everything. There are a number of functions, so aggregate functions, so MIN, MAX, things like that, it doesn’t apply to ARRAY functions. It doesn’t apply to user-defined functions. It does not apply to window functions, the analytic functions, it doesn’t apply to. So it certainly does not work in every situation, but there are a number of functions that it does apply to, and it is helpful. So let’s look at the other columns. So score. So right now score is a string, but let’s say that maybe later on, maybe this is the start of a larger pipeline and later on, we’re going to do some math on the score. So let’s go ahead and cast that to an integer. So let’s do cast score as N 64.
Okay. Whoops. And as score. Okay, so that was good. But now let’s say that we have some bad data. We’re going to introduce some bad data into the score and now let’s see what happens. Okay, so we’ve got an error in that column, so how can we deal with that? Let’s try adding the safe prefix to cast and see what happens now. Okay, so we cannot do that. We can’t add the safe prefix to the cast function. What we can do… BigQuery does give us a safe cast function. So for whatever reason, they don’t let you just put the prefix on there, but there is an entirely separate, safe cast function that it lets you use. So let’s form that, and see what happens. So it’s the exact same result as a safe dot, but it’s just a completely different function for whatever reason.
So two different ways of dealing with errors in functions and in casting. So ideally you will have clean data from the start. It’s not always an option, obviously when you’re dealing with real world data, especially if you’re doing something just like data exploration. So this is obviously just one way of getting started. So thanks for watching. If you’re interested, please subscribe for more on BigQuery, Google Analytics and Google Tag Manager.