In this tutorial, we'll look at taking your first steps in BigQuery by importing data locally or from a Google Sheet. We'll also look at the Parquet file format, and how it may be right for your project.
Hello. I wanted to take a step back today and talk about getting started with BigQuery by importing data from a couple of different places, maybe from data that you have locally on your local machine or from Google Sheet. So maybe not something necessarily production data, but something that you have locally. You want to do some exploration or do some sample queries. I’ll talk about some ways of getting started.
So let’s start. We’re going to go to our sample project and our data set where you go to create a table and I’m going to go down to upload. We’re going to start by just uploading a file that I have locally. There is a couple of different formats that we’re going to look at. CSV is probably the one that you’re most familiar with, so a file that you can open up, or you can open up in a text editor or open up with Excel.
We’re going to look at the Parquet format, which may or may not be one that you’re used to. There’s some kind of interesting things about it. It’s.
a compressed format, so it’s binary. If you open it up, you won’t be able to see the data, unless you’re opening up in a Parquet reader. The thing that’s kind of cool about it is that it’s, like I said, it’s very compressed. If you have a one terabyte, CSV file, it’s only going to be a 130 gig Parquet file. It’s 87% compressed by comparison, and it has the schema built into it, which is also really cool.
It’s not ideal for every situation, but it’s definitely worth looking into. And then if you’re working in, let’s say Python, for instance, in using like the Panda’s library, you can natively read and write from the Parquet format. If you’re doing some work in Pandas, you can, instead of using like to CSV or to Excel, you can just write out to Parquet and then you could upload it directly in the Parquet format into BigQuery. So let’s say I’m a Parquet file here. I’m going to click on that and then make a table name.
And you go here and click create table.
Okay, so there, and everything imported. Let’s say that we want to do a different kind of report. We don’t have everything in a file that I want to upload. Let’s say I have it in a Google sheet. There’s actually kind of a neat relationship that BigQuery has with Google Sheets. You can kind of use Sheets as a pass through. And you kind of use Sheets as an interface if you want to make change to the data.
Let’s go back to our data set and I’m going to go to create data or create table, and I’m going to go to drive, and I’m going to change this to Google Sheet. And we go here and I’ve already got a sample Google Sheets set up with just a few columns. I’m going to take this URL and I’m going to copy it. I’m going to paste this in over here. Now I haven’t done anything with the share, with the privacy. This one is private only to me. You don’t have to make it available to BigQuery or anything. I’m not going to do anything with the sheet and range. I’m just going to call this sample sheet.
And for right now, I’m not going to do anything with these. We’ll talk about these in a second. I’m just going to go ahead and let’s just turn down the detective on here. And header rows, skip. Let’s just leave this one. So what that means is we’re going to go ahead and skip this header row. We don’t want to import this. And then by leaving, by checking this box, what it means is we want BigQuery to try and figure out the data types. In other words, we want it to infer that this is a string. This is a date, and this is an end. So let’s see what kind of a job it does. I’m going to go ahead and hit create.
All right and that worked. Let’s go back. I’m going to delete this.
And let’s create another one.
But let’s say that we want to specify the types, just to go through kind of the same exercise.
Let’s say we want to make slightly different names.
That’s still going to be a string.
One trick that I usually like to do when I am specifying the columns, is I will switch to edit as text, and I will copy that and then paste it somewhere, and then switch it back. Just because you may wind up typing that a few times, so it’s good to copy it. Then when you go to do this process again, you can switch to edit as text and then paste it in just to avoid having to do the whole process. Header rows, put a table name in. Let’s call this scores. And let me go ahead and click create table.
One thing you’ll notice when you’re building or when you’re importing data that comes from Google Sheets, is that you will no longer have this, the preview tab. That’s really the only difference.
The other thing, or one thing that that is really cool. So let’s flip back over here. This data is updated in real time. If I change this and then we go ahead and run this again, this data is updated automatically. It’s almost like you’ve got an interface to update data. It’s a really cool way to play with your data and do sample queries and kind of hone them. And then maybe if you didn’t move them over and do it on production data, or do it on data in a different environment, it’s a really good way to work interactively. So again, thank you for watching and if you’re interested, please subscribe for more on BigQuery, Google Analytics, and Google Tag Manager.