How to Write Basic SQL Queries in BigQuery
Image by Editor | Ideogram
This tutorial introduces the basics of SQL querying with Google BigQuery. While very similar, BigQuery SQL has some syntax differences with standard SQL, some of which will be highlighted along the post. For those familiar with SQL, adapting to BigQuery should be pretty straightforward.
Throughout examples, we will explore basic SELECT-FROM-WHERE queries and discover how to sort retrieved results in BigQuery. The examples shown below revolve around three tables loaded as illustrated in the previous article in this BigQuery series: ‘chinese_cuisine’, ‘japanese_cuisine’, and ‘thai_cuisine’, describing popular recipes from different asian cuisines. These three tables have identical schema, consisting of five columns:
- name: string
- description: string
- type: string
- preparation_time_mins: string
- difficulty: string
Basic SELECT Statements
One of the simplest queries in BigQuery is the default one shown when opening a new query tab. For instance, when opening a new query tab for the ‘chinese_cuisine’ table, the default query is:
SELECT FROM `bigquery-project-432209.asian_cuisines.chinese_cuisine` LIMIT 1000
Executing this query by clicking the “RUN” button will throw an error. The reason is that it misses an important element: the column(s) of the table to be retrieved. In the simplest case that we want to retrieve all columns, then we only need to add an asterisk ‘*’ after the SELECT keyword.
SELECT * FROM `bigquery-project-432209.asian_cuisines.chinese_cuisine` LIMIT 1000
This query returns the entire table, or the first 1000 rows if it were larger (due to the ‘LIMIT 1000’ clause at the end).
name description type preparation_time_mins difficulty
Egg Fried Rice Fried rice with eggs, vegetables, and sometimes meat, seasoned with soy sauce. Side Dish 20 Low
Spring Rolls Crispy rolls filled with vegetables, meat, or seafood, served with dipping sauce. Appetizer 45 Low
Sichuan Hotpot A communal dish where diners cook various meats, vegetables, and tofu in a spicy and numbing broth at the table. Main Course 90 High
Mooncake A traditional Chinese pastry typically eaten during the Mid-Autumn Festival, filled with lotus seed paste or red bean paste. Dessert 120 High
Peking Duck A famous Beijing dish featuring roasted duck with crispy skin, served with pancakes, hoisin sauce, and scallions. Main Course 240 High
Ma Po Tofu A Sichuan dish made with soft tofu set in a spicy, oily, and salty chili-and-bean-based sauce, usually with minced meat. Main Course 25 Medium
Kung Pao Chicken A spicy stir-fry dish made with chicken, peanuts, vegetables, and chili peppers. Main Course 30 Medium
Chow Mein Stir-fried noodles with vegetables and a choice of meat, often seasoned with soy sauce. Main Course 30 Medium
Sweet and Sour Pork A popular dish featuring deep-fried pork coated in a tangy, sweet, and sour sauce with bell peppers and pineapple. Main Course 35 Medium
Sesame Chicken Deep-fried chicken pieces coated in a sweet and savory sauce, garnished with sesame seeds. Main Course 35 Medium
Hot and Sour Soup A flavorful soup with a tangy and spicy broth, typically made with mushrooms, tofu, bamboo shoots, and pork. Soup 40 Medium
Wonton Soup A clear broth soup containing wontons (dumplings) filled with seasoned meat or seafood. Soup 50 Medium
Steamed Dumplings Soft, steamed dumplings filled with meat, seafood, or vegetables, often served with a dipping sauce. Appetizer 50 Medium
Dim Sum A variety of bite-sized portions of food traditionally served in small steamer baskets or on small plates. Appetizer 60 Medium
Red Bean Buns Steamed buns filled with sweet red bean paste, a popular dessert. Dessert 90 Medium
Just like in standard SQL, retrieving one or more specific columns is as easy as specifying their names separated by commas in the SELECT clause.
SELECT name FROM `bigquery-project-432209.asian_cuisines.chinese_cuisine` LIMIT 1000
This query returns the names of chinese dishes in our table:
Egg Fried Rice
Spring Rolls
Sichuan Hotpot
Mooncake
Peking Duck
Ma Po Tofu
Kung Pao Chicken
Chow Mein
Sweet and Sour Pork
Sesame Chicken
Hot and Sour Soup
Wonton Soup
Steamed Dumplings
Dim Sum
Red Bean Buns
Meanwhile, this query retrieves a list of japanese dish names along with their preparation time in minutes, by selecting the two associated columns accordingly:
SELECT name, preparation_time_mins FROM `bigquery-project-432209.asian_cuisines.japanese_cuisine` LIMIT 1000
Output:
name preparation_time_mins
Miso Soup 15
Onigiri 20
Matcha Ice Cream 30
Sushi 60
Sukiyaki 40
Mochi 90
Tempura 45
Ramen 60
Tonkatsu 30
Takoyaki 40
Filtering Data with the WHERE Clause
So far, the above examples retrieve all the data instances from the queried table without any filtering. If instead of retrieving all thai dishes from the ‘thai_cuisine’ table, we only were interested in retrieving those dishes labeled with a low difficulty level of preparation, how can we apply this condition in our query?
This is where the ‘WHERE’ clause comes into scene. A ‘WHERE’ clause contains a boolean condition that can be either true or false, and generally involves checking the values of one or more columns in the table. This condition acts as a filter in the query, returning only those rows (recipes) where the condition holds true.
For example, adding the following filter helps retrieve the name, dish type, and preparation time of thai dishes with low difficulty.
SELECT name, type, preparation_time_mins FROM `bigquery-project-432209.asian_cuisines.thai_cuisine` WHERE difficulty="Low"
Some bad news here: only two of the thai dishes in our table seem to be easy to prepare!
name type preparation_time_mins
Mango Sticky Rice Dessert 20
Thai Iced Tea Beverage 10
Filtering based on numerical data -for instance based on the time of preparation- is typically done via comparison operators like , =,
SELECT name, type, difficulty FROM `bigquery-project-432209.asian_cuisines.chinese_cuisine` WHERE preparation_time_mins
The ORDER BY clause is useful to present query results in a certain order, e.g. alphabetically, in decreasing or increasing order of a numerical column values, etc.
This example sorts the names of all japanese dishes alphabetically and presents the name and description of each recipe in such alphabetical order:
name description
Matcha Ice Cream "A popular Japanese dessert made from green tea powder (matcha) mixed into creamy ice cream."
Miso Soup "A traditional Japanese soup made from a stock called dashi, miso paste, tofu, and seaweed."
Mochi "A traditional Japanese rice cake made from glutinous rice, often filled with sweet fillings like red bean paste or ice cream."
Onigiri "Rice balls often wrapped in nori and filled with various ingredients like pickled plum, salmon, or tuna."
Ramen "A noodle soup dish with a meat or fish-based broth, typically flavored with soy sauce or miso, and topped with sliced pork, nori, and egg."
Sukiyaki "A hot pot dish of thinly sliced beef, vegetables, tofu, and noodles, simmered in a sweet and savory broth."
Sushi "Vinegared rice served with a variety of ingredients such as raw fish, vegetables, and seaweed."
Takoyaki "Ball-shaped snacks made from a batter filled with octopus, tempura scraps, pickled ginger, and green onion, topped with takoyaki sauce and bonito flakes."
Tempura "Lightly battered and deep-fried vegetables, seafood, or meat, often served with a dipping sauce."
Tonkatsu "A breaded and deep-fried pork cutlet, typically served with shredded cabbage and tonkatsu sauce."
Interestingly, few japanese dish names seem to start with the first letters in the roman alphabet!
The ASC and DESC options can be optionally appended to the SORT BY clause to specific an ascending or descending order, respectively. The default order is ascending or alphabetical (ASC), whereas DESC would display the results in inverse alphabetical order.
This last example applies the SORT BY clause to a numerical column, sorting japanese recipes of both medium and high difficulty in decreasing time of preparation.
A few things to note here. First, when using the WHERE and ORDER BY clauses jointly, ORDER BY normally comes last. Second, to apply a filter that checks for more than one possible categorical (string) value, we use the IN() operator that contains the values to verify in the filter, separated by commas.
In the next post of this series, we will delve into more advance queries, some of them involving multiple tables jointly.