Working with Arrays in BigQuery SQL to Manipulate Text

Proper knowledge of array manipulation in SQL is a must have for every data analyst (or marketer). This article will cover how to manipulate text strings in arrays along with error handling.

Task 1: Creating an Array

Creating an array is a quite easy process. Imagine you want to create an array of 2 text strings:

  1. I want to buy bananas!
  2. Bananas are sold out.

The code is:

SELECT [‘I want to buy bananas!’,’Bananas are sold out.’] AS sentences

This SQL command will create a table with 1 column called “sentences” and with 1 row (notice the row count column) where the value is our array:

Task 2: Unnesting an Array

So now you have successfully created the array, what if you want to convert the array values into separate rows? Then you need to use UNNEST function. This function is basically “unpivoting” your array. Notice the row count which is no longer 1 but it’s 2 now.

After using the UNNEST function, you need to run a new SELECT on the top of the previous query.

The code is:

SELECT
strings

FROM
(SELECT [‘I want to buy bananas!’,’Bananas are sold out.’] AS sentences),
UNNEST(sentences) AS strings

Task 3: Converting an Array Back to String

Imagine you have a column with arrays and you want to convert them back to strings. You need to deploy ARRAY_TO_STRING function. It has 2 arguments: the array AND the delimiter which is used to connect elements of the array. In my sample, I am choosing space:

The code is:

SELECT ARRAY_TO_STRING([‘apples’, ‘bananas’, ‘potatoes’,’carrot’],’ ‘) AS array_as_string

Task 4: Selecting the First Word in an Array

We now need to select the first word from my dummy table which I created from my dummy array:

The first word should be added as a new column – so I want to see “I”, and “Bananas”. How to achieve that? At first, you need to use SPLIT function to convert the text string to an array. SPLIT function has 2 arguments. The first is the text string, the second is the delimiter. In my case, the delimiter is space again.

Then the next step is to use [ORDINAL(1)] – that’s the command for selecting the first item in the array. ORDINAL works on one-based offset. Second word would be selected with (2).

There is also an option to use [OFFSET(0)] which works on zero-offset basis, otherwise it’s the same as ORDINAL. I personally prefer ORDINAL.

The code is:

SELECT
strings,
SPLIT(strings,’ ‘)[ORDINAL(1)] AS first_word

FROM
(SELECT [‘I want to buy bananas!’,’Bananas are sold out.’] AS sentences),
UNNEST(sentences) AS strings

Task 5: Selecting the Last Word in an Array

In this exercise, we need to select the last word of the sentences. So we need to get “bananas!” and “out.” to a separate column. ARRAY_REVERSE will help us do the trick:

  1. First of all, we use the SPLIT function again.
  2. Then we use ARRAY_REVERSE to reorder the elements in the array from end to the beginning.
  3. The we can select the first element in the array again.

The code is:

SELECT
strings,
SPLIT(strings,’ ‘)[ORDINAL(1)] AS first_word,
ARRAY_REVERSE(SPLIT(strings,’ ‘))[ORDINAL(1)] AS last_word

FROM
(SELECT [‘I want to buy bananas!’,’Bananas are sold out.’] AS sentences),
UNNEST(sentences) AS strings

Task 6: Selecting the 5th Word in an Array

Now it’s time to select the fifth word. The problem is “Bananas are sold out.” string does not have 5 words. If you use just simple SPLIT(strings,’ ‘)[ORDINAL(5)] AS fifth_word, the query will fail:

So how do we handle these situations? It’s simpler than you think.

You can just use SAFE_ORDINAL function which makes sure the query does not fail even if the array does not have 5 elements:

The error is no longer happening, null is returned instead.

The code is:

SELECT
strings,
SPLIT(strings,’ ‘)[ORDINAL(1)] AS first_word,
ARRAY_REVERSE(SPLIT(strings,’ ‘))[ORDINAL(1)] AS last_word,
SPLIT(strings,’ ‘)[SAFE_ORDINAL(5)] AS fifth_word

FROM
(SELECT [‘I want to buy bananas!’,’Bananas are sold out.’] AS sentences),
UNNEST(sentences) AS strings

Task 7: Selecting the 3rd and 4th Word of an Array

Things get more difficult now.  Here are the steps you need to take:

  1. Create an array with SPLIT function
  2. Run a “subselect” on the top of this array by unnesting the array into row with offset count. Offset is again zero based!
  3. Then filter the offset to 3rd and 4th word
  4. Create an array on the top of this “subselect”
  5. Convert the array back to string with space delimiter

The code is:

SELECT
strings,
SPLIT(strings,’ ‘)[ORDINAL(1)] AS first_word,
ARRAY_REVERSE(SPLIT(strings,’ ‘))[ORDINAL(1)] AS last_word,
SPLIT(strings,’ ‘)[SAFE_ORDINAL(5)] AS fifth_word,

ARRAY_TO_STRING(ARRAY(SELECT words FROM UNNEST(SPLIT(strings,’ ‘)) AS words WITH OFFSET WHERE OFFSET >= 2 AND OFFSET <= 3),’ ‘) AS third_and_fourth_word

FROM
(SELECT [‘I want to buy bananas!’,’Bananas are sold out.’] AS sentences),
UNNEST(sentences) AS strings

Task 8: Selecting Every Second Word of an Array

The toughest one. It’s gotta be similar to the previous task, right?  The main trick is around MOD() function which is returning the remainder of the division of X by Y.

So in our case, we need to find all instances in the array where division of OFFSET by 2 returns 1 because again, OFFSET is zero based. So the second word’s OFFSET is 1, fourth word’s OFFSET is 3. When you divide 3 by 2, the remainder is 1 and that’s your “second word”:

The code is:

SELECT
strings,
SPLIT(strings,’ ‘)[ORDINAL(1)] AS first_word,
ARRAY_REVERSE(SPLIT(strings,’ ‘))[ORDINAL(1)] AS last_word,
SPLIT(strings,’ ‘)[SAFE_ORDINAL(5)] AS fifth_word,

ARRAY_TO_STRING(ARRAY(SELECT words FROM UNNEST(SPLIT(strings,’ ‘)) AS words WITH OFFSET WHERE OFFSET >= 2 AND OFFSET <= 3),’ ‘) AS third_and_fourth_word,

ARRAY_TO_STRING(ARRAY(SELECT words FROM UNNEST(SPLIT(strings,’ ‘)) AS words WITH OFFSET WHERE MOD(OFFSET,2)=1),’ ‘) AS every_second_word

FROM
(SELECT [‘I want to buy bananas!’,’Bananas are sold out.’] AS sentences),
UNNEST(sentences) AS strings

Summary

You can see that proper knowledge of array manipulations opens a whole new world of opportunities. At the same time, most of the examples in this article were not that difficult. Should I add more use cases? Let me know in the comment section below!

Leave a Reply

Your email address will not be published. Required fields are marked *