Proper knowledge of array manipulation in SQL is a must have for every data analyst (or marketer). This article will cover how to manipulate text strings in arrays along with error handling.
Task 1: Creating an Array
Creating an array is a quite easy process. Imagine you want to create an array of 2 text strings:
- I want to buy bananas!
- Bananas are sold out.
The code is:
SELECT [‘I want to buy bananas!’,’Bananas are sold out.’] AS sentences
This SQL command will create a table with 1 column called “sentences” and with 1 row (notice the row count column) where the value is our array:
Task 2: Unnesting an Array
So now you have successfully created the array, what if you want to convert the array values into separate rows? Then you need to use UNNEST function. This function is basically “unpivoting” your array. Notice the row count which is no longer 1 but it’s 2 now.
After using the UNNEST function, you need to run a new SELECT on the top of the previous query.
The code is:
SELECT
strings
FROM
(SELECT [‘I want to buy bananas!’,’Bananas are sold out.’] AS sentences),
UNNEST(sentences) AS strings
Task 3: Converting an Array Back to String
Imagine you have a column with arrays and you want to convert them back to strings. You need to deploy ARRAY_TO_STRING function. It has 2 arguments: the array AND the delimiter which is used to connect elements of the array. In my sample, I am choosing space:
The code is:
SELECT ARRAY_TO_STRING([‘apples’, ‘bananas’, ‘potatoes’,’carrot’],’ ‘) AS array_as_string
Task 4: Selecting the First Word in an Array
We now need to select the first word from my dummy table which I created from my dummy array:
The first word should be added as a new column – so I want to see “I”, and “Bananas”. How to achieve that? At first, you need to use SPLIT function to convert the text string to an array. SPLIT function has 2 arguments. The first is the text string, the second is the delimiter. In my case, the delimiter is space again.
Then the next step is to use [ORDINAL(1)] – that’s the command for selecting the first item in the array. ORDINAL works on one-based offset. Second word would be selected with (2).
There is also an option to use [OFFSET(0)] which works on zero-offset basis, otherwise it’s the same as ORDINAL. I personally prefer ORDINAL.
The code is:
SELECT
strings,
SPLIT(strings,’ ‘)[ORDINAL(1)] AS first_word
FROM
(SELECT [‘I want to buy bananas!’,’Bananas are sold out.’] AS sentences),
UNNEST(sentences) AS strings
Task 5: Selecting the Last Word in an Array
In this exercise, we need to select the last word of the sentences. So we need to get “bananas!” and “out.” to a separate column. ARRAY_REVERSE will help us do the trick:
- First of all, we use the SPLIT function again.
- Then we use ARRAY_REVERSE to reorder the elements in the array from end to the beginning.
- The we can select the first element in the array again.
The code is:
SELECT
strings,
SPLIT(strings,’ ‘)[ORDINAL(1)] AS first_word,
ARRAY_REVERSE(SPLIT(strings,’ ‘))[ORDINAL(1)] AS last_word
FROM
(SELECT [‘I want to buy bananas!’,’Bananas are sold out.’] AS sentences),
UNNEST(sentences) AS strings
Task 6: Selecting the 5th Word in an Array
Now it’s time to select the fifth word. The problem is “Bananas are sold out.” string does not have 5 words. If you use just simple SPLIT(strings,’ ‘)[ORDINAL(5)] AS fifth_word, the query will fail:
So how do we handle these situations? It’s simpler than you think.
You can just use SAFE_ORDINAL function which makes sure the query does not fail even if the array does not have 5 elements:
The error is no longer happening, null is returned instead.
The code is:
SELECT
strings,
SPLIT(strings,’ ‘)[ORDINAL(1)] AS first_word,
ARRAY_REVERSE(SPLIT(strings,’ ‘))[ORDINAL(1)] AS last_word,
SPLIT(strings,’ ‘)[SAFE_ORDINAL(5)] AS fifth_word
FROM
(SELECT [‘I want to buy bananas!’,’Bananas are sold out.’] AS sentences),
UNNEST(sentences) AS strings
Task 7: Selecting the 3rd and 4th Word of an Array
Things get more difficult now. Here are the steps you need to take:
- Create an array with SPLIT function
- Run a “subselect” on the top of this array by unnesting the array into row with offset count. Offset is again zero based!
- Then filter the offset to 3rd and 4th word
- Create an array on the top of this “subselect”
- Convert the array back to string with space delimiter
The code is:
SELECT
strings,
SPLIT(strings,’ ‘)[ORDINAL(1)] AS first_word,
ARRAY_REVERSE(SPLIT(strings,’ ‘))[ORDINAL(1)] AS last_word,
SPLIT(strings,’ ‘)[SAFE_ORDINAL(5)] AS fifth_word,
ARRAY_TO_STRING(ARRAY(SELECT words FROM UNNEST(SPLIT(strings,’ ‘)) AS words WITH OFFSET WHERE OFFSET >= 2 AND OFFSET <= 3),’ ‘) AS third_and_fourth_word
FROM
(SELECT [‘I want to buy bananas!’,’Bananas are sold out.’] AS sentences),
UNNEST(sentences) AS strings
Task 8: Selecting Every Second Word of an Array
The toughest one. It’s gotta be similar to the previous task, right? The main trick is around MOD() function which is returning the remainder of the division of X by Y.
So in our case, we need to find all instances in the array where division of OFFSET by 2 returns 1 because again, OFFSET is zero based. So the second word’s OFFSET is 1, fourth word’s OFFSET is 3. When you divide 3 by 2, the remainder is 1 and that’s your “second word”:
The code is:
SELECT
strings,
SPLIT(strings,’ ‘)[ORDINAL(1)] AS first_word,
ARRAY_REVERSE(SPLIT(strings,’ ‘))[ORDINAL(1)] AS last_word,
SPLIT(strings,’ ‘)[SAFE_ORDINAL(5)] AS fifth_word,
ARRAY_TO_STRING(ARRAY(SELECT words FROM UNNEST(SPLIT(strings,’ ‘)) AS words WITH OFFSET WHERE OFFSET >= 2 AND OFFSET <= 3),’ ‘) AS third_and_fourth_word,
ARRAY_TO_STRING(ARRAY(SELECT words FROM UNNEST(SPLIT(strings,’ ‘)) AS words WITH OFFSET WHERE MOD(OFFSET,2)=1),’ ‘) AS every_second_word
FROM
(SELECT [‘I want to buy bananas!’,’Bananas are sold out.’] AS sentences),
UNNEST(sentences) AS strings
Summary
You can see that proper knowledge of array manipulations opens a whole new world of opportunities. At the same time, most of the examples in this article were not that difficult. Should I add more use cases? Let me know in the comment section below!
Excsume me, when u create an Array in BigQuery, where is it stored pls ? what datastore ? Table ?
In array how u get latest value ? … array[latest]
Well, it’s basically a data type, so it can sit in a simple table as row for example…
Regarding the last item in the array, I would do:
ARRAY_REVERSE({your array})[ORDINAL(1)]
Basically, you reverse-order your array and then take the first item 🙂