Data Time Machine: Unearthing History with BigQuery’s Time Travel Feature

6 min readJun 11, 2023

Introduction

BigQuery is a powerful and scalable data warehouse service that lets you store and analyze massive amounts of data in the cloud. But what if you accidentally delete or modify some data and want to restore it or analyze how it changed over time? Or what if you want to compare the current state of your data with a previous point in time? This is where the time travel feature of BigQuery comes in handy.

What is time travel in BigQuery?

Time travel is a feature that lets you access data stored in BigQuery that has been changed or deleted. You can access the data from any point within the time travel window, which covers the past seven days by default1. Time travel lets you:

Query data that was updated or deleted
Restore a table that was deleted
Restore a table that expired

Time travel works by keeping track of the changes made to the data in BigQuery and storing them as delta records. These delta records are used to reconstruct the historical versions of the tables when you use the time travel feature.

How to query data at a point in time

You can query a table’s historical data from any point in time within the time travel window by using a FOR SYSTEM_TIME AS OF clause. This clause takes a constant timestamp expression and references the version of the table that was current at that timestamp. The table must be stored in BigQuery; it cannot be an external table.

For example, the following query returns a historical version of the bigquery-public-data.samples.shakespeare table from one hour ago:

SELECT
  *
FROM
  bigquery-public-data.samples.shakespeare 
  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);

The result is a table with the same schema and data as the original table, but as it was one hour ago. You can use any valid timestamp expression for the FOR SYSTEM_TIME AS OF clause, such as a literal timestamp, a date function, or a variable.

You can also use time travel to query data from multiple tables at different points in time, as long as they are in the same project and location. For example, the following query joins two tables from two hours ago and one hour ago, respectively:

SELECT
  s.word,
  s.word_count,
  s.corpus,
  w.comment
FROM
  `bigquery-public-data.samples.shakespeare` s FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 HOUR)
JOIN
  `bigquery-public-data.samples.wikipedia` w FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
ON
  s.word = w.title;

The result is a table with the words that appear both in Shakespeare’s works and in Wikipedia articles, along with their word counts and corpus names, as they were at different points in time.

How to restore a table from a point in time

You can restore a table that was deleted or expired by copying it from a point in time to a new table, using either the BigQuery web UI, the bq command-line tool, or the API. For example, using the bq command-line tool, you can run the following command to restore a deleted table named jb_dataset.jb_table from one hour ago to a new table named jb_dataset.jb_table_restored. The time, -3600000 (equal to 1 hour), is specified in milliseconds using a relative offset.

bq cp jb_dataset.jb_table@-3600000 jb_dataset.jb_table_restored

The restored table will have the same schema and data as the original table at the specified point in time, but it will have a new creation time and expiration time (if any).

How to configure the time travel window

By default, the time travel window covers the past seven days for all tables in BigQuery. However, you can configure the time travel window for each dataset individually, depending on your storage costs and data retention needs.

The storage costs for time travel depend on your storage billing model. If you use physical bytes (the default option), then you are billed for both the current and historical versions of your data. If you use logical bytes (a preview option), then you are only billed for the current version of your data.

You can configure the time travel window to be longer (max 7 days) in cases where it is more important to have the option of recovering the data, and to be shorter (min 2 days) where that isn’t a concern. Using a shorter time travel window allows you to save on storage costs. It can also allow you to conform to data retention policies that require hard deletion of data in a specific time frame.

To configure the time travel window for a dataset, you need to specify a max_time_travel_hours property when you create or update the dataset. This property determines how long the historical versions of the tables within the dataset are accessible through time travel. You can use the Google Cloud console, the bq command-line tool, or the BigQuery API to specify the time travel window for a dataset. For example, using the bq command-line tool, you can run the following command to create a dataset named jb_dataset with a time travel window of three days (i.e., 72 hours):

bq --location=US mk \
    --dataset \
    --max_time_travel_hours=72 \
    --storage_billing_model=PHYSICAL \
    jb_project:jb_dataset

The --max_time_travel_hours flag takes a value in hours, so 72 hours equals three days. You can also use the BigQuery Console or the API to set this property.

To update the time travel window for an existing dataset, you can use one of these methods:

Use an ALTER SCHEMA statement with an OPTIONS clause that specifies max_time_travel_hours. For example:

ALTER SCHEMA jb_project.jb_dataset SET OPTIONS (max_time_travel_hours = 48);

This statement updates the time travel window to two days for jb_project.jb_dataset.
Use the bqcommand-line tool’s bq update command with a --max_time_travel_hours flag. For example:

bq update \
--dataset=true --max_time_travel_hours=48 \
jb_project:jb_dataset

This command sets the time travel window to two days for jb_dataset.

Benefits of using time travel in BigQuery

Time travel in BigQuery is a useful feature that can help you with various scenarios, such as:

Data recovery: You can restore a table that was accidentally deleted or expired, or recover data that was mistakenly updated or deleted.
Data analysis: You can compare the current and historical states of your data, or perform trend analysis or anomaly detection over time.
Data auditing: You can track the changes made to your data over time, or verify the compliance or quality of your data.
Data backup: It reduces the need for creating and maintaining backup tables or snapshots, which can incur additional storage costs and complexity.

Time travel in BigQuery is also easy to use, as it only requires a simple SQL clause or a copy command. It does not require any additional setup or maintenance, and it works seamlessly with other BigQuery features, such as partitioning, clustering, and encryption.

Conclusion

In this blog post, we have learned how to use time travel in BigQuery to access historical data that has been changed or deleted within the past seven days by default. We have also seen some of the benefits and use cases of time travel, such as data recovery, data analysis, data auditing, and data backup. Time travel is a powerful and convenient feature that can help you deal with various data scenarios and challenges.

However, time travel also has some limitations and costs that you need to be aware of. For example, you cannot access data that is older than the time travel window, which is seven days by default but can be configured at the dataset level. You also need to consider the storage billing model and the storage costs for time travel, which depend on whether you use physical bytes or logical bytes. Moreover, you need to pay attention to the permissions and security implications of creating and accessing snapshots across different datasets and locations.

I hope you found this blog post informative and helpful. If you have any questions or feedback, please let me know in the comments below. Happy time traveling! 🚀

I’m always interested in connecting with other data enthusiasts and professionals who share my passion for BigQuery and cloud data analytics. If you want to chat more about this topic or anything related to BigQuery, feel free to connect with me on LinkedIn. You can find my profile here: https://www.linkedin.com/in/jashbhatt/
I look forward to hearing from you and learning from your experiences. Thank you for reading and have a great day! 😊