R2
This page guides you through the process of setting up the R2 destination connector.
Prerequisites
List of required fields:
- Account ID
- Access Key ID
- Secret Access Key
- R2 Bucket Name
- R2 Bucket Path
- Allow connections from Airbyte server to your Cloudflare R2 bucket
Step 1: Set up R2
Sign in to your Cloudflare account and purchase R2.
Use an existing Access Key ID and Secret Access Key, or create new ones.
Prepare the R2 bucket that will be used as the destination. You can create the bucket through the S3-compatible API or from the R2 section of the Cloudflare dashboard.
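As a minimal sketch of the bucket preparation step, the snippet below creates a bucket through R2's S3-compatible API. It assumes the boto3 library is installed and that the endpoint has the form https://<account-id>.r2.cloudflarestorage.com. The helper names r2_endpoint and create_r2_bucket are hypothetical and are not part of the connector.

```python
def r2_endpoint(account_id: str) -> str:
    """Build the S3-compatible endpoint URL for a Cloudflare account (assumed form)."""
    return f"https://{account_id}.r2.cloudflarestorage.com"


def create_r2_bucket(account_id: str, access_key_id: str,
                     secret_access_key: str, bucket_name: str) -> None:
    """Create the destination bucket (hypothetical helper, not an Airbyte API)."""
    import boto3  # imported lazily so r2_endpoint works without boto3 installed

    client = boto3.client(
        "s3",
        endpoint_url=r2_endpoint(account_id),
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
        region_name="auto",  # assumption: R2 accepts "auto" as the region
    )
    client.create_bucket(Bucket=bucket_name)
```

Calling create_r2_bucket with the Account ID, the key pair from this step, and a bucket name of your choice should leave you with a bucket that can be entered as R2 Bucket Name in Step 2.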
Step 2: Set up the R2 destination connector in Airbyte
For Airbyte Cloud:
- Log in to your Airbyte Cloud account.
- In the left navigation bar, click Destinations. In the top-right corner, click + new destination.
- On the destination setup page, select R2 from the Destination type dropdown and enter a name for this connector.
- Configure fields:
  - Account Id
    - The ID of your Cloudflare account.
  - Access Key Id
    - The Access Key ID from Step 1.
  - Secret Access Key
    - The secret key that corresponds to the Access Key Id above.
  - R2 Bucket Name
  - R2 Bucket Path
    - Subdirectory under the above bucket to sync the data into.
  - R2 Path Format
    - Additional string format that controls how data is stored under the R2 Bucket Path. The default value is ${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_.
  - R2 Filename pattern
    - The pattern sets the file-name format for the R2 staging file(s). The following placeholder combinations are currently supported: {date}, {date:yyyy_MM}, {timestamp}, {timestamp:millis}, {timestamp:micros}, {part_number}, {sync_id}, {format_extension}. Do not use spaces or unsupported placeholders, as they won't be recognized.
- Click Set up destination.
For Airbyte OSS:
- Go to your local Airbyte page.
- In the left navigation bar, click Destinations. In the top-right corner, click + new destination.
- On the destination setup page, select R2 from the Destination type dropdown and enter a name for this connector.
- Configure fields:
  - Account Id
    - The ID of your Cloudflare account.
  - Access Key Id
    - The Access Key ID from Step 1.
  - Secret Access Key
    - The secret key that corresponds to the Access Key Id above.
  - Make sure your R2 bucket is accessible from the machine running Airbyte.
    - This depends on your networking setup.
    - The easiest way to verify that Airbyte can connect to your R2 bucket is the check connection tool in the UI.
  - R2 Bucket Name
  - R2 Bucket Path
    - Subdirectory under the above bucket to sync the data into.
  - R2 Path Format
    - Additional string format that controls how data is stored under the R2 Bucket Path. The default value is ${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_.
  - R2 Filename pattern
    - The pattern sets the file-name format for the R2 staging file(s). The following placeholder combinations are currently supported: {date}, {date:yyyy_MM}, {timestamp}, {timestamp:millis}, {timestamp:micros}, {part_number}, {sync_id}, {format_extension}. Do not use spaces or unsupported placeholders, as they won't be recognized.
- Click Set up destination.
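The filename pattern rule (no spaces, only supported placeholders) can be checked before saving the connector. The sketch below is an illustration only: check_filename_pattern is a hypothetical helper, and it assumes the placeholders are accepted exactly as listed in the field description, which may be stricter than the connector itself.

```python
import re
from typing import List

# Placeholders listed as supported in the R2 Filename pattern field.
SUPPORTED_PLACEHOLDERS = {
    "{date}", "{date:yyyy_MM}", "{timestamp}", "{timestamp:millis}",
    "{timestamp:micros}", "{part_number}", "{sync_id}", "{format_extension}",
}


def check_filename_pattern(pattern: str) -> List[str]:
    """Return the problems found in a filename pattern (empty list if none)."""
    problems = []
    if any(ch.isspace() for ch in pattern):
        problems.append("pattern contains whitespace")
    # Every {...} group must be one of the listed placeholders.
    for placeholder in re.findall(r"\{[^{}]*\}", pattern):
        if placeholder not in SUPPORTED_PLACEHOLDERS:
            problems.append(f"unsupported placeholder: {placeholder}")
    return problems
```

For instance, a pattern such as {date}_{part_number}{format_extension} passes with an empty list, while a pattern with a space or an unknown placeholder is reported.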
The full path of the output data with the default R2 Path Format ${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_ is:
<bucket-name>/<bucket-path>/<source-namespace-if-exists>/<stream-name>/<upload-date>_<epoch>_<partition-id>.<format-extension>
For example:
testing_bucket/data_output_path/public/users/2021_01_01_1234567890_0.csv.gz
↑              ↑                ↑      ↑     ↑          ↑          ↑ ↑
|              |                |      |     |          |          | format extension
|              |                |      |     |          |          unique incremental part id
|              |                |      |     |          milliseconds since epoch
|              |                |      |     upload date in YYYY_MM_DD
|              |                |      stream name
|              |                source namespace (if it exists)
|              bucket path
bucket name
The rationale behind this naming pattern is:
- Each stream has its own directory.
- The data output files can be sorted by upload time.
- The upload time is composed of a date part and a milliseconds part, so it is both readable and unique.
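Because the pattern is regular, an output path can also be split back into its parts. The following is a rough sketch for paths produced with the default path format; parse_output_path is a hypothetical helper, and since the bucket path and the optional namespace cannot be told apart from the path alone, they are returned together under the key middle.

```python
import re
from typing import Dict, Optional

# File name layout under the default format: <upload-date>_<epoch>_<part-id>.<extension>
FILE_RE = re.compile(
    r"^(?P<date>\d{4}_\d{2}_\d{2})_(?P<epoch>\d+)_(?P<part_id>[^.]+)\.(?P<extension>.+)$"
)


def parse_output_path(path: str) -> Optional[Dict[str, str]]:
    """Split an output path into named parts, or return None if it does not match."""
    parts = path.split("/")
    if len(parts) < 3:
        return None
    match = FILE_RE.match(parts[-1])
    if match is None:
        return None
    fields = match.groupdict()
    fields["bucket_name"] = parts[0]
    fields["stream_name"] = parts[-2]
    # Bucket path plus optional namespace; ambiguous without extra configuration.
    fields["middle"] = "/".join(parts[1:-2])
    return fields
```

Applied to the example path above, this yields users as the stream name, 2021_01_01 as the upload date, and csv.gz as the extension.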
You can customize the bucket path further by using the following variables:
- ${NAMESPACE}: Namespace where the stream comes from, or the namespace configured by the connection namespace fields.
- ${STREAM_NAME}: Name of the stream.
- ${YEAR}: Year in which the sync wrote the output data.
- ${MONTH}: Month in which the sync wrote the output data.
- ${DAY}: Day on which the sync wrote the output data.
- ${HOUR}: Hour in which the sync wrote the output data.
- ${MINUTE}: Minute in which the sync wrote the output data.
- ${SECOND}: Second in which the sync wrote the output data.
- ${MILLISECOND}: Millisecond in which the sync wrote the output data.
- ${EPOCH}: Milliseconds since the epoch at which the sync wrote the output data.
- ${UUID}: Random UUID string.
Note:
- Multiple / characters in the R2 path are collapsed into a single / character.
- If the output bucket contains too many files, the part id variable uses a UUID instead. It uses a sequential ID otherwise.
Note that the stream name may contain a prefix if one is configured on the connection. A data sync may create multiple files, because the output files can be partitioned by size (targeting a size of 200MB compressed or lower).
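To make the path format variables concrete, here is a small sketch of how a format string could be expanded. It is not the connector's implementation: render_path_format is a hypothetical function, the zero-padding of the date and time fields is an assumption based on the example path, and the timestamp must be timezone-aware.

```python
import re
import uuid
from datetime import datetime, timedelta, timezone
from typing import Optional

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def render_path_format(path_format: str, namespace: Optional[str],
                       stream_name: str, when: datetime) -> str:
    """Expand ${VAR} placeholders in a path format (illustrative sketch)."""
    values = {
        "NAMESPACE": namespace or "",
        "STREAM_NAME": stream_name,
        "YEAR": f"{when.year:04d}",
        "MONTH": f"{when.month:02d}",  # zero-padding is an assumption
        "DAY": f"{when.day:02d}",
        "HOUR": f"{when.hour:02d}",
        "MINUTE": f"{when.minute:02d}",
        "SECOND": f"{when.second:02d}",
        "MILLISECOND": f"{when.microsecond // 1000:03d}",
        "EPOCH": str((when - UNIX_EPOCH) // timedelta(milliseconds=1)),
        "UUID": str(uuid.uuid4()),
    }
    # Unknown variables are left untouched rather than removed.
    rendered = re.sub(r"\$\{(\w+)\}",
                      lambda m: values.get(m.group(1), m.group(0)),
                      path_format)
    # Multiple "/" characters are collapsed into a single "/".
    return re.sub(r"/{2,}", "/", rendered)
```

With the default format, the namespace public, the stream users, and a sync time of midnight UTC on 2021-01-01, the function returns a prefix that starts with public/users/2021_01_01_ followed by the epoch in milliseconds and a trailing underscore, to which the part id and extension would be appended.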
Supported sync modes
Feature | Support | Notes |
---|---|---|
Full Refresh Sync | ✅ | Warning: this mode deletes all previously synced data in the configured bucket path. |
Incremental - Append Sync |