Depending on your data retention policy, RudderStack stores the following two types of events:

  • All the raw events ingested by RudderStack.
  • The final event payload along with the error, in case of delivery failures.

RudderStack deletes these events from the bucket once they are delivered successfully. RudderStack does not persist any customer data.

Follow the steps in this guide if you want RudderStack to back up the events in your own cloud-specific bucket.

Bucket configuration settings

If you are using RudderStack Open Source and want to use your own bucket to store the events, you will need to enable and set certain variables in your RudderStack backend.

Docker setup

Storing events into your S3 bucket

To capture the events in your S3 bucket, uncomment the following lines in your docker.env file:

# JOBS_BACKUP_STORAGE_PROVIDER=S3
# JOBS_BACKUP_BUCKET=<your_s3_bucket>
# JOBS_BACKUP_PREFIX=<prefix>
# AWS_ACCESS_KEY_ID=
# AWS_SECRET_ACCESS_KEY=

Then follow these steps:

  1. Specify your S3 bucket name for the variable JOBS_BACKUP_BUCKET.
  2. Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to the AWS IAM keys you obtain by following the Permissions for Amazon S3 section below.

The <prefix> value for the JOBS_BACKUP_PREFIX variable is the path under the bucket where RudderStack stores the data. For example, if JOBS_BACKUP_PREFIX is set to prefix, RudderStack stores the data in <your_s3_bucket>/prefix.
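
As a rough illustration of the steps above, the following Python sketch uncomments the backup lines in the text of a docker.env file, fills in their values, and shows how the prefix maps to a storage location. The helper names enable_backup_settings and backup_location are hypothetical and not part of RudderStack. The bucket name, prefix, and keys are placeholder values.

```python
import re


def enable_backup_settings(env_text, values):
    """Uncomment `# KEY=...` lines for the keys in `values` and set them.

    Lines for other keys, and lines that are not commented-out
    assignments, are left untouched.
    """
    out = []
    for line in env_text.splitlines():
        match = re.match(r"^#\s*([A-Z0-9_]+)=", line)
        if match and match.group(1) in values:
            key = match.group(1)
            line = f"{key}={values[key]}"
        out.append(line)
    result = "\n".join(out)
    # Keep the trailing newline if the input had one.
    if env_text.endswith("\n"):
        result += "\n"
    return result


def backup_location(bucket, prefix):
    """Location described in this guide: <bucket>/<prefix>."""
    return f"{bucket}/{prefix}"


if __name__ == "__main__":
    sample = (
        "# JOBS_BACKUP_STORAGE_PROVIDER=S3\n"
        "# JOBS_BACKUP_BUCKET=<your_s3_bucket>\n"
        "# JOBS_BACKUP_PREFIX=<prefix>\n"
        "# AWS_ACCESS_KEY_ID=\n"
        "# AWS_SECRET_ACCESS_KEY=\n"
    )
    # Placeholder values; substitute your own bucket, prefix, and keys.
    settings = {
        "JOBS_BACKUP_STORAGE_PROVIDER": "S3",
        "JOBS_BACKUP_BUCKET": "example-bucket",
        "JOBS_BACKUP_PREFIX": "prefix",
        "AWS_ACCESS_KEY_ID": "EXAMPLE_KEY_ID",
        "AWS_SECRET_ACCESS_KEY": "EXAMPLE_SECRET",
    }
    print(enable_backup_settings(sample, settings))
    print(backup_location("example-bucket", "prefix"))
```

Editing the file by hand works just as well. The sketch is only meant to make explicit which lines change and what the resulting location looks like.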

Storing events into your GCS bucket

To capture the events in your GCS bucket, uncomment the following lines in your docker.env file:

# JOBS_BACKUP_STORAGE_PROVIDER=GCS
# JOBS_BACKUP_BUCKET=<your_gcs_bucket>
# JOBS_BACKUP_PREFIX=<prefix>
# GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials

Then, follow these steps:

  1. Specify your GCS bucket name for the variable JOBS_BACKUP_BUCKET.
  2. Set the variable GOOGLE_APPLICATION_CREDENTIALS to the location of the downloaded JSON key file that carries the required permissions. You can obtain this JSON file by following the Permissions for GCS section below.
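
Before starting RudderStack, it can help to confirm that the path in GOOGLE_APPLICATION_CREDENTIALS points at a readable service account key. The Python sketch below is one way to do that. The function name check_service_account_key is hypothetical, and the fields it checks (type, client_email, private_key) are the ones normally present in a Google Cloud service account key file, not something RudderStack itself is documented to verify.

```python
import json
import os

# Fields normally present in a Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def check_service_account_key(path):
    """Return a list of problems found with the key file (empty if none)."""
    if not os.path.isfile(path):
        return [f"file not found: {path}"]
    try:
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, ValueError) as exc:
        # ValueError covers both invalid JSON and undecodable bytes.
        return [f"could not read JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data
    ]
    if "type" in data and data["type"] != "service_account":
        problems.append(f"unexpected type: {data['type']}")
    return problems


if __name__ == "__main__":
    key_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    issues = check_service_account_key(key_path)
    if issues:
        for issue in issues:
            print(issue)
    else:
        print("key file looks usable")
```

This only checks the shape of the file. It does not confirm that the service account actually has access to the bucket.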

Kubernetes setup

Similar to the Docker setup, you can configure your bucket settings by changing the values in the values.yaml file.

Permissions for Amazon S3

Follow these steps to use your own S3 bucket for RudderStack to store the events:

  1. Create a bucket using the Amazon S3 service.
  2. Create a new customer-managed policy with the following JSON, replacing {BUCKET_NAME} with the name of your bucket:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:AbortMultipartUpload"
      ],
      "Resource": "arn:aws:s3:::{BUCKET_NAME}/*"
    }
  ]
}
  3. Create a new group and add the policy created above to this group.
  4. Create a new user in Identity and Access Management (IAM) with programmatic access and add the user to the above group.
  5. Download and note the access key ID and secret access key. These are the values for the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables.
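
To avoid hand-editing the bucket name, the policy from step 2 can be generated programmatically. The Python sketch below builds the same policy document for a given bucket. The function name build_backup_policy and the bucket name example-bucket are hypothetical. The actions and resource pattern are copied from the policy in this guide.

```python
import json


def build_backup_policy(bucket_name):
    """Return the S3 policy from this guide as a JSON string for one bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:AbortMultipartUpload",
                ],
                # Grants access to objects in the bucket, not the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Placeholder bucket name; substitute your own.
    print(build_backup_policy("example-bucket"))
```

The printed JSON can be pasted into the policy editor when creating the customer-managed policy.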

Permissions for GCS

This section lists the steps to use your own GCS bucket for RudderStack to store the events.

Under Roles in your GCP dashboard, create a role with the following permissions:

  • storage.objects.create
  • storage.objects.get

We highly recommend adding the permissions one at a time.

Then, create a service account by following these steps:

  1. Assign a name to the service account.
  2. Add the role you created above.
  3. Create a key with the type JSON and save this file locally.
  4. Then, create a bucket with the bucket access control set to Uniform.

Once the bucket is created, add the required permissions by following these steps:

  1. Go to the Permissions tab.
  2. Add the service account created above as a member.
  3. Add the role.
  4. Finally, download the JSON file containing the required permissions.

Contact us

For more information on the topics covered on this page, email us or start a conversation in our Slack community.
