# s3sync-service

## Description
The `s3sync-service` tool asynchronously syncs data to the S3 storage service for multiple sites (path + bucket combinations).
On start, the `s3sync-service` launches a pool of generic upload workers, a pool of checksum workers, and an FS watcher for each site. Once all of the above are launched, it starts comparing local directory contents with S3 (matching checksums against ETags and also validating the StorageClass), which might take quite a while depending on the size of your data directory, disk speed, and available CPU resources. All new files, as well as removed files if `retire_deleted` is set to `true`, are put into the upload queue for processing. The FS watchers, upload workers and checksum workers keep running while the main process is active, which makes sure that your data is synced to S3 upon change.
## Running the s3sync-service
- Create a directory with the configuration file, e.g. `/path/to/config/config.yml`.
- Run the docker container, providing AWS credentials via environment variables (an IAM role should also do the trick); alternatively, credentials can be provided in the config file. Mount the directory containing the config file and all of the backup directories listed in the config file:
```bash
docker run --rm -ti \
  -e "AWS_ACCESS_KEY_ID=AKIAI44QH8DHBEXAMPLE" \
  -e "AWS_SECRET_ACCESS_KEY=je7MtGbClwBF/2Zp9Utk/h3yCo8nvbEXAMPLEKEY" \
  -e "AWS_DEFAULT_REGION=us-east-1" \
  -v "/path/to/config:/opt/s3sync-service" \
  -v "/backup/path:/backup" \
  zmazay/s3sync-service \
  ./s3sync-service -config /opt/s3sync-service/config.yml
```
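For long-running setups, the same invocation can be expressed as a compose file. A minimal `docker-compose.yml` sketch of the command above (the service name is illustrative, and credentials could instead come from the config file or an IAM role):

```yaml
# Illustrative compose sketch of the docker run command above;
# the service name and host paths are placeholders.
version: "3"
services:
  s3sync:
    image: zmazay/s3sync-service
    command: ./s3sync-service -config /opt/s3sync-service/config.yml
    environment:
      AWS_ACCESS_KEY_ID: AKIAI44QH8DHBEXAMPLE
      AWS_SECRET_ACCESS_KEY: je7MtGbClwBF/2Zp9Utk/h3yCo8nvbEXAMPLEKEY
      AWS_DEFAULT_REGION: us-east-1
    volumes:
      - /path/to/config:/opt/s3sync-service
      - /backup/path:/backup
    restart: unless-stopped
```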
## Configuration
Example configuration:
```yaml
upload_workers: 10
sites:
- local_path: /local/path1
  bucket: backup-bucket-path1
  bucket_region: us-east-1
  storage_class: STANDARD_IA
  access_key: AKIAI44QH8DHBEXAMPLE
  secret_access_key: je7MtGbClwBF/2Zp9Utk/h3yCo8nvbEXAMPLEKEY
  exclusions:
    - .[Dd][Ss]_[Ss]tore
    - .[Aa]pple[Dd]ouble
- local_path: /local/path2
  bucket: backup-bucket-path2
  bucket_path: path2
  exclusions:
    - "[Tt]humbs.db"
- local_path: /local/path3
  bucket: backup-bucket-path3
  bucket_path: path3
  exclusions:
    - "[Tt]humbs.db"
```
## Command line args

```
-config string
      Path to the config.yml (default "config.yml")
-metrics-path string
      Prometheus exporter path (default "/metrics")
-metrics-port string
      Prometheus exporter port, 0 to disable the exporter (default "9350")
```
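For example, to run against a config file outside the working directory with the Prometheus exporter disabled (the config path is illustrative):

```bash
# Explicit config path; -metrics-port 0 disables the exporter
./s3sync-service -config /opt/s3sync-service/config.yml -metrics-port 0
```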
## Generic configuration options
Variable | Description | Default | Required |
---|---|---|---|
`access_key` | Global AWS Access Key | n/a | no |
`secret_access_key` | Global AWS Secret Access Key | n/a | no |
`aws_region` | AWS region | n/a | no |
`loglevel` | Logging level; valid options are `trace`, `debug`, `info`, `warn`, `error`, `fatal` and `panic`. With the log level set to `trace` the logger outputs everything; with `debug`, everything apart from `trace`, and so on. | `info` | no |
`upload_queue_buffer` | Number of elements in the upload queue waiting for processing; might improve performance, but increases memory usage | `0` | no |
`checksum_workers` | Number of checksum workers for the service | `CPU*2` | no |
`upload_workers` | Number of upload workers for the service | `10` | no |
`watch_interval` | Interval for the file system watcher in milliseconds | `1000` | no |
`s3_ops_retries` | Number of retries for upload and delete operations | `5` | no |
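As an illustration, the global portion of a config combining several of these options could look like the following sketch (values are placeholders, not tuning advice):

```yaml
# Illustrative global tuning; all values below are placeholders
loglevel: debug
upload_queue_buffer: 100
checksum_workers: 8
upload_workers: 20
watch_interval: 5000
s3_ops_retries: 3
sites: []  # site definitions as in the example above
```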
## Site configuration options
Variable | Description | Default | Required |
---|---|---|---|
`name` | Human-friendly site name | `bucket/bucket_path` | no |
`local_path` | Local file system path to be synced with S3; using a relative path is known to cause some issues | n/a | yes |
`bucket` | S3 bucket name | n/a | yes |
`bucket_path` | S3 path prefix | n/a | no |
`bucket_region` | S3 bucket region | `global.aws_region` | no |
`retire_deleted` | Remove files from S3 which do not exist locally | `false` | no |
`storage_class` | S3 storage class | `STANDARD` | no |
`access_key` | Site AWS Access Key | `global.access_key` | no |
`secret_access_key` | Site AWS Secret Access Key | `global.secret_access_key` | no |
`watch_interval` | Interval for the file system watcher in milliseconds; overrides the global setting | `global.watch_interval` | no |
`exclusions` | List of regex filters for exclusions | n/a | no |
`s3_ops_retries` | Number of retries for upload and delete operations | `global.s3_ops_retries` | no |
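As an illustration, a single site entry combining several of these overrides might look like this sketch (names, paths and values are placeholders):

```yaml
# Illustrative site entry; every value here is a placeholder
sites:
- name: photos
  local_path: /local/photos
  bucket: backup-bucket
  bucket_path: photos
  bucket_region: us-east-1
  storage_class: STANDARD_IA
  retire_deleted: true
  watch_interval: 2000  # overrides the global watch_interval
```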
## Prometheus metrics
In addition to the default Go metrics, `s3sync-service` exports some custom metrics on the default path (`/metrics`) and port (`9350`); check the command line arguments for customisation. All the custom metrics are exported separately for each configured site (with `site="my-site"` in the labels).
Metric name | Description | Metric type |
---|---|---|
`s3sync_data_total_size` | Total size of the synced objects | Gauge |
`s3sync_data_objects_count` | Total number of the synced objects | Gauge |
`s3sync_errors_count` | Number of errors; could be used for alerting | Counter |
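Since `s3sync_errors_count` is a counter, alerting on its increase is a natural fit. A minimal sketch of a Prometheus alerting rule (the group name, alert name, window and durations are illustrative, not part of the service):

```yaml
# Illustrative alerting rule; names and thresholds are placeholders
groups:
  - name: s3sync-service
    rules:
      - alert: S3SyncErrors
        expr: increase(s3sync_errors_count[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "s3sync-service is reporting errors for site {{ $labels.site }}"
```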
## Gotchas
- The same bucket can be used for multiple sites (local directories) only if both use some `bucket_path`; otherwise, the site using the bucket root will delete the data from the prefix used by the other site. Setting `retire_deleted` to `false` for the site using the bucket root should fix this issue (see the sketch after this list).
- AWS credentials and region have the following priority:
  - Site AWS credentials (region)
  - Global AWS credentials (region)
  - Environment variables
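A minimal sketch of the safe shared-bucket layout, where each site gets its own `bucket_path` (bucket name and paths are illustrative):

```yaml
# Two sites sharing one bucket safely: each has its own bucket_path,
# so neither retires objects under the other's prefix.
sites:
- local_path: /local/path1
  bucket: shared-backup-bucket
  bucket_path: path1
  retire_deleted: true
- local_path: /local/path2
  bucket: shared-backup-bucket
  bucket_path: path2
  retire_deleted: true
```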