Building the Vector Puppet Module
Joshua Bigler
02-01-2025
Vector is a system observability and telemetry tool written in Rust that specializes in collecting, processing, and shipping logs and metrics in a vendor-neutral fashion. You can think of it as akin to tools like Logstash, Elastic’s Beats software (Filebeat, Metricbeat, etc.), Fluent Bit/Fluentd, and many others. Those tools are fine for their purpose. However, I find many of them either limited or simply difficult to learn and set up, often with strange configuration syntax. I stumbled onto Vector and quickly appreciated its simple syntax, small resource footprint, and ability to transform logs and metrics using its VRL (Vector Remap Language) processor.
At that time, there was no open-source Puppet module to install and configure Vector. I figured I would address that gap, and in this post I will walk through the thought process behind creating the Vector module. I won’t explain every single bit of code or go over every single variable. The code shown here is a simplified version, so it doesn’t exactly match what is currently in the repository.
In general, when managing an application on a system, there are often three primary stages:
- Installation
- Configuration
- Execution
Some applications require additional steps, but this is a good starting point. Puppet classes can easily be broken down into ordered steps using sub classes and the contain directive:
# manifests/init.pp
class vector {
  contain vector::install
  contain vector::configure
  contain vector::service

  Class['vector::install']
  -> Class['vector::configure']
  -> Class['vector::service']
}
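One practical consequence of contain, as a rough sketch: ordering something against Class['vector'] also orders it against everything the three sub classes manage. With a plain include the sub classes would not be contained, so the relationship would not reach their resources. The profile::vector_repo and profile::vector classes below are hypothetical names used only for illustration, and the notify resource is a placeholder for real repository resources.

```puppet
# Hypothetical profile class standing in for whatever sets up your package repository.
class profile::vector_repo {
  # Placeholder resource; real repository resources would go here.
  notify { 'vector repository configured': }
}

# Hypothetical wrapper that applies the repository first, then the module.
class profile::vector {
  include profile::vector_repo
  include vector

  # One relationship is enough: because vector contains its sub classes,
  # their resources are also ordered after the repository class.
  Class['profile::vector_repo'] -> Class['vector']
}

include profile::vector
```

In a real codebase each class would live in its own file so the autoloader can find it. They are shown together here only to keep the sketch short.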
Now that we have the initial framework in place, we need to write the three sub classes.
Installation
The first step is to install Vector. While we could attempt to configure package repositories on the system to ensure Vector is available, this would be very tricky when considering all the types of package management systems that could be used (apt, snap, yum/dnf, etc.), and it’s also possible that the user wants to install from mirrors and/or custom repositories. Instead of polluting the module with a large list of configuration items and spaghetti code to configure all of that, I opted to leave package repository management out of the module’s scope. There are plenty of Puppet modules that can configure those things much better than I can.
With that decided, we instead have the vector::install class simply use the package resource to install it and assume the user has already configured the relevant repositories:
# manifests/install.pp
class vector::install (
  String $vector_ensure = 'present',
) {
  package { 'vector':
    ensure => $vector_ensure,
  }
}
By default, this simply ensures Vector is installed. The user can set vector::install::vector_ensure to a specific version to make sure that version is installed.
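As a minimal sketch of pinning a version against the simplified class above: the version string here is hypothetical, and the real module's parameters may differ. The same value could instead be supplied through Hiera's automatic parameter lookup under the key vector::install::vector_ensure.

```puppet
# The version string is hypothetical; use one your repository actually provides.
class { 'vector::install':
  vector_ensure => '0.43.1-1',
}

# The resource-like declaration above must be evaluated before this include,
# otherwise Puppet reports a duplicate declaration of vector::install.
include vector
```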
Configuration
Although Vector is relatively easy to configure, the vector::configure class will still be the most involved. Vector has three main components in its operation:
- Sources: Where data is generated (log file, Kubernetes logs, journald, socket, etc.)
- Transforms: Where data is manipulated (usually using VRL)
- Sinks: Where data is output (Elasticsearch, Kafka, S3, etc.)
The user plugs these pieces together to create a topology. In addition, there are global options, such as API settings and the location of Vector’s data directory.
Most people configure all of these in a single configuration file (such as /etc/vector/vector.toml). However, I want sources, transforms, and sinks to be declared as Puppet resources, so that users may declare them across multiple modules. Coordinating all of that into a single file is tricky. The concat module could come into play, but I try to minimize dependencies on other modules, and that module is not guaranteed to work well with JSON or YAML data (Vector config files can be TOML, JSON, or YAML).
Instead, we will take advantage of a little-known (and not very well advertised) feature of Vector’s configuration: Automatic Namespacing. This feature allows us to split configurations by type into different files and directories. That way, when the user declares a source, transform, or sink, a new file is simply placed in the corresponding directory.
The approach is as follows:
- Global configurations will go into /etc/vector/global.yml
- Puppet will create /etc/vector/configs and /etc/vector/configs/{sources,transforms,sinks}
- We create defined types for each of the Vector components, which will craft files to go into their corresponding directory
# manifests/configure.pp
class vector::configure (
  Hash $global_opts = {},
) {
  $global_opts_file = '/etc/vector/global.yml'
  $configs_dir      = '/etc/vector/configs'
  $sources_dir      = "${configs_dir}/sources"
  $transforms_dir   = "${configs_dir}/transforms"
  $sinks_dir        = "${configs_dir}/sinks"

  # Craft the global options using stdlib's to_yaml function
  file { $global_opts_file:
    ensure  => file,
    content => to_yaml($global_opts),
    notify  => Service['vector'],
  }

  file { $configs_dir:
    ensure  => directory,
    recurse => true,
    purge   => true,
  }
  # Yes, you can define multiple directories at the same time!
  -> file { [$sources_dir, $transforms_dir, $sinks_dir]:
    ensure  => directory,
    recurse => true,
    purge   => true,
  }

  # Systemd service file
  file { '/etc/systemd/system/vector.service':
    ensure  => file,
    content => epp('vector/vector.service.epp'),
    notify  => Class['vector::service'],
  }

  # Set up automatic dependencies of the configurations dirs and the resource types
  File[$sources_dir, $transforms_dir, $sinks_dir] -> Vector::Source<||> ~> Class['vector::service']
  File[$sources_dir, $transforms_dir, $sinks_dir] -> Vector::Sink<||> ~> Class['vector::service']
  File[$sources_dir, $transforms_dir, $sinks_dir] -> Vector::Transform<||> ~> Class['vector::service']
}
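To make the global options concrete, here is a small sketch of passing a hash to the simplified vector::configure class above. The data_dir and api keys are Vector global options. The specific values are examples only, and the real module may expose this parameter differently.

```puppet
# Declare before `include vector` so the parameter takes effect.
class { 'vector::configure':
  global_opts => {
    # Example values; adjust for your environment.
    'data_dir' => '/var/lib/vector',
    'api'      => {
      'enabled' => true,
      'address' => '127.0.0.1:8686',
    },
  },
}

include vector
```

The hash is handed to to_yaml unchanged, so any option Vector accepts at the top level of its configuration can be set this way without the module needing to know about it.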
Now that we have the directory structure in place, we need an API that lets users programmatically create sources, transforms, and sinks as needed. There are many types of sources, sinks, and transforms, each with its own configuration options. Maintaining a full list of them in Puppet is unrealistic since new types get added, configuration options change, and so on. Instead, we simply provide generic resource types for them. Let’s look at the requirements for each type:
- A source must have the type field, as well as other fields depending on the source type
- A transform must have the type field, an inputs array, and other fields depending on the type
- A sink is much like a transform: it must have the type field, an inputs array, and other fields depending on the type
Each of them must also be named, and we can simply use the resource’s $title variable for that. Doing so ensures there are no duplicate names within the same resource type. Also, since we simply need to create a file for each of these resources, there’s no need to crack open Puppet’s Ruby API documentation; Puppet’s defined resource types will do the trick. Let’s see what they look like:
# manifests/source.pp
define vector::source (
  String                    $type,
  Hash                      $parameters,
  Vector::ValidConfigFormat $format = 'toml',
) {
  $source_hash = $parameters + { 'type' => $type }
  $source_file_name = "${vector::configure::sources_dir}/${title}.${format}"

  file { $source_file_name:
    ensure  => file,
    content => vector::dump_config($source_hash, $format),
  }
}
# manifests/transform.pp
define vector::transform (
  String                    $type,
  Array[String]             $inputs,
  Hash                      $parameters,
  Vector::ValidConfigFormat $format = 'toml',
) {
  $transform_hash = $parameters + { 'type' => $type, 'inputs' => $inputs }
  $transform_file_name = "${vector::configure::transforms_dir}/${title}.${format}"

  file { $transform_file_name:
    ensure  => file,
    content => vector::dump_config($transform_hash, $format),
  }
}
# manifests/sink.pp
define vector::sink (
  String                    $type,
  Array[String]             $inputs,
  Hash                      $parameters,
  Vector::ValidConfigFormat $format = 'toml',
) {
  $sink_hash = $parameters + { 'type' => $type, 'inputs' => $inputs }
  $sink_file_name = "${vector::configure::sinks_dir}/${title}.${format}"

  file { $sink_file_name:
    ensure  => file,
    content => vector::dump_config($sink_hash, $format),
  }
}
The idea with all three types is the same: create a configuration hash, craft the full path using the name of the resource and the provided format, and dump the hash according to the provided format. All three use a helper function called vector::dump_config:
# functions/dump_config.pp
function vector::dump_config(
  Hash                      $data,
  Vector::ValidConfigFormat $format = 'toml',
) >> String {
  case $format {
    # Accept 'yaml' and 'yml' for yaml data
    /ya?ml/: { to_yaml($data) }
    'toml':  { to_toml($data) }
    # Assume everything else is JSON
    default: { to_json($data) }
  }
}
They all also reference a custom data type, Vector::ValidConfigFormat, to ensure the file extension is one that Vector recognizes:
# types/validconfigformat.pp
type Vector::ValidConfigFormat = Enum['json','yaml','yml','toml']
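As a sketch of how the format parameter and the data type interact, using the simplified vector::source defined type from earlier: the resource title and option value are illustrative, and the commented-out declaration shows a value the Enum would reject.

```puppet
include vector

# Written to /etc/vector/configs/sources/system_journal.yaml,
# because the title and format are joined to build the file name.
vector::source { 'system_journal':
  type       => 'journald',
  format     => 'yaml',
  parameters => {
    'current_boot_only' => true,
  },
}

# Fails catalog compilation if uncommented: 'ini' is not one of the
# values allowed by Vector::ValidConfigFormat.
# vector::source { 'bad_format':
#   type       => 'journald',
#   format     => 'ini',
#   parameters => {},
# }
```

Catching the bad extension at compile time means a typo never reaches the agent as a file that Vector would not load.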
Execution/Service
Finally, we create the vector::service class. This class ends up being very simple - it just makes sure the service is running and enabled:
# manifests/service.pp
class vector::service {
  service { 'vector':
    ensure => running,
    enable => true,
  }
}
And that’s pretty much it! I skipped over many steps, such as what the systemd service file actually looks like. There are many more options in the actual module, and some variables, steps, etc. have been added or moved since I wrote the first version. Definitely consult the repository to see what the code currently looks like!
Simple example usage
node default {
  include vector

  vector::source { 'logfile_input':
    type       => 'file',
    parameters => {
      'include' => ['/var/log/**/*.log'],
    },
  }

  vector::transform { 'logfile_transform':
    type       => 'remap',
    inputs     => ['*'],
    parameters => {
      'source' => '.foo = "bar"',
    },
  }

  vector::sink { 'logfile_kafka':
    type       => 'kafka',
    inputs     => ['logfile_transform'],
    parameters => {
      'bootstrap_servers' => 'localhost:9092',
      'topic'             => 'logs',
      'encoding'          => {
        'codec' => 'json',
      },
    },
  }
}