Writing a custom struct attribute with procedural macros in Rust

During my short time in the world of Rust (the language, not the game), I have heard of procedural macros here and there and seen bits of them. I've been waiting for the moment when writing one would be useful for a project. Well, that time has finally come, so I dug further into the topic and wrote one of my own.

The goal of this article is to give one way of writing a procedural macro that will be in the form of an #[attribute] on a struct. First of all, let's dig a little into the topic of Rust macros, in general.

The Rust book explains pretty well what macros are and gives examples of how to write some.

In short, there are the following types of macros in Rust:

  • declarative macros declared with macro_rules! that enable metaprogramming: vec!
  • procedural macros:
    • custom #[derive]s that generate implementations: #[derive(Debug)]
    • attribute-like macros that define custom attributes: #[async_trait] and what we will see in this article
    • function-like macros that process what is specified as their argument: include_str!("path/to/a/file/to/include/in/the/code.txt").

The context and starting point

I have a project where I'm taking JSON outputs from telegraf and processing them before dumping the result into a PostgreSQL database.

In the project, I have decided to apply the Observer design pattern (and some lectures, while I'm at it), or at least a variant of it, with trait objects. Since declaring the structure that holds the data, defining a TryFrom implementation for it and implementing the observer gets old quickly, I have come up with the following declaration:

#[observer(sql = "../files/system/upsert.sql")]
pub struct SystemMetrics {
    #[tag] host: String,
    load1: f64,
    load5: f64,
    load15: f64,
}

This declaration should expand to the data structure without field attributes, a TryFrom implementation and an Observer implementation like so (intermediate representation before the remaining macros are processed):

use relay_macros::observer;

// Data structure without any custom attributes
pub struct SystemMetrics {
    pub host: String,
    pub load1: f64,
    pub load5: f64,
    pub load15: f64,
}

// Implementation of TryFrom
impl std::convert::TryFrom<&crate::datastructures::TelegrafMetric> for SystemMetrics {
    type Error = crate::datastructures::telegraf_structs::StructConvError;
    fn try_from(value: &crate::datastructures::TelegrafMetric) -> Result<Self, Self::Error> {
        Ok(Self {
            // taking into account the custom attribute
            host: crate::datastructures::macros::get_tag(&value.tags, "host")?,
            // or using an implicit default attribute
            load1: crate::datastructures::macros::convert_f64(&value.fields, "load1")?,
            load5: crate::datastructures::macros::convert_f64(&value.fields, "load5")?,
            load15: crate::datastructures::macros::convert_f64(&value.fields, "load15")?,
        })
    }
}

// Implementation of the observer
pub struct SystemMetricsObserver;

#[async_trait]
impl crate::observers::Observer for SystemMetricsObserver {
    // And a default implementation
    async fn process(
            &self,
            metrics: &crate::datastructures::TelegrafMetric
        ) -> Result<(), crate::datastructures::StructProcessError> {

        use std::convert::TryFrom;
        use crate::database::DatabaseClient;
        let metrics_data = SystemMetrics::try_from(metrics)?;
        let db = DatabaseClient::get();
        db.execute(
            include_str!("../files/system/upsert.sql"),
            &[
                &(metrics.timestamp as f64),
                &metrics_data.host,
                &metrics_data.load1,
                &metrics_data.load5,
                &metrics_data.load15
            ]
        ).await?;

        Ok(())
    }
}

That's a lot, yep. The fields of the initial struct may have either #[tag] or #[metric], not both. By default, the procedural macro will act as if there were a #[metric] attribute.

Note: The paths shown in the different pieces of code come from an existing project where I have successfully implemented a procedural macro.

Starting the procedural macro

Cargo.toml

Procedural macros live in a separate library crate that has the flag proc-macro set to true:

# Standard package definition

[lib]
proc-macro = true

# Dependencies, we'll cover this just after

This tells the compiler that the crate contains procedural macros. It makes the compiler-provided proc_macro crate available, along with the attributes that designate each kind of procedural macro (such as #[proc_macro_attribute]).

We will also need a few dependencies:

  • syn: to parse the syntax tree that the compiler gives to us;
  • quote: to have the ability to turn a syntax tree into tokens for the compiler, with the ability of incorporating variables into the tree;
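  • proc_macro2: from the docs: "A wrapper around the procedural macro API of the compiler's proc_macro crate".

[dependencies]
# Latest versions, as of writing.
# syn's "full" feature is needed for syn::ItemStruct, and "extra-traits"
# for the Debug implementations on syn's syntax tree types.
syn = { version = "1.0.73", features = ["full", "extra-traits"] }
quote = "1.0.9"
proc-macro2 = "1.0.27"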

The skeleton

The skeleton of our procedural macro is as follows:

use proc_macro::TokenStream as CompilerTokenStream;

#[proc_macro_attribute]
pub fn observer(meta: CompilerTokenStream, input: CompilerTokenStream) -> CompilerTokenStream {
    input
}

While this macro doesn't do anything particularly useful, it is a procedural macro that will be valid as an attribute, courtesy of #[proc_macro_attribute]. Those of you who kept your eyes peeled while reading the code might have noticed that I have aliased proc_macro::TokenStream. We will see later how this helps us tell the compiler's TokenStream apart from proc_macro2's when we generate the syntax tree.

If we take our starting code again:

#[observer(sql = "../files/system/upsert.sql")]
pub struct SystemMetrics {
    #[tag] host: String,
    load1: f64,
    load5: f64,
    load15: f64,
}

meta will contain what has been passed to the macro as a stream of tokens and input contains what is being decorated as a stream of tokens. In our case, meta contains

sql = "../files/system/upsert.sql"

and input contains

pub struct SystemMetrics {
    #[tag] host: String,
    load1: f64,
    load5: f64,
    load15: f64,
}

The macro returns the input without any modification. Compiling the decorated code or expanding it will trigger a compilation error because the compiler doesn't know what to do with #[tag]. Leaving out the #[tag] in front of host and compiling again won't result in any error because we have valid Rust syntax and no custom attributes.

We now have a procedural macro, even though it doesn't do anything useful with the input. It's time to ~~stop~~ process what we have been given! Down the rabbit hole we go!

Processing the input struct

The struct itself

We have multiple ways of parsing the struct:

  • Parsing the structure with syn::parse::<Item>(input) and extracting what's interesting with a destructuring match
  • Creating a struct and implementing a parsing method that will process our input

While parsing the input directly and extracting with a destructuring match does the job, it leaves us no clean way to emit compilation errors. We can use panic!, but the compiler will only report that the custom attribute has panicked, pointing the error at the attribute itself:

error: custom attribute panicked
 --> relay/src/datastructures/system2.rs:3:1
  |
3 | #[observer(sql = "../files/system/upsert.sql")]
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  = help: message: My error message

Yes we have a message but... it's not very useful on its own, is it ?

Instead, we are going to create a structure with a couple of fields and implement the parsing on the struct. You may create a new module if you wish:

// ~/src/lib.rs
#[proc_macro_attribute]
pub fn observer(meta: CompilerTokenStream, input: CompilerTokenStream) -> CompilerTokenStream {
    let metrics_struct = syn::parse_macro_input!(input as TelegrafMetricStruct);

    CompilerTokenStream::new()
}

// ~/src/structures/telegraf_metric_struct.rs
use proc_macro2::Ident;
use syn::parse::{Parse, ParseBuffer};
use crate::structures::metric_attributes::MetricField;

#[derive(Debug)]
pub struct TelegrafMetricStruct {
    pub struct_name: Ident,
    pub properties: Vec<MetricField>
    //                  ^^^^^^^^^^^
    //       I'll show what this struct does later
}

impl Parse for TelegrafMetricStruct {
    fn parse(input: &ParseBuffer) -> syn::Result<Self> {
        todo!();
    }
}

The procedural macro outputs nothing (an empty TokenStream) but we're making progress nonetheless.

Let's decompose this chunk of code with the important aspects:

// ...
pub fn observer(meta: CompilerTokenStream, input: CompilerTokenStream) -> CompilerTokenStream {
    let metrics_struct = syn::parse_macro_input!(input as TelegrafMetricStruct);
    // ...
}

This macro call expands into code that either gives us the type we asked for and carries on, or returns a compilation error that shows up on the user's console. This is much more useful than panic!ing because the problematic bit of code is shown to the user of the procedural macro.

impl Parse for TelegrafMetricStruct {
    fn parse(input: &ParseBuffer) -> syn::Result<Self> {
        let strukt = input.parse::<ItemStruct>()?;

        // ...
    }
}

The parse method will be called by syn::parse::<T>(), in the expansion of the syn::parse_macro_input! macro, and returns the result of the parsing operation. It will be either the struct with the information we need or an error that will be shown to the user as a compilation error. Yes, the spelling strukt is intentional since struct is a reserved keyword. Now that I'm thinking about it, I could have used a raw identifier (r#struct) and would be allowed to use the keyword...
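For the curious, here is a minimal sketch of what the raw identifier route could look like. ParsedItem and parse_item are hypothetical stand-ins that I made up for the illustration; they are not part of syn or of my project. The only point is the r# prefix, which lets a keyword be used as an ordinary name:

```rust
// Hypothetical stand-in for a parsed item, only here so we have something to bind.
#[derive(Debug, PartialEq)]
struct ParsedItem {
    name: String,
}

// Hypothetical helper: "parses" a name into a ParsedItem.
fn parse_item(source: &str) -> ParsedItem {
    // `r#struct` is a raw identifier: the `r#` prefix allows the reserved
    // keyword `struct` to be used as a variable name.
    let r#struct = ParsedItem {
        name: source.to_string(),
    };

    r#struct
}
```

Whether r#struct reads better than strukt is a matter of taste; I'll stick with strukt in the rest of the article.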

Anyway, to parse our input structure, we rely on syn::ItemStruct that will return either the parsed structure or the error if there has been one. We bubble up the error to the caller that will do its thing; we don't care about it.

Let's take a break and see what we want to extract from the parsed struct. Taking what would be passed as the input of TelegrafMetricStruct::parse:

pub struct SystemMetrics {
    #[tag] host: String,
    load1: f64,
    load5: f64,
    load15: f64,
}

We don't care about the visibility; we care about the structure name and its fields, which are struct_name and properties on TelegrafMetricStruct, respectively. Each field has a name, a type and maybe a helper attribute, as explained earlier.

This is where the MetricField in pub properties: Vec<MetricField> comes into play. This struct contains the different interesting properties of the struct fields.

We go over each field in the ItemStruct and try to turn it into a MetricField before collecting all of them into a vector. If something goes wrong, a compilation error is bubbled up to the caller, which will do its thing.

use std::convert::TryFrom;
use syn::ItemStruct;

impl Parse for TelegrafMetricStruct {
    fn parse(input: &ParseBuffer) -> syn::Result<Self> {
        let strukt = input.parse::<ItemStruct>()?;
        let mut parsed_fields = Vec::new();

        for field in strukt.fields {
            let metric_attribute = MetricField::try_from(&field)?;
                                             // ^^^^^^^^^^^^^^^^
                                             // The magic happens here
                                             // for the fields
            parsed_fields.push(metric_attribute);
        }

        Ok(
            Self {
                struct_name: strukt.ident,
                properties: parsed_fields
            }
        )
    }
}

Yes, that's pretty much everything as far as processing the struct is concerned.

Processing the fields in the input structure

Let's first define what information is interesting to us in order to generate the final code:

  • The name of the field (obviously)
  • The type of the field (that'd be helpful)
  • Where to fetch the field value from: tags or fields in the metric

Let's build a structure that represents this:

use proc_macro2::Ident;
use syn::{Attribute, Type, Field};

#[derive(Debug)]
pub struct MetricField {
    pub name: Ident,
    pub attribute_type: TelegrafFieldType,
    pub ty: Type,
}

#[derive(Debug)]
pub enum TelegrafFieldType {
    Tag,
    Metric,
}

As shown above, the caller tries to convert a syn::Field into a MetricField with MetricField::try_from(&field)?, so we have to impl TryFrom<&Field> for MetricField:

use std::convert::TryFrom;
use quote::ToTokens;

impl TryFrom<&Field> for MetricField {
    type Error = syn::Error;

    fn try_from(field: &Field) -> Result<Self, Self::Error> {
        let name = field
            .ident
            .as_ref()
            .ok_or_else(|| {
                syn::Error::new_spanned(field.to_token_stream(), "Expected a structure with named fields, unnamed field given")
            })?;


        // `name` is already a `&Ident`, no need to borrow it again
        Self::new(name, &field.attrs, &field.ty)
    }
}

First, we get the name of the field and emit an error if we don't have any. In fact, field.ident is an Option<Ident>, implying that a field can have no name. Where ? In Rust, you can create a struct that looks like this:

struct NegativeNumberOrZero(i32);

That's a tuple struct. The field has no name. Instead, you access it like it were a tuple, for example self.0.

However, in our case, we want a name on the field, otherwise how are we going to fetch the information from whatever telegraf sends us ? To signify this, we raise an error on the field and let it bubble up to the caller. The Option::ok_or_else is a convenient method on Option<T> that lets us turn it into a Result<T, E>. Add the question mark and woosh: the error bubbles out to the caller and looks like this:

error: Expected a structure with named fields, unnamed field given
  --> relay/src/datastructures/system2.rs:12:29
   |
12 | struct NegativeNumberOrZero(i32);
   |                             ^^^
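To isolate the ok_or_else plus question mark combination from all the syn machinery, here is a small standalone sketch. FieldError and field_name are hypothetical names invented for this example; the Option<&str> parameter plays the role of field.ident:

```rust
// Hypothetical error type standing in for `syn::Error`.
#[derive(Debug, PartialEq)]
struct FieldError(String);

// Hypothetical helper: `ident` plays the role of `field.ident`.
fn field_name(ident: Option<&str>) -> Result<&str, FieldError> {
    // `ok_or_else` turns the Option into a Result; the closure only runs
    // when the value is `None`. The `?` then returns early with the error.
    let name = ident.ok_or_else(|| FieldError("unnamed field given".to_string()))?;

    Ok(name)
}
```

Using ok_or_else rather than ok_or means the error value is only built when it is actually needed.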

So, let's implement MetricField::new and see where we are processing the field attributes:

impl MetricField {
    pub fn new(name: &Ident, raw_helper_attributes: &[Attribute], ty: &Type) -> syn::Result<Self> {
        // Getting the name of attributes put in front of struct fields
        let helper_attributes = raw_helper_attributes
            .iter()
            .map(|attribute| {
                attribute
                    .path
                    .segments
                    .iter()
                    .map(|segment| &segment.ident)
                    .collect::<Vec<_>>()
            })
            .flatten()
            .collect::<Vec<_>>();

        // Making sense of the attribute(s)
        let attribute_type = if helper_attributes.is_empty() {
            TelegrafFieldType::Metric
        } else if helper_attributes.len() == 1 {
            let helper_attribute = helper_attributes[0];
            TelegrafFieldType::try_from(helper_attribute)?
        } else {
            return Err(syn::Error::new_spanned(name, "Field has more than one attribute"));
        };

        Ok(
            Self {
                name: name.clone(),
                ty: ty.clone(),
                attribute_type,
            }
        )
    }
}

That was a chunk of code ! Let's decompose everything in this snippet.

First, we have this piece of code whose variable naming might not be the best, but here we are.

let helper_attributes = raw_helper_attributes
    .iter()
    .map(|attribute| {
        attribute
            .path
            .segments
            .iter()
            .map(|segment| &segment.ident)
            .collect::<Vec<_>>()
    })
    .flatten()
    .collect::<Vec<_>>();

This piece of code iterates over each syn::Attribute, gets its syn::Path and each segment of the path. The documentation explains pretty well how everything works together.

When we reach the segment, we get its name and collect everything into a vector. At this stage, we would have a sequence of vectors (conceptually a Vec<Vec<&Ident>>). We then flatten() the whole thing and collect the attribute names into a single vector.
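As a side note, the map, collect and flatten dance can be compressed with flat_map. The sketch below uses plain string slices instead of syn types so it stands on its own; segment_names is a hypothetical function, and each inner vector models the path segments of one attribute:

```rust
// Hypothetical stand-in: each attribute is modelled as the list of its
// path segments, e.g. `#[tag]` is `vec!["tag"]`.
fn segment_names<'a>(attributes: &'a [Vec<&'a str>]) -> Vec<&'a str> {
    attributes
        .iter()
        // `flat_map` maps and flattens in one go, without the
        // intermediate `Vec` per attribute.
        .flat_map(|segments| segments.iter().copied())
        .collect()
}
```

It also makes one consequence of this approach visible: an attribute whose path has several segments contributes several names, so under this logic it would be counted as more than one attribute.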

The next bit determines what the field attribute expresses:

let attribute_type = if helper_attributes.is_empty() {
    TelegrafFieldType::Metric
} else if helper_attributes.len() == 1 {
    let helper_attribute = helper_attributes[0];
    TelegrafFieldType::try_from(helper_attribute)?
} else {
    return Err(syn::Error::new_spanned(name, "Field has more than one attribute"));
};

This bit of code makes sure that:

  • if there is no attribute, we pick a default one;
  • if there is exactly one attribute, we interpret what the user intended;
  • otherwise, we raise an error that will be placed on the input struct field.

Interpreting, or at least trying to make sense of (see where I'm going?), what the user of the procedural macro wanted doesn't get much more complicated, since we're using familiar constructs:

impl TryFrom<&Ident> for TelegrafFieldType {
    type Error = syn::Error;

    fn try_from(ident: &Ident) -> Result<Self, Self::Error> {
        Ok(
            // Idents have a string representation we can use
            match ident.to_string().as_str() {
                "tag" => TelegrafFieldType::Tag,
                // Matches the #[metric] helper attribute described earlier
                "metric" => TelegrafFieldType::Metric,
                _ => {
                    return Err(syn::Error::new_spanned(ident, format!("Unknown attribute `{}`", ident)))
                }
            }
        )
    }
}

Returning an error is as simple as creating a syn::Error with new_spanned, passing where the error should be placed and writing the description. In this case, the potential error is placed on the attribute name we don't know how to make sense of.
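To see the whole decision in one place (no attribute, one known attribute, one unknown attribute, too many attributes), here is a standalone sketch using slice patterns on plain strings. FieldKind and classify are hypothetical names, the errors are simple Strings instead of syn::Error, and I'm assuming the #[tag] and #[metric] attribute names from the beginning of the article:

```rust
// Hypothetical stand-in for `TelegrafFieldType`.
#[derive(Debug, PartialEq)]
enum FieldKind {
    Tag,
    Metric,
}

// Hypothetical helper: `helper_attributes` holds the attribute names
// found in front of a single struct field.
fn classify(helper_attributes: &[&str]) -> Result<FieldKind, String> {
    match helper_attributes {
        // No attribute: fall back to the default
        [] => Ok(FieldKind::Metric),
        // Exactly one attribute that we know about
        ["tag"] => Ok(FieldKind::Tag),
        ["metric"] => Ok(FieldKind::Metric),
        // Exactly one attribute, but not one of ours
        [other] => Err(format!("Unknown attribute `{}`", other)),
        // Two attributes or more
        _ => Err("Field has more than one attribute".to_string()),
    }
}
```

In the real macro the errors carry a span so the compiler can point at the offending tokens; that part is left out here on purpose.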

At the end of MetricField::new, we return the MetricField struct with the interesting information:

Ok(
    Self {
        name: name.clone(),
        ty: ty.clone(),
        attribute_type,
    }
)

Phew ! We are done going down the rabbit hole for parsing the input struct. We have all the information we need to give back a structure to the compiler, at least a few things we have determined we would give back. As a reminder of what we have done so far:

// For the struct
#[derive(Debug)]
pub struct TelegrafMetricStruct {
    pub struct_name: Ident,
    pub properties: Vec<MetricField>
}

// For each field in the struct
#[derive(Debug)]
pub struct MetricField {
    pub name: Ident,
    pub attribute_type: TelegrafFieldType,
    pub ty: Type,
}

Wait... Why do I hear boss music !?

Parsing tokens passed as a parameter of the procedural macro

So, let's remind ourselves where we are after getting deep into the previous rabbit hole:

#[proc_macro_attribute]
pub fn observer(meta: CompilerTokenStream, input: CompilerTokenStream) -> CompilerTokenStream {
    let metrics_struct = syn::parse_macro_input!(input as TelegrafMetricStruct);

    // Parse whatever has been put in the procedural macro parameter <--
    let macro_parameters = syn::parse_macro_input!(meta as MacroParameters);

    // Generate bits of code

    // Give back the generated code to the compiler
    CompilerTokenStream::new()
}

Well, looks like we are going down another rabbit hole, aren't we ? This is also where I will show you how we parse things from scratch.

For reference, I'll pull up the original code we are transforming:

#[observer(sql = "../files/system/upsert.sql")]
pub struct SystemMetrics {
    #[tag] host: String,
    load1: f64,
    load5: f64,
    load15: f64,
}

As I have said, we will have sql = "../files/system/upsert.sql" as tokens in the meta parameter of our procedural macro. So... let's make sense of what has been passed on !

As usual, we will be using a dedicated struct that will do the parsing for us and show errors as compile errors to the user:

#[derive(Debug)]
pub struct MacroParameters {
    pub sql: String,
}

impl Parse for MacroParameters {
    fn parse(input: &ParseBuffer) -> syn::Result<Self> {
        todo!();
    }
}

While I'm here, I may as well throw in some code we will need later: builder pattern for the MacroParameters:

pub struct MacroParametersBuilder {
    sql: Option<String>,
}

pub enum MacroParametersBuilderError {
    MissingField(String)
}

impl MacroParametersBuilder {
    pub fn new() -> Self {
        Self {
            sql: None
        }
    }

    pub fn sql(&mut self, sql: String) {
        self.sql = Some(sql);
    }

    pub fn build(self) -> Result<MacroParameters, MacroParametersBuilderError> {
        Ok(
            MacroParameters {
                sql: match self.sql {
                    Some(s) => s,
                    None => return Err(MacroParametersBuilderError::MissingField("sql".to_string()))
                }
            }
        )
    }
}

The builder lets us process the tokens passed in the parentheses of the procedural macro step by step, making sense of each parameter one after the other. In other words, we build the MacroParameters one step at a time.
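Here is a compact, standalone sketch of how the builder behaves when the sql parameter is present and when it is missing. The types mirror the ones above, with two liberties that are mine: build is written with ok_or_else instead of a match, and build_with is a hypothetical helper that only exists to drive the example:

```rust
#[derive(Debug, PartialEq)]
pub struct MacroParameters {
    pub sql: String,
}

#[derive(Debug, PartialEq)]
pub enum MacroParametersBuilderError {
    MissingField(String),
}

pub struct MacroParametersBuilder {
    sql: Option<String>,
}

impl MacroParametersBuilder {
    pub fn new() -> Self {
        Self { sql: None }
    }

    pub fn sql(&mut self, sql: String) {
        self.sql = Some(sql);
    }

    pub fn build(self) -> Result<MacroParameters, MacroParametersBuilderError> {
        // Same behaviour as the match version: a missing `sql` is an error
        let sql = self
            .sql
            .ok_or_else(|| MacroParametersBuilderError::MissingField("sql".to_string()))?;

        Ok(MacroParameters { sql })
    }
}

// Hypothetical helper: feeds the builder the way the parsing loop would,
// one parameter at a time, then builds the final struct.
fn build_with(sql: Option<&str>) -> Result<MacroParameters, MacroParametersBuilderError> {
    let mut builder = MacroParametersBuilder::new();

    if let Some(path) = sql {
        builder.sql(path.to_string());
    }

    builder.build()
}
```

The nice property is that the "is everything there?" check lives in a single place, build, no matter in which order the parameters were given.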

So, let's start parsing whatever has been thrown at us, with the symbols first:

use syn::Ident;
use syn::Token;
use syn::Lit;

impl Parse for MacroParameters {
    fn parse(input: &ParseBuffer) -> syn::Result<Self> {
        let mut builder = MacroParametersBuilder::new();

        loop {
            // Will be useful later
            if !input.peek(Ident) {
                break;
            }

            let param_name = input.parse::<Ident>()?;
            input.parse::<Token![=]>()?;
            let param_value = input.parse::<syn::Lit>()?;

            // ... Make sense of whatever has been passed in

            if !input.peek(Token![,]) {
                break;
            }

            input.parse::<Token![,]>()?;
        }

        todo!();
    }
}

Looks like we have some new friends here:

Token! is a macro that expands to the type representing the token passed into it. Token![,] expands into syn::token::Comma, Token![=] expands into syn::token::Eq, and so on.

syn::Lit is an enum representing what kind of literal has been passed. This will be useful when attempting to make sense of a parameter value and checking whether the type is valid or not.

Here is what we are doing in the loop (I couldn't find a better control structure):

  • Check if we have a token parseable as an Ident. If not, we break out of the loop. This allows us to have #[observer(param1 = "something",)];
  • Get the name of the parameter
  • Parse the equal token. Since we don't care much about it, we carry on; it's only here to construct the syntax.
  • Get the parameter value as a syn::Lit.
  • Make sense of whatever has been thrown at us
  • Check if we have a comma. If not, we break out of the loop.
  • Parse the comma. We also don't care about it; it's only here to construct the syntax

While proofreading this article, I thought we could parse the parameters using syn::punctuated. I will leave this as homework for the reader. There is so much you can do with syn that you kind of always leave a browser tab with the documentation nearby.

Making sense of the parameter name and value can be done with a match block on the string representation of the parameter (which is still a syn::Ident):

match param_name.to_string().as_str() {
    "sql" => {
        match param_value {
            Lit::Str(str_val) => {
                builder.sql(str_val.value());
            }
            _ => return Err(syn::Error::new_spanned(param_value.to_token_stream(), "Expected string literal"))
        };
    }

    _ => return Err(syn::Error::new_spanned(param_name.to_token_stream(), format!("Unsupported attribute `{}`", param_name)))
}

Notice that we use the builder we created earlier to build our procedural macro parameters. If there are more parameters to come, we can expand this section, the builder and MacroParameters.

In the block above, we also raise errors whenever the parameter is unknown to us (or "unsupported") or the type of the literal is not what we expected. The compilation error is placed on the parameter name and on the value, respectively.

#[observer(sql = 2.71828)] gives:

error: Expected string literal
 --> relay/src/datastructures/system2.rs:3:18
  |
3 | #[observer(sql = 2.71828)]
  |                  ^^^^^^^

and #[observer(unknown_parameter_name = 2.71828)] gives:

error: Unsupported attribute `unknown_parameter_name`
 --> relay/src/datastructures/system2.rs:3:12
  |
3 | #[observer(unknown_parameter_name = 2.71828)]
  |            ^^^^^^^^^^^^^^^^^^^^^^

When everything has been parsed properly, we can build our MacroParameters:

match builder.build() {
    Ok(p) => Ok(p),
    Err(e) => match e {
        MacroParametersBuilderError::MissingField(field) => {
            Err(syn::Error::new(Span::call_site(), format!("Missing field `{}`", field)))
        }
    }
}

If the builder returns an error, we "convert" that error into a compilation error:

error: Missing field `sql`
 --> relay/src/datastructures/system2.rs:3:1
  |
3 | #[observer]
  | ^^^^^^^^^^^
  |

And there we go, we have successfully parsed the tokens passed as a parameter of our procedural macro.

Generating all the things

Now we can start generating all the things we want to generate and give it back to the compiler.

Let's start with declaring the data structure, shall we ?

The struct holding the data

// proc_macro2::TokenStream has been aliased to ProcMacro2TokenStream
// This allows us to "unconfuse" whenever we're dealing with the compiler's
// TokenStream or proc_macro2's TokenStream
fn generate_data_struct(metrics_struct: &TelegrafMetricStruct) -> ProcMacro2TokenStream {
    let fields = &metrics_struct
        .properties
        .iter()
        .map(|f| {
            let name = &f.name;
            let ty = &f.ty;
            quote!{ pub #name: #ty }
        })
        .collect::<Vec<_>>();

    let struct_name = &metrics_struct.struct_name;

    quote! {
        pub struct #struct_name {
            #(#fields),*
        }
    }
}

So, in this piece of code we iterate over our properties field which is a vector of MetricField. For each property we generate a bit of code using the quote crate, its quote! macro and some quasi-quoting.

First, we define two variables that will be incorporated. Note that the quote! macro does not seem to support accessing struct fields directly (something like #f.name). In my first attempt, using the ToTokens trait on MetricField, I even managed to overflow the compiler's stack. Wonderful! Instead, we define variables and then incorporate them. For each field we will declare in the future struct, we just stick in the name and type of the field. Then everything is collected into a vector of token streams.

We do pretty much the same thing with the struct we want to declare: we get the name of the struct into a variable and then incorporate it into the generated code. If you look inside the struct we are generating, you can probably recognize something that looks an awful lot like a macro_rules! repetition. The macro will put all the token streams one after the other, separated by commas.
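If the comparison with macro_rules! doesn't ring a bell, here is a tiny declarative macro that uses the same kind of comma-separated repetition. field_names! and metric_field_names are hypothetical names made up for this sketch; they have nothing to do with the quote crate, they only show the $(...),* shape that #(...),* mirrors:

```rust
// Hypothetical declarative macro: turns a comma-separated list of
// identifiers into a vector of their names.
macro_rules! field_names {
    // `$(...),*` matches zero or more identifiers separated by commas;
    // `$(,)?` tolerates an optional trailing comma.
    ($($name:ident),* $(,)?) => {
        // The same repetition on the output side: one `stringify!` per
        // identifier, separated by commas.
        vec![$(stringify!($name)),*]
    };
}

fn metric_field_names() -> Vec<&'static str> {
    field_names!(host, load1, load5, load15)
}
```

In quote!, the dollar sign becomes a hash and the repeated variable is anything iterable, such as our vector of token streams.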

The generation of the struct also strips any custom attributes. At the end, we have something like this:

pub struct SystemMetrics {
    pub host: String,
    pub load1: f64,
    pub load5: f64,
    pub load15: f64,
}

The proc_macro2::TokenStream is then returned to the caller for further processing.

Implementation of TryFrom<&TelegrafMetric>

Things get a little more spicy here, although not too much. As in the previous section, we iterate over the properties field of TelegrafMetricStruct, but with a little twist:

fn generate_try_from_impl(metrics_struct: &TelegrafMetricStruct) -> ProcMacro2TokenStream {
    let generated_fields = metrics_struct
        .properties
        .iter()
        .map(|field| {
            let field_name = &field.name;

            let conversion_expression = match field.attribute_type {
                TelegrafFieldType::Tag => {
                    quote! { crate::datastructures::macros::get_tag(&value.tags, stringify!(#field_name))? }
                }

                TelegrafFieldType::Metric => {
                    let conversion_function = Ident::new(&format!("convert_{}", field.ty.to_token_stream()), Span::call_site());

                    quote! { crate::datastructures::macros::#conversion_function(&value.fields, stringify!(#field_name))? }
                }
            };

            quote! {
                #field_name: #conversion_expression
            }
        })
        .collect::<Vec<_>>();
    // ...
}

In the map method of the iterator, we define our usual variables and use the attribute_type of MetricField to generate the corresponding code to pull data from the correct place in whatever telegraf sends us.

I would like to focus on the match block, more precisely in its arms:

quote! { crate::datastructures::macros::get_tag(&value.tags, stringify!(#field_name))? }

and

let conversion_function = Ident::new(&format!("convert_{}", field.ty.to_token_stream()), Span::call_site());

quote! { crate::datastructures::macros::#conversion_function(&value.fields, stringify!(#field_name))? }

In the first arm, we generate our syntax tree as usual. Because procedural macros are, unlike declarative macros, un-hygienic, we need to be more careful about things in the environment because we can very easily influence it or be influenced by it. This is why I reference my stuff with absolute paths from the root of the crate.

In the second arm, I generate a function identifier with Ident::new before incorporating it in the call. Remember about procedural macros being un-hygienic ? Pulling the function name out of "nowhere" assumes it is defined somewhere, in this case it is the result of a declarative macro that is out of the scope of this article.

The rest of the generation has nothing new:

let data_struct_name = &metrics_struct.struct_name;

quote! {
    impl std::convert::TryFrom<&crate::datastructures::TelegrafMetric> for #data_struct_name {
        type Error = crate::datastructures::telegraf_structs::StructConvError;

        fn try_from(value: &crate::datastructures::TelegrafMetric) -> Result<Self, Self::Error> {
            Ok(
                Self {
                    #(#generated_fields),*
                }
            )
        }
    }
}

generated_fields comes from the iteration we had earlier; its items are incorporated into the syntax tree, separated by commas.

Generating the implementation of Observer

The last step of the code generation before calling it complete: generating a cookie-cutter implementation of the trait Observer. We will start with the variables as usual, then code generation.

Here we go with variables

fn generate_observer_impl(metrics_struct: &TelegrafMetricStruct, macro_parameters: &MacroParameters) -> ProcMacro2TokenStream {
    let data_struct_name = &metrics_struct.struct_name;
    let observer_struct_name = Ident::new(&format!("{}Observer", data_struct_name), Span::call_site());
    let sql_filename = macro_parameters.sql.as_str();
    let fields = metrics_struct
        .properties
        .iter()
        .map(|field| {
            &field.name
        })
        .collect::<Vec<_>>();

    todo!();
}

On the second line:

let observer_struct_name = Ident::new(&format!("{}Observer", data_struct_name), Span::call_site());

I create a brand new name for the observer implementation, because we can. In the next line, I pull the sql file macro parameter we have parsed earlier in this article.

Then, the code generation is not extraordinary: you're writing your thing and incorporating variables into it. Ready ?

quote! {
    pub struct #observer_struct_name;

    #[async_trait::async_trait]
    impl crate::observers::Observer for #observer_struct_name {
        async fn process(&self, metrics: &crate::datastructures::TelegrafMetric) -> Result<(), crate::datastructures::StructProcessError> {
            use std::convert::TryFrom;
            use crate::database::DatabaseClient;

            let metrics_data = #data_struct_name::try_from(metrics)?;
            let db = DatabaseClient::get();

            db.execute(
                include_str!(#sql_filename),
                &[
                    &(metrics.timestamp as f64),
                    #(&metrics_data.#fields),*
                ]
            ).await?;

            Ok(())
        }
    }
}

If you look closely at the body of async fn process(&self, &TelegrafMetric), you can see a few use statements. As I have already said twice, we should be careful with the environment we are working in. Since we are in the body of a function, we can do whatever we want in it without influencing the external environment, but remember that we can still be influenced by it.

You may have noticed that I don't check whether the SQL file the user provided exists. include_str! does that for us. I also set an attribute on the trait implementation, an attribute that will be further processed by the compiler. The same goes for the include_str! macro.

Putting it all together

We are close to being done generating code. So far, we have three functions that each return a proc_macro2::TokenStream. To bring all of that together, we call the functions one after the other, store each result in a variable and return the final syntax tree:

#[proc_macro_attribute]
pub fn observer(meta: CompilerTokenStream, input: CompilerTokenStream) -> CompilerTokenStream {
    // Parsing and making sense of what the user has thrown at us
    let metrics_struct = syn::parse_macro_input!(input as TelegrafMetricStruct);
    let macro_parameters = syn::parse_macro_input!(meta as MacroParameters);

    // Generate the bits of code that we should give back to the compiler
    let generated_data_struct = generate_data_struct(&metrics_struct);
    let generated_try_from_impl = generate_try_from_impl(&metrics_struct);
    let generated_observer_impl = generate_observer_impl(&metrics_struct, &macro_parameters);

    // Assemble everything
    let output = quote! {
        #generated_data_struct

        #generated_try_from_impl

        #generated_observer_impl
    };

    // Pass the result back to the compiler
    output.into()
}

There we go. A procedural macro that generates the code that I showed you at the beginning of the article. Fantastic, isn't it?

The conclusion

Let's recap what we have seen in this article:

  • The families of macros that exist in Rust: declarative and procedural macros, the latter divided into three other types;
  • The skeleton of a procedural macro;
  • Parsing and making sense of a user-provided struct, its fields and the #[attributes] prefixing the fields, using the existing syn::ItemStruct;
  • Generating compilation errors (syn::Error::new_spanned) and having them point to the correct place;
  • Parsing some tokens from scratch and defining our syntax for those parameters;
  • How to build your macro parameters struct step by step with the Builder pattern, without going into much detail about the construction of the pattern;
  • Generating the final piece of code and giving it back to the compiler.

The takeaway of this article is that the compiler gives you a syntax tree, you manipulate that tree and give something back to the compiler. The entry point is parsing and making sense of what the user has thrown at you with syn::parse_macro_input!. Do whatever you want. Then you give back a TokenStream of valid Rust code to the compiler which will... compile your thing and carry on.

I would recommend having the docs of syn somewhere close so you can take a look at what each struct/enum contains and do operations accordingly. cargo expand will be your friend to see if your procedural (and declarative) macro behaves the way you have intended.

That was quite a long article. Hopefully you have learned a few things from this post and procedural macros won't scare you as much as they did.