C++11 compile time checked printf format

A few notes on how I wrote a 100% compile-time checked, printf-style format type checker. The source code is online on GitHub.

Motivation

I recently did some more work on ezEngine, a beautiful Open Source game engine that a few friends of mine develop in their spare time. ezEngine uses a custom formatting function that follows the style of (s)printf. Like printf, it is based on C varargs and is therefore not type safe. Example:

printf("%i", "not an int");

Depending on the printf implementation, this leads to ugly runtime errors. Modern compilers have built-in compile-time checks for the standard printf family and will warn you. If you have your own implementation with a similar interface, though, the story is different.

Using modern C++ and variadic templates, you could implement all sorts of nice interfaces that give you clear error messages at runtime – still not trivial, but straightforward compared to what follows here ;). But the goal of this exercise is different: we want to write code that has no runtime impact at all and works with any existing printf-like interface.

Far more complex things have been done at compile time, so how hard can a bit of parsing be now that we have constexpr?

The target compilers are Visual Studio 2015 and the newest versions of GCC and Clang with -std=c++11.

Before we start – on constexpr

Conditions

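With constexpr, one might think we can simply loop over the characters of the incoming literal and be done. Not so fast! In C++11, the body of a constexpr function is essentially limited to a single return statement; loops and if statements inside constexpr functions are C++14 features. What we are allowed to use is the ternary ?: operator and recursion, (almost) as much as we want (compilers impose a recursion depth limit).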
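As a small illustration of that style, here is a hypothetical helper (CountPercent is not part of the checker) that counts % characters with a single return statement, using recursion instead of a loop and ?: instead of if. It takes a plain const char* for brevity, while the checker itself takes the literal as an array reference.

```cpp
// C++11 constexpr: the body is a single return statement, so the "loop"
// over the string becomes recursion and the "if" becomes ?:.
// CountPercent is a hypothetical helper for illustration only.
constexpr int CountPercent(const char* str, int pos = 0)
{
  return str[pos] == '\0' ? 0
       : (str[pos] == '%' ? 1 : 0) + CountPercent(str, pos + 1);
}

// Evaluated entirely by the compiler.
static_assert(CountPercent("%i and %f") == 2, "two format specifiers");
```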

Throw & Error Handling

There is a special way of leaving a constexpr function at compile time: throw. Yes, just throw an exception at compile time! Strictly speaking, nothing is ever thrown: if the compiler has to evaluate a throw expression while computing a constant expression, the result is not a constant, and compilation fails right at that spot. Again, in C++11 this is only allowed inside one of the branches of the ternary ?: operator; a standalone throw statement in a constexpr function is a C++14 feature. Throw is very helpful for our task, since it lets us both exit a function and document the place of failure.

Since we are evaluating at compile time, it sounds at first as if we could do error handling with static_assert. This doesn't work, though: static_assert requires a constant expression, and inside a constexpr function its parameters are never constant expressions. Remember, the function may be called at runtime as well!

// Won't compile!
constexpr bool Nope(int i)
{
  static_assert(i > 0, "error");
  return i > 0;
}
// The way to go.
constexpr bool Yep(int i)
{
  return i > 0 ? true : throw "error";
}
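A sketch of how Yep behaves in both worlds: inside a static_assert the throw branch is never reached, so the call is a constant expression, while a negative argument in the same position would make compilation fail at the throw. At runtime it is just an ordinary function and really throws a const char*. ThrowsAtRuntime is a hypothetical helper for demonstration only.

```cpp
#include <cassert>

// Same as Yep above: in C++11 the throw may only appear inside a ?: branch.
constexpr bool Yep(int i)
{
  return i > 0 ? true : throw "error";
}

// Compile time: fine, the throw branch is never evaluated.
static_assert(Yep(3), "3 is positive");

// Compile time: uncommenting this fails to compile, because evaluating
// the throw means the call is not a constant expression.
// static_assert(Yep(-1), "never reached");

// Runtime: an ordinary call that really throws (hypothetical helper).
inline bool ThrowsAtRuntime(int i)
{
  try
  {
    Yep(i);
    return false;
  }
  catch (const char*)
  {
    return true;
  }
}
```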

Passing arguments

It is very tempting to pass the actual printf arguments to the constexpr functions as well, not just the format string literal and the current parsing position, since that would let us infer their types automatically. But those arguments are usually runtime values, so depending on the compiler you will sooner or later get complaints that your constexpr call is not constant. Instead, we need to get the types from the argument expressions (using decltype, which never evaluates its operand) and pass them in the template type list. The same applies to the format string itself: if it is not a compile-time array, we obviously cannot inspect it at compile time with a constexpr function.
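A minimal sketch of that idea, using a hypothetical helper FirstIsInt: the values of count and ratio are only known at runtime, but their types can still go into a template argument list via decltype and be checked in a static_assert.

```cpp
#include <type_traits>

// Hypothetical helper: does the first argument type match %i?
template<typename Param0, typename ...Params>
constexpr bool FirstIsInt()
{
  return std::is_same<Param0, int>::value;
}

// count and ratio are runtime values; only their types are used at compile time.
inline void Demo(int count, double ratio)
{
  static_assert(FirstIsInt<decltype(count), decltype(ratio)>(), "first argument must be an int");
  // Passing count and ratio themselves to a constexpr function inside a
  // static_assert would not compile: they are not constant expressions.
  (void)count;
  (void)ratio;
}
```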

Function Evaluation

MSVC 2015 is very lazy when it comes to compiling constexpr: sometimes only the branches that are actually hit get compiled. I didn't realize that for quite a while, which was a major pain, since I had based everything on template functions instead of structs. Unlike function templates, which cannot be partially specialized, structs can easily be specialized for all cases, even unreachable ones.

Implementation

Main Recursion

template<int FormatLen, typename ...Params>
struct PrintfFormatCheck;

// The rest of our implementation...

// Entry for parsing when there are (still) parameters.
template<int FormatLen, typename Param0, typename ...Param>
struct PrintfFormatCheck<FormatLen, Param0, Param...>
{
  static constexpr ErrorCode Recurse(const char(&format)[FormatLen], int pos)
  {
    // --------------------------------------------------------------------------------------------------
    // Compile time error in this function: You passed too many arguments!
    // --------------------------------------------------------------------------------------------------

    // End of the format string reached while parameters are left: too many arguments.
    return pos + 1 >= FormatLen ? throw ErrorCode::TOO_MANY_ARGS :
           format[pos] == '%' ?
             // A % followed by another % writes a single % to the stream.
             (format[pos + 1] == '%' ? Recurse(format, pos + 2) :
             // Otherwise a format specifier starts here: do the type checks.
             ParseSymbol<FormatLen, Param0, Param...>::Recurse(format, pos + 1)) :
           // No format specifier, check the next character...
           Recurse(format, pos + 1);
  }

  // Entry function eats the first param, which needs to be void to support zero arguments.
  static constexpr ErrorCode Entry(const char(&format)[FormatLen], int pos)
  {
    static_assert(std::is_same<Param0, void>::value, "First template argument must be void.");
    return PrintfFormatCheck<FormatLen, Param...>::Recurse(format, pos);
  }
  // Fallback for non-const strings: nothing can be checked at compile time.
  static constexpr ErrorCode Entry(const char*&, int)
  {
    return ErrorCode::SUCCESS;
  }
};

This piece is the general entry point and already contains most of the important principles. It is a partially specialized struct, which allows us to write fallbacks for every special case the compiler may stumble onto.
Note that the Entry function expects void as its first template argument. This is a little trick to work around the fact that the macro we're going to use does not handle zero parameters very well (more on that later).
Entry also gives us a nice way to add a fallback overload for cases where we can't parse at compile time, i.e. if the string is not known at compile time.

All the action is in the Recurse function. We use an enum class ErrorCode both for throwing (to cancel) and as the return value. This keeps the control flow simple and allows us to put the whole check into a static_assert at the very end. The Recurse function will either... well... recurse, or branch out to a ParseSymbol function, which follows a similar pattern and eventually leads back to Recurse.

Note that we already use the typename Param0, typename ...Param pattern even though we are not yet reducing the number of types. This is just a useful and clean way to make clear that at least one parameter type is left – writing just typename ...Param would also allow empty parameter packs! The no-parameter specialization is basically the same, but without the Entry functions and the ParseSymbol call:

// Entry if there are no parameters (can be end of variadic recursion)
template<int FormatLen>
struct PrintfFormatCheck<FormatLen>
{
  static constexpr ErrorCode Recurse(const char(&format)[FormatLen], int pos)
  {
    // --------------------------------------------------------------------------------------------------
    // Compile time error in this function: You didn't provide enough arguments!
    // --------------------------------------------------------------------------------------------------

    // Check if we are at the end - if yes, success!
    return pos + 1 >= FormatLen ? ErrorCode::SUCCESS :
      // A new % would mean we have too few args... unless there is another one right after it.
      format[pos] == '%' ?
      format[pos + 1] == '%' ? Recurse(format, pos + 2) : throw ErrorCode::TOO_FEW_ARGS :
      // Otherwise keep parsing (recurse)...
      Recurse(format, pos + 1);
  }
};

Parsing

To separate character parsing from type matching, we first translate each format specifier into a FormatType enum value:

// Possible formats.
enum class FormatType
{
  STRING,
  REAL,
  INT,
  LONG_INT,
  POINTER
};

Extracting these type enums is easy:

// The symbol parsing engine. Invoked if there are still params and % was found (but without trailing %)
template<int FormatLen, typename Param0, typename ...Param>
struct ParseSymbol<FormatLen, Param0, Param...>
{
  static constexpr ErrorCode Recurse(const char(&format)[FormatLen], int pos)
  {
    // --------------------------------------------------------------------------------------------------
    // Compile time error in this function: Can't parse format string, unknown specifier.
    // --------------------------------------------------------------------------------------------------

    return  format[pos] == 'f' || // Decimal floating point, lowercase
            // ...
            format[pos] == 'A' ?  // Hexadecimal floating point, uppercase
            CheckArgument<FormatType::REAL, FormatLen, Param0, Param...>::CheckAndContinue(format, pos) :

            // All the other specifiers and more special cases...

            // Width & precision
            (format[pos] >= '0' && format[pos] <= '9') || format[pos] == '.' || format[pos] == '*'
            // Then skip this letter.
            ? Recurse(format, pos + 1) :

            // Nothing we know!
            throw ErrorCode::INVALID_FORMATSTRING;
  }
};

Note that the no-parameter specialization for this (template<int FormatLen> struct ParseSymbol<FormatLen>) is unreachable, and Visual Studio doesn't even need it! Every other compiler, sticking closer to the standard, will complain if you don't implement it.
As you can see, we pass the FormatType on to another function called CheckArgument::CheckAndContinue. We also need to skip over characters for the various more complex format cases (width, precision) – there is a lot of untapped error checking potential left here!
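Stripped of the parameter handling, the character-to-FormatType translation can be sketched as a single constexpr function. ClassifySpecifier is hypothetical and only knows a handful of one-letter specifiers (no length modifiers, so LONG_INT is never produced). An unknown character hits the final throw and makes a compile-time evaluation fail, much like INVALID_FORMATSTRING above.

```cpp
// Possible formats (same enum as above).
enum class FormatType
{
  STRING,
  REAL,
  INT,
  LONG_INT,
  POINTER
};

// Hypothetical helper: maps a single specifier character to a FormatType.
// The ?: chain is checked top to bottom; the final throw rejects anything else.
constexpr FormatType ClassifySpecifier(char c)
{
  return c == 'f' || c == 'F' || c == 'e' || c == 'E' ||
         c == 'g' || c == 'G' || c == 'a' || c == 'A' ? FormatType::REAL :
         c == 'd' || c == 'i' ? FormatType::INT :
         c == 's' ? FormatType::STRING :
         c == 'p' ? FormatType::POINTER :
         throw "unknown format specifier";
}

static_assert(ClassifySpecifier('f') == FormatType::REAL, "%f expects a floating point value");
```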

How do we tell whether a given type is compatible with a FormatType? With template specialization:

// Default "Fail"
template<typename Param, FormatType ExpectedFormat>
struct ParamCheck
{
  typedef std::false_type Result;
};
// A List of all success cases:
template<> struct ParamCheck<float, FormatType::REAL> { typedef std::true_type Result; };
template<> struct ParamCheck<double, FormatType::REAL> { typedef std::true_type Result; };
// ...
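A compilable version of this trait with a few success cases filled in might look as follows. Which types count as valid for which FormatType is an assumption here, since the original list is elided. Note that the match is exact: const double& is a different type from double and is rejected by the trait on its own.

```cpp
#include <type_traits>

enum class FormatType { STRING, REAL, INT, LONG_INT, POINTER };

// Default "fail": any type/format combination not listed below is rejected.
template<typename Param, FormatType ExpectedFormat>
struct ParamCheck
{
  typedef std::false_type Result;
};

// Success cases (this particular selection is an assumption for the sketch).
template<> struct ParamCheck<float, FormatType::REAL> { typedef std::true_type Result; };
template<> struct ParamCheck<double, FormatType::REAL> { typedef std::true_type Result; };
template<> struct ParamCheck<int, FormatType::INT> { typedef std::true_type Result; };
template<> struct ParamCheck<long, FormatType::LONG_INT> { typedef std::true_type Result; };
template<> struct ParamCheck<const char*, FormatType::STRING> { typedef std::true_type Result; };

static_assert(ParamCheck<double, FormatType::REAL>::Result::value, "double is a REAL");
static_assert(!ParamCheck<int, FormatType::REAL>::Result::value, "int is not a REAL");
// Exact matching only: const double& has no specialization of its own.
static_assert(!ParamCheck<const double&, FormatType::REAL>::Result::value, "no normalization yet");
```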

Using this, we can build a checking function that either fails or goes back to the Recurse function we started from:

template<FormatType ExpectedFormat, int FormatLen, typename Param0, typename ...Param>
struct CheckArgument<ExpectedFormat, FormatLen, Param0, Param...>
{
  // "Eats" a parameter type and checks if it matches the expected format type.
  // If yes, goes back to PrintfFormatCheck (with one parameter less).
  static constexpr ErrorCode CheckAndContinue(const char(&format)[FormatLen], int pos)
  {
    // --------------------------------------------------------------------------------------------------
    // Compile time error in this function: One of your arguments does not match the format string!
    // Check the value of the parameter "pos" to find out which format specifier was mismatched.
    // --------------------------------------------------------------------------------------------------

    // Removing const/volatile and references from the param type keeps the number of ParamChecks lower.
    // pos points at the specifier character: if ParamCheck says the type is valid, skip it (thus pos + 1) and keep going.
    return ParamCheck<typename std::remove_cv<typename std::remove_reference<Param0>::type>::type, ExpectedFormat>::Result::value ?
             PrintfFormatCheck<FormatLen, Param...>::Recurse(format, pos + 1) :
             throw ErrorCode::WRONG_ARG;
  }
};

This is also, finally, the function that “eats” parameters from the variadic template.
To reduce the number of needed ParamCheck specializations, we use these two little helpers from the standard library to get rid of const/volatile and references in the type:

typename std::remove_cv<typename std::remove_reference<Param0>::type>::type
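The same normalization written as a C++11 alias template, BareType (a hypothetical name), makes the effect easy to check. The order matters: remove_cv applied to const int& does nothing, because the reference itself carries no top-level const, so the reference has to be removed first. Constness that is not top-level, such as the pointee in const char*, is kept.

```cpp
#include <type_traits>

// Strip the reference first, then top-level const/volatile.
template<typename T>
using BareType = typename std::remove_cv<typename std::remove_reference<T>::type>::type;

static_assert(std::is_same<BareType<const double&>, double>::value, "const double& -> double");
static_assert(std::is_same<BareType<volatile int>, int>::value, "volatile int -> int");

// The wrong order leaves the type untouched.
static_assert(std::is_same<std::remove_cv<const int&>::type, const int&>::value,
              "remove_cv alone does not see through references");
```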

And that’s basically it! 🙂

Caller Macro

Written out by hand, our check would currently look like this:

static_assert(PrintfFormatCheck<7, void, int, float>::Entry("%i, %f", 0) == ErrorCode::SUCCESS, "This should never happen.");
printf("%i, %f", 1, 1.0f);

Obviously, we need to automate that! Getting the length of the string literal (the first template argument to PrintfFormatCheck) is the simple part:

template<int FormatLen>
constexpr int StringLiteralLength(const char(&format)[FormatLen])
{
  return FormatLen;
}
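A quick check of what this returns: the array bound, and therefore the length, includes the terminating '\0'. That is why "%i, %f" (six visible characters) gives 7, and why the value equals sizeof applied to the same literal. The static_assert below is only illustrative.

```cpp
// Deduces the array bound of a string literal; the parameter is unnamed
// because only its type (and thus the bound) matters.
template<int FormatLen>
constexpr int StringLiteralLength(const char(&)[FormatLen])
{
  return FormatLen;
}

static_assert(StringLiteralLength("%i, %f") == 7, "6 characters plus the terminating '\\0'");
```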

The rest… not so much. I ended up using a mixture of macro concatenation and __VA_ARGS__ counting to wrap every single parameter into a decltype expression. All this is quite complex, and since this blog entry is already too long, I refer you to the code and this nice list of macro tricks. As this technique doesn't allow zero parameters, I added void() as an extra parameter to the macro (decltype(void()) is void), which the Entry function then removes again.
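The repository's macro is not reproduced here, but the general technique can be sketched as follows, assuming at most four arguments. PP_NARGS counts the arguments by letting them shift a descending number list, PP_CAT pastes that count onto PP_DECLTYPE_ to pick the matching wrapper, and the wrapper puts each argument into decltype. All PP_* names and TypeList are hypothetical. As in the text, void() is passed first; a wrapper that always forwards void(), __VA_ARGS__ would still have to deal with the trailing comma when there are no user arguments, which this sketch leaves open. PP_EXPAND is a common workaround for the traditional MSVC preprocessor, which otherwise forwards __VA_ARGS__ as a single argument; the sketch was written with GCC and Clang in mind.

```cpp
#include <type_traits>

// Helpers (hypothetical names).
#define PP_EXPAND(x) x
#define PP_CAT_IMPL(a, b) a##b
#define PP_CAT(a, b) PP_CAT_IMPL(a, b)

// Counts 1 to 4 arguments: each argument shifts the number list to the right.
#define PP_NARGS(...) PP_EXPAND(PP_NARGS_IMPL(__VA_ARGS__, 4, 3, 2, 1, 0))
#define PP_NARGS_IMPL(a1, a2, a3, a4, N, ...) N

// One wrapper per argument count.
#define PP_DECLTYPE_1(a) decltype(a)
#define PP_DECLTYPE_2(a, b) decltype(a), decltype(b)
#define PP_DECLTYPE_3(a, b, c) decltype(a), decltype(b), decltype(c)
#define PP_DECLTYPE_4(a, b, c, d) decltype(a), decltype(b), decltype(c), decltype(d)

// Picks PP_DECLTYPE_<count> and applies it to the arguments.
#define PP_DECLTYPE_ALL(...) PP_EXPAND(PP_CAT(PP_DECLTYPE_, PP_NARGS(__VA_ARGS__))(__VA_ARGS__))

template<typename ...Types>
struct TypeList {};

// Example: void() first, then the runtime arguments.
inline void Example(int count, float ratio)
{
  static_assert(std::is_same<TypeList<PP_DECLTYPE_ALL(void(), count, ratio)>,
                             TypeList<void, int, float>>::value,
                "each argument is wrapped in decltype");
  (void)count;
  (void)ratio;
}
```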

Results

The error messages are not super nice to read, but they can be deciphered:

printf_checked("Hello %s %i some text", 10, 10);

In MSVC, this leads to:

printfformatcheck.h(154): error C2131: expression did not evaluate to a constant
  printfformatcheck.h(94): note: failure was caused by evaluating a throw sub-expression
  printfformatcheck.h(154): note: while evaluating 'PrintfFormatCheck::CheckArgument::CheckAndContinue("Hello %s %i some text", 7)'
          with
          [
              Param0=int
          ]
  printfformatcheck.h(224): note: while evaluating 'PrintfFormatCheck::ParseSymbol::Recurse("Hello %s %i some text", 7)'
          with
          [
              Param0=int
          ]
  printfformatcheck.h(226): note: while evaluating 'PrintfFormatCheck::CheckPrintfFormat::Recurse("Hello %s %i some text", 6)'
  printfformatcheck.h(226): note: while evaluating 'PrintfFormatCheck::CheckPrintfFormat::Recurse("Hello %s %i some text", 5)'
  printfformatcheck.h(226): note: while evaluating 'PrintfFormatCheck::CheckPrintfFormat::Recurse("Hello %s %i some text", 4)'
  printfformatcheck.h(226): note: while evaluating 'PrintfFormatCheck::CheckPrintfFormat::Recurse("Hello %s %i some text", 3)'
  printfformatcheck.h(226): note: while evaluating 'PrintfFormatCheck::CheckPrintfFormat::Recurse("Hello %s %i some text", 2)'
  printfformatcheck.h(226): note: while evaluating 'PrintfFormatCheck::CheckPrintfFormat::Recurse("Hello %s %i some text", 1)'
  printfformatcheck.h(233): note: while evaluating 'PrintfFormatCheck::CheckPrintfFormat::Recurse("Hello %s %i some text", 0)'
  test.cpp(64): note: while evaluating 'PrintfFormatCheck::CheckPrintfFormat::Entry("Hello %s %i some text", 0)'

The caller macro expands what used to be a single statement into two statements, which breaks existing brace-less if/else blocks. One could easily work around this by wrapping everything in do { /* all the stuff */ } while(false), but this won't work if the original function is a member or namespace function, which is always the case in ezEngine 😦
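A sketch of the do/while(false) wrapping, assuming a hypothetical SNPRINTF_CHECKED macro around std::snprintf. The static_assert is only a placeholder for the real PrintfFormatCheck call; the point is that the whole expansion becomes one statement, so a brace-less if/else keeps working.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Placeholder for the real compile-time check (hypothetical).
#define FORMAT_CHECK_PLACEHOLDER(format) (sizeof(format) > 1)

// do/while(false) turns check + call into one statement that still needs a trailing semicolon.
#define SNPRINTF_CHECKED(buffer, format, ...)                                 \
  do                                                                          \
  {                                                                           \
    static_assert(FORMAT_CHECK_PLACEHOLDER(format), "format check failed");  \
    std::snprintf(buffer, sizeof(buffer), format, __VA_ARGS__);               \
  } while (false)

// Usage: works inside an if/else without braces.
inline void FormatAnswer(char (&buffer)[32], bool verbose)
{
  if (verbose)
    SNPRINTF_CHECKED(buffer, "answer: %d", 42);
  else
    SNPRINTF_CHECKED(buffer, "%d", 42);
}
```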

Still a nice learning experience 🙂
