Playing with C# source generators
Motivation
In one of our projects we receive structured data in several serialization formats like JSON and YAML. This data mostly consists of simple primitive properties that map onto some C# objects later on:
{
"prop1": "hello world",
"prop2": 123
}
These objects later get mapped to C# types like record MyObject(string Prop1, int Prop2). At this point we could simply use a serializer library and deserialize the data directly into the given C# type, but we ran into two problems:
- Our software acts as a middleware between the incoming data and several dynamically-loaded plugins. That means in the middleware layer we need to be able to inspect the data but we don’t know the actual types yet.
- The serialization formats are also pluggable and our middleware should act on the data regardless of the format.
Because of this, we got used to using an intermediate format in the form of nested Dictionary<string, object?> types. Then, as soon as the dictionary objects reach the plugin layer, we map them automatically to C# classes using Reflection. This of course has performance implications because of the dynamic nature of Reflection, even though the properties are already known at compile time. There must be a better way of handling those mappings without relying on inspecting the types at runtime.
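For comparison, the reflection-based mapping described above might look roughly like the following. This is a minimal sketch and not our production code: the names ReflectionMapper, Map and Person are hypothetical, and it only handles flat properties whose values are directly assignable.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical helper illustrating runtime mapping via reflection.
public static class ReflectionMapper
{
    public static T Map<T>(IReadOnlyDictionary<string, object?> data) where T : class, new()
    {
        var target = new T();

        // The property list is discovered at runtime on every call,
        // even though it is already fixed at compile time.
        foreach (PropertyInfo prop in typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (prop.CanWrite && data.TryGetValue(prop.Name, out var value))
            {
                prop.SetValue(target, value);
            }
        }

        return target;
    }
}

// Hypothetical target type used only for this example.
public class Person
{
    public string Name { get; set; } = string.Empty;
    public int Age { get; set; }
}
```

Every call walks the type metadata again and sets values through PropertyInfo, which is the overhead a compile-time approach could avoid.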
TLDR
Source generators solve this problem. They are great, but they take some time getting used to, especially when it comes to third-party libraries and debugging. The final project with a working example can be found at https://github.com/kpko/csharp-source-generator-example.
Introducing source generators
Source generators, as introduced in C# 9, are a method to implement custom code generators that run at compile time. By now source generators are widely used in .NET:
- System.Text.Json is able to generate serialization code at compile time: https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json/source-generation?pivots=dotnet-8-0
- Microsoft.Extensions.Logging has generators to generate formatted log messages: https://learn.microsoft.com/en-us/dotnet/core/extensions/logger-message-generator
- Regular expressions can generate compiled regex parsing code: https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-source-generators
- ASP.NET Core can map request handlers at compile time: https://learn.microsoft.com/en-us/aspnet/core/fundamentals/aot/request-delegate-generator/rdg?view=aspnetcore-8.0
These source generators have several upsides and downsides compared to their default runtime implementations. On the upside, runtime performance is usually better. They also support ahead-of-time (AOT) compilation scenarios, because they don’t need to rely on runtime reflection.
The downside is that they can increase compile time. Also, because they rely on compile-time information only, source generators have less to work with than runtime code: the application is not running yet, so anything that is only known at runtime is out of reach.
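To give a feel for what consuming one of these built-in generators looks like, here is a small sketch using the regular expression generator. It needs .NET 7 or later, and the class and method names (Patterns, IsoDate) are made up for the example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical holder class; it must be partial so the generator can add to it.
public static partial class Patterns
{
    // The regex source generator supplies the body of this partial method at compile time.
    [GeneratedRegex(@"^\d{4}-\d{2}-\d{2}$")]
    public static partial Regex IsoDate();
}
```

Calling Patterns.IsoDate().IsMatch("2024-01-31") then uses the generated matcher instead of building the regex at runtime. The same partial pattern is what the rest of this post relies on.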
For our dictionary mapping problem, source generators might be the solution we are looking for.
Source generator implementations
The plan
My plan was simple. I want to use a marker attribute like [GenerateDictionaryConverterMethods] on classes to instruct a source generator to (1) implement an interface (ISupportDictionaryConversion) and (2) implement the interface methods ReadFromDictionary and WriteToDictionary. As their names suggest, these read values from a dictionary and write them to class members, or read class members and write them to the dictionary.
My project structure looks like this:
SourceGeneratorExample.sln
|- MyGenerators.csproj: A project including the source generators
|- TestApp.csproj: A console application or unit test project to test our implementation
The generator library project has to be a netstandard2.0 class library and needs to include a property called EnforceExtendedAnalyzerRules to work.
<PropertyGroup>
<TargetFramework>netstandard2.0</TargetFramework>
<EnforceExtendedAnalyzerRules>true</EnforceExtendedAnalyzerRules>
<LangVersion>8.0</LangVersion>
<Nullable>enable</Nullable>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="Microsoft.CodeAnalysis.CSharp" Version="4.8.0" PrivateAssets="all"/>
<PackageReference Include="Microsoft.CodeAnalysis.Analyzers" Version="3.3.4" PrivateAssets="all"/>
<PackageReference Include="Microsoft.CSharp" Version="4.7.0"/>
<PackageReference Include="System.Threading.Tasks.Extensions" Version="4.5.4"/>
</ItemGroup>
The TestApp.csproj then can reference our library like this:
<ItemGroup>
<ProjectReference Include="..\MyGenerators\MyGenerators.csproj"
OutputItemType="Analyzer"
ReferenceOutputAssembly="true" />
</ItemGroup>
I implemented marker attributes and an interface for my dictionary conversion methods.
using System;
using System.Collections.Generic;
namespace MyGenerators
{
public class GenerateDictionaryConverterMethodsAttribute : Attribute
{
}
public interface ISupportDictionaryConversion
{
void WriteToDictionary(IDictionary<string, object> dictionary);
void ReadFromDictionary(IReadOnlyDictionary<string, object> dictionary);
}
}
When the code generation works, I would like to use the code like this:
using Microsoft.VisualStudio.TestTools.UnitTesting;
using MyGenerators;
using TestApp;
var firstObject = new Test() { Name = "John Doe", Age = 42 };
var tempDictionary = new Dictionary<string, object?>();
firstObject.WriteToDictionary(tempDictionary);
var secondObject = new Test2();
secondObject.ReadFromDictionary(tempDictionary);
Assert.AreEqual("John Doe", secondObject.Name);
Assert.AreEqual(42, secondObject.Age);
namespace TestApp
{
[GenerateDictionaryConverterMethods]
public partial class Test
{
public string Name { get; set; } = string.Empty;
public int Age { get; set; }
}
[GenerateDictionaryConverterMethods]
public partial class Test2
{
public string Name { get; set; } = string.Empty;
public int Age { get; set; }
}
}
At this stage of the implementation, the WriteToDictionary-method should dump contents of public properties into the dictionary and later map them back onto a similar object. At a later stage the conversion methods could be more advanced, like mapping simple string values onto enums.
Also, it is important to mark the classes as partial so that the source generator can extend the class later.
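The more advanced conversions mentioned above (such as mapping a string onto an enum) could be delegated to a small helper that the generated code calls instead of casting directly. The following is only a sketch of that idea: DictionaryValueConverter and ConvertValue are hypothetical names and not part of the example project.

```csharp
using System;
using System.Globalization;

// Hypothetical helper that generated ReadFromDictionary code could call.
public static class DictionaryValueConverter
{
    public static T ConvertValue<T>(object? value)
    {
        // Fast path: the value already has the requested type.
        if (value is T typed)
        {
            return typed;
        }

        if (value is null)
        {
            return default!;
        }

        // Unwrap Nullable<T> so that int? is handled like int.
        var target = Nullable.GetUnderlyingType(typeof(T)) ?? typeof(T);

        // Map strings such as "monday" onto enum members.
        if (target.IsEnum && value is string text)
        {
            return (T)Enum.Parse(target, text, ignoreCase: true);
        }

        // Handle numeric widening/narrowing, for example a boxed long into an int.
        return (T)Convert.ChangeType(value, target, CultureInfo.InvariantCulture);
    }
}
```

A direct cast like (int)value only works when the boxed value is exactly an int. If a deserializer hands over a long instead (an assumption about the serializer in use), the cast throws, whereas ConvertValue<int> would convert it.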
Simple source generator
To get started using source generators, I followed the MSDN guide (Source generators overview). This guide explains the basic project structure and also gives an example on how to implement a simple source generator. At this time my focus wasn’t to make it efficient but to make it work and move on from there. My first working example looked like this.
using System.Linq;
using Microsoft.CodeAnalysis;

[Generator]
public class MySourceGenerator : ISourceGenerator
{
public void Initialize(GeneratorInitializationContext context)
{
}
public void Execute(GeneratorExecutionContext context)
{
var s = context.Compilation
.GetSymbolsWithName(x => true, SymbolFilter.Type)
.ToList();
var attrName = nameof(GenerateDictionaryConverterMethodsAttribute);
var types = s.OfType<INamedTypeSymbol>()
.Where(a => a.GetAttributes().Any(b => b.AttributeClass?.Name == attrName))
.ToList();
foreach (var type in types)
{
var source = StringBasedSyntax.GenerateSyntax(type);
context.AddSource($"DictionaryHelpers.{type.Name}.g.cs", source);
}
}
}
Basically, as soon as the source generator runs, it searches for all compiled symbols (declared types) with my custom attribute on top. I used a custom class called StringBasedSyntax with a simple, static GenerateSyntax-method that generates the required code for a marked type. The different approaches to generating the actual code can be found later in this post.
For each type, we generate an additional source file called DictionaryHelpers.Type.g.cs. The .g.cs suffix is a naming convention for generated files.
This approach has several downsides. Most importantly, it’s pretty inefficient. Configured like this, the generator runs on every compilation and scans the entire compilation for symbols. It would be nicer to focus only on declared classes that carry the marker attribute, without having to scan every symbol manually, and to run the generator only when those exact parts change.
Incremental source generator
Incremental source generators solve this problem. They use a declarative pipeline model. The first thing to specify is a SyntaxProvider that extracts the syntax fragments of interest from the Roslyn compiler. Then a callback is registered that gets called when those syntax fragments change. Everything gets set up in the Initialize method of our incremental generator. A nice introduction was the video by Shawn Wildermuth.
using System;
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
namespace MyGenerators.Generators
{
[Generator(LanguageNames.CSharp)]
public class MyIncrementalGenerator : IIncrementalGenerator
{
public void Initialize(IncrementalGeneratorInitializationContext context)
{
var provider = context.SyntaxProvider.ForAttributeWithMetadataName(
$"{nameof(MyGenerators)}.{nameof(GenerateDictionaryConverterMethodsAttribute)}",
(n, _) => n.IsKind(SyntaxKind.ClassDeclaration),
(ctx, _) => (INamedTypeSymbol)ctx.TargetSymbol);
var compilation = context.CompilationProvider.Combine(provider.Collect());
context.RegisterSourceOutput(compilation, (ctx, source) => Execute(ctx, source.Right));
}
private void Execute(SourceProductionContext ctx, ImmutableArray<INamedTypeSymbol> types)
{
try
{
foreach (var type in types)
{
var source = StringBasedSyntax.GenerateSyntax(type);
ctx.AddSource($"DictionaryHelpers.{type.Name}.g.cs", source);
}
}
catch (Exception ex)
{
var desc = new DiagnosticDescriptor("GEN001",
"Error in syntax generation",
$"{ex}",
"MyIncrementalGenerator",
DiagnosticSeverity.Error, true);
var diag = Diagnostic.Create(desc, Location.None);
ctx.ReportDiagnostic(diag);
}
}
}
}
In this example I’m also using diagnostics to display errors. I realized that errors otherwise get swallowed: the files just don’t get generated and we get no context on why the generator doesn’t work. Diagnostics can include the exact syntax location the message applies to, but because we are not working with exact code locations at the moment, I pass Location.None.
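If a location is wanted later on, one option is to point the diagnostic at the declaration of the type that failed. The sketch below assumes a hypothetical helper class GeneratorDiagnostics. It reuses the GEN001 id from above and falls back to Location.None when the symbol has no location.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;

// Hypothetical helper; not part of the example project.
public static class GeneratorDiagnostics
{
    // {0} and {1} are filled from the message arguments passed to Diagnostic.Create.
    private static readonly DiagnosticDescriptor GenerationFailed = new DiagnosticDescriptor(
        id: "GEN001",
        title: "Error in syntax generation",
        messageFormat: "Generating code for '{0}' failed: {1}",
        category: "MyIncrementalGenerator",
        defaultSeverity: DiagnosticSeverity.Error,
        isEnabledByDefault: true);

    public static Diagnostic ForType(INamedTypeSymbol type, Exception ex)
    {
        // Use the first declaration of the type, if there is one.
        var location = type.Locations.FirstOrDefault() ?? Location.None;
        return Diagnostic.Create(GenerationFailed, location, type.Name, ex.Message);
    }
}
```

With a try/catch per type inside the foreach loop, the generator could call ctx.ReportDiagnostic(GeneratorDiagnostics.ForType(type, ex)), so one failing type does not hide the output for the others.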
Syntax generation
There are several methods to generate the actual code that should be placed into our target project. I tried three different methods. For each method I implemented a static class with a static method GenerateSyntax that returns the generated source as a string, so that the implementations can be swapped out.
String-based
Basic string-based code generation was pretty easy to get started with but quickly introduced difficulties in terms of code formatting, remembering braces and more. For very small snippets, this would be a sufficient solution.
public static string GenerateSyntax(INamedTypeSymbol type)
{
var sb = new StringBuilder();
sb.AppendLine($@"// <auto-generated/>
using System;
using System.Collections.Generic;
using MyGenerators;
namespace {type.ContainingNamespace.ToDisplayString()}
{{
public partial class {type.Name} : ISupportDictionaryConversion
{{
public void WriteToDictionary(IDictionary<string, object> dictionary)
{{
");
// Generate the WriteToDictionary-method for every member
var props = type.GetMembers()
.Where(m => m.Kind == SymbolKind.Property)
.Cast<IPropertySymbol>()
.ToList();
foreach (var prop in props)
{
sb.AppendLine($@" dictionary[""{prop.Name}""] = {prop.Name};");
}
sb.AppendLine($@"
}}
public void ReadFromDictionary(IReadOnlyDictionary<string, object> dictionary)
{{
");
foreach (var prop in props)
{
sb.AppendLine($@"
if (dictionary.TryGetValue(""{prop.Name}"", out var {prop.Name.ToLowerInvariant()}_value))
{{
{prop.Name} = ({prop.Type.ToDisplayString()}){prop.Name.ToLowerInvariant()}_value;
}}");
}
sb.AppendLine($@"
}}
}}
}}");
return sb.ToString();
}
Keeping track of indentation also was challenging with this approach.
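One way to ease the indentation bookkeeping while staying string-based might be IndentedTextWriter from System.CodeDom.Compiler, which prefixes each line with the current indent level. This is a sketch of the idea only, not what the example project does. IndentedSyntax and GenerateWriteMethod are hypothetical names, and only the WriteToDictionary method is generated here.

```csharp
using System.CodeDom.Compiler;
using System.Collections.Generic;
using System.IO;

// Hypothetical alternative to manual indentation inside string literals.
public static class IndentedSyntax
{
    public static string GenerateWriteMethod(IEnumerable<string> propertyNames)
    {
        var buffer = new StringWriter();
        var writer = new IndentedTextWriter(buffer, "    ");

        writer.WriteLine("public void WriteToDictionary(IDictionary<string, object> dictionary)");
        writer.WriteLine("{");
        writer.Indent++; // every following line is indented one level deeper

        foreach (var name in propertyNames)
        {
            writer.WriteLine($"dictionary[\"{name}\"] = {name};");
        }

        writer.Indent--;
        writer.WriteLine("}");

        writer.Flush();
        return buffer.ToString();
    }
}
```

Braces still have to be written by hand, but the nesting level lives in one counter instead of being spread across string literals.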
SyntaxFactory
While searching for a solution I stumbled upon the SyntaxFactory class. This class looks very promising, because it provides high-level methods to generate syntax elements. I also found a working example on GitHub.
public static string GenerateSyntax(INamedTypeSymbol type)
{
var unit = SyntaxFactory.CompilationUnit();
unit = unit.AddUsings(
SyntaxFactory.UsingDirective(SyntaxFactory.ParseName("System")),
SyntaxFactory.UsingDirective(SyntaxFactory.ParseName("System.Collections.Generic")),
SyntaxFactory.UsingDirective(SyntaxFactory.ParseName("MyGenerators"))
);
var ns = SyntaxFactory
.NamespaceDeclaration(SyntaxFactory.ParseName(type.ContainingNamespace.ToDisplayString()))
.NormalizeWhitespace();
var cls = SyntaxFactory.ClassDeclaration(type.Name)
.AddModifiers(SyntaxFactory.Token(SyntaxKind.PublicKeyword), SyntaxFactory.Token(SyntaxKind.PartialKeyword))
.AddBaseListTypes(SyntaxFactory.SimpleBaseType(SyntaxFactory.ParseTypeName(nameof(ISupportDictionaryConversion))));
var writeMethod = SyntaxFactory
.MethodDeclaration(SyntaxFactory.ParseTypeName("void"), nameof(ISupportDictionaryConversion.WriteToDictionary))
.AddModifiers(SyntaxFactory.Token(SyntaxKind.PublicKeyword))
.WithBody(SyntaxFactory.Block(SyntaxFactory.ParseStatement("")));
// ...Further implementation...
cls = cls.AddMembers(writeMethod);
ns = ns.AddMembers(cls);
unit = unit.AddMembers(ns);
var source = unit.NormalizeWhitespace().ToFullString();
return source;
}
Generating basic elements like using directives, namespaces and classes was pretty easy, but I stopped this approach when I reached the actual method generation. Generating the whole syntax tree for the implementation manually was very verbose. Upon researching this approach, I realized others have the same problem: Reddit discussion: Why is no-one using Roslyn token-based code generation with Source Generators?. On the bright side, this approach basically fixed every problem I had with indentation and formatting, thanks to the built-in auto-formatting of NormalizeWhitespace().
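A possible middle ground, which I did not pursue in the example project, is to build the outer structure with SyntaxFactory and let SyntaxFactory.ParseStatement turn small string snippets into the statements of the method body. The sketch below assumes a hypothetical HybridSyntax class and only builds the WriteToDictionary method from a list of property names.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper combining SyntaxFactory with parsed string snippets.
public static class HybridSyntax
{
    public static MethodDeclarationSyntax BuildWriteMethod(IEnumerable<string> propertyNames)
    {
        // One parsed assignment statement per property.
        var statements = propertyNames
            .Select(name => SyntaxFactory.ParseStatement($"dictionary[\"{name}\"] = {name};"))
            .ToArray();

        return SyntaxFactory
            .MethodDeclaration(SyntaxFactory.ParseTypeName("void"), "WriteToDictionary")
            .AddModifiers(SyntaxFactory.Token(SyntaxKind.PublicKeyword))
            .AddParameterListParameters(
                SyntaxFactory.Parameter(SyntaxFactory.Identifier("dictionary"))
                    .WithType(SyntaxFactory.ParseTypeName("IDictionary<string, object>")))
            .WithBody(SyntaxFactory.Block(statements));
    }
}
```

The result can be added to the class declaration with AddMembers and still benefits from NormalizeWhitespace(), while the per-property lines stay as readable as in the string-based version.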
Templating
While reading on the topic I came across many people who suggested using some kind of templating engine like Scriban. While Scriban seems to be a good choice using a liquid-inspired syntax, I had a good experience using mustache templates with Stubble so I wanted to try it for generating C# code.
First I had to make a few adjustments to include the library from its NuGet package. It wasn’t enough to reference the NuGet package in the standard way. The libraries had to be bundled with the source generator library to make it work. The source generator related articles on Thinktecture helped a lot with this.
<PropertyGroup>
<GeneratePackageOnBuild>true</GeneratePackageOnBuild>
<GetTargetPathDependsOn>$(GetTargetPathDependsOn);GetDependencyTargetPaths</GetTargetPathDependsOn>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="Stubble.Core" Version="1.10.8" PrivateAssets="all" GeneratePathProperty="true"/>
<None Include="$(OutputPath)\$(AssemblyName).dll" Pack="true" PackagePath="analyzers/dotnet/cs" Visible="false"/>
<None Include="$(PkgStubble_Core)\lib\netstandard2.0\*.dll" Pack="true" PackagePath="analyzers/dotnet/cs" Visible="false"/>
</ItemGroup>
<Target Name="GetDependencyTargetPaths">
<ItemGroup>
<TargetPathWithTargetPlatformMoniker Include="$(PkgStubble_Core)\lib\netstandard2.0\Stubble.Core.dll" IncludeRuntimeDependency="false"/>
</ItemGroup>
</Target>
This configures the build system to copy the Stubble assemblies to the source generator output path. Also I included a mustache template as an embedded resource.
<ItemGroup>
<None Remove="Generators\Template.mustache"/>
<EmbeddedResource Include="Generators\Template.mustache"/>
</ItemGroup>
The template itself looks like this:
// <auto-generated/>
using System;
using System.Collections.Generic;
using MyGenerators;
namespace {{Type.Namespace}}
{
public partial class {{Type.Name}} : ISupportDictionaryConversion
{
public void WriteToDictionary(IDictionary<string, object> dictionary)
{
{{#Type.Properties}}
dictionary["{{Name}}"] = {{Name}};
{{/Type.Properties}}
}
public void ReadFromDictionary(IReadOnlyDictionary<string, object> dictionary)
{
{{#Type.Properties}}
if (dictionary.TryGetValue("{{Name}}", out var tmp_{{Name}}))
{
{{Name}} = ({{Type}})tmp_{{Name}};
}
{{/Type.Properties}}
}
}
}
The incremental source generator stays the same and the code that calls the templating engine looks like this:
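using System;
using System.IO;
using System.Reflection;
using Microsoft.CodeAnalysis;
using Stubble.Core.Builders;

public static string GenerateSyntax(INamedTypeSymbol type)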
{
using var resource = Assembly
.GetExecutingAssembly()
.GetManifestResourceStream("MyGenerators.Generators.Template.mustache")
?? throw new Exception("Generator template not found");
using var reader = new StreamReader(resource);
var content = reader.ReadToEnd();
var stubble = new StubbleBuilder().Build();
var result = stubble.Render(content, new
{
Type = new TypeModel(type)
});
return result;
}
My first few tries failed because I was trying to expose the INamedTypeSymbol directly to the template engine but I ran into some errors binding to properties of the object. This was because the concrete type uses explicit implementation of the INamedTypeSymbol/ISymbol interface. It turns out those properties are not exposed to the template engine. I tried the same with Scriban later but ran into the same problem. That’s why I’m using simple wrapper classes as a custom model.
public class PropertyModel
{
public string Name { get; set; }
public string Type { get; set; }
public PropertyModel(IPropertySymbol symbol)
{
Name = symbol.Name;
Type = symbol.Type.ToDisplayString();
}
}
public class TypeModel
{
public string Name { get; set; }
public string Namespace { get; set; }
public List<PropertyModel> Properties { get; set; }
public TypeModel(INamedTypeSymbol symbol)
{
Name = symbol.Name;
Namespace = symbol.ContainingNamespace.ToDisplayString();
Properties = symbol
.GetMembers()
.Where(m => m.Kind == SymbolKind.Property)
.Cast<IPropertySymbol>()
.Select(s => new PropertyModel(s))
.ToList();
}
}
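The binding problem with explicit interface implementations can be reproduced without Roslyn. The sketch below assumes the template engine looks up public properties on the runtime type via reflection. That is my reading of the behaviour, not something I verified in the Stubble or Scriban sources. All type names here are made up for the demonstration.

```csharp
using System;
using System.Reflection;

public interface IHasName
{
    string Name { get; }
}

// Implements the interface explicitly, so Name is not a public member of the class.
public class ExplicitImpl : IHasName
{
    string IHasName.Name => "explicit";
}

// Plain wrapper that copies the value into a public property.
public class Wrapper
{
    public string Name { get; }

    public Wrapper(IHasName source)
    {
        Name = source.Name;
    }
}

public static class BindingDemo
{
    // Mimics a reflection-based lookup of a public instance property called "Name".
    public static bool HasPublicNameProperty(Type type) =>
        type.GetProperty("Name", BindingFlags.Public | BindingFlags.Instance) != null;
}
```

HasPublicNameProperty returns false for ExplicitImpl and true for Wrapper, which matches why copying the symbol data into the plain model classes above made the bindings work.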
This seems like a good approach. I get a simple way to structure the code with a robust template engine and also the resulting code looks clean. For the Test type from the beginning of the post the generated code looks like this:
// <auto-generated/>
using System;
using System.Collections.Generic;
using MyGenerators;
namespace TestApp
{
public partial class Test : ISupportDictionaryConversion
{
public void WriteToDictionary(IDictionary<string, object> dictionary)
{
dictionary["Name"] = Name;
dictionary["Age"] = Age;
}
public void ReadFromDictionary(IReadOnlyDictionary<string, object> dictionary)
{
if (dictionary.TryGetValue("Name", out var tmp_Name))
{
Name = (string)tmp_Name;
}
if (dictionary.TryGetValue("Age", out var tmp_Age))
{
Age = (int)tmp_Age;
}
}
}
}
Caveats
I stumbled across some inconveniences along the way.
- Sometimes the MSBuild caching seems a bit aggressive when using source generators: the generator library is rebuilt, but the test project still uses the old assembly. We see this in the Shawn Wildermuth video too. One thing that helped a lot was constantly deleting the obj/bin directories or smashing the Clean Solution and Rebuild project buttons.
- It took some time to get debugging working. I created a launch profile for my source generator project that looks like this:
{
"$schema": "http://json.schemastore.org/launchsettings.json",
"profiles": {
"Gen": {
"commandName": "DebugRoslynComponent",
"targetProject": "../TestApp/TestApp.csproj"
}
}
}
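Another way to step through a generator, independent of the launch profile, is to run it from a test or console project through CSharpGeneratorDriver and set breakpoints there. The following is a sketch under a few assumptions: GeneratorHarness is a hypothetical helper, the calling project references Microsoft.CodeAnalysis.CSharp, and the source text passed in declares (or the compilation references) everything the generator looks for, such as the marker attribute.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical harness that runs an incremental generator in-process.
public static class GeneratorHarness
{
    public static string[] Run(IIncrementalGenerator generator, string source)
    {
        // Build a small in-memory compilation around the given source text.
        var compilation = CSharpCompilation.Create(
            assemblyName: "GeneratorHarness",
            syntaxTrees: new[] { CSharpSyntaxTree.ParseText(source) },
            references: new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
            options: new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        // Run the generator against the compilation.
        GeneratorDriver driver = CSharpGeneratorDriver.Create(generator);
        driver = driver.RunGenerators(compilation);

        // Return the text of every generated file.
        return driver.GetRunResult().GeneratedTrees
            .Select(tree => tree.GetText().ToString())
            .ToArray();
    }
}
```

Because the generator runs in the same process as the test, a normal debugger session works, and the returned strings can be inspected or asserted on without touching the obj/bin directories.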
Conclusion
Source generators are a bit fiddly but fun once they work. This was my first attempt at writing source generators, and I feel like there are many more use cases. With Microsoft using generators to make .NET features AOT-compatible, there is a strong interest from Microsoft and the community to support more scenarios:
- Generating strongly-typed translation from .json-files, finally getting rid of .resx files?
- Generating lightweight reflection helpers at compile time?
- Writing own domain specific analyzers to avoid common mistakes?
Also with the introduction of Interceptors as a preview feature, there is a lot to come in this area.