Neon, a serverless Postgres platform, was recently brought to my attention. It comes with the promise of some interesting features like scale-to-zero, on-demand autoscaling, point-in-time restore and time travel with up to 30 days of retention, instant read replicas, and, probably the most unique one, branching (which allows you to quickly branch your schema and data to create isolated environments for development, testing, or other purposes).

That's a lot of stuff to play with, but before I jumped in, I wanted to see how Neon integrated with my tech stack of choice - .NET, Azure, and GitHub. I decided to start with something simple - an ASP.NET Core application hosted on Azure App Service. Since I like to build things from the ground up, my first step was infrastructure.

Deploying the Infrastructure

Neon has three deployment options that might be of interest to me:

  • Create a Neon project hosted in an AWS region
  • Create a Neon project hosted in an Azure region
  • Deploy a Neon organization as an Azure Native ISV Service

The first two options are fully managed by Neon: you grab the connection details from the Neon console and start hacking. But I wanted to explore the third option because it provides the tightest integration from an Azure perspective (including SSO and unified billing), and that's what the organizations I work with are usually looking for.

As a strong IaC advocate, I also didn't want to deploy Neon through the Azure Portal or the CLI - I wanted to use Bicep. Thanks to the native Azure integration, the Neon organization comes with its own resource type - Neon.Postgres/organizations.

resource neonOrganization 'Neon.Postgres/organizations@2024-08-01-preview' = {
  name: 'neon-net-applications-on-azure'
  location: location
  properties: {
    companyDetails: { }
    marketplaceDetails: {
      subscriptionId: subscription().id
      offerDetails: {
        publisherId: 'neon1722366567200'
        offerId: 'neon_serverless_postgres_azure_prod'
        planId: 'neon_serverless_postgres_azure_prod_free'
        termUnit: 'P1M'
        termId: 'gmz7xq9ge3py'
      }
    }
    partnerOrganizationProperties: {
      organizationName: 'net-applications-on-azure'
    }
    userDetails: {
      upn: 'userId@domainName'
    }
  }
}

You may ask, how did I get the values for these properties? Well, they are not documented (yet, I'm assuming), so I had to resort to reverse engineering (inspecting the template of a manually deployed resource).

The next step is to create a project. Everything in Neon (branches, databases, etc.) lives inside a project. Neon projects don't have an Azure resource representation, so I had to switch tools. I could create a project through the UI, but since I still wanted repeatability (something I could later reuse in a GitHub Actions workflow), I decided to use the Neon CLI. I still had to visit the UI (thanks to SSO I could just click a link available on the resource Overview blade) to get myself an API key.

export NEON_API_KEY=<neon_api_key>

neon projects create \
    --name simple-asp-net-core-on-app-service \
    --region-id azure-westus3 \
    --output json

The output includes connection details for the created database. I don't want to manage secrets manually if I don't have to, so I decided to quickly create a key vault and put them there.

The last missing elements of my intended infrastructure were a managed identity, a container registry, an app service plan, and an app service. Below is the final diagram.

Infrastructure for simple ASP.NET Core application on Azure App Service (managed identity, key vault, container registry, app service plan, and app service) and using Neon deployed as Azure Native ISV Service

Seeding the Database

With all the infrastructure in place, I could start building my ASP.NET Core application. I wanted something simple, something that just displayed data, as my goal here was to see if Neon could be a drop-in replacement for Postgres from a code perspective. But to display data, you have to have data. So I needed to seed the database. I decided to take the simplest approach I could think of - using the built-in seeding capabilities of Entity Framework Core. I had two options to choose from: the UseSeeding/UseAsyncSeeding methods or model managed data. I chose the latter and quickly created a database context with two sets and a bunch of entities to add (I sketch the other option right after the snippet below).

public class StarWarsDbContext : DbContext
{
    public DbSet<Character> Characters { get; private set; }

    public DbSet<Planet> Planets { get; private set; }

    public StarWarsDbContext(DbContextOptions<StarWarsDbContext> options)
    : base(options)
    { }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Planet>(builder =>
        {
            builder.Property(x => x.Name).IsRequired();
            builder.HasData(
                new Planet { Id = 1, Name = "Tatooine", ... },
                ...
            );
        });

        modelBuilder.Entity<Character>(builder =>
        {
            builder.Property(x => x.Name).IsRequired();
            builder.HasData(
                new Character { Id = 1, Name = "Luke Skywalker", ... },
                ...
            );
        });
    }
}
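
For completeness, here is what the UseSeeding/UseAsyncSeeding alternative might look like. This is just a minimal sketch, assuming EF Core 9 (where UseAsyncSeeding is available on DbContextOptionsBuilder); it attaches the seeding delegate to the same options lambda I use to register the context in a moment, and as far as I know EF Core runs these delegates as part of EnsureCreated and Migrate.

builder.Services.AddDbContext<StarWarsDbContext>(options =>
{
    // The connection string comes from the key vault, the same way as in the registration snippet shown in a moment.
    options.UseNpgsql(connectionString);

    options.UseAsyncSeeding(async (context, _, cancellationToken) =>
    {
        // Seed only when the table is still empty.
        if (!await context.Set<Planet>().AnyAsync(cancellationToken))
        {
            context.Set<Planet>().Add(new Planet { Id = 1, Name = "Tatooine" });

            await context.SaveChangesAsync(cancellationToken);
        }
    });
});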

Now I needed to register this database context as a service in my application. For the provider I used Npgsql.EntityFrameworkCore.PostgreSQL, in the spirit of the drop-in replacement approach. With the connection string in the key vault and a managed identity in place to authenticate against that key vault, I could use DefaultAzureCredential and SecretClient to configure the provider and restrict my application settings to the key vault URI (yes, I chose this option even in demos).

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddDbContext<StarWarsDbContext>(options =>
{
    var keyVaultSecretClient = new SecretClient(
        new Uri(builder.Configuration["KEY_VAULT_URI"]),
        new DefaultAzureCredential()
    );
    options.UseNpgsql(keyVaultSecretClient.GetSecret("neon-connection-string").Value.Value);
});

...

The last thing to do here was to trigger the model creation code. This only needs to happen once, and as this is a demo, it can be ugly. I used an old trick of getting the database context as part of the startup and calling EnsureCreated (please don't do this in serious applications).

...

var app = builder.Build();

using (var serviceScope = app.Services.GetRequiredService<IServiceScopeFactory>().CreateScope())
{
    var context = serviceScope.ServiceProvider.GetRequiredService<StarWarsDbContext>();
    context.Database.EnsureCreated();
}

...

All the pieces are now in place. It's time to wrap things up.

Completing and Deploying the Application

For the application to be complete, it needed some UI. Since I didn't have an idea for anything special, I did something I used to do a lot in the past - a jqGrid based table that was quickly set up with the help of my old project Lib.AspNetCore.Mvc.JqGrid (I've basically copied some stuff from its demo).

As you've probably guessed from the infrastructure, my intention from the start was to deploy this application as a container. So my last step was to add a Dockerfile, build, push, and voilà. I was able to navigate to the application, browse, and sort the data.

Everything worked as expected and I consider my first experiment with Neon a success 😊.

Thoughts

They say to never publish part one if you don't have part two ready. I don't have part two ready, but I really think I'll write one, I just don't know when 😉. That's because Neon really got me interested.

In this post I wanted to check out the basics and set the stage for digging deeper. While writing it, I've also created a repository where you can find ready-to-deploy infrastructure and application. I've created GitHub Actions workflows for deploying a Neon organization, creating a Neon project, and deploying the solution itself. All you need to do to play with it is clone and provide credentials 🙂.

Not so long ago my colleague reached out with a question: "Are there any known issues with ITelemetryInitializer in Azure Functions?". This question started a discussion about the monitoring configuration for C# Azure Functions in the isolated worker model. At some point in that discussion I stated "Alright, you've motivated me, I'll make a sample". When I sat down to do that, I started wondering which scenario I should cover, and my conclusion was that there are several things I should go through... So here I am.

Setting the Scene

Before I start describing how we can monitor an isolated worker model C# Azure Function, allow me to introduce the one we are going to use for this purpose. I want to start with as basic a setup as possible. This is why the initial Program.cs will contain only four lines.

var host = new HostBuilder()
    .ConfigureFunctionsWebApplication()
    .Build();

host.Run();

The host.json will also be minimal.

{
    "version": "2.0",
    "logging": {
        "logLevel": {
            "default": "Information"
        }
    }
}

For the function itself, I've decided to go for the Fibonacci sequence implementation as it can easily generate a ton of logs.

public class FibonacciSequence
{
    private readonly ILogger<FibonacciSequence> _logger;

    public FibonacciSequence(ILogger<FibonacciSequence> logger)
    {
        _logger = logger;
    }

    [Function(nameof(FibonacciSequence))]
    public async Task<HttpResponseData> Run(
        [HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = "fib/{index:int}")]
        HttpRequestData request,
        int index)
    {
        _logger.LogInformation(
            $"{nameof(FibonacciSequence)} function triggered for {{index}}.",
            index
        );

        var response = request.CreateResponse(HttpStatusCode.OK);
        response.Headers.Add("Content-Type", "text/plain; charset=utf-8");

        await response.WriteStringAsync(FibonacciSequenceRecursive(index).ToString());

        return response;
    }

    private int FibonacciSequenceRecursive(int index)
    {
        int fibonacciNumber = 0;

        _logger.LogInformation("Calculating Fibonacci sequence for {index}.", index);

        if (index <= 0)
        {
            fibonacciNumber = 0;
        }
        else if (index <= 1)
        {
            fibonacciNumber = 1;
        }
        else
        {
            fibonacciNumber = 
              FibonacciSequenceRecursive(index - 1)
              + FibonacciSequenceRecursive(index - 2);
        }

        _logger.LogInformation(
            "Calculated Fibonacci sequence for {index}: {fibonacciNumber}.",
            index,
            fibonacciNumber
        );

        return fibonacciNumber;
    }
}

This is a recursive implementation which in our case has the added benefit of being crashable on demand 😉.

Now we can start capturing the signals this function will produce after deployment.

Simple and Limited Option for Specific Scenarios - File System Logging

What we can capture in a minimal deployment scenario is the logs coming from our function. What do I understand by a minimal deployment scenario? The bare minimum that an Azure Function requires is a storage account, an app service plan, and a function app. The function can push all its logs into that storage account. Is this something I would recommend for a production scenario? Certainly not. It's only logs (and in production you will want metrics) and it works reasonably only for Azure Functions deployed on Windows (for production I would suggest Linux due to better cold start performance or lower pricing for a dedicated plan). But it may be the right option for some development scenarios (when you want to run some work, download the logs, and analyze them locally). So, how to achieve this? Let's start with some Bicep snippets for the required infrastructure. First, we need to deploy a storage account.

resource storageAccount 'Microsoft.Storage/storageAccounts@2023-05-01' = {
  name: 'stwebjobs${uniqueString(resourceGroup().id)}'
  location: resourceGroup().location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'Storage'
}

We also need an app service plan. It must be a Windows one (the below snippet creates a Windows Consumption Plan).

resource appServicePlan 'Microsoft.Web/serverfarms@2024-04-01' = {
  name: 'plan-monitored-function'
  location: resourceGroup().location
  sku: {
    name: 'Y1'
    tier: 'Dynamic'
  }
  properties: {
    computeMode: 'Dynamic'
  }
}

And the actual service doing the work, the function app.

resource functionApp 'Microsoft.Web/sites@2024-04-01' = {
  name: 'func-monitored-function'
  location: resourceGroup().location
  kind: 'functionapp'
  properties: {
    serverFarmId: appServicePlan.id
    siteConfig: {
      netFrameworkVersion: 'v8.0'
      appSettings: [
        {
          name: 'AzureWebJobsStorage'
          value: 'DefaultEndpointsProtocol=https;AccountName=${storageAccount.name};EndpointSuffix=${environment().suffixes.storage};AccountKey=${storageAccount.listKeys().keys[0].value}'
        }
        {
          name: 'WEBSITE_CONTENTAZUREFILECONNECTIONSTRING'
          value: 'DefaultEndpointsProtocol=https;AccountName=${storageAccount.name};EndpointSuffix=${environment().suffixes.storage};AccountKey=${storageAccount.listKeys().keys[0].value}'
        }
        {
          name: 'WEBSITE_CONTENTSHARE'
          value: 'func-monitored-function'
        }
        {
          name: 'FUNCTIONS_EXTENSION_VERSION'
          value: '~4'
        }
        {
          name: 'FUNCTIONS_WORKER_RUNTIME'
          value: 'dotnet-isolated'
        }
      ]
    }
    httpsOnly: true
  }
}

A couple of words about this function app. The kind: 'functionapp' indicates that this is a Windows function app, which creates the requirement of setting netFrameworkVersion to the desired .NET version. To make this an isolated worker model function, FUNCTIONS_EXTENSION_VERSION is set to ~4 and FUNCTIONS_WORKER_RUNTIME to dotnet-isolated. When it comes to all the settings referencing the storage account, your attention should go to WEBSITE_CONTENTAZUREFILECONNECTIONSTRING and WEBSITE_CONTENTSHARE - those indicate the file share that will be created for this function app in the storage account (this is where the logs will go).

After deployment, the resulting infrastructure should be as below.

Infrastructure for Azure Functions with file system logging (storage account, app service plan, and function app)

What remains is configuring the file system logging in the host.json file of our function. The default behavior is to generate log files only when the function is being debugged through the Azure portal. We want them to always be generated.

{
    "version": "2.0",
    "logging": {
        "fileLoggingMode": "always",
        "logLevel": {
            "default": "Information"
        }
    }
}

When you deploy the code and execute some requests, you will be able to find the log files in the defined file share (the path is /LogFiles/Application/Functions/Host/).

2024-11-17T15:33:39.339 [Information] Executing 'Functions.FibonacciSequence' ...
2024-11-17T15:33:39.522 [Information] FibonacciSequence function triggered for 6.
2024-11-17T15:33:39.526 [Information] Calculating Fibonacci sequence for 6.
2024-11-17T15:33:39.526 [Information] Calculating Fibonacci sequence for 5.
...
2024-11-17T15:33:39.527 [Information] Calculated Fibonacci sequence for 5: 5.
...
2024-11-17T15:33:39.528 [Information] Calculated Fibonacci sequence for 4: 3.
2024-11-17T15:33:39.528 [Information] Calculated Fibonacci sequence for 6: 8.
2024-11-17T15:33:39.566 [Information] Executed 'Functions.FibonacciSequence' ...

For Production Scenarios - Entra ID Protected Application Insights

As I said, I wouldn't use file system logging for production. Very often it's not even enough for development. This is why Application Insights is considered the default these days. But before we deploy one, we can use the fact that we no longer need Windows and change the deployment to a Linux-based one. First I'm going to change the app service plan to a Linux Consumption Plan.

resource appServicePlan 'Microsoft.Web/serverfarms@2024-04-01' = {
  ...
  kind: 'linux'
  properties: {
    reserved: true
  }
}

With a different app service plan, the function app can be changed to a Linux one, which means changing the kind to functionapp,linux and replacing netFrameworkVersion with the corresponding linuxFxVersion.

resource functionApp 'Microsoft.Web/sites@2024-04-01' = {
  ...
  kind: 'functionapp,linux'
  properties: {
    serverFarmId: appServicePlan.id
    siteConfig: {
      linuxFxVersion: 'DOTNET-ISOLATED|8.0'
      ...
    }
    httpsOnly: true
  }
}

There is also a good chance that you no longer need the file share (although the documentation is inconsistent about whether it's needed when running Linux functions on the Elastic Premium plan), so WEBSITE_CONTENTAZUREFILECONNECTIONSTRING and WEBSITE_CONTENTSHARE can simply be removed. No requirement for the file share creates one more improvement opportunity - we can drop the credentials for the blob storage (AzureWebJobsStorage), as this connection can use a managed identity. I'm going to use a user-assigned managed identity because I often prefer them and they usually cause more trouble to set up 😉. To do so, we need to create one and grant it the Storage Blob Data Owner role on the storage account.

resource managedIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: 'id-monitored-function'
  location: resourceGroup().location
}

resource storageBlobDataOwnerRoleDefinition 'Microsoft.Authorization/roleDefinitions@2022-04-01' existing = {
  name: 'b7e6dc6d-f1e8-4753-8033-0f276bb0955b' // Storage Blob Data Owner
  scope: subscription()
}

resource storageBlobDataOwnerRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  scope: storageAccount
  name: guid(storageAccount.id, managedIdentity.id, storageBlobDataOwnerRoleDefinition.id)
  properties: {
    roleDefinitionId: storageBlobDataOwnerRoleDefinition.id
    principalId: managedIdentity.properties.principalId
    principalType: 'ServicePrincipal'
  }
}

The change to the function app is only about assigning the managed identity and replacing AzureWebJobsStorage with AzureWebJobsStorage__accountName.

resource functionApp 'Microsoft.Web/sites@2024-04-01' = {
  ...

  ...
  properties: {
    ...
    siteConfig: {
      ...
      appSettings: [
        {
          name: 'AzureWebJobsStorage__accountName'
          value: storageAccount.name
        }
        ...
      ]
    }
    httpsOnly: true
  }
}

Enough detouring (although it wasn't without a purpose 😜) - it's time to deploy Application Insights. As classic Application Insights is retired, we are going to create a workspace-based instance.

resource logAnalyticsWorkspace 'Microsoft.OperationalInsights/workspaces@2023-09-01' = {
  name: 'log-monitored-function'
  location: resourceGroup().location
  properties: {
    sku: { 
      name: 'PerGB2018' 
    }
  }
}

resource applicationInsights 'Microsoft.Insights/components@2020-02-02' = {
  name: 'appi-monitored-function'
  location: resourceGroup().location
  kind: 'web'
  properties: {
    Application_Type: 'web'
    WorkspaceResourceId: logAnalyticsWorkspace.id
    DisableLocalAuth: true
  }
}

You might have noticed that I've set DisableLocalAuth to true. This is a security improvement - it enforces Entra ID authentication for ingestion and, as a result, makes the InstrumentationKey a resource identifier instead of a secret. This is nice, and we can handle it easily because we already have a managed identity in place (I told you it had a purpose 😊). All we need to do is grant the Monitoring Metrics Publisher role to our managed identity.

resource monitoringMetricsPublisherRoleDefinition 'Microsoft.Authorization/roleDefinitions@2022-04-01' existing = {
  name: '3913510d-42f4-4e42-8a64-420c390055eb' // Monitoring Metrics Publisher
  scope: subscription()
}

resource monitoringMetricsPublisherRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  scope: applicationInsights
  name: guid(applicationInsights.id, managedIdentity.id, monitoringMetricsPublisherRoleDefinition.id)
  properties: {
    roleDefinitionId: monitoringMetricsPublisherRoleDefinition.id
    principalId: managedIdentity.properties.principalId
    principalType: 'ServicePrincipal'
  }
}

Adding two application settings definitions to our function app will tie the services together.

resource functionApp 'Microsoft.Web/sites@2024-04-01' = {
  ...
  properties: {
    ...
    siteConfig: {
      ...
      appSettings: [
        ...
        {
          name: 'APPLICATIONINSIGHTS_CONNECTION_STRING'
          value: applicationInsights.properties.ConnectionString
        }
        {
          name: 'APPLICATIONINSIGHTS_AUTHENTICATION_STRING'
          value: 'ClientId=${managedIdentity.properties.clientId};Authorization=AAD'
        }
        ...
      ]
    }
    httpsOnly: true
  }
}

Are we done with the infrastructure? Not really. There is one more useful thing that is often forgotten - enabling storage logs. There is some important function app data in there, so it would be nice to monitor it.

resource storageAccountBlobService 'Microsoft.Storage/storageAccounts/blobServices@2023-05-01' existing = {
  name: 'default'
  parent: storageAccount
}

resource storageAccountDiagnosticSettings 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
  name: '${storageAccount.name}-diagnostic'
  scope: storageAccountBlobService
  properties: {
    workspaceId: logAnalyticsWorkspace.id
    logs: [
      {
        category: 'StorageWrite'
        enabled: true
      }
    ]
    metrics: [
      {
        category: 'Transaction'
        enabled: true
      }
    ]
  }
}

Now we are done and our infrastructure should look like below.

Infrastructure for Azure Functions with Application Insights (managed identity, application insights, log workspace, storage account, app service plan, and function app)

Once we deploy our function and make some requests, we can find the traces in Application Insights - thanks to the codeless monitoring by the host and the relaying of worker logs through the host.

Relayed worker logs ingested through host codeless monitoring

You may be asking what I mean by "relaying worker logs through the host". You probably remember that in the case of C# Azure Functions in the isolated worker model, we have two processes: the functions host and the isolated worker. Azure Functions wants to be helpful and, by default, it sends the logs from the worker process to the functions host process, which then sends them to Application Insights.

Azure Functions relaying worker logs through the host

This is nice, but may not be exactly what you want. You may want to split the logs so you can treat them separately (for example by configuring different default log levels for the host and the worker). To achieve that you must explicitly set up Application Insights integration in the worker code.

var host = new HostBuilder()
    .ConfigureFunctionsWebApplication()
    .ConfigureServices(services => {
        services.AddApplicationInsightsTelemetryWorkerService();
        services.ConfigureFunctionsApplicationInsights();
    })
    .Build();

host.Run();

The telemetry from the worker can now be controlled in code, while the host's telemetry is still controlled through host.json. But if you deploy this, you will find no telemetry from the worker in Application Insights. You can still see your logs in the Log stream. You will also see a lot of these:

Azure.Identity: Service request failed.
Status: 400 (Bad Request)

Content:
{"statusCode":400,"message":"Unable to load the proper Managed Identity.","correlationId":"..."}

That's the Application Insights SDK not being able to use the user-assigned managed identity. The Azure Functions runtime uses the APPLICATIONINSIGHTS_AUTHENTICATION_STRING setting to provide a user-assigned managed identity OAuth token to Application Insights, but none of that (at the time of writing this) happens when we set up the integration explicitly. That said, we can do it ourselves. The parser for the setting is available in the Microsoft.Azure.WebJobs.Logging.ApplicationInsights package, so we can mimic the host implementation.

using TokenCredentialOptions =
    Microsoft.Azure.WebJobs.Logging.ApplicationInsights.TokenCredentialOptions;

var host = new HostBuilder()
    ...
    .ConfigureServices(services => {
        services.Configure<TelemetryConfiguration>(config =>
        {
            string? authenticationString =
                Environment.GetEnvironmentVariable("APPLICATIONINSIGHTS_AUTHENTICATION_STRING");

            if (!String.IsNullOrEmpty(authenticationString))
            {
                var tokenCredentialOptions = TokenCredentialOptions.ParseAuthenticationString(
                    authenticationString
                );
                config.SetAzureTokenCredential(
                    new ManagedIdentityCredential(tokenCredentialOptions.ClientId)
                );
            }
        });
        services.AddApplicationInsightsTelemetryWorkerService();
        services.ConfigureFunctionsApplicationInsights();
    })
    .Build();

host.Run();

The telemetry should now reach the Application Insights without a problem.

Azure Functions emitting worker logs directly

Be cautious. As the Application Insights SDK now controls emitting the logs, you must be aware of its opinions. It so happens that it tries to optimize by default and adds a logging filter that sets the minimum log level to Warning. You may want to get rid of that (be sure to do it after ConfigureFunctionsApplicationInsights).

...

var host = new HostBuilder()
    ...
    .ConfigureServices(services => {
        ...
        services.ConfigureFunctionsApplicationInsights();
        services.Configure<LoggerFilterOptions>(options =>
        {
            LoggerFilterRule? sdkRule = options.Rules.FirstOrDefault(rule =>
                rule.ProviderName == typeof(Microsoft.Extensions.Logging.ApplicationInsights.ApplicationInsightsLoggerProvider).FullName
            );

            if (sdkRule is not null)
            {
                options.Rules.Remove(sdkRule);
            }
        });
    })
    .Build();

host.Run();

Modifying Application Map in Application Insights

The default application map is not impressive - it will simply show your function app by its resource name.

Default application map in Application Insights

As function apps rarely exist in isolation, we often want to present them in a more meaningful way on the map by providing a cloud role name. The simplest way to do this is through the WEBSITE_CLOUD_ROLENAME setting.

resource functionApp 'Microsoft.Web/sites@2024-04-01' = {
  ...
  properties: {
    ...
    siteConfig: {
      ...
      appSettings: [
        ...
        {
          name: 'WEBSITE_CLOUD_ROLENAME'
          value: 'InstrumentedFunctions'
        }
        ...
      ]
    }
    ...
  }
}

This will impact both the host and the worker. With explicit Application Insights integration, we can separate the two (if there is such a need) and change the cloud role for the worker. This is done in the usual way, by registering an ITelemetryInitializer. The only important detail is the place of the registration - it needs to come after ConfigureFunctionsApplicationInsights, as Azure Functions adds its own initializers.

public class IsolatedWorkerTelemetryInitializer : ITelemetryInitializer
{
    public void Initialize(ITelemetry telemetry)
    {
        telemetry.Context.Cloud.RoleName = "InstrumentedFunctionsIsolatedWorker";
    }
}
...

var host = new HostBuilder()
    ...
    .ConfigureServices(services => {
        ...
        services.ConfigureFunctionsApplicationInsights();
        services.AddSingleton<ITelemetryInitializer, IsolatedWorkerTelemetryInitializer>();
        services.Configure<LoggerFilterOptions>(options =>
        {
            ...
        });
    })
    .Build();

host.Run();

The resulting map will look as below.

Modified application map in Application Insights

Monitoring for Scaling Decisions

This is currently in preview, but you can enable emitting scale controller logs (for a chance to understand the scaling decisions). This is done with a single setting (SCALE_CONTROLLER_LOGGING_ENABLED) that takes a value in the <destination>:<verbosity> format. The possible values for the destination are Blob and AppInsights, while the verbosity can be None, Verbose, or Warning. Verbose seems to be the most useful one when you are looking for understanding, as it's supposed to provide the reason for changes in the worker count and information about the triggers that impact those decisions.

resource functionApp 'Microsoft.Web/sites@2024-04-01' = {
  ...
  properties: {
    ...
    siteConfig: {
      ...
      appSettings: [
        ...
        {
          name: 'SCALE_CONTROLLER_LOGGING_ENABLED'
          value: 'AppInsights:Verbose'
        }
        ...
      ]
    }
    ...
  }
}

Is This Exhaustive?

Certainly not 🙂. There are other things that you can fiddle with. You can have a very granular configuration of different log levels for different categories. You can fine-tune aggregation and sampling. When you are playing with all those settings, please remember that you're looking for the right balance between the gathered details and costs. My advice is not to configure monitoring for the worst-case scenario (when you need every detail), as that often isn't financially sustainable. Rather, aim for a lower amount of information and change the settings to gather more when needed (a small hint - you can override host.json settings through application settings without a deployment, using the AzureFunctionsJobHost__ prefix, for example AzureFunctionsJobHost__logging__logLevel__default).

I've been exploring working with asynchronous streaming data sources over HTTP for quite some time on this blog. I've been playing with async streamed JSON and NDJSON. I've been streaming and receiving streams in ASP.NET Core and console applications. But there is one scenario I haven't touched yet - streaming from Blazor WebAssembly. Why? Simply because it wasn't possible.

Streaming objects from Blazor WebAssembly requires two things. The first is support for streaming upload in the browser's Fetch API. The Fetch API has supported streams for response bodies for quite some time (my first post using them is from 2019), but non-experimental support for streams as request bodies came to Chrome, Edge, and Safari in 2022 (and it is yet to come to Firefox as far as I can tell). Still, it's been available for two years and I haven't explored it yet? Well, I did, more than a year ago, with NDJSON and pure JavaScript, where I used the ASP.NET Core endpoint I created long ago. But here we are talking about Blazor WebAssembly, which brings us to the second requirement - the browser API needs to be surfaced in Blazor. This is finally happening with .NET 9, so now I can fully explore it.

Async Streaming NDJSON From Blazor WebAssembly

I've decided to start with NDJSON because I already have my own building blocks for it (the ASP.NET Core endpoint mentioned above, and NdjsonAsyncEnumerableContent coming from one of my packages). If things didn't work, I would be able to easily debug them.

I've quickly put together typical code using HttpClient. I just had to make sure I'd enabled streaming upload by calling SetBrowserRequestStreamingEnabled on the request instance.

private async Task PostWeatherForecastsNdjsonStream()
{
    ...

    try
    {
        ....

        HttpRequestMessage request = new HttpRequestMessage(
            HttpMethod.Post, "api/WeatherForecasts/stream");
        request.Content = new NdjsonAsyncEnumerableContent<WeatherForecast>(
            StreamWeatherForecastsAsync());
        request.SetBrowserRequestStreamingEnabled(true);

        using HttpResponseMessage response = await Http.SendAsync(request, cancellationToken);

        response.EnsureSuccessStatusCode();
    }
    finally
    {
        ...
    }
}

It worked! No additional changes and no issues to resolve. I could see the items being logged by my ASP.NET Core service as they were streamed.
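
In case you are wondering, the StreamWeatherForecastsAsync method referenced above isn't anything fancy - any IAsyncEnumerable<WeatherForecast> producer will do. A minimal sketch (the delay and the count are arbitrary, and I'm leaving the WeatherForecast initialization out as it's not relevant here) could look like this.

private async IAsyncEnumerable<WeatherForecast> StreamWeatherForecastsAsync()
{
    for (int i = 0; i < 10; i++)
    {
        // Simulate a value becoming available over time.
        await Task.Delay(100);

        yield return new WeatherForecast { ... };
    }
}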

After this initial success, I could move to a potentially more challenging scenario - async streaming JSON.

Async Streaming JSON From Blazor WebAssembly

In theory, async streaming JSON should also just work. .NET has support for IAsyncEnumerable built into JsonSerializer.SerializeAsync, so JsonContent should just use it. Well, there is no better way than to try, so I've quickly changed the code.

private async Task PostWeatherForecastsJsonStream()
{
    ...

    try
    {
        ...

        HttpRequestMessage request = new HttpRequestMessage(
            HttpMethod.Post, "api/WeatherForecasts/stream");
        request.Content = JsonContent.Create(StreamWeatherForecastsAsync());
        request.SetBrowserRequestStreamingEnabled(true);

        using HttpResponseMessage response = await Http.SendAsync(request, cancellationToken);

        response.EnsureSuccessStatusCode();
    }
    finally
    {
        ...
    }
}

To my surprise, it also worked! I couldn't test it against my ASP.NET Core service because it didn't support it, but I quickly created a dumb endpoint that would simply dump whatever was incoming.
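
That "dumb endpoint" can be as simple as reading the request body and logging whatever arrives - something along these lines (a rough sketch, assuming a standard controller with an injected _logger, not the exact code from the demo).

[HttpPost("dump")]
public async Task<IActionResult> Dump()
{
    using var reader = new StreamReader(Request.Body);

    char[] buffer = new char[1024];
    int charactersRead;

    // Logging every chunk as soon as it arrives makes the streaming visible.
    while ((charactersRead = await reader.ReadAsync(buffer, 0, buffer.Length)) > 0)
    {
        _logger.LogInformation("Received: {Chunk}", new string(buffer, 0, charactersRead));
    }

    return Ok();
}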

Now, why was I surprised? Because this is in fact a platform difference - this behavior seems to be unique to the "browser" platform. I attempted the same in a desktop application earlier and HttpClient would buffer the request - I had to create a simple HttpContent that would wrap the underlying stream and flush on write. I think I like the Blazor WebAssembly behavior better and I'm happy that I didn't have to do the same here.

Ok, but if I can't handle the incoming stream nicely in ASP.NET Core, it's not really that useful. So I've tried to achieve that as well.

Receiving Async Streamed JSON in ASP.NET Core

Receiving async streamed JSON is a little bit more tricky. JsonSerializer has a dedicated method (DeserializeAsyncEnumerable) for deserializing an incoming JSON stream into an IAsyncEnumerable instance. The built-in SystemTextJsonInputFormatter doesn't use that method - it simply uses JsonSerializer.DeserializeAsync with the request body as input. That shouldn't be a surprise, as it is the standard way to deserialize incoming JSON. To support async streamed JSON, a custom formatter is required. What is more, to be sure that this custom formatter will be used, it can't just be added - it has to replace the built-in one. But that means the custom formatter has to have all the functionality of the built-in one (if we want to support not only async streamed JSON). Certainly not something I wanted to reimplement, so I've decided to simply inherit from SystemTextJsonInputFormatter.

internal class SystemTextStreamedJsonInputFormatter : SystemTextJsonInputFormatter
{
    ...

    public override Task<InputFormatterResult> ReadRequestBodyAsync(InputFormatterContext context)
    {
        return base.ReadRequestBodyAsync(context);
    }
}

Now, for the actual functionality, the logic has to branch based on the type of the model, as we want to use DeserializeAsyncEnumerable only if the model is an IAsyncEnumerable<T>.

internal class SystemTextStreamedJsonInputFormatter : SystemTextJsonInputFormatter
{
    ...

    public override Task<InputFormatterResult> ReadRequestBodyAsync(InputFormatterContext context)
    {
        if (context.ModelType.IsGenericType && context.ModelType.GetGenericTypeDefinition() == typeof(IAsyncEnumerable<>))
        {
            ...
        }

        return base.ReadRequestBodyAsync(context);
    }
}

That's not the end of the challenges (or the reflection). JsonSerializer.DeserializeAsyncEnumerable is a generic method. That means we have to get the actual type of the values, construct the right method based on it, and invoke it.

internal class SystemTextStreamedJsonInputFormatter : SystemTextJsonInputFormatter
{
    ...

    public override Task<InputFormatterResult> ReadRequestBodyAsync(InputFormatterContext context)
    {
        if (context.ModelType.IsGenericType && context.ModelType.GetGenericTypeDefinition() == typeof(IAsyncEnumerable<>))
        {
            MethodInfo deserializeAsyncEnumerableMethod = typeof(JsonSerializer)
                .GetMethod(
                    nameof(JsonSerializer.DeserializeAsyncEnumerable),
                    [typeof(Stream), typeof(JsonSerializerOptions), typeof(CancellationToken)])
                .MakeGenericMethod(context.ModelType.GetGenericArguments()[0]);

            return Task.FromResult(InputFormatterResult.Success(
                deserializeAsyncEnumerableMethod.Invoke(null, [
                    context.HttpContext.Request.Body,
                    SerializerOptions,
                    context.HttpContext.RequestAborted
                ])
            ));
        }

        return base.ReadRequestBodyAsync(context);
    }
}

The implementation above is intentionally ugly so that it doesn't hide anything and can be understood more easily. You can find a cleaner one, which also caches the constructed methods, in the demo repository.

Now that we have the formatter, we can create a setup for MvcOptions that will replace the SystemTextJsonInputFormatter with a custom implementation.

internal class SystemTextStreamedJsonMvcOptionsSetup : IConfigureOptions<MvcOptions>
{
    private readonly IOptions<JsonOptions>? _jsonOptions;
    private readonly ILogger<SystemTextJsonInputFormatter> _inputFormatterLogger;

    public SystemTextStreamedJsonMvcOptionsSetup(
        IOptions<JsonOptions>? jsonOptions,
        ILogger<SystemTextJsonInputFormatter> inputFormatterLogger)
    {
        _jsonOptions = jsonOptions;
        _inputFormatterLogger = inputFormatterLogger;
    }

    public void Configure(MvcOptions options)
    {
        options.InputFormatters.RemoveType<SystemTextJsonInputFormatter>();
        options.InputFormatters.Add(
            new SystemTextStreamedJsonInputFormatter(_jsonOptions?.Value, _inputFormatterLogger)
        );
    }
}

A small convenience method to register that setup.

internal static class SystemTextStreamedJsonMvcBuilderExtensions
{
    public static IMvcBuilder AddStreamedJson(this IMvcBuilder builder)
    {
        ...

        builder.Services.AddSingleton
            <IConfigureOptions<MvcOptions>, SystemTextStreamedJsonMvcOptionsSetup>();

        return builder;
    }
}

And we can configure the application to use it.

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddControllers()
            ...
            .AddStreamedJson();

        ...
    }

    ...
}

It gets the job done. As the formatter for NDJSON expects a different content type (application/x-ndjson), content negotiation also works and I could asynchronously stream NDJSON and JSON to the same endpoint. Honestly, I think it's cool.
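
To complete the picture, the action behind that endpoint simply binds to IAsyncEnumerable<WeatherForecast> and iterates it - roughly like this (a simplified sketch, assuming an [ApiController]-attributed controller with an injected _logger).

[HttpPost("stream")]
public async Task<IActionResult> PostStream(IAsyncEnumerable<WeatherForecast> weatherForecasts)
{
    await foreach (WeatherForecast weatherForecast in weatherForecasts)
    {
        // Each item is available as soon as the formatter deserializes it from the request body.
        _logger.LogInformation("Received a weather forecast.");
    }

    return Ok();
}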

Afterthoughts

One question that arises at this point is "Which one to use?". I prefer NDJSON here. Why? Both require custom formatters and both formatters must use reflection. But in the case of NDJSON, that is isolated to requests coming with the dedicated content type. To support async streamed JSON, the initial model type check has to happen on every request with a JSON-related content type. Also, I don't feel great about replacing the built-in formatter like that. On top of that, NDJSON enables clean cancellation, as the endpoint is receiving separate JSON objects. In the case of JSON, it is a single JSON collection of objects that will not be terminated properly, which will trip the deserialization.

But this is just my opinion. You can grab the demo and play with it, or use one of my NDJSON packages and experiment with it yourself. That's the best way to choose the right approach for your scenario.

I've been exploring the subject of Azure Functions extensibility on this blog for quite some time. I've touched on subjects directly related to creating extensions and their anatomy, but also some peripheral ones.

I have always written from the perspective of the in-process model, but since 2020 there has been a continued evolution when it comes to the preferred model for .NET-based function apps. The isolated worker model, first introduced with .NET 5, has been gaining parity and becoming the leading vehicle for making new .NET versions available with Azure Functions. In August 2023 Microsoft announced the intention for .NET 8 to be the last LTS release to receive in-process model support. So the question arises: does it invalidate all that knowledge about Azure Functions extensibility? The short answer is no. But before I go into details, I need to cover the common ground.

Isolated Worker Model in a Nutshell

.NET was always a little bit special in Azure Functions. It shouldn't be a surprise - after all, it's Microsoft's technology and there was a desire for the integration to be efficient and powerful. So even when Azure Functions v2 brought the separation between the host process and the language worker process, .NET-based function apps kept running in the host process. This had performance benefits (no communication between processes) and allowed .NET function apps to leverage the full capabilities of the host, but it started to become a bottleneck when the pace of changes in the .NET ecosystem accelerated. There were more and more conflicts between the assemblies that developers wanted to use in their apps and the ones used by the host. There was a delay in making new .NET versions available because the host had to be updated. Also, there were things that an app couldn't do because it was coupled with the host. Limitations like those were the reasons for bringing the Azure Functions .NET Worker to life.

At the same time, Microsoft didn't want to take away all the benefits that .NET developers had when working with Azure Functions. The design had to take performance and developer experience into account. So how does the Azure Functions .NET Worker work? In simplification, it's an ASP.NET Core application that receives inputs from and provides outputs to the host over gRPC (which is more performant than the HTTP primitives used in the case of custom handlers).

Azure Functions .NET Worker overview

The request and response payloads are also pretty well hidden. Developers have been given a new binding model with the required attributes available through *.Azure.Functions.Worker.Extensions.* packages. But if the actual binding activity happens in the host, what do those new packages provide? And what is their relation to the *.Azure.WebJobs.Extensions.* packages?

Worker Extensions and WebJobs Extensions

The well-hidden truth is that the worker extension packages are just a bridge to the in-process extension packages. It means that if you want to create a new extension or understand how an existing one works, you should start with an extension for the in-process model. The worker extensions are mapped to the in-process ones through an assembly-level attribute, which takes the name of the package and version to be used as parameters.

[assembly: ExtensionInformation("RethinkDb.Azure.WebJobs.Extensions", "0.6.0")]

The integration is quite seamless. During the build, the Azure Functions tooling will use NuGet to install the needed in-process extension package - it doesn't have to be referenced. Of course that has its drawbacks (tight coupling to a specific version and more challenges during debugging). So, the final layout of the packages can be represented as below.

Azure Functions .NET Worker Extensions and WebJobs Extensions relation overview

What ensures the cooperation between those two packages running in two different processes are the binding attributes.

Binding Attributes

In the case of the in-process model extensions, we have two types of attributes - one for bindings and one for triggers. In the case of the isolated worker model, there are three - one for input bindings, one for output bindings, and one for triggers.

public class RethinkDbInputAttribute : InputBindingAttribute
{
    ...
}
public sealed class RethinkDbOutputAttribute : OutputBindingAttribute
{
    ...
}
public sealed class RethinkDbTriggerAttribute : TriggerBindingAttribute
{
    ...
}

The isolated worker model attributes are used in two ways. One is for developers, who use them to decorate their functions and provide the needed settings. The other is for the worker, which uses them as data transfer objects. They are serialized and transferred as metadata. On the host side, they are deserialized into the corresponding in-process model extension attributes. The input and output attributes will be deserialized into the in-process binding attribute, and the trigger attribute into the in-process trigger attribute. This means that we need to ensure that the names of the properties we want to support match on both sides.
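
To make that last point concrete, below is a rough sketch of such a matching pair. The property names (and the assumption that the in-process attribute is called RethinkDbAttribute) are purely illustrative - what matters is that the properties are named the same on both sides.

// Isolated worker model - serialized by the worker and sent to the host as metadata.
public sealed class RethinkDbInputAttribute : InputBindingAttribute
{
    public string DatabaseName { get; set; }

    public string TableName { get; set; }
}

// In-process model - the host deserializes the metadata into this attribute,
// so the property names have to match the ones above.
[Binding]
public sealed class RethinkDbAttribute : Attribute
{
    public string DatabaseName { get; set; }

    public string TableName { get; set; }
}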

Implementing the attributes and decorating the functions with them is all we need to make it work. This will give us support for POCOs as values (the host and worker will take care of serialization, transfer over gRPC, and deserialization). But what if we want something more than a POCO?

Beyond POCO Inputs With Converters

It's quite common for in-process extensions to support binding data provided using types from specific service SDKs (for example CosmosClient in the case of Azure Cosmos DB). That kind of binding data is not supported out-of-the-box by isolated worker extensions as they can't be serialized and transferred. But there is a way for isolated worker extensions to go beyond POCOs - input converters.

Input converters are classes that implement the IInputConverter interface. This interface defines a single method, which is supposed to return a conversion result. The conversion result can be one of the following:

  • Unhandled (the converter did not act on the input)
  • Succeeded (conversion was successful and the result is included)
  • Failed (the conversion was attempted but failed; the exception is included)

The converter should check if it's being used with an extension it supports (the name that has been used for the isolated extension registration will be provided as part of the model binding data) and if the incoming content is in a supported format. The converter can also be decorated with multiple SupportedTargetType attributes to narrow its scope.

Below is a sample template for an input converter.

[SupportsDeferredBinding]
[SupportedTargetType(typeof(...))]
[SupportedTargetType(typeof(...))]
internal class RethinkDbConverter : IInputConverter
{
    private const string RETHINKDB_EXTENSION_NAME = "RethinkDB";
    private const string JSON_CONTENT_TYPE = "application/json";

    ...

    public RethinkDbConverter(...)
    {
        ...
    }

    public async ValueTask<ConversionResult> ConvertAsync(ConverterContext context)
    {
        ModelBindingData modelBindingData = context?.Source as ModelBindingData;

        if (modelBindingData is null)
        {
            return ConversionResult.Unhandled();
        }

        try
        {
            if (modelBindingData.Source is not RETHINKDB_EXTENSION_NAME)
            {
                throw new InvalidOperationException($"Unexpected binding source.");
            }

            if (modelBindingData.ContentType is not JSON_CONTENT_TYPE)
            {
                throw new InvalidOperationException($"Unexpected content-type.");
            }

            object result = context.TargetType switch
            {
                // Here you can use modelBindingData.Content,
                // any injected services, etc.
                // to prepare the value.
                ...
            };


            return ConversionResult.Success(result);
        }
        catch (Exception ex)
        {
            return ConversionResult.Failed(ex);
        }
    }
}

Input converters can be applied to input and trigger binding attributes by simply decorating them with an attribute (we should also define the fallback behavior policy).

[InputConverter(typeof(RethinkDbConverter))]
[ConverterFallbackBehavior(ConverterFallbackBehavior.Default)]
public class RethinkDbInputAttribute : InputBindingAttribute
{
    ...
}

Adding input converters to an extension moves some of the logic from the host to the worker. It may mean that the worker will now be establishing connections to the services or performing other operations. This will most likely create a need to register some dependencies, read configuration, and so on. Such things are best done at function app startup.

Participating in the Function App Startup

Extensions for the isolated worker model can implement a startup hook. It can be done by creating a public class with a parameterless constructor that derives from WorkerExtensionStartup. This class also has to be registered through an assembly-level attribute. Then we can override the Configure method and register services and middlewares. The mechanism is quite similar to its equivalent for in-process extensions.

[assembly: WorkerExtensionStartup(typeof(RethinkDbExtensionStartup))]

namespace Microsoft.Azure.Functions.Worker
{
    public class RethinkDbExtensionStartup : WorkerExtensionStartup
    {
        public override void Configure(IFunctionsWorkerApplicationBuilder applicationBuilder)
        {
            if (applicationBuilder == null)
            {
                throw new ArgumentNullException(nameof(applicationBuilder));
            }

            ...
        }
    }
}

The Conclusion

The isolated worker model doesn't invalidate what we know about Azure Functions extensions - on the contrary, it builds another layer on top of that knowledge. Sadly, there are limitations in the supported data types for bindings, which come from the serialization and transfer of data between the host and the worker. Still, in my opinion, the benefits of the new model can outweigh those limitations.

If you are looking for a working sample to learn and explore, you can find one here.

In the past, I've written posts that explored the core aspects of creating Azure Functions extensions:

In this post, I want to dive into how extensions implement support for the Azure Functions scaling model.

The Azure Functions scaling model (as one can expect) has evolved over the years. The first model is incremental scaling handled by the scale controller (a dedicated Azure Functions component). The scale controller monitors the rate of incoming events and makes magical (following the rule that any sufficiently advanced technology is indistinguishable from magic) decisions about whether to add or remove a worker. The worker count can change only by one and at a specific rate. This mechanism has some limitations.

The first problem is the inability to scale in network-isolated deployments. In such a scenario, the scale controller doesn't have access to the services and can't monitor events - only the function app (and as a consequence the extensions) can. As a result, toward the end of 2019, the SDK gained support for extensions to vote in scale-in and scale-out decisions.

The second problem is the clarity and rate of the scaling process. Changing by one worker at a time, driven by complex heuristics, is not perfect. This led to the introduction of target-based scaling at the beginning of 2023. Extensions became able to implement their own algorithms for requesting a specific number of workers and to scale out by up to four instances at a time.

So, how do those two mechanisms work and how can they be implemented? To answer this question I'm going to describe them from two perspectives. One will be how it is done by a well-established extension - the Azure Cosmos DB bindings for Azure Functions. The other will be a sample implementation in my own extension, which I've used previously while writing about Azure Functions extensibility - the RethinkDB bindings for Azure Functions.

But first things first. Before we can talk about the actual logic driving the scaling behaviors, we need the right SDK and the right classes in place.

Runtime and Target-Based Scaling Support in Azure WebJobs SDK

As I've mentioned above, support for extensions to participate in the scaling model has been introduced gradually to the Azure WebJobs SDK. At the same time, the team makes an effort to introduce changes in a backward-compatible manner. Support for voting in scaling decisions came in version 3.0.14 and for target-based scaling in 3.0.36, but you can have an extension that is using version 3.0.0 and it will work on the latest Azure Functions runtime. You can also have an extension that uses the latest SDK and doesn't implement the scalability mechanisms. You need to know that they are there and choose to utilize them.

This is reflected in the official extensions as well. The Azure Cosmos DB extension implemented support for runtime scaling in version 3.0.5 and target-based scaling in 4.1.0 (you can find tables gathering the version numbers, for example in the section about virtual network triggers). This means that if your function app is using lower versions of the extensions, it won't benefit from these capabilities.

So, regardless of whether you're implementing your own extension or just using an existing one, the versions of the dependencies matter. However, if you're implementing your own extension, you will also need some classes.

Classes Needed to Implement Scaling Support

The Azure Functions scaling model revolves around triggers. After all, it's their responsibility to handle the influx of events. As you may know (for example from my previous post 😉), the heart of a trigger is a listener. This is where all the heavy lifting is being done and this is where we can inform the runtime that our trigger supports the scalability features. We can do so by implementing two interfaces: IScaleMonitorProvider and ITargetScalerProvider. They are tied, respectively, to runtime scaling and target-based scaling. If they are implemented, the runtime will use the methods they define to obtain the actual logic implementations.

internal class RethinkDbTriggerListener : IListener, IScaleMonitorProvider, ITargetScalerProvider
{
    ...

    private readonly IScaleMonitor<RethinkDbTriggerMetrics> _rethinkDbScaleMonitor;
    private readonly ITargetScaler _rethinkDbTargetScaler;

    ...

    public IScaleMonitor GetMonitor()
    {
        return _rethinkDbScaleMonitor;
    }

    public ITargetScaler GetTargetScaler()
    {
        return _rethinkDbTargetScaler;
    }
}

In the snippet above you can notice one more class - RethinkDbTriggerMetrics. It derives from the ScaleMetrics class and is used for capturing the values on which the decisions are made.

internal class RethinkDbTriggerMetrics : ScaleMetrics
{
    ...
}

The IScaleMonitor implementation contributes to the runtime scaling part of the scaling model. Its responsibilities are to provide snapshots of the metrics and to vote in the scaling decisions.

internal class RethinkDbScaleMonitor : IScaleMonitor<RethinkDbTriggerMetrics>
{
    public ScaleMonitorDescriptor Descriptor => ...;

    public Task<RethinkDbTriggerMetrics> GetMetricsAsync()
    {
        ...
    }

    Task<ScaleMetrics> IScaleMonitor.GetMetricsAsync()
    {
        ...
    }

    public ScaleStatus GetScaleStatus(ScaleStatusContext<RethinkDbTriggerMetrics> context)
    {
        ...
    }

    public ScaleStatus GetScaleStatus(ScaleStatusContext context)
    {
        ...
    }
}

The ITargetScaler implementation is responsible for the target-based scaling. It needs to focus only on one thing - calculating the desired worker count.

internal class RethinkDbTargetScaler : ITargetScaler
{
    public TargetScalerDescriptor TargetScalerDescriptor => ...;

    public Task<TargetScalerResult> GetScaleResultAsync(TargetScalerContext context)
    {
        ...
    }
}

As both of these implementations are tightly tied with a specific trigger, it is typical to instantiate them in the listener constructor.

internal class RethinkDbTriggerListener : IListener, IScaleMonitorProvider, ITargetScalerProvider
{
    ...

    public RethinkDbTriggerListener(...)
    {
        ...

        _rethinkDbScaleMonitor = new RethinkDbScaleMonitor(...);
        _rethinkDbTargetScaler = new RethinkDbTargetScaler(...);
    }

    ...
}

We are almost ready to start discussing the details of scaling logic, but before we do that, we need to talk about gathering metrics. The decisions need to be based on something.

Gathering Metrics

Yes, the decisions need to be based on something, and the quality of those decisions will depend on the quality of the metrics you can gather. So a lot depends on how well the service for which the extension is being created is prepared to be monitored.

Azure Cosmos DB is well prepared. The Azure Cosmos DB bindings for Azure Functions implement the trigger on top of the Azure Cosmos DB change feed processor and use the change feed estimator to gather the metrics. This gives access to quite an accurate estimate of the remaining work. As an additional metric, the extension also gathers the number of leases for the container that the trigger is observing.

RethinkDB is not so well prepared. It looks like its change feed provides only one metric - the buffered items count.

internal class RethinkDbTriggerMetrics : ScaleMetrics
{
    public int BufferedItemsCount { get; set; }
}

Also, the metric can only be gathered while iterating the change feed. This calls for an intermediary between the listener and the scale monitor (well, you could pass the scale monitor to the listener directly, but that seemed ugly to me).

internal class RethinkDbMetricsProvider
{
    public int CurrentBufferedItemsCount { get; set; }

    public RethinkDbTriggerMetrics GetMetrics()
    {
        return new RethinkDbTriggerMetrics { BufferedItemsCount = CurrentBufferedItemsCount };
    }
}

Now the same instance of this intermediary can be used by the listener to provide the value.

internal class RethinkDbTriggerListener : IListener, IScaleMonitorProvider, ITargetScalerProvider
{
    ...

    private readonly RethinkDbMetricsProvider _rethinkDbMetricsProvider;

    ...

    public RethinkDbTriggerListener(...)
    {
        ...

        _rethinkDbMetricsProvider = new RethinkDbMetricsProvider();
        _rethinkDbScaleMonitor = new RethinkDbScaleMonitor(..., _rethinkDbMetricsProvider);

        ...
    }

    ...

    private async Task ListenAsync(CancellationToken listenerStoppingToken)
    {
        ...

        while (!listenerStoppingToken.IsCancellationRequested
               && (await changefeed.MoveNextAsync(listenerStoppingToken)))
        {
            _rethinkDbMetricsProvider.CurrentBufferedItemsCount = changefeed.BufferedItems.Count;
            ...
        }

        ...
    }
}

And the scale monitor to read it.

internal class RethinkDbScaleMonitor : IScaleMonitor<RethinkDbTriggerMetrics>
{
    ...

    private readonly RethinkDbMetricsProvider _rethinkDbMetricsProvider;

    ...

    public RethinkDbScaleMonitor(..., RethinkDbMetricsProvider rethinkDbMetricsProvider)
    {
        ...

        _rethinkDbMetricsProvider = rethinkDbMetricsProvider;

        ...
    }

    public Task<RethinkDbTriggerMetrics> GetMetricsAsync()
    {
        return Task.FromResult(_rethinkDbMetricsProvider.GetMetrics());
    }

    Task<ScaleMetrics> IScaleMonitor.GetMetricsAsync()
    {
        return Task.FromResult((ScaleMetrics)_rethinkDbMetricsProvider.GetMetrics());
    }

    ...
}

We have the metrics now, so it is finally time to make some decisions.

Voting to Scale In or Scale Out

An extension can cast one of three votes: to scale out, to scale in, or neutral. This is where the intimate knowledge of the service that the extension is created for comes into play.

As I've already written, the Azure Cosmos DB change feed processor is well prepared when it comes to gathering metrics about it. It is also well prepared to be consumed in a scalable manner. It can distribute the work among multiple compute instances by balancing the number of leases owned by each instance, and it will adjust the number of leases based on throughput and storage. This is why the Azure Cosmos DB bindings track the number of leases as one of the metrics - it's an upper limit for the number of workers. So the extension utilizes all this knowledge and employs the following algorithm to cast a vote (a simplified sketch of such voting logic follows the list):

  1. If there are no metrics gathered yet, cast a neutral vote.
  2. If the number of workers is greater than the number of leases, cast a scale-in vote.
  3. If there are less than five metrics samples, cast a neutral vote.
  4. If the ratio of workers to remaining items is less than 1 per 1000, cast a scale-out vote.
  5. If there are constantly items waiting to be processed and the number of workers is smaller than the number of leases, cast a scale-out vote.
  6. If the trigger source has been empty for a while, cast a scale-in vote.
  7. If there has been a continuous increase across the last five samples in items to be processed, cast a scale-out vote.
  8. If there has been a continuous decrease across the last five samples in items to be processed, cast a scale-in vote.
  9. If none of the above has happened, cast a neutral vote.
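
To make the list more tangible, here is a simplified sketch of how a few of these rules could look in code. This is not the actual Azure Cosmos DB extension implementation - the CosmosDbTriggerMetrics type, the class name, and the assumption that the last element of the metrics array is the most recent sample are mine; the thresholds come from the description above.

using Microsoft.Azure.WebJobs.Host.Scale;

// Hypothetical metrics type for the sketch (the real extension defines its own).
internal class CosmosDbTriggerMetrics : ScaleMetrics
{
    public long RemainingItemsCount { get; set; }
    public int LeaseCount { get; set; }
}

internal class CosmosDbLikeScaleLogic
{
    public ScaleStatus GetScaleStatus(int workerCount, CosmosDbTriggerMetrics[] metrics)
    {
        ScaleStatus status = new ScaleStatus { Vote = ScaleVote.None };

        // 1. No metrics gathered yet - stay neutral.
        if (metrics is null || metrics.Length == 0)
        {
            return status;
        }

        // 2. More workers than leases - the extra workers have nothing to own.
        if (workerCount > metrics[metrics.Length - 1].LeaseCount)
        {
            status.Vote = ScaleVote.ScaleIn;
            return status;
        }

        // 3. Not enough samples to detect a trend - stay neutral.
        if (metrics.Length < 5)
        {
            return status;
        }

        // 4. Less than one worker per 1000 remaining items - scale out.
        if (metrics[metrics.Length - 1].RemainingItemsCount > (long)workerCount * 1000)
        {
            status.Vote = ScaleVote.ScaleOut;
            return status;
        }

        // 7./8. Continuous increase or decrease across the last five samples.
        bool increasing = true, decreasing = true;
        for (int i = metrics.Length - 4; i < metrics.Length; i++)
        {
            increasing &= metrics[i].RemainingItemsCount > metrics[i - 1].RemainingItemsCount;
            decreasing &= metrics[i].RemainingItemsCount < metrics[i - 1].RemainingItemsCount;
        }

        if (increasing)
        {
            status.Vote = ScaleVote.ScaleOut;
        }
        else if (decreasing && workerCount > 1)
        {
            status.Vote = ScaleVote.ScaleIn;
        }

        // 9. Otherwise the vote stays neutral.
        return status;
    }
}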

RethinkDB is once again lacking here. It looks like its change feed is not meant to be processed in parallel at all. This leads to an interesting edge case where we never want to scale beyond a single instance.

internal class RethinkDbScaleMonitor : IScaleMonitor<RethinkDbTriggerMetrics>
{
    ...

    public ScaleStatus GetScaleStatus(ScaleStatusContext<RethinkDbTriggerMetrics> context)
    {
        return GetScaleStatus(
            context.WorkerCount,
            context.Metrics?.ToArray()
        );
    }

    public ScaleStatus GetScaleStatus(ScaleStatusContext context)
    {
        return GetScaleStatus(
            context.WorkerCount, 
            context.Metrics?.Cast<RethinkDbTriggerMetrics>().ToArray()
        );
    }

    private ScaleStatus GetScaleStatus(int workerCount, RethinkDbTriggerMetrics[] metrics)
    {
        ScaleStatus status = new ScaleStatus
        {
            Vote = ScaleVote.None
        };

        // RethinkDB change feed is not meant to be processed in parallel.
        if (workerCount > 1)
        {
            status.Vote = ScaleVote.ScaleIn;

            return status;
        }

        return status;
    }
}

Requesting Desired Number of Workers

Being able to tell if you want more workers or fewer is great; being able to tell how many workers you want is even better. Of course, there is no promise that the extension will get the requested number (even in the case of target-based scaling, scaling happens at a maximum rate of four instances at a time), but it's better than increasing or decreasing by one instance. Extensions can also use this mechanism to participate in dynamic concurrency.

This is exactly what the Azure Cosmos DB extension does. It divides the number of remaining items by the value of the MaxItemsPerInvocation trigger setting (the default is 100). The result is capped by the number of leases, and that's the desired number of workers.
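
Expressed as code, the calculation could look more or less like this. It's a minimal sketch rather than the actual extension code - the class name and parameters are mine, and only the arithmetic described above is shown.

using System;
using Microsoft.Azure.WebJobs.Host.Scale;

internal class CosmosDbLikeTargetWorkerCountCalculator
{
    public TargetScalerResult GetScaleResult(long remainingItems, int maxItemsPerInvocation, int leaseCount)
    {
        // One worker per MaxItemsPerInvocation remaining items...
        int desiredWorkerCount = (int)Math.Ceiling(remainingItems / (double)maxItemsPerInvocation);

        // ...but never more workers than there are leases to own.
        return new TargetScalerResult
        {
            TargetWorkerCount = Math.Min(desiredWorkerCount, leaseCount)
        };
    }
}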

We already know that in the case of RethinkDB, it's even simpler - we always want just one worker.

internal class RethinkDbTargetScaler : ITargetScaler
{
    ...

    public Task<TargetScalerResult> GetScaleResultAsync(TargetScalerContext context)
    {
        return Task.FromResult(new TargetScalerResult
        {
            TargetWorkerCount = 1
        });
    }
}

That's it. The runtime scaling implementation is now complete.

Consequences of Runtime Scaling Design

Creating extensions for Azure Functions is not something commonly done. Still, there is value in understanding their anatomy and how they work, as it is often relevant when you are creating function apps as well.

The unit of scale for Azure Functions is the entire function app. At the same time, the scaling votes and the desired number of workers are delivered at the trigger level. This means that you need to be careful when creating function apps with multiple triggers. If the triggers aim for completely different scaling decisions, it may lead to undesired scenarios. For example, one trigger may constantly want to scale in, while another will want to scale out. As you could notice throughout this post, some triggers have hard limits on how far they can scale so as not to waste resources (even two Azure Cosmos DB triggers may have different upper limits, because the lease containers they are attached to will have different numbers of leases available). All of this should be taken into account when designing the function app and trying to foresee how it will scale.
