
Loading Plugins/Extensions at Run Time from NuGet in .NET Core : Part 1 – NuGet

This post is the first in a short series, writing up my efforts to create a plugin/extension system for .NET Core that:

  • Loads extension packages from NuGet, with all their dependencies (this post).
  • Loads the extensions into my .NET Core Process.
  • Allows the loaded extensions to be unloaded.

Background

As a bit of context, I’m currently building an open-source BDD testing platform that goes beyond Gherkin, called AutoStep, which is built entirely in C# on top of .NET Core 3.1.

In AutoStep, I need to be able to load extensions that provide additional functionality. For example, extensions might provide:

  • Bindings for some UI platform or library
  • Custom Report Formats
  • Integration with some external Test Management System

In terms of what’s in them, AutoStep extensions are going to consist of things like:

  • .NET DLLs
  • AutoStep Test Files
  • Dependencies on various NuGet packages (Selenium.WebDriver anyone?).

All of the above items fit pretty well within the scope of NuGet packages, and I don’t want to build my own extension packaging, hosting, versioning and so on, so I’m going to say that each extension can be represented by a NuGet package.

AutoStep does not require the .NET Core SDK to build or run any tests, so I can’t just create a csproj, chuck PackageReferences in and be done with it.

I need to bake the idea of extensions into the platform itself.


If you want to jump ahead, you can check out the GitHub repository for AutoStep.Extensions, which provides the NuGet package used to load extensions into our VS Code Language Server and our command-line runner.

Loading Extensions from NuGet

Microsoft supplies the NuGet Client SDK for working with both NuGet packages and source repositories; specifically, the NuGet.Protocol and NuGet.Resolver packages.

The documentation on how to actually use the NuGet Client libraries is a bit sparse, so I’m permanently indebted to Martin Bjorkstrom for writing a blog post on it that I used as a pretty detailed guide to get me started.

Loading our extension packages from NuGet involves three phases:

  1. Determine the best version of an extension package to install, given a version range (and normal NuGet rules).
    For example, if the version of the extension requested is 1.4.0, and there is a 1.4.5 version available, we want that one.
  2. Get the list of all NuGet package dependencies (recursively) for each extension.
  3. Download and extract the packages.

Choosing the Extension Version

This is (relatively) the easy bit. First up, we’ll create some of the context objects we need to get started:

/// <summary>
/// Represents the configuration for a single extension to install.
/// </summary>
public class ExtensionConfiguration
{
    public string Package { get; set; }
    public string Version { get; set; }
    public bool PreRelease { get; set; }
}

public async Task LoadExtensions()
{
    // Define a source provider, with the main NuGet feed, plus my own feed.
    var sourceProvider = new PackageSourceProvider(NullSettings.Instance, new[]
    {
        new PackageSource("https://api.nuget.org/v3/index.json"),
        new PackageSource("https://f.feedz.io/autostep/ci/nuget/index.json")
    });

    // Establish the source repository provider; the available providers come from our custom settings.
    var sourceRepositoryProvider = new SourceRepositoryProvider(sourceProvider, Repository.Provider.GetCoreV3());

    // Get the list of repositories.
    var repositories = sourceRepositoryProvider.GetRepositories();

    // Disposable source cache.
    using var sourceCacheContext = new SourceCacheContext();

    // You should use an actual logger here, this is a NuGet ILogger instance.
    var logger = new NullLogger();

    // My extension configuration:
    var extensions = new[]
    { 
        new ExtensionConfiguration
        {
            Package = "AutoStep.Web",
            PreRelease = true // Allow pre-release versions.
        }
    };
}

Next, let’s write a method to actually get the desired package identity to install. The GetPackageIdentity method goes through each repository, and either:

  • Picks the latest available version, if no version range has been configured; or
  • If a version range has been specified, uses NuGet’s VersionRange class to find the best match across the set of all available versions.

private async Task<PackageIdentity> GetPackageIdentity(
          ExtensionConfiguration extConfig, SourceCacheContext cache, ILogger nugetLogger,
          IEnumerable<SourceRepository> repositories, CancellationToken cancelToken)
{
    // Go through each repository.
    // If a repository contains only pre-release packages (e.g. AutoStep CI), and 
    // the configuration doesn't permit pre-release versions,
    // the search will look at other ones (e.g. NuGet).
    foreach (var sourceRepository in repositories)
    {
        // Get a 'resource' from the repository.
        var findPackageResource = await sourceRepository.GetResourceAsync<FindPackageByIdResource>();

        // Get the list of all available versions of the package in the repository.
        var allVersions = await findPackageResource.GetAllVersionsAsync(extConfig.Package, cache, nugetLogger, cancelToken);

        NuGetVersion selected;

        // Have we specified a version range?
        if (extConfig.Version != null)
        {
            if (!VersionRange.TryParse(extConfig.Version, out var range))
            {
                throw new InvalidOperationException("Invalid version range provided.");
            }

            // Find the best package version match for the range.
            // Consider pre-release versions, but only if the extension is configured to use them.
            var bestVersion = range.FindBestMatch(allVersions.Where(v => extConfig.PreRelease || !v.IsPrerelease));

            selected = bestVersion;
        }
        else
        {
            // No version range specified; choose the latest available version.
            // Consider pre-release versions, but only if the extension is configured to use them.
            selected = allVersions.LastOrDefault(v => extConfig.PreRelease || !v.IsPrerelease);
        }

        if (selected is object)
        {
            return new PackageIdentity(extConfig.Package, selected);
        }
    }

    return null;
}

Let’s plug that method into our previous code, so we’re now getting the identity:

// ...

// My extension configuration:
var extensions = new[] { new ExtensionConfiguration
{
    Package = "AutoStep.Web",
    PreRelease = true // Allow pre-release versions.
}};

foreach (var ext in extensions)
{
    var packageIdentity = await GetPackageIdentity(ext, sourceCacheContext, logger, repositories, CancellationToken.None);

    if (packageIdentity is null)
    {
        throw new InvalidOperationException($"Cannot find package {ext.Package}.");
    }
}

With this we get a package identity of AutoStep.Web.1.0.0-develop.20 (the latest pre-release version at the time).

Get the List of Package Dependencies

This is where things get interesting. We need to get the complete set of dependencies, across all the extensions, that we need to install in order to use the extension packages.

First off, let’s look at an initial, very naive solution, which just does a straightforward recursion through the entire dependency graph.

private async Task GetPackageDependencies(PackageIdentity package, SourceCacheContext cacheContext, 
                                          NuGetFramework framework, ILogger logger, 
                                          IEnumerable<SourceRepository> repositories,
                                          ISet<SourcePackageDependencyInfo> availablePackages, 
                                          CancellationToken cancelToken)
{
    // Don't recurse over a package we've already seen.
    if (availablePackages.Contains(package))
    {
        return;
    }

    foreach (var sourceRepository in repositories)
    {
        // Get the dependency info for the package.
        var dependencyInfoResource = await sourceRepository.GetResourceAsync<DependencyInfoResource>();
        var dependencyInfo = await dependencyInfoResource.ResolvePackage(
            package,
            framework,
            cacheContext,
            logger,
            cancelToken);

        // No info for the package in this repository.
        if (dependencyInfo == null)
        {
            continue;
        }

        // Add to the list of all packages.
        availablePackages.Add(dependencyInfo);

        // Recurse through each package.
        foreach (var dependency in dependencyInfo.Dependencies)
        {
            await GetPackageDependencies(
                new PackageIdentity(dependency.Id, dependency.VersionRange.MinVersion),
                cacheContext,
                framework,
                logger,
                repositories,
                availablePackages,
                cancelToken);
        }

        break;
    }
}

That does indeed create the complete graph of all libraries required by that extension; the problem is that it has 104 packages in it!

Long package list

I’ve got the AutoStep.Web package at the top there, but I’ve also got System.Runtime, which I definitely don’t want.

All the extensions are going to reference the AutoStep.Extensions.Abstractions package (because that’s where we define our interfaces for extensions), but we don’t want to download it ourselves!

Besides the fact that we don’t need to download these shared packages, if we load in the AutoStep.Extensions.Abstractions assembly from the extension’s dependencies, it will not be compatible with the version referenced by the host process.

The actual requirement for our behaviour here is:

All packages provided by the host process should be excluded from the set of dependencies to install.

Filtering the Dependencies

At runtime, how do we know what the set of installed packages is for a .NET Core application? Luckily, there happens to be an existing file containing exactly this information: the {AssemblyName}.deps.json file that gets copied to your output directory.

You probably haven’t had to worry about it much, but if you look in your application’s output directory, you’ll find it.

It contains the complete package reference graph for your application, and looks a little something like this:

{
  "runtimeTarget": {
    "name": ".NETCoreApp,Version=v3.1",
    "signature": ""
  },
  "compilationOptions": {},
  "targets": {
    ".NETCoreApp,Version=v3.1": {
      "NugetConsole/1.0.0": {
        "dependencies": {
          "Microsoft.Extensions.DependencyModel": "3.1.3",
          "NuGet.Protocol": "5.5.1",
          "NuGet.Resolver": "5.5.1"
        },
        "runtime": {
          "NugetConsole.dll": {}
        }
      },
      "Microsoft.CSharp/4.0.1": {
        "dependencies": {
          "System.Collections": "4.3.0",
          "System.Diagnostics.Debug": "4.3.0",
          "System.Dynamic.Runtime": "4.3.0",
          "System.Globalization": "4.3.0",
          "System.Linq": "4.3.0",
          "System.Linq.Expressions": "4.3.0",
          "System.ObjectModel": "4.3.0",

// ...a lot more content...

Handily, we don’t have to parse this ourselves. If you add the Microsoft.Extensions.DependencyModel package to your project, you can directly access this content using DependencyContext.Default, which gives you a DependencyContext you can interrogate.
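
For example, here’s a minimal sketch of what interrogating that context looks like; it just lists the host’s runtime libraries (name and version), which is exactly the information our filtering code is about to use:

// Requires the Microsoft.Extensions.DependencyModel package.
var hostDependencies = DependencyContext.Default;

foreach (var library in hostDependencies.RuntimeLibraries)
{
    // Each runtime library exposes a name (usually the package ID) and a version.
    Console.WriteLine($"{library.Name} : {library.Version}");
}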

Let’s define a method that takes this DependencyContext and a PackageDependency, and checks whether it is provided by the host:

private bool DependencySuppliedByHost(DependencyContext hostDependencies, PackageDependency dep)
{
    // See if a runtime library with the same ID as the package is available in the host's runtime libraries.
    var runtimeLib = hostDependencies.RuntimeLibraries.FirstOrDefault(r => r.Name == dep.Id);

    if (runtimeLib is object)
    {
        // What version of the library is the host using?
        var parsedLibVersion = NuGetVersion.Parse(runtimeLib.Version);

        if (parsedLibVersion.IsPrerelease)
        {
            // Always use pre-release versions from the host, otherwise it becomes
            // a nightmare to develop across multiple active versions.
            return true;
        }
        else
        {
            // Does the host version satisfy the version range of the requested package?
            // If so, we can provide it; otherwise, we cannot.
            return dep.VersionRange.Satisfies(parsedLibVersion);
        }
    }

    return false;
}

Then, let’s plug that into our existing GetPackageDependencies method:

private async Task GetPackageDependencies(PackageIdentity package, SourceCacheContext cacheContext, NuGetFramework framework, 
                                          ILogger logger, IEnumerable<SourceRepository> repositories, DependencyContext hostDependencies,
                                          ISet<SourcePackageDependencyInfo> availablePackages, CancellationToken cancelToken)
{
    // Don't recurse over a package we've already seen.
    if (availablePackages.Contains(package))
    {
        return;
    }

    foreach (var sourceRepository in repositories)
    {
        // Get the dependency info for the package.
        var dependencyInfoResource = await sourceRepository.GetResourceAsync<DependencyInfoResource>();
        var dependencyInfo = await dependencyInfoResource.ResolvePackage(
            package,
            framework,
            cacheContext,
            logger,
            cancelToken);

        // No info for the package in this repository.
        if (dependencyInfo == null)
        {
            continue;
        }


        // Filter the dependency info.
        // Don't bring in any dependencies that are provided by the host.
        var actualSourceDep = new SourcePackageDependencyInfo(
            dependencyInfo.Id,
            dependencyInfo.Version,
            dependencyInfo.Dependencies.Where(dep => !DependencySuppliedByHost(hostDependencies, dep)),
            dependencyInfo.Listed,
            dependencyInfo.Source);

        availablePackages.Add(actualSourceDep);

        // Recurse through each package.
        foreach (var dependency in actualSourceDep.Dependencies)
        {
            await GetPackageDependencies(
                new PackageIdentity(dependency.Id, dependency.VersionRange.MinVersion),
                cacheContext,
                framework,
                logger,
                repositories,
                hostDependencies,
                availablePackages,
                cancelToken);
        }

        break;
    }
}

This cuts down on the set of packages significantly, but it’s still pulling down some runtime-provided packages I don’t want:

AutoStep.Web : 1.0.0-develop.20       // correct
Selenium.Chrome.WebDriver : 79.0.0    // correct
Selenium.WebDriver : 3.141.0          // correct
Newtonsoft.Json : 10.0.3              // correct
Microsoft.CSharp : 4.3.0              // Ah. This is a runtime package...
System.ComponentModel.TypeConverter : 4.3.0 
System.Collections.NonGeneric : 4.3.0       
System.Collections.Specialized : 4.3.0
System.ComponentModel : 4.3.0
System.ComponentModel.Primitives : 4.3.0
System.Runtime.Serialization.Primitives : 4.3.0
System.Runtime.Serialization.Formatters : 4.3.0
System.Xml.XmlDocument : 4.3.0

So, something is still not right. What’s causing these packages to be present?

Well, simply put, my program doesn’t use System.ComponentModel directly, so it isn’t in my list of dependencies; but it is still provided by the host, because it ships as part of the .NET Runtime.

Ignoring Runtime-Provided Packages

We want to filter out runtime-provided packages completely, but how do we know which ones to exclude? We can’t just filter out any System.* packages, because there are a number of System.* packages that aren’t shipped with the runtime (e.g. System.Text.Json).

As far as I can tell, it’s more or less impossible to determine the full set at run time dynamically.

After some considerable searching, however, I found a complete listing of all runtime-provided packages in an MSBuild targets file in the dotnet SDK (the default package conflict overrides), which tells the build system which packages don’t need to be restored! Yay!

This allowed me to define the following static lookup class (excerpt only). You can find a full version here.

/// <summary>
/// Contains a pre-determined list of NuGet packages that are provided by the run-time, and
/// therefore should not be restored from an extension's dependency list.
/// </summary>
internal static class RuntimeProvidedPackages
{
    /// <summary>
    /// Checks whether the set of known runtime packages contains the given package ID.
    /// </summary>
    /// <param name="packageId">The package ID.</param>
    /// <returns>True if the package is provided by the framework, otherwise false.</returns>
    public static bool IsPackageProvidedByRuntime(string packageId)
    {
        return ProvidedPackages.Contains(packageId);
    }

    /// <summary>
    /// This list comes from the package overrides for the .NET SDK,
    /// at https://github.com/dotnet/sdk/blob/v3.1.201/src/Tasks/Common/targets/Microsoft.NET.DefaultPackageConflictOverrides.targets.
    /// If the executing binaries ever change to a newer version, this project must update as well, and refresh this list.
    /// </summary>
    private static readonly ISet<string> ProvidedPackages = new HashSet<string>
    {
        "Microsoft.CSharp",
        "Microsoft.Win32.Primitives",
        "Microsoft.Win32.Registry",
        "runtime.debian.8-x64.runtime.native.System.Security.Cryptography.OpenSsl",
        "runtime.fedora.23-x64.runtime.native.System.Security.Cryptography.OpenSsl",
        "runtime.fedora.24-x64.runtime.native.System.Security.Cryptography.OpenSsl",
        "runtime.opensuse.13.2-x64.runtime.native.System.Security.Cryptography.OpenSsl",
        "runtime.opensuse.42.1-x64.runtime.native.System.Security.Cryptography.OpenSsl",
        "runtime.osx.10.10-x64.runtime.native.System.Security.Cryptography.Apple",
        "runtime.osx.10.10-x64.runtime.native.System.Security.Cryptography.OpenSsl",
        "runtime.rhel.7-x64.runtime.native.System.Security.Cryptography.OpenSsl",
        "runtime.ubuntu.14.04-x64.runtime.native.System.Security.Cryptography.OpenSsl",
        "runtime.ubuntu.16.04-x64.runtime.native.System.Security.Cryptography.OpenSsl",
        "runtime.ubuntu.16.10-x64.runtime.native.System.Security.Cryptography.OpenSsl",
        "System.AppContext",
        "System.Buffers",
        "System.Collections",
        "System.Collections.Concurrent",
        // Removed a load for brevity....
        "System.Xml.ReaderWriter",
        "System.Xml.XDocument",
        "System.Xml.XmlDocument",
        "System.Xml.XmlSerializer",
        "System.Xml.XPath",
        "System.Xml.XPath.XDocument",
    };
}

Ok, so let’s update our DependencySuppliedByHost method to use this look-up:

private bool DependencySuppliedByHost(DependencyContext hostDependencies, PackageDependency dep)
{
    // Check our look-up list.
    if(RuntimeProvidedPackages.IsPackageProvidedByRuntime(dep.Id))
    {
        return true;
    }

    // See if a runtime library with the same ID as the package is available in the host's runtime libraries.
    var runtimeLib = hostDependencies.RuntimeLibraries.FirstOrDefault(r => r.Name == dep.Id);

    if (runtimeLib is object)
    {
        // What version of the library is the host using?
        var parsedLibVersion = NuGetVersion.Parse(runtimeLib.Version);

        if (parsedLibVersion.IsPrerelease)
        {
            // Always use pre-release versions from the host, otherwise it becomes
            // a nightmare to develop across multiple active versions.
            return true;
        }
        else
        {
            // Does the host version satisfy the version range of the requested package?
            // If so, we can provide it; otherwise, we cannot.
            return dep.VersionRange.Satisfies(parsedLibVersion);
        }
    }

    return false;
}

Now, when we run our code, we get precisely the set of packages we want!

AutoStep.Web : 1.0.0-develop.20
Selenium.Chrome.WebDriver : 79.0.0
Selenium.WebDriver : 3.141.0
Newtonsoft.Json : 10.0.3

Downloading and Extracting

At the moment, our list of dependencies ‘might’ contain duplicates. For example, two different extensions might reference two different versions of Newtonsoft.Json. We need to pick one to install that will be compatible with both.

To do this, we need to use the supplied PackageResolver class to constrain the set of packages to only the ones we actually want to download and install, in a new GetPackagesToInstall method:

private IEnumerable<SourcePackageDependencyInfo> GetPackagesToInstall(SourceRepositoryProvider sourceRepositoryProvider, 
                                                                      ILogger logger, IEnumerable<ExtensionConfiguration> extensions, 
                                                                      HashSet<SourcePackageDependencyInfo> allPackages)
{
    // Create a package resolver context.
    var resolverContext = new PackageResolverContext(
            DependencyBehavior.Lowest,
            extensions.Select(x => x.Package),
            Enumerable.Empty<string>(),
            Enumerable.Empty<PackageReference>(),
            Enumerable.Empty<PackageIdentity>(),
            allPackages,
            sourceRepositoryProvider.GetRepositories().Select(s => s.PackageSource),
            logger);

    var resolver = new PackageResolver();

    // Work out the actual set of packages to install.
    var packagesToInstall = resolver.Resolve(resolverContext, CancellationToken.None)
                                    .Select(p => allPackages.Single(x => PackageIdentityComparer.Default.Equals(x, p)));
    return packagesToInstall;
}

Once we have that list, we can pass it to another new method that actually downloads and extracts the packages for us, InstallPackages.

private async Task InstallPackages(SourceCacheContext sourceCacheContext, ILogger logger, 
                                    IEnumerable<SourcePackageDependencyInfo> packagesToInstall, string rootPackagesDirectory, 
                                    ISettings nugetSettings, CancellationToken cancellationToken)
{
    var packagePathResolver = new PackagePathResolver(rootPackagesDirectory, true);
    var packageExtractionContext = new PackageExtractionContext(
        PackageSaveMode.Defaultv3,
        XmlDocFileSaveMode.Skip,
        ClientPolicyContext.GetClientPolicy(nugetSettings, logger),
        logger);

    foreach (var package in packagesToInstall)
    {
        var downloadResource = await package.Source.GetResourceAsync<DownloadResource>(cancellationToken);

        // Download the package (might come from the shared package cache).
        var downloadResult = await downloadResource.GetDownloadResourceResultAsync(
            package,
            new PackageDownloadContext(sourceCacheContext),
            SettingsUtility.GetGlobalPackagesFolder(nugetSettings),
            logger,
            cancellationToken);

        // Extract the package into the target directory.
        await PackageExtractor.ExtractPackageAsync(
            downloadResult.PackageSource,
            downloadResult.PackageStream,
            packagePathResolver,
            packageExtractionContext,
            cancellationToken);
    }
}

Let’s go ahead and plug those extra methods into our main calling method:

public async Task LoadExtensions()
{
    // Define a source provider, with nuget, plus my own feed.
    var sourceProvider = new PackageSourceProvider(NullSettings.Instance, new[]
    {
        new PackageSource("https://api.nuget.org/v3/index.json"),
        new PackageSource("https://f.feedz.io/autostep/ci/nuget/index.json")
    });

    // Establish the source repository provider; the available providers come from our custom settings.
    var sourceRepositoryProvider = new SourceRepositoryProvider(sourceProvider, Repository.Provider.GetCoreV3());

    // Get the list of repositories.
    var repositories = sourceRepositoryProvider.GetRepositories();

    // Disposable source cache.
    using var sourceCacheContext = new SourceCacheContext();

    // You should use an actual logger here, this is a NuGet ILogger instance.
    var logger = new NullLogger();

    // My extension configuration:
    var extensions = new[]
    {
        new ExtensionConfiguration
        {
            Package = "AutoStep.Web",
            PreRelease = true // Allow pre-release versions.
        }
    };

    // Replace this with a proper cancellation token.
    var cancellationToken = CancellationToken.None;

    // The framework we're using.
    var targetFramework = NuGetFramework.ParseFolder("netcoreapp3.1");
    var allPackages = new HashSet<SourcePackageDependencyInfo>();

    var dependencyContext = DependencyContext.Default;

    foreach (var ext in extensions)
    {
        var packageIdentity = await GetPackageIdentity(ext, sourceCacheContext, logger, repositories, cancellationToken);

        if (packageIdentity is null)
        {
            throw new InvalidOperationException($"Cannot find package {ext.Package}.");
        }

        await GetPackageDependencies(packageIdentity, sourceCacheContext, targetFramework, logger, repositories, dependencyContext, allPackages, cancellationToken);
    }

    var packagesToInstall = GetPackagesToInstall(sourceRepositoryProvider, logger, extensions, allPackages);

    // Where do we want to install our packages?
    // For now we'll pop them in the .extensions folder.
    var packageDirectory = Path.Combine(Environment.CurrentDirectory, ".extensions");
    var nugetSettings = Settings.LoadDefaultSettings(packageDirectory);

    await InstallPackages(sourceCacheContext, logger, packagesToInstall, packageDirectory, nugetSettings, cancellationToken);
}

With all these changes, here’s what the .extensions folder looks like when we run this:

> ls .extensions
AutoStep.Web.1.0.0-develop.20
Newtonsoft.Json.10.0.3
Selenium.Chrome.WebDriver.79.0.0
Selenium.WebDriver.3.141.0

All the packages we need are now on disk!


Wrapping Up

At the end of this post, we have a mechanism for downloading extension packages, plus a filtered set of their dependencies, from NuGet.

In the next post, we will load those packages into a custom AssemblyLoadContext and use them in our application.

You can find the complete set of code from this post in this gist.

The ‘production’ code this fed into is in the GitHub repository for AutoStep.Extensions, which provides the NuGet package used to load extensions into our VS Code Language Server and our command-line runner.


Easily loading lots of data in parallel over HTTP, using Dataflow in .NET Core

I recently had a requirement to load a large amount of data into an application, as fast as possible.

The data in question was about 100,000 transactions, stored line-by-line in a file, which needed to be sent over HTTP to a web application that would process them and load them into a database.

This is actually pretty easy in .NET, and super efficient using async/await:

async static Task Main(string[] args)
{
    var httpClient = new HttpClient();
    httpClient.BaseAddress = new Uri("https://myserver");

    using (var fileSource = new StreamReader(File.OpenRead(@"C:\Data\Sources\myfile.csv")))
    {
        await StreamData(fileSource, httpClient, "/api/send");
    }
}

private static async Task StreamData(StreamReader fileSource, HttpClient httpClient, string path)
{
    string line;

    // Read from the file until it's empty
    while ((line = await fileSource.ReadLineAsync()) != null)
    {
        // Convert a line of data into JSON compatible with the API
        var jsonMsg = GetDataJson(line);

        // Send it to the server
        await httpClient.PostAsync(path, new StringContent(jsonMsg, Encoding.UTF8, "application/json"));
    }
}

Run that through, and I get a time of 133 seconds; this isn’t too bad right? Around 750 records per second.

But I feel like I can definitely make this better. For one thing, my environment doesn’t exactly look like the diagram above. It’s a scaled production environment, so it looks more like this:

I’ve got lots of resources that I’m not using right now, because I’m only sending one request at a time, so what I want to do is start loading the data in parallel.

Let’s look at a convenient way of doing this, using the System.Threading.Tasks.Dataflow package, which is available for .NET Framework 4.5+ and .NET Core.

The Dataflow components provide various ways of doing asynchronous processing, but here I’m going to use the ActionBlock, which allows me to post messages that are subsequently processed by a Task, in a callback. More importantly, it lets me process messages in parallel.

Let’s look at the code for my new StreamDataInParallel method:

private static async Task StreamDataInParallel(StreamReader fileSource, HttpClient httpClient, string path, int maxParallel)
{
    var block = new ActionBlock<string>(
        async json =>
        {
            await httpClient.PostAsync(path, new StringContent(json, Encoding.UTF8, "application/json"));
        }, new ExecutionDataflowBlockOptions
        {
            // Tells the action block how many we want to run at once.
            MaxDegreeOfParallelism = maxParallel,
            // 'Buffer' the same number of lines as there are parallel requests.
            BoundedCapacity = maxParallel
        });

    string line;
    while ((line = await fileSource.ReadLineAsync()) != null)
    {
        // This will not continue until there is space in the buffer.
        await block.SendAsync(GetDataJson(line));
    }

    // Tell the block that no more data is coming, and wait for any
    // buffered or in-flight requests to finish before we return.
    block.Complete();
    await block.Completion;
}

The great thing about Dataflow is that in only about 20 lines of code, I’ve got parallel processing of data, pushing HTTP requests to a server at a rate of my choice (controlled by the maxParallel parameter).

Also, the combination of the SendAsync method and a BoundedCapacity means I’m only reading from the file when there are slots available in the buffer, so my memory consumption stays low.
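
Wiring this up is just a case of swapping the call over in Main and picking a degree of parallelism; a quick sketch (the value of 8 here is just an example):

async static Task Main(string[] args)
{
    var httpClient = new HttpClient();
    httpClient.BaseAddress = new Uri("https://myserver");

    using (var fileSource = new StreamReader(File.OpenRead(@"C:\Data\Sources\myfile.csv")))
    {
        // Same file and endpoint as before, but now with up to 8 requests in flight at once.
        await StreamDataInParallel(fileSource, httpClient, "/api/send", maxParallel: 8);
    }
}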

I’ve run this a few times, increasing the number of parallel requests each time, and the results are below:

Sadly, I wasn’t able to run the benchmarking tests on the production environment (for what I hope are obvious reasons), so I’m running all this locally; the number of parallel requests I can scale to is way higher in production, but it’s all just a factor of total available cores and database server performance.

Value of maxParallel    Average Records/Second
1                       750
2                       1293
3                       1785
4                       2150
5                       2500
6                       2777
7                       2941
8                       3125

With 8 parallel requests, we get over 3000 records/second, with a time of 32 seconds to load our 100,000 records.

You’ll notice that the speed does start to plateau (or at least I get diminishing returns); this will happen when we start to hit database contention (typically the main throttling factor, depending on your workload).

I’d suggest that you choose a sensible limit for how many requests you have going so you don’t accidentally denial-of-service your environment; we’ve got to assume that there’s other stuff going on at the same time.

Anyway, in conclusion: Dataflow has loads of applications, and this is just one of them that I took advantage of for my problem. So that’s it, go forth and load data faster!


Displaying Real-time Sensor Data in the Browser with SignalR and ChartJS

In my previous posts on Modding My Rowing Machine, I wired up an Arduino to my rowing machine, and streamed the speed sensor data to an ASP.NET core application.

In this post, I’m going to show you how to take sensor and counter data, push it to a browser as it arrives, and display it in a real-time chart.

If you want to skip ahead, I’ve uploaded all the code for the Arduino and ASP.NET components to a github repo at https://github.com/alistairjevans/rower-mod.

I’m using Visual Studio 2019 with the ASP.NET Core 3.0 Preview for all the server-side components, but the latest stable release of ASP.NET Core will work just fine, I’m not using any of the new features.

Pushing Data to the Browser

So, you will probably have heard of SignalR, the ASP.NET technology that can be used to push data to the browser from the server, and generally establish a closer relationship between the two.

I’m going to use it to send data to the browser whenever new sensor data arrives, and also to let the browser request that the count be reset.

The overall component layout looks like this:

Setting up SignalR

This bit is pretty easy; first up, head over to the Startup.cs file in your ASP.NET app project, and in the ConfigureServices method, add SignalR:

public void ConfigureServices(IServiceCollection services)
{
    // Define a writer that saves my data to disk
    var folderPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData),
                                  "rower");

    services.AddSingleton<ISampleWriter>(svc => new SampleWriter(folderPath, "samples"));

    // Keep my machine state as a singleton
    services.AddSingleton<IMachineState, MachineState>();

    services.AddControllersWithViews()
            .AddNewtonsoftJson();

    services.AddRazorPages();

    // Add signalr services
    services.AddSignalR();
}

Next, create a SignalR Hub. This is effectively the endpoint your clients will connect to, and will contain any methods a client needs to invoke on the server.

public class FeedHub : Hub
{
    private readonly IMachineState machineState;
    private readonly ISampleWriter sampleWriter;

    public FeedHub(IMachineState machineState, ISampleWriter sampleWriter)
    {
        this.machineState = machineState;
        this.sampleWriter = sampleWriter;
    }

    public void ResetCount()
    {
        // Reset the state, and start a new data file
        machineState.ZeroCount();
        sampleWriter.StartNewFile();
    }
}

SignalR Hubs are just classes that derive from the Hub class. I’ve got just the one method in mine at the moment, for resetting my counter.

Before that Hub will work, you need to register it in your Startup class’ Configure method:

public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
{
    //
    // Omitted standard content for brevity...
    //

    app.UseSignalR(cfg => cfg.MapHub<FeedHub>("/feed"));
}

You’re also going to want to add the necessary SignalR JavaScript to your project. I did it using the “Manage Client-Side Libraries” feature in Visual Studio; you can find my entire libman.json file (which defines which libraries I’m using) on my GitHub repo.

Sending Data to the Client

In the MVC Controller where the data arrives from the Arduino, I’m going to push the sensor data to all clients connected to the hub.

The way you access the clients of a hub from outside the hub (i.e. an MVC Controller) is by resolving an IHubContext<THubType>, and then accessing the Clients property.

public class DataController : Controller
{
    private readonly IMachineState machineState;
    private readonly ISampleWriter sampleWriter;
    private readonly IHubContext<FeedHub> feedHub;

    public DataController(IMachineState machineState, ISampleWriter sampleWriter, IHubContext<FeedHub> feedHub)
    {
        this.machineState = machineState;
        this.sampleWriter = sampleWriter;
        this.feedHub = feedHub;
    }

    [HttpPost]
    public async Task<ActionResult> ProvideReading(uint milliseconds, double speed, int count)
    {
        // Update our machine state.
        machineState.UpdateMachineState(milliseconds, speed, count);

        // Write the sample to file (our sample writer) and update all clients.
        // Wait for them both to finish.
        await Task.WhenAll(
            sampleWriter.ProvideSample(machineState.LastSample, machineState.Speed, machineState.Count),
            feedHub.Clients.All.SendAsync("newData",
                machineState.LastSample.ToString("o"),
                machineState.Speed,
                machineState.Count));

        return StatusCode(200);
    }
}

Pro tip:
Got multiple IO operations to do in a single request, that don’t depend on each other? Don’t just await one, then await the other; use Task.WhenAll, and the operations will run in parallel.

In my example above I’m writing to a file and to SignalR clients at the same time, and only continuing when both are done.
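
To make the difference concrete, here’s a minimal sketch, with hypothetical SaveToFileAsync and NotifyClientsAsync methods (and a sample variable) standing in for your own IO calls:

// (SaveToFileAsync / NotifyClientsAsync are hypothetical stand-ins for the two IO operations.)

// Sequential: the second operation doesn't even start until the first has finished.
await SaveToFileAsync(sample);
await NotifyClientsAsync(sample);

// Parallel: both operations start immediately, and we wait for both to finish.
await Task.WhenAll(
    SaveToFileAsync(sample),
    NotifyClientsAsync(sample));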

Browser

Ok, so we’ve got the set-up to push data to the browser, but no HTML just yet. I don’t actually need any MVC Controller functionality, so I’m just going to create a Razor Page, which still gives me a Razor template, but without having to write the controller behind it.

If I put an ‘Index.cshtml’ file under a new ‘Pages’ folder in my project, and put the following content in it, that becomes the landing page of my app:

@page
<html>
<head>
</head>
<body>
    <div class="container">
        <div class="lblSpeed text lbl">Speed:</div>
        <div class="valSpeed text" id="currentSpeed"><!-- speed goes here --></div>
        <div class="lblCount text lbl">Count:</div>
        <div class="valCount text" id="currentCount"><!-- stroke count goes here --></div>
        <div class="btnContainer">
            <button id="reset">Reset Count</button>
        </div>
        <div class="chartContainer">
            <!-- I'm going to render my chart in this canvas -->
            <canvas id="chartCanvas"></canvas>
        </div>
    </div>
    <script src="~/lib/signalr/dist/browser/signalr.js"></script>
    <script src="~/js/site.js"></script>
</body>
</html>

In my site.js file, I’m just going to open a connection to the SignalR hub and attach a callback for data being given to me:

"use strict";
// Define my connection (note the /feed address to specify the hub)
var connection = new signalR.HubConnectionBuilder().withUrl("/feed").build();
// Get the elements I need
var speedValue = document.getElementById("currentSpeed");
var countValue = document.getElementById("currentCount");
var resetButton = document.getElementById("reset");
window.onload = function () {
// Start the SignalR connection
connection.start().then(function () {
console.log("Connected");
}).catch(function (err) {
return console.error(err.toString());
});
resetButton.addEventListener("click", function () {
// When someone clicks the reset button, this
// will call the ResetCount method in my FeedHub.
connection.invoke("ResetCount");
});
};
// This callback is going to fire every time I get new data.
connection.on("newData", function (time, speed, count) {
speedValue.innerText = speed;
countValue.innerText = count;
});

That’s actually all we need to get data flowing down to the browser, and displaying the current speed and counter values!

I want something a little more visual though….

Displaying the Chart

I’m going to use the ChartJS library to render a chart, plus a handy plugin for ChartJS that helps with streaming live data and rendering it, the chartjs-plugin-streaming plugin.

First off, add the two libraries to your project (and your HTML file), plus MomentJS, which ChartJS requires to function.

Next, let’s set up our chart, by defining its configuration and attaching it to the 2d context of the canvas object:

window.onload = function () {
    var ctx = document.getElementById('chartCanvas').getContext('2d');

    window.myChart = new Chart(ctx, {
        type: 'line',
        data: {
            datasets: [{
                label: 'Speed',
                data: []
            }]
        },
        options: {
            scales: {
                xAxes: [{
                    type: 'realtime',
                    delay: 0,
                    // 20 seconds of data
                    duration: 20000
                }],
                yAxes: [{
                    ticks: {
                        suggestedMin: 0,
                        suggestedMax: 50
                    }
                }]
            }
        }
    });

    // The other signalr setup is still here...
}

Finally, let’s make our chart display new sensor data as it arrives:

connection.on("newData", function (time, speed, count) {
    // This subtract causes the data to be placed
    // in the centre of the chart as it arrives,
    // which I personally think looks better...
    var dateValue = moment(time).subtract(5, 'seconds');

    speedValue.innerText = speed;
    countValue.innerText = count;

    // append the new data to the existing chart data
    myChart.data.datasets[0].data.push({
        x: dateValue,
        y: speed
    });

    // update chart datasets keeping the current animation
    myChart.update({
        preservation: true
    });
});

With all that together, let’s see what we get!

Awesome, a real-time graph of my rowing!

As an aside, I used the excellent tool by @sarah_edo to generate a CSS grid really quickly, so thanks for that! You can find it at https://cssgrid-generator.netlify.com/

You can check out the entire solution, including all the code for the Arduino and the ASP.NET app, on the github repo at https://github.com/alistairjevans/rower-mod.

Next up for the rowing machine project, I want to put some form of gamification, achievements or progress tracking into the app, but not sure exactly how it will look yet.


Streaming real-time sensor data to an ASP.NET Core app from an Arduino

In my previous posts on Modding my Rowing Machine, I got started with an Arduino, and started collecting speed sensor data. The goal of this post is to connect to the WiFi network and upload sensor data to a server application I’ve got running on my laptop in as close to real-time as I can make it.

Connecting to WiFi

My Arduino Uno WiFi Rev 2 board has got a built-in WiFi module; it was considerably easier than I expected to get everything connected.

I first needed to install the necessary library to support the board, the WiFiNINA library.

Then you can just include the necessary header file and connect to the network:

#include <WiFiNINA.h>

#define NETSSID "MYNETWORK"
#define NETPASS "SECRETPASSWORD"

WiFiClient client;

void setup()
{
  Serial.begin(9600);

  // Start connecting...
  WiFi.begin(NETSSID, NETPASS);

  // Give it a moment...
  delay(5000);

  if (WiFi.status() == WL_CONNECTED)
  {
    Serial.println("Connected!");
  }
}

To be honest, that code probably isn’t going to cut it, because WiFi networks don’t work that nicely. You need a retry mechanism with timeouts to keep trying to connect. Let’s take a look at the full example:

#include <WiFiNINA.h>

#define NETSSID "MYNETWORK"
#define NETPASS "SECRETPASSWORD"

#define TOTAL_WAIT_TIME 60000 // 1 minute
#define ATTEMPT_TIME 5000     // 5 seconds between attempts

WiFiClient client;

void setup()
{
  Serial.begin(9600);

  unsigned long startTime = millis();
  unsigned long lastAttemptTime = 0;

  // Start from the idle state, so we always make at least one attempt.
  int wifiStatus = WL_IDLE_STATUS;

  // attempt to connect to Wifi network in a loop,
  // until we connect.
  while (wifiStatus != WL_CONNECTED)
  {
    unsigned long currentTime = millis();

    if (currentTime - startTime > TOTAL_WAIT_TIME)
    {
      // Exceeded the total timeout for trying to connect, so stop.
      Serial.println("Failed to connect");
      while (true);
    }
    else if (currentTime - lastAttemptTime > ATTEMPT_TIME)
    {
      // Exceeded our attempt delay, initiate again.
      Serial.println("Attempting Wifi Connection");
      lastAttemptTime = currentTime;
      wifiStatus = WiFi.begin(NETSSID, NETPASS);
    }
    else
    {
      // wait 500ms before we check the WiFi status.
      delay(500);
    }
  }

  Serial.println("Connected!");
}

The Server

To receive the data from the Arduino, I created a light-weight ASP.NET Core 3.0 web application with a single controller endpoint to handle incoming data, taking a timestamp and the speed:

public class DataController : Controller
{
    private readonly SampleWriter sampleWriter;

    public DataController(SampleWriter sampleWriter)
    {
        this.sampleWriter = sampleWriter;
    }

    [HttpPost]
    public async Task<ActionResult> ProvideReading(uint milliseconds, double speed)
    {
        // sampleWriter is just a singleton dependency with an open file stream,
        // writing each record to a CSV file as it arrives.
        await sampleWriter.ProvideSample(milliseconds, speed);

        return StatusCode(200);
    }
}

Then, in my Arduino, I put the following code in a method to send data to my application:

#define SERVER "MYSERVER"
#define SERVERPORT 5000

WiFiClient client;

void sendData(unsigned long timestamp, double speed)
{
  // Host and port
  if (client.connect(SERVER, SERVERPORT))
  {
    char body[64];

    // Clear the array to zeroes.
    memset(body, 0, 64);

    // Arduino sprintf does not support floats or doubles.
    sprintf(body, "milliseconds=%lu&speed=", timestamp);

    // Use the dtostrf to append the speed.
    dtostrf(speed, 2, 3, &body[strlen(body)]);

    int bodyLength = strlen(body);

    // Specify the endpoint
    client.println("POST /data/providereading HTTP/1.1");

    // Write Host: SERVER:SERVERPORT
    client.print("Host: ");
    client.print(SERVER);
    client.print(":");
    client.println(SERVERPORT);

    // Close the connection after the request
    client.println("Connection: close");

    // Write the amount of body data
    client.print("Content-Length: ");
    client.println(bodyLength);
    client.println("Content-Type: application/x-www-form-urlencoded");
    client.println();
    client.print(body);

    // Wait for the response
    delay(100);

    // Read the response (but we don't care what is in it)
    while (client.read() != -1);
  }
}

I just want to briefly mention one part of the above code, where I’m preparing body data to send.

// Arduino sprintf does not support floats or doubles.
sprintf(body, "milliseconds=%lu&speed=", timestamp);
// Use the dtostrf to append the speed.
dtostrf(speed, 2, 3, &body[strlen(body)]);

The Arduino libraries do not support the %f specifier (for a float) in the sprintf method, so I can’t just add the speed as an argument there. Instead, you have to use the dtostrf method to insert a double into the string, specifying the number of decimal points you want.

Also, if you specify %d (int) instead of %lu (unsigned long) for the timestamp, the sprintf method treats the value as a signed int and you get very strange numbers being sent through for the timestamp.

Once that was uploaded, I started getting requests through!

Performance

We now have HTTP requests from the Arduino to our ASP.NET Core app. But I’m not thrilled with the amount of time it takes to execute a single request.

If we take a look at the Wireshark trace (I love Wireshark), we can see that each request, from start to finish, is taking on the order of 100ms!

This is loads, and I can’t have my Arduino sitting there for that long.

ASP.NET Core Performance

You can see in the above trace that the web app handling the request is taking 20ms to return the response, which is a lot. I know that ASP.NET Core can do better than that.

It turns out this problem was actually due to the fact that I had console logging switched on. Because of the synchronisation that takes place when writing to the console, printing all that information-level data can add a lot of time to requests.

Once I turned the logging down from Information to Warning in my appsettings.json file, it got way better.
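
For reference, that’s just a case of raising the default level in the Logging section of appsettings.json; something like this (your own file may have other categories configured):

{
  "Logging": {
    "LogLevel": {
      "Default": "Warning",
      "Microsoft": "Warning"
    }
  }
}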

That’s better!

That actually gives us sub-millisecond response times from the server, which is awesome.

TCP Handshake Overhead

Annoyingly, each request is still taking up to 100ms from start of connection to the end. How come?

If you look at those WireShark traces, we spend a lot of time in the TCP handshaking process. Opening a TCP connection does generally come with lots of network overhead, and that call to client.connect(SERVER, SERVERPORT) in my code blocks until the TCP connection is open; I don’t want to sit there waiting for that every time I want to send a sample.

The simple solution to this is to keep the connection open between samples, so we can repeatedly send data on the same connection, only needing to do the handshake once.

Let’s rework our previous sendData code on the Arduino to keep the connection open:

void sendData(unsigned long timestamp, double speed)
{
  char body[64];
  int success = 1;

  // If we're not connected, open the connection.
  if (!client.connected())
  {
    success = client.connect(SERVER, SERVERPORT);
  }

  if (success)
  {
    // Empty the buffer
    // (I still don't really care about the response)
    while (client.read() != -1);

    memset(body, 0, 64);
    sprintf(body, "milliseconds=%lu&speed=", timestamp);
    dtostrf(speed, 2, 3, &body[strlen(body)]);

    int bodyLength = strlen(body);

    client.println("POST /data/providereading HTTP/1.1");
    client.print("Host: ");
    client.print(SERVER);
    client.print(":");
    client.println(SERVERPORT);

    // This tells the server we want to leave the
    // connection open.
    client.println("Connection: keep-alive");

    client.print("Content-Length: ");
    client.println(bodyLength);
    client.println("Content-Type: application/x-www-form-urlencoded");
    client.println();
    client.print(body);
  }
}

In this version, we ask the server to leave the connection open after the request, and only open the connection if it is closed. I’m also not blocking waiting for a response.

This gives us way better behaviour, and we’re now down to about 40ms total:

There’s one more thing that I don’t love about this though…

TCP Packet Fragmentation

So, what’s left to look at?

TCP segments

I’ve got a packet preceding each of my POST requests, that seems to hold things up by around 40ms. What’s going on here? Let’s look at the content of that packet:

Wireshark data view

What I can tell from this is that rather than wait for my HTTP request data, the Arduino is not buffering for long enough, and is just sending what it has after the first println call containing POST /data/providereading HTTP/1.1. This packet fragmentation slows everything up because the Arduino has to wait for an ACK from the server before it continues.

I just wanted to point out that it looks like the software in the Arduino libraries isn’t responsible for the fragmentation; it looks like all the TCP behaviour is handled by the hardware WiFi module, and that’s what is splitting my packets.

To stop this packet fragmentation, let’s adjust the sending code to prepare the entire request and send it all at once:

void sendData(unsigned long timestamp, double speed)
{
  int success = 1;
  char request[256];
  char body[64];

  if (!client.connected())
  {
    success = client.connect(server, 5000);
  }

  if (success)
  {
    // Empty the buffer
    // (I still don't really care about the response)
    while (client.read() != -1);

    // Clear the request data
    memset(request, 0, 256);

    // Clear the body data
    memset(body, 0, 64);

    sprintf(body, "milliseconds=%lu&speed=", timestamp);
    dtostrf(speed, 2, 3, &body[strlen(body)]);

    char* currentPos = request;

    // I'm using sprintf for the fixed length strings here
    // to make it easier to read.
    currentPos += sprintf(currentPos, "POST /data/providereading HTTP/1.1\r\n");
    currentPos += sprintf(currentPos, "Host: %s:%d\r\n", server, 5000);
    currentPos += sprintf(currentPos, "Connection: keep-alive\r\n");
    currentPos += sprintf(currentPos, "Content-Length: %d\r\n", strlen(body));
    currentPos += sprintf(currentPos, "Content-Type: application/x-www-form-urlencoded\r\n");
    currentPos += sprintf(currentPos, "\r\n");

    strcpy(currentPos, body);

    // Send the entire request
    client.print(request);

    // Force the wifi module to send the packet now
    // rather than buffering any more data.
    client.flush();
  }
}

Once uploaded, let’s look at the new WireShark trace:

No TCP Fragmentation

There we go! Sub-millisecond responses from the server, and precisely hitting my desired 50ms window between each sample send.

There’s still ACKs going on obviously, but they aren’t blocking packet issuing, which is the important thing.

Summary

It’s always good to look at the WireShark trace for your requests to see if you’re getting the performance you want, and don’t dismiss the overhead of opening a new TCP connection each time!

Next Steps

Next up in the ‘Modding my Rowing Machine’ series, I’ll be taking this speed data and generating a real-time graph in my browser, that updates continuously! Stay tuned…