Easily loading lots of data in parallel over HTTP, using Dataflow in .NET Core

I recently had a requirement to load a large amount of data into an application, as fast as possible.

The data in question was about 100,000 transactions, stored line-by-line in a file, that needed to be sent over HTTP to a web application, that would process it and load it into a database.

This is actually pretty easy in .NET, and super efficient using async/await:

Run that through, and I get a time of 133 seconds; this isn’t too bad right? Around 750 records per second.

But I feel like I can definitely make this better. For one thing, my environment doesn’t look exactly look like the diagram above. It’s a scaled production environment, so looks more like this:

I’ve got lots of resources that I’m not using right now, because I’m only sending one request at a time, so what I want to do is start loading the data in parallel.

Let’s look at a convenient way of doing this, using the System.Threading.Tasks.Dataflow package, which is available for .NET Framework 4.5+ and .NET Core.

The Dataflow components provide various ways of doing asynchronous processing, but here I’m going to use the ActionBlock, which allows me to post messages that are subsequently processed by a Task, in a callback. More importantly, it let’s me process messages in parallel.

Let’s look at the code for my new StreamDataInParallel method:

The great thing about Dataflow is that in only about 18 lines of code, I’ve got parallel processing of data, pushing HTTP requests to a server at a rate of my choice (controlled by the maxParallel parameter).

Also, with the combination of the SendAsync method and specifying a BoundedCapacity, it means I’m only reading from my file when there are slots available in the buffer, so my memory consumption stays low.

I’ve run this a few times, increasing the number of parallel requests each time, and the results are below:

Sadly, I wasn’t able to run the benchmarking tests on the production environment (for what I hope are obvious reasons), so I’m running all this locally; the number of parallel requests I can scale to is way higher in production, but it’s all just a factor of total available cores and database server performance.

Value of maxParallelAverage Records/Second
1750
21293
31785
42150
52500
62777
72941
83125

With 8 parallel requests, we get over 3000 records/second, with a time of 32 seconds to load our 100,000 records.

You’ll notice that the speed does start to plateau (or at least I get diminishing returns); this will happen when we start to hit database contention (typically the main throttling factor, depending on your workload).

I’d suggest that you choose a sensible limit for how many requests you have going so you don’t accidentally denial-of-service your environment; we’ve got to assume that there’s other stuff going on at the same time.

Anyway, in conclusion, Dataflow has got loads of applications, this is just one of them that I took advantage of for my problem. So that’s it, go forth and load data faster!

Displaying Real-time Sensor Data in the Browser with SignalR and ChartJS

In my previous posts on Modding My Rowing Machine, I wired up an Arduino to my rowing machine, and streamed the speed sensor data to an ASP.NET core application.

In this post, I’m going to show you how to take sensor and counter data, push it to a browser as it arrives, and display it in a real-time chart.

If you want to skip ahead, I’ve uploaded all the code for the Arduino and ASP.NET components to a github repo at https://github.com/alistairjevans/rower-mod.

I’m using Visual Studio 2019 with the ASP.NET Core 3.0 Preview for all the server-side components, but the latest stable release of ASP.NET Core will work just fine, I’m not using any of the new features.

Pushing Data to the Browser

So, you will probably have heard of SignalR, the ASP.NET technology that can be used to push data to the browser from the server, and generally establish a closer relationship between the two.

I’m going to use it send data to the browser whenever new sensor data arrives, and also to let the browser request that the count be reset.

The overall component layout looks like this:

Setting up SignalR

This bit is pretty easy; first up, head over to the Startup.cs file in your ASP.NET app project, and in the ConfigureServices method, add SignalR:

Next, create a SignalR Hub. This is effectively the endpoint your clients will connect to, and will contain any methods a client needs to invoke on the server.

SignalR Hubs are just classes that derive from the Hub class. I’ve got just the one method in mine at the moment, for resetting my counter.

Before that Hub will work, you need to register it in your Startup class’ Configure method:

You’re also going to want to add the necessary SignalR javascript to your project. I did it using the “Manage Client-Side Libraries” feature in Visual Studio; you can find my entire libman.json file (which defines which libraries I’m using) on my github repo

Sending Data to the Client

In the MVC Controller where the data arrives from the Arduino, I’m going to push the sensor data to all clients connected to the hub.

The way you access the clients of a hub from outside the hub (i.e. an MVC Controller) is by resolving an IHubContext<THubType>, and then accessing the Clients property.

Pro tip:
Got multiple IO operations to do in a single request, that don’t depend on each other? Don’t just await one, then await the other; use Task.WhenAll, and the operations will run in parallel.

In my example above I’m writing to a file and to SignalR clients at the same time, and only continuing when both are done.

Browser

Ok, so we’ve got the set-up to push data to the browser, but no HTML just yet. I don’t actually need any MVC Controller functionality, so I’m just going to create a Razor Page, which still gives me a Razor template, but without having to write the controller behind it.

If I put an ‘Index.cshtml’ file under a new ‘Pages’ folder in my project, and put the following content in it, that becomes the landing page of my app:

In my site.js file, I’m just going to open a connection to the SignalR hub and attach a callback for data being given to me:

That’s actually all we need to get data flowing down to the browser, and displaying the current speed and counter values!

I want something a little more visual though….

Displaying the Chart

I’m going to use the ChartJS library to render a chart, plus a handy plugin for ChartJS that helps with streaming live data and rendering it, the chartjs-plugin-streaming plugin.

First off, add the two libraries to your project (and your HTML file), plus MomentJS, which ChartJS requires to function.

Next, let’s set up our chart, by defining it’s configuration and attaching it to the 2d context of the canvas object:

Finally, let’s make our chart display new sensor data as it arrives:

With all that together, let’s see what we get!

Awesome, a real-time graph of my rowing!

As an aside, I used the excellent tool by @sarah_edo to generate a CSS grid really quickly, so thanks for that! You can find it at https://cssgrid-generator.netlify.com/

You can check out the entire solution, including all the code for the Arduino and the ASP.NET app, on the github repo at https://github.com/alistairjevans/rower-mod.

Next up for the rowing machine project, I want to put some form of gamification, achievements or progress tracking into the app, but not sure exactly how it will look yet.

Streaming real-time sensor data to an ASP.NET Core app from an Arduino

In my previous posts on Modding my Rowing Machine, I got started with an Arduino, and started collecting speed sensor data. The goal of this post is to connect to the WiFi network and upload sensor data to a server application I’ve got running on my laptop in as close to real-time as I can make it.

Connecting to WiFi

My Arduino Uno WiFi Rev 2 board has got a built-in WiFi module; it was considerably easier than I expected to get everything connected.

I first needed to install the necessary library to support the board, the WiFiNINA library:

Then you can just include the necessary header file and connect to the network:

To be honest, that code probably isn’t going to cut it, because WiFi networks don’t work that nicely. You need a retry mechanism with timeouts to keep trying to connect. Let’s take a look at the full example:

The Server

To receive the data from the Arduino, I created a light-weight ASP.NET Core 3.0 web application with a single controller endpoint to handle incoming data, taking a timestamp and the speed:

Then, in my Arduino, I put the following code in a method to send data to my application:

I just want to briefly mention one part of the above code, where I’m preparing body data to send.

The Arduino libraries do not support the %f specifier (for a float) in the sprintf method, so I can’t just add the speed as an argument there. Instead, you have to use the dtostrf method to insert a double into the string, specifying the number of decimal points you want.

Also, if you specify %d (int) instead of %lu (unsigned long) for the timestamp, the sprintf method treats the value as a signed int and you get very strange numbers being sent through for the timestamp.

Once that was uploaded, I started getting requests through!

Performance

We now have HTTP requests from the Arduino to our ASP.NET Core app. But I’m not thrilled with the amount of time it takes to execute a single request.

If we take a look at the WireShark trace (I love WireShark), you can see that each request from start to finish is taking in the order of 100ms!

This is loads, and I can’t have my Arduino sitting there for that long.

ASP.NET Core Performance

You can see in the above trace that the web app handling the request is taking 20ms to return the response, which is a lot. I know that ASP.NET Core can do better than that.

Turns out this problem was actually due to the fact I had console logging switched on. Due to the synchronisation that takes place when writing to the console, it can add a lot of time to requests to print all that information-level data.

Once I turned the logging down from Information to Warning in my appsettings.json file, it got way better.

That’s better!

That actually gives us sub-millisecond response times from the server, which is awesome.

TCP Handshake Overhead

Annoyingly, each request is still taking up to 100ms from start of connection to the end. How come?

If you look at those WireShark traces, we spend a lot of time in the TCP handshaking process. Opening a TCP connection does generally come with lots of network overhead, and that call to client.connect(SERVER, SERVERPORT) in my code blocks until the TCP connection is open; I don’t want to sit there waiting for that every time I want to send a sample.

The simple solution to this is to make the connection stay open between samples, so we can just repeatedly sent data on the same connection, only needing to do the handshake once.

Let’s rework our previous sendData code on the Arduino to keep the connection open:

In this version, we ask the server to leave the connection open after the request, and only open the connection if it is closed. I’m also not blocking waiting for a response.

This gives us way better behaviour, and we’re now down to about 40ms total:

There’s one more thing that I don’t love about this though…

TCP Packet Fragmentation

So, what’s left to look at?

TCP segments

I’ve got a packet preceding each of my POST requests, that seems to hold things up by around 40ms. What’s going on here? Let’s look at the content of that packet:

Wireshark data view

What I can tell from this is that rather than wait for my HTTP request data, the Arduino is not buffering for long enough, and is just sending what it has after the first println call containing POST /data/providereading HTTP/1.1. This packet fragmentation slows everything up because the Arduino has to wait for an ACK from the server before it continues.

I just wanted to point out that it looks like the software in the Arduino libraries isn’t responsible for the fragmentation; it looks all the TCP behaviour is handled by the hardware WiFi module, that’s what is splitting my packets.

To stop this packet fragmentation, let’s adjust the sending code to prepare the entire request and send it all at once:

Once uploaded, let’s look at the new WireShark trace:

No TCP Fragmentation

There we go! Sub-millisecond responses from the server, and precisely hitting my desired 50ms window between each sample send.

There’s still ACKs going on obviously, but they aren’t blocking packet issuing, which is the important thing.

Summary

It’s always good to look at the WireShark trace for your requests to see if you’re getting the performance you want, and don’t dismiss the overhead of opening a new TCP connection each time!

Next Steps

Next up in the ‘Modding my Rowing Machine’ series, I’ll be taking this speed data and generating a real-time graph in my browser, that updates continuously! Stay tuned…