Friday, April 5, 2024

ParallelForAsync alternatives

The Parallel.For and similar methods available in .NET Framework 4.0+ and .NET Standard 2.0+ are designed to simplify parallelism of CPU-intensive processing. The Task and related classes (released a little earlier), when combined with the C# async and await keywords, facilitate simple coding patterns for fine-grained concurrency. Those class families are intended to help with different types of problems, but there are times when it's convenient to combine them, and a typical case I have is to query and update large numbers of Azure Blobs as efficiently as possible.

ForEach and ForEachAsync NO

A naïve code sample might look like this:

Parallel.ForEach(nameSource, async (name) =>
{
    await InspectBlobAsync(name, cancelToken);
});

However, Tasks for all of the source elements are scheduled at once, then the async work inside each one runs in parallel, but I can't quickly determine in what order they run or with what degree of parallelism; that might be chosen by the runtime or operating system (a separate research project). Note that if thousands of Tasks are scheduled, they don't all run at once and create thousands of Threads; they are internally throttled. Even worse, the Parallel.ForEach call is synchronous, and because the async lambda compiles to an async void delegate it only blocks the caller until each delegate reaches its first await, not until the awaited work completes. You might then think of putting the whole thing in a Task.Run(…) ... which creates more useless Threads that keep running after the outer call immediately returns, so it just makes things worse.

The Parallel.ForEachAsync family is available in .NET 6+ to provide a simple and convenient way to combine parallelism and fine-grained concurrency, but I couldn't use it because my library was compelled to target .NET Standard 2.0 so it could be used by .NET Framework projects.
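
For readers who can target .NET 6+, a minimal sketch of what that looks like (assuming the same hypothetical InspectBlobAsync method and nameSource collection used throughout this post):

// Sketch only: Parallel.ForEachAsync requires .NET 6 or later.
var options = new ParallelOptions
{
    MaxDegreeOfParallelism = 4,        // explicit parallelism throttle
    CancellationToken = cancelToken    // cooperative cancellation
};

await Parallel.ForEachAsync(nameSource, options, async (name, token) =>
{
    await InspectBlobAsync(name, token);
});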

Semaphore throttling NO

You will find many articles that use a SemaphoreSlim to throttle a while loop so that a maximum number of Tasks run at once. It's like a gate that opens to let a new Task in as each leaves. I found this technique worked perfectly and planned to release it, but then I discovered that cancellation was a serious problem. I couldn't find a sensible way to signal a cancellation token and get the loop to gracefully exit and ignore waiting Tasks. There may be a way, but it was too much bother so I abandoned the semaphore technique.
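
For reference, the pattern I abandoned looks roughly like this (a sketch only, using the same hypothetical InspectBlobAsync, nameSource and cancelToken as above; the cancellation handling is the part I never got working cleanly):

// Sketch of the SemaphoreSlim throttle that I abandoned.
var gate = new SemaphoreSlim(4);       // at most 4 Tasks in flight at once
var tasks = new List<Task>();

foreach (string name in nameSource)
{
    await gate.WaitAsync(cancelToken); // wait for a free slot
    tasks.Add(Task.Run(async () =>
    {
        try { await InspectBlobAsync(name, cancelToken); }
        finally { gate.Release(); }    // open the gate for the next Task
    }));
}
await Task.WhenAll(tasks);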

Task WhenAll NO

The following code seems like a natural simple solution:

var tasks = nameSource.Select(name => InspectBlobAsync(name, cancelToken));
await Task.WhenAll(tasks);

That coding pattern using WhenAll is fabulous at smaller scale, say, when you want to run a handful of concurrent network, file or database calls. If you use this pattern to process hundreds or thousands of async calls then you get the same ballistics as the previous ForEach approach, that is, all the Tasks are scheduled at once and then run unpredictably in parallel.

Since I'm processing many thousands of Blobs, I'm not sure if the creation and scheduling of that many Tasks is harmful or not. I can't find any definitive discussion of this topic. In any case, my instinct said not to use WhenAll, so I looked for a new technique.

ConcurrentQueue YES

I finally decided to use the ConcurrentQueue class. I load the queue with the names of the thousands of Blobs, which I think is acceptable and not resource stressful. I then create some Tasks, each containing a simple while loop which dequeues (aka pulls) the names off the queue and asynchronously processes them. The skeleton code looks like this:

int concurrentCount = 4;
using (var cts = new CancellationTokenSource())
{
  var queue = new ConcurrentQueue<string>(nameSource);
  var pullers = Enumerable.Range(1, concurrentCount).Select(seq => QueuePuller(seq, queue, cts.Token));
  await Task.WhenAll(pullers);
}

async Task QueuePuller(int seq, ConcurrentQueue<string> queue, CancellationToken cancelToken)
{
  while (queue.TryDequeue(out string name))
  {
    try
    {
      await InspectBlobAsync(name, cancelToken);
    }
    catch (OperationCanceledException)
    {
      // Report a cancellation. Break out of the loop and end the puller.
      break;
    }
    catch (Exception ex)
    {
      // Inspect the error and decide to report it or break and end the puller.
    }
  }
}

Although there is more code when using ConcurrentQueue, I feel that it's simple and sensible. The concurrentCount value can be adjusted as needed or set according to the number of cores, and it becomes the parallelism throttle (like using a Semaphore, but simpler). Cancellation works simply as well: signalling the cancel token causes all of the pullers to exit gracefully.

BlockingCollection

It's worth mentioning the BlockingCollection class for more sophisticated scenarios. BlockingCollection can be used in a similar pattern to ConcurrentQueue, with items pushed into and pulled from an internal collection, but it provides more control over how that happens (bounded capacity, blocking consumers and a completion signal). See the docs for more information and run searches for samples.
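
A minimal sketch of the producer-consumer shape it enables, again assuming the hypothetical InspectBlobAsync, nameSource and cancelToken from earlier; the bounded capacity and CompleteAdding call are the extra control you get over a plain ConcurrentQueue:

// Sketch of a BlockingCollection producer-consumer pipeline.
// Needs System.Collections.Concurrent, System.Linq and System.Threading.Tasks.
using var pipe = new BlockingCollection<string>(boundedCapacity: 1000);

var producer = Task.Run(() =>
{
    foreach (string name in nameSource) { pipe.Add(name); }
    pipe.CompleteAdding();   // tell the consumers that no more items are coming
});

var consumers = Enumerable.Range(1, 4).Select(_ => Task.Run(async () =>
{
    // GetConsumingEnumerable blocks until an item arrives or adding is complete.
    foreach (string name in pipe.GetConsumingEnumerable(cancelToken))
    {
        await InspectBlobAsync(name, cancelToken);
    }
}));

await Task.WhenAll(consumers.Append(producer));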

Friday, February 16, 2024

DateTime Ticks Breakdown

For more than 20 years I've been looking at DateTime.Ticks Int64 values and they always look like 63nnnnnnnnnnnnnnnn, and I was wondering how long I have to wait until the leading digits become 64. A quick calculation reveals that we must wait until Monday, 29 January 2029 17:46:40. If you want to wait until the leading digit changes from 6 to 7 then we have to wait until Saturday, 20 March 2219 04:26:40.

If you're wondering when the 64-bit signed Ticks value will overflow and cause a new epochal form of millennium bug, then we have to wait roughly 29,200 years.
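
A quick LINQPad-style sanity check of those numbers (a sketch; it should print dates close to the ones quoted above):

// When the Ticks value first starts with 64... (6.4e17 ticks from 0001-01-01).
Console.WriteLine(new DateTime(640_000_000_000_000_000L, DateTimeKind.Utc));

// When the leading digit first becomes 7 (7e17 ticks).
Console.WriteLine(new DateTime(700_000_000_000_000_000L, DateTimeKind.Utc));

// Approximate years until the signed 64-bit tick count overflows (about 29,200).
Console.WriteLine(long.MaxValue / (TimeSpan.TicksPerSecond * 31_556_952.0));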

Here is a nice breakdown of the digits in a Ticks value. Remember that the least significant digit (rightmost) is 100 nanoseconds (10⁻⁷ seconds).

UTC Now
Friday, 16 February 2024 06:56:32 ║ 2024-02-16T06:56:32.2446131Z

Ticks
638436633922446131 ║ 638,436,633,922,446,131

Seconds
63,843,663,392

Digit  Power  Seconds         Span               Years
  1     -7    0.0000001       00:00:00.0000001
  3     -6    0.000001        00:00:00.000001
  1     -5    0.00001         00:00:00.00001
  6     -4    0.0001          00:00:00.0001
  4     -3    0.001           00:00:00.001
  4     -2    0.01            00:00:00.01
  2     -1    0.1             00:00:00.1
  2      0    1               00:00:01
  9      1    10              00:00:10
  3      2    100             00:01:40
  3      3    1000            00:16:40
  6      4    10000           02:46:40
  6      5    100000          1.03:46:40         0.003
  3      6    1000000         11.13:46:40        0.032
  4      7    10000000        115.17:46:40       0.317
  8      8    100000000       1157.09:46:40      3.169
  3      9    1000000000      11574.01:46:40     31.688
  6     10    10000000000     115740.17:46:40    316.881

Thursday, December 7, 2023

Visual Studio conditional compile any file contents

The MSBuild tools and Visual Studio provide a variety of techniques for conditionally compiling source code based upon the active configuration. The #if preprocessor directives allow lines of code to be included or excluded based upon defined symbols. The Condition attribute in project files allows whole files to be included or excluded from the build.

Using #if is particularly useful, but it only works inside C# source code files. Over recent years I have increasingly wanted to apply conditional compilation to other types of project text files such as .html, .js, .txt, etc., which need slightly different contents according to the active configuration. Sometimes I have to manually alter the contents of the files before publishing from Visual Studio to different hosts with different configurations. Editing the files manually is tedious and error-prone. In late 2023 I finally got fed up with this and found a way of automatically editing the contents of arbitrary text files in a build, based upon the configuration.

Note that if the contents of the text files differed significantly for different configuration builds then it could be better to maintain separate files and use Condition to include the desired files. Unfortunately for me, usually only a few lines might change in the text files, so a #if technique would be preferable.

A skeleton sample C# solution is available which demonstrates all the techniques and tricks required to make this work. There are comments prefixed with a ▅ character (easy to see!) in all important parts of the solution files to explain what is happening. Full source is available from this Azure DevOps repository:

ConditionBuildDemo

In a nutshell, the key trick to making this work is to use a T4 template (a .tt file) to generate each text file that must be customised in some way for different build configurations. I feel this is a little bit clumsy, but in consolation, this is exactly why T4 templates were invented and they also blend smoothly into Visual Studio projects.

Another trick is to pass the name of the current $(Configuration) into the templates so it can be used in the generation logic. The rather obscure <T4ParameterValues> project element can be used for that.

Yet another trick is to make the templates regenerate when the project builds, not just when the template files change. The <Touch> and <TransformOnBuild> project elements help with that.
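
A rough sketch of the relevant project file fragments is shown below; treat the exact element names and placement as assumptions to verify against the comments in the demo solution:

<!-- Sketch only: assumes the Visual Studio text templating targets are imported into the project. -->
<PropertyGroup>
  <!-- Transform the .tt templates on every build, not just when the template files change. -->
  <TransformOnBuild>true</TransformOnBuild>
</PropertyGroup>
<ItemGroup>
  <!-- Pass the active configuration into the templates as a T4 parameter. -->
  <T4ParameterValues Include="Configuration">
    <Value>$(Configuration)</Value>
  </T4ParameterValues>
</ItemGroup>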

There are a few more small technical details that I have skipped for brevity, but the comments in the solution files explain everything.

Friday, October 13, 2023

Custom build properties and items

Common Properties

Microsoft's MSBuild processing provides a convenient way of factoring out common build properties that are shared by many projects. I read about this feature years ago, then forgot about it, so I'm posting this as a reminder to myself and any other developers who might be interested. See Customize the build by folder for information on how you can place the file Directory.Build.props in a suitable parent folder of your projects; it will silently be found and used by all child projects. Many projects in a large solution can share build properties by placing such a .props file in the solution folder, with contents like this example:

<Project>
  <PropertyGroup>
    <Version>1.2.3</Version>
    <TargetFrameworks>netstandard2.0;net6.0</TargetFrameworks>
    <Authors>The Big Company</Authors>
    <Company>The Big Company</Company>
    <Copyright>Copyright © 1992-2023 The Big Company</Copyright>
    …etc…
  </PropertyGroup>
</Project>

Be careful though if you have other unrelated child projects that might inherit the properties. I had some test projects of mixed types that were spoiled by the common properties, so I had to either move them to a different non-child folder or manually put the correct local override values into their project files.

Common Items

Another common requirement is to put auto-calculated metadata values into the build process. In my case I wanted to put the build time and build machine name into the compiled assembly. After many experiments I discovered the easiest technique is to create a file named Directory.Build.targets in the solution folder next to the custom .props file, with contents like this example:

<Project>
  <ItemGroup>
    <AssemblyAttribute Include="System.Reflection.AssemblyMetadataAttribute">
      <_Parameter1>BuildMachine</_Parameter1>
      <_Parameter2>$(COMPUTERNAME)</_Parameter2>
    </AssemblyAttribute>
    <AssemblyAttribute Include="System.Reflection.AssemblyMetadataAttribute">
      <_Parameter1>BuildTime</_Parameter1>
      <_Parameter2>$([System.DateTime]::Now.ToString("yyyy-MM-dd HH:mm:ss K"))</_Parameter2>
    </AssemblyAttribute>
  </ItemGroup>
</Project>

In this case I'm generating a pair of AssemblyMetadata attributes into the build process. You can generate other attributes as needed and web searches will reveal similar techniques. I wasn't previously aware that a .targets file would be found the same way as the .props file, but I tried it, and it works (I presume it's documented somewhere).
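
As a usage note, here is a minimal sketch of reading the generated attributes back at runtime (assuming the BuildMachine and BuildTime keys from the example above):

using System;
using System.Linq;
using System.Reflection;

// Read back the AssemblyMetadata attributes generated by Directory.Build.targets.
var metadata = Assembly.GetExecutingAssembly()
    .GetCustomAttributes<AssemblyMetadataAttribute>()
    .ToDictionary(a => a.Key, a => a.Value);

Console.WriteLine($"Built on {metadata["BuildMachine"]} at {metadata["BuildTime"]}");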

Saturday, August 19, 2023

Enumerate Azure Storage Accounts (New)

In April 2021 I posted an article titled Enumerate Azure Storage Accounts which explained how to enumerate all of the storage accounts in an Azure subscription, then drill down into all the containers and blobs, and tables and rows. This sort of code can be used as the basis of some useful custom reporting tools.

Unfortunately, the old code uses deprecated classes, so after a few concentrated hours of study and suffering I found modern replacement code. The code linked below is a skeleton of the modern way to enumerate storage accounts and their contents. For more details see the Azure SDK Samples.

➤ Note that I've used the environment variable credentials, which look a bit clumsy in the sample. Look for documentation on classes derived from TokenCredential and pick one that suits your needs.

An example C# console program that uses the new Azure SDK libraries can be downloaded from here:

SubscriptionReaderSample.cs

The .cs file has been renamed as a .txt file to avoid security blocks.
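
For orientation, the shape of the modern enumeration code is roughly this, a sketch assuming the Azure.Identity, Azure.ResourceManager and Azure.ResourceManager.Storage packages; see the downloadable sample and the SDK docs for the full working version:

using System;
using Azure.Identity;
using Azure.ResourceManager;
using Azure.ResourceManager.Storage;

// DefaultAzureCredential tries environment variables, managed identity, Visual Studio
// sign-in and more; swap in any other TokenCredential-derived class if preferred.
var arm = new ArmClient(new DefaultAzureCredential());

var subscription = await arm.GetDefaultSubscriptionAsync();
await foreach (StorageAccountResource account in subscription.GetStorageAccountsAsync())
{
    Console.WriteLine(account.Data.Name);
    // Drill down into blob containers, tables and rows from the account resource.
}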

Wednesday, August 16, 2023

Azure Table 'batch' (transaction) operations

To bulk insert rows in the old Azure Table Storage API you would create a TableBatchOperation class and fill it with TableOperations, then call ExecuteBatchAsync. The batch could contain different operations, but bulk inserts were my most common need.

The batch-related classes do not exist in the new Table API, and finding the replacement code was a dreadful chore. I didn't know the term batch had been replaced with transaction, so my searches for "batch" produced endless useless old results and discussions. After searching until my fingers bled I stumbled across a hint that batch processing was now called transaction processing. Useful search results now started to arrive, but it took more time to finally find some definitive sample code that actually worked. The best summary page I found was here:

azure-sdk-for-net Transactional Batches

My sanity check code in LINQPad looks like this:

async Task Main()
{
	var tsclient = new TableServiceClient("YOUR STORAGE CONNECT STRING");
	var tclient = tsclient.GetTableClient("TestTable1");
	await tclient.CreateIfNotExistsAsync().Dump();
	var trans = new List<TableTransactionAction>();
	var rows = Enumerable.Range(1, 10).Select(i => new MockRow()
	{
		Id = i,
		Name = $"Name for {i}"
	} ).ToArray();
	trans.AddRange(rows.Select(r => new TableTransactionAction(TableTransactionActionType.Add, r)));
	await tclient.SubmitTransactionAsync(trans).Dump();
}

class MockRow : ITableEntity
{
	public string PartitionKey { get; set; } = "P1";
	public string RowKey { get; set; } = Guid.NewGuid().ToString("N");
	public DateTimeOffset? Timestamp { get; set; }
	public ETag ETag { get; set; }
	public int Id { get; set; }
	public string Name { get; set; }
}

Transaction success returns status 202 and a list of sub-responses with status 204. I deliberately caused an error while inserting a transaction of 5 rows, by setting the second RowKey with invalid characters. As advertised, no rows are inserted and a TableTransactionFailedException is thrown with a detailed message like this:

2:The 'RowKey' parameter of value 'Bad\Key' is out of range.
RequestId:c10901e2-e002-009a-7aba-cf9069000000
Time:2023-08-15T20:55:37.4158811Z
 The index of the entity that caused the error can be found in FailedTransactionActionIndex.
Status: 400 (Bad Request)
ErrorCode: OutOfRangeInput
Additional Information:
FailedEntity: 2

Friday, February 3, 2023

localhost has been blocked by CORS policy

For many years I could not debug a Web API service and Blazor app on localhost. I would debug-run the service in one instance of Visual Studio 2022 and the Blazor app in another instance. The first client call to the service would return:

Access to fetch at 'http://localhost:5086/endpoint' from origin 'http://localhost:56709' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.

At first this was only an inconvenience and I spent a couple of hours annually trying to overcome the problem. Hundreds of web search results produced a confusing jumble of suggestions, some ridiculously complex, some for the wrong platform, some absurd, and some seemingly sensible ones that did not work! In early 2023 the inability to debug on my desktop over localhost became a serious impediment and I swore to solve it. After approximately 3 solid hours of research and experiments I found the answer.

In the Program.cs code of the .NET 6 Web API project, insert an AddCors statement as soon as the builder is created.

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddCors(options =>
{
  options.AddDefaultPolicy(policy =>
  {
    policy.AllowAnyHeader().AllowAnyMethod().AllowAnyOrigin();
  });
});

Further down, as soon as the app is created, follow it with a UseCors statement.

var app = builder.Build();
app.UseCors();

I'm sure I've seen that fix code over the years, but it never worked. Perhaps I was on an older Framework, or maybe I had the code in the wrong sequence, or any number of other subtle mistakes could have sabotaged my experiments. The most infuriating aspect of the problem is that the client error message tells you that the 'Access-Control-Allow-Origin' response header is missing, which is true, but it's not the core of the problem. I think I wasted hours in futile efforts to add the header.

The breakthrough clue about what was really causing the problem was revealed by Fiddler: it showed that the first web service call was an OPTIONS request to the service endpoint. The response was 405 Method Not Allowed, which helped steer me to the fix code listed above. It wasn't clear sailing, because I forgot to add the UseCors statement and I wasted half an hour wondering why the AddCors was having no effect.

You may ask why I didn't use Fiddler years ago … I did, but it would never show me the localhost traffic coming through Visual Studio. Now, during my latest efforts, for some unexplained reason I am seeing all the traffic and the problem was revealed.

I'm quite shocked to see that every client call to the web service is silently an OPTIONS, which is then followed by the real request verb. Firstly I worry about the overhead, and then I worry about secret requests being made without my knowledge. I'll have to research when and why this happens, and I'll append a note if I find something useful.

UPDATE May 2023

Run a search for words like "OPTIONS CORS PREFLIGHT" and you will find explanations of the mechanism I complained about. The overhead of preflight requests is not as bad as it looks. There are optimisations and the concept of simple requests that make CORS less onerous than it seems. CORS is however a damn curse on developer testing.