Why and How-To Asynchronous Programming

 

Introduction

The world is becoming more globalized, distributed, and cloud-based. This is especially true for IT, in both the consumer and the enterprise space; examples include Amazon AWS, Google Documents, Windows Azure Services, Office 365, and Windows 8 UI applications. In this world, asynchronous programming is one way to build applications that are scalable, responsive, and resilient to slow or unavailable dependencies.

In this post, I will give you the big picture of the async world on the .NET platform: what it is, why to use it, and how, when, and where to use it. In the next post, I will focus on the approach Microsoft recommends, the Task-based Asynchronous Pattern (TAP).

 

What is it?

In short, asynchronous programming lets an application delegate work to other threads, systems, or devices. A synchronous program runs sequentially, whereas an asynchronous program can start a new operation and, without waiting for it to complete, continue with its own flow (the main operation). To visualize this, imagine a person who sends an email and can do nothing until the recipient replies. Every task after sending the email is blocked by a response that the sender has no control over and that may take a while. The asynchronous way is to send the email and keep working on other tasks while waiting for the reply. To capture the full context of asynchronous programming, we need to understand the roles of the OS, the CLR, application domains, processes, and threads.

  • An application domain provides isolation (security, fault containment, unloading) inside a process; a single process can host one or more application domains.
  • Each process uses one or more threads, and threads within the same process share memory. This is why multi-threaded applications can be problematic and hard to troubleshoot: concurrency issues and race conditions.
  • Threads are lightweight units of execution (workers). Each process has at least one thread (the main thread).
  • The CLR provides the runtime for managed code.
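
The point about shared memory is easier to see in code. Here is a minimal sketch, not taken from the demo project: the names SharedCounterDemo, CountUnsafe, and CountSafe are made up for illustration. Two threads increment the same variable, once without any synchronization and once with Interlocked.Increment.

```csharp
using System.Threading;

public static class SharedCounterDemo
{
    // Both threads increment the same variable with no synchronization.
    // counter++ is a read-modify-write sequence, so updates can be lost.
    public static int CountUnsafe(int perThread)
    {
        int counter = 0;
        ThreadStart work = () => { for (int i = 0; i < perThread; i++) counter++; };
        RunOnTwoThreads(work);
        return counter; // may be less than 2 * perThread
    }

    // Same work, but each increment is atomic, so no update is lost.
    public static int CountSafe(int perThread)
    {
        int counter = 0;
        ThreadStart work = () =>
        {
            for (int i = 0; i < perThread; i++) Interlocked.Increment(ref counter);
        };
        RunOnTwoThreads(work);
        return counter; // always 2 * perThread
    }

    private static void RunOnTwoThreads(ThreadStart work)
    {
        var a = new Thread(work);
        var b = new Thread(work);
        a.Start();
        b.Start();
        a.Join(); // wait for both threads before reading the result
        b.Join();
    }
}
```

With a large enough perThread value, CountUnsafe will typically come back short on a multi-core machine, while CountSafe always returns the full count. How often the unsafe version fails depends on the hardware and the scheduler, which is exactly what makes such bugs hard to troubleshoot.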

When and where to use it?

Use it at any point where your application initiates an operation that is long-running and/or I/O-bound. For example, you may leverage asynchronous programming in the following scenarios:

  • Read/write on a file (I/O-bound)
  • Application has a long-running operation (CPU-bound)
  • Call a remote service (service on cloud) leveraging an ESB.
  • Other places, depending on your KPIs.
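
To make the first two scenarios concrete, here is a small sketch under my own assumptions: WorkKindsDemo, ReadFileAsync, and SumOfSquaresAsync are hypothetical names, and the sum of squares simply stands in for any long-running calculation. The I/O-bound method opens the file for asynchronous access and awaits the read; the CPU-bound method pushes the calculation to the thread pool with Task.Run.

```csharp
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class WorkKindsDemo
{
    // I/O-bound: the file is opened for asynchronous access (useAsync: true),
    // so the calling thread is not tied up while the read is in progress.
    public static async Task<string> ReadFileAsync(string path)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 4096, useAsync: true))
        using (var reader = new StreamReader(stream))
        {
            return await reader.ReadToEndAsync();
        }
    }

    // CPU-bound: the calculation needs a thread, so it is moved to the thread pool
    // to keep the calling thread (for example a UI thread) responsive.
    public static Task<long> SumOfSquaresAsync(int n)
    {
        return Task.Run(() => Enumerable.Range(1, n).Sum(i => (long)i * i));
    }
}
```

The difference matters for scalability: awaiting I/O does not hold a thread while the operation is pending, whereas CPU-bound work always occupies some thread and is only moved away from the caller.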

 

How does it work?

Let me explain it with an example: a simple application that processes data from two XML documents into a list. Here is the pseudocode:

  1. Parse 1st file’s book elements into a list.
  2. Parse 2nd file’s book elements into another list.
  3. Merge lists

For now, I am interested only in steps 1-3, to show the difference between sync and async calls.

Here is the code snippet showing the ProcessXml and Book classes and the method that does the parsing:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Threading.Tasks;

namespace AsyncDemo
{
    public class ProcessXml
    {
        /// <summary>
        /// Parses 'book' named elements from a file into a list of Book objects
        /// </summary>
        /// <param name="path">file path</param>
        /// <returns>List of Books obj</returns>
        public static List<Book> ParseToList(string path)
        {
            //System.Threading.Thread.Sleep(1000);
            if (System.IO.File.Exists(path))
            {
                XDocument doc = XDocument.Load(path);
                if (doc != null)
                {
                    var coll = doc.Root.Elements("book").Select(p =>
                        new Book
                        {
                            Author = p.Elements("author").First().Value,
                            Title = p.Elements("title").First().Value,
                            Genre = p.Elements("genre").First().Value,
                            Price = double.Parse(p.Elements("price").First().Value),
                            PublishDate = DateTime.Parse(p.Elements("publish_date").First().Value),
                            Description = p.Elements("description").First().Value
                        });
                    return coll.ToList<Book>();
                }
            }
            return null;
        }

 

        /// <summary>
        /// Asynchronous counterpart of ParseToList: parses 'book' named elements
        /// from a file into a list of Book objects on a thread-pool thread
        /// </summary>
        /// <param name="path">file path</param>
        /// <returns>Task that completes with the list of Books obj (null if the file does not exist)</returns>
        public static Task<List<Book>> XmlProcessEngineAsync(string path)
        {
            // One possible implementation: Task.Run (FW 4.5) queues the synchronous parser
            // to the thread pool and hands back a Task the caller can wait on or await.
            return Task.Run(() => ParseToList(path));
        }

    }

    public class Book
    {
        public string Author { get; set; }
        public string Title { get; set; }
        public string Genre { get; set; }
        public double Price { get; set; }
        public DateTime PublishDate { get; set; }
        public string Description { get; set; }
    }
}

 

Here are two unit test methods that run the scenario in both a sync and an async manner:

/// <summary>
/// Tests the method synchronously
/// </summary>
[TestMethod]
public void ParseToListTestSync()
{
    books1 = ProcessXml.ParseToList(file1);
    books2 = ProcessXml.ParseToList(file2);
    var list = MergeLists(books1, books2);
    Assert.IsNotNull(list);
}

 

/// <summary>
/// Tests the method asynchronously with APM approach w/o callback
/// </summary>
[TestMethod]
public void ParseToListTestAsyncWithAPM()
{
    books1 = books2 = null;
    DelProcessing del = new DelProcessing(ProcessXml.ParseToList);
    // Start parsing file2 on a thread-pool thread; no callback, no state object
    IAsyncResult result = del.BeginInvoke(file2, null, null);
    // Meanwhile, parse file1 on the current (main) thread
    books2 = ProcessXml.ParseToList(file1);

    //if (!result.IsCompleted)
    //{
        //this runs in main thread; do some other stuff while async method call in-progress
        //Thread.Sleep(1000);
    //}

    // Blocks until the asynchronous call has completed, then returns its result
    books1 = del.EndInvoke(result);
    var list = MergeLists(books1, books2);

    Assert.IsNotNull(list);
}
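
The test methods rely on a few members that are not listed in this post: the DelProcessing delegate, the MergeLists helper, and the books1/books2/file1/file2 fields. The following is only a guess at what the delegate and the helper could look like, with a cut-down Book class so the sketch compiles on its own; the TestHelpers class name and the null handling in MergeLists are my assumptions, not the original test code.

```csharp
using System.Collections.Generic;
using System.Linq;

// Cut-down stand-in for the Book class shown earlier (assumption: one property is enough here).
public class Book
{
    public string Title { get; set; }
}

// Delegate whose signature matches ProcessXml.ParseToList(string path).
public delegate List<Book> DelProcessing(string path);

public static class TestHelpers
{
    // Hypothetical MergeLists: concatenates both lists and treats a null list as empty,
    // because ParseToList returns null when the file does not exist.
    public static List<Book> MergeLists(List<Book> first, List<Book> second)
    {
        return (first ?? new List<Book>())
            .Concat(second ?? new List<Book>())
            .ToList();
    }
}
```

Any delegate type with a matching signature works for the APM call, because the compiler generates BeginInvoke and EndInvoke for every delegate type.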

 

Here is the screenshot of the run results:

[Image: screenshot of the test run results]

 

As you can see, the 1st test method calls the function synchronously:

  • takes 17 ms, which is the sum of the time spent on each line of the operation
  • uses a single thread
  • runs sequentially

The 2nd test method calls the function asynchronously:

  • takes 2 ms, which is roughly the time spent on the longer of the two parsing calls plus the time spent on the remaining lines. Overlapping two calls can at best halve the total, so part of the gap to 17 ms probably comes from one-time warm-up costs (JIT compilation, first file access) paid by whichever test runs first.
  • leverages the delegate’s asynchronous invocation, which uses another thread from the thread pool; the assigned method (ParseToList) gets executed on that thread behind the scenes. I would certainly recommend spending some time on delegates.
  • Simply put, two threads are used here (specific to this example): the main thread, which starts the application, and a second one that executes the delegate’s call to ParseToList. Once the delegate is invoked, the main thread continues its own work without waiting for the second thread; it waits only at the EndInvoke call, which blocks until the second thread has completed.

Here is a picture that demonstrates this:

[Image: SyncVsAsync]
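
The "sum versus max" behavior can be reproduced without any XML files. The sketch below is an illustration under stated assumptions: TimingDemo and SimulatedParse are made-up names, Thread.Sleep stands in for the parsing work, and a Task replaces the delegate call so that the code also runs on newer .NET versions, where BeginInvoke on delegates is not supported.

```csharp
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class TimingDemo
{
    // Simulates a parsing call that takes the given number of milliseconds.
    private static int SimulatedParse(int milliseconds)
    {
        Thread.Sleep(milliseconds);
        return milliseconds;
    }

    // Runs both calls one after the other on the calling thread.
    public static long RunSequential(int firstMs, int secondMs)
    {
        var watch = Stopwatch.StartNew();
        SimulatedParse(firstMs);
        SimulatedParse(secondMs);
        return watch.ElapsedMilliseconds; // roughly firstMs + secondMs
    }

    // Runs the first call on a thread-pool thread while the caller handles the second.
    public static long RunOverlapped(int firstMs, int secondMs)
    {
        var watch = Stopwatch.StartNew();
        Task<int> background = Task.Run(() => SimulatedParse(firstMs));
        SimulatedParse(secondMs);
        background.Wait(); // plays the role of EndInvoke: block until the other thread is done
        return watch.ElapsedMilliseconds; // roughly max(firstMs, secondMs)
    }
}
```

Calling RunSequential(200, 200) should report around 400 ms and RunOverlapped(200, 200) around 200 ms; the exact numbers vary with timer resolution and thread-pool start-up.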

 

How to do it in .NET?

Since version 1.1, .NET has supported asynchronous programming, and each release since then has brought enhancements (delegates, TPL, etc.). Here is a picture taken while searching for the Async methods available in FW 4.5.

[Image: search results for the Async methods available in FW 4.5]

With FW 4.5, there are three patterns available for async development:

  • Asynchronous Programming Model (APM), or IAsyncResult pattern:
    • Available since FW 1.1
    • Requires at least two methods (prefixed with Begin and End)
  • Event-based Asynchronous Pattern (EAP): available since FW 2.0.
  • Task-based Asynchronous Pattern (TAP): introduced with FW 4.0 and enhanced with FW 4.5
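
To show how two of these patterns relate, here is a short sketch that reads from a stream once with APM and once with TAP. The class and method names (PatternBridgeDemo, ReadWithApm, ReadWithTapAsync) are my own; Stream.BeginRead/EndRead and TaskFactory.FromAsync are the framework pieces doing the work. EAP is left out to keep the example short.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class PatternBridgeDemo
{
    // APM style: an explicit Begin/End pair on the stream.
    public static string ReadWithApm(Stream stream, int maxBytes)
    {
        var buffer = new byte[maxBytes];
        IAsyncResult pending = stream.BeginRead(buffer, 0, buffer.Length, null, null);
        // ... the caller could do other work here ...
        int read = stream.EndRead(pending); // blocks until the read has finished
        return Encoding.UTF8.GetString(buffer, 0, read);
    }

    // TAP style: the same Begin/End pair wrapped into a Task by FromAsync.
    public static Task<string> ReadWithTapAsync(Stream stream, int maxBytes)
    {
        var buffer = new byte[maxBytes];
        return Task<int>.Factory
            .FromAsync(stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null)
            .ContinueWith(t => Encoding.UTF8.GetString(buffer, 0, t.Result));
    }
}
```

FromAsync is handy when an older API only offers Begin/End methods: the caller gets a Task and can combine it with other tasks instead of juggling IAsyncResult objects.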

APM can be implemented in two ways: with or without a callback. The sample call above (ParseToListTestAsyncWithAPM) is an example of APM without a callback. We can implement the same functionality with a callback, as seen below:

   1: /// <summary>
   2: /// Tests the method asynchronously with APM callback
   3: /// </summary>
   4: [TestMethod]
   5: public void ParseToListTestAsyncWithAPM_Callback()
   6: {
   7:     books1 = books2 = null;
   8:     int t1 = System.Threading.Thread.CurrentThread.ManagedThreadId;
   9:     int t2 = 0;
  10:  
  11:     DelProcessing del = new DelProcessing(ProcessXml.ParseToList);
  12:     IAsyncResult result = del.BeginInvoke(file2, (r) =>
  13:     {
  14:         books1 = del.EndInvoke(r);
  15:         t2 = System.Threading.Thread.CurrentThread.ManagedThreadId;
  16:     }, null);
  17:  
  18:     books2 = ProcessXml.ParseToList(file1);
  19:  
  20:     var list = MergeLists(books1, books2);
  21:  
  22:     Assert.IsNotNull(list);
  23:     Assert.IsTrue(t2 > 0);
  24: }

Let me explain this in a little more detail:

  • A lambda expression is used as the callback method.
  • ‘r’ represents the IAsyncResult.
  • To show which threads are used, t1 holds the id of the main thread and t2 the id of the second thread; in my run their values were 11 and 9 respectively.
  • Note that threads are scheduled by the OS, so you have no control over when they start and stop! For example, if the t1 thread executes line 23 (Assert.IsTrue(t2 > 0);) before the callback has completed, this test method fails (t2 is still 0). The same applies to line 20, where books1 may still be null. That means you need to pay attention to when and where you use async calls in your application.
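
One way to remove this timing dependency is to let the callback signal completion and make the calling thread wait for that signal before it reads the results. The sketch below shows the idea under my own assumptions: CallbackSignalDemo and RunAndWaitForCallback are hypothetical names, ThreadPool.QueueUserWorkItem stands in for the delegate's BeginInvoke, and a TaskCompletionSource carries the signal.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class CallbackSignalDemo
{
    // Returns the id of the thread that ran the callback,
    // or 0 if the callback did not finish within the timeout.
    public static int RunAndWaitForCallback(int timeoutMs)
    {
        var completion = new TaskCompletionSource<int>();

        // Stand-in for BeginInvoke with a callback: the lambda runs on a thread-pool thread
        // and publishes its result through the TaskCompletionSource.
        ThreadPool.QueueUserWorkItem(_ =>
            completion.SetResult(Thread.CurrentThread.ManagedThreadId));

        // ... the calling thread can do its own work here ...

        // Block until the callback has published its result (or the timeout expires),
        // so the value is never read before it has been written.
        bool finished = completion.Task.Wait(timeoutMs);
        return finished ? completion.Task.Result : 0;
    }
}
```

In the test above, the same idea would mean waiting for such a signal right before MergeLists, so that neither books1 nor t2 is read before the callback has finished.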

 

TAP is the simplest of the three and is the one Microsoft recommends. Here is the code implementing the same scenario with TAP:

/// <summary>
/// Tests the method asynchronously with TAP
/// </summary>
[TestMethod]
public void ParseToListTestAsyncWithTAP()
{
    int t1 = System.Threading.Thread.CurrentThread.ManagedThreadId;
    int t2 = 0;

    books1 = books2 = null;
    // Start parsing file2 on a thread-pool thread and keep the Task so we can wait on it
    Task task = Task.Factory.StartNew(() =>
    {
        books1 = ProcessXml.ParseToList(file2);
        t2 = System.Threading.Thread.CurrentThread.ManagedThreadId;
    });
    // Meanwhile, parse file1 on the current (main) thread
    books2 = ProcessXml.ParseToList(file1);

    // Block until the background task has finished; otherwise books1 and t2 may still be unset
    task.Wait();

    var list = MergeLists(books1, books2);

    Assert.IsNotNull(list);
    Assert.IsTrue(t2 > 0);
}
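
With FW 4.5, the same scenario can also be written with async/await. Here is a minimal sketch, assuming a stand-in parser: WhenAllDemo, FakeParse, and ParseBothAsync are hypothetical names, and FakeParse only imitates ProcessXml.ParseToList so that the example runs without XML files.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Stand-in for ProcessXml.ParseToList: returns one fake title per input name.
    private static List<string> FakeParse(string name)
    {
        return new List<string> { name + "-book" };
    }

    public static async Task<List<string>> ParseBothAsync(string file1, string file2)
    {
        // Start both parsing calls on thread-pool threads
        Task<List<string>> first = Task.Run(() => FakeParse(file1));
        Task<List<string>> second = Task.Run(() => FakeParse(file2));

        // Resume only when both have finished, without blocking the calling thread
        List<string>[] results = await Task.WhenAll(first, second);

        // Results come back in the order the tasks were passed in: file1 first, then file2
        return results[0].Concat(results[1]).ToList();
    }
}
```

Because the merge runs only after Task.WhenAll has completed, there is no window in which one of the lists could still be unset.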

 

TAP is hot :) and I hope to explain it in detail in my next post. For now, I would like to share the results of my efforts so far with you:

[Image: results so far]

Perhaps another post comparing sync vs async, or APM vs TAP, by running load tests would be good. We will see. This is a very lively field and there are many things to explore, aren’t there?

 

Conclusion

Wow, this has been my longest post :). I forgot how fast time passes here at Robert’s Coffee in Istanbul.

Well, in this post I have explained various aspects of asynchronous programming: what it means, how it differs from synchronous programming, and why and how to use it. Asynchronous programming can be implemented on both the client and the server side, and it provides scalability and performance advantages over synchronous programming. I would certainly recommend investing some time in it, since it is now simpler (TAP) and is becoming almost a must-have as applications integrate more and more with the cloud.

 

References