How to leverage functional techniques in C#

Published 03/11/2021 8:37 | Tech News

Key takeaways

  • Since they are stateless, pure functions offer measurable benefits when working with high-load applications and high data parallelism. These benefits include no side effects associated with the object’s state, no need for synchronization primitives, and no need for mocking libraries or frameworks when writing unit tests.
  • Pure functions are most valuable in multi-threaded scenarios that would otherwise involve mutable state. When multiple threads work on the same shared data, they can queue up and form a bottleneck while waiting for the required resource to become available. For common data-processing tasks, LINQ offers a way around the problems of mutable state.
  • Technically, LINQ is a collection of extension methods that can be chained because they all extend the IEnumerable<T> interface. Used with side-effect-free lambdas, this technology follows the main principles of functional programming.
  • If you plan to migrate to multithreading and improve thread safety, it makes sense to use the tools C# provides for that, such as PLINQ and pure functions.
  • C# 9.0 took another step toward thread safety by introducing records and init-only properties.

* * *

Over the past five years, humanity has produced more information than during the whole preceding history, and this fact perfectly illustrates the avalanche-like growth of loads today. 

Take the processor industry, which is no exception to this tendency. The industry long followed Moore’s law, which states that the number of transistors on a chip doubles about every two years. But at some point the limit came pretty close, and a new approach was needed to increase the performance of computer systems and applications. This is how multi-core processors came into play. Multi-core CPUs allowed “true” parallel computing, which forced developers to rethink the way they write code. This led to an increase in development and research related to parallel computing and its potential performance benefits.

However, the increase in performance usually comes with a price: race conditions, deadlocks, and possible data corruption. We identify this set of issues as problems related to the mutable state of shared resources. According to modern software engineering, one of the best ways to mitigate such problems is using functional programming. This programming paradigm does not rely on shared resources by default, and in purely functional programming, the developer isn’t allowed to use a mutable state. 

The most used programming paradigm nowadays is object-oriented programming (OOP), which allows the developer to structure concepts and functionality as classes and objects. There are many discussions related to the possibility of mixing it with the functional programming paradigm in order to get the best characteristics of both paradigms. One of the ways to do so involves writing OOP code that does not have a state and does not mutate the allocated resources. This is known as writing the business logic in “pure functions.”

What is a pure function?

A pure function is a function whose result depends only on its arguments and the business logic it describes. Since they are stateless, pure functions have measurable benefits when working with high-load applications and high data parallelism:  

  • no side effects (e.g., state mutation, I/O operations, raised exceptions) associated with the state of the object;
  • writing unit tests for pure functions does not require any libraries or frameworks to mock the dependencies or interactions, maximizing the testability level;
  • there is no need to use synchronization primitives such as Lock, Semaphore, Mutex, and others.
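
To make the contrast concrete, here is a minimal sketch. The `PriceCalculator` class and its members are hypothetical names chosen for illustration only. `ApplyDiscountImpure` writes to object state, while `ApplyDiscount` depends only on its arguments, so it can be checked with a plain assertion and called from any number of threads without a lock.

```csharp
using System;

public class PriceCalculator
{
    // Mutable object state: every caller sharing this instance can observe and change it.
    private decimal _lastPrice;

    // Impure: the result is also written into shared state (a side effect),
    // so concurrent callers on the same instance would need synchronization.
    public decimal ApplyDiscountImpure(decimal price, decimal percent)
    {
        _lastPrice = price - price * percent / 100m;
        return _lastPrice;
    }

    // Pure: the result depends only on the arguments; nothing outside is read or written.
    public static decimal ApplyDiscount(decimal price, decimal percent)
    {
        return price - price * percent / 100m;
    }
}
```

A unit test for the pure version needs no mocks: calling `PriceCalculator.ApplyDiscount(200m, 10m)` always yields the same value for the same arguments.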

In OOP code, pure functions matter most in multi-threaded scenarios that involve mutable state. In most of these cases, the functions are executed in concurrent threads. But when multiple threads are working on one iteration over the same data, there is a risk that one thread prevents the others from working properly because it changes the state of the resource shared across all threads. To avoid that, the threads queue up, waiting until the required resource (the data) is available, thus forming a bottleneck.

To provide correct and controlled access to the shared resources, the part of the code that touches the resource is normally restricted so that only one thread can execute it at a time. This is known as the “locked” part. As a result, the “locked” part of the application is effectively single-threaded and can become very slow. For the most common operations on data (such as filtering, mapping, grouping, and aggregating), one solution to the problems of a mutable state is LINQ.

What makes LINQ a functional approach in the OOP paradigm

First and foremost, C# is a multi-paradigm programming language, meaning it enables developers to mix multiple styles and apply different paradigms to solve problems in the most efficient way. There are many functional programming principles that can be used in C#. One example is LINQ, a technology introduced in 2007 that implements a declarative approach to querying data, including collections and databases. Although it cannot be considered pure functional programming, it can be used as such as long as we follow the rules on which the functional paradigm is based.

The functional paradigm stands on four main principles: first-class and higher-order functions, pure functions, referential transparency, and immutability.

The first principle concerns first-class and higher-order functions: a higher-order function can either take other functions as arguments or return them as results. This is exactly what LINQ does. Let’s check some well-known functions from the Map/Reduce pattern of functional programming. In this pattern, the “Map” part should be implemented with a method that performs filtering and sorting. LINQ already contains a collection of methods that deliver this functionality. Filtering is implemented as follows:

Where<TSource>(IEnumerable<TSource>, Func<TSource,Boolean>)

Sorting is implemented by: 

OrderBy<TSource,TKey>(IEnumerable<TSource>, Func<TSource,TKey>)
OrderByDescending<TSource,TKey>(IEnumerable<TSource>, Func<TSource,TKey>) 

The Where method filters the initial collection using a predicate that describes the filtering rules for the TSource object. The predicate must be a method that returns a Boolean. The initial collection should implement the IEnumerable<T> interface, which is implemented by almost all collections in .NET. All elements of the initial collection that satisfy the condition are passed on to a new sequence without any changes, and the initial collection itself is left untouched.

Similarly, the OrderBy method is used to order elements according to a key returned by the key selector function.
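
As an illustration of the “Map” part, here is a minimal sketch. The `Employee` record and the `MapStep.FilterAndSort` helper are hypothetical names that do not come from any library; only `Where`, `OrderByDescending`, and `ToList` are real LINQ methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data type used only for this illustration.
public record Employee(string Name, int DepartmentId, decimal Salary);

public static class MapStep
{
    // Filters employees by department and sorts them by salary, highest first.
    // The source sequence is not modified; a new list is returned.
    public static List<Employee> FilterAndSort(IEnumerable<Employee> employees, int departmentId)
    {
        return employees
            .Where(e => e.DepartmentId == departmentId)   // predicate: Func<Employee, bool>
            .OrderByDescending(e => e.Salary)             // key selector: Func<Employee, decimal>
            .ToList();
    }
}
```

Both lambdas are passed as arguments, which is what makes `Where` and `OrderByDescending` higher-order functions.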

The “Reduce” part of the Map/Reduce pattern can be implemented by any method that aggregates the previously filtered data. LINQ has a number of different aggregation methods, as well as a projection method that is often used together with them:

Aggregate<TSource,TAccumulate,TResult>(IEnumerable<TSource>, TAccumulate, Func<TAccumulate,TSource,TAccumulate>, Func<TAccumulate,TResult>)

Average(IEnumerable<Single>)

Count<TSource>(IEnumerable<TSource>)

Select<TSource,TResult>(IEnumerable<TSource>, Func<TSource,Int32,TResult>)

Observe that the Select method can be called over a filtered collection implementing the IEnumerable<T> interface. The second parameter of the function is a generic delegate, which in practice is a function that will be executed for each element while the Select is evaluated. Select converts the initial collection to a new collection of elements that can differ from the initial ones. Strictly speaking, it is a projection rather than an aggregation, but it is commonly applied to the result of another query to shape the data before it is aggregated.
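
The “Reduce” part can be sketched in the same way. In the sketch below, the `Payment` record and the `ReduceStep` class are hypothetical names used for illustration; `Aggregate`, `Count`, `Select`, and `Average` are the LINQ methods listed above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data type used only for this illustration.
public record Payment(int DepartmentId, decimal Amount);

public static class ReduceStep
{
    // Folds the filtered payments into one total, starting from a seed of 0.
    public static decimal Total(IEnumerable<Payment> payments, int departmentId) =>
        payments.Where(p => p.DepartmentId == departmentId)
                .Aggregate(0m, (sum, p) => sum + p.Amount);

    // Counts the payments that belong to the department.
    public static int CountFor(IEnumerable<Payment> payments, int departmentId) =>
        payments.Count(p => p.DepartmentId == departmentId);

    // Projects each payment to its amount, then averages the amounts.
    // Average throws InvalidOperationException when the filtered sequence is empty.
    public static decimal AverageFor(IEnumerable<Payment> payments, int departmentId) =>
        payments.Where(p => p.DepartmentId == departmentId)
                .Select(p => p.Amount)
                .Average();
}
```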

The second and arguably the most important principle of functional programming is the usage of pure functions. LINQ operators are pure by design: their output depends only on the input sequence and the expressions provided by the programmer, as long as those expressions have no side effects themselves. LINQ is typically used with lambda expressions, which can capture variables from the lexical context in which they were declared. Here is an example of a LINQ query using lambda expressions:

int departmentId = 32;

// Sum, not Count: Count expects a Boolean predicate and would not add up the salaries.
var sumSpentForSalaries = initialCollection.Where(x => x.Id == departmentId).Sum(x => x.Salary);

In this example, we have an initial collection implementing the IEnumerable<T> interface. Let’s assume this is a List<TItem>. To calculate the amount of money spent on salaries, we need to filter (Where) the exact department of interest and then aggregate the results. But how does the query know the value of the ID? The lambda expression captures it from the lexical environment in which it was created (the departmentId variable) and uses it as a filtering parameter. The filtering logic does not depend on anything from outside the function except for the input parameter used, and its output is deterministic, which fits the definition of a pure function.

The third principle, referential transparency, is the idea that an expression in a program may be replaced by its corresponding value without changing the result of the program. This implies that methods should always return the same value for a given argument without having any other effect. The example from the second principle shows that the output depends on two things: the initial collection and the ID provided. Let’s rewrite the code to show that:

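// Map step: lazily yields every item whose Id matches the given id.
IEnumerable<TItem> WhereFunction(IEnumerable<TItem> initialCollection, int id)
{
    foreach (var item in initialCollection)
    {
        if (item.Id == id)
            yield return item;
    }
}

// Reduce step: adds up the salaries (assuming Salary is a decimal).
decimal SumFunction(IEnumerable<TItem> initialCollection)
{
    decimal aggregate = 0;
    foreach (var item in initialCollection)
    {
        aggregate += item.Salary;
    }
    return aggregate;
}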

Regardless of which class these methods belong to and what state that class has, the only thing these functions use is what is provided in their input parameters.

The last principle, immutability, states that nothing should change its state after it has been initialized. LINQ is completely fine with this: every time it performs Map/Reduce logic, it creates a new IEnumerable<T> sequence or a new aggregate value instead of changing the source.

To sum up, LINQ is a clear example of the functional approach to implementing business logic in C#.

The next step is to look at the special techniques you should consider when working with pure functions.

Techniques for working with pure functions

To execute task parallelism, the data should not be mutable. Pure functions are used to prevent data mutability, so let’s learn more about some techniques that help work with pure functions. 

Pure function compositions

Function composition is the technique of using the result of one function as the input for another function. For example, the composition of the functions G and F produces the function h, which means applying the function G to the result of the function F: h(x) = G(F(x)).

C# uses the concept of delegates, which are types that refer to methods with a particular parameter list and return type. Delegates are used to pass methods as arguments to other methods, which is what makes general function composition possible in C#.
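
A minimal sketch of general composition built on the `Func` delegate type follows. The `Compose` extension method is a hypothetical helper written for this illustration; it is not part of the .NET base class library.

```csharp
using System;

public static class FunctionExtensions
{
    // Returns a new function h such that h(x) = g(f(x)).
    // Neither f nor g is modified; composition only produces a new delegate.
    public static Func<TIn, TOut> Compose<TIn, TMid, TOut>(
        this Func<TIn, TMid> f,
        Func<TMid, TOut> g)
    {
        return x => g(f(x));
    }
}
```

For example, given `Func<int, int> addOne = x => x + 1;` and `Func<int, string> describe = x => $"value {x}";`, the expression `addOne.Compose(describe)` produces a function that first adds one and then formats the result.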

Let’s start with an example of composition based on the OOP paradigm and then modify it to use a functional approach. Suppose we have a warehouse where robots must register in the warehouse net and then compile the order. From a code perspective, these are two methods: one registers the robot and the other compiles the order.

The two methods can be composed by passing the result of the first directly into the second.

However, the more operations the robot must execute before compiling the order, the longer this chain of nested calls will be. To avoid that, the steps can be combined into a single composed function.

In this case, a functional composition approach improves code readability.
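
The warehouse scenario can be sketched as follows. All names here (`Robot`, `Warehouse`, `Register`, `CompileOrder`, `Pipeline`) are assumptions made for illustration and are not taken from an existing API. Each step is a pure function from one robot value to a new one.

```csharp
using System;
using System.Linq;

// Hypothetical immutable robot state.
public record Robot(string Id, bool Registered = false, bool OrderCompiled = false);

public static class Warehouse
{
    // Each step returns a new Robot value instead of mutating its argument.
    public static Robot Register(Robot robot) => robot with { Registered = true };

    public static Robot CompileOrder(Robot robot) =>
        robot.Registered
            ? robot with { OrderCompiled = true }
            : throw new InvalidOperationException("Robot must be registered first.");

    // Combines any number of steps into one function, applied left to right.
    public static Func<Robot, Robot> Pipeline(params Func<Robot, Robot>[] steps) =>
        robot => steps.Aggregate(robot, (current, step) => step(current));
}
```

The nested form is `Warehouse.CompileOrder(Warehouse.Register(robot))`. The composed form is `Warehouse.Pipeline(Warehouse.Register, Warehouse.CompileOrder)(robot)`, which stays flat no matter how many steps are added.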

Caching

This technique works best for pure functions with a predictably low number of input argument combinations (usually fewer than 1,000). In this case, you can create a unique key consisting of a combination of all input parameters (usually fewer than four) and, after the function has been evaluated, save the result in memory or even in external data storage such as Redis Cache.

It works well when the result is not very big and does not depend on a database or some external resource that changes often. This method is typically used for long-running calculations in computer games or stock analysis. On repeated calls, the result is returned immediately from the cache, bypassing the calculation, which greatly speeds up the application.
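
An in-memory version of this technique can be sketched as below. The `Memoizer.Memoize` helper is a hypothetical name; the sketch assumes the wrapped function is pure and uses `ConcurrentDictionary` from the base class library as the cache. Several input parameters can be combined into one key with a value tuple.

```csharp
using System;
using System.Collections.Concurrent;

public static class Memoizer
{
    // Wraps a pure function so that each distinct argument is evaluated once
    // and later calls return the cached result.
    // Under contention GetOrAdd may invoke the function more than once for the
    // same key; that is harmless only because the function is assumed to be pure.
    public static Func<TArg, TResult> Memoize<TArg, TResult>(Func<TArg, TResult> pureFunction)
        where TArg : notnull
    {
        var cache = new ConcurrentDictionary<TArg, TResult>();
        return arg => cache.GetOrAdd(arg, pureFunction);
    }
}
```

A two-parameter function can be cached by using the tuple `(int, int)` as `TArg`, for example `Memoizer.Memoize<(int, int), long>(key => (long)key.Item1 * key.Item2)`. This sketch never evicts entries, which is why it only suits a small, predictable set of inputs.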

Atomic pattern

Atomicity is what CAS (compare-and-swap) operations provide: they alter state in a single step, so the result is perceived only as completed or not completed. Threads running in parallel can see only the old or the new state, so while an atomic operation is executing, other threads cannot observe its changes until it completes. This property is also what makes the ABA problem possible.

When a CAS-based update is performed, the memory cell is effectively read twice: once to get the current value and once during the compare step. If both reads return the same value, the cell is treated as unchanged. But another thread can run between these two reads, change the value, and then change it back, so the first thread can be fooled into believing that nothing has changed.

To avoid the ABA problem, the state that needs to change should be kept in a separate, isolated object, and every change should produce a new object rather than modify the existing one. An isolated object of this kind has no state that can change underneath a reader, so there is nothing to synchronize: reading it is thread-safe because it is read-only.
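
One way to read the paragraph above in C# is to keep the state in an immutable object and swap the reference to it with `Interlocked.CompareExchange`. The following is a minimal sketch under that assumption; `Account` and `Balance` are hypothetical names.

```csharp
using System;
using System.Threading;

// Immutable snapshot of the state: it never changes after construction.
public sealed class Balance
{
    public decimal Amount { get; }
    public Balance(decimal amount) { Amount = amount; }
}

public sealed class Account
{
    private Balance _state = new Balance(0m);

    // Readers get a snapshot that can never change underneath them.
    public decimal Current => Volatile.Read(ref _state).Amount;

    // Lock-free update: build a new snapshot and publish it only if no other
    // thread replaced the reference in the meantime; otherwise retry.
    public void Deposit(decimal amount)
    {
        while (true)
        {
            Balance observed = Volatile.Read(ref _state);
            Balance updated = new Balance(observed.Amount + amount);
            Balance previous = Interlocked.CompareExchange(ref _state, updated, observed);
            if (ReferenceEquals(previous, observed))
                return;
        }
    }
}
```

Because every update publishes a freshly allocated object, the comparison is made on object identity rather than on a raw value that could be changed and then restored.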

Pure functions are not always pure

Junior and middle-level developers often face difficulties when applying pure functions. A function whose result depends only on its input arguments and that does not reference any object state is generally assumed to be thread-safe. To make it clearer, let’s look at an example where a seemingly pure function hides a trap when run in multithreaded mode.

public class TrickyClass
{
    public int SumUpIntegers(int[] data)
    {
        // Local accumulator: visible only inside this method.
        int sum = 0;

        for (int i = 0; i < data.Length; i++)
        {
            sum += data[i];
        }

        return sum;
    }
}

In this example, the sum variable is used as an accumulator whose value is updated on every iteration. However, in the multithreaded scenarios common today, the following line can produce an inconsistent result:

sum += data[i]

The array is passed by reference, so in a multithreaded program another thread can change its elements while it is being traversed. In that case, the code cannot guarantee a consistent result.

It is important to note here that not all state mutations are evil. A mutation is harmless if it is visible only within the function.

Even though the value of sum is updated, this change is not noticeable outside the function’s scope. It is exactly this fact that allows us to consider such an implementation of the sum a pure function, provided that nobody mutates the input array during the call.
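
One way to rule out the trap with the input array is to accept input that cannot be mutated. The sketch below assumes the caller can supply an `ImmutableArray<int>` from `System.Collections.Immutable`; the `SafeSum` class name is hypothetical.

```csharp
using System.Collections.Immutable;

public static class SafeSum
{
    // The input cannot be changed by another thread during the traversal,
    // and the only mutation (the local accumulator) is invisible to callers.
    public static int SumUpIntegers(ImmutableArray<int> data)
    {
        int sum = 0;
        foreach (int value in data)
        {
            sum += value;
        }
        return sum;
    }
}
```

A call looks like `SafeSum.SumUpIntegers(ImmutableArray.Create(1, 2, 3))`.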

Available toolset for functional programming with C#

As a general rule, developers do not really think about parallelization when writing code. For this reason, they often use thread-unsafe techniques, which increases the risk of data corruption when the existing code is later parallelized.

However, Microsoft and the community have steadily added approaches, tools, and implementations to C# that facilitate the writing of thread-safe code. Whether you plan to migrate your code to a multithreading architecture or just want to improve thread safety, we recommend the following techniques and tools:

  1. PLINQ

PLINQ is a parallel implementation of the LINQ pattern, specifically designed to deal with data parallelism. It is ideal for parallelism because it takes care of all the requirements, like breaking up sequences into smaller pieces and applying logic to every element in the sequence.

By using these techniques in multithreaded applications, you can increase thread safety and noticeably improve the performance of multithreaded processing. Note that if the query does not perform many computations or has a small data source, a PLINQ query may be slower than the equivalent sequential LINQ to Objects query.

Let’s say we have the following scenario: we want automated reports over file storage in order to identify non-standard situations. In particular, we want to find the ten most common situations so we can address them. This scenario is tailor-made for PLINQ, as it fulfills both the task and data parallelism requirements. The information we want can be acquired with a single PLINQ query that reads every report file and counts the situations found in it.

Such a query is easy to read and understand. Thanks to PLINQ, it is able to process the files in parallel, which directly affects performance. However, it has a hidden threat inside, which is hard to notice and which might corrupt all your data.
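
A minimal sketch of a query of this kind follows. The file layout is an assumption made for illustration: every `*.log` file in the folder is taken to hold one situation code per line. `ReportAnalyzer` and `TopSituations` are hypothetical names. Note that the files are read inside the parallel query.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ReportAnalyzer
{
    // Returns the most frequent situation codes together with their counts.
    public static List<KeyValuePair<string, int>> TopSituations(string folder, int take = 10)
    {
        return Directory.EnumerateFiles(folder, "*.log")
            .AsParallel()
            .SelectMany(path => File.ReadLines(path))      // file I/O happens on worker threads
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .GroupBy(line => line.Trim())
            .Select(group => new KeyValuePair<string, int>(group.Key, group.Count()))
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal) // stable order for equal counts
            .Take(take)
            .ToList();
    }
}
```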

The method reads files from the filesystem, and any communication with resources such as the filesystem, the current date and time, random numbers, or output to a file or console inevitably involves side effects. Such functions do not depend only on their input but on many other things. The clearest example is getting the current date and time 100 times in parallel: we cannot predict which thread gets which time. For that reason, we must split the method above and convert part of it into a pure function.

  2. Pure functions

At the risk of being repetitive, we’ll emphasize once again: pure functions are irreplaceable for successful work with multithreaded applications, since they allow you to write tests much faster, avoid synchronization primitives, and keep the side effects associated with a function’s result to a minimum. Think about how to design your code to make the most of pure functions, and a multithreaded utopia will be closer than ever.

Besides, if the problem concerns data that needs to be processed in parallel, use LINQ or PLINQ with pure functions, which helps preserve the integrity of your data.

Let’s get back to the example we had in item 1, where we were reading from the error report files. It was not thread-safe. The fix is to split the method into two.

The first method runs sequentially, so all interaction with the filesystem is confined to a single thread. The second one is a pure function, whose results depend on nothing but the content argument. So, now we have a completely thread-safe task.
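
The split can be sketched as follows, under the same assumption as before that each report holds one situation code per line. `ReportReader` and `ReportStatistics` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ReportReader
{
    // Step 1: sequential and impure. All file I/O happens here, on one thread.
    public static List<string> ReadReports(string folder)
    {
        return Directory.EnumerateFiles(folder, "*.log")
            .Select(path => File.ReadAllText(path))
            .ToList();
    }
}

public static class ReportStatistics
{
    // Step 2: pure. The result depends only on the contents passed in,
    // so it is safe to run the query in parallel.
    public static List<KeyValuePair<string, int>> TopSituations(IEnumerable<string> contents, int take = 10)
    {
        return contents
            .AsParallel()
            .SelectMany(content => content.Split('\n'))
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .GroupBy(line => line.Trim())
            .Select(group => new KeyValuePair<string, int>(group.Key, group.Count()))
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)
            .Take(take)
            .ToList();
    }
}
```

A call then reads `ReportStatistics.TopSituations(ReportReader.ReadReports(folder))`. The pure step can be unit-tested with plain strings and no filesystem at all.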

C# 9.0 features

Nothing stands still in software programming, and C# 9.0 took another step toward thread safety by introducing records and init-only properties.

Most developers have noticed that records are convenient to use and are a perfect fit from a Domain-driven design perspective for value objects, where the equality of two objects is determined by their values rather than by unique identifiers or references. But here, let’s dive a bit deeper:

Records can be declared in two ways, known as positional syntax and nominal, or traditional, syntax (with explicitly declared properties). The positional syntax corresponds to the case where you declare the record in one line as

   record Coordinate(double x, double y);

Thus, when you try to change the initial state of an instance of the record like in the following example:

static void Main(string[] args)
{
    var coord = new Coordinate(31.23423, 27.12324);

    // Does not compile: x is an init-only property.
    coord.x = 33.12;

    Console.WriteLine($"I'm {coord.x} {coord.y}");
}

the compiler will immediately complain about the mutation “Init-only property or indexer ‘Coordinate.x’ can only be assigned in an object initializer, or on ‘this’ or ‘base’ in an instance constructor, or an ‘init’ accessor.” In this approach, you get an immutable object out of the box and do not need to worry.

But when it comes to a traditional way of initializing the records (via constructor):


public record Coordinate
{
    // double rather than int, to match the values assigned below
    public double X { get; set; }

    public double Y { get; set; }
}

then 

static void Main(string[] args)
{
    var coord = new Coordinate
    {
        X = 31.1534,
        Y = 17.1325
    };

    Console.WriteLine($"{coord.X} {coord.Y}");

    coord.X = 21;

    Console.WriteLine($"{coord.X} {coord.Y}");
}

The compiler accepts this code. In this case, the record is not immutable by default, but we can adjust it to be immutable in the same way as a record declared with positional syntax. For that, we use init-only properties:

public record Coordinate
{
    public double X { get; init; }

    public double Y { get; init; }
}

And then, when you try to build the code from the Main method, the compiler will reject the assignment to coord.X made after initialization, complaining, “‘Coordinate.X’ can only be assigned in an object initializer, or on ‘this’ or ‘base’ in an instance constructor or an ‘init’ accessor”.

Both positional and nominal syntax have their place. We recommend positional syntax when you want the entire record to be immutable and nominal syntax when you want to be more precise about which properties are immutable and which are not. For example, in this simple case, we could keep X immutable but not Y:

public record Coordinate
{
    // X can be set only during initialization.
    public double X { get; init; }

    // Y stays mutable.
    public double Y { get; set; }
}
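
Because an immutable record cannot be changed in place, a new value is derived from an existing one instead. The following minimal sketch uses the C# 9 `with` expression for that. `Point` and `RecordDemo.MoveRight` are hypothetical names chosen so that they do not clash with the Coordinate examples above.

```csharp
using System;

// Positional record: X and Y are init-only properties.
public record Point(double X, double Y);

public static class RecordDemo
{
    // Pure function: returns a copy with a new X and leaves the argument untouched.
    public static Point MoveRight(Point point, double dx) => point with { X = point.X + dx };
}
```

Records also compare by value, so two `Point` instances with the same X and Y are equal even though they are different objects.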

To summarize, to work with multithreaded applications successfully, you need to master the use of pure functions. With advantages such as the absence of side effects on object state and no need for synchronization primitives, pure functions go a long way toward solving the mutable state problem. For common data-processing tasks, that problem can be dealt with using the LINQ technology discussed above.

Author Bio

Maxim Ivanov, CEO & Co-Founder of Aimprosoft, a Ukrainian software development company with a solid 15 years of market presence that expands the boundaries of the technological world by applying more than 50 advanced technologies to its web and mobile software solutions. In his role, Maxim leads the company helping businesses in retail, eCommerce, automotive, telecom, healthcare, real estate, education, manufacturing, and other domains to succeed in attaining their business goals due to innovative software solutions.

Having a Master’s Degree in Computer Science and strategic vision, he helps SMB and enterprises profit faster based on his belief that technology is the driving force that takes business to a whole new growth level. Maxim is trying to be helpful to the community by sharing valuable industry insights within the media space.
