Speeding up probing of an array with Swift async tasks

The Swift code below goes through an array of words (Strings) and counts the unique words and their frequencies, ignoring words found in wordsToFilter (another array of Strings). The resulting dictionary (a map data structure consisting of word/count pairs) is then sorted by word frequency in descending order. Finally, the 100 most frequent words are printed out.

      var words = [String]()
      var wordsToFilter = [String]()
...
      var counter = 1
      words.filter { word in
         word.count >= 2 && !wordsToFilter.contains(word)
      }.reduce(into: [:]) { counts, word in
         counts[word, default: 0] += 1
      }.sorted(by: { lhs, rhs in
         lhs.value > rhs.value
      }).prefix(topListSize).forEach { key, value in
         print("\(String(counter).rightJustified(width: 3)). \(key.leftJustified(width: 20, fillChar: ".")) \(value)")
         counter += 1
      }
Functional programming approach to count the frequency of unique words in a text file.
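
For experimenting with the same pipeline outside the project, here is a minimal self-contained sketch. The function name topWords and the sample data are my own assumptions, not part of the original code, and the custom justification helpers are left out. It also copies wordsToFilter into a Set, since contains on an array is a linear search.

```swift
// Hypothetical helper (not in the original project): returns the `limit`
// most frequent words as (word, count) pairs.
func topWords(in words: [String], ignoring wordsToFilter: [String], limit: Int) -> [(word: String, count: Int)] {
    // A Set gives constant-time lookups instead of scanning the array each time.
    let ignored = Set(wordsToFilter)
    return words
        .filter { $0.count >= 2 && !ignored.contains($0) }
        .reduce(into: [String: Int]()) { counts, word in counts[word, default: 0] += 1 }
        .sorted { $0.value > $1.value }
        .prefix(limit)
        .map { (word: $0.key, count: $0.value) }
}

// Made-up sample data, just to exercise the function.
let sample = ["swift", "is", "fast", "and", "swift", "is", "fun", "a"]
for (index, entry) in topWords(in: sample, ignoring: ["and"], limit: 2).enumerated() {
    print("\(index + 1). \(entry.word) \(entry.count)")
}
```

Words with equal counts may come out in any order, since a dictionary has no defined ordering.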

With my test book file of size 17.1 MB, containing 2 378 668 words of which 97 115 are unique, the code above takes 1.226099 secs to process the file. The time includes reading the text files and splitting the words into the arrays. For details of the measurements, see the end of this post.

Could it be faster if using the Swift async tasks? Let’s try and see!

Below is the code doing the same in eight async tasks. The code for printing out the result is omitted here and shown later.

Async tasks counting word frequencies.

In the code, the slice size is first calculated at line 66. For example, if the array has 1000 words, it is divided into eight slices, each containing 125 words. Then, in a for loop, eight async tasks are added to a task group (lines 79-85). Each async task calculates the word frequencies of its own slice of the array and returns a dictionary to the task group. The dictionary contains the word / frequency count pairs for that slice of the array.

No thread locking for data synchronisation is needed, since all the concurrent tasks only read from the array, and each of them reads only from its own slice.

In lines 88-96, the task group waits for the tasks to finish. As each task finishes, the task group merges the dictionary of partial results provided by the task into the task group’s dictionary wordCounts. This happens in a single thread, so no data corruption can occur. The async tasks themselves never write to the final dictionary that holds all the word / frequency pairs.
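
The steps described above could be sketched roughly as follows. This is not the original listing but a minimal version under my own assumptions: the function name countWords and the taskCount parameter are hypothetical, and the slice size is rounded up so that the last slice picks up any leftover words.

```swift
// Hypothetical sketch: counts word frequencies using up to `taskCount` child tasks.
func countWords(in words: [String], ignoring wordsToFilter: [String], taskCount: Int = 8) async -> [String: Int] {
    guard !words.isEmpty, taskCount > 0 else { return [:] }
    let ignored = Set(wordsToFilter)
    // Round up, e.g. 1000 words and 8 tasks give slices of 125 words.
    let sliceSize = (words.count + taskCount - 1) / taskCount

    return await withTaskGroup(of: [String: Int].self) { group in
        for start in stride(from: 0, to: words.count, by: sliceSize) {
            // Each child task only reads from its own slice of the array.
            let slice = words[start..<min(start + sliceSize, words.count)]
            group.addTask {
                slice
                    .filter { $0.count >= 2 && !ignored.contains($0) }
                    .reduce(into: [String: Int]()) { counts, word in counts[word, default: 0] += 1 }
            }
        }
        // Partial results are merged one at a time as the child tasks finish,
        // so only this code ever writes to wordCounts.
        var wordCounts = [String: Int]()
        for await partial in group {
            wordCounts.merge(partial, uniquingKeyWith: +)
        }
        return wordCounts
    }
}
```

The function has to be called from an async context, for example `let counts = await countWords(in: words, ignoring: wordsToFilter)`.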

Finally the result is sorted and printed out from the wordCounts dictionary, after the task group has merged the results from the tasks:

            // Now subtasks have finished and results from those have been combined to wordCounts.
            // Sort the combined dictionary by the word count (value of the map).
            var counter = 1
            wordCounts.sorted(by: { lhs, rhs in
               lhs.value > rhs.value
            }).prefix(topListSize).forEach { key, value in
               print("\(String(counter).rightJustified(width: 3)). \(key.leftJustified(width: 20, fillChar: ".")) \(value)")
               counter += 1
            }
            // Signal that the async tasks are finished.
            taskSemaphore.signal()
         }
      }
      // Waiting for the async tasks to finish.
      taskSemaphore.wait()
Printing out the results while the main thread waits for the async tasks to finish.

Why the semaphore? This is a console app, and the main thread would continue to the end after the async tasks were launched. What would happen then? The main thread would run past the end of the function, return to the main function, and finish and quit the process while the async tasks are still executing. Not good.

So to avoid that, 1) the main thread stops to wait for the semaphore, and 2) the task group uses the same semaphore to signal when it has finished working. The main thread then proceeds to finish.
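
The wait and signal pattern in isolation might look like the sketch below. The helper name runBlocking is my own assumption, and I use Task.detached here so that the task never needs the thread that is blocked on the semaphore. The original code may launch its task differently.

```swift
import Foundation

// Hypothetical helper: runs `work` in a detached task and blocks the
// calling thread until the work has finished.
func runBlocking(_ work: @escaping @Sendable () async -> Void) {
    let taskSemaphore = DispatchSemaphore(value: 0)
    Task.detached {
        await work()
        // Signal that the async work is finished.
        taskSemaphore.signal()
    }
    // The calling thread stops here until signal() has been called.
    taskSemaphore.wait()
}

runBlocking {
    // Stand-in for the word counting work.
    let sum = (1...100).reduce(0, +)
    print("Async work finished, sum is \(sum)")
}
print("Main thread continues and the process can exit.")
```

Without the call to wait(), the last print could run, and the process could exit, before the task has had a chance to finish.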

So, is this any faster? Any use at all in having this more complicated code?

Executing with the same files as above, the execution now takes 0.694983 secs. That is 57% of the original execution time of the single threaded implementation, or about 1.76 times faster!

Though the absolute times and time differences are not large, the relative difference is very significant. Consider data sizes hundreds or thousands of times larger than this test file, or a process that is run repeatedly over thousands of files, continuously. Then the difference would also be significant in absolute time, not only relatively, even if the individual files were smaller.

When you take a look at the Time Profiler view in Xcode Instruments, you can easily see where the speed difference comes from:

Xcode Instruments showing eight parallel threads working on the task.

As you can see, all the work that was earlier done in sequence is now executed in parallel, asynchronously.

So the answer to the question “Could it be faster if using the Swift async tasks?” is: yes, absolutely.

The measurements were taken on an Apple Mac Mini M1 (Apple Silicon) with 16GB of RAM and 1 TB of SSD storage.

Why slice the array into eight? The M1 processor has eight cores, and each one is put to work. As you can see, the OS and other processes also need the cores, so they are not running this process’ threads at 100% all the time.
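
If you would rather not hard-code the number eight, one option is to ask the system for the core count. This is a small sketch and not what the original code does. The helper sliceSize(forWordCount:sliceCount:) is a hypothetical name, while ProcessInfo.processInfo.activeProcessorCount is the Foundation property that reports the number of available cores.

```swift
import Foundation

// Number of slices to use, based on the cores available to the process.
// max(1, ...) guards against ever dividing by zero below.
let sliceCount = max(1, ProcessInfo.processInfo.activeProcessorCount)

// Hypothetical helper: slice size rounded up so that no words are left over.
func sliceSize(forWordCount wordCount: Int, sliceCount: Int) -> Int {
    (wordCount + sliceCount - 1) / sliceCount
}

let size = sliceSize(forWordCount: 2_378_668, sliceCount: sliceCount)
print("Using \(sliceCount) slices of up to \(size) words each")
```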

The code can be found in my BooksAndWords repository at GitHub. The single threaded implementation is in the Functional directory, and the async one is in the FunctionalParallel directory.

Swift tree structures: value types, enums and classes

What if you need to define a tree-like structure of data elements in Swift? For example, you might use a Binary Search Tree to keep track of unique words in a book file:

A tree structure where each node in the tree has optional two child nodes, left and right. Each node has two values: the word found in the book, and the count how many times the word appeared in the book.
An example of a binary search tree with unique word counts from a text file.

Since value types are often preferred in Swift, you could use a struct. The Node struct contains the word, the count of it in the book, a key to manage the tree, and optional left and right child nodes of the tree.

struct Node {
   let key: Int
   let word: String
   var count: Int

   var leftChild: Node?
   var rightChild: Node?

   init(_ word: String) {
      key = word.hashValue
      self.word = word
      count = 1
   }
}

But this does not compile. The compiler reports the error “Value type ‘Node’ cannot have a stored property that recursively contains it”: a struct cannot store a value of its own type. A Node struct in the tree cannot contain the left and right child nodes when they are value types.

What to do? You have (at least) two options:

  1. Use the enum type with associated values.
  2. Use classes.

With Swift enums, you can define two cases for the enumeration. Either a) the node in the tree is Empty (there is no node), or b) it is a Node with associated values: the word, the word count, the key (the word’s hash value) used to arrange the nodes in the tree, and the left and right subtrees, either of which may be Empty:

indirect enum EnumTreeNode {
   case Empty
   case Node(left: EnumTreeNode, hash: Int, word: String, count: Int, right: EnumTreeNode)

   init() {
      self = .Empty
   }

   init(_ word: String) {
      self = .Node(left: .Empty, hash: word.hashValue, word: word, count: 1, right: .Empty)
   }

   func accept(_ visitor: Visitor) throws {
      try visitor.visit(node: self)
   }
}
A tree node as an enumeration with associated values.

When defining recursive enumerations, you must use the indirect keyword to indicate recursion in the enumeration.
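
To give an idea of how such an enum can be used, here is a minimal sketch of inserting into and querying an enum based tree. It is not the implementation from the repository: the type WordTree and the methods inserting and occurrences(of:) are hypothetical names, and for simplicity the nodes are ordered by the word itself instead of its hash value.

```swift
// Hypothetical, simplified enum tree ordered by the word string.
indirect enum WordTree {
    case empty
    case node(left: WordTree, word: String, count: Int, right: WordTree)

    // Returns a new tree where `newWord` is added or its count is incremented.
    func inserting(_ newWord: String) -> WordTree {
        switch self {
        case .empty:
            return .node(left: .empty, word: newWord, count: 1, right: .empty)
        case let .node(left, word, count, right):
            if newWord < word {
                return .node(left: left.inserting(newWord), word: word, count: count, right: right)
            } else if newWord > word {
                return .node(left: left, word: word, count: count, right: right.inserting(newWord))
            } else {
                return .node(left: left, word: word, count: count + 1, right: right)
            }
        }
    }

    // Returns how many times `target` has been inserted, or 0 if it is not in the tree.
    func occurrences(of target: String) -> Int {
        switch self {
        case .empty:
            return 0
        case let .node(left, word, count, right):
            if target < word { return left.occurrences(of: target) }
            if target > word { return right.occurrences(of: target) }
            return count
        }
    }
}

var tree = WordTree.empty
for word in ["swift", "enum", "swift", "class"] {
    tree = tree.inserting(word)
}
print(tree.occurrences(of: "swift"))   // prints 2
```

Note that in this sketch every insertion, even one that only increments a count, creates new node values along the path from the root to the changed node.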

The other option is to use classes, which are reference type elements in Swift:

final class TreeNode {
   let key: Int
   let word: String
   var count: Int

   var left: TreeNode?
   var right: TreeNode?

   init(_ word: String) {
      self.key = word.hashValue
      self.word = word
      count = 1
   }
}

You can read more about Swift structs and classes here if you are unfamiliar with them.

Check out the full implementation of both the class based and enum based solutions in this GitHub repository.

So, are there any differences between the enum and class implementations other than the differences in code?

Let’s find out. The first run below uses the enum implementation, and the second one uses the class based implementation.

> swift build -c release
> .build/release/bstree ../samples/tiny.txt ../samples/ignore-words.txt 100
...
Count of words: 44, count of unique words: 32
>>>> Time 0.0008840560913085938 secs.


> swift build -c release
> .build/release/bstree ../samples/tiny.txt ../samples/ignore-words.txt 100
...
Count of words: 44, count of unique words: 32
>>>> Time 0.0009189844131469727 secs.

So far so good. Both implementations work (not all results are shown above) and seem to be quite fast. The tiny text file contains only 44 words, of which 32 are unique.

But when executing both implementations with a larger 16 MB file with 2674582 words, of which 97152 are unique:

> .build/release/bstree ../samples/Bulk.txt ../samples/ignore-words.txt 100
...
Count of words: 2674582, count of unique words: 97152
 >>>> Time 16.52852702140808 secs.


> .build/release/bstree ../samples/Bulk.txt ../samples/ignore-words.txt 100
Count of words: 2674582, count of unique words: 97152
 >>>> Time 3.5031620264053345 secs.

You can see that the first, enum based implementation took 16.5 secs to process the same file that the class based implementation processed in only 3.5 secs. This is a significant difference. Why does this happen?

Swift enums are value types. When the algorithm reads a word from the book file, it checks whether the word already exists in the tree. If yes, the old enum value is replaced with the new one in the tree. This results in copying the tree since it is now mutated. The Swift book says:

“All structures and enumerations are value types in Swift. This means that any structure and enumeration instances you create—and any value types they have as properties—are always copied when they’re passed around in your code.”

— Swift Language Guide

So whenever a node in the tree is modified, that results in copying the tree. When you change the tree by adding to it, the tree is copied. Using classes this does not happen. The excessive copying of the tree nodes is a performance killer when having very large data sets to handle.
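
For contrast, here is a minimal sketch of how a class based node can be updated in place. Again, this is not the code from the repository: WordNode, insert and occurrences(of:) are hypothetical names, and the nodes are ordered by the word itself instead of its hash value to keep the example short.

```swift
// Hypothetical, simplified class based tree node ordered by the word string.
final class WordNode {
    let word: String
    var count = 1
    var left: WordNode?
    var right: WordNode?

    init(_ word: String) {
        self.word = word
    }

    // Increments the count of an existing word or attaches a new leaf node.
    // Existing nodes are modified in place; nothing else in the tree is copied.
    func insert(_ newWord: String) {
        if newWord == word {
            count += 1
        } else if newWord < word {
            if let child = left { child.insert(newWord) } else { left = WordNode(newWord) }
        } else {
            if let child = right { child.insert(newWord) } else { right = WordNode(newWord) }
        }
    }

    // Returns how many times `target` has been inserted, or 0 if it is not in the tree.
    func occurrences(of target: String) -> Int {
        if target == word { return count }
        let child = target < word ? left : right
        return child?.occurrences(of: target) ?? 0
    }
}

let root = WordNode("swift")
for word in ["enum", "swift", "class"] {
    root.insert(word)
}
print(root.occurrences(of: "swift"))   // prints 2
```

Incrementing the count of an existing word only touches that one node, because all references to the node point to the same instance.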

There is also a performance penalty in using classes: each time a reference to a class instance is copied or goes away, its retain / release count is updated. But as you can see, the implementation is still much faster than copying the structure with value types.

Summarizing, enums are a nice way to implement recursive data structures. If you have large data sets and/or the tree or tree nodes are updated often, consider using classes instead.