Automation of boring tasks

My ongoing 1st year introductory course on computers has Moodle exams containing different kinds of automatically graded questions like:

An arithmetic task containing radix conversions.

We need to use automatically graded exams since there is 250-300 students each year, too few teachers participating, so evaluating the exams manually would take forever and would be very boring task to do.

Perhaps a necessary disclaimer: Everyone knows these conversions and arithmetics are, in practice, done oftentimes using software tools (programmer’s calculators) or web sites, when you need to. The actual learning goal here is not to say to students that doing this manually is something you have to learn to do for this profession. The learning goal is to expose the way data and simple arithmetics is handled in a computer and play around with these things to get an understanding of how computers work. And in time, you learn to see, for example, when debugging, some things that might be important in the hex, octal or binary values in memory.

Having a large set of questions for a course topic is a good thing to have. Then each exam generated for a student has different set of questions, randomly chosen by Moodle.

This poses another boring task problem. I would have to write tens or hundreds of these questions, choosing the values to convert or calculate, converting them to different radices, calculate the result and then use Moodle question editor to write the questions. And translate them to Finnish (or from Finnish to English), since I teach the course in two languages.

Well, what do you need when you have simple, boring repetitive tasks to do…?

A computer.

Well, a computer alone is just a bunch of metal, sand and plastics, so in addition, software would be needed.

Since I enjoy programming much more than writing Moodle exam questions, I implemented a command line tool using Swift. This tool generates random number conversion tasks (decimal to hex to binary) and arithmetic tasks (summing different values in different radices). Then these tasks are saved in a Moodle XML exam question format in a file.

./ConversionQs output-fi.xml 20 fi --verbose

The command above generates 20 conversion and arithmetic tasks each, in Finnish, outputs them to output-fi.xml file and shows me the progress in terminal window (--verbose flag). Below is an excerpt from an English output file:

<?xml version="1.0" encoding="UTF-8"?>
<quiz>
    <question type="shortanswer">
        <name>
            <text>Convert between radixes</text>
        </name>
        <questiontext format="html">
            <text>&lt;p&gt;Convert the value 66 to radix: hexadecimal.&lt;/p&gt; 	&lt;p&gt;Consider the values to be signed bytes, with eight bits.&lt;/p&gt;
	&lt;p&gt;Write in the answer the prefix for the radix (0x, 0b) asked, if it is not decimal.&lt;/p&gt;
	&lt;p&gt;Otherwise, use only the digits of the requested numbering system, no spaces or other punctuations.&lt;/p&gt;</text>
        </questiontext>
        <defaultgrade>2.0000000</defaultgrade>
        <penalty>0.3333333</penalty>
        <answer fraction="100" format="moodle_auto_format">
            <text>0x42</text>
        </answer>
    </question>
 ...

Then I just import the XML file to Moodle, placing the questions into appropriate question category. Choosing a number of random questions from this set to an exam is an easy thing to do. Whenever I need a fresh set of questions, I just run the command again.

What a rise in productivity level of a teacher! Implementing this took around one working day. Now generating tens or even hundreds of exam questions happens in a fraction of a second, instead of hours and hours of mind consuming boredom!

Swift code exporting the generated questions to an XML file.

Teaching season launched, new apps

Yesterday was the first day of teaching in this Fall semester. Times are a bit challenging; teaching resources for the courses I teach in have shrinked down to about 58-66% of last year’s teaching hours, depending on a course. Also the number of teachers for a course is down. While number of students is going up rather than down.

Face-to-face teaching is not possible with these resources and number of students (225-300+ in two of the courses ongoing currently). We’d be in the classroom every day from 8am to 8pm and still have overlapping teaching events. So online it is. Hello Zoom.

One feedback item from students last year was that the terminology is challenging in the Devices and data network course. Understandable: in five two hour lectures I should be able to give an overview of how computers work and the components they contain. The same for the networking, ten hours of lectures and they should get a grasp on how the Internet and packet networks do their thing. This means a constant shower of new terminology unfamiliar to 1st year students. Other courses have similar issues.

So I developed a dictionary app for students to use. Or actually two. One with Java for desktop computers:

Dictionary Java app for desktop computers

And another one using Swift and SwiftUI for macOS, iOS and iPadOS:

The iPhone version of the dictionary app.

Both apps get the dictionaries from the same source, a remote git repository with a JSON file, using HTTPS. Thus the dictionaries can be edited, appended to and updates are then downloaded to the users’ devices.

In the user device the terms are stored into a SQLite database, so they can be used offline. Users can search for the terms and follow the URLs to sources in the Internet to find out more.

The desktop versions also have a feature that generates a graph of the relationships between the terms by searching the terms from other terms’ description. For this, GraphViz must be installed on the device.

The Swift version has a Widget for all Apple platforms. The widget randomly shows a couple of terms a day from the dictionary. Here’s the iPhone widget (along with the GitHub app widget on top):

And here is the macOS Widget:

Clicking or tapping on the term on the widget opens the app and shows the description as well as English term for that specific ICT term.

The Java version is already public and available for installation. Swift version is still under development, I just need to decide how to distribute it to the various types of devices and prepare the App Store information with screenshots, descriptions and all. Then there is the inevitable App Store review waiting. Hopefully it’ll pass. This is yet another app for the category “downloads and renders JSON in a GUI”.

Hopefully the students find the apps useful for learning. Next I should focus on finishing the dictionary for the basic Internetworking part of my course and put also that available for the apps to download…

New version of SortSpectacle

My sorting algorithm learning tool / demo app SortSpectacle now includes a new method, Block sort.

Block sort is a variant of Merge sort. Merge sort uses additional arrays to recursively divide the original array into smaller arrays which are then sorted and combined to a final sorted array.

Block sort is an in-place variant of Merge sort. It uses a small cache to move elements around. Thus it is more memory efficient than Merge sort. And according to the demo app, also quite fast.

I still need to implement the step-by-step version of the algorithm — without it the execution is not animated. But if you take a look a the implementation (a Swift port I did based on the Java code WikiSort implemented by Mike McFadden), you can see that it will be quite a challenge to do…

Also it was quite nice to read in Twitter that someone is actually using my sorting algorithms in their own Swift/SwiftUI app to demonstrate the new SwiftUI Charts in action!

Anyhows, here’s a YouTube demo video showing the new version in action:

Busy busy busy

I’ve been really busy with evaluating student coursework in two courses. Some of this work was postponed in Feb/March since I participated in the international MSc program selection process then.

I have been considering posting shorter news like things or just small snippets of work I have been doing. Just to keep this blog alive.

So here is a screenshot of something I did in between the course evaluations — potentially a new exercise to be used in the Data structures and algorithms course next Fall. A small utility that converts plain Finnish text to braille symbols.

Finnish epic Kalevala in braille.

The job for the students is to generate the braille version faster than the naive lookup table looping implementation they get ready, using hash tables.

Another thing I did was for a course on GUI design & programming:

Demo about automated GUI testing

The purpose of this stupid card guessing game was to demonstrate how to implement a simple GUI test automation and execute the tests in simulator environment.

That’s it for today. Now back to evaluating those student programming assingments…

Update to the previous post code

Just in case anyone is reading, I updated the code related to the previous post. It was about using Swift async tasks to process data in parallel, comparing it to the single threaded functional programming implementation. Due to updates to Swift (I assume?), the code started to fail (printout was messy, nothing too bad) so I fixed it. Probably my mistake, revealed by some change in the language or libraries I use.

Now the code in FunctionalParallel is using the async version of Swift ArgumentParser library. It is now also otherwise improved and (at least now) works correctly. I will measure the speed on the M1 Mini when I get to it and make any changes, if necessary, in the performance comparison table in the repository readme file.

Now I am out of home, looking after the granddaughter as she sleeps. Her parents (including my daughter) are out, celebrating my son’s birthday. Happy birthday son ?!

Speeding up probing of an array with Swift async tasks

The Swift code below goes through an array of words (Strings) and counts unique words and their frequencies, ignoring words in wordsToFilter (another array of Strings). Then the resulting dictionary (map data structure, consisting of word/count pairs) is sorted by the word frequency in descending order. Finally, the top 100 of the most frequent words are printed out.

      var words = [String]()
      var wordsToFilter = [String]()
...
      var counter = 1
      words.filter { word in
         word.count >= 2 && !wordsToFilter.contains(word)
      }.reduce(into: [:]) { counts, word in
         counts[word, default: 0] += 1
      }.sorted(by: { lhs, rhs in
         lhs.value > rhs.value
      }).prefix(topListSize).forEach { key, value in
         print("\(String(counter).rightJustified(width: 3)). \(key.leftJustified(width: 20, fillChar: ".")) \(value)")
         counter += 1
      }
Functional programming approach to count the frequency of unique words in a text file.

With my test book file of size 17.1 MB, with 2 378 668 words and 97 115 unique words, the code above uses 1.226099 secs to process the file. The time includes reading and splicing the words from the text files into the arrays. For details of measuring, see the end of this post.

Could it be faster if using the Swift async tasks? Let’s try and see!

Below is the code doing the same in eight async tasks. Code for printing out the result is omitted, shown later below.

Async tasks counting word frequencies.

In the code, first the slice size is calculated at line 66. For example, if the array has 1000 words, it is divided into eight slices, each containing 125 words. Then in a for loop, a task group with eight async tasks execute (lines 79-85). Each async task calculates the word frequencies of their own slice of the array. Each task return a dictionary to the task group. Dictionary contains the word / frequency count pairs of the slice of the array.

No thread locking for data synchronisation is needed since all concurrent tasks only read from the array and each of them read from their own slice.

In lines 88-96, the task group awaits for the tasks to finish. As they do that, the task group combines the dictionary of partial result provided by the task to the task group’s dictionary wordCounts. This happens in a single thread so no data corruption happens. The async tasks are not writing to the final dictionary having all the word / frequency pairs from the async tasks.

Finally the result is sorted and printed out from the wordCounts dictionary, after the task group has merged the results from the tasks:

// Now subtasks have finished and results from those have been combined to wordCounts.
            // Sort the combined dictionary by the word count (value of the map).
            var counter = 1
            wordCounts.sorted(by: { lhs, rhs in
               lhs.value > rhs.value
            }).prefix(topListSize).forEach { key, value in
               print("\(String(counter).rightJustified(width: 3)). \(key.leftJustified(width: 20, fillChar: ".")) \(value)")
               counter += 1
            }
            // Signal that the async tasks are finished.
            taskSemaphore.signal()
         }
      }
      // Waiting for the async tasks to finish.
      taskSemaphore.wait()
Printing out the results and main thread waits for the async tasks to finish.

Why the semaphore? This is a console app, and the main thread would continue until the end, after the async tasks were launched. What would happen in the end? The main thread would run past the end of the function, return to main function and finish & quit the process. While the async tasks are still executing. Not good.

So to avoid that 1) the main thread stops to wait for the semaphore, and 2) task group uses the same semaphore to signal when the task group has finished working. The main thread then proceeds to finish.

So, is this any faster? Any use at all in having this more complicated code?

Executing with the same files as above, the execution now takes 0.694983 secs. That is 57% of the original execution time of the single threaded implementation!

Though the absolute times or time differences are not large, the relative difference is very significant. Consider the data sizes being hundreds or thousands of times larger than this test file. Or if this process would be done repeatedly over thousands of files. Continuously. Then the difference would be significant also in time, not only relatively, even if the files would be smaller.

When you take a look at the Xcode Instruments view of the time profiler, you see easily why the speed difference:

Xcode Instruments showing eight parallel threads working on the task.

As you can see, all that work that was earlier done in sequence, is now executed in parallel, asynchronously.

So the answer to the question “Could it be faster if using the Swift async tasks?”, is: yes, absolutely.

The measurements were taken on an Apple Mac Mini M1 (Apple Silicon) with 16GB of RAM and 1 TB of SSD storage.

Reason for the slicing of the array to eight? The M1 processor has eight cores in the processor, each one is put to work. As you can see, the OS and other processes also needs the cores so they are not executed at 100% for this process’ threads all the time.

The code can be found in my BooksAndWords repository at GitHub. Single threaded implementation is in the Functional directory as the async is in the FunctionalParallel directory.

Swift tree structures: value types, enums and classes

What if you need to define a tree like structure of data elements in Swift? For example you might use a Binary Search Tree to keep track of unique words in a book file:

A tree structure where each node in the tree has optional two child nodes, left and right. Each node has two values: the word found in the book, and the count how many times the word appeared in the book.
An example of a binary search tree with unique word counts from a text file.

Since value types are often preferred in Swift, you could use a struct. The Node struct contains the word, the count of it in the book, a key to manage the tree, and optional left and right child nodes of the tree.

struct Node {
   let key: Int
   let word: String
   var count: Int

   var leftChild: Node?
   var rightChild: Node?

   init(_ word: String) {
      key = word.hashValue
      self.word = word
      count = 1
   }
}

But as you can see from the error message “Value type ‘Node’ cannot have a stored property that recursively contains it” — recursive value types are not supported in Swift. A Node in the tree struct cannot contain the left and right child nodes when using value types.

What to do? You have (at least) two options:

  1. Use the enum type with associated values.
  2. Use classes.

With Swift enums, you can define two states for the enumeration. Either a) the node in the tree is Empty (there is no node) or b) it has associated values in a Node — the word, the word count, key used to arrange the nodes in the tree by the word hash value, and the optional left and right subtrees:

indirect enum EnumTreeNode {
   case Empty
   case Node(left: EnumTreeNode, hash: Int, word: String, count: Int, right: EnumTreeNode)

   init() {
      self = .Empty
   }

   init(_ word: String) {
      self = .Node(left: .Empty, hash: word.hashValue, word: word, count: 1, right: .Empty)
   }

   func accept(_ visitor: Visitor) throws {
      try visitor.visit(node: self)
   }
}
A tree node as an enumeration with associated values.

When defining recursive enumerations, you must use the indirect keyword to indicate recursion in the enumeration.

The other option is to use classes, which are reference type elements in Swift:

final class TreeNode {
   let key: Int
   let word: String
   var count: Int

   var left: TreeNode?
   var right: TreeNode?

   init(_ word: String) {
      self.key = word.hashValue
      self.word = word
      count = 1
   }

You can read more about Swift structs and classes from here if you are unfamiliar with them.

Check out the full implementation of both class based and enum based solutions from this GitHub repository.

So, are there any other differences in the enum and class implementations, than the differences in code?

Let’s check out. First run below is using the enum implementation, and the second one is executed using the class based implementation.

> swift build -c release
> .build/release/bstree ../samples/tiny.txt ..samples/ignore-words.txt 100
...
Count of words: 44, count of unique words: 32
>>>> Time 0.0008840560913085938 secs.


> swift build -c release
> .build/release/bstree ../samples/tiny.txt ..samples/ignore-words.txt 100
...
Count of words: 44, count of unique words: 32
>>>> Time 0.0009189844131469727 secs.

So far so good. Both implementations work (not all results not shown above) and seem to be quite fast. The tiny text file contains only 44 words of which 32 are unique.

But when executing both implementations with a larger 16MB file with 2674582 words of which 97152 are unique…:

> .build/release/bstree ../samples/Bulk.txt ..samples/ignore-words.txt 100
...
Count of words: 2674582, count of unique words: 97152
 >>>> Time 16.52852702140808 secs.


> .build/release/bstree ../samples/Bulk.txt ..samples/ignore-words.txt 100
Count of words: 2674582, count of unique words: 97152
 >>>> Time 3.5031620264053345 secs.

You can see that the first enum based implementation took 16.5 secs to process the same file the class based implementation only took 3.5 secs to process. This is a significant difference. Why this happens?

Swift enums behave like value types. When the algorithm reads a word from the book file, it searches if it already exists in the tree. If yes, the old enum value is replaced with the new one in the tree. This results in copying the tree since it is now mutated. The Swift book says:

“All structures and enumerations are value types in Swift. This means that any structure and enumeration instances you create—and any value types they have as properties—are always copied when they’re passed around in your code.”

— Swift Language Guide

So whenever a node in the tree is modified, that results in copying the tree. When you change the tree by adding to it, the tree is copied. Using classes this does not happen. The excessive copying of the tree nodes is a performance killer when having very large data sets to handle.

There is also a performance penalty in using classes — each time a class instance is accessed, the retain / release count is updated. But as you can see, still the implementation is much faster compared to copying the structure with value types.

Summarizing, enums are a nice way to implement recursive data structures. If you have large data sets and/or the tree or tree nodes are updated often, consider using classes instead.

Java file management, hall of fame and a nice surprise

In the previous post I mocked the Java app that was hardcoded to use Windows C: disk as the default place to open files.

What then is the recommended way? One is to start looking from the user home directory:

JFileChooser fileChooser = new JFileChooser();

fileChooser.setCurrentDirectory(new File(System.getProperty("user.home")));

Or pick the documents directory. It is also a nice thing to save the selected directory and use that if the user would like to continue with the already selected directory next time.

What else is happening? It is now the last week of the course Data Structures and Algorithms I am responsible teacher. Students have been analyzing algorithm correctness and time complexity, implementing basic data structures, more advanced ones like hash tables and hash functions, binary search trees and binary search.

Lectures addressed graphs and graph algorithms too, but implementation of these was in an optional course project only, Mazes. When students finish that, they can play a game a bit like the PacMan.

They get to choose from two course projects: either that Mazes project or optionally implement the classic task: count the unique words from a book file, ignoring some words from another file. The largest test file is around 16 MB, having almost 100 000 unique words.

Processing the largest file, using naively two nested for loops takes on my Mac Mini M1 with 16MB of memory around two minutes to process. The fast versions (hash tables, binary search trees) take less than a second.

I have implemented several different solutions for comparison with three different programming languages, Java, Swift and C++. Each week in lectures I demonstrated some of these, and in the end we had this table for comparison (sorry, in Finnish only).

Hall of Fame of the different Books and Words implementations. Swift versions can be found from the link above to GitHub.

As you can see, the C++ with 8 threads was the fastest one. Next after C++ came a couple of Java versions. Swift implementations were not so fast as I expected. After some profiling, I suspect the reason is in the way Unicode chars are handled in Swift. All the book files are UTF-8 and students were expected to handle them correctly. I do not like that mostly in teaching programming languages the default is to stick with ascii and conveniently forget the existence of different languages and character sets.

Well, anyways, for some reason, processing these UTF-8 text files takes a lot of time with Swift. Maybe later I have time to find out if the issue is in my code and/or is there anything that can be done to speed things up.

Something very nice happened a week ago — the student guild of our study program, Blanko, awarded me this diploma for being a quality teacher. Apparently they had a vote and I somehow managed to come first this time. The diploma was accompanied by this Italian themed small gift box. A really nice surprise! I was so astonished to receive this, thank you so much if anyone of you is reading!

Nice award from the students for quality teaching.

Work work work

youtube.com/watch

Just checked the first commit date of the completely renovated Data structures and algorithms course repository. It was around December 2020.

Since then I have designed and implemented about ten exercises with unit tests in Java. And two course projects with unit tests. Plus about ten additional demos with Swift. All with commentary and instructions.

In a week the new course begins. I’ve recorded about 20 hours of new lecture materials. Additional videos about graph programming with C++ will be included from last year demos.

One of the reasons for not posting too much lately ?

The video is about one of the course project implementing a graph data structure with various algorithms including Dijkstra’s shortest path finding algorithm. I thought applying this in a game context would perhaps motivate the students to dwell into the world of data structures and algorithms. We’ll see….

Too busy and stuff

Apparently I am too busy to write anything here. Or … priorities… or workload… Work, life, etc.

During the Summer 2020 I decided that I want an app in the App Store. So I did it.

Slippery Cities (Liukkaat kadut in Finnish, Hala Gator in Swedish) warns pedestrians of slippery weather conditions in six Finnish cities. That photo above of my Apple Watch says that in Oulu there was a slippery alert two hours ago. So wear your spiked shoes. Or be careful out there.

I learned a lot about:

  • Swift programming
  • SwiftUI programming
  • WatchKit development with Xcode
  • Localisation for three languages
  • Accessibility programming in Apple platforms
  • AppStore: submitting apps for review (no issues by the way, app went through without any hassle)
  • Programming in watchOS, including watch faces, complications, background processing, networking,… with beta version of watchOS 7.

Buy it in the App Store, I want to get rich. Mind you, it may not work in the city you live, so consider that. Won’t refund you unless you twist my arm hard.

Then other things keeping me busy. Created a new course on Software Platforms and Ecosystems with two colleagues. That was hard. Luckily also behind us for a while, since the course ended in December.

I also took responsibility for a course on data structures and algorithms. Corona pandemic made that harder than usual, since the course material and arrangements were based on the assumption of face-to-face classroom teaching. Didn’t happen.

Now I am working on a course on server programming, which is a new course for me. I’ve been implementing a HTTPS chat server with Java and writing instructions for students to do the same. Topics include Java, HTTPS, certificates, authentication, JSON, sqlite database, password hashing and salting, UTF, encoding, decoding, HTTP headers,… Maven, Visual Studio Code, git — lots of work, since none of this existed in late December and course started in early January. ?

Then another course where I assist in exercises on advanced software testing and security issues started this week. Lots to learn for me there too.

So I’ve been very, very busy. Maybe in Spring 2021 I’ll be sharing more here.

Oh yes, and some nice things too — got myself a new Mac Mini M1, an Apple Silicon computer. So enjoying doing these things with the new device! My previous iMac 27″was already five years old, luckily in a good condition. I got a decent price for it when I sold it to fund the Mini.