2022 retrospect

What did I do (professionally) during the year 2022? A long post about all stuff done – and mostly in progress. And a prize at the end…

First, I’ll list my active GitHub repositories of 2022. Some of these were created earlier than 2022 but I did work on these during 2022. Then I will list all the major teaching and other work done as part of my post as lecturer at the university.

So to begin with, here are the major and/or new projects I either started or took major leaps forward in 2022:

The largest project for 2022 is the GitLogVisualized app (bash shell scripts, git, JUnit tests, Swift, SwiftUI, macOS). I started to develop it in November when I realised that I needed a way to quickly get an impression of how all those 275+ students are doing with their projects in the Data structures and algorithms course. This way we can focus on providing help to those students who are at risk of falling behind schedule.

Screenshot of the GitLogVisualized app showing repository statistics.
GitLogVisualized showing student projects (left column list) and the selected project timeline of commits and passing / failing tests. Student ids are redacted in this screenshot using the SwiftUI redacted feature.

Timed shell scripts are executed automatically every Monday night. Scripts do git pull from each student repository from GitLab, execute git log with certain parameters, then JUnit tests are executed, and all the data is saved in two log files per student repository. The app then loads these log files and shows the project state for each student repository.

Colour codes (blue, yellow, orange, red) quickly show how long it has been since a student’s last commit, with thresholds at 7, 14 and 19 days, so that we can immediately see who should be contacted to find out if we can do anything to help those falling behind in the course. Projects can be sorted in ascending or descending order by student, number of git commits and days since last commit.

I am planning to release this app as open source when I get it cleaned up. And a demo video is also in my plans, as soon as I get to it.

Another major project is the TVT-Sanasto, a Java app (Java, Swing, JSON, SQLite database) students can use to learn the basic terms and definitions of different categories of computing and computer networks in Finnish/English. The app can also generate a graph (using GraphViz) showing how terms are related to each other.

Screenshot of the terminology app helping students learn ICT basic terminology.
TVT Sanasto Java app

Actually, the term categories can be anything, since the app downloads the terms from a server JSON file. So basically anyone can write simple terminologies with explanations in a JSON file, and that can be included in the terminology category index file. Currently these JSON files are hosted in GitLab. I am hoping that other teachers would contribute by writing additional term descriptions and perhaps even create new term category dictionaries for their own courses. See a YouTube video of the app.

I also have a Swift/SwiftUI version of the TVT Sanasto app that works in iOS, iPadOS and macOS (Swift, SwiftUI, SQLite database, localized for Finnish and English), but that is still a private beta. Hopefully I will finish this by next Fall when my courses begin. If I am still teaching them. Oh and one student actually implemented a web app using these JSON dictionaries. Nice that the app family expands this way and enables students to use whichever tech they find suitable for their needs in learning the basic terminology of the field.

ICT Terms app running on iPad showing basic terminology like what is Boolean algebra.
TVT Sanasto app running on an iPadOS emulator.

Guesswork is a demo app (Xcode, Swift, SwiftUI, iOS) I implemented for the GUI design/programming course in Spring 2022. The idea was to demonstrate how to consider different screen sizes, zoom levels and screen orientations (portrait/landscape) as part of designing accessibility features in a (mobile) GUI.

iOS app screenshot of a demo showing a card guessing game.
Guesswork demo app GUI

MiniGolf scorecard app (Xcode, Swift, iOS) – originally a demo about localisation and internationalisation for GUI design/programming course. I demonstrated how to consider accessibility and enable localisation for two languages and also take into account different calendars (Gregorian, Chinese, …), zoom levels, etc. in GUI design in SwiftUI/Xcode development environment. After the course, I decided to make this a side project but hey, we all know what sometimes happens to side projects…

Screenshot of a demo grown into an app. Main screen of minigolf scorecard app.
MiniGolf side project, another one missing me while I am working for money at the day job.

QuestionGenerator (Swift), a command line tool to generate Moodle quizzes. This is for the Devices and data networks course. The course is passed by taking an online Moodle exam, where I ask for, among other things, conversions between different radices (numbering systems; binary, decimal, octal, hexadecimal) and simple calculations where the values are from different numbering systems. This tool is really convenient in generating tens or even hundreds of random conversions and calculations for students to ponder. The tool exports them into an XML file Moodle can import as quiz questions. It saves time and work when the teacher does not need to create these by hand and verify the correct result.
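To give an idea of the kind of conversion such questions ask for, here is a minimal C++ sketch of converting a value to another radix by repeated division. It is not code from the Swift tool; the function name toRadix and its parameters are made up for this illustration.

```cpp
#include <algorithm>
#include <string>

// Converts a non-negative value to its textual form in the given radix (2...16).
// Hypothetical helper for illustration; returns an empty string for an unsupported radix.
std::string toRadix(unsigned long value, unsigned radix) {
   static const char digits[] = "0123456789ABCDEF";
   if (radix < 2 || radix > 16) {
      return "";
   }
   if (value == 0) {
      return "0";
   }
   std::string result;
   // Each remainder is the next digit, least significant first.
   while (value > 0) {
      result.push_back(digits[value % radix]);
      value /= radix;
   }
   // Digits were collected in reverse order, so flip them.
   std::reverse(result.begin(), result.end());
   return result;
}
```

A quiz generator could pick a random value and two radices, show toRadix(value, from) in the question and keep toRadix(value, to) as the expected answer.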

My Slippery Cities Apple Watch app – another side project, but this time already available in the App Store! – is going to get a new feature. I would have liked to release it during Fall 2022, but again too much day job work has pushed this back, like many other side projects. The new feature is predicting slippery weather for pedestrians based on local weather conditions and the forecast provided by the Apple Weather service. I am beta testing it and hopefully releasing it this Winter.

Screenshot of the Slippery Cities Apple Watch app showing weather based slippery warnings.
Slippery Cities beta showing next five hours slippery warning data based on temperature and precipitation.

The last new app I started in July 2022 I actually never planned to initiate. It just happened. Minesweeper (Swift/SwiftUI, macOS) is another private side project that sporadically advances – or then not when I am too busy doing something I actually get paid for.

Ongoing minesweeper game screenshot.
Minesweeper in action

The classic game, with three different mine field sizes and a top-10 list of results. Some nice SwiftUI animations and sounds (“composed” with GarageBand) when you step on a mine or happen to win.

Probably the reason I started this game project was to have a realistic real world example of using recursion for the students. When the user clicks on a tile, a recursive algorithm is used to open up all the tiles having no neighboring mines for the user. If you have played the game, you’ll know what I mean.
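The recursive opening of tiles can be sketched roughly like this in C++. This is not the app's Swift code; the Board type and all of its members are invented for the illustration, and plain recursion like this is only suitable for boards small enough not to exhaust the call stack.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical board model for this sketch.
struct Board {
   int rows;
   int cols;
   std::vector<bool> mine;      // true if the tile holds a mine
   std::vector<bool> revealed;  // true if the tile has been opened

   Board(int r, int c) : rows(r), cols(c), mine(r * c, false), revealed(r * c, false) {}

   bool inside(int r, int c) const { return r >= 0 && r < rows && c >= 0 && c < cols; }
   std::size_t index(int r, int c) const { return static_cast<std::size_t>(r * cols + c); }

   // Counts the mines in the (up to) eight tiles around (r, c).
   int neighbouringMines(int r, int c) const {
      int count = 0;
      for (int dr = -1; dr <= 1; ++dr) {
         for (int dc = -1; dc <= 1; ++dc) {
            if ((dr != 0 || dc != 0) && inside(r + dr, c + dc) && mine[index(r + dr, c + dc)]) {
               ++count;
            }
         }
      }
      return count;
   }

   // Opens the tile at (r, c). If it has no neighbouring mines,
   // recursively opens its neighbours as well.
   void reveal(int r, int c) {
      // Base cases: off the board, already open, or a mine.
      if (!inside(r, c) || revealed[index(r, c)] || mine[index(r, c)]) {
         return;
      }
      revealed[index(r, c)] = true;
      if (neighbouringMines(r, c) > 0) {
         return;  // a numbered tile stops the expansion
      }
      for (int dr = -1; dr <= 1; ++dr) {
         for (int dc = -1; dc <= 1; ++dc) {
            reveal(r + dr, c + dc);  // the tile itself is already open, so that call returns at once
         }
      }
   }

   // Number of tiles opened so far.
   int revealedCount() const {
      int count = 0;
      for (bool open : revealed) {
         if (open) {
            ++count;
         }
      }
      return count;
   }
};
```

The "already open" check is what guarantees that the recursion terminates: every call either returns immediately or opens one more tile.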

I am planning to move from SwiftUI animations to SpriteKit graphics. I’ve done some learning on the topic but haven’t yet actually started doing it. As a side project, this may happen later rather than sooner…. Unless I decide to continue with SwiftUI animations and then publish this one in the Mac App Store sooner.

OK, then next to the projects started earlier and/or having only minor updates during 2022:

  • Updated a Java console chat client (Java, JUnit, HTTP, JSON) used in the 2021 Programming 3 course (server side programming). Chat client was used as a test client so that students could test their servers. Client also has JUnit tests, sending requests to their servers as students test their HTTP server implementations.
  • Published a GUI chat client (Swift, SwiftUI, HTTP, JSON) for the same Programming 3 chat server. Wanted to try out implementing a GUI chat client using the (then) new Swift async / await features. I implemented this already in 2021 and wanted to release it, perhaps it will help students to try out and learn something else than the usual languages (Java…) used in GUI programming courses.
  • Warnings app (private; iOS/macOS with Swift, SwiftUI and Core Data with Apple Cloud support). Ongoing side project that fetches various warnings (weather, slippery conditions, air quality, etc) from several open data sources and displays alerts to user, warning about potentially harmful or dangerous environmental conditions. Haven’t touched this for a while due to being too busy with paid work. Surprise, surprise.
  • books-cpp demonstration about using C++ map data structures std::map and std::unordered_map in a single and multithreaded app. The demo relates to the Data structures and algorithms course. Learning goal here is to be aware of which container library data structure to select for performance, and that parallel processing sometimes helps, but not always, or at least not significantly.
  • Books and Words (Swift; Xcode, iOS/macOS) – when I knew I would take over the Data structures and algorithms course a couple of years ago, I wanted to brush up my skills in the area and implemented this classic programming exercise. I bought the book Exercises in Programming Style by Cristina Videira Lopes and implemented some styles from there in Swift. This year I updated the project with new Swift releases, implemented an enum style binary search tree data structure, took new performance measurements, etc.
  • Graphs is a demonstration app I have used in teaching graph data structures. I made just some tiny updates to it this fall.
  • Another demo for the Data structures and algorithms course, SortSpectacle (Swift, SwiftUI, iOS/macOS), now has a new sorting method, Block sort. It is a variant of merge sort.

Then to teaching work at the University. What I did in teaching at Spring/Summer 2022:

  • Taught exercises in Java programming basics in Programming 2 course (around 250 students) for Finnish and English students (separate groups).
  • Taught GUI design/programming (various programming languages) in Programming 4 course.
  • Participated in the national collaboration group of lecturers responsible for data structures and algorithms courses in various universities and tech institutes in Finland. We discuss the pedagogical, technical, etc. topics related to teaching, learning and organizing these kinds of courses.
  • Interviewed prospective students over Zoom, applying to our MSc program on software engineering from various countries around the world.
  • Arranged summer courses for Devices and Data networks and Data Structures and algorithms. These were offered as independent study, without actual teaching.
  • Prepared improved materials for two of the courses I am responsible for: Devices and Data networks and Data Structures and algorithms. I made improvements to lectures and exercises based on last year’s feedback from students and colleagues.

And in Fall 2022:

  • Offered my course Devices and Data networks both in Finnish (252 students) and in English (23 students) separately with three other teachers assisting. Two of them were students I interviewed and selected as part time teaching assistants together with a really nice and professional colleague of mine. Very nice workmates, all of them!
  • Offered another of my courses, Data Structures and algorithms (313 students). As you can see from the projects above, quite a few of those are for this course.
  • Supported a colleague teaching the data structures course for international students. Didn’t participate in teaching the English group though.
  • Preparing (in December) to return – after a very long break – to participate in the study program BSc student projects as a supervisor. Projects launch in the beginning of January. Students plan and execute a software development project for local companies. Teachers supervise them and see that all goes OK, intervene if necessary. I was surprised to see they still use the project documentation templates I created a long time ago in 1998! They’ve been updated since then, obviously.

Last but not least – our study program student guild Blanko awarded me as the Distinguished teacher of the year! This was the second year in a row for me to get this honor! Really taken and humbled that the work I am doing is valued by the students. In addition to the nice award certificate I got a box of various delicacies like chocolate to enjoy. Thanks Blanko!

I am not giving any new year resolutions, but one goal I have is to finish all those side projects this year and put them to actual use somewhere, if anyone is interested or finds these projects of mine fun or useful. Until that happens, I should really, really not start any new side projects…. We’ll see how that goes.

Interface requirements in C++ using C++20 concepts

When teaching data structures and algorithms in Java, I implement the corresponding demonstrations in C++. If I used Java in the demos, I would give away too much of the solutions to the students.

When demonstrating a hash table, students are given this Java skeleton of a hash table class to implement, with key-value pairs:

public class KeyValueHashTable<K extends Comparable<K>, V> implements Dictionary<K, V> {

So, how to do something similar in C++ to keep the demo closer to the Java code? For hash table, the Key class (K) must implement the Comparable interface, as well as override the hashCode() and equals() inherited from Object class.

I do not need to implement a Comparable interface in C++, since the operator overloading does what is necessary. But the Java Object.hashCode() is something I’d need to implement differently in C++, since in C++ there is no common base class to override the hashCode() from.
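As a small sketch of what "operator overloading does what is necessary" can look like, assume a simplified key type that holds only a registration number. The Vehicle struct below and the helper compareTo are made up for this illustration and are not taken from the demo code.

```cpp
#include <string>

// Simplified, hypothetical key type for this sketch.
struct Vehicle {
   std::string regNum;
};

// Equality and ordering are provided by overloaded operators
// instead of a Comparable interface.
bool operator==(const Vehicle & a, const Vehicle & b) {
   return a.regNum == b.regNum;
}

bool operator<(const Vehicle & a, const Vehicle & b) {
   return a.regNum < b.regNum;
}

// A Java-style three-way comparison built from operator< alone.
// Works for any type K that provides operator<.
template <class K>
int compareTo(const K & lhs, const K & rhs) {
   if (lhs < rhs) {
      return -1;
   }
   if (rhs < lhs) {
      return 1;
   }
   return 0;
}
```

A container template can then simply use == and < on its keys, and the compiler complains if a key type does not provide them.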

Instead, I need a Hashable interface in C++ and all classes that would be used as the Key in the hash table, would have to implement the Hashable interface:

class Hashable {
public:
   virtual long hashCode() const = 0;
   virtual ~Hashable() { }
};

For example, a Vehicle class can implement Hashable interface and thus can act as a Key in hash table:

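class Vehicle : public Hashable {
// ...simple hash function, regNum being a std::string.
   long hashCode() const override {
      // Accumulate in an unsigned type: unsigned overflow wraps around,
      // whereas overflowing a signed long is undefined behaviour.
      unsigned long hash = 5381;
      for (unsigned char c : regNum) {
         hash = (hash << 5) + hash + c;
      }
      return static_cast<long>(hash);
   }
   // ...
};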

Luckily my C++ compiler supports C++20 with concepts. Now I can say in the HashTable class that the classes to be used as Keys in the hash table must conform to the Hashable interface by providing a CheckType that does the checking:

#include <type_traits> // std::is_base_of
#include <concepts>

template <class Type, class BaseClass>
concept CheckType = std::is_base_of<BaseClass, Type>::value;

And then finally in the HashTable, explicitly say that the Key is required to pass this check that it inherits (implements) Hashable:

template <class K, class V>
requires CheckType<K, Hashable>
class HashTable {

Now, if I would try to use Vehicle without it being a Hashable:

class Vehicle { // Not implementing the Hashable interface!
...
HashTable<Vehicle, Location> hashTable(20);

// Will result in compiler error:
Constraints not satisfied for class template 'HashTable' [with K = Vehicle, V = Location]

Mind the map

Another post about data structures, time performance and the count-the-words-in-a-book-file-fast case I’ve written about before.

I did a C++ implementation of the books and words problem. Earlier, I have implemented several solutions for the problem using Swift and Java. This time I used C++ standard library std::map and wanted to see if parallel processing in several threads would speed up the processing.

Obviously it did. Execution time of the multithreaded version was 74% of the single threaded version. The sample files were processed by the single threaded version in 665 ms, while the multithreaded version took only 491 ms. Nice!

But then I saw, from the documentation of the std::map, that it keeps the key values in the dictionary in order while elements are added to the map/dictionary.

But this is not needed in my case! Surely this also takes time and gives me additional possibilities in optimising the time performance.

I changed, in the single threaded implementation, the std::map to std::unordered_map, and behold, it was faster than the multithreaded version with 446 ms execution time!

So mind the map. There are many, and some of those may be more suitable to your use case than the others.
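A minimal sketch of why the swap is so easy: both containers offer the same operator[] interface, so the counting code can be written once as a template. The function countWords is made up for this illustration and is much simpler than the actual project; it splits on whitespace only.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <unordered_map>

// Counts whitespace-separated words into whichever map type the caller picks.
// std::map keeps the keys sorted (tree based); std::unordered_map does not (hash based).
template <class MapType>
MapType countWords(const std::string & text) {
   MapType counts;
   std::istringstream stream(text);
   std::string word;
   while (stream >> word) {
      ++counts[word];  // operator[] inserts the word with count 0 if it is missing
   }
   return counts;
}
```

Calling countWords<std::map<std::string, int>>(text) gives the words in alphabetical order, while countWords<std::unordered_map<std::string, int>>(text) gives them in no particular order. If the order is not needed, the unordered version skips the work of maintaining it.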

For details, see the project in GitHub.

Java file management, hall of fame and a nice surprise

In the previous post I mocked the Java app that was hardcoded to use Windows C: disk as the default place to open files.

What then is the recommended way? One is to start looking from the user home directory:

JFileChooser fileChooser = new JFileChooser();

fileChooser.setCurrentDirectory(new File(System.getProperty("user.home")));

Or pick the documents directory. It is also a nice thing to save the selected directory and use that if the user would like to continue with the already selected directory next time.

What else is happening? It is now the last week of the Data Structures and Algorithms course, for which I am the responsible teacher. Students have been analyzing algorithm correctness and time complexity, and implementing basic data structures as well as more advanced ones like hash tables and hash functions, binary search trees and binary search.

Lectures addressed graphs and graph algorithms too, but implementation of these was in an optional course project only, Mazes. When students finish that, they can play a game a bit like Pac-Man.

They get to choose from two course projects: either that Mazes project or optionally implement the classic task: count the unique words from a book file, ignoring some words from another file. The largest test file is around 16 MB, having almost 100 000 unique words.

Processing the largest file naively with two nested for loops takes around two minutes on my Mac Mini M1 with 16 GB of memory. The fast versions (hash tables, binary search trees) take less than a second.
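The difference between the slow and the fast approach can be sketched in C++ as follows. This is a simplified illustration that only counts the unique words; the function names are invented and the code is not from the course material.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Naive version: for every word, scan all the earlier words.
// With n words this does on the order of n * n string comparisons.
std::size_t uniqueWordsNaive(const std::vector<std::string> & words) {
   std::size_t unique = 0;
   for (std::size_t i = 0; i < words.size(); ++i) {
      bool seenBefore = false;
      for (std::size_t j = 0; j < i; ++j) {
         if (words[j] == words[i]) {
            seenBefore = true;
            break;
         }
      }
      if (!seenBefore) {
         ++unique;
      }
   }
   return unique;
}

// Hash table version: each word is inserted once,
// which takes constant time on average.
std::size_t uniqueWordsHashed(const std::vector<std::string> & words) {
   std::unordered_set<std::string> seen(words.begin(), words.end());
   return seen.size();
}
```

Both functions return the same result; only the amount of work grows very differently as the book gets longer.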

I have implemented several different solutions for comparison with three different programming languages, Java, Swift and C++. Each week in lectures I demonstrated some of these, and in the end we had this table for comparison (sorry, in Finnish only).

Hall of Fame of the different Books and Words implementations. Swift versions can be found from the link above to GitHub.

As you can see, the C++ version with 8 threads was the fastest one. Next after C++ came a couple of Java versions. Swift implementations were not as fast as I expected. After some profiling, I suspect the reason is in the way Unicode chars are handled in Swift. All the book files are UTF-8 and students were expected to handle them correctly. I do not like that mostly in teaching programming languages the default is to stick with ASCII and conveniently forget the existence of different languages and character sets.

Well, anyways, for some reason, processing these UTF-8 text files takes a lot of time with Swift. Maybe later I have time to find out if the issue is in my code and/or is there anything that can be done to speed things up.

Something very nice happened a week ago — the student guild of our study program, Blanko, awarded me this diploma for being a quality teacher. Apparently they had a vote and I somehow managed to come first this time. The diploma was accompanied by this Italian themed small gift box. A really nice surprise! I was so astonished to receive this, thank you so much if anyone of you is reading!

Nice award from the students for quality teaching.

Teaching season started

The Fall teaching season has started with one old course, Devices and networks, already ongoing. My part is the networks, so currently I can focus on two new courses starting in October. Data structures and algorithms is an old course, but I’ll take responsibility for it this year.

Another course, Platforms and ecosystems, is a new course, meaning I have my hands full creating material for it together with two other teachers. And at the same time, familiarising myself with the data structures course.

For the data structures course, I’ve worked earlier with the demo sorting app implemented using Swift. No time to add additional sorting methods there, but should build it with new Xcode, Swift and iOS 14 to see if there are any (breaking) changes.

What I did recently is that I implemented simple (and stupid) array and linked list classes, both in C++ and Swift, to demonstrate the efficiency of creating and accessing arrays in comparison to linked lists. Will be using that when discussing why different data structures have different (preferred) usage situations. And that there may be conflicting requirements for the data structure in an app. Then you just have to make compromises.
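The access difference can be illustrated with the standard containers instead of hand-written classes. This is a sketch with invented function names, not the demo code: reaching element n in an array is a single index operation, while a linked list has to be walked node by node.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Array access: one address calculation, no matter how large n is.
int nthInArray(const std::vector<int> & values, std::size_t n) {
   return values[n];
}

// Linked list access: n hops from the first node to reach element n.
int nthInList(const std::list<int> & values, std::size_t n) {
   auto it = values.begin();
   std::advance(it, static_cast<std::ptrdiff_t>(n));
   return *it;
}
```

Both functions assume that n is a valid index. Timing them in a loop over a large container (in a release build) shows the gap growing with n.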

Another important thing to show to the students is that you need to build the release version before comparing or measuring performance.

To thread or not to thread

There’s a distributed C++ system I made, used as a “patient” in a course on Software architectures. It includes a command line tool TestDataGenerator, which I implemented to test the performance and reliability of the system. The tool generates random data in memory buffers and then writes four test data files which are read and handled by the system’s distributed nodes. An earlier blog post discussed the tool’s implementation details.

The generator was single threaded, writing the four data files in sequence, in the main thread. But then this stupid idea popped in my head — what if the four test data files are written to disk in parallel? Would it be faster? How much if any?

Threading is absolutely not needed in this case: generating test data for 5000 students takes about 250 ms using my MacBook Pro (13-inch, 2018), 2.3 GHz four core Intel Core i5, 1 TB SSD disk. On machines with HDDs this could be somewhat slower.

However, I wanted to see how much of execution time (if any) I can squeeze off with the four threads, each writing to their own data file from the RAM buffers. Also an opportunity to learn more about threads. Those horrible, evil things everyone is saying nobody should use…

In my first implementation, the threads were created and executed when the memory buffer was full, and the file was saved in a lambda function:

 if (bufferCounter >= bufSize) {
   std::thread thread1( [&isFirstWrite, &STUDENT_BASIC_INFO_FILE, &basicInfoBuffer] {
     saveBuffer(isFirstWrite, STUDENT_BASIC_INFO_FILE, basicInfoBuffer);
   });
// ...

But creating a thread takes time. Lots of time, thousands of processor cycles, depending on your setup (see e.g. this blog post). If the tool startup parameters are -s 50000 -b 500 (create 50000 records with buffer size of 500), this would mean 50000/500 = 100 thread creations per file, so 400 threads would be created during the execution of the tool. Not very good for performance.

I changed the implementation to create the four threads only once, before filling and saving the memory buffers:

   // For coordination between main thread and writer threads
   std::atomic<int> threadsFinished{0};
   // Prepare four threads that save the data.
   std::vector<std::thread> savers;
   savers.push_back(std::thread(&threadFuncSavingData, std::ref(threadsFinished), std::cref(STUDENT_BASIC_INFO_FILE), std::ref(basicInfoBuffer)));
   savers.push_back(std::thread(&threadFuncSavingData, std::ref(threadsFinished), std::cref(EXAM_INFO_FILE), std::ref(examInfoBuffer)));
   // ... and same for the remaining two threads.

and then woken up every time the data buffers were full:

if (bufferCounter >= bufSize) {
   if (verbose) std::cout << std::endl << "Activating buffer writing threads..." << std::endl;
   // Prepare variables for the file saving threads.
   startWriting = true;
   threadsFinished = 0;
   int currentlyFinished = 0;
   // And launch the file writing threads.
   launchWrite.notify_all();

And then the main thread waits for the writers to finish their job before filling the memory buffers again.

   // Wait for the writer threads to finish.
   while (threadsFinished < 4) {
      std::unique_lock<std::mutex> ulock(fillBufferMutex);
      writeFinished.wait(ulock, [&] {
         return currentlyFinished != threadsFinished;
      });
      currentlyFinished = threadsFinished;
   }


Obviously the file writing threads notify the main thread when they finish their file operations, using a condition variable and a counter the main thread can use to keep track of whether all the writer threads have finished:

// Thread function saving data in parallel when notified that buffers are full.
void threadFuncSavingData(std::atomic<int> & finishCount, const std::string & fileName, std::vector<std::string> & buffer) {
   bool firstRound = true;
   while (running) {
      // Wait for the main thread to notify the buffers are ready to be written to disk.
      std::unique_lock<std::mutex> ulock(writeMutex);
      launchWrite.wait(ulock, [&] {
         return startWriting || !running;
      });
      // We are still running and writing, so do it.
      if (buffer.size() > 0 && startWriting && running) {
         saveBuffer(firstRound, fileName, buffer);
         buffer.clear();
         firstRound = false;
         // Update the counter that this thread is now ready.
         // Main thread waits that four threads have finished (count is 4).
         finishCount++;
      }
      // Notify the main thread.
      writeFinished.notify_one();
   }
}

Then to measurements. I created a script which executes the tool 20 times, first using threads and then sequentially without threads (the command line parameter -z disables the threading code and uses sequential code):

echo "Run this in the build directory of TestDataGenerator."
echo "Removing output files..."
rm test-*.txt
echo "Running threaded tests..."
for ((i = 0; i < 20; i++)); do ./GenerateTestData -s 50000 -e 10 -b 500 >> test-par.txt; done
echo "Running sequential tests..."
for ((i = 0; i < 20; i++)); do ./GenerateTestData -zs 50000 -e 10 -b 500 >> test-seq.txt; done
echo "-- Tests done -- "
open test-*.txt

Just to compare, I executed the tests on two machines: a MacBook Pro 2.3 GHz Intel Core i5 with four cores and a 1 TB SSD, and an iMac 2015 with an HDD. Next, I took from the output files the number of milliseconds the tool took each time, put them in a Numbers file and generated these graphics from the test data:

Comparison of sequential and threaded execution in two machines.

As you can see, there is no real difference between writing in threads (in parallel) and writing sequentially. Here you can see how the threads take turns and execute in parallel in the cores of the processor of the MacBook Pro:

Profiler showing threads executing.
Blue areas show when the threads are active, executing.

Profiling the execution shows that having multiple threads doing the work won’t make a difference. In the trace below you can see that most of the time the threads are either waiting for their turn to flush the data to disk or actually flushing the data. Most of the time in the selected saveBuffer method is spent in flushing data.

Profiler screenshot shows where time was spent, flushing and waiting.
Selected lines show where most of the time was spent.

Also, in the sequential execution, where the single main thread does it all, time is spent in flushing to disk:

Single threaded execution profile.
Single threaded execution spent most of the time flushing data to disk.

Creating threads to speed up writing to disk is definitely not a good idea in this case. If this were an app with a GUI, then writing large amounts of data in a thread could very well be a good idea. If writing took more than a couple of hundred milliseconds, the user would notice the GUI lagging/not being responsive. So whether or not to use threads to write data to disk depends on your use case.

This oldish article from DrDobbs is also an interesting read. It points out that writing several files in threads is not necessarily helpful (unless using RAID), and that one should make threading configurable (like the -z parameter in my implementation) because threads may in some situations even slow down the app. Also this discussion on when to apply threads is a good one:

Using multiple threads is most helpful when your program is CPU bound and you want to parallelise your program to use multiple CPU cores.

This is not the case for I/O bound problems, such as your scenario. Multiple threads will likely not speed up your system at all.

Adopting some newer C++ features

I’ve been continuously updating my skills in C++, adopting features from newer versions of the language, such as C++17. For fun and learning, I’ve been updating some older apps to use these newer features, and implementing some new tools adopting newer features like std::variant, algorithms (instead of traditional loops) and attributes. Some examples below.

Attributes

Instead of commenting in a switch/case structure that fallthrough is OK, use the [[fallthrough]] attribute:

      switch (argc) {
         case 4:
            outputFileName = argv[3];
            [[fallthrough]];
            
         case 3:

Reader is then aware that the missing break; is not actually missing by accident, but intentional. Improves code readability and quality, and silences the compiler warning about the missing break.

To make sure the caller of the function handles the return value, use the [[nodiscard]] attribute:

[[nodiscard]]
int readFile(const std::string & fileName, std::vector<std::string> & entries);

Compiler will warn you that the return value is not handled. This again improves code quality.

nodiscard attribute warns you that essential return value is not handled.
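Both attributes can be seen together in a small, self-contained sketch. The Options struct and the parseArguments function are invented for this illustration and are not from the app above.

```cpp
#include <string>

// Hypothetical settings collected from the command line.
struct Options {
   std::string inputFileName;
   std::string outputFileName = "out.txt";  // default when no output file is given
   bool valid = false;
};

// [[nodiscard]]: calling this and ignoring the result makes the compiler warn.
[[nodiscard]]
Options parseArguments(int argc, const char * const argv[]) {
   Options options;
   switch (argc) {
      case 3:
         options.outputFileName = argv[2];
         [[fallthrough]];  // intentional: with three arguments, also handle the input file
      case 2:
         options.inputFileName = argv[1];
         options.valid = true;
         break;
      default:
         break;  // too few or too many arguments, options stay invalid
   }
   return options;
}
```

Writing just parseArguments(argc, argv); on a line of its own would trigger the nodiscard warning, and removing the [[fallthrough]]; line would bring back the warning about the missing break (when the compiler's fallthrough warning is enabled).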

Using using instead of typedef

I wanted to use a shorter name for a complex data structure. Usually done with typedef. Instead, using the using keyword, the one used usually with namespaces, is neat:

using queue_package_type = std::map<std::string, std::pair<int,int>>;
queue_package_type queuePackageCounts;

Or similarly:

using NodeContainer = std::vector<NodeView>;
NodeContainer nodes;
// ...
SPConfigurator::NodeContainer nodes = configurator->getNodes();
std::for_each(std::begin(nodes), std::end(nodes), [this](const NodeView & node) {
   std::string description = node.getInputAddressWithPort() + "\t" + node.getName() + "\t" + node.getOutputAddressWithPort();
   QString logEntry = QString::fromStdString(description);
   ui->LogView->appendPlainText(logEntry);
});

Small thing but makes better looking code, in my opinion. The alias declaration with using also works with templates, whereas the C style typedef cannot be templated directly.
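A short sketch of the template difference, with type names invented for this illustration: an alias template is a single using declaration, while the typedef equivalent needs a wrapper struct with a nested type.

```cpp
#include <map>
#include <string>
#include <type_traits>
#include <utility>

// Alias template: the alias itself takes a template parameter.
template <class Count>
using PackageCounts = std::map<std::string, std::pair<Count, Count>>;

// The typedef workaround: a class template with a nested typedef.
template <class Count>
struct PackageCountsOld {
   typedef std::map<std::string, std::pair<Count, Count>> type;
};

// Both name exactly the same type, only the spelling at the use site differs.
static_assert(std::is_same<PackageCounts<int>, PackageCountsOld<int>::type>::value,
              "alias template and nested typedef name the same type");
```

At the use site the alias reads as PackageCounts<int>, whereas the typedef version needs PackageCountsOld<int>::type (and typename in front of it inside other templates).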

Algorithms

In a recent post, I mentioned algorithms like std::iota and std::shuffle, useful in generating test data. When handling containers (vectors, lists), the “old way” is to use either indexes or iterators to handle the items. Implementing these carelessly may lead to bugs. The better alternative is to use algorithms from the standard library, readily developed and rigorously tested, also considering performance. An example from a small tool app I recently made, which searches if id values read from one file are contained in lines read from another file:

std::for_each(std::begin(indexes), std::end(indexes), [&matchCount, &dataEntries, &output](const std::string & index) {
   std::any_of(std::begin(dataEntries), std::end(dataEntries), [&matchCount, &index, &output](const std::string & dataEntry) {
      if (dataEntry.find(index) != std::string::npos) {
         *output << matchCount+1 << "   " << dataEntry << std::endl;
         matchCount++;
         return true; // Not returning from the app but from the lambda function.
      }
      return false;   // Not returning from the app but from the lambda function.
   });
});

std::for_each replaces loops created by using iterators (or indexes to the container), and when some additional logic is needed, std::any_of is a nice solution to end the search when a match is found.
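The same pattern as a self-contained sketch that can be compiled and tried out on its own. The function firstMatches and its parameters are made up for this illustration, and instead of writing to an output stream it collects the matching lines into a vector.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// For each id, finds the first data entry that contains it.
// Returns the matching entries in the order of the ids.
std::vector<std::string> firstMatches(const std::vector<std::string> & indexes,
                                      const std::vector<std::string> & dataEntries) {
   std::vector<std::string> matches;
   std::for_each(indexes.begin(), indexes.end(), [&](const std::string & index) {
      // std::any_of stops at the first entry for which the lambda returns true.
      const bool found = std::any_of(dataEntries.begin(), dataEntries.end(),
                                     [&](const std::string & dataEntry) {
         if (dataEntry.find(index) != std::string::npos) {
            matches.push_back(dataEntry);
            return true;   // ends the search for this id
         }
         return false;     // keep looking
      });
      (void)found;  // not needed here, the lambda already recorded the match
   });
   return matches;
}
```

With the ids 12, 99 and 34 and the data lines "a 12", "b 34" and "c 12", the result holds "a 12" and "b 34": the search for 12 ends at the first hit, and 99 matches nothing.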

A slightly more complicated example uses std::find_if, std::all_of and a boolean predicate object that assists in the search when calling std::find_if. In this example (full source code is here), the composite design pattern is implemented for handling hierarchical key-value pairs. The code sample below implements removing a specific key-value pair from the object hierarchy.

/**
 A helper struct to assist in finding an Entity with a given name. Used
 in EntityComposite::remove(const std::string &) to find an Entity with a given name.
 */
struct ElementNameMatches {
   ElementNameMatches(const std::pair<std::string,std::string> & nameValue)
   : searchNameValue(nameValue) {
   }
   std::pair<std::string,std::string> searchNameValue;
   // const, since comparing does not modify the predicate object.
   bool operator() (const Entity * e) const {
      return (e->getName() == searchNameValue.first && e->getValue() == searchNameValue.second);
   }
};

/**
 Removes and deletes a child entity from this Entity.
 If the child is not an immediate child of this entity, then it is given
 to the children to be removed from there, if it is found.
 If the child is a Composite, removes and deletes the children too.
 @param nameValue A child with the equal name and value properties to remove from this entity.
 @return Returns true if the entity was removed, otherwise false.
 */
bool EntityComposite::remove(const std::pair<std::string,std::string> & nameValue) {
   bool returnValue = false;
   auto iter = std::find_if(children.begin(), children.end(), ElementNameMatches(nameValue));
   if (iter != children.end()) {
      Entity * entity = *iter;
      children.remove(*iter);
      delete entity;
      returnValue = true;
   } else {
      // child was not an immediate child. Check if one of the children (or their child) has the child.
      // Use a lambda function to go through the children to find and delete the child.
      // std::all_of can be stopped when the child is found by returning false from the lambda.
      std::all_of(children.begin(), children.end(), [nameValue, &returnValue](Entity * entity) {
         if (entity->remove(nameValue)) {
            returnValue = true;
            return false;
         } else {
            return true;
         }
      });
   }
   return returnValue;
}

// And then call remove() like this, for example, with 
// key "customer", and value "Antti Juustila":
newComposite->remove({"customer", "Antti Juustila"});

What you get is more robust code without your own bugs implemented in “custom” loops with indexes and iterators.

std::variant from C++17

What if your app has some data that can be manipulated in two formats? For example, first you get the data from the network as JSON, and later you parse the JSON string and create an application specific object holding the parsed data. Later on, you export the data from the internal object type back to JSON to be sent over the network.

You could implement this so that you have both the JSON/string object and the application internal class object in memory. Then just add logic to know which currently has the data and should be used, and ignore the other variable until it is needed. An alternative is to use the good old union to handle this, if you want to save memory. This could be quite complicated to implement.

C++17 provides a better managed option: std::variant. With a union, you have to keep track of what the union contains yourself, whereas a variant knows which type of object it is currently holding, and you can check that.
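As a minimal sketch of that checking, here is a variant of two simple types queried with std::holds_alternative. The alias Value and the function describe are made up for this example.

```cpp
#include <string>
#include <variant>

// A variant that holds either a raw text or a parsed number.
using Value = std::variant<std::string, int>;

// Describes what the variant currently holds.
std::string describe(const Value & value) {
   if (std::holds_alternative<std::string>(value)) {
      return "text: " + std::get<std::string>(value);
   }
   // Only two alternatives exist, so this must be the int.
   return "number: " + std::to_string(std::get<int>(value));
}
```

The index() member function gives the same information as a number: 0 for the first alternative (here std::string) and 1 for the second (int).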

Following the scenario above, a class could have a member variable holding the JSON in a string or, after parsing it, in an application specific object held by a unique pointer that takes care of memory management:

std::variant<std::string, std::unique_ptr<DataItem>> payload;

In the class containing the payload member variable, you can initialise it to an empty string:

Package::Package()
: payload("")
{
}

Then you can provide setters to change from one representation of the data to another:

// Set the data to be a JSON string:
void Package::setPayload(const std::string & d) {
   payload = d;
}
// ...or a DataItem object, parsed from the string:
void Package::setPayload(std::unique_ptr<DataItem> item) {
   payload = std::move(item);
}

When you access the data to use it somewhere, you can check what is actually stored in the variant and return it. If the representation is not the one requested, return an empty value or null pointer to indicate to the caller that the requested representation of the data is not available currently:

// Get the string, using std::get_if:
const std::string & Package::getPayloadString() const {
   auto item = std::get_if<std::string>(&payload);
   if (item) {
      return *item;
   }
   return emptyString;
}
// Get the DataItem, using std::get_if
const DataItem * Package::getPayloadObject() const {
   auto item = std::get_if<std::unique_ptr<DataItem>>(&payload);
   if (item) {
      return item->get();
   }
   return nullptr;
}
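An alternative to asking for one representation at a time is std::visit, which calls a function with whatever the variant currently holds. The sketch below is self-contained, so DataItem is just a hypothetical stand-in with one field, and payloadKind is a made-up helper, not part of the Package class above.

```cpp
#include <memory>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-in for the application specific class.
struct DataItem {
   std::string name;
};

using Payload = std::variant<std::string, std::unique_ptr<DataItem>>;

// Returns a short description of what the payload currently holds.
std::string payloadKind(const Payload & payload) {
   return std::visit([](const auto & held) -> std::string {
      using Held = std::decay_t<decltype(held)>;
      if constexpr (std::is_same_v<Held, std::string>) {
         return "json";
      } else {
         // The unique_ptr alternative may still be a null pointer.
         return held ? "object" : "empty object";
      }
   }, payload);
}
```

The generic lambda is instantiated once per alternative, so the compiler complains if a newly added alternative is not handled sensibly in the lambda body.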

Next I’d like to take a look at how to use the new async programming features of C++, as well as the Boost asio library…

Generating test data with C++

The last time I held the Software Architectures course, I wanted to demonstrate to students how to test the performance and reliability quality attributes of a distributed software system. The system already had a feature to process data as a batch by reading data files and processing that data in the networked nodes. All I needed to do was to generate data files with thousands of records to read and process in the system. I implemented a small tool app to generate this test data.

First of all, when generating thousands of records, I wanted to preallocate the necessary buffers to make sure that during data creation no unnecessary buffer allocations are made, making data generation faster:

std::vector<int> generatedStudentNumbers;
if (verbose) std::cout << "Creating numbers for students..." << std::endl;
generatedStudentNumbers.resize(studentCount);

resize() allocates a big enough vector for the data and fills it with value-initialised elements (zeros for int). For creating student numbers (int), std::iota and std::shuffle are quite useful:

// Generate student numbers starting from one to studentCount.
std::iota(generatedStudentNumbers.begin(), generatedStudentNumbers.end(), 1);
// Shuffle the numbers randomly.
std::shuffle(generatedStudentNumbers.begin(), generatedStudentNumbers.end(), std::mt19937{std::random_device{}()});

std::iota fills the container with consecutive values, starting from 1 in this case. std::shuffle puts the numbers in random order. Voilà, you have a long vector of randomly ordered student numbers you can use in the data generation with only four lines of code!
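When the same test data needs to be generated again, for example to repeat a performance test, the random engine can be given a fixed seed instead of std::random_device. A minimal sketch follows. The function makeStudentNumbers and its seed parameter are additions for this illustration and not part of the tool as described.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Returns the numbers 1..count in a pseudo-random order.
// The same seed gives the same order on the same standard library.
std::vector<int> makeStudentNumbers(std::size_t count, unsigned int seed) {
   // The elements must exist before std::iota can assign to them,
   // so the vector is sized (not just reserved) up front.
   std::vector<int> numbers(count);
   std::iota(numbers.begin(), numbers.end(), 1);
   std::shuffle(numbers.begin(), numbers.end(), std::mt19937{seed});
   return numbers;
}
```

Note that the exact order produced by std::shuffle is not specified by the standard, so the same seed may give a different order with another compiler or standard library.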

Next, I needed random names for the students for the data set. For that, I needed a vector of names and then randomly get a name from that vector when creating the student records:

std::vector<std::string> firstNames;
firstNames = {"Antti", "Tiina", "Pentti", "Risto", "Päivi", "Jaana", "Jani", "Esko", "Hanna", "Oskari"};

// Initialize the random engine
std::random_device rd;
std::default_random_engine generator(rd());

// Generate a random int from a range
int  generateInt(int maxValue) {
   std::uniform_int_distribution<int> distribution(0,maxValue);
   return distribution(generator);
}

// Pick one random name
const std::string & getFirstName() {
   int index = generateInt(firstNames.size()-1);
   return firstNames[index];
}

The generateInt() helper function is used to get a random name from the firstNames vector. The same procedure was used to generate a last name and the study program name for the student. Then all these pieces of information were stored into a record, basically a tab separated std::string. Records, in turn, were contained in a vector of strings.
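Putting the pieces together, building one record could look roughly like the sketch below. The helper names pickRandom and makeRecord, the parameter lists and the field order are all assumptions made for this example. The actual record layout of the tool is not shown here.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Picks one element of a non-empty vector at random.
const std::string & pickRandom(const std::vector<std::string> & values, std::mt19937 & engine) {
   std::uniform_int_distribution<std::size_t> distribution(0, values.size() - 1);
   return values[distribution(engine)];
}

// Builds one tab separated record: number, full name, study program.
std::string makeRecord(int studentNumber,
                       const std::vector<std::string> & firstNames,
                       const std::vector<std::string> & lastNames,
                       const std::vector<std::string> & programs,
                       std::mt19937 & engine) {
   return std::to_string(studentNumber) + "\t"
      + pickRandom(firstNames, engine) + " " + pickRandom(lastNames, engine) + "\t"
      + pickRandom(programs, engine);
}
```

Using std::size_t in the distribution avoids converting the vector size to int, and passing the engine in as a parameter makes it easy to use a fixed seed in tests.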

What is then left is storing the test data records into a file:

std::ofstream datafile(fileName, isFirstRound ? std::ios::trunc : std::ios::app);

// Shuffle the records randomly.
std::shuffle(buffer.begin(), buffer.end(), std::mt19937{std::random_device{}()});
auto save = [&datafile](const std::string & entry) { if (entry.length() > 0) datafile << entry << std::endl; };
std::for_each(buffer.begin(), buffer.end(), save);
datafile.close();

After opening the file stream, first again use std::shuffle to put the data into random order, then use the save lambda function to define what saving a record means. Then just pass this lambda to std::for_each to tell what to do to each of the data records — save them into the std::ofstream.
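The saving step can also be written against std::ostream instead of std::ofstream, which makes it possible to try it out with a string stream, without touching the file system. This is only a sketch, and the function names saveRecords and recordsAsText are made up for it.

```cpp
#include <algorithm>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes the non-empty records to any output stream, one per line.
// Returns the number of records written.
std::size_t saveRecords(std::ostream & out, const std::vector<std::string> & records) {
   std::size_t written = 0;
   std::for_each(records.begin(), records.end(), [&out, &written](const std::string & record) {
      if (!record.empty()) {
         out << record << '\n'; // Unlike std::endl, '\n' does not flush after every record.
         ++written;
      }
   });
   return written;
}

// Convenience for trying out the saving without a file.
std::string recordsAsText(const std::vector<std::string> & records) {
   std::ostringstream out;
   saveRecords(out, records);
   return out.str();
}
```

Since std::ofstream is also a std::ostream, the same saveRecords function can be given the data file stream as it is.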

Finally I made the data generator tool configurable with command line parameters, using Sarge:

Sarge sarge;
sarge.setUsage("./GenerateTestData -[hv]s <number> [-e <number>]");
sarge.setDescription("A test data generator for StudentPassing system. (c) Antti Juustila, 2019.\nUses Sarge Copyright (c) 2019, Maya Posch All rights reserved.");
sarge.setArgument("h", "help", "Display help for using GenerateTestData.", false);
sarge.setArgument("v", "verbose", "Display detailed messages of test data generation process.", false);
sarge.setArgument("s", "students", "Number of students to generate in test data files.", true);
sarge.setArgument("e", "exercises", "Number of exercises generated, default is 6 if option not provided.", true);
sarge.setArgument("b", "bufsize", "Size of the buffer used in generating data", true);

I used the test data generator tool to generate up to 10 000 records and used those test data files to see, and to demonstrate to students, how the system manages high data throughput and with what performance. It was also interesting to see what the performance bottlenecks were in the system.

Next year (the last time teaching this course) I’ll demonstrate how to use a thread to read the data files while another thread is reading test data from the network at the same time. Whether you use std::thread::join() or detach() has a large impact on how the networking and data file reading threads cooperate.

Remember to join

When I forget to join (or detach) a C++ std::thread, the app crashes with SIGABRT(6) on macOS. And obviously the stack dump does not tell me what is going on. So I hunt the bug for some hours, digging through the log files, then finally remember that I just implemented a thread…

***** FATAL SIGNAL RECEIVED *******
Received fatal signal: SIGABRT(6)	PID: 17403

***** SIGNAL SIGABRT(6)

*******	STACKDUMP *******
	stack dump [1]  1   libg3logger.1.3.2-80.dylib          0x0000000100cb2163 _ZN12_GLOBAL__N_113signalHandlerEiP9__siginfoPv + 83
	stack dump [2]  2   libsystem_platform.dylib            0x00007fff72f73b1d _sigtramp + 29
	stack dump [3]  3   ???                                 0x0000000000000400 0x0 + 1024
	stack dump [4]  4   libsystem_c.dylib                   0x00007fff72e49a1c abort + 120
	stack dump [5]  5   libc++abi.dylib                     0x00007fff6fef2bc8 __cxa_bad_cast + 0
	stack dump [6]  6   libc++abi.dylib                     0x00007fff6fef2ca6 _ZL28demangling_terminate_handlerv + 48
	stack dump [7]  7   libc++abi.dylib                     0x00007fff6feffda7 _ZSt11__terminatePFvvE + 8
	stack dump [8]  8   libc++abi.dylib                     0x00007fff6feffd68 _ZSt9terminatev + 56
	stack dump [9]  9   BasicInfoGUI                        0x0000000100af1ffa _ZN11OHARStudent14StudentHandler8readFileEv + 42
	stack dump [10]  10  BasicInfoGUI                        0x0000000100af28df _ZN11OHARStudent14StudentHandler7consumeERN8OHARBase7PackageE + 2223
	stack dump [11]  11  BasicInfoGUI                        0x0000000100a6d101 _ZN8OHARBase13ProcessorNode14passToHandlersERNS_7PackageE + 897
	stack dump [12]  12  BasicInfoGUI                        0x0000000100a6a69e _ZN8OHARBase13ProcessorNode10threadFuncEv + 2462
	stack dump [13]  13  BasicInfoGUI                        0x0000000100a79061 _ZNSt3__1L8__invokeIMN8OHARBase13ProcessorNodeEFvvEPS2_JEvEEDTcldsdeclsr3std3__1E7forwardIT0_Efp0_Efp_spclsr3std3__1E7forwardIT1_Efp1_EEEOT_OS6_DpOS7_ + 113
	stack dump [14]  14  BasicInfoGUI                        0x0000000100a78f6e _ZNSt3__1L16__thread_executeINS_10unique_ptrINS_15__thread_structENS_14default_deleteIS2_EEEEMN8OHARBase13ProcessorNodeEFvvEJPS7_EJLm2EEEEvRNS_5tupleIJT_T0_DpT1_EEENS_15__tuple_indicesIJXspT2_EEEE + 62
	stack dump [15]  15  BasicInfoGUI                        0x0000000100a78796 _ZNSt3__114__thread_proxyINS_5tupleIJNS_10unique_ptrINS_15__thread_structENS_14default_deleteIS3_EEEEMN8OHARBase13ProcessorNodeEFvvEPS8_EEEEEPvSD_ + 118
	stack dump [16]  16  libsystem_pthread.dylib             0x00007fff72f7ed36 _pthread_start + 125
	stack dump [17]  17  libsystem_pthread.dylib             0x00007fff72f7b58f thread_start + 15

Exiting after fatal event  (FATAL_SIGNAL). Fatal type:  SIGABRT
Log content flushed sucessfully to sink

So I am writing this down, just to remember next time: the destructor of a std::thread that is still joinable calls std::terminate, which aborts the app. Join or detach. Join or detach…

   void StudentHandler::readFile() {
      std::thread( [this] {
         StudentFileReader reader(*this);
         using namespace std::chrono_literals;
         std::this_thread::sleep_for(50ms);
         reader.read(node.getDataFileName());
      }).join();
   }
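One way to make forgetting impossible is a small wrapper that joins in its destructor. The class below is a hypothetical sketch and not something used in the project. With C++20, std::jthread does the same out of the box.

```cpp
#include <thread>
#include <utility>

// Minimal RAII wrapper: joins the thread when the wrapper goes out of scope,
// so a still joinable std::thread is never destroyed.
class JoiningThread {
public:
   template <typename Function>
   explicit JoiningThread(Function && function)
   : worker(std::forward<Function>(function)) {
   }

   JoiningThread(const JoiningThread &) = delete;
   JoiningThread & operator=(const JoiningThread &) = delete;

   ~JoiningThread() {
      if (worker.joinable()) {
         worker.join();
      }
   }

private:
   std::thread worker;
};
```

The wrapper only covers the join case. It blocks at the end of the scope until the thread function has finished, so it does not fit a situation where the thread is meant to keep running in the background.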

Edit: Actually, I changed join() to detach(). With join(), the calling thread waited for the file reading thread to finish before continuing to handle incoming data from the network. The file was read entirely into memory, and only then was data from the network combined with data from the file and sent ahead to the next node in the network. With 5000 test records, all of them were held in memory, waiting for the data from the network to arrive, when using join().

When I switched to detach(), the calling thread could continue reading data from the network while the file reading thread was simultaneously reading data from the file. Whenever either one of these threads found a match in a list, the data was combined and sent ahead to the next node in the network. So with join(), a maximum of 5000 records were held in memory all the time, whereas with detach() about 1300-1800 records were held in memory at most, because combined data could be sent ahead to the next node in the network and discarded from this node. That is a significant change in the amount of memory the nodes use. So it does matter which you use, depending on the purpose of the threads in your app.