Too busy and stuff

Apparently I am too busy to write anything here. Or … priorities… or workload… Work, life, etc.

During the summer of 2020 I decided that I wanted an app in the App Store. So I did it.

Slippery Cities (Liukkaat kadut in Finnish, Hala Gator in Swedish) warns pedestrians of slippery weather conditions in six Finnish cities. That photo above of my Apple Watch says that in Oulu there was a slippery alert two hours ago. So wear your spiked shoes. Or be careful out there.

I learned a lot about:

  • Swift programming
  • SwiftUI programming
  • WatchKit development with Xcode
  • Localisation for three languages
  • Accessibility programming in Apple platforms
  • App Store: submitting apps for review (no issues, by the way; the app went through without any hassle)
  • Programming for watchOS, including watch faces, complications, background processing, networking, … with the beta version of watchOS 7.

Buy it in the App Store, I want to get rich. Mind you, it may not work in the city you live in, so consider that. I won’t refund you unless you twist my arm hard.

Then there are other things keeping me busy. I created a new course on Software Platforms and Ecosystems with two colleagues. That was hard. Luckily it is also behind us for a while, since the course ended in December.

I also took responsibility for a course on data structures and algorithms. The corona pandemic made that harder than usual, since the course material and arrangements were based on the assumption of face-to-face classroom teaching. That didn’t happen.

Now I am working on a course on server programming, which is a new course for me. I’ve been implementing an HTTPS chat server with Java and writing instructions for students to do the same. Topics include Java, HTTPS, certificates, authentication, JSON, SQLite database, password hashing and salting, UTF, encoding, decoding, HTTP headers, … Maven, Visual Studio Code, git. Lots of work, since none of this existed in late December and the course started in early January.

Then another course where I assist in exercises on advanced software testing and security issues started this week. Lots to learn for me there too.

So I’ve been very, very busy. Maybe in Spring 2021 I’ll be sharing more here.

Oh yes, and some nice things too: I got myself a new Mac Mini M1, an Apple Silicon computer. So I am enjoying doing these things with the new device! My previous iMac 27″ was already five years old, luckily in good condition. I got a decent price for it when I sold it to fund the Mini.

Teaching season started

The fall teaching season has started with one old course, Devices and networks, already ongoing. My part is the networks, so currently I can focus on two new courses starting in October. Data structures and algorithms is an old course, but I’ll take responsibility for it this year.

The other course, Platforms and ecosystems, is a new one. This means I have my hands full creating material for it together with two other teachers. And at the same time, I am familiarising myself with the data structures course.

For the data structures course, I’ve worked earlier on the demo sorting app implemented in Swift. There is no time to add more sorting methods there, but I should build it with the new Xcode, Swift and iOS 14 to see if there are any (breaking) changes.

What I did recently is implement simple (and stupid) array and linked list classes, both in C++ and Swift, to demonstrate the efficiency of creating and accessing arrays in comparison to linked lists. I will be using them when discussing why different data structures have different (preferred) usage situations, and that there may be conflicting requirements for the data structure in an app. Then you just have to make compromises.

Another important thing to show to the students is that you need to build the release version before comparing or measuring performance.

Betas

I wanted to install Apple developer betas on some of my devices to try out something. But then I saw that the bank ID software didn’t work on the beta. Bummer.

Today, a week later, I realized that I also have an iPad. And since the bank ID app also works on the iPad, I installed the public beta of iOS 14 on my phone, as well as developer beta 2 on my Apple Watch. 🙂

New Swift tools

I’ve been working on my sorting methods demonstration app recently when I’ve had time for it. Most of my time currently is spent evaluating student exercise projects submitted at the end of June for the Software Architectures course.

So, not much new functionality in the sorting app now. I am mostly trying to make sure the code so far is good and documented. Since my plan is to use it as an example and demonstration in the new course I am teaching in Fall, Data structures and Algorithms, I want the demo well documented and of good quality.

To make this happen, I’ve installed several Swift related tools and tried them out.

I am using SwiftLint to make sure the code is clean and follows the Swift coding conventions. I had to disable some rules that are too strict for this demo app, though, using the .swiftlint.yml configuration file.

disabled_rules:
 - multiple_closures_with_trailing_closure
 - line_length
 - todo

For generating HTML documentation from the comments in the code, I’ve experimented with Jazzy and Swift-doc. Currently Jazzy seems to be in better shape for my needs, since Swift-doc currently documents only public properties of the project.

Apparently Swift-doc has been used mainly for documenting libraries and their public APIs. There is a pull request in the GitHub repository, not yet accepted, that enables specifying the level of protection (open, public, fileprivate, private) to document. I tried out the fork which enables protection level configuration, results of which can be seen here. In addition to the protection level issues, there are issues with links. If you try to navigate using the class graphs on the page, the links are not correctly generated by the tool.

The Jazzy generated documentation does not have these issues, and I like the visual appearance of the Apple themed documentation page (screenshot below).

Well, back to evaluating the SWA exercise work projects, after finishing the study program Zoom meeting I am currently listening to.

Jazzy generated html documentation of the sorting demo app.

Fun with Swift operator overloading

There is this guy in one forum, writing all his sentences ending with at least two periods, mostly three… Always… This might help him, integrated into some text editor… Saving his keyboard strokes…

extension String {
   static postfix func ... (str: inout String) -> String {
      str = String(str.map {
         $0 == "." ? "…" : $0
      })
      return str
   }
}

var sentence = "Operator dotdotdot. For those. Who want to write. All sentences. With several periods."
let hisStyle = sentence...
print(hisStyle)
// Prints: "Operator dotdotdot… For those… Who want to write… All sentences… With several periods…"

What about adding a copyright at the end of the text the user enters, using this brand new copyright operator, <©>?

postfix operator <©>
extension String {
   static postfix func <©> (str: inout String) -> String {
      str += " © Antti J."
      return str
   }
}

var originalText = "Life is a b***h and then you die."
var copyrighted = originalText<©>
print(copyrighted)
// Prints: "Life is a b***h and then you die. © Antti J."

Well, back to some more serious work.

Giving feedback in corona times

Remote teaching challenges the way I have given feedback to students. Sometimes the things to comment are largish software design models with associated documentation. When teaching face to face, it is easy to explain, point with fingers, talk together. But in isolation, that is not possible. Commenting their work in writing takes a long time and leaves room for misinterpretations.

As I mentioned in the previous post, I decided to experiment with giving audio comments instead of written ones. I look at their documents and UML models, and record my comments into an audio file. The file is then shared from the cloud to the student group. For me this is faster than writing. Students are able to follow my thoughts since I take care to pinpoint what I am looking at when I record the audio. Video conferencing could be an option but requires everyone to be there at the same time. Comments on recorded video do not provide enough added value, in my opinion. Furthermore, I am able to comment thoroughly (face to face teaching usually happens in classroom hours, which are limited), which may benefit the quality of the work. The negative side is that, in the worst case, students must tolerate an hour or so of my ramblings about their work… I have now given feedback to 16 student deliveries, perhaps around 12-13 hours of comments.

Anyways, as this seems to work, I may continue giving audio feedback after the corona virus isolation ends. Whenever that may happen…

Not a real student feedback recording. I already deleted those from my machine.

Pandemic isolation ramblings

Due to the corona virus pandemic, I’ve been working remotely since the end of February. I felt like I had a cold and isolated myself before the University officially recommended that to the personnel. It is not the virus though, only a common cold. Obviously I cannot be 100% sure since there are no tests available here unless you are critical workforce or seriously ill. So I am keeping myself isolated at home, just in case.

Luckily the grocery nearby delivers food, and their app and website for creating the orders work quite well. Today the second delivery is arriving, which should be enough for a week at least. They have a very high load of orders flooding the service. What I usually do is create an order with a couple of products, select a delivery date about one week later, and then keep filling the order until the day before the delivery. By that day, I already have the next delivery date reserved, with a new growing list of items to order. In this way, we are able to secure the deliveries so that there will not be too long gaps in between.

Since the isolation started, I have continued to offer video sessions via MS Teams, as well as Moodle discussion and chat support, to the students in the Software architectures course. Fortunately, the course lectures and exercises were mostly over by the time the isolation started. Students continue working on their exercise work projects until the end of May. Luckily I have a fast network at home, and even better hardware than at the campus. That large iMac screen has proven to be quite a good thing to have. Currently I am recording audio feedback to the students for the first phase of the exercise work projects. The amount of feedback to give could be quite extensive, and text feedback is inferior to audio, in my opinion. Video is not needed in this case since I can easily pinpoint the things I comment on by referring to the chapter titles and paragraph and page numbers. Let’s see how this works.

The study program and the research unit are using Zoom video sessions to keep in touch and organize during the pandemic. Probably this will last until summer, but I suspect there will be limitations and exceptional arrangements even in the Fall semester. Time will tell. University support staff has increased the online training of teachers, providing courses in Zoom on using Moodle, Teams and Zoom itself in teaching.

I’ve started to implement a small app with Swift to learn something new. The app is also something to use as a demonstration in Fall in the Data structures and algorithms course. I will take charge of that course after the summer break, so I wanted to do something related to the topic.

Below is a demo video of the app in its early phases. I am planning to implement maybe a couple more sorting algorithms and improve the graphics and usability of the app. YouTube is full of these kinds of videos, so I will not put too much effort into this, like implementing tens of different algos.

Beauty from git logs

Until today, I didn’t know about the existence of Gource. Explained shortly, you can use it to create beautiful animations from the history of git repositories. After installing it, I spent the last couple of hours watching and recording videos of my projects in git.

The project in this video is called Keywords. It is a demo project I implemented for Software architectures course, implementing a Client/Server app with TCP, session management, client side API to the server and an Android client.

What is interesting to see on the video is how I started the implementation from the Server, then switched to Client, back to Server. Then I implemented a client API for the server as a library (.jar; this is all Java), made modifications to the Client and so on. Every time I needed new functionality:

  • I first implemented the support for it on the Server side,
  • then modified the Client API to support that and
  • finally added the support for the feature on the client side.

Basically how incremental development works (may work), doesn’t it?

Installing Gource on macOS was quite simple, with Homebrew:

brew install gource

And then just run it (in some project directory under git):

gource --auto-skip-seconds 1

Most of my projects are ones I work on occasionally, with some quiet periods in between, so the --auto-skip-seconds option is useful for quickly passing the times when mostly nothing happened in the project.

Since I had earlier installed ffmpeg, it was easy to save the generated animation into a video file:

gource --auto-skip-seconds 1 --title "EasyCrypto demo project (c) Antti Juustila" -1280x720 -o - | ffmpeg -y -r 60 -f image2pipe -vcodec ppm -i - -vcodec libx264 -preset ultrafast -pix_fmt yuv420p -crf 3 -threads 0 -bf 0 gource.mp4

The saved video files can be quite large, so raising the -crf value may be a good idea to make the files smaller. Though that also lowers the quality, which makes the videos not so cool.

I have one project I started in 2013 and worked on until January this year. I already watched the evolution of that system with Gource, and it will make a great video. It would be great with subtitles or a voice explaining what happens and why. This would be a nostalgic thing to create: the course where I have used that system is something I am leaving behind. This Spring is the last time I will teach the course. The video would be kind of a farewell to the system, since it is unlikely I will continue with it without a useful context like the course has been.

To thread or not to thread

There’s a distributed C++ system I made, used as a “patient” in a course on Software architectures. It includes a command line tool TestDataGenerator, which I implemented to test the performance and reliability of the system. The tool generates random data in memory buffers and then writes four test data files which are read and handled by the system’s distributed nodes. An earlier blog post discussed the tool’s implementation details.

The generator was single threaded, writing the four data files in sequence, in the main thread. But then this stupid idea popped into my head: what if the four test data files were written to disk in parallel? Would it be faster? How much, if at all?

Threading is absolutely not needed in this case: generating test data for 5000 students takes about 250 ms on my MacBook Pro (13-inch, 2018) with a 2.3 GHz four core Intel Core i5 and a 1 TB SSD. On machines with HDDs this could be somewhat slower.

However, I wanted to see how much execution time (if any) I could shave off with four threads, each writing to its own data file from the RAM buffers. It was also an opportunity to learn more about threads. Those horrible, evil things everyone is saying nobody should use…

My first implementation created and started the threads whenever the memory buffer was full, with the file saving done in a lambda function:

 if (bufferCounter >= bufSize) {
   std::thread thread1( [&isFirstWrite, &STUDENT_BASIC_INFO_FILE, &basicInfoBuffer] {
     saveBuffer(isFirstWrite, STUDENT_BASIC_INFO_FILE, basicInfoBuffer);
   });
// ...

But creating a thread takes time. Lots of time, thousands of processor cycles, depending on your setup (see e.g. this blog post). If the tool startup parameters are -s 50000 -b 500 (create 50000 records with buffer size of 500), this would mean 50000/500 = 100 thread creations per file, so 400 threads would be created during the execution of the tool. Not very good for performance.

I changed the implementation to create the four threads only once, before filling and saving the memory buffers:

   // For coordination between main thread and writer threads
   std::atomic<int> threadsFinished{0};
   // Prepare four threads that save the data.
   std::vector<std::thread> savers;
   savers.push_back(std::thread(&threadFuncSavingData, std::ref(threadsFinished), std::cref(STUDENT_BASIC_INFO_FILE), std::ref(basicInfoBuffer)));
   savers.push_back(std::thread(&threadFuncSavingData, std::ref(threadsFinished), std::cref(EXAM_INFO_FILE), std::ref(examInfoBuffer)));
   // ... and same for the remaining two threads.

The threads are then woken up every time the data buffers are full:

if (bufferCounter >= bufSize) {
   if (verbose) std::cout << std::endl << "Activating buffer writing threads..." << std::endl;
   // Prepare variables for the file saving threads.
   startWriting = true;
   threadsFinished = 0;
   int currentlyFinished = 0;
   // And launch the file writing threads.
   launchWrite.notify_all();

And then the main thread waits for the writers to finish their job before filling the memory buffers again.

   // Wait for the writer threads to finish.
   while (threadsFinished < 4) {
      std::unique_lock<std::mutex> ulock(fillBufferMutex);
      writeFinished.wait(ulock, [&] {
         return currentlyFinished != threadsFinished;
      });
      currentlyFinished = threadsFinished;
   }


The file writing threads notify the main thread when they have finished the file operations, using a condition variable and a counter the main thread uses to keep track of whether all the writer threads have finished:

// Thread function saving data in parallel when notified that buffers are full.
void threadFuncSavingData(std::atomic<int> & finishCount, const std::string & fileName, std::vector<std::string> & buffer) {
   bool firstRound = true;
   while (running) {
      // Wait for the main thread to notify the buffers are ready to be written to disk.
      std::unique_lock<std::mutex> ulock(writeMutex);
      launchWrite.wait(ulock, [&] {
         return startWriting || !running;
      });
      // We are still running and writing, so do it.
      if (buffer.size() > 0 && startWriting && running) {
         saveBuffer(firstRound, fileName, buffer);
         buffer.clear();
         firstRound = false;
         // Update the counter that this thread is now ready.
         // Main thread waits that four threads have finished (count is 4).
         finishCount++;
      }
      // Notify the main thread.
      writeFinished.notify_one();
   }
}

Then to measurements. I created a script which executes the tool 20 times using threads and then 20 times sequentially, without threads (the command line parameter -z disables the threading code and uses the sequential code):

echo "Run this in the build directory of TestDataGenerator."
echo "Removing output files..."
rm test-*.txt
echo "Running threaded tests..."
for ((i = 0; i < 20; i++)); do ./GenerateTestData -s 50000 -e 10 -b 500 >> test-par.txt; done
echo "Running sequential tests..."
for ((i = 0; i < 20; i++)); do ./GenerateTestData -zs 50000 -e 10 -b 500 >> test-seq.txt; done
echo "-- Tests done -- "
open test-*.txt

Just to compare, I executed the tests on two machines: a MacBook Pro with a 2.3 GHz four core Intel Core i5 and a 1 TB SSD, and an iMac 2015 with an HDD. Next, I took from the output files the number of milliseconds the tool took each time, put them into a Numbers file and generated these graphs from the test data:

Comparison of sequential and threaded execution in two machines

As you can see, there is no difference between writing in threads (in parallel) and writing sequentially. Here you can see how the threads take turns and execute in parallel in the cores of the processor of the MacBook Pro:

Profiler showing threads executing.
Blue areas show when the threads are active, executing.

Profiling the execution shows that having multiple threads doing the work won’t make a difference. In the trace below you can see that most of the time the threads are either waiting for their turn to flush the data to disk or actually flushing the data. Most of the time in the selected saveBuffer method is spent in flushing data.

Profiler screenshot shows where time was spent, flushing and waiting.
Selected lines show where the most of the time was spent.

Also, in the sequential execution, where the single main thread does everything, the time is spent in flushing to disk:

Single threaded execution profile.
Single threaded execution spent most of the time flushing data to disk.

Creating threads to speed up writing to disk is definitely not a good idea in this case. If this were an app with a GUI, then writing large amounts of data in a thread could very well be a good idea. If writing took more than a couple of hundred milliseconds, the user would notice the GUI lagging or not being responsive. So whether or not to use threads to write data to disk depends on your use case.

This oldish article from DrDobbs is also an interesting read. It points out that writing several files in threads is not necessarily helpful (unless using RAID), and that one should make threading configurable (like the -z parameter in my implementation) because threads may in some situations even slow down the app. Also this discussion on when to apply threads is a good one:

Using multiple threads is most helpful when your program is CPU bound and you want to parallelise your program to use multiple CPU cores.

This is not the case for I/O bound problems, such as your scenario. Multiple threads will likely not speed up your system at all.