Kieler Open Source und Linux Tage

Links 2018-12-13

On computer and programming history …

  • What is Code?
    Software has been around since the 1940s. Which means that people have been faking their way through meetings about software, and the code that builds it, for generations.
  • Learning BASIC Like It’s 1983
    In 1983, though, home computers were unsophisticated enough that a diligent person could learn how a particular computer worked through and through. That person is today probably less mystified than I am by all the abstractions that modern operating systems pile on top of the hardware.
  • How Lisp Became God’s Own Programming Language
    Lisp transcends the utilitarian criteria used to judge other languages, because the median programmer has never used Lisp to build anything practical and probably never will, yet the reverence for Lisp runs so deep that Lisp is often ascribed mystical properties.
  • Should you learn C to “learn how the computer works”?
    I’ve often seen people suggest that you should learn C in order to learn how computers work. Is this a good idea? Is this accurate?
  • C Portability Lessons from Weird Machines
    In this article we’ll go on a journey from 4-bit microcontrollers to room-sized mainframes and learn how porting C to each of them helped people separate the essence of the language from the environment of its birth.
  • The Coming Software Apocalypse
    A small group of programmers wants to change how we code—before catastrophe strikes.

Links 2018-11-19

On teams and their problems …

postwhite

I recently updated my small mailserver and finally configured DKIM. But another change was easier and still had more impact: installing postwhite. This little tool takes a list of mail domains, then uses their SPF records to derive a list of their outgoing mail servers, then writes this list into a postscreen whitelist configuration. The current default setting contains 43 domains and generates a whitelist with nearly 2000 lines (each containing an IP or subnet). Everything is nicely scripted and can run as a nightly cronjob.
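
The SPF-to-whitelist step can be sketched roughly as follows. This is only an illustration of the idea, not postwhite's actual code: the record string is made up (using documentation address ranges), `parse_spf` is a hypothetical helper, no DNS lookups are done, and the `permit` output format is an assumption about what a postscreen CIDR access list expects.

```python
import ipaddress

def parse_spf(record):
    """Split one SPF TXT record into (networks, includes)."""
    networks, includes = [], []
    for term in record.split()[1:]:      # skip the leading "v=spf1"
        term = term.lstrip("+")          # "+" is the default (pass) qualifier
        if term.startswith(("ip4:", "ip6:")):
            networks.append(ipaddress.ip_network(term[4:], strict=False))
        elif term.startswith("include:"):
            # A real tool would look up this domain's SPF record as well.
            includes.append(term[len("include:"):])
    return networks, includes

# Made-up example record, not taken from any real domain.
record = "v=spf1 ip4:192.0.2.0/24 ip6:2001:db8::/32 include:_spf.example.net -all"
networks, includes = parse_spf(record)

for net in networks:
    print(f"{net}\tpermit")              # assumed whitelist line format
```

A real implementation would also have to follow the `include:` chain recursively and handle mechanisms such as `a` and `mx`, which is presumably where most of the 2000 lines come from.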

This setup eliminates my biggest problem with greylisting, which is Office 365. Their combination of long email resubmit intervals and multiple cluster servers for delivery attempts always led to long delays before I received email from Microsoft or any company using Office 365. (BTW, I really like greylisting, but this is its biggest design problem: it works for single SMTP servers and enforces certain behaviour, but does not and cannot consider clusters.)

Links 2018-11-12

More about microservices and Docker …

Links 2018-10-29

Some computing history …

Python’s concurrent.futures module

For some reason I wanted to improve the performance of a small data-driven Python program and tried to parallelize it. These are a few lessons to keep around for next time.

I nearly started with the very basics, defining my own threads as well as task and result queues. But then I found the very useful concurrent.futures module which provides a high-level interface to distribute tasks to both threads and processes.

I still made the mistake of starting with threads. Everything worked nicely and the tool ran along with four worker threads, but every thread received only 25% of CPU time and the overall runtime did not improve. I realized I had forgotten about Python’s Global Interpreter Lock (GIL).
The GIL basically prevents performance improvements from multithreading (at least for CPU-bound tasks; threads are still useful for I/O).

So I had to switch to multiprocessing instead. The switch itself is really easy because it is nearly completely hidden inside concurrent.futures; I only had to replace the initialization of the ThreadPoolExecutor() with a ProcessPoolExecutor().
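
A minimal sketch of that swap, assuming a made-up CPU-bound task: `count_primes` and `run_all` are hypothetical names for illustration, not the functions from my tool. The only thing that changes between the two runs is the executor class that gets passed in.

```python
import math
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def count_primes(limit):
    """CPU-bound stand-in task: count the primes below `limit`."""
    return sum(
        1 for n in range(2, limit)
        if all(n % d for d in range(2, math.isqrt(n) + 1))
    )

def run_all(executor_cls, limits, workers=4):
    """Distribute the tasks using the given executor class."""
    with executor_cls(max_workers=workers) as executor:
        # map() returns the results in the order of the inputs
        return list(executor.map(count_primes, limits))

if __name__ == "__main__":
    limits = [20_000] * 4
    print(run_all(ThreadPoolExecutor, limits))   # threads: limited by the GIL
    print(run_all(ProcessPoolExecutor, limits))  # processes: one core each
```

The `if __name__ == "__main__":` guard matters for the process variant, because worker processes may re-import the main module on startup.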

But with multiple processes I can no longer share variable values. Everything, including the called function itself, has to be pickled and sent to the subprocess.
This required some refactoring: I had to move the function to the module top level (local functions cannot be pickled) and then find a good minimal set of parameters and return values to reduce the data transfer between the processes.
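
The pickling restriction can be checked without starting any subprocess. A small sketch, with `square`, `make_local_square` and `can_pickle` as hypothetical names chosen for this example:

```python
import pickle

def square(x):
    """Module-level function: pickled by reference to its qualified name."""
    return x * x

def make_local_square():
    def local_square(x):  # defined inside another function
        return x * x
    return local_square

def can_pickle(obj):
    """Return True if pickle.dumps() accepts `obj`."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # The exception type for local functions differs between Python versions.
        return False

print(can_pickle(square))
print(can_pickle(make_local_square()))
```

The second call should report that the local function is rejected, which is the same failure a ProcessPoolExecutor runs into when it tries to send such a function to a worker.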

I saved my code examples for different concurrent.futures invocations as a gist for later reference: mschuett/concurrent.py

Along the way I also tried the asyncio module for “Asynchronous I/O, event loop, coroutines and tasks”. That one is also quite interesting, but as the name suggests it is focussed on I/O and coroutines in a single thread, that is, the functions you need for a network server. For my use case it is not useful, because asyncio does not help to utilize a second CPU core.
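
To illustrate the single-thread point, here is a small sketch in which two coroutines wait concurrently but both report the same thread ID. `fake_io` is a hypothetical stand-in for a network wait, not code from my tool.

```python
import asyncio
import threading

async def fake_io(name, delay):
    """Pretend to wait for the network, then report the current thread."""
    await asyncio.sleep(delay)
    return name, threading.get_ident()

async def main():
    # Both coroutines wait at the same time, but on the same event loop.
    return await asyncio.gather(fake_io("a", 0.02), fake_io("b", 0.01))

results = asyncio.run(main())
for name, thread_id in results:
    print(name, thread_id)
```

Since everything runs in one thread, a CPU-bound coroutine would simply block the others instead of moving to a second core.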

Links 2018-10-22

Links 2018-03-04