Archive for the 'Digital' Category

Python’s concurrent.futures module

I wanted to improve the performance of a small data-driven Python program and tried to parallelize it. These are a few learnings to keep around for next time.

I nearly started with the very basics, defining my own threads as well as task and result queues. But then I found the very useful concurrent.futures module which provides a high-level interface to distribute tasks to both threads and processes.
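A minimal sketch of that high-level interface, assuming a made-up word_count task and a hypothetical count_all helper (neither is taken from the original program). The executor replaces the hand-written threads and queues: submit() hands out the tasks, as_completed() yields the results as they finish.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def word_count(text):
    """Hypothetical stand-in for the real per-item work."""
    return len(text.split())


def count_all(texts):
    """Distribute one task per entry of `texts` and collect the results by key."""
    results = {}
    with ThreadPoolExecutor(max_workers=4) as executor:
        # Map each future back to the key it belongs to.
        future_to_key = {
            executor.submit(word_count, text): key for key, text in texts.items()
        }
        # as_completed() yields futures in completion order, not submission order.
        for future in as_completed(future_to_key):
            results[future_to_key[future]] = future.result()
    return results


if __name__ == "__main__":
    print(count_all({"first": "one two three", "second": "four five"}))
```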

I still made the mistake of starting with threads. Everything worked nicely and the tool ran along with four worker threads, but every thread received only 25% of the CPU time and the overall runtime did not improve. I realized I had forgotten about Python’s Global Interpreter Lock (GIL).
The GIL lets only one thread execute Python bytecode at a time, so multithreading does not speed up CPU-bound tasks (it is still useful for I/O-bound ones). More information about the GIL:

So I had to switch to multiprocessing instead. The switch itself is really easy because it is nearly completely hidden inside concurrent.futures; I only had to replace the initialization of the ThreadPoolExecutor() with a ProcessPoolExecutor().
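The swap can be sketched roughly like this. count_primes and run_all are hypothetical names for illustration, not the original program; the only point is that the executor class is the single thing that changes.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound toy task: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_all(make_executor, limits):
    """Run count_primes for every limit on whichever executor class is passed in."""
    with make_executor(max_workers=4) as executor:
        # map() returns the results in the order of the inputs.
        return list(executor.map(count_primes, limits))


if __name__ == "__main__":
    limits = [20_000] * 4
    # Threads: the tasks take turns holding the GIL.
    print(run_all(ThreadPoolExecutor, limits))
    # Processes: same call, each worker has its own interpreter.
    print(run_all(ProcessPoolExecutor, limits))
```

The `if __name__ == "__main__"` guard matters for the process variant: on platforms that start workers as fresh interpreters, the module is imported again in every worker and must not start a new pool on import.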

But with multiple processes I can no longer share variable values. Everything, including the called function itself, has to be pickled and sent to the subprocess.
This required some refactoring: I had to move the function to the module top level (as local functions cannot be pickled) and then tried to find a good minimal set of parameters and return values in order to reduce the data transfer between the processes.
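The pickling restriction can be checked without starting any processes. This is a small sketch with hypothetical names (scale, make_local_scaler, is_picklable); it only shows which kind of callable a ProcessPoolExecutor could ship to a worker.

```python
import pickle


def scale(value, factor):
    """Top-level function: pickle stores it by its importable name."""
    return value * factor


def make_local_scaler(factor):
    """Returns a nested function, which has no importable name."""

    def local_scale(value):
        return value * factor

    return local_scale


def is_picklable(obj):
    """Return True if `obj` survives pickle.dumps(), False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type differs between Python versions.
        return False


if __name__ == "__main__":
    print("top-level function:", is_picklable(scale))
    print("local function:", is_picklable(make_local_scaler(2)))
```

Passing the extra value as a plain argument, as in scale(value, factor), instead of capturing it in a closure is also what keeps the transferred data small and explicit.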

I saved my code examples for different concurrent.futures invocations as a gist for later reference: mschuett/concurrent.py

Along the way I also tried the asyncio module for “Asynchronous I/O, event loop, coroutines and tasks”. That one is also quite interesting, but as the name suggests it is focussed on I/O and coroutines in a single thread; functions you need for a network server. For my use case it is not useful, because asyncio does not help to utilize a second CPU core.
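To illustrate the single-thread point, here is a minimal sketch assuming Python 3.7 or later (for asyncio.run). fake_request is a made-up coroutine that only sleeps to imitate waiting on the network; it records the thread it ran on.

```python
import asyncio
import threading


async def fake_request(name, delay):
    """Imitate an I/O wait; return the name and the id of the executing thread."""
    await asyncio.sleep(delay)
    return name, threading.get_ident()


async def main():
    # Both coroutines wait concurrently, but the event loop runs them in one thread.
    return await asyncio.gather(
        fake_request("a", 0.02),
        fake_request("b", 0.01),
    )


if __name__ == "__main__":
    for name, thread_id in asyncio.run(main()):
        print(name, thread_id)
```

The waits overlap, which is what helps a network server, but all coroutines report the same thread id, so a CPU-bound task gains nothing.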

Links 2017-05-15

On tools …

  • Setting the Record Straight: containers vs. Zones vs. Jails vs. VMs
    Solaris Zones, BSD Jails, and VMs are first class concepts. [..] Containers on the other hand are not real things.
  • CPU Utilization is Wrong
    The metric we all use for CPU utilization is deeply misleading, and getting worse every year. What is CPU utilization? How busy your processors are? No, that’s not what it measures. Yes, I’m talking about the “%CPU” metric used everywhere, by everyone.
  • Practical jq
    I really love jq, the JSON processor. It has changed my life and pretty much replaced Perl and Ruby as my ETL and data-munging go-to tools.
  • Reshaping JSON with jq
    Working with data from an art museum API and from the Twitter API, this lesson teaches how to use the command-line utility jq to filter and parse complex JSON files into flat CSV files.
  • A Visual Guide to What’s New in Swagger 3.0
    Over the past few years, Swagger 2 has become the de facto standard for defining or documenting your API. Since then, it’s been moved to the Linux foundation and renamed to OpenAPI Spec.
  • A plan for open source software maintainers
    As I envision it, a solution would look something like a cross between Patreon and Bugzilla: Users would be able sign up to “support” projects of their choosing […] and would be able to open issues.

let’s learn tcpdump

When I worked on IPv6 implementations I used tcpdump(1) on a daily basis. Those times are long gone, but even today it is an extremely helpful tool. Just last week it helped me debug a database connection problem.

To quote the great Rachel: “tcpdump -e can resolve a great many mysteries.”

If you have never used tcpdump (or Wireshark) before, then Julia Evans’s zine “let’s learn tcpdump” is a great place to start and to learn the 3-7 important command line parameters.

Chemnitzer Linuxtage 2016

CLT2016 mug

And one more small note: the audio recordings of the talks at the Chemnitzer Linux-Tage have been online for a few days now.

True to the motto “Es ist Dein Projekt” (“It is your project”), I found many talks rather small-scale and tinkering-oriented (à la “my umpteenth Raspberry Pi lamp controller”). Accordingly, my personal highlights were two talks that fit my own field of work better: Valentin Haenel’s presentation of the AWS Federation Proxy (unfortunately still without audio) and René Koch’s overview of oVirt.

New Year’s Crypto Cleanup

I just did some housekeeping on my server that I want to document.

Most importantly I got myself a Let’s Encrypt TLS certificate for this blog (and my mailserver), so you no longer have to deal with my self-signed cert to use HTTPS. There has been some discussion about their official client tool, but for a first release it does not seem to be too bad; at least it is written in Python and not in Java or Scala etc. The ACME protocol itself looks sensible and I look forward to more lightweight implementations in the future.

Having a public CA also gave me the opportunity to add an HTTP Strict Transport Security header. Now the next step would be HTTP Public Key Pinning, but that is still out of range for a non-professional website, because Let’s Encrypt may still change their intermediate CA certificate and I also do not have a backup CA that I could use in case of a problem. (BTW, nice HPKP advice on the Let’s Encrypt community site.)
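For reference, the header itself is a single response line. The following is only a sketch in Python, not this server's actual configuration: hsts_header and add_hsts are hypothetical helpers, and the one-year max-age is an assumed value.

```python
def hsts_header(max_age_seconds=31536000, include_subdomains=False):
    """Build the (name, value) pair for a Strict-Transport-Security header."""
    value = f"max-age={max_age_seconds}"
    if include_subdomains:
        value += "; includeSubDomains"
    return ("Strict-Transport-Security", value)


def add_hsts(app, **hsts_options):
    """Wrap a WSGI application so that every response carries the header."""
    header = hsts_header(**hsts_options)

    def wrapped(environ, start_response):
        def start_with_hsts(status, headers, exc_info=None):
            # Append the HSTS header to whatever the application set.
            return start_response(status, list(headers) + [header], exc_info)

        return app(environ, start_with_hsts)

    return wrapped


if __name__ == "__main__":
    print(": ".join(hsts_header()))
```

Browsers only honor this header when it arrives over HTTPS with a certificate they trust, which is why a publicly trusted CA is the precondition.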

Somewhat related, I also expired my old 1024-bit PGP key from  as well as the PGP key of my former work address at DECK36. (BTW, here is a nice description of how to edit gpg key expiration dates by George Notaras.) In order to reach me securely please use my current PGP key (0x4dc5e2280a327754, also on my Contact page).

Interesting Programming Languages

One personal goal this winter is to do more programming in beautiful languages.

At this moment I am quite excited about Python 3, Perl 6, and Go.

Chemnitzer Linuxtage 2015

CLT2015 mug

Observations while Travelling

Train Rides

The day before: “Oh great, several hours on my own. I will pack books and I am gonna get so much reading and writing done.”
On the train: “Argh, I am tired, and it is too loud, I cannot concentrate on anything.”
The day after: “Where did all that time go? What did I do?”

Hotel Wi-Fi

The day before: “There will probably be some kind of Wi-Fi available. It is 2014 and it has to be better now than it was the last time.”
On site, after a painful experience including ridiculous prices and/or asking for silly access codes, counting it a success if there is decent signal strength (even without reasonable bandwidth) in the lobby: “Thank god for my smartphone data plan.”

Conferences

9am: “I wonder why they keep so many snacks and cake around. I just had breakfast and I am fine till lunch.”
After 2-3h of talks or a workshop: “Hunger! I want sugar… and caffeine… and then some more sugar!”