Archive for the 'Digital' Category

Ansible and Vagrant SSH Keys

I recently came across the question how to handle SSH keys with Vagrant and Ansible.


Traditionally all Vagrant boxes and VMs require a fixed login (vagrant) with a fixed SSH key. This was very convenient in a development context, but could raise security issues in case Vagrant VMs were used for anything important and users were not aware of the insecure key.

In current Vagrant versions only boxes (i. e. base images) use the insecure key. When Vagrant starts a new VM it generates a new individual SSH key for this instance. Vagrant keeps these custom keys in the .vagrant/machines subdirectory; so host to guest logins (like vagrant ssh) are still possible.

A vagrant up shows the multi-step procedure:

    workstation: Vagrant insecure key detected. Vagrant will automatically replace
    workstation: this with a newly generated keypair for better security.
    workstation: Inserting generated public key within guest...
    workstation: Removing insecure key from the guest if it's present...
    workstation: Key inserted! Disconnecting and reconnecting using new SSH key...
==> workstation: Machine booted and ready!

To prevent this mechanism one can configure the VM with config.ssh.insert_key = False. This is necessary when modifying a box. Say you want to take a bento base box, install your development tools, and then repackage the VM as your own development box in order to distribute it to a development team. — With the default behaviour (config.ssh.insert_key = True) every repackage build would generate a new basebox with an individual ssh keypair; this makes the new boxes practically unusable, because every user would have to get and configure the custom SSH key for each box. With config.ssh.insert_key = False the box will retain the previous key; that means you users have the same experience as with a normal box.

Note: As a compromise between the two modes one can set a custom ssh key for all baseboxes with config.ssh.private_key_path. This might be useful for companies with many internal boxes. In this case one distribute one SSH key to use with all Vagrant boxes but not use the publicly known insecure standard key.


If all VMs use the same SSH key the setup is straightforward: Configure all Vagrant VMs with config.ssh.insert_key = False or config.ssh.private_key_path and use that key for the Ansible login.

With Vagrant’s default behaviour there is no common SSH key for all VMs. In this case one has to configure Ansible to use the right keys for every VM, but the setup is still simple as long as you have the default /vagrant mount with the .vagrant subdirectory.

This subdirectory contains all Vagrant state and contains these files, right now the private_key is the important one:

$ find .vagrant/

I usually run one Vagrant VM as an Ansible controller (with node.vm.provision :ansible_local) and then 1 to N other VMs as installation targets. To enable access from the controller to the other VMs I include this line in the Ansible inventory: ansible_ssh_private_key_file=/vagrant/.vagrant/machines/{{ inventory_hostname }}/virtualbox/private_key

This should work just the same for running Ansible on your host machine and use it to provision the guest VMs. In this case the path would be a relative one: .vagrant/machines/....

The only important detail is the consistent naming of the VM in Vagrant and Ansible. With the ansible_ssh_private_key_file as above Vagrant’s VM name (set with config.vm.define) and Ansible’s inventory_hostname have to be the same. They are not required to match the guest’s hostname (config.vm.hostname), but I strongly recommend to either use the same value or use an obvious mapping. I usually try to use an FQDN for the hostname, and then use the short hostname (the first component) as the VM name and inventory name.

The only important detail is a consistent naming between Vagrant’s VM name (as set with config.vm.define), the VM’s hostname

Python’s concurrent.futures module

For some reason I wanted to improve performance of a small data driven Python program and tried to parallelize it. These are a few learnings to keep around for next time.

I nearly started with the very basics, defining my own threads as well as task and result queues. But then I found the very useful concurrent.futures module which provides a high-level interface to distribute tasks to both threads and processes.

I still made the mistake to start with threads. Everything worked nicely and the tool ran along with four worker threads — but every thread received 25% of CPU time and the overall runtime did not improve. I realized I had forgotten about Python’s Global Interpreter Lock (GIL).
The GIL basically prevents performance improvement using multithreading (at least of CPU-bound tasks, it is still useful for I/O). More information about the GIL:

So I had to switch to multiprocessing instead. The switch itself is really easy because it is nearly completely hidden inside concurrent.futures, I only had to replace the initialization of the ThreadPoolExecutor() with a ProcessPoolExecutor().

But with multiple processes I can no longer share variable values. Everything, including the called function itself, has to be pickled and send to the subprocess.
This required some refactoring, as I had to move the function to the module top-level (as local functions cannot be pickled) and then tried to find a good minimal set of parameters and return values in order to reduce the data transfer between the processes.

I saved my code examples for different concurrent.futures invocations as a gist for later reference: mschuett/

Along the way I also tried the asyncio module for “Asynchronous I/O, event loop, coroutines and tasks”. That one is also quite interesting, but as the name suggests it is focussed on I/O and coroutines in a single thread; functions you need for a network server. For my use case it is not useful, because asyncio does not help to utilize a second CPU core.

Links 2017-05-15

On tools …

  • Setting the Record Straight: containers vs. Zones vs. Jails vs. VMs
    Solaris Zones, BSD Jails, and VMs are first class concepts. [..] Containers on the other hand are not real things.
  • CPU Utilization is Wrong
    The metric we all use for CPU utilization is deeply misleading, and getting worse every year. What is CPU utilization? How busy your processors are? No, that’s not what it measures. Yes, I’m talking about the “%CPU” metric used everywhere, by everyone.
  • Practical jq
    I really love jq, the JSON processor. It has changed my life and pretty much replaced Perl and Ruby as my ETL and data-munging go-to tools.
  • Reshaping JSON with jq
    Working with data from an art museum API and from the Twitter API, this lesson teaches how to use the command-line utility jq to filter and parse complex JSON files into flat CSV files.
  • A Visual Guide to What’s New in Swagger 3.0
    Over the past few years, Swagger 2 has become the de facto standard for defining or documenting your API. Since then, it’s been moved to the Linux foundation and renamed to OpenAPI Spec.
  • A plan for open source software maintainers
    As I envision it, a solution would look something like a cross between Patreon and Bugzilla: Users would be able sign up to “support” projects of their choosing […] and would be able to open issues.

let’s learn tcpdump

When I worked on IPv6 implementations I used tcpdump(1) on a daily basis. — Those times are long gone, but even today it is an extremely helpful tool. Just last week it helped me to debug a database connection problem.

To quote the great Rachel: “tcpdump -e can resolve a great many mysteries.”

If you never used tcpdump (or Wireshark) before then Julia Evan’s zine “let’s learn tcpdump” is a great place to start and to learn the 3-7 important command line parameters.


Chemnitzer Linuxtage 2016


Und noch ein kleiner Hinweis: Bei den Chemnitzer Linux-Tagen sind nun seit einigen Tagen die Audio-Aufzeichnungen der Vorträge online.

Getreu dem Motto „Es ist Dein Projekt“ fand ich viele Vorträge recht kleinteilig und bastelig (à la „Meine drölfzigste Raspberry Pi Lampensteuerung“). Meine persönlichen Highlights waren dann auch zwei Vorträge, die mehr zu meinem eigenen Arbeitsbereich passen: Valentin Haenels Vorstellung des AWS Federation Proxy (leider noch ohne Audio) und René Kochs Übersicht zu oVirt.

New Year’s Crypto Cleanup

Just did some housekeeping of my server I want to document.

Most importantly I got myself a Let’s Encrypt TLS certificate for this blog (and my mailserver), so you no longer have to deal with my self-signed cert to use HTTPS. There has been some discussion about their official client tool, but for a first release it does not seem to be too bad; at least it is written in Python and not in Java or Scala etc. The ACME protocol itself looks sensible and I look forward to more lightweight implementations in the future.

Having a public CA also gave me the opportunity to add an HTTP Strict Transport Security header. Now the next step would be HTTP Public Key Pinning, but that is still out of range for a non-professional website; because Let’s Encrypt may still change their intermediary CA certificate and I also do not have a backup CA that I could use in case of a problem. (BTW, nice HPKP advice on the Let’s Encrypt community site.)

Somewhat related I also expired my old 1024 bit PGP key from  as well as the PGP key of my former work address at DECK36. (BTW, here is a nice description how-to edit gpg key expiration dates by George Notaras.) In order to reach me securely please use my current PGP key (0x4dc5e2280a327754, also on my Contact page).

Interesting Programming Languages

One personal goal this winter is to do more programming in beautiful languages.

At this moment I am quite excited about Python 3, Perl 6, and Go. Read the rest of this entry »

Chemnitzer Linuxtage 2015

CLT2015 Tasse