Dockerize Web Service

My current web services are running on the same virtual machine. They include WordPress (a.k.a. the blog), Nextcloud, Redmine, and a few others. One way to keep them secure is to keep them updated, but it can be a headache when I need to find new dependencies for the newer versions. It becomes much more convenient when I dockerize the web services.

The drawbacks of running them directly on the virtual machine are

  • The packages coming with the OS on the virtual machine aren't always up to date. Often I have to run a custom script to install newer ones.
  • Installing via custom scripts may leave the system in an unmaintainable state. I forget what was done and how to undo it.
  • It's possible that different web services require different versions of the same package.

I'm done with those struggles every time I update the services, so I decide to put them into Docker.

With Docker, I can

  • Easily know what I install in the Docker image.
  • Re-create the image without worrying about undoing custom scripts.
  • Isolate different services if their packages conflict.
  • Choose the base OS image if needed.

I spend some time writing the Dockerfile.

  1. Figure out the dependencies. Luckily, with the right OS image, I can just use its package manager to install them instead of running custom scripts. This part is straightforward since I can read the documentation.
  2. Use VOLUME and save the configuration on the host instead of inside the container. This also saves me a lot of effort when I iterate on building the image.
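A minimal sketch of these two points (the base image, packages, and paths are placeholders, not my actual setup):

# Pick a base OS image whose package manager has the dependencies
FROM centos:7
# Install dependencies via the package manager instead of custom scripts
RUN yum install -y httpd php && yum clean all
# Keep configuration and data on the host so they survive image rebuilds
VOLUME ["/etc/httpd", "/var/www/html"]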

There are two technical challenges that I have to tackle and where I spend most of my time.

  1. Certbot needs to use systemd to control Apache when I add SSL. After a few searches, I find docker-systemctl-replacement. It provides a script that replaces the systemctl command. The author provides some Dockerfile examples for different images. I modify that to use a custom script that runs a bunch of other stuff and systemctl start httpd, followed by bash. The script needs to end with a long-running command, because the container stops after the command ends.
  2. Use cron inside the container. I find a good discussion on Stack Overflow. The main steps in the Dockerfile are
# Copy hello-cron file to the cron.d directory
COPY hello-cron /etc/cron.d/hello-cron
# Give the cron job file the required permissions
RUN chmod 0644 /etc/cron.d/hello-cron
# Apply cron job
RUN crontab /etc/cron.d/hello-cron

There are, however, some things I can do only when the system is running. Those are done in the custom script I set to run in CMD. I use flag files to make sure they run only once for that container.
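Here is a minimal sketch of that CMD script; the flag-file path and the setup step are placeholders:

#!/bin/bash
# Run the one-time setup only once per container, guarded by a flag file
if [ ! -f /var/lib/.initialized ]; then
    /usr/local/bin/first-run-setup.sh
    touch /var/lib/.initialized
fi
# Start cron and Apache (systemctl here is the docker-systemctl-replacement script)
crond
systemctl start httpd
# End with a long-running command; the container stops when this command exits
bash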

With this change, I remove the unnecessary packages and settings from my virtual machine, and I can control how the environment is set up for each web service.

Use Azure Blob to store files

In this post, I use the Azure Storage API to upload files to an Azure Blob container. What's more, I use FileSystemWatcher to monitor file changes in a directory and update the blob when a file is changed. This effectively backs up the files to Azure Blob. To use it, you need an Azure Storage account. You can test it locally using the Azure Storage Emulator.

I'm using .NET Core 3.0 on Linux. First, let's create a project named azureblob and add the necessary packages:

dotnet new console -n azureblob
cd azureblob
dotnet add package Microsoft.Azure.Storage.Blob
dotnet add package Newtonsoft.Json

The Azure Blob API lives in Microsoft.Azure.Storage.Blob, and I need Newtonsoft.Json to read the settings. Speaking of settings, I create this settings class:

[JsonObject(NamingStrategyType = typeof(SnakeCaseNamingStrategy))]
public class Settings
{
    public string BlobConnectionString { get; set; }
    public string BlobContainer { get; set; }
    public string MonitoredDirectory { get; set; }
}

Correspondingly, the settings file looks like this:

{
    "blob_container": "azureblobtutorial",
    "blob_connection_string": "<ReplaceWithYourStorageConnectionString. You can find the one for Azure Storage Emulator from the doc.>",
    "monitored_directory": "<ReplaceWithYourDirectory>"
}
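To load it, a minimal sketch (the file name settings.json is my assumption):

// Deserialize the settings file; the SnakeCaseNamingStrategy attribute on
// Settings maps blob_container to BlobContainer, and so on.
var settings = JsonConvert.DeserializeObject<Settings>(
    File.ReadAllText("settings.json"));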

Next, I create a class to call the Azure Blob API. The key is to create the CloudBlobClient and get the blob container.

// Parse the connection string and create the client
var storageAccount = CloudStorageAccount.Parse(connectionString);
var blobClient = storageAccount.CreateCloudBlobClient();
// Get the container and create it if it doesn't exist yet
this._blobContainer = blobClient.GetContainerReference(blobContainer);
this._requestOptions = new BlobRequestOptions();
this._blobContainer.CreateIfNotExists(this._requestOptions);

Before uploading or deleting a blob, we should get a reference to the blob by its name. I use the file path as the name here:

var blob = await this._blobContainer.GetBlobReferenceFromServerAsync(filePath, cancellationToken);

Then we can use the blob to upload a file to, or delete a file from, Azure Blob. For example, to delete:

await blob.DeleteIfExistsAsync(cancellationToken);
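And to upload, a sketch assuming filePath also points at the local file to read:

// Upload the local file's content to the blob
await blob.UploadFromFileAsync(filePath, cancellationToken);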

Those are the basic operations on an Azure Blob. Next, we should monitor file changes in the directory set in monitored_directory. We use FileSystemWatcher. I need to set up the filter to listen to the right events, as well as the event handlers:

this._watcher = new FileSystemWatcher(monitoredDirectory);
this._watcher.NotifyFilter = NotifyFilters.LastWrite |
                             NotifyFilters.Size |
                             NotifyFilters.FileName |
                             NotifyFilters.DirectoryName;
this._watcher.IncludeSubdirectories = true;
this._watcher.Changed += this.OnFileChanged;
this._watcher.Created += this.OnFileChanged;
this._watcher.Renamed += this.OnFileRenamed;
this._watcher.Deleted += this.OnFileChanged;
this._watcher.Error += this.OnFileWatchError;
this._watcher.EnableRaisingEvents = true;

Whenever I receive a created/deleted/changed event, in OnFileChanged, I'll eventually trigger an upload or delete on the blob. A Renamed event is treated as a deletion (of the old name) plus a creation (of the new name).
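A minimal sketch of the handler; UploadFileAsync and DeleteFileAsync are my placeholder names for wrappers around the blob calls above:

private void OnFileChanged(object sender, FileSystemEventArgs e)
{
    // Deletions remove the blob; creations and changes upload the file
    if (e.ChangeType == WatcherChangeTypes.Deleted)
    {
        _ = this.DeleteFileAsync(e.FullPath);
    }
    else
    {
        _ = this.UploadFileAsync(e.FullPath);
    }
}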

The complete code is in this commit in this GitHub repo. It still requires some more work to fully and correctly back up files to Azure Blob:

  1. When a directory is renamed, it doesn't automatically update the names of the blobs for the files/subdirectories under it.
  2. It doesn't implement differential upload. A small change to a file uploads the whole file, which can waste bandwidth for a large file.
  3. When there are frequent changes to the same file, it doesn't batch them. It uploads the whole file once per change.

Regardless, it demonstrates how to use Azure Blob in a program, as well as how to monitor file changes.

Redirect Assembly Binding

In a large .NET project, complex dependencies can be inevitable. To make it worse, multiple dependencies may depend on the same assembly but in different versions. There is already a way to redirect the binding to a different version of an assembly in your app, and this document outlines how to do it for an application. Sometimes, that's not enough.

The document outlines these approaches

  1. The vendor of the assemblies includes a publisher policy file with the new assembly.
  2. Specify the binding in the configuration file at the application level.
  3. Specify the binding in the configuration file at the machine level.

The first approach requires the vendor to publish the publisher policy file. The file has to be in the global assembly cache, which affects every application on the machine.

What if the vendor doesn't provide this file? Then we can specify the binding in the configuration file using <bindingRedirect>. The configuration file applies to the specified application if it's at the application level, or to every application if it's at the machine level.
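For reference, an application-level binding redirect looks like this (the assembly name, public key token, and versions are placeholders):

<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="AssemblyB" publicKeyToken="0123456789abcdef" culture="neutral" />
        <!-- Redirect requests for version 1.0.0.0 to version 2.0.0.0 -->
        <bindingRedirect oldVersion="1.0.0.0" newVersion="2.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>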

What if there is no publisher policy file, and no configuration file for the binding at the application level or the machine level? Could this happen when you're writing your own application? Probably not. This issue usually happens when you write a plugin or some assembly that runs in an application you don't own. For example, you're writing a test that's run by vstest. You use a library A which depends on assembly B version 1.0, and you also use a library C which depends on assembly B version 2.0. At runtime, one version of assembly B will not be loaded. You don't own assembly B, and you don't own the application that runs your assembly. Because of that, you cannot count on a publisher policy file or an application-level configuration file. You don't want to create a machine-level configuration file either. And there is no assembly-level configuration file: a configuration file next to your assembly is ignored at runtime. I think the best bet is to load the dependency in the program yourself. When the runtime doesn't find the right assembly, it raises the AppDomain.AssemblyResolve event.

How do we use AppDomain.AssemblyResolve? The basic idea is:

  1. Check whether the assembly is loaded.
  2. If it's loaded, and the loaded version satisfies your requirements, return the loaded one.
  3. If the assembly isn't loaded, and you find one on disk that satisfies your requirements, call Assembly.LoadFile to load the assembly and return it.

In pseudocode, it is

static Assembly OnAssemblyResolve(object sender, ResolveEventArgs args)
{
    if (args.Name.Contains("AssemblyB"))
    {
        // Return the already-loaded copy if there is one
        foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
        {
            if (assembly.FullName.Contains("AssemblyB"))
            {
                return assembly;
            }
        }

        // Otherwise load it from a known location
        return Assembly.LoadFile("PathToAssemblyB");
    }

    // Returning null lets the runtime continue its normal resolution
    return null;
}
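Remember to subscribe the handler before the problematic assembly is first needed, e.g. early in the program:

AppDomain.CurrentDomain.AssemblyResolve += OnAssemblyResolve;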

There are, however, some caveats. First, the event is at the app domain level, meaning it may impact every assembly in the same app domain. AppDomain.AssemblyResolve passes an event parameter ResolveEventArgs, which has a property ResolveEventArgs.RequestingAssembly indicating which assembly requested the one that cannot be resolved. You can use it to make sure you're loading the assembly in the right context. Second, if you happen to use one of the Assembly.Load overloads inside the handler and it raises another AssemblyResolve event, you'll get a stack overflow. You can check out this guidance.

Used well, I think AppDomain.AssemblyResolve can supplement the configuration file in handling assembly binding issues in an application.

Update Azure Bot Using Command Line

We can of course manage the Azure Bot Service in different ways: from the portal, from Visual Studio, or from the command line. I like to use the command line. It's convenient: I don't need to navigate the UI in the portal or Visual Studio; I just execute the same command (or the last command) from the command line. We can create, publish, and update an Azure bot effectively.

The Azure Bot Service documentation is a good place to start learning to develop an Azure bot. There is a section about deploying the bot using the CLI. It covers the commands to create and publish the bot: az bot create creates a new bot, and az bot publish publishes your code to it.

But wait. What if I already have a bot published? I've spent so many hours debugging my code and making my bot more intelligent, and I want the bot to run the new code. Of course, you can do that from Visual Studio, but I would like to use the command line. Here is the command:
az bot update --name <BotName> --resource-group <GroupName>
Run this in the top directory of your code. For example, if /path/to/BotCodeInJavaScript contains your code, that's the directory where you run the command.
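For example (the bot name and resource group are placeholders):

$ cd /path/to/BotCodeInJavaScript
$ az bot update --name MyBot --resource-group MyBotGroup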

That's it. Your published bot is smarter.


Bad const, Bad enum

Many languages have const and enum. The compiler treats enum values as constant integers too, so an enum can go as wrong as a const can. In that sense, I'll use const as the example to demonstrate how they go wrong.

The Good Side of a const

The meaning of const is, as it indicates, that the value is a constant. There are run-time constants and compile-time constants. A run-time constant means that the value doesn't change while the program runs. A compile-time constant means that the programmer cannot change the value of the variable: the compiler forbids any assignment to the const except the first initialization. This is the good side of a const. When you don't want the value of a variable to change, using const is encouraged in general, and the compiler can also use that knowledge to optimize your code.

When It Goes Bad

It'll cause problems when you use a const in a shared library (.so) or dynamic library (.dll). Let me demonstrate it with an example in C++ on Linux. It'll be the same for C++ on other platforms, and for C# too.

1. Create a header file with a const in the class: ConstHeader.h

#ifndef __CONSTHEADER_H__
#define __CONSTHEADER_H__

const int TestConst = 10;

class ConstHeader
{
public:
	ConstHeader();

	int get_num() const;

private:
	const int num;
};

#endif //__CONSTHEADER_H__

2. Create source file ConstHeader.cpp

#include "ConstHeader.h"

ConstHeader::ConstHeader() :
	num(TestConst)
{
}

int ConstHeader::get_num() const
{
	return num;
}

3. Create the program that uses the const: UseConst.cpp

#include <iostream>
#include "ConstHeader.h"

using namespace std;

int main(int argc, char** argv)
{
	ConstHeader header;
	cout << "number in executable " << TestConst << endl
		<< "number in library " << header.get_num() << endl;
	return 0;
}

4. Compile ConstHeader.cpp into a shared library

$ g++ -shared -fPIC -Wl,-soname,libConstHeader.so -o libConstHeader.so ConstHeader.cpp

5. Create the program linking to the shared library

$ g++ -o UseConst UseConst.cpp -L. -lConstHeader

6. Run the program

$ LD_LIBRARY_PATH=. ./UseConst
number in executable 10
number in library 10

That looks pretty good. The program uses the same const value as the one in the shared library.

Now what happens if we update the shared library? Let's change the value of the const TestConst in the shared library:

const int TestConst = 20;

Rebuild the shared library and run the program again without recompiling the executable:

$ g++ -shared -fPIC -Wl,-soname,libConstHeader.so -o libConstHeader.so ConstHeader.cpp
$ LD_LIBRARY_PATH=. ./UseConst
number in executable 10
number in library 20

Oops. Where the executable uses the const directly, it gets 10, while the shared library shows the value is 20.

What's Wrong

Let's pause a minute to think about what changed. The executable uses the const integer TestConst from the shared library's header, but TestConst is a compile-time constant, so its value was baked into the executable when it was compiled. When the library is updated, the executable keeps the old value until it is recompiled. Pay extra attention to the consts that are defined in a shared library: when the value changes, it can be a pain in the ass to debug. The same applies to an enum's implicit values. For example:

enum class Color
{
	red = 0,
	blue,
	yellow,
};

Suppose this comes from a third-party library and the enum Color is changed in a new version, e.g.

enum class Color
{
	red = 0,
	green,
	blue,
	yellow,
};

Your program will be broken if you use Color::yellow or Color::blue and don't recompile against the updated header: your compiled code still holds the old values (blue = 1, yellow = 2), which now mean green and blue in the new library.

All the consts and enums defined in header files can be accessed from a separate compilation unit. They are actually interfaces, part of the contract. As a library user, when you use an interface, you expect the same interface to do the same thing in all versions of the library; your application relies on that to function well. As a library author, you don't want to drive your users crazy. Don't change the public interfaces.

How to Mitigate It

It depends on your purpose. As a library author, if you just want to provide a well-defined value to the library user, use a function to return the value. This has some function-call overhead, but you can change the value in a future version. In C#, you can also declare the variable readonly instead of const. Either way, the value won't become a compile-time constant; instead, the runtime reads it from your library, and it still cannot be changed by the caller.
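A minimal sketch of that idea in C++ (the function name is my choice, not from the original code):

// ConstHeader.h: expose a function instead of a const value
int test_const();

// ConstHeader.cpp: the value lives in the library, so updating the
// library updates what every caller sees at run time
int test_const()
{
	return 10;
}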

For enums, it's a little more complicated. The first approach is to always append the new enumerator at the end. Take Color as an example: instead of adding green between red and blue, you always add the new enumerator after the last one, yellow. A second approach is to always set an explicit value on each enumerator, as we do for red. Both have problems in collaborative work in a large team. There is no way to enforce that a person appends at the end in the first approach. In the second approach, two people may use the same explicit value for new enumerators in their work and check both in at the same time. Comments in the code won't always help.
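A sketch of the second approach, keeping the existing values stable (the value for green is my example):

enum class Color
{
	red = 0,
	green = 3,	// added later, with an explicit value that doesn't shift the others
	blue = 1,
	yellow = 2,
};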

Remember: public consts and enums are interfaces. Don't change them. That's the best way to prevent them from going bad.