Use Azure Blob to store files

In this post, I use the Azure Storage API to upload files to an Azure blob container. What's more, I use FileSystemWatcher to monitor file changes in a directory and update the blob when a file changes. This can be used to back up files to Azure Blob. To use it, you need an Azure Storage account. You can test it locally using the Azure Storage Emulator.

I'm using .NET Core 3.0 on Linux. First, let's create a project named azureblob and add the necessary packages:

dotnet new console -n azureblob
cd azureblob
dotnet add package Microsoft.Azure.Storage.Blob
dotnet add package Newtonsoft.Json

The Azure Blob API lives in Microsoft.Azure.Storage.Blob, and I need Newtonsoft.Json to read the settings. Speaking of settings, I create this settings class:

[JsonObject(NamingStrategyType = typeof(SnakeCaseNamingStrategy))]
public class Settings
{
    public string BlobConnectionString { get; set; }
    public string BlobContainer { get; set; }
    public string MonitoredDirectory { get; set; }
}

Correspondingly, the settings file looks like this:

{
    "blob_container": "azureblobtutorial",
    "blob_connection_string": "<ReplaceWithYourStorageConnectionString. You can find the one for the Azure Storage Emulator in the doc.>",
    "monitored_directory": "<ReplaceWithYourDirectory>"
}
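
To read the settings, a minimal sketch could deserialize that file with Newtonsoft.Json (the file name settings.json is my own choice here):

// Sketch: requires using System.IO; and using Newtonsoft.Json;
// The snake_case keys map onto the Settings properties via SnakeCaseNamingStrategy.
var settings = JsonConvert.DeserializeObject<Settings>(File.ReadAllText("settings.json"));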

Next, I create a class to call the Azure Blob API. The key is to create the CloudBlobClient and get the blob container.

// Parse the connection string, then get (and create if needed) the container.
var storageAccount = CloudStorageAccount.Parse(connectionString);
var blobClient = storageAccount.CreateCloudBlobClient();
this._blobContainer = blobClient.GetContainerReference(blobContainer);
this._requestOptions = new BlobRequestOptions();
this._blobContainer.CreateIfNotExists(this._requestOptions);

Before uploading or deleting a blob, we need a reference to the blob by its name. I use the file path as the name here:

var blob = await this._blobContainer.GetBlobReferenceFromServerAsync(filePath, cancellationToken);

Then we can use the blob reference to upload or delete a file in Azure Blob. Deleting looks like this:

await blob.DeleteIfExistsAsync(cancellationToken);
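
Uploading works the same way. Here is a minimal sketch, assuming each file becomes a block blob named after its path:

// Sketch: get a block blob reference by name and upload the local file.
// GetBlockBlobReference doesn't call the service; the upload request does.
var blockBlob = this._blobContainer.GetBlockBlobReference(filePath);
await blockBlob.UploadFromFileAsync(filePath);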

Those are the basic operations on an Azure blob. Next, we need to monitor file changes in the directory set in monitored_directory, using FileSystemWatcher. I need to set the notification filter to listen to the right events, and hook up the event handlers as well:

this._watcher = new FileSystemWatcher(monitoredDirectory);
this._watcher.NotifyFilter = NotifyFilters.LastWrite |
                             NotifyFilters.Size |
                             NotifyFilters.FileName |
                             NotifyFilters.DirectoryName;
this._watcher.IncludeSubdirectories = true;
this._watcher.Changed += this.OnFileChanged;
this._watcher.Created += this.OnFileChanged;
this._watcher.Renamed += this.OnFileRenamed;
this._watcher.Deleted += this.OnFileChanged;
this._watcher.Error += this.OnFileWatchError;
this._watcher.EnableRaisingEvents = true;

Whenever I receive a created/deleted/changed event in OnFileChanged, I'll eventually trigger an upload or a delete on the blob. A rename is treated as a deletion (of the old name) plus a creation (of the new name).
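
A sketch of the handlers could look like this, where UploadAsync and DeleteAsync are hypothetical wrappers around the blob calls shown above:

private void OnFileChanged(object sender, FileSystemEventArgs e)
{
    // Created/Changed -> upload the file; Deleted -> delete the blob.
    if (e.ChangeType == WatcherChangeTypes.Deleted)
    {
        _ = this.DeleteAsync(e.FullPath);
    }
    else
    {
        _ = this.UploadAsync(e.FullPath);
    }
}

private void OnFileRenamed(object sender, RenamedEventArgs e)
{
    // A rename is a delete on the old name plus an upload on the new name.
    _ = this.DeleteAsync(e.OldFullPath);
    _ = this.UploadAsync(e.FullPath);
}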

The complete code is in this commit in this GitHub repo. It still needs more work to back up files to Azure Blob fully correctly:

  1. When a directory is renamed, it doesn't automatically update the blob names for the files and subdirectories under it.
  2. It doesn't implement differential upload. A small change to a file uploads the whole file, which can cost a lot of bandwidth for a large file.
  3. When there are frequent changes to the same file, it doesn't batch them. It uploads the whole file once per change.

Regardless, it demonstrates how to use Azure Blob in a program, as well as how to monitor file changes.

Redirect Assembly Binding

In a large .NET project, complex dependencies can be inevitable. To make it worse, multiple dependencies may depend on the same assembly but on different versions. There is already a way to redirect the binding to a different version of an assembly in your app. This document outlines how to do it for an application. Sometimes, that's not enough.

The document outlines these approaches:

  1. The vendor of the assemblies includes a publisher policy file with the new assembly.
  2. Specify the binding in the configuration file at the application level.
  3. Specify the binding in the configuration file at the machine level.

The first approach requires the vendor to publish the publisher policy file. The file has to be in the global assembly cache, which affects every application on the machine.

What if the vendor doesn't provide this file? Then we can specify the binding in the configuration file by using <bindingRedirect>. The configuration file applies to the specified application if it's at the application level, or to every application if it's at the machine level.
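
For reference, an application-level redirect looks like this in App.config (the assembly name, public key token, and versions below are made up for illustration):

<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="AssemblyB" publicKeyToken="0123456789abcdef" culture="neutral" />
        <bindingRedirect oldVersion="1.0.0.0-2.0.0.0" newVersion="2.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>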

What if there is no publisher policy file, and no configuration file for the binding at the application level or the machine level? Is it possible to run into this when you're writing your own application? Probably not. This issue usually happens when you write a plugin or some assembly that runs in a different application that you don't own. For example, you're writing a test that's run by vstest. You use a library A which depends on assembly B version 1.0, and you also use a library C which depends on assembly B version 2.0. At runtime, one version of assembly B will not be loaded.

You don't own assembly B, and you don't own the application that runs your assembly. Because of that, you cannot count on a publisher policy file or the application-level configuration file. You don't want to create a machine-level configuration file either. There is no assembly-level configuration file; one placed next to your assembly is ignored at runtime. I think the best bet is to load the dependency in the program yourself. When the runtime doesn't find the right assembly, it raises the AppDomain.AssemblyResolve event.

How do we use AppDomain.AssemblyResolve? The basic idea is:

  1. Check whether the assembly is loaded.
  2. If it's loaded, and if the loaded version satisfies your requirements, then return the loaded one.
  3. If the assembly isn't loaded, and you find one that satisfies your requirements, call Assembly.LoadFile to load the assembly and return it.

In pseudocode, it is:

static Assembly OnAssemblyResolve(object sender, ResolveEventArgs args)
{
    // Only handle the assembly we care about.
    if (args.Name.Contains("AssemblyB"))
    {
        // If a compatible version is already loaded, reuse it.
        foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
        {
            if (assembly.FullName.Contains("AssemblyB"))
            {
                return assembly;
            }
        }

        // Otherwise load it from a known location (Assembly.LoadFile needs an absolute path).
        return Assembly.LoadFile("PathToAssemblyB");
    }

    // Returning null lets the runtime continue with its default (failing) resolution.
    return null;
}
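
The handler has to be registered before the runtime first tries (and fails) to resolve the assembly, for example early in your own assembly's initialization. A minimal sketch:

// Register the resolver early, before any type from AssemblyB is touched.
AppDomain.CurrentDomain.AssemblyResolve += OnAssemblyResolve;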

There are, however, some caveats. First, the event is at the app domain level, meaning it may impact every assembly in the same app domain. AppDomain.AssemblyResolve passes an event parameter of type ResolveEventArgs. Its ResolveEventArgs.RequestingAssembly property indicates which assembly is requesting the one that cannot be resolved. You can use it to make sure you're loading the assembly in the right context. Second, if you happen to use one of the Assembly.Load overloads inside the handler and that raises another AssemblyResolve event, you'll get a stack overflow. You can check out this guidance.

Used well, I think AppDomain.AssemblyResolve can supplement configuration files in handling assembly binding issues in an application.

Update Azure Bot Using Command Line

We can of course manage Azure Bot Service in different ways: from the portal, from Visual Studio, or from the command line. I like the command line. It's convenient: I don't need to navigate the UI in the portal or Visual Studio; I just execute the same command (or the last command) again. We can create, publish, and update an Azure bot efficiently.

Azure Bot Service Documentation is a good starting point to learn to develop an Azure bot. There is a section about deploying the bot using the CLI. It covers the commands to create and publish the bot: az bot create creates a new bot; az bot publish publishes your code to the bot.

But wait. What if I already have a bot published? I've spent so many hours debugging my code and making my bot more intelligent, and I want my bot to run the new code. Of course, you can do that from Visual Studio; I would like to use the command line. Here is the command:
az bot update --name <BotName> --resource-group <GroupName>
Run this in the top directory of your code. For example, if /path/to/BotCodeInJavaScript contains your code, that's the directory where you run the command.
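
For example, with a hypothetical bot named MyEchoBot in the resource group MyBotGroup:

cd /path/to/BotCodeInJavaScript
az bot update --name MyEchoBot --resource-group MyBotGroup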

That's it. Your published bot is smarter.


Bad const, Bad enum

Many languages have const and enum. The compiler treats enum values as constant integers too, so an enum can go as bad as a const can. In that sense, I'll use const as the example to demonstrate how they can go wrong.

The Good Side of a const

The meaning of const is, as it indicates, that the value is a constant. There are run-time constants and compile-time constants. A run-time constant means the value doesn't change while the program runs. A compile-time constant means the programmer can't change the value of the variable: the compiler forbids any assignment to the const except the first initialization. This is the good side of a const, for when you don't want the value of a variable to change. It's encouraged in general, and the compiler can use the knowledge to optimize your code.

When It Goes Bad

It causes problems when you use a const in a shared library (.so) or dynamic library (.dll). Let me demonstrate it with an example in C++ on Linux. It'll be the same in C++ on other platforms, or in C#.

1. Create a header file with a const: ConstHeader.h

#ifndef __CONSTHEADER_H__
#define __CONSTHEADER_H__

const int TestConst = 10;

class ConstHeader
{
public:
	ConstHeader();

	int get_num() const;

private:
	const int num;
};

#endif //__CONSTHEADER_H__

2. Create source file ConstHeader.cpp

#include "ConstHeader.h"

ConstHeader::ConstHeader() :
	num(TestConst)
{
}

int ConstHeader::get_num() const
{
	return num;
}

3. Create the program that uses the const: UseConst.cpp

#include <iostream>
#include "ConstHeader.h"

using namespace std;

int main(int argc, char** argv)
{
	ConstHeader header;
	cout << "number in executable " << TestConst << endl
		<< "number in library " << header.get_num() << endl;
	return 0;
}

4. Compile ConstHeader.cpp into a shared library

$ g++ -shared -fPIC -Wl,-soname,libConstHeader.so -o libConstHeader.so ConstHeader.cpp

5. Create the program linking to the shared library

$ g++ -o UseConst UseConst.cpp -L. -lConstHeader

6. Run the program

$ LD_LIBRARY_PATH=. ./UseConst
number in executable 10
number in library 10

That looks pretty good. The program uses the same const value as the one in the shared library.

Now what happens if we update the shared library? Let's change the value of the const TestConst in the shared library:

const int TestConst = 20;

Rebuild the shared library and run the program again, without recompiling UseConst:

$ g++ -shared -fPIC -Wl,-soname,libConstHeader.so -o libConstHeader.so ConstHeader.cpp
$ LD_LIBRARY_PATH=. ./UseConst
number in executable 10
number in library 20

Oops. When the program uses the const directly, it gets 10, while the shared library shows the value is 20.

What's Wrong

Let's pause a minute to think about what changed. TestConst is a compile-time constant, so when UseConst.cpp was compiled, the compiler baked the value 10 directly into the executable. The shared library was rebuilt with the new header, so its copy of the value is 20, but the executable keeps the old inlined 10 until it's recompiled. Pay extra attention to consts that are defined in a shared library's headers: when the value changes, this mismatch can be a real pain to debug. The same applies to an enum's implicit values. For example:

enum class Color
{
	red = 0,
	blue,
	yellow,
};

If this is from a third-party library and the enum Color is changed in a new version, e.g.

enum class Color
{
	red = 0,
	green,
	blue,
	yellow,
};

Your program will break if you use Color::yellow or Color::blue and don't recompile against the updated header: blue silently changes from 1 to 2 and yellow from 2 to 3, but your compiled code still holds the old values.

All the consts and enums defined in header files can be accessed from a separate compilation unit. They are effectively interfaces, part of the contract. As a library user, when you use an interface, you expect the same interface to do the same thing in all versions of the library; your application relies on that to function well. As a library author, you don't want to drive your users crazy. Don't change the public interfaces.

How to Mitigate It

It depends on your purpose. As a library author, if you just want to provide a well-defined value to the library user, use a function to return the value. This has some function-call overhead, but you can change the value in a future version. In C#, you can also declare the variable readonly instead of const. Either way, it won't become a compile-time constant; instead, the runtime reads the value from your library, and the caller still cannot change it.
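
As a C# illustration, here is a sketch of the difference (the names are made up):

public static class LibraryValues
{
    // const: the value 10 is baked into every assembly compiled against this library,
    // so callers keep the old value until they are recompiled.
    public const int MaxRetriesConst = 10;

    // static readonly: the value is read from this library at run time,
    // so updating the library updates every caller without recompiling them.
    public static readonly int MaxRetriesReadonly = 10;
}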

For enums, it's a little more complicated. The first approach is to always append new enumerators at the end. Take Color as an example: instead of adding green between red and blue, you always add the new enumerator after the last one, yellow. A second approach is to always give every enumerator an explicit value, as we do for red. Both have problems in collaborative work in a large team. There is no way to enforce that a person appends to the end in the first approach. For the second approach, two people may use the same explicit value for new enumerators in their own work and check in at the same time. Comments in the code won't always help either.

Remember, public consts and enums are interfaces. Don't change them. That's the best way to prevent them from going bad.

Make Investment Work

From school, training, and work, we get deeper and deeper knowledge in software development. But it's not uncommon to lack investment education, since schools don't teach it to everyone unless we happen to be in a related major. I decided to write something about investment to reflect on what I've learned by myself so far.

Financial advisor

A financial advisor is a person who provides financial advice or guidance to clients. They can be insurance agents, investment managers, tax advisors, real estate planners, etc. Investment managers deal with their clients' investment portfolios. It may be good to have your money managed by experts, but you have to be cautious in choosing a financial advisor. They charge fees for compensation: some charge 1% - 1.5% of the assets they manage, some charge by the hour. Whatever your criteria are, please add this one: a fiduciary. A fiduciary is held to a higher ethical standard and is required to act in the client's best interest. That means, if the client had all the prerequisites and the information, the client would have taken the same action. A non-fiduciary is only required to take suitable and reasonable actions, which may not be in the client's best interest.

Actively managed fund and passively managed fund

An actively managed fund is always attended by a manager, who is supposed to be an expert. They pick the stocks, bonds, and other investment vehicles in the fund. A passively managed fund, on the other hand, doesn't require a fund manager to pick for it; it usually just matches the index it follows. With less human intervention, it incurs lower fees.

You may believe an actively managed fund will perform well, since an expert always keeps an eye on it and adjusts immediately to market changes. You remember the chart of the fund always looks good. For example, in the screenshot, FSPTX outperforms the benchmark. (Note the chart is for illustration only; it doesn't mean the fund is good or bad.) Is that the whole story? Do you notice the text below it? The performance data featured represents past performance, which is no guarantee of future results.

Example Fund Prospect

from https://fundresearch.fidelity.com/mutual-funds/summary/316390202

This study by S&P Dow Jones Indices in 2016 shows that 90 percent of actively managed funds failed to outperform their index targets over the prior one-year, five-year, and 10-year periods. Why don't you see such funds from your fund managers? Underperforming funds can be discontinued; fund managers don't want you to lose confidence.

One major factor in the underperformance is the fees of actively managed funds. The cost can be the management fee, trading commissions, etc. All financial advisers need compensation. It doesn't matter how well the fund performs; what really matters is how much you get after the fees. Let's look at the fee of the fund FSPTX.

Example Fund Fee

from https://fundresearch.fidelity.com/mutual-funds/fees-and-prices/316390202

Its expense ratio is 0.77% and its Exp Cap (Voluntary) is 1.15%. The expense ratio is what you pay right now. The Exp Cap is the limit of the fee you may pay in the future, which means you might end up paying a 1.15% fee. For example, say you have $10000 invested and the market value doesn't change for 10 years. At an expense ratio of 0.77%, you pay $10000 * 10 * 0.77% = $770; at 1.15%, you pay $10000 * 10 * 1.15% = $1150. That's $380 more on $10000, even when you don't have any gains. The fee of a passively managed fund can be lower than 0.1%. I'm sure you see the difference. If you worry about the fee, you may be scared to learn that the Expense Cap may be terminated or revised at any time: you don't control how much you're charged.

First, most funds cannot outperform the index, so you already don't have many extra gains. Second, you pay more in fees for an actively managed fund. A passively managed fund mostly just mirrors the index it's tracking, so it doesn't require high fees to get performance similar to the index. An actively managed fund needs to perform much better than a passively managed fund to give you equal gains after you pay the fees. That makes passively managed funds more attractive than actively managed funds.

Diversification and re-balance

Don't put all your eggs in one basket. For example, if you only invest in one company's stock, your return on investment is the same as that company's stock. When the company has a rough year, or even goes bankrupt, you may lose everything. The best way to reduce the risk is to diversify. You can diversify within the same category, for example by buying 25 – 30 unrelated stocks. Or you can diversify across categories and have everything in your portfolio: stocks, bonds, precious metals, etc. The theory is that the same economic data won't affect all of them in the same way: it may hit one stock or one category heavily but be neutral or good for another. In the end the positives smooth out the negatives. You'll need to decide how to allocate your investment, depending on your risk tolerance and investment goals.

What strikes me is re-balancing. It has a twofold effect: it shields you from more risk, and it is a simple way to buy near the bottom and sell near the top. For example, say you have a portfolio with 80% stocks and 20% bonds, and the stocks grow to 90%. That may mean the stock market is going up; because the market has cycles, it'll come down some day. If you keep investing in stocks and the market crashes, you'll lose a lot. Why not re-balance and hold more in bonds? You may sell some stocks and buy more bonds, or direct new money into bonds, until you're back to 80% stocks and 20% bonds. That may not be the top when you sell the stocks, but who knows; don't try to time the market. You're reducing the risks and locking in some gains.

Few can guarantee getting in and out of the market at the right time, but there are many ways to reduce the risks. Diversification and re-balancing are good tools in your risk management.

 

Lump-sum vs recurring

Do you save your money until it's large enough before you invest? Or do you set aside a smaller amount from each paycheck and invest every month? The first approach is lump sum and the second is recurring.

They both have advantages and disadvantages. With lump sum, if you buy when the market is at the bottom, you'll get much larger gains: every rise afterwards counts as your gain. Remember you need to sell at the right time too, and you need a lot of luck on both ends. If the market crashes after you buy, you may wait a long time just to get even. With recurring, regardless of how the market moves, you keep investing smaller amounts. Some of it may lose; some of it gains. By spreading out the investment like this, you're also reducing the risks. Lump sum can give you a much higher return when you get the timing right; recurring, on the other hand, reduces the risk of timing the market wrong.

 

I've been talking mostly about avoiding fees and reducing risks. What really makes it work is discipline. Once you decide how to allocate your investment, how to re-balance, and whether to invest lump-sum or recurring, don't let good or bad news in the market sway you. Research and analyze before you make any changes; try to do it rationally, not emotionally. The best way to get more gains is to time the market right, but that's almost impossible, and luck matters much more than expertise there. The second best way is to reduce risks. Make sure you don't lose money before you get gains.