Experience and Lessons of Ember.js

The best way to learn is to practice. So I made a small component with Ember.js.
This component is a tree grid. It's a grid, and it acts like a tree too. It presents a tree in a grid. Each node in the tree is a row in the grid. So node and row are used interchangeable. There is a parent-children relationship between rows. All children rows are displayed under their parent row. These children rows are indented. There are operations on rows too. You can choose to show or hide their children rows.
I'm not going to write an Ember.js tutorial. You can find good ones online, for example, http://code.tutsplus.com/tutorials/ember-components-a-deep-dive--net-35551 and http://coding.smashingmagazine.com/2013/11/07/an-in-depth-introduction-to-ember-js/#what_is_handlebars_template_precompiling. I'm not going through all details of how I made it. You can read the code and history from Github https://github.com/kceiw/treegrid-emberjs. The purpose of this blog is to share the lessons I learned, something that are not obvious or I didn't pay attention to it at the beginning. You may have a much better idea of how to make it and how to use ember.js. But as I began, I made these mistakes.
1. User .get('propertyName') instead of .propertyName.
Accessing to properties directly will give you undefined. e.g. ArrayProxy.length is undefined. You have to use ArrayProxy.get('length') or ArrayProxy.set('length');
It confused me at first, especially when I looked at the API. There is length in ArrayProxy. How come it's undefined. But later I realized that I needed to use get() to get property values. It's documented in the official guide. But I didn't think too much of it at the beginning. I think this is weird to some extent that properties are not accessible directly.
2. Set tagName in view/component.
If it's not set, the default value <div> will be used. Everything in the template is wrapped in the tagName. I think it's always a good idea to set a tagName in view/component because <div> is not always the correct one.

I didn't set tagName in TreeGridRowComponent. Since it's used to show each row in a table, so I had <tr> in the template. This should be in <tbody>. Ember.js wraps the template in <div>. The Html is supposed to be <tbody><div><tr>...</tr></div></tbody>. But for some reason Firefox rearranges it to <div></div><tbody><tr>...</tr></tbody>. It looked fine at the beginning. But when I tried to update the row, it failed. What happened was that Ember.js looked up the target element from <div> element. <tr> was't a descendant of <div> any more. Ember.js failed to find the element and it didttn't do anything.

3. classNameBindings value is an array of string. Each string is a class name or expression to determine the class name. It's in the form ['property1:style1:style2', 'property2:style3:style4'].
You should read this API.
The view API:
4. Css icon
This page http://www.dynamicdrive.com/style/csslibrary/item/css_triangle_arrow_divs/ explains how it works. Didn't know it before.
Don't need to set both height and width to 0 though.
They'll affect the perceivable width and height.
In my tree grid, I shown the down arrow when the row is expanded and right arrow the the row is collapsed. I used to set both height and width to 0. When I expanded and collapsed, I noticed that the height of the row changed too. So I just set only one of them to zero and the other was the size that the icon should be.
5. Keyboard focus
Focus is always the prerequisite of keyboard events. There is a way to set the focus to the tree grid in initialization.
6 Only one action
One thing I like to be in Ember.js is to support multiple actions. Each component can only send one action now.
There are a few things worth writing down in the end.
There are a few confusions such as render and component and view. There are no clear guidelines when/how to use them.
Action is a good concept to handle semantic events. But it has the limitation that one component/controller can only send one action.
Css icon is an interesting way to draw simple icons. I'm not clear the performance v.s. image icon.

Universe Model

When I talked to one guy in a meetup months ago, this thought stroke me.

The topic turned to be robot, human, life, and then the universe.

Some thoughts are summarized below.
Let's start with the human. There are two possibilities about its origin:
1. It's related to alien. Either it's created by an alien or it's offspring of an alien.
Then what's the origin of the alien?
2. It's evolving from life on the earth.
From what life it evolved from? How did the life come out at the very beginning.

Once we have the answer. we can ask again: where is that from? If we keep probing and tracing back, we will need to figure out why there was Big Bang or we'll find something that is far astonishing. In my opinion, both findings are about the origin of the universe.

There is one less difficult question: what is life. We already have a definition...... But is every object, for example, stone, sand, etc, a form of life? From our definition, No. But that's just our definition.

Then it comes a model.
Assuming we can develop a complex system in computer. In this system, some programs can modify (evolve) itself. They can probe the system where they're. They can give definitions to what they "observe". Doesn't that sound like an intellectual life? Then we only need to figure out a way to observe that those programs have the abilities of doing these.

Maybe we're living in such a system. It's not necessarily be a computer in somewhere though.

Setting up PPTPD on Linux

I used to have a VPN on Linux. It is running PPTPD on CentOS 5  on a OpenVZ. It used to work fine but last week I found it didn't work any more. I checked the configuration and the iptables and didn't find anything wrong. I also contacted the VPS vendor and couldn't solve this issue. At last I reinstalled the OS and start everything from scratch. When I went on to the setting of the new server, I found the possible reasons why it didn't work before.

I first set up the PPTPD with the standard step, similar to this link. And I connected to it from my iPhone and it worked. What it didn't work is to open web pages and ping to external IPs didn't work either. I sent an email to the vendor asking them to look into the iptables issues and I went on other settings. I add "ALL:ALL" to /etc/hosts.deny and "sshd:ALL" to /etc/hosts.allow. I changed the pam (/etc/pam.d/system-auth). I also modified iptables in aim of making it more secure. After doing this, I found that it didn't work again. It just looks like what I had encountered. I looked at the log. There was only message when PPTPD started. Nothing else is there. I believed some of the settings failed it. Then I rolled back the settings and it worked. I made the one change at a time and tested if everything is fine. And I found:
1. "ALL:ALL" in /etc/hosts.deny fails PPTPD. I believes it is also the reason why I couldn't connect and had to reinstall the OS. And I added a line "pptpd:ALL" to /etc/hosts.allow and it worked.
2. pam may also failed the authentication. I made the change according to this link. And the iPhone failed to connect to the vpn. I also looked at the log. It showed that PPTPD received requests.
3. the name of network card. The name should be venet0. But I used eth0 in all iptables settings. And that is why I couldn't open web pages or ping to any IPs.

A LLVM Pass to Detect Memory Leak

I started to write this pass at the time when I looked into LLVM. Besides documents, source codes, practice is also important in studying a new thing.

With this pass, I wanted to track the dynamic memory allocation and release and it could be used to detect memory leak. I will just call it MemoryLeakDetector. It is used to analyze a C program and the source code has to be translated to LLVM IR.

The basic idea is to replace the function call to malloc() and free() with __m_malloc() and __m_free(). The new functions allocate and free memory just as malloc() and free(). But they keep track of those allocated memory. So that we can know what memory is not free'd before the program exists. The results are printed to standard output. From the results, we know the address and size of the unallocated memory, and the call stack with  file name and line number where the memory is allocated.

I needed to solve two major problems. First, replace the function calls. Second, get the call stack and the source code information. I'll address them one by one.

1. Replace a function call in LLVM.

I used a ModulePass. This is what is done in runOnModule().

bool is_change = false;
Function *custom_malloc = NULL;
Function *custom_free = NULL;Module::FunctionListType &functions = m.getFunctionList();
vector to_replace_functions;

for (Module::FunctionListType::iterator it = functions.begin(), it_end = functions.end(); it != it_end; ++it) {
    Function &func = *it;
    if (func.getName() == "malloc") {
        FunctionType *ft = func.getFunctionType();
        custom_malloc = Function::Create(ft, func.getLinkage());
        custom_malloc->copyAttributesFrom(&func);
        custom_malloc->setName("__m_malloc");
        to_replace_functions.push_back(&func);
    } else if (func.getName() == "free") {
        FunctionType *ft = func.getFunctionType();
        custom_free = Function::Create(ft, func.getLinkage());
        custom_free->copyAttributesFrom(&func);
        custom_free->setName("__m_free");
        to_replace_functions.push_back(&func);
    } else if (func.getName() == "main") {
         // add the initialize to the main function
         //main function's arguments.
         Function::arg_iterator arg_it = func.arg_begin();
         ++arg_it;
         Argument &argv = *arg_it;// create new function
         Type *new_func_return_type = Type::getVoidTy(m.getContext());
         Type *new_func_arg_type = argv.getType();
         ArrayRef new_func_arg(new_func_arg_type);
         FunctionType *inserting_func_type = FunctionType::get(
             new_func_return_type, new_func_arg, false);
         Twine function_name("__m_set_executable_file");
         Function *inserting_func = Function::Create(inserting_func_type,
         GlobalValue::ExternalLinkage, function_name, &m);

          // insert the new function
          BasicBlock &entryBlock = func.getEntryBlock();
          ArrayRef argv_value(&argv);
          Instruction *new_inst = CallInst::Create(inserting_func, argv_value,"");
          entryBlock.getInstList().push_front(new_inst);
          CallInst *ci = cast(new_inst);
          ci->setCallingConv(func.getCallingConv());
    }
}

for (vector::iterator it = to_replace_functions.begin(), it_end = to_replace_functions.end(); it != it_end; ++it) {
    Function *func = *it;
    Function *replace_func = NULL;

    if (func->getName() == "malloc") {
        replace_func = custom_malloc;
    } else if (func->getName() == "free") {
        replace_func = custom_free;
    }

    if (replace_func != NULL) {
        m.getFunctionList().insert(func, replace_func);

        while (!func->use_empty()) {
            CallSite CS(func->use_back());
            vector args(CS.arg_begin(), CS.arg_end());
            Instruction *call = CS.getInstruction();
            Instruction *new_call = NULL;
            const AttrListPtr &call_attr = CS.getAttributes();

            if (InvokeInst *ii = dyn_cast(call)) {
                new_call = InvokeInst::Create(replace_func,
                    ii->getNormalDest(),
                    ii->getUnwindDest(),
                    args, "", call);

                InvokeInst *inv = cast(new_call);
                inv->setCallingConv(CS.getCallingConv());
                inv->setAttributes(call_attr);
            } else {
                new_call = CallInst::Create
                    (replace_func, args, "", call);
                CallInst *ci = cast(new_call);
                ci->setCallingConv(CS.getCallingConv());
                ci->setAttributes(call_attr);
                
                if (ci->isTailCall())
                    ci->setTailCall();
             }

            new_call->setDebugLoc(call->getDebugLoc());

            if (!call->use_empty()) {
                call->replaceAllUsesWith(new_call);
             }

             new_call->takeName(call);
             call->eraseFromParent();
             is_change = true;
         }
    }
}

return is_change;

First, iterate through all the function definitions and the declarations in the current module. When there is one function with the name "malloc" (it is the same to the function "free"), creates a new declaration with the same function type, same linkage and same attributes but a new name "__m_malloc". Because if there is a function declaration in the current module, there should be at least one call of that function in the current module. Once I have the function to replace, I find out where the function to replace is called (CS) and the instruction to call it (call). Then I create a new instruction to call the new function (new_call) with the arguments to the old function, and set the attribute and the calling convention. Then I replace the instruction (replaceAllUseWith).

This is a pass of LLVM. So it can be used by opt. To use it, compile the C program to .bc file by using clang and then apply this pass to it to get a new .bc file. After that, use clang to link them with the necessary object files or libraries. So the executable file will call __m_malloc() and __m_free() instead of malloc() and free().

2. Memory Management.
After replacing malloc() and free() with __m_malloc() and __m_free(), we can implement our own memory management mechanism. Since my goal is to detect memory leak, I don't do my own management. Instead, I call malloc() and free() in the new functions. But I also do some bookkeeping. After malloc() is called, I put the new allocated memory's address and the call stack into a map. This map maps a memory address to a call stack. The call stack is the necessary information to allow me to get the line numbers. After free() is called, the address is removed from the map, which means that the memory is released. So before the program exists, I just need to iterate through the map to know what memory is not released and where the memory is allocated.

3. Get the call stack.
To get the call stack, I use the library libunwind. It provides API to get the call stack. Here is what I do to get the information. It is similar to the example code.

CallStack *cs = new CallStack;
unw_cursor_t cursor;
unw_context_t uc;
unw_word_t ip, sp;unw_getcontext(&uc);
unw_init_local(&cursor, &uc);

while (unw_step(&cursor) > 0) {
    unw_get_reg(&cursor, UNW_REG_IP, &ip);
    AddressDesc al;
    al.current_pc = ip - 1;
    al.size = size;
    cs->push_back(al);
}

What is really needed is the IP, the program counter. The above code iterates through the call frames and push the IP values to cs. And this cs has the enough information that allows us to find out the call stack where the memory is allocated.

4. Get the line number
From (2) and (3), the non-released memory and the call stack is known before the program exists. But how to convert the call stack to the line number?
There is a library bfd to do this. GDB and addr2line are using it to convert an address to the line number. There is no standalone library, but the functionality is integrated into binutils. I modify the addr2line code so that it works in my code. What addr2line does is to read the binary and its symbol table. For a given address, it goes through sections in the binary and finds the nearest line that matches the address. Here is the code I modify and the necessary header file. It can be compiled as a library.

#include "getopt.h"
// #include "libiberty.h"#include "string.h"
#include
#include
#include
#include
#includeusing namespace custom_memory;
using namespace std;

namespace
{
    /* Options passed to cplus_demangle (in 2nd parameter). */const static unsigned DMGL_NO_OPTS       =  0;              /* For readability... */
    const static unsigned DMGL_PARAMS        = (1 << 0);            /* Include function args */
    const static unsigned DMGL_ANSI          = (1 << 1);            /* Include const, volatile, etc */
    const static unsigned DMGL_JAVA          = (1 << 2);            /* Demangle as Java rather than C++. */
    const static unsigned DMGL_VERBOSE       = (1 << 3);            /* Include implementation details.  */
    const static unsigned DMGL_TYPES         = (1 << 4);            /* Also try to demangle type encodings.  */
    const static unsigned DMGL_RET_POSTFIX   = (1 << 5);            /* Print function return types (when present) after function signature.  It applies only to the
toplevel function type.  */
    const static unsigned DMGL_RET_DROP      = (1 << 6);            /* Suppress printing function return types, even if present.  It applies only to the toplevel
function type. */
    const static unsigned DMGL_AUTO          = (1 << 8);
    const static unsigned DMGL_GNU           = (1 << 9);
    const static unsigned DMGL_LUCID         = (1 << 10);
    const static unsigned DMGL_ARM           = (1 << 11);
    const static unsigned DMGL_HP            = (1 << 12);            /* For the HP aCC compiler; same as ARM except for template arguments, etc. */
    const static unsigned DMGL_EDG           = (1 << 13);
    const static unsigned DMGL_GNU_V3        = (1 << 14);
    const static unsigned DMGL_GNAT          = (1 << 15);/* If none of these are set, use 'current_demangling_style' as the default. */
    const static unsigned DMGL_STYLE_MASK    =
(DMGL_AUTO|DMGL_GNU|DMGL_LUCID|DMGL_ARM|DMGL_HP|DMGL_EDG|DMGL_GNU_V3|DMGL_JAVA|DMGL_GNAT);

    const char *TARGET = "x86_64-unknown-linux-gnu";

    inline size_t get_file_size(const char * file_name)
    {
        struct stat statbuf;

        if (stat(file_name, &statbuf) < 0) {
            if (errno == ENOENT)
                cerr << "No such file: " << file_name << endl;
            else
                cout << "Warning: could not locate " << file_name << ", reason: " << strerror(errno) << endl;
        } else if (!S_ISREG(statbuf.st_mode))
            cout << "Warning: " << file_name << " is not an ordinary file." << endl;
        else if (statbuf.st_size < 0)
            cout << "Warning: " << file_name << " has negative size, probably it is too larget." << endl;
        else
            return statbuf.st_size;

        return static_cast(-1);
    }

    inline void set_default_bfd_target(void)
    {
        /* The macro TARGET is defined by Makefile.  */
        const char *target = TARGET;

        if (!bfd_set_default_target(target))
            cerr << "can't set BFD default target to " << target << " : " << bfd_errmsg(bfd_get_error()) << endl;
    }

} // namespace

struct AddressLine::InternalData
{
    InternalData() :
        syms(NULL),
        abfd(NULL),
        pc(0),
        filename(NULL),
        functionname(NULL),
        found(FALSE),
        target(NULL),
        path_pointer(NULL)
    {
    }

    ~InternalData()
    {
        if (syms != NULL)
            free(syms);

        if (abfd != NULL)
            bfd_close(abfd);
    }

    asymbol ** syms;
    bfd *abfd;
    bfd_vma pc;
    const char *filename;
    const char *functionname;
    unsigned line;
    bfd_boolean found;
    const char *target;
    const char *path_pointer; // pointer to the path's internal data, used in c interface.
    string path;
};

AddressLine::AddressLine() : data(new InternalData)
{
}

AddressLine::~AddressLine() = default;

void AddressLine::set_path(const string &_p)  // public
{
    data->path = _p;
    data->path_pointer = data->path.c_str();
}

bool AddressLine::init()                                  // public
{
    if (get_file_size(data->path_pointer) < 1) {
        cerr << "Failed to get file size.n";
        return false;
    }

    set_default_bfd_target();

    data->abfd = bfd_openr(data->path_pointer, data->target);

    if (data->abfd == NULL)
    {
        cerr << "Failed to open the file: " <path << endl;
        return false;
    }

    /* Decompress sections.  */
    data->abfd->flags |= BFD_DECOMPRESS;

#if 0
        if (bfd_check_format(data->abfd, bfd_archive)) {
            cerr <path << ": cannot get addresses from archive" << endl;
           return false;
        }
#endif

    asection *section;
    char **matching;

    if (!bfd_check_format_matches(data->abfd, bfd_object, &matching)) {
        cout << "Warning: check bfd format. "<< data->abfd << " : " << bfd_errmsg(bfd_get_error());

        if (bfd_get_error() == bfd_error_file_ambiguously_recognized)
        {
            list_matching_formats(matching);
            free(matching);
         }

         return false;
    }

    return slurp_symtab();
}

bool AddressLine::slurp_symtab()
{
    long storage;
    long symcount;
    bfd_boolean dynamic = FALSE;

    if ((bfd_get_file_flags(data->abfd) & HAS_SYMS) == 0)
        return false;

    storage = bfd_get_symtab_upper_bound(data->abfd);

    if (storage == 0) {
        storage = bfd_get_dynamic_symtab_upper_bound(data->abfd);
        dynamic = TRUE;
    }

    if (storage < 0)
    {
        cerr <path << " : " <abfd) << " : " << bfd_errmsg(bfd_get_error()) << endl;
        return false;
    }

    data->syms = (asymbol **) malloc(storage);
    if (dynamic)
        symcount = bfd_canonicalize_dynamic_symtab(data->abfd, data->syms);
    else
        symcount = bfd_canonicalize_symtab(data->abfd, data->syms);

    if (symcount < 0) {
        cerr <path << " : " <abfd) << " : " << bfd_errmsg(bfd_get_error()) << endl;
        return false;
    }

    return true;
}

#define xvec_get_elf_backend_data(xvec) ((const struct elf_backend_data *) (xvec)->backend_data)
#define get_elf_backend_data(abfd) xvec_get_elf_backend_data((abfd)->xvec)
// #define bfd_get_flavour(abfd) ((abfd)->xevc->flavour)

SourceInfo AddressLine::get_source_info(size_t address)   // public
{
    SourceInfo si;
    const struct elf_backend_data * bed;
    data->pc = address;

    if (bfd_get_flavour(data->abfd) == bfd_target_elf_flavour
        && (bed = get_elf_backend_data(data->abfd)) != NULL
        && bed->sign_extend_vma
        && (data->pc & (bfd_vma) 1 <s->arch_size - 1)))
        data->pc |= ((bfd_vma) - 1) <s->arch_size;

    data->found = false;
    bfd_map_over_sections(data->abfd);

    if (data->found) {
        while (1) {
        const char *name;
        char *demangled_name = NULL;

        name = data->functionname;

        if (name == NULL || *name == '')
            name = "??";
            demangled_name = bfd_demangle(data->abfd, name, DMGL_ANSI | DMGL_PARAMS);

            if (demangled_name != NULL)
                name = demangled_name;

            si.function = name;

            if (demangled_name != NULL)
                free(demangled_name);

            si.file = ((data->filename != NULL) ? data->filename : "??");
            si.line = data->line;
            data->found = bfd_find_inliner_info(data->abfd, &data->filename, &data->functionname, &data->line);

            if (!data->found)
                break;
        }
    }

    return si;
}

void AddressLine::bfd_map_over_sections(bfd *abfd)
{
    asection *sect;
    unsigned int i = 0;

    for (sect = abfd->sections; sect != NULL; i++, sect = sect->next)
        find_address_in_section(abfd, sect);

    if (i != abfd->section_count)    /* Debugging */
        abort ();
}

void AddressLine::find_address_in_section(bfd *abfd, asection *section)
{
    bfd_vma vma;
    bfd_size_type size;

    if (data->found)
        return;

    if ((bfd_get_section_flags(abfd, section) & SEC_ALLOC) == 0)
         return;

    vma = bfd_get_section_vma(abfd, section);

    if (data->pc < vma)
         return;

    size = bfd_get_section_size(section);
    if (data->pc >= vma + size)
         return;

    data->found = bfd_find_nearest_line(abfd,
    section, data->syms, data->pc - vma, &data->filename, &data->functionname, &data->line);
}

void AddressLine::list_matching_formats(char **p)
{
    cerr <path << ": Matching formats:";
    while(*p)
        cerr << " " << *p++;
    cerr << endl;
}

I created the header file elfbfd.h. It contains the macros, type declaration and function declaration extracted from binutils for ELF file format. It is a big file and all come from binutils so I don't paste it here.

5. Set the executable file.
When the program exists, the non-released memory is in the map and so is the call stack. By using the IP in the call stack and (4), it is easy to get the file and line number, as well as the function name. But we still need to read the executable file first, that is, the running program itself. So let's look back at the pass (1). In the pass, when it detects there is a function main() in the module, it creates a new function __m_set_executable_file, and then creates a call instruction and adds this instruction to the entry block.  __m_set_executable_file() will eventually call AddressLine::set_path(). __m_set_executable_file() receives the argv[] arguments passed to main and argv[0] is the name of the program. It also needs to get the absolute path of the program and pass this path to AddressLine::set_path().

Final thoughts

The above is the step I did in the project. There are still a lot to be done to make it more useful. There are some limitation though.
1. The pass is applied to .bc file. So one needs to compile the C source code to .bc file, runs opt to apply the pass and links the file. This may be addressed by converting it the a plugin to clang.
2. It only detects memory leak in the code that is compiled and linked by the above steps. If a library is used and is not compiled and linked as such, the memory leak in the library code won't be detected.
3. When linking, it has to link to a new library (2), (3), (4). I think this can be addressed by implementing a plugin to clang.

And note that IP is the address of next instruction. So if you pass IP directly to (4) AddressLine::get_source_info(), you will get the the line number of the next line where the function is called. I found passing IP - 1 works fine, but needs to figure out why. I think this may have something to do with the instruction length. And I don't know how it works in instruction branching.

Lock-free Ring Buffer in C++

In a multi-threaded program, we have to make sure that the shared data are thread safe. Usually, we employ some kind of lock (mutex, semaphore etc) to achieve it. However lock itself is expensive. If the lock is removed and in the same time the data is still thread safe, the performance will be improved a lot. There are lots of researches on lock free data structures. Most of them require the hardware to provide CAS. These methods are intricate in terms of implementation and correctness. They need to deal with such problems as ABA Problem as well.Those methods are for general cases. They can be applied to the situation where there are many threads and each one can have read and write acc/ess to the shared data. But when looking into a special case, and studying its properties, we can find a simpler and more efficient lock free data structure.

The special case is the single producer single consumer. This is a simple scenario and it is not uncommon. In this scenario:

  1. There are only two threads accessing to the shared data.
  2.  One of them only reads the data. The other only modifies the data.
  3. Data is written at one end while it's read at the other end.

A queue is usually used to pass data between threads. But I would like to use the ring buffer. This is because:

  1. A queue is usually implemented by using a list. There is memory allocation for each element added. Memory allocation is expensive and there is a global lock in dynamic allocation in C/C++ (you can search for "lock-free malloc" if you're interested).
  2. A ring buffer is usually implemented by using an array. we can allocate memory in bulk. Thus reduce the times to allocate memory. e.g. instead of allocating 10 elements 10 times in a queue, we can allocate an array with 10 element only once.
  3. We can achieve spatial locality by using an array. It is hard to achieve it by using a queue.

There is a drawback of an array. The size of an array is fixed. So we cannot put more elements into the ring buffer when it is full. But there is a way around it.

First let's look at a simple implementation of a lock-free ring buffer in a single producer single consumer context (refer to this link for more information about ring buffer). Then we will see how it works and why it works. At last, I'll address the limitations and improvements.

This is the implementation. It may be improved later.


class RingBuffer
{
public:
    RingBuffer();
    int read(int *result)
    int write(int element);private:
    unsigned read_index;
    unsigned write_index;
    static const unsigned buffer_size = 256;
    int buffer[buffer_size];
};

RingBuffer::RingBuffer() : read_index(0), write_index(0)
{
}

int RingBuffer::read(int *result)
{
    unsigned local_read = read_index;
    if (local_read == write_index)
    return 0;++local_read;

    if (local_read == buffer_size)
        local_read = 0;*result = buffer[local_read];

    read_index = local_read;return 1;
}

int RingBuffer::write(int element)
{
    unsigned local_write = write_index;++local_write;
    if (local_write == buffer_size)
        local_write = 0;

    if (local_write != read_index) {
        buffer[local_write] = element;
        write_index = local_write;
        return 1;
    }
    return 0;
}

There are only two variables that we're interested in. They are read_index and write_index. The variable buffer is where the shared data will be in. And we need to guarantee the access to the data in buffer is thread safe. That is what read_index and write_index for. Any writes won't go pass read_index and any reads won't go pass write_index either.

Note that in single producer single consumer, there are only one thread reading (called reader thread) and only one thread writing (called writer thread). And the reader thread will only call RingBuffer::read() and writer thread will only call RingBuffer::write(). So the read_index is only changed by the reader thread and write_index is only changed by the writer thread. Also note that they are changed only when the actual read/write is done. So in read(), before read_index is changed, writer thread won't write to any places after read_index. That is also where reader thread will read data.  So the writer thread won't overwrite any data. This is true in writer(). The reader thread won't read from places after write_index.

Let's take an example to see how it works. Say the size of the ring buffer is 256 and write_index = 200 and read_index = 100. The writer thread has already writen to the cell 200, and the reader thread has read data from up to cell 100.

0 ... ... ... 100 ... ... ... 200 ... ... ... 255

The cells in blue are safe to write in and the cells in red are available to read. The reader thread first checks whether read_index (100) is equal to write_index (200). It is not, so it will read the element at 101. If read_index goes to 200, the reader thread won't get any element because read() will just return when read_index == write_index, which means that the reader thread never gets a chance to read from any cells in blue including the cell 200. So either the reader thread will read valid data or it won't read any data at all. This is the same when we analyse the writer thread. The writer thread never overwrites the cells in red, including the cell 100. The writer thread will always write to an cell that is safe to do so.

But why are there local_read and local_write? When the reader thread reads the buffer, the writer thread will probably try to write to it. Since read_index is used to prevent the writer thread from writing to the red cells. It shouldn't be changed during this process until the reader thread has finished reading.  So we use local_read to tell the reader thread where to read the element. Once the reader thread gets the element and it is assigned with the value local_read, the writer thread can immediately write to that cell. It won't corrupt any data because the element at that cell is already read.

That is how read_index and write_index together protect the shared data. But to make it actually to work as we expect, we need to address some technical details. There are two major problems we need to take into consideration: atomic operations and reordering.

  • atomic operations

Both the reader thread and the writer thread can access read_index and write_index. The read/write of them must be atomic operations for this to work properly. Otherwise, one thread may get stale data. On x86/x86_64 integer read/write are atomic if they're properly aligned. Thus we have to check with the compiler how they align integers. And I believe most will do them correctly.

  • reordering
  • Compiler reordering

The compiler may reorder instructions to optimize the program. Let's take a look at this snippet from read():

*result = buffer[local_read];     inst. 1
read_index = local_read;          inst. 2

The compiler may emit inst. 2 before inst. 1. If so, read_index is changed before the reader thread reads the element.  If the writer thread happens to run, can it write to read_index? NO. We don't need to worry compiler reordering here.
But it is not OK in write().

buffer[local_write] = element;     inst. 3
write_index = local_write;         inst. 4 

If inst. 4 is moved above inst. 3. Write_index is updated before the writer thread writes to buffer[local_write]. That will allow the reader thread to read from buffer[local_write], which will read stale data. To deal with it, I make local_write volatile to have the compiler not to optimize reads/writes of local_write and make sure inst3. is executed before inst. 4. But don't need to make read_index or write_index volatile. (refer to this and this if you're interested in volatile.)

We still assume it is on x86/x86_64. The CPU won't move reads around other reads, writes around other writes or move writes before reads. It only moves reads ahead of writes. Let's look at read(). It reads read_index to local_read, modifies local_read, reads the element at local_read to the result, and then modifies read_index. The update of read_index is not executed before the element at local_read is written to result. Even if modification of read_index runs before reading the element in local_read due to compiler reordering, we know that the writer thread won't write to read_index. The data is still safe.

There are some limitations to this implementation.

  1. We assume that atomic operations and reordering on x86/x86_64. It is not necessarily true on other platforms.
  2. It only reads/writes one element at once. It would be better to have such interfaces int read(int *array, int max_size) and int write(int *array, int max_size).
  3. The array won't grow. This implementation of ring buffer won't hold more than 255 elements. But what if the writer thread writes more than that but reader thread cannot catch up? We need to think of a way to "expand" the ring buffer. The idea comes from the possible implementation of deque in C++. But instead of vector of vectors, a list of such ring buffer is used. If the current ring buffer is full, we can append a new ring buffer to the list and the writer thread writes to the new one. That's it. Again, we need a lock-free list. And when a ring buffer is empty after reading, we need to release them if they are dynamic allocated. There are two articles regarding lock-free queue.  http://www.drdobbs.com/parallel/208801974 and http://www.drdobbs.com/parallel/210604448?pgno=1. I believe you can implement it and you won't worry about the capacity of a single ring buffer.

Javascript Paging

I wanted to have my own home page that would display all I published on Twitter and other blogger services. So I learned Javascript and wrote one myself. It was easy to have the content displayed on the web page. It displays the latest tweet followed by a blog article. When you click on the right margin, the next article will appear. Similar to that, when you click on the left margin, the previous article will display. There was a problem though. When the article happens to be long, it would take much space from top to bottom. Then I wanted to add paging and the area to display the article is fixed. If the article cannot fit into that area, it only displays what it can display and the remaining parts are hidden. When the right margin is clicked on, the next part is going to display.I spent a lot of time looking for a way for paging. There are many methods and plugins to do this. But they seem not so useful in my case. They provide simple and convenient methods to do the paging. Some assume that every content for each page is already in the page (technically speaking, in DOM). When paging occurs, they just display the next element and hide the current one. Others retrieve new content when paging occurs and replace the old content. But in my case, I have the whole article and need to implement paging on it. That means I also need to "divide" an article into parts.

I found this blog Javascript Dynamic Paging (MooTools Pager). I wished I can just copy and paste. But I was using JQuery framework. However, after reading some lines of code, I guessed I could implement it in JQuery. That's why I couldn't copy & paste and I even didn't bother to read the blog at the first time.

Here is the idea to do the paging.

Let's say the blue box (see below) is where I want to display the article.

Without paging, the article would look like:

Line 1
Line 2
Line 3
Line 4
Line 5
Line 6

 

 

With paging, it will be:

Line 1
Line 2
Line 3
Line 4
Line 5
Line 6

"Line 6" is hidden. Here is the code for the above example.


<div id="article" style="border-color: blue; border-style: solid; border-with: 1px; height: 100px; overflow: hidden; width: 100px;">
Line 1<br />
Line 2<br />
Line 3<br />
Line 4<br />
Line 5<br />
Line 6</div>

Then, we need to add a new <div id="slider"> around the article and inside the <div id="article">.
And when paging occurs, we just need to slide the <div id="slider"> up or down. And also have <div id="article"> hide any content outside its bounds, as shown in the following:

Line 1
Line 2
Line 3
Line 4
Line 5
Line 6

To go to the next page, we need to move <div id="slider"> up by the height of <div id="article">. After that, it will display:

Line 1
Line 2
Line 3
Line 4
Line 5
Line 6

In the above example, the content is statically set in HTML. But in my web site, the content is retrieved from the server. I use AJAX here. After the content is returned, I create a <div> element and insert it in to <div id="article">. Then everything goes well until we come to the end of the article. If we keep moving the inner <div> up, it will display nothing if we've already displayed the last part of the article. If we go from the above example to the next page, we get

Line 1
Line 2
Line 3
Line 4
Line 5
Line 6

The solution is to compare how far the <div id="slider"> moves up and the height of the article. If the distance it moves is larger than the height of the article, then it has displayed the last part of the article. So far, the problem becomes finding the height of the article, or the inner <div>. I tried it many times to find it very tricky. We don't get the size of a <div> unless it is in a DOM. What we need to do is to add <div id="slider"> to <div id="article">. After that, We can get the height of the <div id="slider">, that is, the height of the article.

To put them together, I've managed to implement a Javascript paging method. Take a look at kceiw.me. You can find the source. It is without minification.

Vim plugins for C++ development

The plugins can be found from http://www.vim.org/scripts

1. Tagbar
It displays tags of the current file in the sidebar. You don't need to create a
tag file by using ctags. These tags include macros, namespaces, classes, and
function signatures, etc.
Personally speaking, It does better than taglist.

2. CCTree
It is used to display the calling tree of C code. It needs cscope file

3. clewn
This plugin helps you interact with GDB. Once you have to debug your program,
it is very helpful. You can see the source code in your vim buffer, send
command to GDB and see the result from another vim buffer too.

4. clang_complete
It helps to complete C/C++/Objective-C code in your development. Besides that,
it also displays warnings and errors regarding your code.

5. refactor
It provides some basic functionalities to do refactoring for C/C++.

6. stlrefvim
View STL reference within VIM.

7. CRefVim
View C reference within VIM.

How to Crash a Macbook

I found a rather easy way to crash a Mac.

First, You need to have two or more partitions.
Second, delete the directory /var/vm.
Third, create a symbolic link. ln -s /The/Other/partition/directory
/var/vm. /The/Other/partition/directory is at another partition other
than the one where /var is at.

Then you continue to use your Mac for some time, you will find at least
one program crashes. Then you may decide to reboot your mac to see
whether the program will become normal, you must find that your mac
could not start again.

Don't be panic. You can solve it. Use your Mac installation disc. But
you don't need to reinstall your system. You can open a terminal, delete
the symbolic link /var/vm and create a new directory /var/vm. Then you
will find your crazy mac become normal.
Why?
I think this should be a bug in Mac.
/var/vm is the directory where the swap files are at. Mac seems to use
this directory before it mounts the other partition. So if /var/vm is a
symbolic link to other partition, Mac could not use /var/vm at all when
it starts, because other partitions are not available yet.

And the swap files can become very large. They may use up the all the
disk space. They will make the system rather slow.

How GDB implements break command

When you want to add a breakpoint, you usually use the command "break". GDB will first parse the command, and find the address, as described here. But what's the next?

GDB will create a breakpoint after it parses the command and finds the address and the symbol table . The breakpoint is represented by the type struct breakpoint (see create_breakpoint). At this point, GDB only creates the structure and adds it to the collection of breakpoints. It does not really insert them. The insertion only occurs before the inferior starts to run. For example, when the function proceed is called, which will make the inferior run, it will call insert_breakpoints to insert the breakpoints in the collection of breakpoints.
When a breakpoint is hit, a trap, a signal or whatever will cause the inferior to stop. At this point, GDB will take over of it and handle the event first. It calls the function handle_inferior_event. It will check the reason why the inferior stops. And it will detect whether there is a breakpoint hit. Remember you can add conditions to breakpoints. If the conditions is false, even if there is a breakpoint hit, GDB won't do anything. GDB stops the inferior only when a breakpoint is hit and the conditions are true.

How Does GDB Parse Breakpoint Location

When you use GDB, you can add breakpoints, giving it some arguments. For example, you can specify the location by giving line numbers, function names, variable names, etc. you can also add conditions. When the condition is met, GDB will pause the inferior when the breakpoint is hit. But how does GDB find out the address?

Let's look at the file gdb/breakpoint.c. It implements the command "break" in GDB. Actually, the function break_command_really does parse the command arguments, converts the arguments to address, and adds the breakpoint.

It first calls do_captured_parse_breakpoint, which will eventually calls parse_breakpoint_sals to parse the argument of the command break and return the address specification in the form of the type symtabs_and_lines.

symtabs_and_lines is a struct. It represents a sequences of symtab_and_line. symtab_and_line is a struct containing the information of a symbol table, a line number, the pc, and etc.

The function parse_breakpoint_sals will find out the address, the line number, the symbol table, and the section of the given location, and store them in the variable sal which is of type struct symbol_and_line. What it does is dependent on whether the location is specified.

It is quite simple if the location is not specified. That is, the command is "break" but "break 47". The function just simply assigns the defaults to sal. These defaults are stored in the variables default_breakpoint_pc, default_breakpoint_line, default_breakpoint_symtab. These variables are set by the function print_stack_frame.

In the other case where the location is given, it will call decode_line_1 to parse the location. Considering the location can be an address, a line number, a function, etc, the function decode_line_1 uses different methods to parse them.
For the address, it calls decode_indirect.
If you specify an Object-C method or a compound method without a file name ("A::B::func", "[object func]", etc), it will search the symbol table for the symbol of the function.
If you specify a file name (e.g. "file.c:47"), it first reads the symbol table of that file. Then it calls decode_all_digits if the location is a number (e.g. a line number) or decode_variable if the location is a name (a function or a variable).

The function decode_variable merely looks up the symbol of the variable in the symbol table. GDB uses the struct symbol to represent a function, and a variable, etc. Each symbol has an address. A function address, unlike a local variable address, is known when the when the program is loaded. So function decode_variable can extract the address from the symbol easily.

The function decode_all_digits will calculate the line number (if it begins with "+" or "-", meaning relative to the current line number) and finds out the symbol table which will be used to calculate the address. It also sets the flag explicit_line to 1. And when to find the address of the given line number? it is finished in the function breakpoint_sals_to_pc.

Also, GDB will create a canonical string, which is similar to "file.c:123".

When all these functions return, break_command_really will call breakpoint_sals_to_pc, which will call sals_to_pc, to extract information from the struct symtab_and_line and return the address.

What sals_to_pc does is to read each item in the symbol table and to compare the line of the item with the given line. And it will find the exact match item or the approximate one. There is a line table in each symbol table. It is a mapping of a line number to an address. GDB uses this technique to find out the address of a given line number.

Once the address is found, the function sals_to_pc will find the section in which this line is.