Pages

Tuesday, January 24, 2012

Blog is Moving

My blog will be moved to its new address www.yang-cs.com. This site will remain but will not be maintained.

Thanks a lot to Goodchong's kind help and technical support, and thanks also to your kind attention.

Sunday, January 22, 2012

Emacs: Fonts, Colors and Themes

GNU Emacs is an extensible, customizable text editor--and more. It has various modes and features, making it a powerful editor for different purposes, some of which may be found in its project homepage. A thorough introduction to Emacs is provided by the book Learning GNU Emacs.

Figure 1. Emacs screenshot.

There are several features particularly popular among programmers: different modes for different languages, syntax highlighting, auto-completion, etc. Moreover, it is highly customizable, making it possible for us to extend it and make our own versions that meet our personal needs. This post will touch upon a few of these, focusing on how you can make your emacs look better.

Changing Fonts

Emacs has a command for specifying fonts:
M-x set-default-font
After hitting Enter, you'll be asked to provide a font name. You don't have to remember your favorite font, though, because you can see the full list of available fonts by taking advantage of Emacs' auto-completion feature - just try typing a Tab, then you'll simply choose whatever you like from the resulting list:


Figure 2. Fonts list.

You can make the change permanent by adding the following line to your .emacs file: (Make sure that you have substituted the correct font name.)
(set-default-font "-unknown-DejaVu Sans Mono-normal-normal-normal-*-*-*-*-*-m-0-iso10646-1")

You can also change the font size by adding another line to your .emacs file:
(set-face-attribute 'default nil :height 105)
.
where you specify the font size by its height.

Installing New Fonts

You're never confined to the default system fonts; you're free to add whatever fonts you like to your system and use them in Emacs as discussed above. Some of the fonts you may like can be found in the article Top 10 Programming Fonts.

The procedure of installing a new font is described in this tutorial. Here is a step-by-step example of installing a new font DejaVu Sans Mono:

    1. Download the font here;
    2. Unzip the tarball:
tar -jxf dejavu-fonts-ttf-2.33.tar.bz2
    3. Copy the *.ttf files to directory /usr/share/fonts:
sudo cp dejavu-fonts-ttf-2.33/ttf/ttf.* /usr/share/fonts
    4. Update the font cache:
sudo fc-cache -f -v

Then you're done! You should now be able to find and use your new font "-unknown-DejaVu Sans Mono-normal-normal-normal-*-*-*-*-*-m-0-iso10646-1" by following the procedure described in the previous section.

Changing Colors and Themes

To manage Emacs color themes, try color-theme, an emacs-lisp mode for skinning your emacs. You may download the tarball or use your package manager to install this package:
sudo apt-get install emacs-goodies-el
Then you can test the color themes in Emacs:
M-x color-theme-name
where name is the name of the theme you'd like to try. Note that, as is the case when you try the fonts, you may use TAB to list all available color themes and choose one you like. Once you've decided which theme to use, add the following lines of code to your .emacs file to make the modification permanent:
(require 'color-theme)
(color-theme-initialize)
(color-theme-name)
The next time you start up Emacs, you're ready to view your new color theme!

Designing Your Own Color Theme

It may seem not so desirable to spend too much time designing a brand-new color theme; yet more often is the case when you're comfortable with a particular theme, but feeling like to add some minor changes here or there to make it perfect. I started searching for such technique when I'd like keywords to be bold and comments to appear italic, while most of the themes I like don't achieve this.

To modify an existing color theme, locate it in the color-theme-library.el file (which should be found in the directory where you have installed the color-theme package). The color theme is essentially a lisp function, as shown in the following code snippet:
(defun color-theme-yang ()
  (interactive)
  (color-theme-install
   '(color-theme-yang
     ((background-color . "#ffffff")
     (background-mode . light)
     (border-color . "#969696")
     (cursor-color . "#000000")
     (foreground-color . "#000000")
     (mouse-color . "black"))
     (fringe ((t (:background "#ffffff"))))
     (mode-line ((t (:foreground "#000000" :background "#e5e5e5" :box (:line-width 1)))))
     (mode-line-inactive ((t (:foreground "#000000" :background "#e0e0e0 ":box (:line-width 1)))))
     (region ((t (:background "#dedede"))))
     (font-lock-builtin-face ((t (:foreground "#a73985" :bold t))))
     (font-lock-comment-face ((t (:foreground "#ff0000" :italic t))))
     (font-lock-function-name-face ((t (:foreground "#293cff"))))
     (font-lock-keyword-face ((t (:foreground "#ff9a00" :bold t))))
     (font-lock-string-face ((t (:foreground "#ff24f8"))))
     (font-lock-type-face ((t (:foreground"#293cff" :bold t))))
     (font-lock-constant-face ((t (:foreground "#ff00ff"))))
     (font-lock-variable-name-face ((t (:foreground "#12800f"))))
     (minibuffer-prompt ((t (:foreground "#7299ff" :bold t))))
     (font-lock-warning-face ((t (:foreground "red" :bold t))))
     )))
(provide 'color-theme-yang)
This means you have to modify the lisp code to get the desired effect. Fortunately, the code is quite straightforward. For example, if you like another keyword color, you just have to find the line
(font-lock-keyword-face ((t (:foreground "#ff9a00" :bold t))))
and change the color code #ff9a00 to whatever you like. Also, you may have noticed that if you want the keyword to be bold, you have to specify ":bold t" in the corresponding line; similarly, you can say ":italic t" if you'd like the relevant element to appear italic. Likewise, You may change properties of other programming elements such as class name, function name, string/number literal, etc. These elements together form a color theme.

After you're done, you have a new function. You're free to rename it, as I did here, which results in a new color theme. You can put the code in your color-theme-library.el file, .emacs file or wherever in your Emacs load path and use it the same way as discussed in the previous section.

You may also be interested in Emacs Color-theme Creator, a visual tool to design or modify an Emacs color theme. It provides two initial themes where you start, or you can provide an existing color theme (by copying the code in the color-theme-library.el file) to the text-box located at the bottom-right corner of the page. After you're satisfied with your new design, click the "Generate config file" above the text-box I've just mentioned, copy the resulting code to your color-theme-library.el file or .emacs file, follow the same procedure as described above, and you're done! Congratulations!

Finally, Figure 3 is what color-theme-yang looks like:

Figure 3. Demonstration of color-theme-yang.

Enjoy customizing your Emacs!

Sunday, December 18, 2011

A First Look at Data Science

I have always been wondering how Google succeeds in achieving so many amazing things: guessing (what we think), presenting (what we want), memorizing (what we need), etc. I keep trying to learn whatever I think I need to understand the internal of Google (for example, you may find a book called Google's PageRank and Beyond: The Science of Search Engine Rankings). Machine Learning, wealthy of statistical methods for 'guessing', was my first guess of what led to Google's success. This guess is correct (the correctness will be discussed later), but not sufficient, though, since we also need actual technology, in addition to mere theory, to realize our thoughts. Such technology like database systems lies in traditional Computer Science.

Maybe even that is not sufficient. New technology evolves every day, and it's hard to encapsulate all of them in a single terminology. But thanks to my advisor, Prof. Daisy Zhe Wang, who brought me to the field of Data Science, which, as far as I'm concerned, is a more satisfactory answer. (Note: I'm not saying that what Google does is data science. Rather, I mean that data science has contributed a lot to Google). As a first look at data science, you may look at the following figures, which are obtained from the articles The Rise of Data Science and Rise of the Data Scientist.


Figure 1. Skill sets for data science


Figure 2. Overview of data science

Now you should know why I consider 'data science' as a satisfactory answer: there is enough theory (math, statistics, machine learning, etc), and it specifies what we need to turn these theories into real-life products (hacking skills, substantive expertise). I'm also very happy to see a field that can somewhat summarize what I wish and need to learn, as I discussed in the first paragraph.

Let's take a brief tour of data science following the path of Figure 2. Along the way I'll discuss what I think may be necessary to learn, or suggest reading materials in the specific field.

Computer Science. Computer science provides ways to store and retrieve data efficiently. Relevant fields may be database systems, data structures & algorithms and so on. Machine Learning may be put here as well, or be viewed as a joint area with statistics as shown in Figure 1. You can find some good computer science books in Steven Pigeon's blog (French), Qunfeng's homepage or by asking Google. Furthermore, CS emphasizes programming skills, which should be a major plus in researching in data science: machines are better at dealing with mechanical tasks, such as processing large-scale data sets, than humans. An example set of skills used by data scientists is shown in Figure 3, from EMC and Big Data: 12 Key Findings From the Data Science Study.

Figure 3. Example set of skills used by data scientists

Math, Statistics and Data Mining. These theories provide fundamental methods to analyze the data you have (how to find patterns, how to make decisions basing on data, etc). There are a lot of excellent resources for these topics, some of which may be found at Kurt's pageWeipeng Liu's Blog (Chinese) and many more on web. (Motivated by this, I'm planning to pursue a master's degree in statistics along with the Ph.D. in computer science).

Graphic Design. Researchers should be able to show their results. To succeed in this, a good way may be data visualization. Data visualization is eye-catching and even fun: try to visualize your Google+ social network! Moreover, it also does good to your research: you can catch some essential trends of your data at the first glance of their visualization, which is convenient for later analysis. When considering data visualization, you should be able to determine what and how to visualize. There are several good books on these topics, some of which can be found at Kurt's page, and one more book I'd like to recommend is Visualize This, focusing on how you can visualize your data (using R, Adobe Illustrator, python, etc).

Infovis and Human-Computer Interaction (HCI). Deals with interaction. (Sorry, this is a field I'm not familiar with so currently have nothing to say. I'll add contents here after enough research. )

So that's all for our very first tour of data science. Thank you for staying with me so far. For me, it is fun and pleasing to have such an opportunity to research in these relevant areas. I'm hoping to gain more insight into how Google is implemented in following years. If you're also interested in this and like to discuss, feel free to contact me. :-)

Thursday, December 15, 2011

The Origin of Red-Black Trees

In Introduction to Algorithms, a red-black tree is defined as a binary tree that satisfies the following red-black properties:

    1. Every node is either red or black.
    2. The root is black.
    3. Every leaf (NIL) is black.
    4. If a node is red, then both its children are black.
    5. For each node, all simple paths from the node to descendant leaves contain the same number of black node.
Starting from this definition, we can prove some very nice properties of red-black trees. The most important property is that a red-black tree with $n$ internal nodes has height at most 2log( $n+1$). Given this, we say that red-black trees are approximately balanced.

Another kind of balanced binary tree is the AVL tree. For an AVL tree with $n$ nodes, the height is strictly less than 1.44log( $n+2$ )-1. Though it seems better than the red-black trees, the delete operation for AVL tree may require $O(\log n)$ rotations, which is inconvenient for many applications, while the red-black trees require at most 3. So in practice, people prefer red-black trees more often.

The idea for AVL trees is straightforward. We first try to insert elements and check whether the resulting tree is height-balanced. If it's not, we re-adjust the structure of the tree to make it balanced. The technique we use is referred to as rotations. In the case of red-black trees, however, things do not seem so obvious: it is quite strange that people ever came up with the idea to use colors to control the height of a tree.

To gain a better understanding of red-black trees, especially how the idea of colors came about, we may want to have a look at the origin of red-black trees, which are essentially B-trees of order 4, or 2-3-4 tree, a 4-ary search tree where each node can have a degree of 2, 3, or 4. Given this definition, the red-black tree is in fact a binary tree representation of the B-tree of order 4: every B-tree node can be transformed, or expanded, to a set of black and red nodes as shown below:

Figure 1. Transforming a degree 4 node into red-black nodes

Figure 2. Transforming a degree 3 node into red-black nodes (two possible transformations)

Figure 3. Transforming a degree 2 node into red-black nodes

Note that in all these transformations, there should be no two successive red nodes because each node in the original B-tree is black in the corresponding red-black tree, and produces at most one level of red node(s). Moreover, since the height of a B-tree of $n$ nodes is $O(\log n)$, the resulting red-black tree has height $O(2\log n)=O(\log n)$.

Thus, we can use a binary search tree to represent a B-tree of order 4, which is actually how red-black trees are invented. Knowing this, we should no longer feel the definition of red-black trees so obscure.

Thursday, September 22, 2011

The Three Prisoners Problem

This is an interesting topic, but the answer is not quite straightforward. The problem is stated as following:

Three prisoners. Three prisoners A, B, and C, are on death row. The governor decides to pardon one of the three and chooses at random the prisoner to pardon. He informs the warden of his choice but requests that the name be kept secret for a few days.

The next day, A tries to get the warden to tell him who had been pardoned. The warden refuses. A then asks which of B or C will be executed. The warden thinks for a while, then tells A that B is to be executed.

Warden's reasoning: Each prisoner has a 1/3 chance of being pardoned. Clearly, either B or C must be executed, so I have given A no information about whether A will be pardoned.

A's reasoning: Given that B will be executed, then either A or C will be pardoned. My chance of being pardoned has risen to 1/2.

Actually the warden's reasoning is correct. If we define A, B, C to be the events that A, B, or C will be pardoned, and W to be the event that the warden says B will die, then we can calculate P(A|W)=1/3. (The details of this calculation can be found in Statistical Inference 2ed, Section 1.3). But why is that intuitively? The warden, at first glance, seems to have given A some information about the government's decision. Where's that info gone?

To answer this question, first notice the difference between the events that the warden says B will die, and B will die. In addition to telling A that B will be executed, what the warden says in fact contains extra information: If A is to be pardoned, the warden may either tell A that B or C is going to be executed; If A is to be executed, the warden has no choice but to tell A that B is to be executed (since C is definitely pardoned in this case). So why does the warden tell A that B will die? It's more probably because that A is to be executed. Therefore, though it seems that the probability that A is pardoned seems to have risen to 1/2, the extra information, coming from the warden's choice of telling A which one of B and C will die, decreases the probability back to 1/3.

Here is a variant of the Three Prisoners Problem.
There are 3 doors, behind one of which is there a car. A contestant is asked to pick a door randomly so that he has a chance to win the car. The contestant picks door 1. Then he's shown that there's nothing behind door 3. Now should the contestant switch to door 2 to maximize the chance of winning a car? (Answer: yes)