
The Annotated The Annotated Transformer

Thanks to the articles I list at the end of this post, I understand how transformers work. These posts are comprehensive, but there are some points that confused me.

First, this is the diagram that is referenced by almost all of the posts related to the Transformer.

The Transformer consists of these parts: Input, Encoder × N, Output Input (the shifted outputs fed to the decoder), Decoder × N, and Output. I'll explain them step by step.

Input

Each input word is mapped to a 512-dimensional vector. Then a Positional Encoding (PE) is generated and added to the original embedding.
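Here is a minimal sketch of the sinusoidal positional encoding from the paper, added to a batch of word embeddings. The function name and the toy shapes are my own; only the sin/cos formula comes from the paper.

import torch

def sinusoidal_positional_encoding(max_len, d_model=512):
    # pe[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # pe[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)           # (max_len, 1)
    div_term = torch.pow(10000.0, torch.arange(0, d_model, 2).float() / d_model)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position / div_term)
    pe[:, 1::2] = torch.cos(position / div_term)
    return pe                                                                  # (max_len, d_model)

# add PE to word embeddings of shape (batch, seq_len, 512)
embeddings = torch.randn(2, 10, 512)
inputs = embeddings + sinusoidal_positional_encoding(10).unsqueeze(0)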

Different types of Attention

\(h_i\) are the source hidden states and \(s_t\) is the target hidden state at step \(t\); each is a column vector of the hidden size. \(c_t\) is the final context vector, and \(\alpha_{t,i}\) is the alignment score.

\[\begin{aligned} c_t&=\sum_{i=1}^n \alpha_{t,i}h_i \\ \alpha_{t,i}&= \frac{\exp(score(s_t,h_i))}{\sum_{i'=1}^n \exp(score(s_t,h_{i'}))} \end{aligned}\]
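The equations map directly to a few lines of PyTorch. This is just a sketch using a dot-product score function (the score can also be the general or concat form); the toy sizes are made up.

import torch
import torch.nn.functional as F

n, d = 6, 512                    # n source positions, hidden size d
h = torch.randn(n, d)            # source hidden states h_1 .. h_n
s_t = torch.randn(d)             # target hidden state at step t

scores = h @ s_t                           # score(s_t, h_i), here a dot product, shape (n,)
alpha = F.softmax(scores, dim=0)           # alignment scores alpha_{t,i}
c_t = (alpha.unsqueeze(1) * h).sum(dim=0)  # context vector c_t, shape (d,)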

Global (Soft) vs. Local (Hard)

Global attention takes all source hidden states into account, while local attention only uses part of them.

Content-based vs. Location-based

Content-based attention uses both the source hidden states and the target hidden state, while location-based attention only uses the target hidden state.

Torchtext snippets

Load separate files

The data.Field parameters are documented here.

When calling build_vocab, torchtext adds <unk> to the vocabulary list. Set unk_token=None if you want to remove it. If sequential=True (the default), it also adds <pad> to the vocab. <unk> and <pad> are added at the beginning of the vocabulary list by default.

LabelField is similar to Field, but it sets sequential=False, unk_token=None and is_target=True.
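As a quick sanity check of these defaults (a toy sketch with made-up tokens, using the same legacy torchtext API as the snippets below):

from torchtext import data

# default Field: <unk> and <pad> are prepended to the vocabulary
TEXT = data.Field(lower=True)
TEXT.build_vocab([['hello', 'world'], ['hello', 'there']])
print(TEXT.vocab.itos[:2])   # ['<unk>', '<pad>']

# with unk_token=None, only <pad> is added
NO_UNK = data.Field(lower=True, unk_token=None)
NO_UNK.build_vocab([['hello', 'world']])
print(NO_UNK.vocab.itos[:1]) # ['<pad>']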

from torchtext import data

INPUT = data.Field(lower=True, batch_first=True)
TAG = data.LabelField()

# base_dir is a pathlib.Path pointing at the data directory;
# the first (unnamed) column of each file is skipped via (None, None)
train, val, test = data.TabularDataset.splits(path=base_dir.as_posix(), train='train_data.csv',
                                              validation='val_data.csv', test='test_data.csv',
                                              format='tsv',
                                              fields=[(None, None), ('input', INPUT), ('tag', TAG)])
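A quick way to check what was loaded (the attribute names below come from the fields declared above):

print(len(train), len(val), len(test))
example = train.examples[0]
print(example.input)   # tokenized text of the first training example
print(example.tag)     # its label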

Load single file

# TEXT and CATEGORY are a Field and a LabelField defined like INPUT and TAG above
all_data = data.TabularDataset(path=base_dir / 'gossip_train_data.csv',
                               format='tsv',
                               fields=[('text', TEXT), ('category', CATEGORY)])
# note: a ratio list is interpreted as [train, test, valid],
# and split() returns the datasets in (train, valid, test) order
train, val, test = all_data.split([0.7, 0.2, 0.1])

Create iterator

train_iter, val_iter, test_iter = data.BucketIterator.splits(
    (train, val, test), batch_sizes=(32, 256, 256), shuffle=True,
    sort_key=lambda x: len(x.input))  # bucket examples of similar length together
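Once the vocabularies are built (next snippet), each batch exposes the fields as attributes; a rough sketch of consuming the iterator:

for batch in train_iter:
    texts = batch.input   # LongTensor of token indices, (batch, seq_len) because batch_first=True
    labels = batch.tag    # LongTensor of label indices, (batch,)
    # feed texts and labels to the model here
    break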

Load pretrained vectors

from torchtext.vocab import Vectors

# cc.zh.300.vec: fastText Chinese vectors, cached in the current directory
vectors = Vectors(name='cc.zh.300.vec', cache='./')

INPUT.build_vocab(train, vectors=vectors)
TAG.build_vocab(train, val, test)
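The pretrained weights end up in INPUT.vocab.vectors and are typically copied into an embedding layer; a sketch:

import torch.nn as nn

vocab = INPUT.vocab
print(vocab.vectors.shape)   # (len(vocab), 300) for cc.zh.300.vec

embedding = nn.Embedding(len(vocab), vocab.vectors.size(1))
embedding.weight.data.copy_(vocab.vectors)
# or: embedding = nn.Embedding.from_pretrained(vocab.vectors, freeze=False)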

Check vocab sizes

You can view the vocabulary index with vocab.itos (index → token); vocab.stoi gives the reverse mapping.
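For example, a short sketch of checking the sizes and the first few entries:

print(len(INPUT.vocab), len(TAG.vocab))   # vocabulary sizes
print(INPUT.vocab.itos[:10])              # specials first, then the most frequent tokens
print(TAG.vocab.stoi)                     # label -> index mapping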

Build Your Own Tiny Tiny RSS Service

After Inoreader changed its free plan, limiting the maximum number of subscriptions to 150, I began to look for an alternative. Finally, I found Tiny Tiny RSS. It has a nice web interface and a Fever API plugin that is supported by most RSS reader apps, so you can read RSS on all of your devices.

This post will tell you how to deploy it on your server.

Prerequisite

You need to install Docker and Docker Compose before using the docker-compose.yml.

Preview LaTeX in Org Mode with Emacs on macOS

Using the right Emacs Version

I failed to preview LaTeX with emacs-plus. If you have installed d12frosted/emacs-plus, uninstall it and use emacs-mac.

brew tap railwaycat/emacsmacport
brew install emacs-mac

If you like the fancy spacemacs icon, install it with cask: brew cask install emacs-mac-spacemacs-icon

Install TeX

  • Download and install BasicTeX.pkg here.
  • Add /Library/TeX/texbin to PATH.
  • Install dvisvgm with sudo tlmgr update --self && sudo tlmgr install dvisvgm collection-fontsrecommended

Emacs settings

  • Add TeX related bin to path: (setenv "PATH" (concat (getenv "PATH") ":/Library/TeX/texbin"))
  • Tell Org Mode to create svg images: (setq org-latex-create-formula-image-program 'dvisvgm)

Now you can see the rendered LaTeX equation by calling org-preview-latex-fragment or using the shortcut ,Tx.

Using Dueling DQN to Play Flappy Bird

PyTorch provides a simple DQN implementation to solve the CartPole game. However, the code is flawed: it diverges after training (this has been discussed here).

The official code's training curve is shown below; its high score is about 50, and it eventually diverges.

There are many reasons that lead to divergence.

First, the tutorial uses the difference between two frames as input. Not only does this lose the cart's absolute position (which is useful information, since the game terminates if the cart moves too far from the centre), it also confuses the agent when the difference is the same but the underlying states are different.
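One common alternative (my sketch, not necessarily the exact fix used later in this post) is to stack the last few raw frames instead of taking differences, so the absolute position stays in the input:

import numpy as np
from collections import deque

class FrameStack:
    """Keep the last k frames and stack them along a leading axis,
    so the network sees absolute positions instead of frame differences."""
    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        for _ in range(self.k):
            self.frames.append(frame)
        return np.stack(self.frames, axis=0)   # (k, H, W)

    def step(self, frame):
        self.frames.append(frame)
        return np.stack(self.frames, axis=0)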