Skip to main content

Posts

2020


Program Crash Caused by CPU Instruction

·3 mins

It’s inevitable to dealing with bugs in coding career. The main part of coding are implementing new features, fixing bugs and improving performance. For me, there are two kinds of bugs that is difficult to tackle: those are hard to reproduce, and those occur in code not wrote by you.

C-m, RET and Return Key in Emacs

·2 mins

I use Emacs to write blog. In the recent update, I found M-RET no longer behave as leader key in org mode, but behave as org-meta-return. And even more strange is that in other mode, it behave as leader key. And M-RET also works in terminal in org mode. In GUI, pressing C-M-m can trigger leader key.

Import custom package or module in PySpark

·1 min

First zip all of the dependencies into zip file like this. Then you can use one of the following methods to import it.

|-- kk.zip
|   |-- kk.py

Using –py-files in spark-submit #

When submit spark job, add --py-files=kk.zip parameter. kk.zip will be distributed with the main scrip file, and kk.zip will be inserted at the beginning of PATH environment variable.

Time boundary in InfluxDB Group by Time Statement

·4 mins

These days I use InfluxDB to save some time series data. I love these features it provides:

High Performance #

According to to it’s hardware guide, a single node will support more than 750k point write per second, 100 moderate queries per second and 10M series cardinality.

C3 Linearization and Python MRO(Method Resolution Order)

·3 mins

Python supports multiple inheritance, its class can be derived from more than one base classes. If the specified attribute or methods was not found in current class, how to decide the search sequence from superclasses? In simple scenario, we know left-to right, bottom to up. But when the inheritance hierarchy become complicated, it’s not easy to answer by intuition.

2019


Difference between Value and Pointer variable in Defer in Go

·3 mins

defer is a useful function to do cleanup, as it will execute in LIFO order before the surrounding function returns. If you don’t know how it works, sometimes the execution result may confuse you.

How it Works and Why Value or Pointer Receiver Matters #

I found an interesting code on Stack Overflow:

Near-duplicate with SimHash

·4 mins

Before talking about SimHash, let’s review some other methods which can also identify duplication.

Longest Common Subsequence(LCS) #

This is the algorithm used by diff command. It is also edit distance with insertion and deletion as the only two edit operations.

Jaeger Code Structure

·1 min

Here is the main logic for jaeger agent and jaeger collector. (Based on jaeger 1.13.1)

Jaeger Agent #

Collect UDP packet from 6831 port, convert it to model.Span, send to collector by gRPC

The Annotated The Annotated Transformer

·4 mins

Thanks for the articles I list at the end of this post, I understand how transformers works. These posts are comprehensive, but there are some points that confused me.

First, this is the graph that was referenced by almost all of the post related to Transformer.

Different types of Attention

·1 min

\(s_t\) and \(h_i\) are source hidden states and target hidden state, the shape is (n,1). \(c_t\) is the final context vector, and \(\alpha_{t,s}\) is alignment score.

\[\begin{aligned} c_t&=\sum_{i=1}^n \alpha_{t,s}h_i \\ \alpha_{t,s}&= \frac{\exp(score(s_t,h_i))}{\sum_{i=1}^n \exp(score(s_t,h_i))} \end{aligned}\]

Global(Soft) VS Local(Hard) #

Global Attention takes all source hidden states into account, and local attention only use part of the source hidden states.