Python

C3 Linearization and Python MRO (Method Resolution Order)

14 March 2020·3 mins

Python supports multiple inheritance: a class can be derived from more than one base class. If the requested attribute or method is not found in the current class, in what order are the superclasses searched? In simple cases the answer is left to right, bottom to top. But when the inheritance hierarchy becomes complicated, it's not easy to answer by intuition.
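As an illustration, here is a minimal sketch of the C3 rule, L[C] = C + merge(L[B1], ..., L[Bn], [B1, ..., Bn]). The helpers c3_merge and c3_linearize are hypothetical names written for this note, not part of Python; the point is that their result should match the __mro__ Python computes itself. The class hierarchy is a commonly used example where intuition fails.

```python
def c3_merge(seqs):
    """C3 merge: repeatedly take the first head that appears in no other tail."""
    result = []
    seqs = [list(s) for s in seqs if s]
    while seqs:
        for seq in seqs:
            head = seq[0]
            # A valid head must not appear in the tail of any sequence.
            if not any(head in s[1:] for s in seqs):
                break
        else:
            raise TypeError("Cannot create a consistent method resolution order")
        result.append(head)
        # Remove the chosen head from the front of every sequence.
        for s in seqs:
            if s[0] is head:
                del s[0]
        seqs = [s for s in seqs if s]
    return result


def c3_linearize(cls):
    """L[C] = C + merge(L[B1], ..., L[Bn], [B1, ..., Bn])."""
    bases = list(cls.__bases__)
    return [cls] + c3_merge([c3_linearize(b) for b in bases] + [bases])


class O: pass
class A(O): pass
class B(O): pass
class C(O): pass
class D(O): pass
class E(O): pass
class K1(A, B, C): pass
class K2(D, B, E): pass
class K3(D, A): pass
class Z(K1, K2, K3): pass

print([k.__name__ for k in c3_linearize(Z)])
# ['Z', 'K1', 'K2', 'K3', 'D', 'A', 'B', 'C', 'E', 'O', 'object']
```

If the merge ever gets stuck (no head is free), C3 has no solution; Python reports the same situation as a TypeError when the class is defined.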

2019

Torchtext snippets

1 July 2019·1 min

Load separate files

The data.Field parameters are listed here.

When you call build_vocab, torchtext adds <unk> to the vocabulary. Set unk_token=None if you want to remove it. If sequential=True (the default), it also adds <pad> to the vocabulary. By default, <unk> and <pad> are placed at the beginning of the vocabulary list.
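To make the ordering concrete without installing torchtext, here is a plain-Python model of the behaviour described above. build_vocab_itos is a hypothetical helper, not torchtext's code; sorting the remaining tokens by frequency (ties alphabetically) is my understanding of the legacy Vocab and should be treated as an assumption.

```python
from collections import Counter


def build_vocab_itos(tokens, sequential=True, unk_token="<unk>", pad_token="<pad>"):
    """Simplified model of a legacy torchtext vocabulary (index -> token list)."""
    if not sequential:
        pad_token = None  # non-sequential fields get no padding token
    # Specials go first; a token set to None is simply left out.
    specials = [tok for tok in (unk_token, pad_token) if tok is not None]
    counter = Counter(tokens)
    for tok in specials:
        counter.pop(tok, None)  # never list a special token twice
    # Assumed ordering: most frequent first, ties broken alphabetically.
    words = sorted(counter, key=lambda w: (-counter[w], w))
    return specials + words


print(build_vocab_itos("the cat the dog".split()))
print(build_vocab_itos("the cat".split(), unk_token=None))
```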

Circular Import in Python

10 March 2019·2 mins

Recently, I found a really good example of a circular import in Python, and I'd like to record it here.

Here is the code:

# X.py
def X1():
    return "x1"

from Y import Y2

def X2():
    return "x2"

# Y.py
def Y1():
    return "y1"

from X import X1

def Y2():
    return "y2"

Guess what will happen if you run python X.py and python Y.py?
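To check your guess, the experiment can be reproduced with a small harness. It writes the two files above into a temporary directory and runs each one with the current interpreter; run_script is a hypothetical helper written for this note.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# The two modules from above, verbatim.
X_SRC = '''# X.py
def X1():
    return "x1"

from Y import Y2

def X2():
    return "x2"
'''

Y_SRC = '''# Y.py
def Y1():
    return "y1"

from X import X1

def Y2():
    return "y2"
'''


def run_script(name: str) -> subprocess.CompletedProcess:
    """Write X.py and Y.py into a fresh temp dir and run `python <name>` there."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "X.py").write_text(X_SRC)
        Path(tmp, "Y.py").write_text(Y_SRC)
        return subprocess.run(
            [sys.executable, name], cwd=tmp, capture_output=True, text=True
        )


if __name__ == "__main__":
    for script in ("X.py", "Y.py"):
        result = run_script(script)
        print(f"python {script}: exit code {result.returncode}")
        if result.stderr:
            # Show only the last traceback line, i.e. the exception message.
            print("  " + result.stderr.strip().splitlines()[-1])
```

The key detail to keep in mind is that the script you run is executed as the module __main__, so importing it by its file name loads a second, separate copy.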

Python Dictionary Implementation

17 February 2019·3 mins

Overview

CPython stores a dictionary in a hash table whose initial size is 8. Each slot holds an entry of the form <hash, key, value> (the slot layout changed in Python 3.6).
When a new key is added, Python computes its slot as i = hash(key) & mask, where mask = table_size - 1. If that slot is occupied, CPython uses a probing algorithm to find an empty slot for the new item.
When the table is 2/3 full, it is resized.
When looking up an item, both the hash and the key must be equal.
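The steps above can be sketched as a toy open-addressing table. This is not CPython's code: ToyDict is a hypothetical class, deletion is not supported, and the growth rule is simplified. The probe recurrence (i = (5*i + perturb + 1) & mask, with perturb shifted right by 5 each step) is borrowed from CPython's dictobject.c, since the text above only says "a probing algorithm".

```python
class ToyDict:
    """Toy open-addressing hash table (no deletion, simplified growth)."""

    def __init__(self):
        self._size = 8                     # initial table size
        self._slots = [None] * self._size  # each slot: None or (hash, key, value)
        self._used = 0

    def _probe(self, h):
        """Yield slot indices: first hash & mask, then CPython-style perturbed probing."""
        mask = self._size - 1
        i = h & mask
        perturb = h & 0xFFFFFFFFFFFFFFFF  # treat the hash as an unsigned 64-bit value
        while True:
            yield i
            perturb >>= 5
            i = (i * 5 + perturb + 1) & mask

    def _find(self, key):
        """Index of key's slot, or of the empty slot where key would be stored."""
        h = hash(key)
        for i in self._probe(h):
            slot = self._slots[i]
            if slot is None:
                return i
            # A hit requires both the stored hash and the key to be equal.
            if slot[0] == h and slot[1] == key:
                return i

    def __setitem__(self, key, value):
        i = self._find(key)
        if self._slots[i] is None:
            self._used += 1
        self._slots[i] = (hash(key), key, value)
        if self._used * 3 >= self._size * 2:  # 2/3 full: resize
            self._resize()

    def __getitem__(self, key):
        slot = self._slots[self._find(key)]
        if slot is None:
            raise KeyError(key)
        return slot[2]

    def _resize(self):
        # Simplified growth: smallest power of two greater than 4 * used.
        new_size = 8
        while new_size <= 4 * self._used:
            new_size *= 2
        items = [s for s in self._slots if s is not None]
        self._size, self._slots, self._used = new_size, [None] * new_size, 0
        for _, key, value in items:
            self[key] = value


d = ToyDict()
d[-1] = "minus one"
d[-2] = "minus two"
print(d[-1], d[-2])
```

In CPython, hash(-1) and hash(-2) are both -2, so these two keys land on the same first slot and exercise the "hash and key must both be equal" check.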

Resizing

When the number of used entries is below 50000, the new table size is based on 4 times the number of used entries; otherwise, the factor is 2. The table size is always a power of two, \(2^{n}\).
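The resizing rule can be checked with a few lines of arithmetic. This sketch assumes "based on" means the smallest power of two strictly greater than factor × used, and that a resize is triggered as soon as the table becomes 2/3 full; resized_table_size and growth_history are hypothetical helpers.

```python
def resized_table_size(used: int) -> int:
    """Smallest power of two greater than factor * used (factor 4 below 50000, else 2)."""
    factor = 4 if used < 50000 else 2
    size = 8  # never smaller than the initial table
    while size <= factor * used:
        size *= 2
    return size


def growth_history(limit: int = 10**6) -> list:
    """Table sizes seen while inserting keys one by one, resizing at 2/3 full."""
    size, used, history = 8, 0, [8]
    while size < limit:
        used += 1
        if used * 3 >= size * 2:
            size = resized_table_size(used)
            history.append(size)
    return history


print(growth_history())
```

Under these assumptions the table quadruples while it is small and only doubles once it holds more than 50000 entries, trading memory for fewer resizes on small dictionaries.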

2018

CSRF in Django

7 November 2018·2 mins

CSRF (Cross-Site Request Forgery) is a way to forge a user's request to a target website. For example, a malicious website A has a button that sends a request to www.B.com/logout. When the user clicks this button, they are logged out of website B without realizing it. Logging out is not a big problem, but a malicious website can forge far more dangerous requests, such as a money transfer.
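The usual defence is a secret token that site A cannot know. Below is a minimal sketch of a generic double-submit-cookie check, assuming the server sets a random token as a cookie and also embeds it in its own forms. It is a simplified model, not Django's implementation: Django's CsrfViewMiddleware does more than this. The function names are hypothetical.

```python
import hmac
import secrets
from typing import Optional

# Methods treated as "safe" and exempt from the check (Django uses this same set).
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


def new_csrf_token() -> str:
    """Random secret the server sets as a cookie and also embeds in its own forms."""
    return secrets.token_hex(32)


def is_request_allowed(
    method: str, cookie_token: Optional[str], form_token: Optional[str]
) -> bool:
    """A state-changing request must echo the token stored in the cookie."""
    if method.upper() in SAFE_METHODS:
        return True  # safe methods must not change state, so they are not checked
    if not cookie_token or not form_token:
        return False
    # Site A can make the browser *send* B's cookie, but cannot *read* it,
    # so it cannot put the matching value into the forged form.
    return hmac.compare_digest(cookie_token, form_token)


token = new_csrf_token()
print(is_request_allowed("POST", token, token))  # genuine form from site B
print(is_request_allowed("POST", token, None))   # forged request from site A
```

Because safe methods skip the check, this also shows why an action such as logout should not be reachable with a plain GET request.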

Create Node Benchmark in Py2neo

5 November 2018·2 mins

I'm currently working on a Neo4j project and use Py2neo to interact with the graph database. Although Py2neo is very Pythonic and easy to use, its performance is really poor. When I can't bear the slow execution, I sometimes have to write Cypher statements by hand. Here is a small script I use to compare the performance of 4 different ways to insert nodes.

Deploy Nikola Org Mode on Travis

3 November 2018·3 mins

I've been enjoying Spacemacs recently, so I decided to switch from Markdown to Org files for writing my blog. After several attempts, I managed to get Travis to convert Org files to HTML. Here are the steps.

Install Org Mode plugin

First, install the Org Mode plugin on your computer by following the official guide: Nikola orgmode plugin.
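Once the plugin is installed, Nikola also has to be told to compile .org files. A sketch of the relevant conf.py settings, assuming the plugin registers its compiler under the name orgmode and that posts live in posts/ (please verify both against the plugin's README):

```python
# conf.py (excerpt)
COMPILERS = {
    # ... keep the compilers already listed in your conf.py ...
    "orgmode": (".org",),  # assumed compiler name provided by the orgmode plugin
}

# (source glob, output folder, template)
POSTS = (
    ("posts/*.org", "posts", "post.tmpl"),
)
PAGES = (
    ("pages/*.org", "pages", "page.tmpl"),
)
```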

Using Chinese Characters in Matplotlib

4 October 2018·1 min

After searching on Google, I found this to be the easiest solution. It should also work for other languages:

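import matplotlib.pyplot as plt
import matplotlib.font_manager as fm

# Jupyter-only magics: render plots inline, at Retina resolution
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

# A font file that contains Chinese glyphs (this path is for macOS; change it on other systems)
f = "/System/Library/Fonts/PingFang.ttc"
prop = fm.FontProperties(fname=f)

# Pass the font explicitly to every text element that contains Chinese characters
plt.title("你好", fontproperties=prop)
plt.show()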

Output: (figure not included: a plot titled "你好")

2017

Enable C Extension for gensim on Windows

10 June 2017·1 min

These days I'm working on text classification, and I use gensim's Doc2Vec model.

When I use gensim, it shows this warning:

C extension not loaded for Word2Vec, training will be slow.

I searched online and found that gensim has rewritten parts of its code in Cython, rather than NumPy, for better performance. A C compiler is required to enable this feature.
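After installing a compiler and reinstalling gensim, you can check whether the compiled routines are actually in use. A small sketch, assuming your gensim version exposes gensim.models.word2vec.FAST_VERSION and sets it to -1 when the C extension could not be loaded (please check this against your version's documentation); gensim_c_extension_loaded is a hypothetical helper.

```python
def gensim_c_extension_loaded() -> bool:
    """Return True if gensim's compiled (Cython) word2vec routines are available."""
    try:
        # Assumption: FAST_VERSION is -1 when gensim fell back to the slow path.
        from gensim.models.word2vec import FAST_VERSION
    except ImportError:
        return False  # gensim is not installed, or the import failed
    return FAST_VERSION > -1


if __name__ == "__main__":
    print("C extension loaded:", gensim_c_extension_loaded())
```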