CSRF in Django

KK published on 2018-11-07 included in Programming

CSRF (Cross-site request forgery) is a way to forge user requests to a target website. For example, a malicious website A may contain a button that sends a request to www.B.com/logout when clicked. When the user clicks this button, they are logged out of website B without realizing it. Logging out is not a big problem, but a malicious website can forge far more dangerous requests, such as a money transfer.

Django CSRF protection

Each web framework takes a different approach to CSRF protection. In Django, the validation process is as follows:
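As a rough illustration only, the core idea behind Django's check can be modeled as a comparison between a secret token stored in a cookie and the same token submitted with the request. This is a simplified sketch, not Django's actual code. It assumes Django's default names (`csrftoken` cookie, `csrfmiddlewaretoken` form field, `X-CSRFToken` header), and it ignores the token masking and HTTPS Referer/Origin checks that the real `CsrfViewMiddleware` also performs. The helper name `csrf_check_passes` is hypothetical.

```python
import hmac

# Methods treated as "safe"; only other methods (POST, PUT, DELETE, ...) are checked.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


def csrf_check_passes(method, cookies, form, headers):
    """Simplified token check (hypothetical helper, not Django's implementation).

    The request passes if it uses a safe method, or if the token sent in the
    form field (or, failing that, the header) matches the csrftoken cookie.
    """
    if method.upper() in SAFE_METHODS:
        return True
    cookie_token = cookies.get("csrftoken")
    if not cookie_token:
        return False  # no cookie: the browser never received a token
    submitted = form.get("csrfmiddlewaretoken") or headers.get("X-CSRFToken", "")
    # Constant-time comparison so timing does not leak how much of the token matched.
    return hmac.compare_digest(cookie_token.encode(), submitted.encode())
```

Applied to the example above: site A can make the browser send B's cookie along with a forged POST, but it cannot read that cookie, so it cannot put the matching value into the form field or header.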

Create Node Benchmark in Py2neo

KK published on 2018-11-05 included in Programming

Recently, I've been working on a Neo4j project, using Py2neo to interact with the graph database. Although Py2neo is very Pythonic and easy to use, its performance is really poor. Sometimes I have to write Cypher statements by hand when I can't bear the slow execution. Here is a small script I use to compare the performance of four different ways to insert nodes.

import time

# graph_db is a local helper module (not shown) that exposes a
# py2neo Graph instance named graph
from graph_db import graph

from py2neo.data import Node, Subgraph


def delete_label(label):
    graph.run('MATCH (n:{}) DETACH DELETE n'.format(label))


def delete_all():
    print('delete all')
    graph.run('match (n) detach delete n')


def count_label(label):
    return len(graph.nodes.match(label))


def bench_create1():
    print('Using py2neo one by one')
    delete_label('test')
    start = time.time()
    tx = graph.begin()
    for i in range(100000):
        n = Node('test', id=i)
        tx.create(n)
    tx.commit()
    print(time.time() - start)
    print(count_label('test'))
    delete_label('test')


def bench_create2():
    print('Using cypher one by one')
    delete_label('test')
    start = time.time()
    tx = graph.begin()
    for i in range(100000):
        tx.run('create (n:test {id: $id})', id=i)
        # commit every 1000 statements and start a new transaction
        if i and i % 1000 == 0:
            tx.commit()
            tx = graph.begin()
    tx.commit()
    print(time.time() - start)
    print(count_label('test'))
    delete_label('test')


def bench_create3():
    print('Using Subgraph')
    delete_label('test')
    start = time.time()
    tx = graph.begin()
    nodes = []
    for i in range(100000):
        nodes.append(Node('test', id=i))
    s = Subgraph(nodes=nodes)
    tx.create(s)
    tx.commit()
    print(time.time() - start)
    print(count_label('test'))
    delete_label('test')



def bench_create4():
    print('Using unwind')
    delete_label('test')
    start = time.time()
    tx = graph.begin()
    ids = list(range(100000))
    # one query: UNWIND expands the list parameter into rows on the server
    tx.run('unwind $ids as id create (n:test {id: id})', ids=ids)
    tx.commit()
    print(time.time() - start)
    print(count_label('test'))
    delete_label('test')


def bench_create():
    create_tests = [bench_create1, bench_create2, bench_create3, bench_create4]

    print('testing create')
    for i in create_tests:
        i()


if __name__ == '__main__':
    bench_create()

Judging from the results, a single Cypher query using the UNWIND keyword is the fastest way to batch-insert nodes.
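For larger data sets, one way to build on the UNWIND approach is to send the rows in fixed-size batches, so that no single query has to carry every parameter at once. This is a sketch under assumptions: it expects a py2neo `Graph` object whose `run(cypher, **parameters)` method executes a query in an auto-commit transaction, the helper names are hypothetical, and the default batch size of 10,000 is an arbitrary choice, not a measured optimum.

```python
def build_unwind_batches(label, rows, batch_size=10000):
    """Split rows (a list of property dicts) into one UNWIND query per batch.

    Returns a list of (cypher, parameters) pairs. As in the script above,
    the label is formatted into the query text, so it must come from
    trusted code, never from user input.
    """
    cypher = 'UNWIND $rows AS row CREATE (n:{}) SET n = row'.format(label)
    return [
        (cypher, {'rows': rows[start:start + batch_size]})
        for start in range(0, len(rows), batch_size)
    ]


def create_nodes_unwind(graph, label, rows, batch_size=10000):
    """Create one node per row, sending one auto-committed query per batch."""
    for cypher, params in build_unwind_batches(label, rows, batch_size):
        graph.run(cypher, **params)
```

Usage would look like `create_nodes_unwind(graph, 'test', [{'id': i} for i in range(100000)])`. Keeping the query building separate from the database call makes the batching logic easy to check without a running Neo4j instance.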

Deploy Nikola Org Mode on Travis

KK published on 2018-11-03 included in Programming

Recently I've been enjoying Spacemacs, so I decided to switch from Markdown to Org files for writing my blog. After several attempts, I managed to get Travis to convert Org files to HTML. Here are the steps.

Install Org Mode plugin

First, install the Org Mode plugin on your computer by following the official guide: Nikola orgmode plugin.
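For the build to pick up `.org` files, Nikola's `conf.py` also has to map the plugin's compiler to that extension and include `.org` files in `POSTS`. The fragment below is a sketch based on my reading of the orgmode plugin's README (the plugin itself can be installed with `nikola plugin -i orgmode`). The `rest` and `markdown` entries are placeholders for whatever your `conf.py` already contains, so check the plugin's own instructions before copying it.

```python
# conf.py (excerpt): assumed settings for the Nikola orgmode plugin.
COMPILERS = {
    "rest": ('.rst', '.txt'),        # existing entries, kept as they are
    "markdown": ('.md', '.mdown', '.markdown'),
    "orgmode": ('.org',),            # let the orgmode plugin compile .org files
}

# Each entry is (source wildcard, output folder, template).
POSTS = (
    ("posts/*.rst", "posts", "post.tmpl"),
    ("posts/*.md", "posts", "post.tmpl"),
    ("posts/*.org", "posts", "post.tmpl"),  # treat .org files as blog posts
)
```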

Edit `conf.el`

Org files have to be converted to HTML before Nikola can display them, and the Org Mode plugin calls Emacs to do this. When I ran `nikola build`, it showed this message: Please install htmlize from https://github.com/hniksic/emacs-htmlize. Since I'm using Spacemacs, the htmlize package is already downloaded if the org layer is enabled, so I just needed to add the htmlize folder to `load-path`. Here is the code:

Using Chinese Characters in Matplotlib

KK published on 2018-10-04 included in Programming

After some searching on Google, here is the easiest solution I found. It should also work for other languages:

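import matplotlib.pyplot as plt
# IPython/Jupyter magics: draw plots inline, at high resolution
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.font_manager as fm
# A font file that contains CJK glyphs (PingFang ships with macOS;
# on other systems, point this at any installed CJK font)
f = "/System/Library/Fonts/PingFang.ttc"
prop = fm.FontProperties(fname=f)

# Pass the font explicitly to each text element that needs it
plt.title("你好", fontproperties=prop)  # 你好 = "Hello"
plt.show()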

Output: [figure: a plot titled 你好, rendered in PingFang]
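If you would rather not pass `fontproperties` to every call, a different technique is to register the font once and make it the default family. This is a sketch, not the approach above: it assumes matplotlib 3.2 or newer (where `FontManager.addfont` exists), and `use_font_globally` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt
from matplotlib import font_manager as fm


def use_font_globally(font_path):
    """Register a font file and make it the default family for all text.

    Requires matplotlib >= 3.2, where FontManager.addfont was added.
    """
    fm.fontManager.addfont(font_path)                       # make the font findable by name
    family = fm.FontProperties(fname=font_path).get_name()  # the font's family name
    plt.rcParams["font.family"] = family                    # default for titles, labels, ticks
    return family
```

After calling `use_font_globally("/System/Library/Fonts/PingFang.ttc")`, a plain `plt.title("你好")` should pick up the font. The trade-off is that the setting is global, which is convenient in a notebook but affects every later plot in the session.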

LSTM and GRU

KK published on 2018-04-22 included in Machine-Learning

LSTM

To mitigate the vanishing and exploding gradient problems of vanilla RNNs, the LSTM was proposed; it can remember information over longer periods of time.

Here is the structure of an LSTM cell:

The computation is as follows:

\[\begin{aligned} f_t&=\sigma(W_f\cdot[h_{t-1},x_t]+b_f)\\ i_t&=\sigma(W_i\cdot[h_{t-1},x_t]+b_i)\\ o_t&=\sigma(W_o\cdot[h_{t-1},x_t]+b_o)\\ \tilde{C_t}&=\tanh(W_C\cdot[h_{t-1},x_t]+b_C)\\ C_t&=f_t\ast C_{t-1}+i_t\ast \tilde{C_t}\\ h_t&=o_t \ast \tanh(C_t) \end{aligned}\]

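\(f_t\), \(i_t\) and \(o_t\) are the forget gate, input gate and output gate, respectively. \(\tilde{C_t}\) is the new (candidate) memory content, \(C_t\) is the cell state, and \(h_t\) is the output. \(\sigma\) is the sigmoid function and \(\ast\) denotes element-wise multiplication.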
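The equations translate almost line for line into NumPy. The sketch below computes a single time step. Stacking the four weight matrices \(W_f, W_i, W_o, W_C\) into one matrix `W` (and the biases into one vector `b`) is a layout choice of this sketch, not something the equations require.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4H, H + D) and b has shape (4H,); rows are stacked as
    [forget, input, output, candidate], each block H rows tall.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b  # W . [h_{t-1}, x_t] + b for all four blocks
    f_t = sigmoid(z[0:H])           # forget gate
    i_t = sigmoid(z[H:2 * H])       # input gate
    o_t = sigmoid(z[2 * H:3 * H])   # output gate
    c_tilde = np.tanh(z[3 * H:])    # candidate memory content
    c_t = f_t * c_prev + i_t * c_tilde  # element-wise products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```

Running a sequence is then a loop that feeds each returned `h_t, c_t` into the next call.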

Models and Architectures in Word2vec

KK published on 2018-01-05 included in Machine-Learning

Generally speaking, word2vec is a language model that predicts the probability of words based on their context. While building the model, it creates a word embedding for each word, and these embeddings are widely used in many NLP tasks.

Models

CBOW (Continuous Bag of Words)

Uses the context to predict the probability of the current word. (In the figure, each word is one-hot encoded, \(W_{V\times N}\) is the word-embedding matrix, and \(W_{V\times N}^{'}\), the output weight matrix after the hidden layer, is the same as \(\hat{\upsilon}\) in the following equations.)
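To make the CBOW description concrete, here is a minimal NumPy sketch of the forward pass with a full softmax. It assumes that both `W` (input embeddings) and `W_out` (output vectors, one row per word, playing the role of \(\hat{\upsilon}\)) have shape V × N. Real word2vec training replaces the full softmax with hierarchical softmax or negative sampling, which this sketch does not show.

```python
import numpy as np


def cbow_forward(context_ids, W, W_out):
    """Plain-softmax CBOW forward pass.

    W:     (V, N) input embeddings; row w is the embedding of word w.
    W_out: (V, N) output vectors; row w is the output vector of word w.
    Returns a length-V probability distribution over the center word.
    """
    # A one-hot vector times W just selects a row, so index W directly.
    h = W[context_ids].mean(axis=0)  # hidden layer: average of context embeddings
    scores = W_out @ h               # one score per vocabulary word
    scores -= scores.max()           # shift for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()           # softmax
```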