<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>KK's Blog (fromkk)</title><link>https://fromkk.com/</link><description>fromkk.com is my personal blog. Explore insightful posts on Python, machine learning, and other stuff. Keywords: python, machine learning, programming</description><generator>Hugo -- gohugo.io</generator><language>en</language><managingEditor>bebound@gmail.com (KK)</managingEditor><webMaster>bebound@gmail.com (KK)</webMaster><lastBuildDate>Fri, 09 Jan 2026 21:49:00 +0800</lastBuildDate><atom:link href="https://fromkk.com/index.xml" rel="self" type="application/rss+xml"/><item><title>Kindle Paperwhite 5 Review</title><link>https://fromkk.com/posts/kindle-paperwhite-5-review/</link><pubDate>Fri, 09 Jan 2026 21:49:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/kindle-paperwhite-5-review/</guid><description>&lt;p>I bought a Kindle Paperwhite 1 in 2013, when I was still at university. I like it very much and I&amp;rsquo;ve read many programming books on it. It still works fine after 10 years, but I wanted to try the new model with a larger screen and faster refresh speed. So I recently bought a used Kindle Paperwhite 5 Signature Edition for only 620 Yuan (about $90). Here is my review.&lt;/p></description></item><item><title>Namespace Package in Python</title><link>https://fromkk.com/posts/namespace-package-in-python/</link><pubDate>Sun, 10 Aug 2025 18:04:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/namespace-package-in-python/</guid><description><![CDATA[<p>Recently, a <a href="https://github.com/Azure/azure-cli/issues/31843#issuecomment-3125269740" target="_blank" rel="noopener noreffer ">GitHub issue</a> about namespace packages came up in Azure CLI. I think it is a good time to write down what I know about namespace packages.</p>
<h2 id="what-is-namespace-package">What is a Namespace Package</h2>
<p>If several packages share the same root folder, then the root folder is a namespace package. <code>subpackageA</code> and <code>subpackageb</code> can be installed separately, even in different Python paths, but they are imported as if they belonged to a single package: <code>import root</code>.</p>]]></description></item><item><title>Run Synology in QNAP NAS with PVE</title><link>https://fromkk.com/posts/run-synology-in-qnap-nas-with-pve/</link><pubDate>Sun, 29 Jun 2025 20:53:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/run-synology-in-qnap-nas-with-pve/</guid><description><![CDATA[<p>Three years ago, I bought a QNAP TS-453Dmini NAS. Although it has a slow web UI and restarts slowly, it still fits my needs, as all of the applications I need run in Docker.</p>
<p>Recently, I wanted to move some files from my Mac to the NAS to save space. I need an application that behaves like Dropbox, which can show all the files on the NAS and only download the files I need. I tried QSync, but it does not show thumbnails for cloud images and it has no icons to show the file status. I also tried <a href="https://www.seafile.com/home/" target="_blank" rel="noopener noreffer ">Seafile</a>. It&rsquo;s a powerful application, but it requires 4G RAM to run, and there is a bug in its thumbnails. I used to have a Synology ARM NAS, and Synology Drive has all the features I need, so I want to run it on my QNAP NAS. After some research, I managed to run Synology and QNAP together on my NAS. Here is the guide.</p>]]></description></item><item><title>Modern pip build process (--use-pep517)</title><link>https://fromkk.com/posts/modern-pip-build-process-use-pep517/</link><pubDate>Sun, 24 Nov 2024 20:49:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/modern-pip-build-process-use-pep517/</guid><description><![CDATA[<p>Nowadays, <code>pyproject.toml</code> has become the standard configuration file for packaging. Compared with the old <code>setup.py</code>, it adds two features: pep517 and pep518.</p>
<p><a href="https://peps.python.org/pep-0517/" target="_blank" rel="noopener noreffer ">pep517</a> defines two hooks, <code>build_wheel</code> and <code>build_sdist</code>, which are required to build the package from source. Each build backend must implement these two hooks. This makes it possible to create other build backends such as <code>flit</code> or <code>poetry</code>.</p>
<div class="code-block code-line-numbers open" style="counter-reset: code-block 0">
    <div class="code-header language-toml">
        <span class="code-title"><i class="arrow fas fa-angle-right" aria-hidden="true"></i></span>
        <span class="ellipses"><i class="fas fa-ellipsis-h" aria-hidden="true"></i></span>
        <span class="copy" title="Copy to clipboard"><i class="far fa-copy" aria-hidden="true"></i></span>
    </div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-toml" data-lang="toml"><span class="line"><span class="cl"><span class="p">[</span><span class="nx">build-system</span><span class="p">]</span>
</span></span><span class="line"><span class="cl"><span class="c"># Defined by PEP 518:</span>
</span></span><span class="line"><span class="cl"><span class="nx">requires</span> <span class="p">=</span> <span class="p">[</span><span class="s2">&#34;flit&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="cl"><span class="c"># Defined by this PEP:</span>
</span></span><span class="line"><span class="cl"><span class="nx">build-backend</span> <span class="p">=</span> <span class="s2">&#34;local_backend&#34;</span>
</span></span><span class="line"><span class="cl"><span class="nx">backend-path</span> <span class="p">=</span> <span class="p">[</span><span class="s2">&#34;backend&#34;</span><span class="p">]</span></span></span></code></pre></div></div>
<p>Besides <code>setuptools</code>, there are other build backends such as <code>hatchling</code> and <code>flit</code>. You can find examples here: <a href="https://packaging.python.org/en/latest/tutorials/packaging-projects/#choosing-a-build-backend" target="_blank" rel="noopener noreffer ">Python Packaging User Guide - Choosing a build backend</a></p>]]></description></item><item><title>sys.path in Python</title><link>https://fromkk.com/posts/sys-dot-path-in-python/</link><pubDate>Sun, 11 Aug 2024 15:56:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/sys-dot-path-in-python/</guid><description><![CDATA[<p>Here is how <code>sys.path</code> is set in Python, with some parts omitted.</p>
<h2 id="python-command-line-arguments">Python Command Line Arguments</h2>
<p>By default, as initialized upon program startup, a potentially unsafe path is prepended to <code>sys.path</code>:</p>
<p><code>python -m</code>: prepend the current working directory.</p>
<p><code>python script.py</code>: prepend the script’s directory. If it’s a symbolic link, resolve symbolic links.</p>
<p><code>python -c</code> and python (REPL): prepend an empty string, which means the current working directory.</p>
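<p>The <code>python -c</code> case above can be checked with a small sketch. It starts a child interpreter and reads back its first <code>sys.path</code> entry. The helper name <code>first_path_entry</code> is hypothetical, and the sketch assumes the <code>PYTHONSAFEPATH</code> environment variable should be ignored so that the default behaviour is observed.</p>
<pre tabindex="0"><code class="language-python" data-lang="python">import os
import subprocess
import sys


def first_path_entry():
    """Start a child interpreter with -c and return repr(sys.path[0]) from it."""
    # Drop PYTHONSAFEPATH (an assumption of this sketch) so the default,
    # potentially unsafe, behaviour is what gets observed.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(repr(sys.path[0]))"],
        capture_output=True, text=True, check=True, env=env,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Prints '' (an empty string, meaning the current working directory).
    print(first_path_entry())</code></pre>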
<p>You can skip these entries with the <code>-P</code> option.</p>]]></description></item><item><title>__import__ in Python</title><link>https://fromkk.com/posts/import-in-python/</link><pubDate>Sun, 07 Apr 2024 15:58:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/import-in-python/</guid><description><![CDATA[<p>It&rsquo;s known that Python&rsquo;s <code>import</code> statement is implemented by the <code>__import__</code> function. In general, if we want to import a module dynamically, we can use the <code>import_module</code> function, which is a wrapper around <code>__import__</code>.</p>
<blockquote>
<p>The most important difference between these two functions is that import_module() returns the specified package or module (e.g. pkg.mod), while __import__() returns the top-level package or module (e.g. pkg). &ndash; <a href="https://docs.python.org/3/library/importlib.html#importlib.import_module" target="_blank" rel="noopener noreffer ">https://docs.python.org/3/library/importlib.html#importlib.import_module</a></p></blockquote>
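<p>The difference described in the quote can be checked with a short sketch. It uses the standard-library package <code>xml.dom</code> purely as an illustration, and the helper name <code>compare_imports</code> is hypothetical.</p>
<pre tabindex="0"><code class="language-python" data-lang="python">import importlib


def compare_imports(dotted_name):
    """Return the names of the modules given back by both import functions."""
    top_level = __import__(dotted_name)          # returns the top-level package
    leaf = importlib.import_module(dotted_name)  # returns the named submodule
    return top_level.__name__, leaf.__name__


if __name__ == "__main__":
    print(compare_imports("xml.dom"))  # ('xml', 'xml.dom')</code></pre>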
<p><code>import itertools</code> and <code>from requests import exceptions</code> can be translated to:</p>]]></description></item><item><title>Improve Git speed in WSL</title><link>https://fromkk.com/posts/speed-up-git-speed-in-wsl/</link><pubDate>Tue, 26 Dec 2023 11:16:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/speed-up-git-speed-in-wsl/</guid><description><![CDATA[<p>Disk performance in WSL2 is poor: it takes a long time to run <code>git status</code> in a repo that lives on the host. Moreover, if you set up a fancy shell prompt, it takes a long time to show the prompt. This article introduces how to speed up Git in WSL2.</p>
<h2 id="how-to-speed-up-git-command">How to speed up Git Command</h2>
<p>File system performance in WSL2 is poor: it takes a long time to run <code>git status</code> in a repo that lives on the host. The solution is to use the <code>git.exe</code> from the Windows folder. You can add this to your <code>bashrc</code>:</p>]]></description></item><item><title>iPod Video Review</title><link>https://fromkk.com/posts/ipod-video-review/</link><pubDate>Tue, 26 Dec 2023 11:16:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/ipod-video-review/</guid><description><![CDATA[<p>I bought an iPod Video 5.5th Gen 80G recently. It cost only 200 Yuan (about $30) and I&rsquo;m satisfied with it.</p>
<h2 id="rockbox">Rockbox</h2>
<p>The original firmware supports few audio formats; it can&rsquo;t even play FLAC. I installed Rockbox on it, which supports FLAC and other formats, and I can transfer music without using iTunes or Finder. It also supports themes and plugins, which makes it more powerful.</p>
<h3 id="macpod-error">MacPod error</h3>
<p>If you restore the iPod on macOS, it raises <code>Warning: This is a MacPod, Rockbox only runs on WinPods. See http://www.rockbox.org/wiki/IpodConversionToFAT32</code> during installation. The easiest way to fix this is to restore it on Windows.</p>]]></description></item><item><title>Python 3.11 changes</title><link>https://fromkk.com/posts/python-3-dot-11-changes/</link><pubDate>Sun, 10 Dec 2023 15:24:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/python-3-dot-11-changes/</guid><description><![CDATA[<p>In <a href="https://github.com/Azure/azure-cli/pull/26923" target="_blank" rel="noopener noreffer ">[Packaging] Support Python 3.11 by bebound · Pull Request #26923 · Azure/azure-cli (github.com)</a>, I bumped azure-cli to use Python 3.11. We had already bumped the dependencies in other PRs, so I thought it would be a small PR, but in the end, a lot of changes were made.</p>
<h2 id="args-dot-getargspec"><code>inspect.getargspec</code></h2>
<p><code>getargspec</code> is dropped in 3.11. You can easily replace it with <a href="https://docs.python.org/3/library/inspect.html#inspect.getfullargspec" target="_blank" rel="noopener noreffer "><code>getfullargspec</code></a>. It returns <code>FullArgSpec(args, varargs, varkw, defaults, kwonlyargs, kwonlydefaults, annotations)</code> instead of <code>ArgSpec(args, varargs, keywords, defaults)</code>. So <code>args, _, kw, _ = inspect.getargspec(fn)</code> can be replaced by <code>args, _, kw, *_ = inspect.getfullargspec(fn)</code>. However, <code>getfullargspec</code> is retained primarily for use in code that needs to maintain compatibility with the Python 2 <code>inspect</code> module API.</p>]]></description></item><item><title>Line Ending in Git</title><link>https://fromkk.com/posts/line-ending-in-git/</link><pubDate>Sat, 21 Oct 2023 15:40:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/line-ending-in-git/</guid><description><![CDATA[<p>When working on a project with multiple developers, line endings can be troublesome. This article explains how to configure line endings in Git.</p>
<h2 id="basic-configuration">Basic configuration</h2>
<p>The line ending on Windows is <code>CRLF</code>, while on Linux it is <code>LF</code>. To prevent line ending issues, we can set <code>core.autocrlf</code> to <code>true</code> on Windows to let Git convert <code>CRLF</code> to <code>LF</code> on commit, and convert <code>LF</code> to <code>CRLF</code> on checkout. It is automatically configured if you install Git on Windows.</p>]]></description></item><item><title>How to copy files temporarily in Dockerfile</title><link>https://fromkk.com/posts/how-to-copy-files-temporarily-in-dockerfile/</link><pubDate>Thu, 24 Aug 2023 11:49:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/how-to-copy-files-temporarily-in-dockerfile/</guid><description><![CDATA[<p>It&rsquo;s very common to copy a local file into the container when building a Docker image. In general, we use the <code>COPY</code> command. But it creates a new layer and increases the final image size. If this is a temporary file and we don&rsquo;t want users to waste their storage space, how can we remove it? Here are some approaches.</p>
<h2 id="download-the-file-dynamically">Download the File Dynamically</h2>
<p>If the file can be downloaded from a URL, or you can create a local HTTP server to share the file, you can download the file, use it and delete it in one <code>RUN</code> command. For example:</p>]]></description></item><item><title>Memory Leak in Python multiprocessing.Pool</title><link>https://fromkk.com/posts/memory-leak-in-python-multiprocessing-dot-pool/</link><pubDate>Wed, 16 Mar 2022 21:04:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/memory-leak-in-python-multiprocessing-dot-pool/</guid><description><![CDATA[<p>There was a long-standing memory leak in our Django app, and I fixed it recently. As time went by, the memory usage of the app kept growing, and so did the CPU usage.</p>
<p>After some research, I figured out the cause. Some views did not close <code>multiprocessing.Pool</code> after using it. The problem disappears when I use <code>Pool</code> in a <code>with</code> statement.</p>
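<p>A minimal sketch of the <code>with</code> form is shown here; it is not the code from the Django app. The helper name <code>abs_all</code> is hypothetical, and it defaults to <code>ThreadPool</code> only so the snippet runs without a <code>__main__</code> guard. Passing <code>multiprocessing.Pool</code> as <code>pool_cls</code> uses the same interface.</p>
<pre tabindex="0"><code class="language-python" data-lang="python">from multiprocessing.pool import ThreadPool


def abs_all(values, pool_cls=ThreadPool, workers=2):
    """Apply abs() to every value using a pool that is cleaned up on exit."""
    # Leaving the with block calls pool.terminate(), so workers are not
    # leaked even if map() raises.
    with pool_cls(processes=workers) as pool:
        return pool.map(abs, values)


if __name__ == "__main__":
    print(abs_all([-1, 2, -3]))  # [1, 2, 3]</code></pre>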
<p>But I was still interested in it and wrote some test code. The script was run in Python 3.6.8 and produces a similar result when using <code>multiprocessing.pool.ThreadPool</code>.</p>]]></description></item><item><title>Emacs Chinese-related Settings</title><link>https://fromkk.com/posts/emacs-chinese-related-settings/</link><pubDate>Thu, 17 Feb 2022 01:27:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/emacs-chinese-related-settings/</guid><description><![CDATA[<h2 id="auto-switch-input-method-in-evil">Auto Switch Input Method in Evil</h2>
<p>This setting makes it possible to switch input method based on the context of cursor when entering insert mode.</p>
<h3 id="sis">sis</h3>
<p>I&rsquo;m using <code>sis</code> package with this configuration. You may need to install <code>macism</code> if you&rsquo;re not using <code>railwaycat/emacsmacport</code>. More settings can be found in <a href="https://github.com/laishulu/emacs-smart-input-source" target="_blank" rel="noopener noreffer ">emacs-smart-input-source</a>.</p>
<div class="code-block code-line-numbers open" style="counter-reset: code-block 0">
    <div class="code-header language-elisp">
        <span class="code-title"><i class="arrow fas fa-angle-right" aria-hidden="true"></i></span>
        <span class="ellipses"><i class="fas fa-ellipsis-h" aria-hidden="true"></i></span>
        <span class="copy" title="Copy to clipboard"><i class="far fa-copy" aria-hidden="true"></i></span>
    </div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-elisp" data-lang="elisp"><span class="line"><span class="cl"><span class="p">(</span><span class="nv">sis-ism-lazyman-config</span>
</span></span><span class="line"><span class="cl">  <span class="s">&#34;com.apple.keylayout.US&#34;</span>
</span></span><span class="line"><span class="cl">  <span class="s">&#34;com.apple.inputmethod.SCIM.ITABC&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="p">(</span><span class="nv">sis-global-cursor-color-mode</span> <span class="no">t</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">(</span><span class="nv">sis-global-respect-mode</span> <span class="no">t</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">(</span><span class="nv">sis-global-context-mode</span> <span class="no">t</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">(</span><span class="nv">sis-global-inline-mode</span> <span class="no">t</span><span class="p">)</span></span></span></code></pre></div></div>
<h3 id="fcitx">fcitx</h3>
<p>You can also install <a href="https://github.com/xcodebuild/fcitx-remote-for-osx" target="_blank" rel="noopener noreffer ">fcitx-remote-for-osx</a> and use <code>cute-jumper/fcitx.el</code> to do so. As <code>homebrew</code> no longer supports some build options, you need to follow the install instructions in the GitHub repository to build <code>fcitx</code>.</p>]]></description></item><item><title>QNAP TS-453Dmini Review</title><link>https://fromkk.com/posts/qnap-ts-453dmini-review/</link><pubDate>Wed, 19 Jan 2022 00:17:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/qnap-ts-453dmini-review/</guid><description><![CDATA[<p>My first NAS was a Synology DS120j, which is an ARM-based entry-level product. It&rsquo;s okay for downloading and backup, but not powerful enough for running Docker and virtual machines.</p>
<p>So I bought this NAS last month, and I&rsquo;m satisfied with it. Here are the advantages and disadvantages.</p>
<h2 id="advantages">Advantages</h2>
<ol>
<li>
<p>High performance.</p>
<p>It is equipped with a J4125 quad-core 2.0 GHz processor, 8G RAM, two 2.5G ports and 4 bays. Here is the <a href="https://www.qnap.com/zh-cn/product/ts-453dmini/specs/hardware" target="_blank" rel="noopener noreffer ">spec</a>. Although the J4125 is not the fastest CPU in 2022 (the newer model comes with an N5105), it is still able to run several Docker containers together, and I can even run Synology and Windows 10 inside the built-in <code>Virtualization Station</code>.</p>]]></description></item><item><title>Internet Account Keeps Coming Back after deletion on MacOS</title><link>https://fromkk.com/posts/internet-account-keeps-coming-back-after-deletion-on-macos/</link><pubDate>Sat, 08 Jan 2022 00:09:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/internet-account-keeps-coming-back-after-deletion-on-macos/</guid><description><![CDATA[<p>Today I tried to delete an inactive Internet account in System Preferences. It was deleted successfully but came back again after 20 seconds. This drove me nuts.</p>
<p>I tried these methods, but none of them worked.</p>
<ul>
<li>Boot in <a href="https://support.apple.com/en-us/HT201262" target="_blank" rel="noopener noreffer ">safe mode</a>, delete account.</li>
<li>Delete record in <code>ZACCOUNT</code> table in <code>~/Library/Accounts/Accounts4.sqlite</code>.</li>
<li>Delete related items in Keychain Access app.</li>
</ul>
<p>Later, <code>RedHatDude</code>&rsquo;s answer gave me a clue: it looks like an iCloud sync problem. I tried deleting the account on my 3 MacBooks at the same time. Thank goodness! It did not show up again.</p>]]></description></item><item><title>How to disable auto strip in Charfield in Django</title><link>https://fromkk.com/posts/how-to-disable-auto-strip-in-charfield-in-django/</link><pubDate>Sun, 19 Dec 2021 21:20:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/how-to-disable-auto-strip-in-charfield-in-django/</guid><description><![CDATA[<p>In Django, when you edit a field on the admin page or post data to forms, the leading and trailing whitespace in <code>CharField</code> and <code>TextField</code> is removed.</p>
<p>The reason is the <code>strip=True</code> parameter in <code>forms.CharField</code>, which was added in Django 1.9. You can see the discussion in <a href="https://code.djangoproject.com/ticket/4960" target="_blank" rel="noopener noreffer ">Django ticket #4960</a> and here is the <a href="https://github.com/django/django/blob/4ce59f602ed28320caf3035212cb4d1c5430da2b/django/forms/fields.py#L211" target="_blank" rel="noopener noreffer ">source code</a>. <code>models.CharField</code> and <code>models.TextField</code> use <code>formfield()</code> to create the form field that interacts with the user, so both of them eventually create a <code>forms.CharField</code>.</p>]]></description></item><item><title>Using JSONField before Django 3.1</title><link>https://fromkk.com/posts/using-jsonfield-before-django-3-dot-1/</link><pubDate>Sat, 11 Sep 2021 21:12:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/using-jsonfield-before-django-3-dot-1/</guid><description><![CDATA[<p>In Django 3.1, Django supports saving Python data into the database as JSON-encoded data, and it is also possible to query on field values in a JSONField. The detailed usage can be found <a href="https://docs.djangoproject.com/en/3.2/topics/db/queries/#querying-jsonfield" target="_blank" rel="noopener noreffer ">here</a>. If you are using an older version and want to try this feature, there are many packages that port it; I recommend <a href="https://github.com/laymonage/django-jsonfield-backport" target="_blank" rel="noopener noreffer ">django-jsonfield-backport</a>.</p>
<h2 id="django-jsonfield-backport">django-jsonfield-backport</h2>
<p>This package saves data as JSON in the database and also supports JSON queries. If your database meets the requirements (MySQL &gt; 5.7, PG &gt; 9.5, MariaDB &gt; 10.2 or SQLite &gt; 3.9 with <a href="https://docs.djangoproject.com/en/3.1/ref/databases/#sqlite-json1" target="_blank" rel="noopener noreffer ">JSON1</a> extension), you can use JSONField like Django&rsquo;s native implementation.</p>]]></description></item><item><title>Dynamic Allocate Executors when Executing Jobs in Spark</title><link>https://fromkk.com/posts/dynamic-allocate-executors-when-executing-jobs-in-spark/</link><pubDate>Sun, 18 Jul 2021 16:52:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/dynamic-allocate-executors-when-executing-jobs-in-spark/</guid><description>&lt;p>I wrote a Spark program to process logs. The volume of logs changes over time. To ensure logs can be processed promptly, the number of executors is sized for the peak number of logs per minute. As a consequence, CPU usage on the executors is low most of the time. To reduce the wasted resources, I looked for a way to scale executors while the program is running.&lt;/p>
&lt;p>As shown below, the maximum number of logs per minute can be a dozen times greater than the minimum number in one day.&lt;/p></description></item><item><title>Improve Kafka throughput</title><link>https://fromkk.com/posts/improve-kafka-throughput/</link><pubDate>Fri, 28 May 2021 00:57:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/improve-kafka-throughput/</guid><description><![CDATA[<p>Kafka is a high-performance and scalable messaging system. Sometimes, when handling big data, the default configuration may limit the maximum performance. In this article, I&rsquo;ll explain how messages are generated and saved in Kafka, and how to improve performance by changing the configuration.</p>
<h2 id="kafka-internals">Kafka Internals</h2>
<h3 id="how-does-producer-send-messages">How does Producer Send Messages?</h3>
<p>In short, messages are assembled into batches (named <code>RecordBatch</code>) and sent to the broker.</p>
<p>The producer manages some internal queues, and each queue contains <code>RecordBatch</code>es that will be sent to one broker. When the <code>send</code> method is called, the producer looks into the internal queue and tries to append the message to a <code>RecordBatch</code> that is smaller than <code>batch.size</code> (default value is 16KB), or creates a new <code>RecordBatch</code>.</p>]]></description></item><item><title>Fix Error: Cask 'java' is unavailable in Homebrew</title><link>https://fromkk.com/posts/fix-error-cask-java-is-unavailable-in-homebrew/</link><pubDate>Sun, 07 Mar 2021 00:10:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/fix-error-cask-java-is-unavailable-in-homebrew/</guid><description><![CDATA[<p>After updating brew to the latest version, any <code>cask</code> related command, such as <code>brew list --cask</code>, always outputs <code>Error: Cask 'java' is unavailable: No Cask with this name exists.</code>. However, the plain <code>brew</code> command still works.</p>
<p>After doing some research, I found <a href="https://github.com/Homebrew/homebrew-cask/pull/72284" target="_blank" rel="noopener noreffer ">Java has been moved to homebrew/core</a>. This makes sense now. I installed Java via cask, but it&rsquo;s no longer available there, so cask throws this error. If I uninstall Java from cask, the error should disappear.</p>]]></description></item><item><title>Timezone in JVM</title><link>https://fromkk.com/posts/timezone-in-jvm/</link><pubDate>Sun, 18 Oct 2020 23:49:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/timezone-in-jvm/</guid><description><![CDATA[<p>I wrote some Scala code to get the current time. However, the output is different on the development server and in Docker.</p>
<div class="code-block code-line-numbers open" style="counter-reset: code-block 0">
    <div class="code-header language-scala">
        <span class="code-title"><i class="arrow fas fa-angle-right" aria-hidden="true"></i></span>
        <span class="ellipses"><i class="fas fa-ellipsis-h" aria-hidden="true"></i></span>
        <span class="copy" title="Copy to clipboard"><i class="far fa-copy" aria-hidden="true"></i></span>
    </div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-scala" data-lang="scala"><span class="line"><span class="cl"><span class="k">import</span> <span class="nn">java.util.Calendar</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">println</span><span class="o">(</span><span class="nc">Calendar</span><span class="o">.</span><span class="n">getInstance</span><span class="o">().</span><span class="n">getTime</span><span class="o">)</span></span></span></code></pre></div></div>
<p>On my development server, it outputs <code>Sun Oct 18 18:01:01 CST 2020</code>, but in Docker, it prints a UTC time.</p>
<p>I guessed it was related to the timezone setting and did some research. Here is the result.</p>]]></description></item><item><title>Using cibuildwheel to Create Python Wheels</title><link>https://fromkk.com/posts/using-cibuildwheel-to-create-python-wheels/</link><pubDate>Wed, 29 Jul 2020 22:53:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/using-cibuildwheel-to-create-python-wheels/</guid><description><![CDATA[<p>Have you ever tried to install <code>MySQL-python</code>? It contains C code, which needs to be compiled while installing the package. You have to follow the steps in this article: <a href="https://ruddra.com/install-mysqlclient-macos/" target="_blank" rel="noopener noreffer ">Install MySQL and MySQLClient(Python) in MacOS</a>. Things get worse if you are using Windows.</p>
<p>Luckily, a new distribution format, <strong>Wheel</strong>, has been published in <a href="https://www.python.org/dev/peps/pep-0427/" target="_blank" rel="noopener noreffer ">PEP 427</a>.</p>
<blockquote>
<p>The wheel binary package format frees installers from having to know about the build system, saves time by amortizing compile time over many installations, and removes the need to install a build system in the target environment.</p>]]></description></item><item><title>Retrieve Large Dataset in Elasticsearch</title><link>https://fromkk.com/posts/retrieve-large-dataset-in-elasticsearch/</link><pubDate>Sun, 21 Jun 2020 20:33:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/retrieve-large-dataset-in-elasticsearch/</guid><description><![CDATA[<p>It&rsquo;s easy to get a small dataset from Elasticsearch by using <code>size</code> and <code>from</code>. However, it&rsquo;s impossible to retrieve a large dataset in the same way.</p>
<h2 id="deep-paging-problem">Deep Paging Problem</h2>
<p>As we know, Elasticsearch data is organised into indexes, each of which is a logical namespace, and the real data is stored in physical shards. Each shard is an instance of Lucene. There are two kinds of shards: primary shards and replica shards. Replica shards are copies of primary shards, kept in case nodes or shards fail. By distributing documents in an index across multiple shards, and distributing those shards across multiple nodes, Elasticsearch can ensure redundancy and scalability. By default, Elasticsearch creates <strong>5</strong> primary shards and one replica shard for each primary shard.</p>]]></description></item><item><title>Program Crash Caused by CPU Instruction</title><link>https://fromkk.com/posts/program-crash-caused-by-cpu-instruction/</link><pubDate>Sun, 17 May 2020 17:36:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/program-crash-caused-by-cpu-instruction/</guid><description><![CDATA[<p>Dealing with bugs is inevitable in a coding career. The main parts of coding are implementing new features, fixing bugs and improving performance. For me, there are two kinds of bugs that are difficult to tackle: those that are hard to reproduce, and those that occur in code you did not write.</p>
<p>Recently, I met a bug which has both of the features mentioned above. I wrote a Spark program to analyse logs and cluster them. Last week I updated the code to use Facebook&rsquo;s <a href="https://github.com/facebookresearch/faiss" target="_blank" rel="noopener noreffer ">faiss</a> library to accelerate the process of finding similar vectors. After I pushed the new code to Spark, the program crashed. I found this log on the Spark driver:</p>]]></description></item><item><title>C-m, RET and Return Key in Emacs</title><link>https://fromkk.com/posts/c-m-ret-and-return-key-in-emacs/</link><pubDate>Sat, 11 Apr 2020 21:23:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/c-m-ret-and-return-key-in-emacs/</guid><description><![CDATA[<p>I use Emacs to write my blog. After a recent update, I found that <code>M-RET</code> no longer behaves as the leader key in org mode, but as <code>org-meta-return</code>. Even stranger, in other modes it still behaves as the leader key, and <code>M-RET</code> also works in org mode in the terminal. In the GUI, pressing <code>C-M-m</code> can trigger the leader key.</p>
<p>So I opened this <a href="https://github.com/syl20bnr/spacemacs/issues/13374" target="_blank" rel="noopener noreffer ">issue</a>. With the help of the people there, the issue has been fixed. Here is the cause of the bug.</p>]]></description></item><item><title>Import custom package or module in PySpark</title><link>https://fromkk.com/posts/import-custom-package-or-module-in-pyspark/</link><pubDate>Thu, 02 Apr 2020 22:24:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/import-custom-package-or-module-in-pyspark/</guid><description><![CDATA[<p>First, zip all of the dependencies into a zip file like this. Then you can use one of the following methods to import it.</p>
<div class="code-block code-line-numbers open" style="counter-reset: code-block 0">
    <div class="code-header language-nil">
        <span class="code-title"><i class="arrow fas fa-angle-right" aria-hidden="true"></i></span>
        <span class="ellipses"><i class="fas fa-ellipsis-h" aria-hidden="true"></i></span>
        <span class="copy" title="Copy to clipboard"><i class="far fa-copy" aria-hidden="true"></i></span>
    </div><pre tabindex="0"><code class="language-nil" data-lang="nil">|-- kk.zip
|   |-- kk.py</code></pre></div>
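<p>An archive with this layout can be built with Python&rsquo;s standard <code>zipfile</code> module. The snippet below is only a minimal sketch: the temporary directory and the one-function <code>kk.py</code> are hypothetical stand-ins for your real dependencies.</p>
<pre tabindex="0"><code class="language-python" data-lang="python">import tempfile
import zipfile
from pathlib import Path

# Work in a throwaway directory so the sketch is self-contained.
work_dir = Path(tempfile.mkdtemp())

# Hypothetical module standing in for the real kk.py.
module_path = work_dir / "kk.py"
module_path.write_text("def hello():\n    return 'hello from kk'\n")

# Put kk.py at the root of kk.zip, matching the layout shown above.
zip_path = work_dir / "kk.zip"
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.write(module_path, arcname="kk.py")

# List the archive contents to confirm the layout.
with zipfile.ZipFile(zip_path) as archive:
    names = archive.namelist()
print(names)</code></pre>
<p>Keeping the module at the root of the archive matters, because Python resolves <code>import kk</code> relative to the top level of the zip file.</p>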
<h2 id="using-py-files-in-spark-submit">Using --py-files in spark-submit</h2>
<p>When submitting a Spark job, add the <code>--py-files=kk.zip</code> parameter. <code>kk.zip</code> will be distributed together with the main script file and placed on the Python module search path (<code>PYTHONPATH</code>), so the modules inside it can be imported.</p>]]></description></item><item><title>Time boundary in InfluxDB Group by Time Statement</title><link>https://fromkk.com/posts/time-boundary-in-influxdb-group-by-time-statement/</link><pubDate>Sun, 29 Mar 2020 22:30:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/time-boundary-in-influxdb-group-by-time-statement/</guid><description><![CDATA[<p>These days I use InfluxDB to store some time series data. I love these features it provides:</p>
<h4 id="high-performance">High Performance</h4>
<p>According to its <a href="https://docs.influxdata.com/influxdb/v1.7/guides/hardware_sizing/#single-node-or-cluster" target="_blank" rel="noopener noreffer ">hardware guide</a>, a single node can support more than 750k point writes per second, 100 moderate queries per second and 10M series cardinality.</p>
<h4 id="continuous-queries">Continuous Queries</h4>
<p>Simple aggregation can be done by InfluxDB&rsquo;s continuous queries.</p>
<h4 id="overwrite-duplicated-points">Overwrite Duplicated Points</h4>
<p>If you submit a new point with the same measurement, tag set and timestamp, the new data will overwrite the old one.</p>]]></description></item><item><title>C3 Linearization and Python MRO(Method Resolution Order)</title><link>https://fromkk.com/posts/c3-linearization-and-python-mro--method-resolution-order/</link><pubDate>Sat, 14 Mar 2020 17:37:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/c3-linearization-and-python-mro--method-resolution-order/</guid><description><![CDATA[<p>Python supports multiple inheritance: a class can be derived from more than one base class. If the specified attribute or method is not found in the current class, how is the search order among the superclasses decided? In simple scenarios, we know it goes left to right, bottom to top. But when the inheritance hierarchy becomes complicated, it&rsquo;s not easy to answer by intuition.</p>
<p>For instance, what&rsquo;s the search sequence of class M?</p>
<div class="code-block code-line-numbers open" style="counter-reset: code-block 0">
    <div class="code-header language-python">
        <span class="code-title"><i class="arrow fas fa-angle-right" aria-hidden="true"></i></span>
        <span class="ellipses"><i class="fas fa-ellipsis-h" aria-hidden="true"></i></span>
        <span class="copy" title="Copy to clipboard"><i class="far fa-copy" aria-hidden="true"></i></span>
    </div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">X</span><span class="p">:</span><span class="k">pass</span>
</span></span><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">Y</span><span class="p">:</span> <span class="k">pass</span>
</span></span><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">Z</span><span class="p">:</span><span class="k">pass</span>
</span></span><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">A</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">Y</span><span class="p">):</span><span class="k">pass</span>
</span></span><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">B</span><span class="p">(</span><span class="n">Y</span><span class="p">,</span><span class="n">Z</span><span class="p">):</span><span class="k">pass</span>
</span></span><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">M</span><span class="p">(</span><span class="n">B</span><span class="p">,</span><span class="n">A</span><span class="p">,</span><span class="n">Z</span><span class="p">):</span><span class="k">pass</span></span></span></code></pre></div></div>
<figure class="image-size-s">
</figure>
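<p>If you want to check your guess before reading on, Python exposes the order it computed through the <code>__mro__</code> attribute of the class. A minimal sketch that repeats the class definitions so it can run on its own (the variable name <code>mro_names</code> is just for illustration):</p>
<pre tabindex="0"><code class="language-python" data-lang="python">class X: pass
class Y: pass
class Z: pass
class A(X, Y): pass
class B(Y, Z): pass
class M(B, A, Z): pass

# __mro__ holds the linearization Python computed for the class.
mro_names = [cls.__name__ for cls in M.__mro__]
print(", ".join(mro_names))</code></pre>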

<p>The answer is: <code>M, B, A, X, Y, Z, object</code></p>]]></description></item><item><title>Difference between Value and Pointer variable in Defer in Go</title><link>https://fromkk.com/posts/difference-between-value-and-pointer-variable-in-defer-in-go/</link><pubDate>Thu, 19 Dec 2019 22:33:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/difference-between-value-and-pointer-variable-in-defer-in-go/</guid><description><![CDATA[<p><code>defer</code> is a useful statement for cleanup, as deferred calls are executed in LIFO order before the surrounding function returns. If you don&rsquo;t know how it works, the execution result may sometimes confuse you.</p>
<h2 id="how-it-works-and-why-value-or-pointer-receiver-matters">How it Works and Why Value or Pointer Receiver Matters</h2>
<p>I found an interesting piece of code on <a href="https://stackoverflow.com/questions/28893586/golang-defer-clarification" target="_blank" rel="noopener noreffer ">Stack Overflow</a>:</p>
<div class="code-block code-line-numbers open" style="counter-reset: code-block 0">
    <div class="code-header language-go">
        <span class="code-title"><i class="arrow fas fa-angle-right" aria-hidden="true"></i></span>
        <span class="ellipses"><i class="fas fa-ellipsis-h" aria-hidden="true"></i></span>
        <span class="copy" title="Copy to clipboard"><i class="far fa-copy" aria-hidden="true"></i></span>
    </div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="cl"><span class="kd">type</span> <span class="nx">X</span> <span class="kd">struct</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nx">S</span> <span class="kt">string</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kd">func</span> <span class="p">(</span><span class="nx">x</span> <span class="nx">X</span><span class="p">)</span> <span class="nf">Close</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">&#34;Value-Closing&#34;</span><span class="p">,</span> <span class="nx">x</span><span class="p">.</span><span class="nx">S</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kd">func</span> <span class="p">(</span><span class="nx">x</span> <span class="o">*</span><span class="nx">X</span><span class="p">)</span> <span class="nf">CloseP</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">&#34;Pointer-Closing&#34;</span><span class="p">,</span> <span class="nx">x</span><span class="p">.</span><span class="nx">S</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kd">func</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nx">x</span> <span class="o">:=</span> <span class="nx">X</span><span class="p">{</span><span class="s">&#34;Value-X First&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="k">defer</span> <span class="nx">x</span><span class="p">.</span><span class="nf">Close</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="nx">x</span> <span class="p">=</span> <span class="nx">X</span><span class="p">{</span><span class="s">&#34;Value-X Second&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="k">defer</span> <span class="nx">x</span><span class="p">.</span><span class="nf">Close</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="nx">x2</span> <span class="o">:=</span> <span class="nx">X</span><span class="p">{</span><span class="s">&#34;Value-X2 First&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="k">defer</span> <span class="nx">x2</span><span class="p">.</span><span class="nf">CloseP</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="nx">x2</span> <span class="p">=</span> <span class="nx">X</span><span class="p">{</span><span class="s">&#34;Value-X2 Second&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="k">defer</span> <span class="nx">x2</span><span class="p">.</span><span class="nf">CloseP</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="nx">xp</span> <span class="o">:=</span> <span class="o">&amp;</span><span class="nx">X</span><span class="p">{</span><span class="s">&#34;Pointer-X First&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="k">defer</span> <span class="nx">xp</span><span class="p">.</span><span class="nf">Close</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="nx">xp</span> <span class="p">=</span> <span class="o">&amp;</span><span class="nx">X</span><span class="p">{</span><span class="s">&#34;Pointer-X Second&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="k">defer</span> <span class="nx">xp</span><span class="p">.</span><span class="nf">Close</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="nx">xp2</span> <span class="o">:=</span> <span class="o">&amp;</span><span class="nx">X</span><span class="p">{</span><span class="s">&#34;Pointer-X2 First&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="k">defer</span> <span class="nx">xp2</span><span class="p">.</span><span class="nf">CloseP</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="nx">xp2</span> <span class="p">=</span> <span class="o">&amp;</span><span class="nx">X</span><span class="p">{</span><span class="s">&#34;Pointer-X2 Second&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="k">defer</span> <span class="nx">xp2</span><span class="p">.</span><span class="nf">CloseP</span><span class="p">()</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></div></div>
<p>The output is:</p>]]></description></item><item><title>Near-duplicate with SimHash</title><link>https://fromkk.com/posts/near-duplicate-with-simhash/</link><pubDate>Wed, 04 Dec 2019 00:16:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/near-duplicate-with-simhash/</guid><description><![CDATA[<p>Before talking about <strong>SimHash</strong>, let&rsquo;s review some other methods which can also identify duplication.</p>
<h2 id="longest-common-subsequence--lcs">Longest Common Subsequence(LCS)</h2>
<p>This is the algorithm used by the <code>diff</code> command. It also corresponds to the <strong>edit distance</strong> with insertion and deletion as the only two edit operations.</p>
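<p>To make the idea concrete, here is a minimal sketch of the textbook dynamic-programming solution. It is not how <code>diff</code> itself is implemented, and the function names <code>lcs_length</code> and <code>indel_distance</code> are only illustrative. The two nested loops fill a table with one cell per pair of prefixes.</p>
<pre tabindex="0"><code class="language-python" data-lang="python">def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # table[i][j] is the LCS length of a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]


def indel_distance(a, b):
    """Edit distance when only insertions and deletions are allowed."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


print(lcs_length("ABCBDAB", "BDCABA"))
print(indel_distance("ABCBDAB", "BDCABA"))</code></pre>
<p>Every character outside the common subsequence has to be either deleted from one string or inserted from the other, which is why the insert/delete distance follows directly from the LCS length.</p>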
<p>This works well for short strings. However, the algorithm&rsquo;s time complexity is \(O(m*n)\) for two strings of lengths \(m\) and \(n\), so it&rsquo;s not suitable for a large corpus. Also, if two documents consist of the same paragraphs in a different order, LCS treats them as different, which is not what we expect.</p>]]></description></item><item><title>Jaeger Code Structure</title><link>https://fromkk.com/posts/jaeger-code-structure/</link><pubDate>Sun, 22 Sep 2019 17:07:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/jaeger-code-structure/</guid><description><![CDATA[<p>Here is the main logic for jaeger agent and jaeger collector. (Based on <a href="https://github.com/jaegertracing/jaeger" target="_blank" rel="noopener noreffer ">jaeger</a> 1.13.1)</p>
<figure>
</figure>

<h2 id="jaeger-agent">Jaeger Agent</h2>
<p>Collects UDP packets on port 6831, converts them to <code>model.Span</code>, and sends them to the collector over gRPC.</p>
<h2 id="jaeger-collector">Jaeger Collector</h2>
<p>Processes spans received over gRPC, or packets from Zipkin (port 9411).</p>
<h2 id="jaeger-query">Jaeger Query</h2>
<p>Listens for gRPC and HTTP requests on port 16686.</p>]]></description></item><item><title>The Annotated The Annotated Transformer</title><link>https://fromkk.com/posts/the-annotated-the-annotated-transformer/</link><pubDate>Sun, 01 Sep 2019 16:00:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/the-annotated-the-annotated-transformer/</guid><description><![CDATA[<p>Thanks to the articles I list at the end of this post, I understand how the Transformer works. These posts are comprehensive, but there are some points that confused me.</p>
<p>First, this is the graph that is referenced by almost all of the posts related to the Transformer.</p>
<figure class="image-size-s">
</figure>

<p>Transformer consists of these parts: Input, Encoder*N, Output Input, Decoder*N, Output. I&rsquo;ll explain them step by step.</p>
<h2 id="input">Input</h2>
<p>Each input word is mapped to a 512-dimensional vector. Then the Positional Encoding (PE) is generated and added to the original embeddings.</p>]]></description></item><item><title>Different types of Attention</title><link>https://fromkk.com/posts/different-types-of-attention/</link><pubDate>Mon, 15 Jul 2019 00:16:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/different-types-of-attention/</guid><description><![CDATA[<p>\(s_t\) is the target hidden state and \(h_i\) are the source hidden states; each has shape <code>(n,1)</code>. \(c_t\) is the final context vector, and \(\alpha_{t,i}\) is the alignment score.</p>
<p>\[\begin{aligned}
c_t&amp;=\sum_{i=1}^n \alpha_{t,i}h_i \\
\alpha_{t,i}&amp;= \frac{\exp(score(s_t,h_i))}{\sum_{j=1}^n \exp(score(s_t,h_j))}
\end{aligned}\]</p>
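<p>The two formulas above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the score function is taken to be a simple dot product (the formulas leave <code>score</code> unspecified), vectors are flat lists rather than <code>(n,1)</code> columns, and the names <code>dot_score</code> and <code>attention</code> are made up for this example.</p>
<pre tabindex="0"><code class="language-python" data-lang="python">import math


def dot_score(s_t, h_i):
    # Assumed scoring function: plain dot product of the two vectors.
    return sum(a * b for a, b in zip(s_t, h_i))


def attention(s_t, source_states):
    """Return (alignment weights, context vector) for one target state."""
    scores = [dot_score(s_t, h_i) for h_i in source_states]
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: weighted sum of the source hidden states.
    dim = len(source_states[0])
    context = [
        sum(w * h_i[k] for w, h_i in zip(weights, source_states))
        for k in range(dim)
    ]
    return weights, context


source_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
s_t = [1.0, 0.0]
weights, context = attention(s_t, source_states)
print(weights, context)</code></pre>
<p>The weights always sum to 1, and source states that score higher against the target state contribute more to the context vector.</p>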
<h2 id="global--soft--vs-local--hard">Global(Soft) VS Local(Hard)</h2>
<p>Global attention takes all source hidden states into account, while local attention only uses part of the source hidden states.</p>
<h2 id="content-based-vs-location-based">Content-based VS Location-based</h2>
<p>Content-based attention uses both source and target hidden states, but location-based attention only uses the target hidden state.</p>]]></description></item><item><title>Torchtext snippets</title><link>https://fromkk.com/posts/torchtext-snippets/</link><pubDate>Mon, 01 Jul 2019 21:28:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/torchtext-snippets/</guid><description><![CDATA[<h2 id="load-separate-files">Load separate files</h2>
<p>The <code>data.Field</code> parameters are listed <a href="https://torchtext.readthedocs.io/en/latest/data.html#torchtext.data.Field" target="_blank" rel="noopener noreffer ">here</a>.</p>
<p>When calling <code>build_vocab</code>, torchtext will add <code>&lt;unk&gt;</code> to the vocabulary list. Set <code>unk_token=None</code> if you want to remove it. If <code>sequential=True</code> (default), it will also add <code>&lt;pad&gt;</code> to the vocab. By default, <code>&lt;unk&gt;</code> and <code>&lt;pad&gt;</code> are added at the beginning of the vocabulary list.</p>
<p><code>LabelField</code> is similar to <code>Field</code>, but it sets <code>sequential=False</code>, <code>unk_token=None</code> and <code>is_target=True</code>.</p>
<div class="code-block code-line-numbers open" style="counter-reset: code-block 0">
    <div class="code-header language-python">
        <span class="code-title"><i class="arrow fas fa-angle-right" aria-hidden="true"></i></span>
        <span class="ellipses"><i class="fas fa-ellipsis-h" aria-hidden="true"></i></span>
        <span class="copy" title="Copy to clipboard"><i class="far fa-copy" aria-hidden="true"></i></span>
    </div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">INPUT</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">Field</span><span class="p">(</span><span class="n">lower</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">batch_first</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">TAG</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">LabelField</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">train</span><span class="p">,</span> <span class="n">val</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">TabularDataset</span><span class="o">.</span><span class="n">splits</span><span class="p">(</span><span class="n">path</span><span class="o">=</span><span class="n">base_dir</span><span class="o">.</span><span class="n">as_posix</span><span class="p">(),</span> <span class="n">train</span><span class="o">=</span><span class="s1">&#39;train_data.csv&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                                                <span class="n">validation</span><span class="o">=</span><span class="s1">&#39;val_data.csv&#39;</span><span class="p">,</span> <span class="n">test</span><span class="o">=</span><span class="s1">&#39;test_data.csv&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                                                <span class="nb">format</span><span class="o">=</span><span class="s1">&#39;tsv&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                                                <span class="n">fields</span><span class="o">=</span><span class="p">[(</span><span class="kc">None</span><span class="p">,</span> <span class="kc">None</span><span class="p">),</span> <span class="p">(</span><span class="s1">&#39;input&#39;</span><span class="p">,</span> <span class="n">INPUT</span><span class="p">),</span> <span class="p">(</span><span class="s1">&#39;tag&#39;</span><span class="p">,</span> <span class="n">TAG</span><span class="p">)])</span></span></span></code></pre></div></div>
<h2 id="load-single-file">Load single file</h2>
<div class="code-block code-line-numbers open" style="counter-reset: code-block 0">
    <div class="code-header language-python">
        <span class="code-title"><i class="arrow fas fa-angle-right" aria-hidden="true"></i></span>
        <span class="ellipses"><i class="fas fa-ellipsis-h" aria-hidden="true"></i></span>
        <span class="copy" title="Copy to clipboard"><i class="far fa-copy" aria-hidden="true"></i></span>
    </div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">all_data</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">TabularDataset</span><span class="p">(</span><span class="n">path</span><span class="o">=</span><span class="n">base_dir</span> <span class="o">/</span> <span class="s1">&#39;gossip_train_data.csv&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                               <span class="nb">format</span><span class="o">=</span><span class="s1">&#39;tsv&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                               <span class="n">fields</span><span class="o">=</span><span class="p">[(</span><span class="s1">&#39;text&#39;</span><span class="p">,</span> <span class="n">TEXT</span><span class="p">),</span> <span class="p">(</span><span class="s1">&#39;category&#39;</span><span class="p">,</span> <span class="n">CATEGORY</span><span class="p">)])</span>
</span></span><span class="line"><span class="cl"><span class="n">train</span><span class="p">,</span> <span class="n">val</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">all_data</span><span class="o">.</span><span class="n">split</span><span class="p">([</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">])</span></span></span></code></pre></div></div>
<h2 id="create-iterator">Create iterator</h2>
<div class="code-block code-line-numbers open" style="counter-reset: code-block 0">
    <div class="code-header language-python">
        <span class="code-title"><i class="arrow fas fa-angle-right" aria-hidden="true"></i></span>
        <span class="ellipses"><i class="fas fa-ellipsis-h" aria-hidden="true"></i></span>
        <span class="copy" title="Copy to clipboard"><i class="far fa-copy" aria-hidden="true"></i></span>
    </div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">train_iter</span><span class="p">,</span> <span class="n">val_iter</span><span class="p">,</span> <span class="n">test_iter</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">BucketIterator</span><span class="o">.</span><span class="n">splits</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="p">(</span><span class="n">train</span><span class="p">,</span> <span class="n">val</span><span class="p">,</span> <span class="n">test</span><span class="p">),</span> <span class="n">batch_sizes</span><span class="o">=</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="mi">256</span><span class="p">,</span> <span class="mi">256</span><span class="p">),</span> <span class="n">shuffle</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">sort_key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">input</span><span class="p">))</span></span></span></code></pre></div></div>
<h2 id="load-pretrained-vector">Load pretrained vector</h2>
<div class="code-block code-line-numbers open" style="counter-reset: code-block 0">
    <div class="code-header language-python">
        <span class="code-title"><i class="arrow fas fa-angle-right" aria-hidden="true"></i></span>
        <span class="ellipses"><i class="fas fa-ellipsis-h" aria-hidden="true"></i></span>
        <span class="copy" title="Copy to clipboard"><i class="far fa-copy" aria-hidden="true"></i></span>
    </div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">vectors</span> <span class="o">=</span> <span class="n">Vectors</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;cc.zh.300.vec&#39;</span><span class="p">,</span> <span class="n">cache</span><span class="o">=</span><span class="s1">&#39;./&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">INPUT</span><span class="o">.</span><span class="n">build_vocab</span><span class="p">(</span><span class="n">train</span><span class="p">,</span> <span class="n">vectors</span><span class="o">=</span><span class="n">vectors</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">TAG</span><span class="o">.</span><span class="n">build_vocab</span><span class="p">(</span><span class="n">train</span><span class="p">,</span> <span class="n">val</span><span class="p">,</span> <span class="n">test</span><span class="p">)</span></span></span></code></pre></div></div>
<h2 id="check-vocab-sizes">Check vocab sizes</h2>
<p>You can view the vocabulary&rsquo;s index-to-token list with <code>vocab.itos</code>.</p>]]></description></item><item><title>Build Your Own Tiny Tiny RSS Service</title><link>https://fromkk.com/posts/build-your-own-tiny-tiny-rss-service/</link><pubDate>Mon, 10 Jun 2019 00:25:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/build-your-own-tiny-tiny-rss-service/</guid><description><![CDATA[<p>After Inoreader changed its free plan to limit subscriptions to 150, I began looking for an alternative. Finally, I found Tiny Tiny RSS. It has a nice website and a Fever API plugin, which is supported by most RSS reader apps, so you can read RSS on all of your devices.</p>
<p>This post will tell you how to deploy it on your server.</p>
<h2 id="prerequisite">Prerequisite</h2>
<p>You need to install <a href="https://docs.docker.com/install/" target="_blank" rel="noopener noreffer ">Docker</a> and <a href="https://docs.docker.com/compose/install/" target="_blank" rel="noopener noreffer ">Docker Compose</a> before using <code>docker-compose.yml</code></p>]]></description></item><item><title>Preview LaTeX in Org Mode with Emacs in MacOS</title><link>https://fromkk.com/posts/preview-latex-in-org-mode-with-emacs-in-macos/</link><pubDate>Sun, 12 May 2019 20:26:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/preview-latex-in-org-mode-with-emacs-in-macos/</guid><description><![CDATA[<h2 id="using-the-right-emacs-version">Using the right Emacs Version</h2>
<p>I failed to preview LaTeX with <code>emacs-plus</code>. If you have installed <code>d12frosted/emacs-plus</code>, uninstall it and use <code>emacs-mac</code>.</p>
<div class="code-block code-line-numbers open" style="counter-reset: code-block 0">
    <div class="code-header language-nil">
        <span class="code-title"><i class="arrow fas fa-angle-right" aria-hidden="true"></i></span>
        <span class="ellipses"><i class="fas fa-ellipsis-h" aria-hidden="true"></i></span>
        <span class="copy" title="Copy to clipboard"><i class="far fa-copy" aria-hidden="true"></i></span>
    </div><pre tabindex="0"><code class="language-nil" data-lang="nil">brew tap railwaycat/emacsmacport
brew install emacs-mac</code></pre></div>
<p>If you like the fancy Spacemacs icon, install it with cask: <code>brew install --cask emacs-mac-spacemacs-icon</code> (older Homebrew versions use <code>brew cask install</code>).</p>
<h2 id="install-tex">Install TeX</h2>
<ul>
<li>Download and install BasicTeX.pkg <a href="http://www.tug.org/mactex/morepackages.html" target="_blank" rel="noopener noreffer ">here</a>.</li>
<li>Add <code>/Library/TeX/texbin</code> to PATH.</li>
<li>Install <code>dvisvgm</code> by <code>sudo tlmgr update --self &amp;&amp; sudo tlmgr install dvisvgm collection-fontsrecommended</code></li>
</ul>
<h2 id="emacs-settings">Emacs settings</h2>
<ul>
<li>Add TeX related bin to path: <code>(setenv &quot;PATH&quot; (concat (getenv &quot;PATH&quot;) &quot;:/Library/TeX/texbin&quot;))</code></li>
<li>Tell Org Mode to create SVG images: <code>(setq org-preview-latex-default-process 'dvisvgm)</code> (<code>org-latex-create-formula-image-program</code> is the obsolete name of this option).</li>
</ul>
<p>Now you can see the rendered LaTeX equation by calling <code>org-preview-latex-fragment</code> or using the shortcut <code>,Tx</code>.</p>]]></description></item><item><title>Using Dueling DQN to Play Flappy Bird</title><link>https://fromkk.com/posts/using-ddqn-to-play-flappy-bird/</link><pubDate>Sun, 14 Apr 2019 17:10:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/using-ddqn-to-play-flappy-bird/</guid><description><![CDATA[<p>PyTorch provides a simple DQN implementation to solve the CartPole game. However, the code is incorrect: it diverges after training (this has been discussed <a href="https://discuss.pytorch.org/t/dqn-example-from-pytorch-diverged/4123" target="_blank" rel="noopener noreffer ">here</a>).</p>
<p>The official code&rsquo;s training result is shown below; its high score is about 50 and it finally diverges.</p>
<figure class="image-size-s">
</figure>

<p>There are many reasons that lead to divergence.</p>
<p>First, the tutorial uses the difference of two frames as input. Not only does this lose the cart&rsquo;s absolute position (which is useful, as the game terminates if the cart moves too far from the centre), but it also confuses the agent when the difference is the same but the states differ.</p>]]></description></item><item><title>Circular Import in Python</title><link>https://fromkk.com/posts/circular-import-in-python/</link><pubDate>Sun, 10 Mar 2019 10:59:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/circular-import-in-python/</guid><description><![CDATA[<p>Recently, I found a really good example of a Python circular import, and I&rsquo;d like to record it here.</p>
<p>Here is the code:</p>
<div class="code-block open" style="counter-reset: code-block -1">
    <div class="code-header language-python3">
        <span class="code-title"><i class="arrow fas fa-angle-right" aria-hidden="true"></i></span>
        <span class="ellipses"><i class="fas fa-ellipsis-h" aria-hidden="true"></i></span>
        <span class="copy" title="Copy to clipboard"><i class="far fa-copy" aria-hidden="true"></i></span>
    </div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python3" data-lang="python3"><span class="line"><span class="cl"><span class="c1"># X.py</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">X1</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="s2">&#34;x1&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">Y</span> <span class="kn">import</span> <span class="n">Y2</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">X2</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="s2">&#34;x2&#34;</span></span></span></code></pre></div></div>
<div class="code-block open" style="counter-reset: code-block -1">
    <div class="code-header language-python3">
        <span class="code-title"><i class="arrow fas fa-angle-right" aria-hidden="true"></i></span>
        <span class="ellipses"><i class="fas fa-ellipsis-h" aria-hidden="true"></i></span>
        <span class="copy" title="Copy to clipboard"><i class="far fa-copy" aria-hidden="true"></i></span>
    </div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python3" data-lang="python3"><span class="line"><span class="cl"><span class="c1"># Y.py</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">Y1</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="s2">&#34;y1&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">X</span> <span class="kn">import</span> <span class="n">X1</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">Y2</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="s2">&#34;y2&#34;</span></span></span></code></pre></div></div>
<p>Guess what will happen if you run <code>python X.py</code> and <code>python Y.py</code>?</p>]]></description></item><item><title>Python Dictionary Implementation</title><link>https://fromkk.com/posts/python-dictionary-implementation/</link><pubDate>Sun, 17 Feb 2019 21:48:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/python-dictionary-implementation/</guid><description><![CDATA[<h2 id="overview">Overview</h2>
<ol>
<li>CPython allocates memory to store the dictionary. The initial table size is 8, and each slot stores an entry as <code>&lt;hash,key,value&gt;</code> (the slot content has changed since Python 3.6).</li>
<li>When a new key is added, CPython uses <code>i = hash(key) &amp; mask</code>, where <code>mask=table_size-1</code>, to calculate which slot it should be placed in. If the slot is occupied, CPython uses a probing algorithm to find an empty slot for the new item.</li>
<li>When 2/3 of the table is full, the table is resized.</li>
<li>When getting an item from the dictionary, both the <code>hash</code> and the <code>key</code> must be equal.</li>
</ol>
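<p>The slot calculation and probing described above can be sketched in plain Python. This is only an illustration under simplifying assumptions: the helper names <code>slot_index</code> and <code>insert</code> are made up for this sketch, and linear probing stands in for the probe sequence that CPython really uses.</p>

```python
def slot_index(key, table_size=8):
    """Return the first slot to try for key (hypothetical helper)."""
    mask = table_size - 1  # table_size is a power of two, so mask is all ones
    return hash(key) & mask


def insert(table, key, value):
    """Store (hash, key, value) in the first free slot, probing linearly.

    Linear probing is a simplification; CPython uses a different probe sequence.
    Assumes the table still has at least one free slot.
    """
    mask = len(table) - 1
    i = hash(key) & mask
    while table[i] is not None and table[i][1] != key:
        i = (i + 1) & mask  # slot is taken by another key, try the next one
    table[i] = (hash(key), key, value)
    return i


table = [None] * 8
first = insert(table, 1, "a")   # hash(1) is 1, so the entry goes to slot 1
second = insert(table, 9, "b")  # 9 & 7 is also 1, so probing moves on to slot 2
print(first, second)
```

<p>Keys 1 and 9 collide in a table of size 8 because only the lowest three bits of the hash are used, which is why the second insert has to probe.</p>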
<h2 id="resizing">Resizing</h2>
<p>When the number of elements is below 50000, the table size increases by a factor of 4 based on the used slots. Otherwise, it increases by a factor of 2. The dictionary size is always \(2^{n}\).</p>]]></description></item><item><title>TextCNN with PyTorch and Torchtext on Colab</title><link>https://fromkk.com/posts/textcnn-with-pytorch-and-torchtext-on-colab/</link><pubDate>Mon, 03 Dec 2018 15:47:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/textcnn-with-pytorch-and-torchtext-on-colab/</guid><description><![CDATA[<p><a href="https://pytorch.org" target="_blank" rel="noopener noreffer ">PyTorch</a> is a really powerful framework for building machine learning models. Although some features are missing compared with TensorFlow (for example, an early stopping function and a History object for drawing plots), its code style is more intuitive.</p>
<p><a href="https://github.com/pytorch/text" target="_blank" rel="noopener noreffer ">Torchtext</a> is an NLP package, also made by the <code>pytorch</code> team. It provides a way to read, process and iterate over text.</p>
<p><a href="https://colab.research.google.com" target="_blank" rel="noopener noreffer ">Google Colab</a> is a Jupyter notebook environment hosted by Google, where you can use a free GPU or TPU to run your model.</p>]]></description></item><item><title>CSRF in Django</title><link>https://fromkk.com/posts/csrf-in-django/</link><pubDate>Wed, 07 Nov 2018 13:58:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/csrf-in-django/</guid><description><![CDATA[<p>CSRF (cross-site request forgery) is a way to forge a user&rsquo;s request to a target website. For example, a malicious website A has a button that sends a request to <a href="https://www.B.com/logout" target="_blank" rel="noopener noreffer ">www.B.com/logout</a> when clicked. When the user clicks this button, they are logged out of website B without realizing it. Logging out is not a big problem, but a malicious website can generate more dangerous requests, such as a money transfer.</p>
<h2 id="django-csrf-protection">Django CSRF protection</h2>
<p>Each web framework takes a different approach to CSRF protection. In Django, the validation process is as follows:</p>]]></description></item><item><title>Create Node Benchmark in Py2neo</title><link>https://fromkk.com/posts/create-node-benchmark-in-py2neo/</link><pubDate>Mon, 05 Nov 2018 15:55:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/create-node-benchmark-in-py2neo/</guid><description><![CDATA[<p>Recently, I&rsquo;ve been working on a neo4j project. I use <code>Py2neo</code> to interact with the graph database. Although <code>Py2neo</code> is very Pythonic and easy to use, its performance is really poor. Sometimes I have to write the Cypher statements myself when I can&rsquo;t bear the slow execution. Here is a small script I use to compare the performance of 4 different ways to insert nodes.</p>
<div class="code-block code-line-numbers" style="counter-reset: code-block 0">
    <div class="code-header language-python">
        <span class="code-title"><i class="arrow fas fa-angle-right" aria-hidden="true"></i></span>
        <span class="ellipses"><i class="fas fa-ellipsis-h" aria-hidden="true"></i></span>
        <span class="copy" title="Copy to clipboard"><i class="far fa-copy" aria-hidden="true"></i></span>
    </div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">time</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">graph_db</span> <span class="kn">import</span> <span class="n">graph</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">py2neo.data</span> <span class="kn">import</span> <span class="n">Node</span><span class="p">,</span> <span class="n">Subgraph</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">delete_label</span><span class="p">(</span><span class="n">label</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="n">graph</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="s1">&#39;MATCH (n:</span><span class="si">{}</span><span class="s1">) DETACH DELETE n&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">label</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">delete_all</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="s1">&#39;delete all&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">graph</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="s1">&#39;match (n) detach delete n&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">count_label</span><span class="p">(</span><span class="n">label</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="n">graph</span><span class="o">.</span><span class="n">nodes</span><span class="o">.</span><span class="k">match</span><span class="p">(</span><span class="n">label</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">bench_create1</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="s1">&#39;Using py2neo one by one&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">delete_label</span><span class="p">(</span><span class="s1">&#39;test&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">start</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="n">tx</span> <span class="o">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">begin</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">100000</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="n">n</span> <span class="o">=</span> <span class="n">Node</span><span class="p">(</span><span class="s1">&#39;test&#39;</span><span class="p">,</span> <span class="nb">id</span><span class="o">=</span><span class="n">i</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">tx</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">tx</span><span class="o">.</span><span class="n">commit</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">start</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="n">count_label</span><span class="p">(</span><span class="s1">&#39;test&#39;</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">    <span class="n">delete_label</span><span class="p">(</span><span class="s1">&#39;test&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">bench_create2</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="s1">&#39;Using cypher one by one&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">delete_label</span><span class="p">(</span><span class="s1">&#39;test&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">start</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="n">tx</span> <span class="o">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">begin</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">100000</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="n">tx</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="s1">&#39;create (n:test {id: $id})&#39;</span><span class="p">,</span> <span class="nb">id</span><span class="o">=</span><span class="n">i</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">i</span> <span class="ow">and</span> <span class="n">i</span> <span class="o">%</span> <span class="mi">1000</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="n">tx</span><span class="o">.</span><span class="n">commit</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">            <span class="n">tx</span> <span class="o">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">begin</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="n">tx</span><span class="o">.</span><span class="n">commit</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">start</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="n">count_label</span><span class="p">(</span><span class="s1">&#39;test&#39;</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">    <span class="n">delete_label</span><span class="p">(</span><span class="s1">&#39;test&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">bench_create3</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="s1">&#39;Using Subgraph&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">delete_label</span><span class="p">(</span><span class="s1">&#39;test&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">start</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="n">tx</span> <span class="o">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">begin</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="n">nodes</span> <span class="o">=</span> <span class="p">[]</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">100000</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="n">nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">Node</span><span class="p">(</span><span class="s1">&#39;test&#39;</span><span class="p">,</span> <span class="nb">id</span><span class="o">=</span><span class="n">i</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">    <span class="n">s</span> <span class="o">=</span> <span class="n">Subgraph</span><span class="p">(</span><span class="n">nodes</span><span class="o">=</span><span class="n">nodes</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">tx</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">tx</span><span class="o">.</span><span class="n">commit</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">start</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="n">count_label</span><span class="p">(</span><span class="s1">&#39;test&#39;</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">    <span class="n">delete_label</span><span class="p">(</span><span class="s1">&#39;test&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">bench_create4</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="s1">&#39;Using unwind&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">delete_label</span><span class="p">(</span><span class="s1">&#39;test&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">start</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="n">tx</span> <span class="o">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">begin</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="n">ids</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">100000</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">    <span class="n">tx</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="s1">&#39;unwind $ids as id create (n:test {id: id})&#39;</span><span class="p">,</span> <span class="n">ids</span><span class="o">=</span><span class="n">ids</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">tx</span><span class="o">.</span><span class="n">commit</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">start</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="n">count_label</span><span class="p">(</span><span class="s1">&#39;test&#39;</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">    <span class="n">delete_label</span><span class="p">(</span><span class="s1">&#39;test&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">bench_create</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="n">create_tests</span> <span class="o">=</span> <span class="p">[</span><span class="n">bench_create1</span><span class="p">,</span> <span class="n">bench_create2</span><span class="p">,</span> <span class="n">bench_create3</span><span class="p">,</span> <span class="n">bench_create4</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="s1">&#39;testing create&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">create_tests</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">i</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="n">bench_create</span><span class="p">()</span></span></span></code></pre></div></div>
<p>Apparently, using Cypher with the <code>unwind</code> keyword is the fastest way to batch-insert nodes.</p>]]></description></item><item><title>Deploy Nikola Org Mode on Travis</title><link>https://fromkk.com/posts/deploy-nikola-org-mode-on-travis/</link><pubDate>Sat, 03 Nov 2018 21:20:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/deploy-nikola-org-mode-on-travis/</guid><description><![CDATA[<p>Recently, I have been enjoying <code>Spacemacs</code>, so I decided to switch from Markdown to org files for writing my blog. After several attempts, I managed to get Travis to convert org files to HTML. Here are the steps.</p>
<h2 id="install-org-mode-plugin">Install Org Mode plugin</h2>
<p>First, install the Org Mode plugin on your computer by following the official guide: <a href="https://plugins.getnikola.com/v8/orgmode/" target="_blank" rel="noopener noreffer ">Nikola orgmode plugin</a>.</p>
<h2 id="edit-conf-dot-el">Edit <code>conf.el</code></h2>
<p><code>Org Mode</code> files are converted to HTML to be displayed on Nikola, and the Org Mode plugin calls Emacs to do this job. When I ran <code>nikola build</code>, it showed this message: <code>Please install htmlize from https://github.com/hniksic/emacs-htmlize</code>. I&rsquo;m using <code>Spacemacs</code>, where the <code>htmlize</code> package is already downloaded if the <code>org</code> layer is enabled, so I just need to add the htmlize folder to the load-path. Here is the code:</p>]]></description></item><item><title>Using Chinese Characters in Matplotlib</title><link>https://fromkk.com/posts/using-chinese-characters-in-matplotlib/</link><pubDate>Thu, 04 Oct 2018 15:53:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/using-chinese-characters-in-matplotlib/</guid><description><![CDATA[<p>After searching on Google, here is the easiest solution. It should also work for other languages:</p>
<div class="code-block code-line-numbers open" style="counter-reset: code-block 0">
    <div class="code-header language-python">
        <span class="code-title"><i class="arrow fas fa-angle-right" aria-hidden="true"></i></span>
        <span class="ellipses"><i class="fas fa-ellipsis-h" aria-hidden="true"></i></span>
        <span class="copy" title="Copy to clipboard"><i class="far fa-copy" aria-hidden="true"></i></span>
    </div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
</span></span><span class="line"><span class="cl"><span class="o">%</span><span class="n">matplotlib</span> <span class="n">inline</span>
</span></span><span class="line"><span class="cl"><span class="o">%</span><span class="n">config</span> <span class="n">InlineBackend</span><span class="o">.</span><span class="n">figure_format</span> <span class="o">=</span> <span class="s1">&#39;retina&#39;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">matplotlib.font_manager</span> <span class="k">as</span> <span class="nn">fm</span>
</span></span><span class="line"><span class="cl"><span class="n">f</span> <span class="o">=</span> <span class="s2">&#34;/System/Library/Fonts/PingFang.ttc&#34;</span>
</span></span><span class="line"><span class="cl"><span class="n">prop</span> <span class="o">=</span> <span class="n">fm</span><span class="o">.</span><span class="n">FontProperties</span><span class="p">(</span><span class="n">fname</span><span class="o">=</span><span class="n">f</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s2">&#34;你好&#34;</span><span class="p">,</span><span class="n">fontproperties</span><span class="o">=</span><span class="n">prop</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span></span></span></code></pre></div></div>
<p>Output:</p>
<figure class="image-size-s">
</figure>]]></description></item><item><title>LSTM and GRU</title><link>https://fromkk.com/posts/lstm-and-gru/</link><pubDate>Sun, 22 Apr 2018 14:39:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/lstm-and-gru/</guid><description><![CDATA[<h2 id="lstm">LSTM</h2>
<p>To avoid the vanishing and exploding gradient problems of the vanilla RNN, LSTM was proposed; it can remember information for longer periods of time.</p>
<p>Here is the structure of LSTM:</p>
<figure class="image-size-s">
</figure>

<p>The calculation procedure is:</p>
<p>\[\begin{aligned}
f_t&amp;=\sigma(W_f\cdot[h_{t-1},x_t]+b_f)\\
i_t&amp;=\sigma(W_i\cdot[h_{t-1},x_t]+b_i)\\
o_t&amp;=\sigma(W_o\cdot[h_{t-1},x_t]+b_o)\\
\tilde{C_t}&amp;=\tanh(W_C\cdot[h_{t-1},x_t]+b_C)\\
C_t&amp;=f_t\ast C_{t-1}+i_t\ast \tilde{C_t}\\
h_t&amp;=o_t \ast \tanh(C_t)
\end{aligned}\]</p>
<p>\(f_t\), \(i_t\) and \(o_t\) are the forget gate, input gate and output gate respectively. \(\tilde{C_t}\) is the new memory content. \(C_t\) is the cell state. \(h_t\) is the output.</p>]]></description></item><item><title>Models and Architectures in Word2vec</title><link>https://fromkk.com/posts/models-and-architechtures-in-word2vec/</link><pubDate>Fri, 05 Jan 2018 15:14:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/models-and-architechtures-in-word2vec/</guid><description><![CDATA[<p>Generally, <code>word2vec</code> is a language model that predicts the probability of a word based on its context. While building the model, it creates a word embedding for each word, and word embeddings are widely used in many NLP tasks.</p>
<h2 id="models">Models</h2>
<h3 id="cbow--continuous-bag-of-words">CBOW (Continuous Bag of Words)</h3>
<p>Use the context to predict the probability of the current word. (In the picture, each word is one-hot encoded, \(W_{V*N}\) is the word embedding, and \(W_{V*N}^{\prime}\), the output weight matrix of the hidden layer, is the same as \(\hat{\upsilon}\) in the following equations.)</p>]]></description></item><item><title>Semi-supervised text classification using doc2vec and label spreading</title><link>https://fromkk.com/posts/semi-supervised-text-classification-using-doc2vec-and-label-spreading/</link><pubDate>Sun, 10 Sep 2017 15:29:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/semi-supervised-text-classification-using-doc2vec-and-label-spreading/</guid><description><![CDATA[<p>Here is a simple way to classify text without much human effort and still get impressive performance.</p>
<p>It can be divided into two steps:</p>
<ol>
<li>Get training data by using keyword classification</li>
<li>Generate a more accurate classification model by using doc2vec and label spreading</li>
</ol>
<h2 id="keyword-based-classification">Keyword-based Classification</h2>
<p>Keyword-based classification is a simple but effective method. Extracting the target keywords is monotonous work, so I use this method to extract keyword candidates automatically.</p>]]></description></item><item><title>Parameters in doc2vec</title><link>https://fromkk.com/posts/parameters-in-dov2vec/</link><pubDate>Thu, 03 Aug 2017 15:20:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/parameters-in-dov2vec/</guid><description><![CDATA[<p>Here are some parameters of <code>gensim</code>&rsquo;s <code>doc2vec</code> class.</p>
<h3 id="window">window</h3>
<p>window is the maximum distance between the predicted word and context words used for prediction within a document. It will look behind and ahead.</p>
<p>In the <code>skip-gram</code> model, if the window size is 2, the training samples look like this (the blue word is the input word):</p>
<figure class="image-size-s">
</figure>

<h3 id="min-count">min_count</h3>
<p>If a word appears fewer times than this value, it will be skipped.</p>
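<p>A minimal sketch of this kind of filtering in plain Python, assuming each document is already a list of tokens. The function <code>filter_rare_words</code> is a hypothetical helper for illustration, not part of <code>gensim</code>, which applies the threshold internally when it builds the vocabulary.</p>

```python
from collections import Counter


def filter_rare_words(sentences, min_count):
    """Drop words whose total count over all sentences is below min_count."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return [[w for w in sentence if counts[w] >= min_count] for sentence in sentences]


docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "bird"]]
filtered = filter_rare_words(docs, min_count=2)
print(filtered)
```

<p>With <code>min_count=2</code>, only <code>the</code> and <code>sat</code> survive, because every other word appears once in the whole corpus.</p>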
<h3 id="sample">sample</h3>
<p>High-frequency words like <code>the</code> are useless for training. <code>sample</code> is a threshold for downsampling these high-frequency words. The probability of keeping the word \(w_i\) is:</p>]]></description></item><item><title>Brief Introduction of Label Propagation Algorithm</title><link>https://fromkk.com/posts/brief-introduction-of-label-propagation-algorithm/</link><pubDate>Sun, 16 Jul 2017 21:45:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/brief-introduction-of-label-propagation-algorithm/</guid><description><![CDATA[<p>As I said before, I&rsquo;m working on a text classification project. I use <code>doc2vec</code> to convert text into vectors, then I use LPA to classify the vectors.</p>
<p>LPA is a simple, effective semi-supervised algorithm. It can use the density of unlabeled data to find a hyperplane to split the data.</p>
<p>Here are the main steps of the algorithm:</p>
<ol>
<li>Let \((x_1,y_1)\dots(x_l,y_l)\) be labeled data, where \(Y_L = \{y_1\dots y_l\}\) are the class labels. Let \((x_{l+1},y_{l+1})\dots(x_{l+u},y_{l+u})\) be unlabeled data, where \(Y_U = \{y_{l+1}\dots y_{l+u}\}\) are unobserved; usually \(l \ll u\). Let \(X=\{x_1\dots x_{l+u}\}\) where \(x_i\in R^D\). The problem is to estimate \(Y_U\) from \(X\) and \(Y_L\).</li>
<li>Calculate the similarity of the data points. The simplest metric is Euclidean distance. Use a parameter \(\sigma\) to control the weights.</li>
</ol>
<p>\[w_{ij}= \exp\left(-\frac{d^2_{ij}}{\sigma^2}\right)=\exp\left(-\frac{\sum^D_{d=1}(x^d_i-x^d_j)^2}{\sigma^2}\right)\]</p>]]></description></item><item><title>Enable C Extension for gensim on Windows</title><link>https://fromkk.com/posts/enable-c-extension-for-gensim-on-windows/</link><pubDate>Sat, 10 Jun 2017 14:43:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/enable-c-extension-for-gensim-on-windows/</guid><description><![CDATA[<p>These days, I’m working on some text classification tasks, and I use <code>gensim</code>’s <code>doc2vec</code> function.</p>
<p>When using gensim, it shows this warning message:</p>
<p><code>C extension not loaded for Word2Vec, training will be slow.</code></p>
<p>I searched for this on the Internet and found that gensim has rewritten some parts of the code using <code>cython</code> rather than <code>numpy</code> to get better performance. A compiler is required to enable this feature.</p>
<p>I tried to install mingw and add it to the path, but it didn&rsquo;t work.</p>]]></description></item><item><title>Some Useful Shell Tools</title><link>https://fromkk.com/posts/some-useful-shell-tools/</link><pubDate>Sun, 07 May 2017 15:34:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/some-useful-shell-tools/</guid><description><![CDATA[<p>Here are some shell tools I use, which can boost your productivity. <a href="https://github.com/johnalanwoods/maintained-modern-unix" target="_blank" rel="noopener noreffer ">Modern-unix</a> is a great repo that lists lots of modern unix tools.</p>
<h2 id="prezto"><a href="https://github.com/sorin-ionescu/prezto" target="_blank" rel="noopener noreffer ">Prezto</a></h2>
<p>A zsh configuration framework. It provides auto-completion, prompt themes and lots of modules that work with other useful tools. I really love the <code>agnoster</code> theme.</p>
<figure class="image-size-s">
</figure>

<h2 id="fasd"><a href="https://github.com/clvv/fasd" target="_blank" rel="noopener noreffer ">Fasd</a></h2>
<p>Helps you navigate between folders and launch applications.</p>
<p>Here is the official usage example:</p>]]></description></item><item><title>Start</title><link>https://fromkk.com/posts/start/</link><pubDate>Tue, 18 Apr 2017 15:46:00 +0800</pubDate><author>bebound@gmail.com (KK)</author><guid>https://fromkk.com/posts/start/</guid><description>&lt;p>Over the years, I have read so many programmers’ blogs, which have helped me a lot. Now I think it’s time to start my own blog.&lt;/p>
&lt;p>I hope this will push me to review what I have learned, and it would be even better if someone else can benefit from it.&lt;/p></description></item></channel></rss>