Thursday, February 26, 2015

BeagleBone Black: A Simple Introduction (Part 3)

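This chapter covers partitioning, formatting, and mounting the SD card in detail. Why go this deep? The flash chips used in SD cards and eMMC rely on wear leveling, and current operating systems have no mature scheme for formatting and accessing flash, so the default format settings are not optimal for either access speed or lifetime. A good configuration can improve the SD card's read/write performance and extend its service life, so the SD card layout is worth planning carefully.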

Some background on flash:
http://en.wikipedia.org/wiki/Flash_memory
http://codecapsule.com/2014/02/12/coding-for-ssds-part-1-introduction-and-table-of-contents/
http://codecapsule.com/2014/02/12/coding-for-ssds-part-2-architecture-of-an-ssd-and-benchmarking/
http://codecapsule.com/2014/02/12/coding-for-ssds-part-3-pages-blocks-and-the-flash-translation-layer/
http://codecapsule.com/2014/02/12/coding-for-ssds-part-4-advanced-functionalities-and-internal-parallelism/
http://codecapsule.com/2014/02/12/coding-for-ssds-part-5-access-patterns-and-system-optimizations/
http://codecapsule.com/2014/02/12/coding-for-ssds-part-6-a-summary-what-every-programmer-should-know-about-solid-state-drives

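In a flash storage device, the internal architecture generally looks like the figure below:
An SD card package may contain N chips, each chip contains several planes, each plane contains several erase blocks, and each erase block contains several pages.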

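The page is the smallest unit that flash can read or write. What makes flash special is that it must be erased before it is written, and erasing works on a whole erase block, resetting every bit in it to 1. In other words, a read can fetch a single page directly, but overwriting even a single page requires erasing the whole block.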
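
To put numbers on the page versus erase block relationship, here is a minimal sketch. The 4 KiB page and 2 MiB erase block are illustrative assumptions, not properties of any particular card, and the function names are made up for this example. It only shows the worst case, where the controller does nothing to soften an in-place update.

```shell
#!/usr/bin/env bash
# Illustrative values only (assumptions): 4 KiB pages, 2 MiB erase blocks.
page_size=$((4 * 1024))
erase_block_size=$((2 * 1024 * 1024))

# pages_per_block PAGE_BYTES ERASE_BLOCK_BYTES
# Number of pages that share one erase block.
pages_per_block() {
  echo $(( $2 / $1 ))
}

# worst_case_amplification PAGE_BYTES ERASE_BLOCK_BYTES PAGES_CHANGED
# If a whole block has to be erased and rewritten to update PAGES_CHANGED
# pages, this is how many pages get written per page actually changed.
worst_case_amplification() {
  local per_block
  per_block=$(pages_per_block "$1" "$2")
  echo $(( per_block / $3 ))
}

echo "pages per erase block: $(pages_per_block "$page_size" "$erase_block_size")"
echo "worst-case amplification for a 1-page update: $(worst_case_amplification "$page_size" "$erase_block_size" 1)"
```

The larger the erase block is relative to the page, the more a small unaligned write can cost, which is why the rest of this chapter tries to find both sizes.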

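The figure also shows that parallel chips, and parallel planes on the same chip, can be read and written at the same time. The figure calls this combined read/write unit a clustered block. The idea is very close to a RAID disk array, and it comes up again later when explaining the ext4 parameters.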

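From these basics it follows that formatting needs the SD card's page size, erase block size, and plane layout in order to pick correct parameters. Tools such as fdisk default to a 512-byte unit, the standard of old hard disks, while modern hard disks have already moved to 4K sectors. The page and erase block sizes of SD cards and other flash media are not fixed, and official figures are hard to find.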

This is where the flashbench tool comes in.
Its principle and usage are described at
https://wiki.linaro.org/WorkingGroups/KernelArchived/Projects/FlashCardSurvey?highlight=%28FlashCardSurvey%29
http://lists.linaro.org/pipermail/flashbench-results/2012-June/000300.html
Download and install it from
https://github.com/bradfa/flashbench
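After fetching it with git clone, enter the directory and run sudo make to build it.
Once it is built it is ready to use. Reading the README carefully is recommended.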

My card, a SanDisk Ultra 32G microSDHC (SDSDQU-032G-AFFP-A), serves as the example. It is a 32G SD card.

First look at the device information for the eMMC and the SD card.
sudo udevadm info -a -n /dev/mmcblk1 shows the eMMC information:
ATTR{size}=="7667712"
ATTRS{preferred_erase_size}=="2097152"
ATTRS{erase_size}=="2097152"

sudo udevadm info -a -n /dev/mmcblk0 shows the SD card information:
ATTR{size}=="62333952"
ATTRS{preferred_erase_size}=="4194304"
ATTRS{fwrev}=="0x0"
ATTRS{hwrev}=="0x8"
ATTRS{oemid}=="0x5344"
ATTRS{manfid}=="0x000003"
ATTRS{serial}=="0x228e4f3e"
ATTRS{erase_size}=="512"
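
The size attributes above are counted in 512-byte sectors. A small sketch of the conversion, with helper names invented for this example (awk is used for the division because bash has no floating point):

```shell
#!/usr/bin/env bash
# sectors_to_bytes SECTORS
# ATTR{size} is expressed in 512-byte sectors.
sectors_to_bytes() {
  echo $(( $1 * 512 ))
}

# sectors_to_gib SECTORS
# Same capacity in GiB (2^30 bytes), printed with five decimals.
sectors_to_gib() {
  awk -v s="$1" 'BEGIN { printf "%.5f\n", s * 512 / (1024 * 1024 * 1024) }'
}

echo "eMMC:    $(sectors_to_bytes 7667712) bytes, $(sectors_to_gib 7667712) GiB"
echo "SD card: $(sectors_to_bytes 62333952) bytes, $(sectors_to_gib 62333952) GiB"
```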


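The eMMC reports an erase size of 2M, and its capacity is 7667712 * 512 bytes, about 3.65625 GiB.
Testing it with flashbench shows the jump in the time difference between 1048576 (1MB) and 2097152 (2MB), which matches the preferred_erase_size read above.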
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk1 --count=20
[sudo] password for beaglebone:
align 1073741824        pre 1.97ms      on 2.36ms       post 1.75ms     diff 502µs
align 536870912 pre 2.32ms      on 2.8ms        post 2.13ms     diff 570µs
align 268435456 pre 2.58ms      on 2.85ms       post 2.2ms      diff 464µs
align 134217728 pre 1.95ms      on 2.5ms        post 2.31ms     diff 368µs
align 67108864  pre 2.04ms      on 2.54ms       post 2.15ms     diff 441µs
align 33554432  pre 2.13ms      on 2.62ms       post 2.15ms     diff 480µs
align 16777216  pre 2.1ms       on 2.63ms       post 2.24ms     diff 462µs
align 8388608   pre 2.14ms      on 2.64ms       post 2.17ms     diff 481µs
align 4194304   pre 2.1ms       on 2.63ms       post 2.2ms      diff 474µs
align 2097152   pre 2.07ms      on 2.56ms       post 2.12ms     diff 464µs
align 1048576   pre 2.11ms      on 2.23ms       post 2.25ms     diff 52.5µs
align 524288    pre 2.15ms      on 2.18ms       post 2.21ms     diff -6226ns
align 262144    pre 2.16ms      on 2.18ms       post 2.2ms      diff -1092ns
align 131072    pre 2.17ms      on 2.17ms       post 2.2ms      diff -9687ns
align 65536     pre 2.15ms      on 2.18ms       post 2.21ms     diff 611ns
align 32768     pre 2.16ms      on 2.18ms       post 2.22ms     diff -5497ns
This still does not reveal the page size, so run the test again with the read unit set to 1024 bytes:

beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk1 --blocksize=1024 --count=20
align 1073741824        pre 1.22ms      on 1.53ms       post 875µs      diff 485µs
align 536870912 pre 1.21ms      on 1.57ms       post 947µs      diff 489µs
align 268435456 pre 1.25ms      on 1.72ms       post 1.13ms     diff 532µs
align 134217728 pre 1.18ms      on 1.64ms       post 1.1ms      diff 495µs
align 67108864  pre 1.19ms      on 1.65ms       post 1.09ms     diff 514µs
align 33554432  pre 1.25ms      on 1.74ms       post 1.14ms     diff 545µs
align 16777216  pre 1.24ms      on 1.7ms        post 1.15ms     diff 504µs
align 8388608   pre 1.28ms      on 1.77ms       post 1.16ms     diff 546µs
align 4194304   pre 1.24ms      on 1.72ms       post 1.16ms     diff 520µs
align 2097152   pre 1.23ms      on 1.7ms        post 1.16ms     diff 506µs
align 1048576   pre 1.14ms      on 1.28ms       post 1.16ms     diff 127µs
align 524288    pre 1.13ms      on 1.26ms       post 1.15ms     diff 122µs
align 262144    pre 1.14ms      on 1.26ms       post 1.15ms     diff 117µs
align 131072    pre 1.14ms      on 1.26ms       post 1.15ms     diff 116µs
align 65536     pre 1.13ms      on 1.26ms       post 1.15ms     diff 117µs
align 32768     pre 1.14ms      on 1.26ms       post 1.15ms     diff 116µs
align 16384     pre 1.14ms      on 1.26ms       post 1.14ms     diff 116µs
align 8192      pre 1.14ms      on 1.26ms       post 1.15ms     diff 117µs
align 4096      pre 1.14ms      on 1.26ms       post 1.14ms     diff 120µs
align 2048      pre 1.14ms      on 1.2ms        post 1.14ms     diff 53.7µs
The time difference jumps between 2048 and 4096, which is caused by the minimum read unit being 4K, so this eMMC's page size is 4K.

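The blocksize parameter is a guess at the page size. If the guess matches the real page size, the time differences for alignments below the erase block size should be small and consistent, because each measurement reads two N*page chunks inside a single erase block.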
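
The tables from flashbench -a are read by looking for the alignment at which the diff column collapses. That reading can be sketched as a small filter. This is only a heuristic written for this post, not part of flashbench: the function name and the "drops to less than one third" threshold are assumptions, and noisy runs (such as the 512K outlier discussed later) still need a human eye.

```shell
#!/usr/bin/env bash
# guess_erase_block
# Reads `flashbench -a` output on stdin (largest alignment first) and prints
# the last alignment before the diff column falls below 1/3 of its previous
# value. The 1/3 threshold is an arbitrary assumption of this sketch.
guess_erase_block() {
  awk '
    # Convert "2.36ms", "502µs" or "-6226ns" to nanoseconds.
    function to_ns(t,   v) {
      v = t + 0                        # numeric prefix, sign included
      if (t ~ /ms$/)  return v * 1000000
      if (t ~ /ns?$/) return v         # also accepts a truncated "n"
      return v * 1000                  # everything else is treated as µs
    }
    $1 == "align" {
      d = to_ns($NF)
      if (have && prev_d > 0 && d * 3 < prev_d) {
        print prev_align
        found = 1
        exit
      }
      prev_align = $2; prev_d = d; have = 1
    }
    END { if (!found) exit 1 }
  '
}
```

It would be used as sudo ./flashbench -a /dev/mmcblk1 --count=20 | guess_erase_block. The function exits with status 1 when no such drop is found.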

Now test the SD card:
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk0 --count=20
align 8589934592        pre 1.6ms       on 1.66ms       post 1.6ms      diff 55.4µs
align 4294967296        pre 1.6ms       on 1.67ms       post 1.6ms      diff 72.5µs
align 2147483648        pre 1.61ms      on 1.67ms       post 1.6ms      diff 68.7µs
align 1073741824        pre 1.6ms       on 1.67ms       post 1.6ms      diff 73.3µs
align 536870912 pre 1.6ms       on 1.67ms       post 1.6ms      diff 76.3µs
align 268435456 pre 1.6ms       on 1.67ms       post 1.6ms      diff 73.5µs
align 134217728 pre 1.6ms       on 1.67ms       post 1.6ms      diff 75.5µs
align 67108864  pre 1.6ms       on 1.67ms       post 1.6ms      diff 69.1µs
align 33554432  pre 1.6ms       on 1.67ms       post 1.6ms      diff 67.9µs
align 16777216  pre 1.58ms      on 1.67ms       post 1.57ms     diff 93.2µs
align 8388608   pre 1.6ms       on 1.65ms       post 1.6ms      diff 50µs
align 4194304   pre 1.62ms      on 1.75ms       post 1.67ms     diff 103µs
align 2097152   pre 1.63ms      on 1.63ms       post 1.62ms     diff 5.51µs
align 1048576   pre 1.63ms      on 1.63ms       post 1.63ms     diff -2488ns
align 524288    pre 1.63ms      on 1.69ms       post 1.62ms     diff 58.4µs
align 262144    pre 1.63ms      on 1.63ms       post 1.63ms     diff -151ns
align 131072    pre 1.63ms      on 1.63ms       post 1.63ms     diff -1616ns
align 65536     pre 1.62ms      on 1.63ms       post 1.63ms     diff -1548ns
align 32768     pre 1.62ms      on 1.63ms       post 1.63ms     diff 6.12µs

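count=20 means each read is repeated 20 times to reduce error. No blocksize was given here, so the program picked 16384 bytes (16K) as the read unit. It looks as if 4194304 (4MB) is the erase block size, but there is also a jump between 262144 and 524288, which this does not explain.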

Test with 1024 bytes as well:
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk0 --blocksize=1024 --count=20
align 8589934592        pre 602µs       on 649µs        post 593µs      diff 50.8µs
align 4294967296        pre 613µs       on 670µs        post 596µs      diff 66.1µs
align 2147483648        pre 843µs       on 917µs        post 845µs      diff 73.2µs
align 1073741824        pre 848µs       on 921µs        post 845µs      diff 74.5µs
align 536870912 pre 844µs       on 917µs        post 846µs      diff 71.9µs
align 268435456 pre 842µs       on 916µs        post 845µs      diff 72.3µs
align 134217728 pre 846µs       on 918µs        post 847µs      diff 71.6µs
align 67108864  pre 847µs       on 915µs        post 843µs      diff 69.8µs
align 33554432  pre 842µs       on 921µs        post 847µs      diff 76.2µs
align 16777216  pre 830µs       on 924µs        post 821µs      diff 98.5µs
align 8388608   pre 838µs       on 896µs        post 843µs      diff 55.3µs
align 4194304   pre 878µs       on 961µs        post 823µs      diff 110µs
align 2097152   pre 880µs       on 921µs        post 871µs      diff 45.4µs
align 1048576   pre 878µs       on 915µs        post 877µs      diff 37.9µs
align 524288    pre 883µs       on 972µs        post 867µs      diff 97.5µs
align 262144    pre 874µs       on 915µs        post 878µs      diff 38.8µs
align 131072    pre 876µs       on 921µs        post 871µs      diff 46.9µs
align 65536     pre 873µs       on 909µs        post 874µs      diff 35.5µs
align 32768     pre 880µs       on 914µs        post 870µs      diff 38.6µs
align 16384     pre 874µs       on 910µs        post 871µs      diff 37.2µs
align 8192      pre 870µs       on 906µs        post 870µs      diff 35.9µs
align 4096      pre 827µs       on 905µs        post 869µs      diff 56.7µs
align 2048      pre 828µs       on 830µs        post 827µs      diff 2.05µs
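It looks as if 4K is the page size and 4MB is the erase block size, but the time difference at 524288 is too large. If the erase block size were 512K, it would be hard to explain why the difference shrinks again at 1M. This looks more like the guessed blocksize not being aligned with the real page.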

Test again with 4096 bytes:
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk0 --blocksize=4096 --count=20
align 8589934592        pre 734µs       on 782µs        post 727µs      diff 51.6µs
align 4294967296        pre 803µs       on 874µs        post 799µs      diff 73.1µs
align 2147483648        pre 986µs       on 1.05ms       post 988µs      diff 67µs
align 1073741824        pre 985µs       on 1.06ms       post 984µs      diff 75.8µs
align 536870912 pre 989µs       on 1.06ms       post 987µs      diff 66.9µs
align 268435456 pre 989µs       on 1.06ms       post 983µs      diff 70µs
align 134217728 pre 987µs       on 1.06ms       post 986µs      diff 73.6µs
align 67108864  pre 990µs       on 1.06ms       post 987µs      diff 71.7µs
align 33554432  pre 985µs       on 1.06ms       post 983µs      diff 73.8µs
align 16777216  pre 967µs       on 1.06ms       post 963µs      diff 95.2µs
align 8388608   pre 987µs       on 1.04ms       post 979µs      diff 56.5µs
align 4194304   pre 1.01ms      on 1.11ms       post 963µs      diff 119µs
align 2097152   pre 1.02ms      on 1.06ms       post 1.01ms     diff 43.9µs
align 1048576   pre 1.01ms      on 1.06ms       post 1.02ms     diff 42.7µs
align 524288    pre 1.02ms      on 1.11ms       post 1.01ms     diff 95.5µs
align 262144    pre 1.01ms      on 1.05ms       post 1.01ms     diff 36.6µs
align 131072    pre 1.02ms      on 1.05ms       post 1.01ms     diff 37.9µs
align 65536     pre 1.01ms      on 1.05ms       post 1.02ms     diff 35.8µs
align 32768     pre 1.02ms      on 1.05ms       post 1.01ms     diff 39.4µs
align 16384     pre 1.01ms      on 1.05ms       post 1.02ms     diff 37.4µs
align 8192      pre 1.01ms      on 1.05ms       post 1.01ms     diff 33.1µs
Again, the value at 512K is hard to explain.

8K shows a similar distribution:
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk0 --blocksize=8192 --count=20
align 8589934592        pre 921µs       on 967µs        post 913µs      diff 49.5µs
align 4294967296        pre 1.2ms       on 1.27ms       post 1.19ms     diff 69.8µs
align 2147483648        pre 1.2ms       on 1.27ms       post 1.19ms     diff 72.1µs
align 1073741824        pre 1.2ms       on 1.27ms       post 1.2ms      diff 70µs
align 536870912 pre 1.19ms      on 1.27ms       post 1.19ms     diff 76.1µs
align 268435456 pre 1.2ms       on 1.26ms       post 1.2ms      diff 67.7µs
align 134217728 pre 1.2ms       on 1.27ms       post 1.19ms     diff 72.9µs
align 67108864  pre 1.19ms      on 1.26ms       post 1.19ms     diff 69.4µs
align 33554432  pre 1.19ms      on 1.26ms       post 1.19ms     diff 71.1µs
align 16777216  pre 1.18ms      on 1.26ms       post 1.17ms     diff 88.9µs
align 8388608   pre 1.19ms      on 1.25ms       post 1.19ms     diff 58.4µs
align 4194304   pre 1.22ms      on 1.3ms        post 1.2ms      diff 93.3µs
align 2097152   pre 1.23ms      on 1.23ms       post 1.22ms     diff 1.74µs
align 1048576   pre 1.22ms      on 1.22ms       post 1.22ms     diff -4151ns
align 524288    pre 1.23ms      on 1.28ms       post 1.22ms     diff 57.3µs
align 262144    pre 1.22ms      on 1.22ms       post 1.22ms     diff -1387ns
align 131072    pre 1.22ms      on 1.22ms       post 1.22ms     diff 1.01µs
align 65536     pre 1.22ms      on 1.21ms       post 1.22ms     diff -7133ns
align 32768     pre 1.22ms      on 1.23ms       post 1.22ms     diff 5.34µs
align 16384     pre 1.22ms      on 1.22ms       post 1.22ms     diff 211ns

32K
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk0 --blocksize=32768
[sudo] password for beaglebone:
align 8589934592        pre 2.03ms      on 2.08ms       post 2.02ms     diff 56µs
align 4294967296        pre 2.42ms      on 2.49ms       post 2.42ms     diff 73.5µs
align 2147483648        pre 2.4ms       on 2.49ms       post 2.43ms     diff 76.2µs
align 1073741824        pre 2.42ms      on 2.49ms       post 2.42ms     diff 68.7µs
align 536870912 pre 2.42ms      on 2.49ms       post 2.42ms     diff 72.8µs
align 268435456 pre 2.42ms      on 2.49ms       post 2.42ms     diff 72.7µs
align 134217728 pre 2.41ms      on 2.49ms       post 2.42ms     diff 74.6µs
align 67108864  pre 2.45ms      on 2.52ms       post 2.42ms     diff 89.3µs
align 33554432  pre 2.4ms       on 2.49ms       post 2.43ms     diff 75.4µs
align 16777216  pre 2.44ms      on 2.57ms       post 2.44ms     diff 128µs
align 8388608   pre 2.42ms      on 2.47ms       post 2.42ms     diff 56.9µs
align 4194304   pre 2.45ms      on 2.54ms       post 2.43ms     diff 94.3µs
align 2097152   pre 2.45ms      on 2.45ms       post 2.45ms     diff 1.25µs
align 1048576   pre 2.44ms      on 2.44ms       post 2.45ms     diff -3208ns
align 524288    pre 2.45ms      on 2.5ms        post 2.44ms     diff 58.9µs
align 262144    pre 2.45ms      on 2.44ms       post 2.45ms     diff -6505ns
align 131072    pre 2.44ms      on 2.45ms       post 2.44ms     diff 3.37µs
align 65536     pre 2.44ms      on 2.44ms       post 2.45ms     diff -3327ns

So far, no guess of the form 8K*N for the page size lines up with the measurements.

The size of this SD card is 62333952 * 512 bytes. Factoring the sector count:
beaglebone@beaglebone:~/flashbench$ factor 62333952
62333952: 2 2 2 2 2 2 2 2 2 2 3 103 197

The prime factors include 3. The card may well use TLC flash, where one cell stores 3 bits, so the page size could be a multiple of 3 built on a 4K base.

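103*197 is not a power of 2, but that is not a problem: many vendors use over-provisioning, where the controller reserves part of the flash for garbage collection and similar work, and those cells are invisible to the user.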

In flash, the physical page is still 4K*N (N a power of 2), but a TLC cell represents 3 bits, so the logical page size works out to 12K*N.
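
Under that hypothesis the candidate values for --blocksize are 3 * 4K * N. A minimal sketch that lists them (the function name and the choice of N up to 16 are just for this example):

```shell
#!/usr/bin/env bash
# candidate_blocksizes
# Prints 3 * 4 KiB * N in bytes for N = 1, 2, 4, 8, 16.
candidate_blocksizes() {
  local base=$(( 3 * 4 * 1024 )) n
  for n in 1 2 4 8 16; do
    echo $(( base * n ))
  done
}

candidate_blocksizes
```

These are the values 12288, 24576, 49152, 98304 and 196608 tried in the runs that follow.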

Try 12K:
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk0 --blocksize=12288 --count=20
align 6442450944        pre 1.4ms       on 1.48ms       post 1.4ms      diff 75.7µs
align 3221225472        pre 1.4ms       on 1.49ms       post 1.4ms      diff 85.1µs
align 1610612736        pre 1.4ms       on 1.49ms       post 1.4ms      diff 92.8µs
align 805306368 pre 1.4ms       on 1.49ms       post 1.4ms      diff 93.7µs
align 402653184 pre 1.4ms       on 1.49ms       post 1.4ms      diff 93.8µs
align 201326592 pre 1.39ms      on 1.49ms       post 1.4ms      diff 94.4µs
align 100663296 pre 1.4ms       on 1.49ms       post 1.4ms      diff 96.1µs
align 50331648  pre 1.4ms       on 1.49ms       post 1.4ms      diff 96.6µs
align 25165824  pre 1.4ms       on 1.49ms       post 1.4ms      diff 89.7µs
align 12582912  pre 1.4ms       on 1.66ms       post 1.55ms     diff 184µs
align 6291456   pre 1.4ms       on 1.4ms        post 1.4ms      diff 1.35µs
align 3145728   pre 1.43ms      on 1.47ms       post 1.43ms     diff 40.7µs
align 1572864   pre 1.43ms      on 1.47ms       post 1.42ms     diff 41.6µs
align 786432    pre 1.43ms      on 1.47ms       post 1.42ms     diff 46.5µs
align 393216    pre 1.43ms      on 1.47ms       post 1.43ms     diff 41.6µs
align 196608    pre 1.43ms      on 1.47ms       post 1.42ms     diff 43.3µs
align 98304     pre 1.43ms      on 1.47ms       post 1.42ms     diff 48µs
align 49152     pre 1.42ms      on 1.47ms       post 1.42ms     diff 42.8µs
align 24576     pre 1.42ms      on 1.47ms       post 1.43ms     diff 41.2µs
This looks good, but there is a jump at 6291456.

24K
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk0 --blocksize=24576 --count=20
align 6442450944        pre 1.65ms      on 1.72ms       post 1.65ms     diff 69.9µs
align 3221225472        pre 2ms on 2.09ms       post 2ms        diff 88.3µs
align 1610612736        pre 2.01ms      on 2.09ms       post 2ms        diff 87.3µs
align 805306368 pre 2ms on 2.09ms       post 2ms        diff 88µs
align 402653184 pre 2ms on 2.09ms       post 2ms        diff 87.6µs
align 201326592 pre 2ms on 2.09ms       post 2ms        diff 87.4µs
align 100663296 pre 2.01ms      on 2.09ms       post 2ms        diff 88.1µs
align 50331648  pre 2.01ms      on 2.09ms       post 2ms        diff 86µs
align 25165824  pre 2ms on 2.09ms       post 2ms        diff 89.2µs
align 12582912  pre 2ms on 2.28ms       post 2.18ms     diff 187µs
align 6291456   pre 2ms on 2ms  post 2.01ms     diff -660ns
align 3145728   pre 2.03ms      on 2.03ms       post 2.02ms     diff -312ns
align 1572864   pre 2.03ms      on 2.03ms       post 2.02ms     diff 1.23µs
align 786432    pre 2.03ms      on 2.04ms       post 2.03ms     diff 3.9µs
align 393216    pre 2.03ms      on 2.03ms       post 2.03ms     diff 1.68µs
align 196608    pre 2.03ms      on 2.03ms       post 2.02ms     diff 825ns
align 98304     pre 2.03ms      on 2.03ms       post 2.03ms     diff -1041ns
align 49152     pre 2.03ms      on 2.03ms       post 2.03ms     diff -672ns
This is much more consistent. 24K appears to align, and the erase block size is 12582912, i.e. 12M. If 24K aligns with the page, every 24K*N should align too.

48K
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk0 --blocksize=49152 --count=20
align 6442450944        pre 2.75ms      on 2.83ms       post 2.75ms     diff 72.3µs
align 3221225472        pre 3.21ms      on 3.3ms        post 3.21ms     diff 89.1µs
align 1610612736        pre 3.21ms      on 3.3ms        post 3.21ms     diff 87µs
align 805306368 pre 3.22ms      on 3.3ms        post 3.21ms     diff 87µs
align 402653184 pre 3.22ms      on 3.3ms        post 3.21ms     diff 86.9µs
align 201326592 pre 3.22ms      on 3.3ms        post 3.22ms     diff 86.3µs
align 100663296 pre 3.21ms      on 3.3ms        post 3.21ms     diff 90.9µs
align 50331648  pre 3.21ms      on 3.3ms        post 3.21ms     diff 89.3µs
align 25165824  pre 3.21ms      on 3.3ms        post 3.21ms     diff 86.9µs
align 12582912  pre 3.21ms      on 3.52ms       post 3.39ms     diff 214µs
align 6291456   pre 3.21ms      on 3.21ms       post 3.21ms     diff 3.22µs
align 3145728   pre 3.24ms      on 3.24ms       post 3.24ms     diff 4.21µs
align 1572864   pre 3.24ms      on 3.24ms       post 3.24ms     diff 4.54µs
align 786432    pre 3.24ms      on 3.24ms       post 3.24ms     diff 2.77µs
align 393216    pre 3.24ms      on 3.24ms       post 3.24ms     diff 2.68µs
align 196608    pre 3.23ms      on 3.24ms       post 3.24ms     diff 5.64µs
align 98304     pre 3.24ms      on 3.24ms       post 3.24ms     diff 7.47µs

96K
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk0 --blocksize=98304 --count=20
align 6442450944        pre 5.16ms      on 5.22ms       post 5.15ms     diff 67.2µs
align 3221225472        pre 5.8ms       on 5.89ms       post 5.8ms      diff 87.6µs
align 1610612736        pre 5.81ms      on 5.89ms       post 5.8ms      diff 87.9µs
align 805306368 pre 5.81ms      on 5.89ms       post 5.8ms      diff 82µs
align 402653184 pre 5.8ms       on 5.89ms       post 5.8ms      diff 91.2µs
align 201326592 pre 5.81ms      on 5.89ms       post 5.8ms      diff 85.9µs
align 100663296 pre 5.81ms      on 5.89ms       post 5.8ms      diff 88.1µs
align 50331648  pre 5.81ms      on 5.89ms       post 5.8ms      diff 82.6µs
align 25165824  pre 5.81ms      on 5.89ms       post 5.8ms      diff 82.2µs
align 12582912  pre 5.81ms      on 6.12ms       post 6.01ms     diff 213µs
align 6291456   pre 5.81ms      on 5.81ms       post 5.81ms     diff -3359ns
align 3145728   pre 5.86ms      on 5.86ms       post 5.85ms     diff 3.84µs
align 1572864   pre 5.86ms      on 5.86ms       post 5.85ms     diff 1.57µs
align 786432    pre 5.86ms      on 5.85ms       post 5.85ms     diff -5169ns
align 393216    pre 5.86ms      on 5.86ms       post 5.86ms     diff -2854ns
align 196608    pre 5.85ms      on 5.86ms       post 5.86ms     diff 661ns

192K
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk0 --blocksize=196608 --count=20
[sudo] password for beaglebone:
align 6442450944        pre 9.72ms      on 9.79ms       post 9.73ms     diff 62.9µs
align 3221225472        pre 10.7ms      on 10.8ms       post 10.7ms     diff 87.5µs
align 1610612736        pre 10.7ms      on 10.8ms       post 10.7ms     diff 77.6µs
align 805306368 pre 10.8ms      on 10.8ms       post 10.7ms     diff 76.1µs
align 402653184 pre 10.7ms      on 10.8ms       post 10.7ms     diff 116µs
align 201326592 pre 10.7ms      on 10.8ms       post 10.7ms     diff 87µs
align 100663296 pre 10.7ms      on 10.8ms       post 10.7ms     diff 89.9µs
align 50331648  pre 10.7ms      on 10.8ms       post 10.7ms     diff 93.7µs
align 25165824  pre 10.7ms      on 10.8ms       post 10.7ms     diff 94.8µs
align 12582912  pre 10.7ms      on 11.1ms       post 10.9ms     diff 247µs
align 6291456   pre 10.7ms      on 10.7ms       post 10.7ms     diff -17098n
align 3145728   pre 10.8ms      on 10.8ms       post 10.8ms     diff 19.5µs
align 1572864   pre 10.9ms      on 10.8ms       post 10.8ms     diff -57897n
align 786432    pre 10.8ms      on 10.8ms       post 10.8ms     diff -6543ns
align 393216    pre 10.8ms      on 10.8ms       post 10.8ms     diff -31925n


Using 4K*3*N as the page size gives fairly regular results, and the jump always occurs at 12582912 (12MB).

In terms of the TLC cells, that is an 8K page and a 4M erase block size (24K / 3 and 12M / 3).

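The lower read time at 25165824 can also be explained. Writes use logical block addresses (LBA), but inside the SD card the flash is not laid out as one linear address space. As mentioned earlier, the card may contain several planes, and blocks on different planes can be accessed at the same time. The card's controller uses the FTL to translate each LBA into the actual physical block address (PBA). When we read pages from two different blocks, the FTL may have mapped those blocks to different planes, and reading two blocks on different planes should take about as long as reading one block on one plane. That is why reads from 25165824 upward are actually faster.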

Now test the speed:
12K
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench /dev/mmcblk0 --open-au --erasesize=$[12*1024*1024] --blocksize=$[12*1024] --open-au-nr=1
12MiB   13.2M/s
6MiB    13.2M/s
3MiB    12.3M/s
1.5MiB  13M/s
768KiB  12.9M/s
384KiB  13.1M/s
192KiB  13M/s
96KiB   11.6M/s
48KiB   11.3M/s
24KiB   6.53M/s
12KiB   3.47M/s

24K
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench /dev/mmcblk0 --open-au --erasesize=$[12*1024*1024] --blocksize=$[24*1024] --open-au-nr=1
12MiB   12.2M/s
6MiB    13M/s
3MiB    12.7M/s
1.5MiB  12.9M/s
768KiB  13.2M/s
384KiB  13.6M/s
192KiB  13.4M/s
96KiB   12M/s
48KiB   11.7M/s
24KiB   6.23M/s

48K
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench /dev/mmcblk0 --open-au --erasesize=$[12*1024*1024] --blocksize=$[48*1024] --open-au-nr=1
12MiB   12.8M/s
6MiB    12.7M/s
3MiB    12.9M/s
1.5MiB  13.1M/s
768KiB  13.3M/s
384KiB  13.1M/s
192KiB  13.4M/s
96KiB   12.2M/s
48KiB   11.8M/s

The figures look similar, but over repeated runs the 48K write results were the most stable.
To be safe, use 24K as the page size and 12M as the erase block size.
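
The three write tests above differ only in --blocksize, so they can be generated by a loop. The sketch below only prints the command lines (a dry run) rather than executing them, since --open-au writes to the device. The function name is invented for this example; the flashbench flags are the ones used above, with $((...)) in place of the older $[...] arithmetic syntax.

```shell
#!/usr/bin/env bash
# print_open_au_tests DEVICE ERASE_BYTES BLOCKSIZE_KIB...
# Prints, without running, one flashbench write test per block size.
print_open_au_tests() {
  local dev=$1 erase_bytes=$2 kib
  shift 2
  for kib in "$@"; do
    echo "sudo ./flashbench $dev --open-au --erasesize=$erase_bytes --blocksize=$(( kib * 1024 )) --open-au-nr=1"
  done
}

print_open_au_tests /dev/mmcblk0 $(( 12 * 1024 * 1024 )) 12 24 48
```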

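When partitioning, start the partition at 12MB so that it aligns with the 12M erase block size.
sudo fdisk /dev/mmcblk0
n creates a new partition.
Choose 24576 as the start sector; the unit is 512 bytes, so that is 12M.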
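
The start sector is just the erase block size divided by the 512-byte sector size. A short sketch of that arithmetic, plus rounding an arbitrary sector up to the next erase block boundary (both helper names are made up for this example):

```shell
#!/usr/bin/env bash
# erase_block_sectors ERASE_BYTES
# Number of 512-byte sectors in one erase block.
erase_block_sectors() {
  echo $(( $1 / 512 ))
}

# align_up SECTOR ERASE_BYTES
# Rounds SECTOR up to the next multiple of the erase block size.
align_up() {
  local sector=$1 step
  step=$(erase_block_sectors "$2")
  echo $(( (sector + step - 1) / step * step ))
}

erase_bytes=$(( 12 * 1024 * 1024 ))
echo "first aligned start sector: $(erase_block_sectors "$erase_bytes")"
echo "sector 2048 rounds up to:   $(align_up 2048 "$erase_bytes")"
```

Any later partition should likewise start on a multiple of 24576 sectors for this card.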

parted works too. fdisk is rather dated, and its own man page no longer recommends it.

Next, format the partition as ext4, a file system that supports journaling and RAID layout hints:

>>sudo mkfs.ext4 -O ^has_journal -E stride=6,stripe-width=3072 -b 4096 -L Fedora14Arm  /dev/mmcblk0p1

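-O ^has_journal turns off journaling. This reduces writes, but makes file recovery after a system crash much harder, so decide for yourself whether you want it.
-b 4096 sets the file system block size, the smallest read/write unit, to 4K.
-L sets the label.
stride can be thought of as the smallest read unit and stripe-width as the smallest write unit. Both options were designed for RAID disk arrays, but flash reads and writes internally in a way that resembles RAID, so they can be reused here. Reads should be one page, i.e. 24K, and writes should be 12M, one erase block, to avoid read-modify-write cycles. In units of the 4K block: stride = page size / block size = 24K / 4K = 6, and stripe-width = 12 * 1024K / 4K = 3072.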
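
The two values can be derived rather than typed by hand. A minimal sketch, assuming the page and erase block sizes measured above; the function name is hypothetical and its output is simply the string that follows -E in the mkfs.ext4 command:

```shell
#!/usr/bin/env bash
# ext4_stripe_opts PAGE_BYTES ERASE_BYTES FS_BLOCK_BYTES
# Prints the -E argument: stride = page / fs block, stripe-width = erase block / fs block.
ext4_stripe_opts() {
  local page=$1 erase=$2 fsblock=$3
  echo "stride=$(( page / fsblock )),stripe-width=$(( erase / fsblock ))"
}

ext4_stripe_opts $(( 24 * 1024 )) $(( 12 * 1024 * 1024 )) 4096
```

With a different card, only the first two arguments need to change.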

The mount options for this ext4 partition need tuning too, assuming journaling was turned off at format time.
>> sudo mount -o data=writeback,noatime,nodiratime /dev/mmcblk0p1 /media/sd
data=writeback: file data is not written to the journal.
noatime: do not update inode access times.
nodiratime: do not update directory access times.


