Some background on flash memory:
http://en.wikipedia.org/wiki/Flash_memory
http://codecapsule.com/2014/02/12/coding-for-ssds-part-1-introduction-and-table-of-contents/
http://codecapsule.com/2014/02/12/coding-for-ssds-part-2-architecture-of-an-ssd-and-benchmarking/
http://codecapsule.com/2014/02/12/coding-for-ssds-part-3-pages-blocks-and-the-flash-translation-layer/
http://codecapsule.com/2014/02/12/coding-for-ssds-part-4-advanced-functionalities-and-internal-parallelism/
http://codecapsule.com/2014/02/12/coding-for-ssds-part-5-access-patterns-and-system-optimizations/
http://codecapsule.com/2014/02/12/coding-for-ssds-part-6-a-summary-what-every-programmer-should-know-about-solid-state-drives
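A flash storage device is typically organized internally as shown in the figure below:
An SD package may contain N chips, each chip holds several planes, each plane holds several erase blocks, and each erase block holds several pages.
The page is the smallest unit flash can read or write. What makes flash special is that a page can only be written after its erase block has been erased, which resets every bit in the block to 1. A read can therefore fetch a single page directly, but rewriting data in place, even a single page of it, means erasing the whole block first.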
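The erase-before-write rule can be illustrated with a toy model. This is only a sketch: the class name EraseBlock, the 4096-byte page and the 4 pages per block are values I made up for illustration, not properties of any real chip. Programming is modeled as a bitwise AND, because programming flash can only turn 1 bits into 0 bits.

```python
PAGE_SIZE = 4096       # bytes per page (illustrative value)
PAGES_PER_BLOCK = 4    # kept tiny so the example is easy to follow


class EraseBlock:
    """Toy erase block: program clears bits (1 -> 0), erase resets all bits to 1."""

    def __init__(self):
        self.pages = [bytes([0xFF]) * PAGE_SIZE for _ in range(PAGES_PER_BLOCK)]
        self.erase_count = 0

    def erase(self):
        # Erasing always affects every page in the block.
        self.pages = [bytes([0xFF]) * PAGE_SIZE for _ in range(PAGES_PER_BLOCK)]
        self.erase_count += 1

    def read(self, page_no):
        return self.pages[page_no]

    def program(self, page_no, data):
        # New data is ANDed into the old contents: bits never go from 0 back to 1.
        old = self.pages[page_no]
        self.pages[page_no] = bytes(o & d for o, d in zip(old, data))

    def rewrite_page(self, page_no, data):
        # Read-modify-write: keep the other pages, erase the block, program all back.
        saved = list(self.pages)
        saved[page_no] = data
        self.erase()
        for i, page in enumerate(saved):
            self.program(i, page)


block = EraseBlock()
block.program(0, bytes([0x0F]) * PAGE_SIZE)
block.program(0, bytes([0xF0]) * PAGE_SIZE)       # no erase in between
corrupted = block.read(0)                          # 0x0F & 0xF0 leaves all zeros
block.rewrite_page(0, bytes([0xF0]) * PAGE_SIZE)   # costs one whole-block erase
```

Overwriting page 0 without an erase corrupts it, while rewrite_page gets the right contents at the price of erasing and reprogramming the whole block.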
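The figure also shows that parallel chips, and parallel planes on the same chip, can be read and written at the same time. The figure calls this unit of access a clustered block. The concept is very close to a RAID disk array, and it comes up again later when explaining the ext4 parameters.
From these basics it follows that formatting correctly requires knowing the SD card's page size, erase block size, and plane layout. Tools such as fdisk default to a 512-byte minimum I/O unit, a standard inherited from old hard disks. Modern hard disks have in practice moved to 4K sectors, while the page and erase block sizes of SD cards and other flash media vary from device to device and are rarely documented officially.
This is where the flashbench tool comes in.
How it works and how to use it: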
https://wiki.linaro.org/WorkingGroups/KernelArchived/Projects/FlashCardSurvey?highlight=%28FlashCardSurvey%29
http://lists.linaro.org/pipermail/flashbench-results/2012-June/000300.html
Download it from:
https://github.com/bradfa/flashbench
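After fetching it with git clone, enter the directory and run sudo make to build it.
Once built it is ready to use. Reading the README carefully is the best way to learn how.
My example is a SanDisk Ultra 32G microSDHC card (SDSDQU-032G-AFFP-A).
First look at the device information for the eMMC and the SD card.
sudo udevadm info -a -n /dev/mmcblk1 shows the eMMC attributes: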
ATTR{size}=="7667712"
ATTRS{preferred_erase_size}=="2097152"
ATTRS{erase_size}=="2097152"
sudo udevadm info -a -n /dev/mmcblk0 shows the SD card attributes:
ATTR{size}=="62333952"
ATTRS{preferred_erase_size}=="4194304"
ATTRS{fwrev}=="0x0"
ATTRS{hwrev}=="0x8"
ATTRS{oemid}=="0x5344"
ATTRS{manfid}=="0x000003"
ATTRS{serial}=="0x228e4f3e"
ATTRS{erase_size}=="512"
The eMMC's erase size is 2M, and its capacity is 7667712*512 bytes, which works out to 3.65625 GiB.
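The capacity arithmetic can be checked quickly. A minimal sketch, assuming the size attribute is counted in 512-byte sectors as in the calculation above; the function name sectors_to_gib is my own.

```python
SECTOR_BYTES = 512  # the size attribute is counted in 512-byte sectors


def sectors_to_gib(sectors):
    """Convert a sector count into GiB (2**30 bytes)."""
    return sectors * SECTOR_BYTES / 2**30


emmc_gib = sectors_to_gib(7667712)   # eMMC, /dev/mmcblk1
sd_gib = sectors_to_gib(62333952)    # SD card, /dev/mmcblk0
print(f"eMMC: {emmc_gib} GiB")
print(f"SD card: {sd_gib:.2f} GiB")
```

The SD card, sold as 32G, comes out a little under 30 GiB by the same calculation.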
Running flashbench on it shows the time difference jumping between 1048576 (1MB) and 2097152 (2MB), which agrees with the reported preferred_erase_size.
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk1 --count=20
[sudo] password for beaglebone:
align 1073741824 pre 1.97ms on 2.36ms post 1.75ms diff 502µs
align 536870912 pre 2.32ms on 2.8ms post 2.13ms diff 570µs
align 268435456 pre 2.58ms on 2.85ms post 2.2ms diff 464µs
align 134217728 pre 1.95ms on 2.5ms post 2.31ms diff 368µs
align 67108864 pre 2.04ms on 2.54ms post 2.15ms diff 441µs
align 33554432 pre 2.13ms on 2.62ms post 2.15ms diff 480µs
align 16777216 pre 2.1ms on 2.63ms post 2.24ms diff 462µs
align 8388608 pre 2.14ms on 2.64ms post 2.17ms diff 481µs
align 4194304 pre 2.1ms on 2.63ms post 2.2ms diff 474µs
align 2097152 pre 2.07ms on 2.56ms post 2.12ms diff 464µs
align 1048576 pre 2.11ms on 2.23ms post 2.25ms diff 52.5µs
align 524288 pre 2.15ms on 2.18ms post 2.21ms diff -6226ns
align 262144 pre 2.16ms on 2.18ms post 2.2ms diff -1092ns
align 131072 pre 2.17ms on 2.17ms post 2.2ms diff -9687ns
align 65536 pre 2.15ms on 2.18ms post 2.21ms diff 611ns
align 32768 pre 2.16ms on 2.18ms post 2.22ms diff -5497ns
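Spotting the jump by eye gets tedious, so here is a rough sketch that parses this output and picks the smallest alignment whose diff reaches half of the largest diff. The 50% threshold and the function names are my own assumptions, not part of flashbench, and on noisier cards the heuristic can easily be fooled.

```python
import re

# Multipliers that convert the time units in the output to microseconds.
UNIT_TO_US = {"ns": 1e-3, "µs": 1.0, "us": 1.0, "ms": 1e3, "s": 1e6}
LINE_RE = re.compile(
    r"align\s+(\d+)\s+pre\s+\S+\s+on\s+\S+\s+post\s+\S+"
    r"\s+diff\s+(-?[\d.]+)(ns|µs|us|ms|s)"
)


def parse_diffs(text):
    """Return a list of (alignment_bytes, diff_in_microseconds) tuples."""
    rows = []
    for match in LINE_RE.finditer(text):
        align, value, unit = match.groups()
        rows.append((int(align), float(value) * UNIT_TO_US[unit]))
    return rows


def guess_erase_block(rows, fraction=0.5):
    """Smallest alignment whose diff reaches `fraction` of the largest diff."""
    threshold = fraction * max(diff for _, diff in rows)
    return min(align for align, diff in rows if diff >= threshold)


# Four lines copied from the eMMC run above.
sample = """
align 4194304 pre 2.1ms on 2.63ms post 2.2ms diff 474µs
align 2097152 pre 2.07ms on 2.56ms post 2.12ms diff 464µs
align 1048576 pre 2.11ms on 2.23ms post 2.25ms diff 52.5µs
align 524288 pre 2.15ms on 2.18ms post 2.21ms diff -6226ns
"""
erase_block_guess = guess_erase_block(parse_diffs(sample))
print(erase_block_guess)
```

For these four lines the guess is 2097152, the same 2M erase block read off the table by hand.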
The page size is still not visible, though, so run the test again with the read unit set to 1024 bytes:
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk1 --blocksize=1024 --count=20
align 1073741824 pre 1.22ms on 1.53ms post 875µs diff 485µs
align 536870912 pre 1.21ms on 1.57ms post 947µs diff 489µs
align 268435456 pre 1.25ms on 1.72ms post 1.13ms diff 532µs
align 134217728 pre 1.18ms on 1.64ms post 1.1ms diff 495µs
align 67108864 pre 1.19ms on 1.65ms post 1.09ms diff 514µs
align 33554432 pre 1.25ms on 1.74ms post 1.14ms diff 545µs
align 16777216 pre 1.24ms on 1.7ms post 1.15ms diff 504µs
align 8388608 pre 1.28ms on 1.77ms post 1.16ms diff 546µs
align 4194304 pre 1.24ms on 1.72ms post 1.16ms diff 520µs
align 2097152 pre 1.23ms on 1.7ms post 1.16ms diff 506µs
align 1048576 pre 1.14ms on 1.28ms post 1.16ms diff 127µs
align 524288 pre 1.13ms on 1.26ms post 1.15ms diff 122µs
align 262144 pre 1.14ms on 1.26ms post 1.15ms diff 117µs
align 131072 pre 1.14ms on 1.26ms post 1.15ms diff 116µs
align 65536 pre 1.13ms on 1.26ms post 1.15ms diff 117µs
align 32768 pre 1.14ms on 1.26ms post 1.15ms diff 116µs
align 16384 pre 1.14ms on 1.26ms post 1.14ms diff 116µs
align 8192 pre 1.14ms on 1.26ms post 1.15ms diff 117µs
align 4096 pre 1.14ms on 1.26ms post 1.14ms diff 120µs
align 2048 pre 1.14ms on 1.2ms post 1.14ms diff 53.7µs
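The time difference jumps between 2048 and 4096. This is because the smallest read unit is 4K, so this eMMC's page size is 4K.
The blocksize parameter used in these tests is a guess at the page size. If it matches the real page size, then for alignments below the erase block size the read-time differences should be small and consistent, because each measurement reads two N*page chunks within a single erase block.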
Now test the SD card:
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk0 --count=20
align 8589934592 pre 1.6ms on 1.66ms post 1.6ms diff 55.4µs
align 4294967296 pre 1.6ms on 1.67ms post 1.6ms diff 72.5µs
align 2147483648 pre 1.61ms on 1.67ms post 1.6ms diff 68.7µs
align 1073741824 pre 1.6ms on 1.67ms post 1.6ms diff 73.3µs
align 536870912 pre 1.6ms on 1.67ms post 1.6ms diff 76.3µs
align 268435456 pre 1.6ms on 1.67ms post 1.6ms diff 73.5µs
align 134217728 pre 1.6ms on 1.67ms post 1.6ms diff 75.5µs
align 67108864 pre 1.6ms on 1.67ms post 1.6ms diff 69.1µs
align 33554432 pre 1.6ms on 1.67ms post 1.6ms diff 67.9µs
align 16777216 pre 1.58ms on 1.67ms post 1.57ms diff 93.2µs
align 8388608 pre 1.6ms on 1.65ms post 1.6ms diff 50µs
align 4194304 pre 1.62ms on 1.75ms post 1.67ms diff 103µs
align 2097152 pre 1.63ms on 1.63ms post 1.62ms diff 5.51µs
align 1048576 pre 1.63ms on 1.63ms post 1.63ms diff -2488ns
align 524288 pre 1.63ms on 1.69ms post 1.62ms diff 58.4µs
align 262144 pre 1.63ms on 1.63ms post 1.63ms diff -151ns
align 131072 pre 1.63ms on 1.63ms post 1.63ms diff -1616ns
align 65536 pre 1.62ms on 1.63ms post 1.63ms diff -1548ns
align 32768 pre 1.62ms on 1.63ms post 1.63ms diff 6.12µs
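count=20 repeats each read 20 times to reduce measurement error. No blocksize was given here, so the program picked 16384 bytes (16K) as the read unit by itself. It looks as if 4194304 (4MB) is the erase block size, but note that there is another jump between 262144 and 524288, which I cannot explain.
Test with 1024 bytes as well: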
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk0 --blocksize=1024 --count=20
align 8589934592 pre 602µs on 649µs post 593µs diff 50.8µs
align 4294967296 pre 613µs on 670µs post 596µs diff 66.1µs
align 2147483648 pre 843µs on 917µs post 845µs diff 73.2µs
align 1073741824 pre 848µs on 921µs post 845µs diff 74.5µs
align 536870912 pre 844µs on 917µs post 846µs diff 71.9µs
align 268435456 pre 842µs on 916µs post 845µs diff 72.3µs
align 134217728 pre 846µs on 918µs post 847µs diff 71.6µs
align 67108864 pre 847µs on 915µs post 843µs diff 69.8µs
align 33554432 pre 842µs on 921µs post 847µs diff 76.2µs
align 16777216 pre 830µs on 924µs post 821µs diff 98.5µs
align 8388608 pre 838µs on 896µs post 843µs diff 55.3µs
align 4194304 pre 878µs on 961µs post 823µs diff 110µs
align 2097152 pre 880µs on 921µs post 871µs diff 45.4µs
align 1048576 pre 878µs on 915µs post 877µs diff 37.9µs
align 524288 pre 883µs on 972µs post 867µs diff 97.5µs
align 262144 pre 874µs on 915µs post 878µs diff 38.8µs
align 131072 pre 876µs on 921µs post 871µs diff 46.9µs
align 65536 pre 873µs on 909µs post 874µs diff 35.5µs
align 32768 pre 880µs on 914µs post 870µs diff 38.6µs
align 16384 pre 874µs on 910µs post 871µs diff 37.2µs
align 8192 pre 870µs on 906µs post 870µs diff 35.9µs
align 4096 pre 827µs on 905µs post 869µs diff 56.7µs
align 2048 pre 828µs on 830µs post 827µs diff 2.05µs
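It looks as if 4K is the page size and 4MB the erase block size, but the time difference at 524288 is on the high side. If the erase block size were 512K, it would be hard to explain why the difference drops again at 1M. It looks more like the guessed blocksize is simply not aligned with the real page.
Now test with 4096 bytes: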
beaglebone@beaglebone:~/flashbench$
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk0 --blocksize=4096 --count=20
align 8589934592 pre 734µs on 782µs post 727µs diff 51.6µs
align 4294967296 pre 803µs on 874µs post 799µs diff 73.1µs
align 2147483648 pre 986µs on 1.05ms post 988µs diff 67µs
align 1073741824 pre 985µs on 1.06ms post 984µs diff 75.8µs
align 536870912 pre 989µs on 1.06ms post 987µs diff 66.9µs
align 268435456 pre 989µs on 1.06ms post 983µs diff 70µs
align 134217728 pre 987µs on 1.06ms post 986µs diff 73.6µs
align 67108864 pre 990µs on 1.06ms post 987µs diff 71.7µs
align 33554432 pre 985µs on 1.06ms post 983µs diff 73.8µs
align 16777216 pre 967µs on 1.06ms post 963µs diff 95.2µs
align 8388608 pre 987µs on 1.04ms post 979µs diff 56.5µs
align 4194304 pre 1.01ms on 1.11ms post 963µs diff 119µs
align 2097152 pre 1.02ms on 1.06ms post 1.01ms diff 43.9µs
align 1048576 pre 1.01ms on 1.06ms post 1.02ms diff 42.7µs
align 524288 pre 1.02ms on 1.11ms post 1.01ms diff 95.5µs
align 262144 pre 1.01ms on 1.05ms post 1.01ms diff 36.6µs
align 131072 pre 1.02ms on 1.05ms post 1.01ms diff 37.9µs
align 65536 pre 1.01ms on 1.05ms post 1.02ms diff 35.8µs
align 32768 pre 1.02ms on 1.05ms post 1.01ms diff 39.4µs
align 16384 pre 1.01ms on 1.05ms post 1.02ms diff 37.4µs
align 8192 pre 1.01ms on 1.05ms post 1.01ms diff 33.1µs
Again the 512K reading is hard to explain.
8K gives a similar distribution:
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk0 --blocksize=8192 --count=20
align 8589934592 pre 921µs on 967µs post 913µs diff 49.5µs
align 4294967296 pre 1.2ms on 1.27ms post 1.19ms diff 69.8µs
align 2147483648 pre 1.2ms on 1.27ms post 1.19ms diff 72.1µs
align 1073741824 pre 1.2ms on 1.27ms post 1.2ms diff 70µs
align 536870912 pre 1.19ms on 1.27ms post 1.19ms diff 76.1µs
align 268435456 pre 1.2ms on 1.26ms post 1.2ms diff 67.7µs
align 134217728 pre 1.2ms on 1.27ms post 1.19ms diff 72.9µs
align 67108864 pre 1.19ms on 1.26ms post 1.19ms diff 69.4µs
align 33554432 pre 1.19ms on 1.26ms post 1.19ms diff 71.1µs
align 16777216 pre 1.18ms on 1.26ms post 1.17ms diff 88.9µs
align 8388608 pre 1.19ms on 1.25ms post 1.19ms diff 58.4µs
align 4194304 pre 1.22ms on 1.3ms post 1.2ms diff 93.3µs
align 2097152 pre 1.23ms on 1.23ms post 1.22ms diff 1.74µs
align 1048576 pre 1.22ms on 1.22ms post 1.22ms diff -4151ns
align 524288 pre 1.23ms on 1.28ms post 1.22ms diff 57.3µs
align 262144 pre 1.22ms on 1.22ms post 1.22ms diff -1387ns
align 131072 pre 1.22ms on 1.22ms post 1.22ms diff 1.01µs
align 65536 pre 1.22ms on 1.21ms post 1.22ms diff -7133ns
align 32768 pre 1.22ms on 1.23ms post 1.22ms diff 5.34µs
align 16384 pre 1.22ms on 1.22ms post 1.22ms diff 211ns
32K
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk0 --blocksize=32768
[sudo] password for beaglebone:
align 8589934592 pre 2.03ms on 2.08ms post 2.02ms diff 56µs
align 4294967296 pre 2.42ms on 2.49ms post 2.42ms diff 73.5µs
align 2147483648 pre 2.4ms on 2.49ms post 2.43ms diff 76.2µs
align 1073741824 pre 2.42ms on 2.49ms post 2.42ms diff 68.7µs
align 536870912 pre 2.42ms on 2.49ms post 2.42ms diff 72.8µs
align 268435456 pre 2.42ms on 2.49ms post 2.42ms diff 72.7µs
align 134217728 pre 2.41ms on 2.49ms post 2.42ms diff 74.6µs
align 67108864 pre 2.45ms on 2.52ms post 2.42ms diff 89.3µs
align 33554432 pre 2.4ms on 2.49ms post 2.43ms diff 75.4µs
align 16777216 pre 2.44ms on 2.57ms post 2.44ms diff 128µs
align 8388608 pre 2.42ms on 2.47ms post 2.42ms diff 56.9µs
align 4194304 pre 2.45ms on 2.54ms post 2.43ms diff 94.3µs
align 2097152 pre 2.45ms on 2.45ms post 2.45ms diff 1.25µs
align 1048576 pre 2.44ms on 2.44ms post 2.45ms diff -3208ns
align 524288 pre 2.45ms on 2.5ms post 2.44ms diff 58.9µs
align 262144 pre 2.45ms on 2.44ms post 2.45ms diff -6505ns
align 131072 pre 2.44ms on 2.45ms post 2.44ms diff 3.37µs
align 65536 pre 2.44ms on 2.44ms post 2.45ms diff -3327ns
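At this point my conclusion is that no page size of the form 8K*N lines up with the measurements.
The SD card's capacity is 62333952*512 bytes. Factoring the sector count: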
beaglebone@beaglebone:~/flashbench$ factor 62333952
62333952: 2 2 2 2 2 2 2 2 2 2 3 103 197
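The factors include a 3. The card may well use TLC flash, where each cell stores 3 bits, so the page size could be a multiple of 3 on top of a 4K base.
103*197 is not a power of two, but that is not a problem. Over-provisioning is used by many vendors: the controller reserves part of the flash for garbage collection and similar work, and those cells are invisible to the user.
Physically the flash page is still 4K*N (N a power of two), but since a TLC cell holds 3 bits, the logical page size becomes 12K*N.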
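The same factorization can be reproduced without the factor tool. A small sketch using plain trial division; the function name prime_factors is my own.

```python
def prime_factors(n):
    """Trial-division factorization, fine for numbers of this size."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        factors.append(n)  # whatever is left is itself prime
    return factors


sectors = 62333952
factors = prime_factors(sectors)
power_of_two = factors.count(2)       # how many times 2 divides the sector count
odd_part = sectors >> power_of_two    # what remains after removing all factors of 2
print(factors)
print(power_of_two, odd_part)
```

The odd part, 3*103*197, is what hints at a size that is not a pure power of two.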
try 12K
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk0 --blocksize=12288 --count=20
align 6442450944 pre 1.4ms on 1.48ms post 1.4ms diff 75.7µs
align 3221225472 pre 1.4ms on 1.49ms post 1.4ms diff 85.1µs
align 1610612736 pre 1.4ms on 1.49ms post 1.4ms diff 92.8µs
align 805306368 pre 1.4ms on 1.49ms post 1.4ms diff 93.7µs
align 402653184 pre 1.4ms on 1.49ms post 1.4ms diff 93.8µs
align 201326592 pre 1.39ms on 1.49ms post 1.4ms diff 94.4µs
align 100663296 pre 1.4ms on 1.49ms post 1.4ms diff 96.1µs
align 50331648 pre 1.4ms on 1.49ms post 1.4ms diff 96.6µs
align 25165824 pre 1.4ms on 1.49ms post 1.4ms diff 89.7µs
align 12582912 pre 1.4ms on 1.66ms post 1.55ms diff 184µs
align 6291456 pre 1.4ms on 1.4ms post 1.4ms diff 1.35µs
align 3145728 pre 1.43ms on 1.47ms post 1.43ms diff 40.7µs
align 1572864 pre 1.43ms on 1.47ms post 1.42ms diff 41.6µs
align 786432 pre 1.43ms on 1.47ms post 1.42ms diff 46.5µs
align 393216 pre 1.43ms on 1.47ms post 1.43ms diff 41.6µs
align 196608 pre 1.43ms on 1.47ms post 1.42ms diff 43.3µs
align 98304 pre 1.43ms on 1.47ms post 1.42ms diff 48µs
align 49152 pre 1.42ms on 1.47ms post 1.42ms diff 42.8µs
align 24576 pre 1.42ms on 1.47ms post 1.43ms diff 41.2µs
Looks good, but there is an abrupt change at 6291456.
24K
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk0 --blocksize=24576 --count=20
align 6442450944 pre 1.65ms on 1.72ms post 1.65ms diff 69.9µs
align 3221225472 pre 2ms on 2.09ms post 2ms diff 88.3µs
align 1610612736 pre 2.01ms on 2.09ms post 2ms diff 87.3µs
align 805306368 pre 2ms on 2.09ms post 2ms diff 88µs
align 402653184 pre 2ms on 2.09ms post 2ms diff 87.6µs
align 201326592 pre 2ms on 2.09ms post 2ms diff 87.4µs
align 100663296 pre 2.01ms on 2.09ms post 2ms diff 88.1µs
align 50331648 pre 2.01ms on 2.09ms post 2ms diff 86µs
align 25165824 pre 2ms on 2.09ms post 2ms diff 89.2µs
align 12582912 pre 2ms on 2.28ms post 2.18ms diff 187µs
align 6291456 pre 2ms on 2ms post 2.01ms diff -660ns
align 3145728 pre 2.03ms on 2.03ms post 2.02ms diff -312ns
align 1572864 pre 2.03ms on 2.03ms post 2.02ms diff 1.23µs
align 786432 pre 2.03ms on 2.04ms post 2.03ms diff 3.9µs
align 393216 pre 2.03ms on 2.03ms post 2.03ms diff 1.68µs
align 196608 pre 2.03ms on 2.03ms post 2.02ms diff 825ns
align 98304 pre 2.03ms on 2.03ms post 2.03ms diff -1041ns
align 49152 pre 2.03ms on 2.03ms post 2.03ms diff -672ns
Much more consistent. 24K appears to align, and the erase block size is 12582912, that is 12M. If 24K aligns with the page, then every 24K*N should align as well.
48K
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk0 --blocksize=49152 --count=20
align 6442450944 pre 2.75ms on 2.83ms post 2.75ms diff 72.3µs
align 3221225472 pre 3.21ms on 3.3ms post 3.21ms diff 89.1µs
align 1610612736 pre 3.21ms on 3.3ms post 3.21ms diff 87µs
align 805306368 pre 3.22ms on 3.3ms post 3.21ms diff 87µs
align 402653184 pre 3.22ms on 3.3ms post 3.21ms diff 86.9µs
align 201326592 pre 3.22ms on 3.3ms post 3.22ms diff 86.3µs
align 100663296 pre 3.21ms on 3.3ms post 3.21ms diff 90.9µs
align 50331648 pre 3.21ms on 3.3ms post 3.21ms diff 89.3µs
align 25165824 pre 3.21ms on 3.3ms post 3.21ms diff 86.9µs
align 12582912 pre 3.21ms on 3.52ms post 3.39ms diff 214µs
align 6291456 pre 3.21ms on 3.21ms post 3.21ms diff 3.22µs
align 3145728 pre 3.24ms on 3.24ms post 3.24ms diff 4.21µs
align 1572864 pre 3.24ms on 3.24ms post 3.24ms diff 4.54µs
align 786432 pre 3.24ms on 3.24ms post 3.24ms diff 2.77µs
align 393216 pre 3.24ms on 3.24ms post 3.24ms diff 2.68µs
align 196608 pre 3.23ms on 3.24ms post 3.24ms diff 5.64µs
align 98304 pre 3.24ms on 3.24ms post 3.24ms diff 7.47µs
96K
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk0 --blocksize=98304 --count=20
align 6442450944 pre 5.16ms on 5.22ms post 5.15ms diff 67.2µs
align 3221225472 pre 5.8ms on 5.89ms post 5.8ms diff 87.6µs
align 1610612736 pre 5.81ms on 5.89ms post 5.8ms diff 87.9µs
align 805306368 pre 5.81ms on 5.89ms post 5.8ms diff 82µs
align 402653184 pre 5.8ms on 5.89ms post 5.8ms diff 91.2µs
align 201326592 pre 5.81ms on 5.89ms post 5.8ms diff 85.9µs
align 100663296 pre 5.81ms on 5.89ms post 5.8ms diff 88.1µs
align 50331648 pre 5.81ms on 5.89ms post 5.8ms diff 82.6µs
align 25165824 pre 5.81ms on 5.89ms post 5.8ms diff 82.2µs
align 12582912 pre 5.81ms on 6.12ms post 6.01ms diff 213µs
align 6291456 pre 5.81ms on 5.81ms post 5.81ms diff -3359ns
align 3145728 pre 5.86ms on 5.86ms post 5.85ms diff 3.84µs
align 1572864 pre 5.86ms on 5.86ms post 5.85ms diff 1.57µs
align 786432 pre 5.86ms on 5.85ms post 5.85ms diff -5169ns
align 393216 pre 5.86ms on 5.86ms post 5.86ms diff -2854ns
align 196608 pre 5.85ms on 5.86ms post 5.86ms diff 661ns
192K
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench -a /dev/mmcblk0 --blocksize=196608 --count=20
[sudo] password for beaglebone:
align 6442450944 pre 9.72ms on 9.79ms post 9.73ms diff 62.9µs
align 3221225472 pre 10.7ms on 10.8ms post 10.7ms diff 87.5µs
align 1610612736 pre 10.7ms on 10.8ms post 10.7ms diff 77.6µs
align 805306368 pre 10.8ms on 10.8ms post 10.7ms diff 76.1µs
align 402653184 pre 10.7ms on 10.8ms post 10.7ms diff 116µs
align 201326592 pre 10.7ms on 10.8ms post 10.7ms diff 87µs
align 100663296 pre 10.7ms on 10.8ms post 10.7ms diff 89.9µs
align 50331648 pre 10.7ms on 10.8ms post 10.7ms diff 93.7µs
align 25165824 pre 10.7ms on 10.8ms post 10.7ms diff 94.8µs
align 12582912 pre 10.7ms on 11.1ms post 10.9ms diff 247µs
align 6291456 pre 10.7ms on 10.7ms post 10.7ms diff -17098ns
align 3145728 pre 10.8ms on 10.8ms post 10.8ms diff 19.5µs
align 1572864 pre 10.9ms on 10.8ms post 10.8ms diff -57897ns
align 786432 pre 10.8ms on 10.8ms post 10.8ms diff -6543ns
align 393216 pre 10.8ms on 10.8ms post 10.8ms diff -31925ns
With 4K*3*N as the page size the results are fairly regular, and the jump always occurs at 12582912 (12MB).
In terms of the TLC cells, that corresponds to an 8K page and a 4M erase block.
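To double-check this reading, the sketch below collects the diff values around the jump from the five runs above (12K, 24K, 48K, 96K and 192K), confirms that the largest diff sits at 12582912 each time, and then divides by 3 for the assumed TLC layout. The dictionary layout and variable names are mine. The numbers are copied from the output, in microseconds.

```python
# diff in microseconds at three alignments, copied from the flashbench runs above
runs = {
    12288:  {25165824: 89.7, 12582912: 184.0, 6291456: 1.35},
    24576:  {25165824: 89.2, 12582912: 187.0, 6291456: -0.66},
    49152:  {25165824: 86.9, 12582912: 214.0, 6291456: 3.22},
    98304:  {25165824: 82.2, 12582912: 213.0, 6291456: -3.359},
    196608: {25165824: 94.8, 12582912: 247.0, 6291456: -17.098},
}

# For each blocksize, the alignment with the largest diff.
peaks = {blocksize: max(diffs, key=diffs.get) for blocksize, diffs in runs.items()}

logical_erase_block = 12582912   # 12M, where every run peaks
logical_page = 24576             # 24K, the blocksize that aligned cleanly
BITS_PER_CELL = 3                # assumption: the card uses TLC flash

physical_erase_block = logical_erase_block // BITS_PER_CELL
physical_page = logical_page // BITS_PER_CELL
print(peaks)
print(physical_page, physical_erase_block)
```

The resulting 4194304 is also the preferred_erase_size that udevadm reported for this card, which fits the 12M figure under the TLC assumption.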
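The shorter read times from 25165824 upward can also be explained. Writes use logical block addresses (LBA), but inside the SD card the flash is not laid out as one linear address space. As mentioned earlier, the card may contain several planes, and blocks on different planes can be accessed at the same time. The card's controller uses the FTL to translate each LBA into the actual physical block address (PBA). When we read two pages that sit in different blocks, the FTL may have mapped those blocks to different planes, and reading two blocks on different planes should take about as long as reading one block on one plane. That is why reads from 25165824 onward are actually faster.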
Now measure the speed:
12K
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench /dev/mmcblk0 --open-au --erasesize=$[12*1024*1024] --blocksize=$[12*1024] --open-au-nr=1
12MiB 13.2M/s
6MiB 13.2M/s
3MiB 12.3M/s
1.5MiB 13M/s
768KiB 12.9M/s
384KiB 13.1M/s
192KiB 13M/s
96KiB 11.6M/s
48KiB 11.3M/s
24KiB 6.53M/s
12KiB 3.47M/s
24K
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench /dev/mmcblk0 --open-au --erasesize=$[12*1024*1024] --blocksize=$[24*1024] --open-au-nr=1
12MiB 12.2M/s
6MiB 13M/s
3MiB 12.7M/s
1.5MiB 12.9M/s
768KiB 13.2M/s
384KiB 13.6M/s
192KiB 13.4M/s
96KiB 12M/s
48KiB 11.7M/s
24KiB 6.23M/s
48K
beaglebone@beaglebone:~/flashbench$ sudo ./flashbench /dev/mmcblk0 --open-au --erasesize=$[12*1024*1024] --blocksize=$[48*1024] --open-au-nr=1
12MiB 12.8M/s
6MiB 12.7M/s
3MiB 12.9M/s
1.5MiB 13.1M/s
768KiB 13.3M/s
384KiB 13.1M/s
192KiB 13.4M/s
96KiB 12.2M/s
48KiB 11.8M/s
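The numbers look about the same, but over repeated runs the 48K write results were the most stable.
To be on the safe side, use 24K as the page size and 12M as the erase block size.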
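When partitioning, start the partition at 12MB so that it is aligned with the 12M erase block.
sudo fdisk -c /dev/mmcblk0
n creates a new partition.
Choose 24576 as the start: the unit is 512 bytes, so 24576 sectors is 12M.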
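The start sector is just the erase block size divided by the sector size. A small sketch that generalizes this to "the first aligned sector at or after a given sector"; the helper name aligned_start_sector is hypothetical.

```python
SECTOR_BYTES = 512
ERASE_BLOCK_BYTES = 12 * 1024 * 1024   # the 12M erase block measured above


def aligned_start_sector(min_sector, align_bytes=ERASE_BLOCK_BYTES):
    """First sector >= min_sector that sits on an align_bytes boundary."""
    sectors_per_unit = align_bytes // SECTOR_BYTES
    units = -(-min_sector // sectors_per_unit)   # ceiling division
    return units * sectors_per_unit


first_start = aligned_start_sector(1)       # first usable aligned start
next_start = aligned_start_sector(24577)    # next boundary after that
print(first_start, next_start)
```

A second partition would follow the same rule and start on the next multiple of 24576 sectors.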
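parted works too. fdisk is rather dated, and its own man page no longer recommends it.
Next, format the partition as ext4, a file system with journaling and RAID support, using: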
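>>sudo mkfs.ext4 -O ^has_journal -E stride=6,stripe-width=3072 -b 4096 -L Fedora14Arm /dev/mmcblk0p1
-O ^has_journal turns journaling off. This reduces writes, but makes file recovery after a crash much harder, so whether to use it is your call.
-b 4096 sets the file system block size to 4K, the smallest unit it reads and writes.
-L sets the volume label.
stride can be thought of as the smallest read unit and stripe-width as the smallest write unit. Both options were designed for RAID arrays, but flash's internal reads and writes resemble RAID closely enough that they can be reused here. Reads should match the page size, 24K, and writes should match one erase block, 12M, to avoid read-modify-write cycles. Counted in 4K blocks: stride = page size / block size = 24K/4K = 6, and stripe-width = 12*1024K/4K = 3072.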
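The two values can be derived for any card once its page and erase block sizes are known. A minimal sketch of that arithmetic; the function name ext4_raid_options is my own, and it only reproduces the calculation used here, not anything mkfs.ext4 does internally.

```python
def ext4_raid_options(page_bytes, erase_block_bytes, fs_block_bytes=4096):
    """Return (stride, stripe_width), both counted in file system blocks."""
    if page_bytes % fs_block_bytes or erase_block_bytes % page_bytes:
        raise ValueError("sizes must be whole multiples of each other")
    stride = page_bytes // fs_block_bytes
    stripe_width = erase_block_bytes // fs_block_bytes
    return stride, stripe_width


# 24K page and 12M erase block, as measured for this SD card.
stride, stripe_width = ext4_raid_options(24 * 1024, 12 * 1024 * 1024)
print(f"-E stride={stride},stripe-width={stripe_width}")
```

For a different card, only the two measured sizes passed to the function need to change.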
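The mount options for this ext4 partition need adjusting too, assuming journaling was already disabled at format time.
>> sudo mount -o data=writeback,noatime,nodiratime /dev/mmcblk0p1 /media/sd
data=writeback means file data is not written to the journal (only metadata would be journaled).
noatime: do not update inode access times.
nodiratime: do not update directory access times (already implied by noatime).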