How many bytes does a file store per character, and how are they stored?

Well, not sure if this is interesting, but I just wanted to share it for people who are not that nerdy or geeky on the inside.

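Anyhow, the file system itself just stores bytes; how many bytes a character takes depends on the encoding. An ASCII character takes one byte, while a UTF-8 character takes between 1 and 4 bytes (ASCII characters still take 1 byte in UTF-8, and the symbols used later in this post take 2).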
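To make the byte counts concrete, here is a minimal Python sketch. The helper name `utf8_byte_count` and the sample characters are my own picks for illustration; it simply encodes one character at a time and counts the resulting bytes.

```python
def utf8_byte_count(ch: str) -> int:
    """Return how many bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# Sample characters written as escapes so the script itself stays plain ASCII.
samples = [
    "p",           # U+0070, plain ASCII letter
    "\u00a9",      # U+00A9, copyright sign
    "\u20ac",      # U+20AC, euro sign
    "\U0001f600",  # U+1F600, an emoji outside the Basic Multilingual Plane
]

sizes = {ch: utf8_byte_count(ch) for ch in samples}

for ch, n in sizes.items():
    print(f"U+{ord(ch):04X}: {n} byte(s)")
```

So the size depends on the character: plain ASCII stays at 1 byte, and code points further up the Unicode range need 2, 3, or 4.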

Specifically, I'm using Mac OS X, so I'm on the HFS+ file system.

Try creating a file with:


$> vim test

In your file, insert the character "p", then save and quit.


If you check the file size with "ls -alth test", you'll notice it is 2B. The character "p" takes one byte, and vim also appends a line feed when it saves the file, which takes another byte, so the total is 2 bytes.
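If you would rather check this without vim, the following Python sketch writes the same content to a temporary file and asks the file system for its size. The helper name `size_after_writing` is made up for this example, and the line feed is written explicitly here, since Python does not add one the way vim does.

```python
import os
import tempfile


def size_after_writing(text: str) -> int:
    """Write text to a temporary file and return the file size in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test")
        # newline="\n" keeps the line feed as a single 0x0A byte on every platform.
        with open(path, "w", encoding="utf-8", newline="\n") as f:
            f.write(text)
        return os.path.getsize(path)


without_newline = size_after_writing("p")
with_newline = size_after_writing("p\n")

print("p alone:          ", without_newline, "byte(s)")
print("p plus line feed: ", with_newline, "byte(s)")
```

The second number matches what "ls" reports for the file saved from vim.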

If you open the file in a hex editor like Hex Fiend, you'll notice on the left side the hex value "70" for a lowercase "p" (an uppercase "P" would be "50"), followed by "0A", so the whole file reads "700A". Hex "0A" is 10 in decimal, which in the ASCII table is the line feed (new line) character. Take a look at the screenshot below.
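The same hex values can be reproduced in a few lines of Python, which may help if you don't have a hex editor at hand. This is only a sketch of what a hex viewer displays; the variable names are mine.

```python
# The file content: one letter followed by a line feed.
data = "p\n".encode("ascii")

# bytes.hex() with a separator needs Python 3.8 or newer.
hex_dump = data.hex(" ").upper()
print(hex_dump)

# Hex 0A converted to decimal, and the character it stands for in ASCII.
line_feed_decimal = int("0A", 16)
print(line_feed_decimal, repr(chr(line_feed_decimal)))

# Lowercase and uppercase letters have different codes.
lower_hex = "p".encode("ascii").hex().upper()
upper_hex = "P".encode("ascii").hex().upper()
print(lower_hex, upper_hex)
```

Note the difference in case: lowercase "p" encodes as 70, uppercase "P" as 50.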



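Now, with UTF-8, characters outside the ASCII range take more than one byte: anywhere from 2 to 4 bytes each. The ones in this example take 2 bytes each. Take a look at the screenshot I have below,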


In the above example, I have added the characters ©, ®, and Æ, in that order. Notice that the file takes 7 bytes: each of the 3 characters takes 2 bytes, plus one byte for the line feed.

If you take a look at the copyright sign and the registered sign, the copyright sign has a hex value of "C2 A9" and the registered sign has a hex value of "C2 AE", so each of these two characters takes 2 bytes. Check http://www.utf8-chartable.de/ to look up the UTF-8 values.
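As a quick cross-check of those values, here is a small Python sketch that encodes the three characters plus the line feed and prints the bytes of each one. The characters are written as escapes so the script stays plain ASCII, and the variable names are only illustrative.

```python
# Copyright sign, registered sign, capital AE, then the line feed.
text = "\u00a9\u00ae\u00c6\n"

encoded = text.encode("utf-8")

# Map each code point to the hex bytes it becomes in UTF-8.
per_char = {
    f"U+{ord(ch):04X}": ch.encode("utf-8").hex(" ").upper()
    for ch in text
}

for code_point, hex_bytes in per_char.items():
    print(code_point, "->", hex_bytes)

print("total:", len(encoded), "bytes")
```

Three 2-byte characters plus the 1-byte line feed give the 7 bytes seen in the file size.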

Hope this helps you understand how many bytes each character takes and how your file system stores the characters in a file.


