[deepin exploration] Situations Where Hard Disk Usage and File Size Don't Match
Author: vIann · 2024-06-25 15:13
This post includes content related to both Unix-like systems and Windows. Let's help you find the "missing" hard disk space.
Clusters
The smallest unit a file system allocates on a hard disk is not a byte but a cluster (often called a block on Unix-like file systems), which is a contiguous run of bytes. A file occupies a whole number of clusters. If the last part of a file does not fill a cluster, the remaining space in that cluster is wasted, because another file has to start in a new cluster.
The size of clusters is determined by the file system design and the options chosen during file system creation (formatting). Choosing the right cluster size is complex and can affect both disk space utilization and performance. If clusters are too large, storing files will occupy more disk space, especially for many small files. Typically, the file size we refer to is the size of the file itself, while the disk usage of the file is the size of the clusters it occupies, which is usually larger. Some file managers display both, but some may show only one.
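You can create a file and run the following commands on it to see the difference (test.txt is only a placeholder name). The first prints the disk usage in bytes and the second prints the apparent size. If the file is small enough, the disk usage shown will be exactly one cluster.
du -B 1 test.txt
du --apparent-size -B 1 test.txt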
To change the cluster size, you usually need to recreate the file system (reformat).
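The rounding described above can be sketched in a few lines of Python. This is only an illustration: the 4096-byte default cluster size is an assumption (check your own file system), the function names are made up for this post, and real file systems may deviate because of sparse files, compression, or very small files stored inline.

```python
import os


def allocated_size(file_size: int, cluster_size: int = 4096) -> int:
    """Bytes occupied on disk when only whole clusters can be allocated."""
    if file_size < 0 or cluster_size <= 0:
        raise ValueError("file_size must be >= 0 and cluster_size > 0")
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size


def slack(file_size: int, cluster_size: int = 4096) -> int:
    """Bytes wasted in the last, partly filled cluster."""
    return allocated_size(file_size, cluster_size) - file_size


def sizes_on_disk(path: str) -> tuple[int, int]:
    """Return (apparent size, allocated size) of an existing file.

    st_blocks counts 512-byte units and is only available on
    Unix-like systems, so this helper is not portable to Windows.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

For example, allocated_size(1) gives 4096 and slack(1) gives 4095, which is why a directory full of tiny files can take far more disk space than the sum of the file sizes suggests.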
Hidden Files
Most users already know this one. On Unix-like systems, files whose names begin with a dot (.) are hidden. The convention comes from the two special entries in every directory: a single dot (.) for the directory itself and a double dot (..) for its parent. Some early programs wanted to hide these special entries, so they adopted the simple method of checking whether the first character is a dot. As a side effect, every other name starting with a dot was hidden as well, and the tradition has continued.
In file managers, pressing Ctrl+H usually toggles the visibility of hidden files. Graphical applications often create hidden files or directories to store configuration or temporary files. However, the current recommendation is to keep them under the ~/.local or ~/.config directories to avoid cluttering the home directory.
In Windows, whether a file is hidden has nothing to do with its name. Any file can be marked as hidden through a file attribute stored by file systems such as NTFS. Therefore, when Windows reads Unix-like file systems or vice versa, hidden files may become visible.
In GNU/Linux systems, some applications store cached data in hidden directories. Regular cleaning can save some hard disk space. On Windows, application data is usually stored in the folder specified by the %AppData% variable, but some unprivileged applications may use this path as the installation directory.
If you need to delete hidden files, identify their contents carefully. Deleting some files may cause applications to lose their configuration. Although these files can be regenerated, configurations are important assets for many users.
GNU/Linux offers many intuitive disk usage analysis tools, such as GNOME's baobab, which can help you see which directories occupy the most space.
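If you prefer a script to a graphical tool, the idea can be sketched roughly as follows. This is a minimal sketch, not a replacement for baobab: the function names are made up for this post, it adds up apparent sizes rather than allocated clusters, and it silently skips anything it cannot read.

```python
import os


def tree_size(path: str) -> int:
    """Sum of apparent file sizes under path, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path, onerror=lambda e: None):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def hidden_usage(directory: str) -> list[tuple[str, int]]:
    """List (name, bytes) for dot-prefixed entries, largest first."""
    result = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.name.startswith("."):
                continue  # only names beginning with a dot are hidden
            if entry.is_dir(follow_symlinks=False):
                size = tree_size(entry.path)
            else:
                size = entry.stat(follow_symlinks=False).st_size
            result.append((entry.name, size))
    return sorted(result, key=lambda item: item[1], reverse=True)
```

Calling hidden_usage(os.path.expanduser("~")) lists the hidden entries of your home directory with the largest first, which is usually enough to spot an oversized cache.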
Virtual Memory
This issue is more common on Windows. By default, Windows automatically reserves some space on the system disk for virtual memory (the page file, pagefile.sys), and users can manually adjust how much disk space each partition provides for it. This is why a partition can still show some usage even after all visible files have been deleted.
In addition to virtual memory, enabling Windows hibernation also occupies some hard disk space (up to the size of the RAM). You can toggle hibernation on and off with the following commands run as an administrator in cmd or PowerShell:
powercfg -H ON
powercfg -H OFF
On Unix-like systems, the similar technology is called swap, which can use a swap partition or a swap file. Swap can also hold the hibernation image, so a separate hibernation file is not needed. Since swap is usually configured by the user and swap partitions are more common, relieving disk space pressure by adjusting virtual memory settings is mostly useful on Windows.
You can adjust the mounting scheme of Unix-like systems by modifying /etc/fstab (which also allows changing swap settings). Some graphical partition editors, like GNOME Disks, can help users with these configurations.
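To see whether swap is eating space inside a file system, it helps to separate swap files from swap partitions, because only swap files count toward a partition's used space. The following is a rough sketch that parses text in the layout of Linux's /proc/swaps. It assumes a header line followed by the columns Filename, Type, Size, Used, Priority, with sizes in KiB; the function names are made up for this post.

```python
def parse_proc_swaps(text: str) -> list[dict]:
    """Parse /proc/swaps-style text into one dict per swap area.

    Assumed layout: one header line, then whitespace-separated columns
    Filename, Type, Size, Used, Priority, with Size and Used in KiB.
    """
    areas = []
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or unexpected line
        areas.append({
            "name": fields[0],
            "type": fields[1],  # "partition" or "file"
            "size_bytes": int(fields[2]) * 1024,
            "used_bytes": int(fields[3]) * 1024,
        })
    return areas


def swap_file_bytes(areas: list[dict]) -> int:
    """Space taken inside mounted file systems by swap files."""
    return sum(a["size_bytes"] for a in areas if a["type"] == "file")
```

On a Linux machine you could feed it the real data with parse_proc_swaps(open("/proc/swaps").read()). Swap partitions appear in the result too, but they live outside any mounted file system and therefore do not explain "missing" space on a data partition.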
Lost Files
Sometimes, file system or hardware errors can corrupt the file system. One situation is where the file index is lost, but the file data remains, making the files invisible to any file manager while still occupying hard disk space. Besides recreating the file system (reformatting), file system check tools can be used to recover these files.
On Unix-like systems, we use the fsck command to check and repair file system issues. Recovered lost files are placed in a directory named lost+found. This directory is more a feature of the file system than a regular directory, and if it is deleted, it should be recreated with the mklost+found command rather than with mkdir.
For Microsoft file systems like NTFS and exFAT, GNU/Linux tools may not handle all situations. If you have Windows installed, you can use the chkdsk command to repair them. Recovered files will be placed in a FOUND.000 directory. However, in some versions of Windows, this directory is still invisible in the file explorer (even if hidden files are shown). You can list and operate on these files from the command line, for example:
dir /A:H
Or use the built-in disk cleanup tool to "delete old chkdsk files".
Note that file system repairs are risky operations. Back up important data in advance, and make sure you have enough time and a stable power supply to complete the operation (some hard disks may take several hours). A hasty or interrupted repair may make the situation worse. Read the instructions carefully and proceed with caution before executing these commands.
Power outages, improper shutdowns, and disconnecting storage devices without safely ejecting them can all cause file loss. Modern systems use write caching, so after clicking "safely eject", wait for the system to confirm before unplugging. How well files can be recovered depends on the file system design. Some file systems check for issues automatically, while others require manual commands to reclaim hard disk space.
Conversion Factors
Although this is common knowledge, I still get confused sometimes. The "lost" hard disk space is often a false alarm. Two kinds of units are in use: one based on binary (1 KB = 2^10 B = 1024 B) and one based on decimal (1 KB = 10^3 B = 1000 B). To distinguish them, the binary-based units are now written KiB, MiB, GiB, but mixed usage is still common. When different programs report hard disk space, pay attention to whether the unit is based on 1024 or 1000. 1 TiB is nearly 100 GB more than 1 TB, so a drive sold as 1 TB shows up as only about 931 GiB. Such a "discrepancy" cannot be ignored.
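The arithmetic is easy to check yourself. Below is a small Python sketch that prints the same byte count in both conventions; the function name and the two-decimal formatting are just choices made for this example.

```python
UNITS_BINARY = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
UNITS_DECIMAL = ["B", "kB", "MB", "GB", "TB", "PB"]


def format_size(num_bytes: int, binary: bool = True) -> str:
    """Format a byte count with 1024-based or 1000-based units."""
    base = 1024 if binary else 1000
    units = UNITS_BINARY if binary else UNITS_DECIMAL
    value = float(num_bytes)
    for unit in units[:-1]:
        if value < base:
            return f"{value:.2f} {unit}"
        value /= base  # move up to the next larger unit
    return f"{value:.2f} {units[-1]}"


# A drive sold as "1 TB" holds 10**12 bytes.
print(format_size(10**12, binary=False))          # 1.00 TB
print(format_size(10**12, binary=True))           # 931.32 GiB
# The gap between 1 TiB and 1 TB, expressed in decimal units.
print(format_size(2**40 - 10**12, binary=False))  # 99.51 GB
```

So when one tool reports 1 TB and another reports 931 GiB for the same disk, nothing is missing: both are describing the same 10^12 bytes.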