Files on a computer are created and placed according to the rules of a file system. Thanks to it, the user can comfortably reach the information they need without thinking about the complex algorithms that provide access to it. How are file systems organized? Which ones are the most popular today? How do the file systems used on PCs differ from one another, and from those used in mobile devices such as smartphones and tablets?

File systems: definition

According to a common definition, a file system is a set of algorithms and standards used to give a PC user efficient access to the data stored on a computer. Some experts consider it part of the operating system. Other IT experts, while acknowledging that it is closely tied to the OS, believe that the file system is an independent component of computer data management.

How were computers used before the file system was invented? For a long time, data was structured and managed by algorithms embedded in each specific program. Thus, one of the defining criteria of a file system is the presence of standards that are the same for most programs that access data.

How file systems work

The file system is, first of all, a mechanism that relies on the computer's hardware resources. As a rule, we are talking about storage media: hard drives, CDs, DVDs, flash drives, and floppy disks, which have not yet disappeared completely. To understand how a file system works, let's first define what a file actually is.

According to the definition generally accepted among IT experts, a file is a data area of a definite size, expressed in the basic units of information, bytes. A file is stored on the disk medium, usually as several linked blocks, each with a specific access "address". The file system determines these coordinates and reports them to the OS, which in turn presents the data to the user in an understandable form. Data is accessed in order to read it, modify it, or create new data. The specific algorithm for working with file "coordinates" can differ. It depends on the type of computer, the OS, the nature of the stored data, and other conditions. This is why there are different kinds of file systems, each optimized for a specific OS or for certain types of data.

Preparing a disk medium for use with the algorithms of a particular file system is called formatting. The disk's allocation units, called clusters, are prepared so that files can later be written to them and read back according to the standards of that file system. How can the file system be changed? In most cases, only by reformatting the storage medium, which as a rule erases the files. However, special programs can sometimes convert a medium to another file system while leaving the files intact, although this usually takes a lot of time.

File systems are not free of errors. Failures can occur in how data blocks are organized, but in most cases they are not critical, and repairing the file system is usually not difficult. Windows, for example, includes built-in tools available to any user, such as the Check Disk utility (chkdsk).
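The idea that a file occupies whole clusters can be illustrated with a little arithmetic. The sketch below is only an illustration: the function names and the 4096-byte default cluster size are assumptions chosen for the example, not values taken from any particular file system.

```python
def clusters_needed(file_size_bytes: int, cluster_size_bytes: int = 4096) -> int:
    """Return how many whole clusters a file of the given size would occupy.

    The 4096-byte default is an assumed, illustrative cluster size.
    """
    if file_size_bytes < 0 or cluster_size_bytes <= 0:
        raise ValueError("sizes must be non-negative and cluster size positive")
    # Integer ceiling division: a partly filled cluster still counts as one.
    return (file_size_bytes + cluster_size_bytes - 1) // cluster_size_bytes


def slack_bytes(file_size_bytes: int, cluster_size_bytes: int = 4096) -> int:
    """Return the unused bytes left in the file's last cluster."""
    used = clusters_needed(file_size_bytes, cluster_size_bytes) * cluster_size_bytes
    return used - file_size_bytes


if __name__ == "__main__":
    print(clusters_needed(10_000))  # a 10,000-byte file with 4096-byte clusters
    print(slack_bytes(10_000))
```

Under these assumptions a 10,000-byte file takes three clusters, and the tail of the third cluster stays unused.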

Varieties

Which file systems are the most common? Probably, first of all, those used by Windows, the world's most popular PC OS. The main Windows file systems are FAT, FAT32, NTFS, and their various modifications. Alongside computers, smartphones and tablets have become popular. Most of them, if we look at the global market and set aside differences between hardware platforms, run Android or iOS. These operating systems use their own algorithms for working with data, which differ from those of the Windows file systems.

Standards open to all

Note that the global electronics market has recently seen some unification of standards for how operating systems work with different types of data. This shows in two ways. First, devices running two dissimilar operating systems often use the same file system, which is equally compatible with both. Second, modern OS versions can, as a rule, recognize not only their own typical file systems but also those traditionally used by other operating systems, either through built-in algorithms or with the help of third-party software. For example, modern versions of Linux generally recognize media formatted with Windows file systems without problems.

File system structure

Although there are quite a few types of file systems, they generally work on very similar principles (we outlined the general scheme above) and with similar structural elements, or objects. What are the main objects of a file system?

One of the key ones is the directory: an isolated data area in which files can be placed. The directory structure is hierarchical. What does that mean? One or more directories can be placed inside another, which in turn is part of a higher-level directory. The topmost is the root directory. In Windows (7, 8, XP, or another version), the root directory is the root of a logical drive, denoted by a letter, usually C, D, or E (though any letter of the English alphabet can be configured). In Linux, by contrast, the storage as a whole is presented under a single root directory. Linux and operating systems based on its principles, such as Android, do not use logical drives. Is it possible to store files without directories? Yes, but it is not very convenient. In fact, user convenience is one of the reasons file systems distribute data into directories. Directories go by different names. On Windows they are called folders, and the same word is often used on Linux. But the traditional name, used in Linux for many years, is "directories", as it was in the predecessors of Windows and Linux, DOS and Unix.

IT professionals do not agree on whether a file should be considered a structural element of a file system. Those who say it should not argue that a file system can exist without files, even if such a system is useless in practice. Even if no files are written to a disk, the file system may still be present. As a rule, storage media sold in stores contain no files, but they already have a file system. According to the other point of view, files should be considered an integral part of the systems that manage them. Why? Because the algorithms of these systems are designed primarily to work with files within certain standards, and they are not intended for anything else.

Another element present in most file systems is the shortcut: a data area containing information about where a particular file is located. That is, you can place a shortcut in one place on the disk and use it to reach the desired data area, which is located in another part of the medium. Shortcuts can be considered full-fledged objects of the file system if we agree that files are too.
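The hierarchical arrangement of directories described above can be seen by building a small tree and walking it. This is a minimal sketch using Python's standard library; the directory and file names (Documents, Music, Dance.mp3, notes.txt) and the helper names are made up for illustration.

```python
import os
import tempfile
from pathlib import Path


def build_example(root: Path) -> None:
    """Create a small illustrative hierarchy under root (names are hypothetical)."""
    (root / "Documents" / "Music").mkdir(parents=True)
    (root / "Documents" / "Music" / "Dance.mp3").write_bytes(b"")
    (root / "Documents" / "notes.txt").write_text("hello")


def list_tree(root: Path) -> list[str]:
    """Return every directory and file under root as a sorted relative path."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            entries.append(Path(dirpath, name).relative_to(root).as_posix())
    return sorted(entries)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_example(Path(tmp))
        for entry in list_tree(Path(tmp)):
            print(entry)
```

Each printed path shows the chain of directories that leads from the temporary root down to an entry, which is the nesting the text calls a hierarchy.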

One way or another, it will not be a mistake to say that all three types of data - files, shortcuts and directories - are elements of their respective systems. At least this thesis will correspond to one of the common points of view. The most important aspect that characterizes how the file system works is the principles of naming files and directories.

File and directory names on different systems

If we agree that files are constituent elements of file systems, it is worth considering their basic structure. To make access convenient, most modern file systems give file names a two-part structure. The first part is the name. The second is the extension. Take the music file Dance.mp3 as an example: Dance is the name and mp3 is the extension. The name tells the user what the file contains (and gives programs a key for quick access). The extension indicates the file type. If it is mp3, it is easy to guess that the file contains music. Files with the doc extension are, as a rule, documents; jpg files are pictures; html files are web pages.

Directories, in turn, have a single-part name: just a name, with no extension. When comparing different file systems, the first thing to look at is their rules for naming files and directories. For Windows, the specifics are as follows. Files can be named in any language, but the maximum length is limited. The exact limit depends on the file system used; usually it falls in the range of 200-260 characters.

A general rule for operating systems and their file systems is that two files with the same name cannot be located in the same directory. Linux relaxes this rule somewhat: the same directory can contain files whose names use the same letters in different case, for example Dance.mp3 and DANCE.mp3. This is not possible on Windows. The same rules apply to directories placed inside other directories.
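The split into a name and an extension can be reproduced with Python's pathlib. This is a small sketch; the helper name split_name is an assumption made for the example.

```python
from pathlib import PurePath


def split_name(filename: str) -> tuple[str, str]:
    """Split a file name into (name, extension) at the last dot."""
    path = PurePath(filename)
    # suffix includes the leading dot (".mp3"), so strip it for display.
    return path.stem, path.suffix.lstrip(".")


if __name__ == "__main__":
    print(split_name("Dance.mp3"))
    print(split_name("README"))  # a file may have no extension at all
```

Only the part after the last dot is treated as the extension, so a name such as archive.tar.gz yields the extension gz.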
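The difference in how letter case is treated can be modelled with pathlib's pure path classes, which compare names using Windows-style or POSIX-style rules without touching a real disk. This is only a model of the naming rules: on a real machine the behaviour also depends on the file system in use. The helper names are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def same_name_on_windows(first: str, second: str) -> bool:
    """Compare two names using Windows-style (case-insensitive) rules."""
    return PureWindowsPath(first) == PureWindowsPath(second)


def same_name_on_posix(first: str, second: str) -> bool:
    """Compare two names using POSIX-style (case-sensitive) rules."""
    return PurePosixPath(first) == PurePosixPath(second)


if __name__ == "__main__":
    print(same_name_on_windows("Dance.mp3", "DANCE.mp3"))
    print(same_name_on_posix("Dance.mp3", "DANCE.mp3"))
```

Under Windows-style rules the two spellings count as one name, so they could not coexist in a directory; under POSIX-style rules they are different names.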

Addressing files and directories

Addressing of files and directories is one of the most important elements of a file system. On Windows, the user-facing format might look like this: C:\Documents\Music\ gives access to the Music directory. If we are interested in a particular file, the address may look like this: C:\Documents\Music\Dance.mp3. Why "user-facing"? Because at the level of software-hardware interaction, the structure of file access is much more complex. The file system determines the location of file blocks and interacts with the OS mostly through operations hidden from the user. However, a PC user rarely needs any other "address" format. Access to files almost always uses the format shown.
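A Windows-style address can be taken apart into the drive, the chain of directories, and the final component. The sketch below uses pathlib.PureWindowsPath, which parses such addresses on any platform and accepts both forward slashes and backslashes as separators; the function name describe_address is an assumption for illustration.

```python
from pathlib import PureWindowsPath


def describe_address(address: str) -> dict:
    """Break a Windows-style address into drive, directories, and last component."""
    path = PureWindowsPath(address)
    return {
        "drive": path.drive,
        # parts[0] is the drive root and parts[-1] is the last component.
        "directories": list(path.parts[1:-1]),
        "last_component": path.name,
    }


if __name__ == "__main__":
    print(describe_address("C:/Documents/Music/Dance.mp3"))
    print(str(PureWindowsPath("C:/Documents/Music/Dance.mp3")))
```

The second print shows the same address rendered with backslashes, the separator Windows itself displays.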

Comparison of file systems for Windows

We have studied the general principles of how file systems work. Now consider the features of the most common ones. The file systems most commonly used on Windows are FAT, FAT32, NTFS, and exFAT. The first in this list is considered obsolete. For a long time it was a kind of industry flagship, but as PC technology advanced, its capabilities stopped meeting user needs and software requirements.

The file system designed to replace FAT is FAT32. According to many IT experts, it is now the most popular in the Windows PC market. It is most often used for storing files on hard drives and flash drives. It is also used quite regularly on the memory cards of various digital devices, such as phones and cameras. The main advantage of FAT32 highlighted by IT experts is that, although this file system was created by Microsoft, most modern operating systems, including those installed on such digital devices, can work with data stored in it.

FAT32 also has a number of disadvantages. First of all, the size of a single file is limited: it cannot exceed 4 GB. In addition, the built-in Windows tools cannot format a logical drive larger than 32 GB as FAT32, although this can be done by installing additional specialized software.

Another popular file system developed by Microsoft is NTFS. According to some IT experts, it is superior to FAT32 in most respects, but this holds when the computer runs Windows. NTFS is not as versatile as FAT32, and its design makes it not always convenient to use, in particular on mobile devices. One of the key advantages of NTFS is reliability. For example, if the power to a hard drive goes out suddenly, the chance of files being corrupted is minimized thanks to NTFS's algorithms for duplicating data-access operations.

One of the latest file systems from Microsoft is exFAT. It is best suited to flash drives. Its basic principles are the same as in FAT32, but some aspects are significantly improved: for example, there is no practical restriction on the size of a single file. At the same time, as many IT experts note, exFAT is among the less versatile file systems. On computers running operating systems other than Windows, working with files on exFAT media can be difficult. Moreover, even some versions of Windows itself, such as XP, may be unable to read disks formatted as exFAT without an additional driver.
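The single-file limit can be expressed as a simple check. FAT32 records a file's size in a 32-bit field, so the exact ceiling is 4 GiB minus one byte; the constant and function names below are chosen for this sketch.

```python
# FAT32 stores the file size in a 32-bit field, so the largest
# representable size is 2**32 - 1 bytes (4 GiB minus one byte).
FAT32_MAX_FILE_BYTES = 2**32 - 1


def fits_on_fat32(file_size_bytes: int) -> bool:
    """Return True if a single file of this size can be stored on FAT32."""
    return 0 <= file_size_bytes <= FAT32_MAX_FILE_BYTES


if __name__ == "__main__":
    print(fits_on_fat32(700 * 1024**2))  # a 700 MiB file
    print(fits_on_fat32(5 * 1024**3))    # a 5 GiB file
```

A check like this is useful before copying a large video or disk image to a FAT32 flash drive, where the copy would otherwise fail partway through.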

Note that because Windows uses a fairly wide range of file systems, the user may occasionally run into compatibility problems between various devices and the computer. In some cases, for example, a WPD driver must be installed (WPD stands for Windows Portable Devices, a technology used for working with portable devices). Sometimes the user does not have it at hand, and as a result the OS may not recognize the external media. WPD may also require additional software to adapt to the operating environment of a particular computer. In some cases, the user will have to ask IT specialists to solve the problem.

How do you determine which file system (exFAT, NTFS, or perhaps FAT32) is optimal in a specific case? IT specialists generally recommend two main approaches. The first distinguishes file systems typical of hard drives from those better adapted to flash drives. According to many experts, FAT and FAT32 are better suited to flash drives, and NTFS to hard drives (because of the technical features of how each works with data).

In the second approach, the size of the medium matters. A relatively small disk or flash drive can be formatted as FAT32. If the medium is larger, you can try exFAT, but only if you do not intend to use it on other computers, especially those with older versions of Windows. Large hard drives, including external ones, are best formatted as NTFS. These are roughly the criteria for choosing among exFAT, NTFS, and FAT32: take into account the size of the medium, its type, and the version of the OS on which the drive is mainly used.

File systems for Mac

Another popular hardware and software platform in the global computer market is Apple's Macintosh. PCs of this line run the MacOS operating system. What are the features of file organization on Mac computers? Apple's most modern PCs use the Mac OS Extended file system. Earlier Mac computers managed data according to the HFS standard.

The main thing to note about Mac OS Extended is that a disk managed by it can hold very large files, up to several million terabytes.

File system in Android devices

The most popular operating system for mobile devices, a class of electronics no less popular than the PC, is Android. How are files managed on such devices? First of all, note that this operating system is essentially a "mobile" adaptation of Linux, which, thanks to its open source code, can be modified for use on the widest range of devices. Therefore, file management on Android devices generally follows the same principles as in Linux. We noted some of them above. In particular, Linux manages files without dividing the medium into logical drives, as Windows does. What else is interesting about the Android file system?

In Android, storage is typically reached through a directory called /mnt. Accordingly, the address of a file may look something like this: /mnt/sd/photo.jpg. This mobile OS has another feature. The device's flash memory is usually divided into several partitions, such as System and Data, and the initially set size of each cannot be changed. A rough analogy: in Windows, the size of logical drives cannot be changed either, unless you use special software.

Another interesting feature of how Android works with files is that the operating system, as a rule, writes new data to a specific area of the storage, the Data partition. The System partition, for example, is not written to. So when the user resets a smartphone or tablet to "factory" settings, in practice this means that the files written to the Data area are simply erased. The System partition, as a rule, remains unchanged. Moreover, without specialized software the user cannot change the contents of System. The procedure for updating the system area of an Android device is called flashing. This is not formatting, although the two operations are often performed together. As a rule, flashing is used to install a newer version of Android on a mobile device.

Thus, the key principles of the Android file system are the absence of logical drives and a strict separation of access to system and user data. This approach is not fundamentally different from what Windows implements; however, according to many IT experts, Microsoft's OS gives users somewhat greater freedom in working with files. As some experts believe, though, this cannot be considered a clear advantage of Windows. The "liberal" regime of file management benefits not only users but also computer viruses, to which Windows is very susceptible (unlike Linux and its "mobile" implementation, Android). This, according to experts, is one of the reasons why there are so few viruses for Android devices: from a purely technical point of view, they cannot fully function in an environment built on strict file access control.

The ability of the OS to "shield" the user from the complexities of real hardware shows very clearly in one of its main subsystems, the file system. The operating system virtualizes a separate set of data stored on an external drive as a file: a simple unstructured sequence of bytes that has a symbolic name. For convenience, files are grouped into directories, which in turn form groups, directories of a higher level. The user can use the OS to perform actions on files and directories such as searching by name, deleting, displaying content on an external device (for example, on a display), and changing and saving content.

To present a large number of data sets, scattered randomly across the cylinders and surfaces of disks of various types, as the familiar and convenient hierarchical structure of files and directories, the operating system must solve many problems. The OS file system converts the symbolic names of files that the user or application programmer works with into physical data addresses on the disk, organizes shared access to files, and protects them from unauthorized access.

When performing its functions, the file system closely interacts with the external device management subsystem, which, at the request of the file system, transfers data between disks and RAM.

The external device control subsystem, also called the input-output subsystem, acts as an interface to all devices connected to the computer. The range of these devices is very extensive. The product range of hard drives, floppy drives, optical drives, printers, scanners, monitors, plotters, modems, network adapters, and more specialized I/O devices such as A/D converters can run into hundreds of models. These models can differ significantly in the set and sequence of commands used to exchange information with the processor and memory of the computer, the speed of operation, the encoding of the transmitted data, the possibility of sharing, and many other details.

The program that controls a specific model of external device and takes into account all its features is usually called the driver of that device (from the English drive - to manage, to lead). A driver can control a single device model, such as ZyXEL's U-1496E modem, or a group of devices of a particular type, such as any Hayes-compatible modem. It is very important for the user that the operating system includes as many different drivers as possible, as this guarantees the ability to connect a large number of external devices from various manufacturers to the computer. The success of an operating system in the market largely depends on the availability of suitable drivers (for example, the lack of many necessary external device drivers was one of the reasons for the low popularity of OS/2).



The creation of device drivers is carried out both by the developers of a particular OS and by specialists from companies that produce external devices. The operating system must maintain a well-defined interface between the drivers and the rest of the OS so that developers from I/O device companies can ship drivers for that operating system with their devices.

Application programmers can use the driver interface when developing their programs, but this is not very convenient: such an interface usually consists of low-level operations burdened with a lot of details.

Maintaining a high-level unified application programming interface to heterogeneous I/O devices is one of the most important tasks of the OS. Since the advent of UNIX, this unified interface in most operating systems has been based on the concept of file access. The idea is that an exchange with any external device looks like an exchange with a file that has a name and is an unstructured sequence of bytes. The file can be a real file on a disk, or it can be an alphanumeric terminal, a printer, or a network adapter. Here again we are dealing with the ability of the operating system to replace real hardware with user-friendly and programmer-friendly abstractions.

OS tasks for managing files and devices

The Input-Output Subsystem of a multiprogram OS, when exchanging data with external devices of a computer, must solve a number of general tasks, of which the most important are the following:

Organization of parallel operation of input-output devices and processor;

Coordination of exchange rates and data caching;

Sharing of devices and data between processes;

Providing a convenient logical interface between devices and the rest of the system;

Support for a wide range of drivers with the ability to easily include a new driver in the system;

Support for multiple file systems;

Support for synchronous and asynchronous I/O operations.

One of the main tasks of the operating system is to provide convenience to the user when working with data stored on disks. To do this, the OS replaces the physical structure of the stored data with a user-friendly logical model. The logical model of the file system materializes in the form of a directory tree, displayed by utilities such as Norton Commander or Windows Explorer, in symbolic compound file names, and in file commands. The basic element of this model is the file, which, like the file system as a whole, can be characterized by both a logical and a physical structure.

A file is a named area of external memory that can be written to and read from. Files are stored in power-independent (non-volatile) memory, usually on magnetic disks. However, there are exceptions. One of them is the so-called RAM disk, a structure created in RAM that mimics a file system.

The main purposes of using the file:

Long-term and reliable storage of information. Longevity is achieved through the use of storage devices that do not depend on power, and high reliability is determined by the means of protecting access to files and the general organization of the OS program code, in which hardware failures most often do not destroy information stored in files.

Sharing information. Files provide a natural and easy way to share information between applications and users, because they have a human-readable symbolic name and because both the stored information and the location of the file are persistent. The user must have convenient tools for working with files, including directories that combine files into groups, tools for searching files by their attributes, and a set of commands for creating, modifying, and deleting files. A file can be created by one user and then used by a completely different user, and the file's creator or an administrator can define other users' access rights to it. These goals are implemented in the OS by the file system.

A file system (FS) is a part of the operating system that includes:

The collection of all files on a disk;

Sets of data structures used to manage files, such as file directories, file descriptors, free and used disk space allocation tables;

A set of system software tools that implement various operations on files, such as creating, deleting, reading, writing, naming and searching for files.

The file system allows programs to get by with a set of fairly simple operations to perform actions on some abstract object that represents a file. In doing so, programmers do not have to deal with the details of the actual location of data on disk, data buffering, and other low-level problems of transferring data from long-term storage. All these functions are performed by the file system. The file system allocates disk space, supports file naming, maps file names to corresponding addresses in external memory, provides access to data, and supports file sharing, protection, and recovery.
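The mapping from file names to storage addresses described above can be sketched with a toy in-memory model. Everything here is hypothetical and invented for illustration: the class name ToyFileSystem, the 16 blocks of 4 bytes, and the first-free allocation policy do not correspond to any real file system.

```python
class ToyFileSystem:
    """Hypothetical in-memory model: a 'disk' of fixed-size blocks plus a name table."""

    def __init__(self, block_count: int = 16, block_size: int = 4):
        self.block_size = block_size
        self.blocks = [b""] * block_count      # the "disk"
        self.free = list(range(block_count))   # free-space table
        self.table = {}                        # file name -> list of block numbers

    def write(self, name: str, data: bytes) -> None:
        """Store data under name, allocating as many blocks as needed."""
        chunks = [data[i:i + self.block_size]
                  for i in range(0, len(data), self.block_size)]
        # Blocks of an existing file with this name can be reused.
        available = len(self.free) + len(self.table.get(name, []))
        if len(chunks) > available:
            raise OSError("no space left on the toy disk")
        if name in self.table:
            self.delete(name)
        allocated = [self.free.pop(0) for _ in chunks]
        for block_no, chunk in zip(allocated, chunks):
            self.blocks[block_no] = chunk
        self.table[name] = allocated

    def read(self, name: str) -> bytes:
        """Follow the name table to the file's blocks and join them in order."""
        return b"".join(self.blocks[n] for n in self.table[name])

    def delete(self, name: str) -> None:
        """Remove the name and return its blocks to the free-space table."""
        for block_no in self.table.pop(name):
            self.blocks[block_no] = b""
            self.free.append(block_no)


if __name__ == "__main__":
    fs = ToyFileSystem()
    fs.write("a.txt", b"hello world")
    print(fs.table["a.txt"], fs.read("a.txt"))
```

A program using this model calls only write, read, and delete with a symbolic name; which blocks hold the data is decided and remembered entirely inside the class, which is the shielding role the text attributes to the file system.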

Thus, the file system plays the role of an intermediate layer that shields all the complexities of the physical organization of long-term data storage, and provides programs with a simpler logical model of this storage, as well as providing them with a set of easy-to-use commands for manipulating files.

The tasks solved by the FS depend on the way the computing process is organized as a whole. The simplest type is the FS in single-user and single-program operating systems, which include, for example, MS-DOS. The main functions in such a FS are aimed at solving the following tasks:

File naming;

Programming interface for applications;

Mapping the logical model of the file system to the physical organization of the data storage;

File system resilience to power failures, hardware and software errors.

FS tasks become more complicated in single-user multiprogram operating systems, which, although designed for one user, allow that user to run several processes simultaneously. One of the first operating systems of this type was OS/2. In addition to the tasks listed above, a new one is added: sharing a file among several processes. The file in this case is a shared resource, which means that the file system must solve the whole complex of problems associated with such resources. In particular, the FS should provide means for locking a file and its parts, preventing races, eliminating deadlocks, keeping copies consistent, and so on.

In multi-user systems, another task appears: protecting the files of one user from unauthorized access by another user. Even more complex are the functions of a file system that operates as part of a network operating system.

File systems support several functionally different file types, which typically include regular files, directory files, special files, named pipes, memory-mapped files, and others.

Regular files, or simply files, contain information of an arbitrary nature that the user enters into them or that is produced by system and user programs. Most modern operating systems (e.g., UNIX, Windows, OS/2) do not restrict or control the contents and structure of a regular file in any way. The content of a regular file is determined by the application that works with it. For example, a text editor creates text files consisting of strings of characters represented in some code. These can be documents, program source code, and so on. Text files can be read on the screen and printed on a printer. Binary files do not use character codes and often have a complex internal structure, such as executable program code or an archive file. Every operating system must be able to recognize at least one type of file: its own executable files.

Directories are a special type of file containing system reference information about a set of files grouped by users according to some informal criterion (for example, files containing the documents of one agreement, or files that make up one software package, are combined into one group). On many operating systems, a directory can contain any type of file, including other directories, resulting in a tree structure that is easy to search. Directories map file names to the characteristics used by the file system to manage files. Such characteristics include, in particular, information (or a pointer to another structure containing this data) about the type of file and its location on disk, access rights to the file, and the dates of its creation and modification. In all other respects, directories are treated like normal files by the file system.
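The characteristics that a directory associates with each name can be inspected from a program. The sketch below uses os.scandir and the stat module from Python's standard library; the function name describe_entries and the example entry names are assumptions made for illustration.

```python
import os
import stat
import tempfile
from datetime import datetime, timezone


def describe_entries(directory: str) -> list[dict]:
    """Return the name and basic characteristics of each entry in a directory."""
    described = []
    with os.scandir(directory) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            info = entry.stat(follow_symlinks=False)
            described.append({
                "name": entry.name,
                "is_directory": entry.is_dir(follow_symlinks=False),
                "size_bytes": info.st_size,
                # Render the type and permission bits in the style of "ls -l".
                "access_rights": stat.filemode(info.st_mode),
                "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
            })
    return described


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "contracts"))
        with open(os.path.join(tmp, "notes.txt"), "w") as f:
            f.write("hello")
        for row in describe_entries(tmp):
            print(row)
```

Which characteristics are available, and how access rights are represented, differs between operating systems, so the output is best read as an example rather than a fixed format.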

Special Files are dummy files associated with I/O devices that are used to unify the mechanism for accessing files and external devices. Special files allow the user to perform I/O operations through normal file write or file read commands. These commands are first processed by the file system programs, and then, at some stage of the request, are converted by the operating system into commands to control the corresponding device.
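A simple way to see a special file in action is the null device, which most operating systems expose under a file name. The sketch below opens it with the same calls used for ordinary files; os.devnull is the standard-library constant holding its platform-specific name, and the two helper names are hypothetical.

```python
import os


def write_to_null_device(payload: bytes) -> int:
    """Write bytes to the null device with ordinary file calls; the data is discarded."""
    with open(os.devnull, "wb") as device:
        return device.write(payload)


def read_from_null_device() -> bytes:
    """Read from the null device; it always behaves like an empty file."""
    with open(os.devnull, "rb") as device:
        return device.read()


if __name__ == "__main__":
    print(write_to_null_device(b"hello"))
    print(read_from_null_device())
```

The program never issues a device-specific command: the same open, write, and read calls would work on a disk file, and the operating system routes them to the device.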

Modern file systems also support other types of files, such as symbolic links, named pipes, and memory-mapped files.

Users access files by symbolic names. However, the capacity of human memory limits the number of object names that a user can refer to by name. The hierarchical organization of the namespace allows you to significantly expand these boundaries. This is why most file systems have a hierarchical structure in which levels are created by allowing a lower-level directory to be contained within a higher-level directory (Figure 2.16).

Figure 2.16. Hierarchy of file systems (a - single-level structure, b - tree structure, c - network structure)

A graph describing a directory hierarchy can be a tree or a network. Directories form a tree if the file is allowed to enter only one directory (Figure 2.16, b), and a network - if the file can enter several directories at once (Figure 2.16, c). For example, in MS-DOS and Windows, directories form a tree structure, while in UNIX they form a network structure. In a tree structure, each file is a leaf. The top level directory is called root directory, or the root.
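One way a file can belong to several directories at once is a hard link, which UNIX-like systems support. The sketch below is an illustration under the assumption that the temporary directory lives on a file system that supports hard links; the directory and file names are made up.

```python
import os
import tempfile


def link_count_demo() -> tuple[int, bytes]:
    """Enter one file into two directories and return (link count, content via alias)."""
    with tempfile.TemporaryDirectory() as tmp:
        dir_a = os.path.join(tmp, "a")
        dir_b = os.path.join(tmp, "b")
        os.mkdir(dir_a)
        os.mkdir(dir_b)
        original = os.path.join(dir_a, "report.txt")
        alias = os.path.join(dir_b, "report-alias.txt")
        with open(original, "wb") as f:
            f.write(b"data")
        # Create a second directory entry that refers to the same file.
        os.link(original, alias)
        with open(alias, "rb") as f:
            content = f.read()
        return os.stat(original).st_nlink, content


if __name__ == "__main__":
    print(link_count_demo())
```

After the link is created, the file system reports two directory entries for the same data, which is what turns the directory graph from a tree into a network.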

With this organization, the user does not have to remember the names of all files; it is enough to have a rough idea of which group a file belongs to in order to find it by browsing directories one after another. The hierarchical structure is convenient for multi-user work: each user's files live in that user's own directory or directory subtree, and at the same time all files in the system remain logically connected.

A special case of a hierarchical structure is a single-level organization, when all files are included in one directory (Figure 2.16, a).

All file types have symbolic names. Three types of filenames are commonly used in hierarchically organized file systems: simple, compound, and relative.

A simple, or short, symbolic name identifies a file within a single directory. Simple names are assigned to files by users and programmers, who must take into account the OS restrictions on both the permitted characters and the length of the name. Until relatively recently, these limits were very narrow. In the popular FAT file system, names were limited by the 8.3 scheme (8 characters for the name itself, 3 characters for the extension), and in the s5 file system, supported by many versions of UNIX, a simple symbolic name could not contain more than 14 characters. However, long names are much more convenient for the user, because they make it possible to give files easy-to-remember names that clearly say what the file contains. Therefore, modern file systems, as well as improved versions of older ones, tend to support long simple symbolic names. For example, in the NTFS and FAT32 file systems supported by operating systems of the Windows NT family, a file name can be up to 255 characters long.

In hierarchical file systems, different files are allowed to have the same simple symbolic name, provided they belong to different directories. That is, the “many files - one simple name” scheme works here. To uniquely identify a file in such systems, the so-called full name is used.

A full name is a chain of the simple symbolic names of all directories on the path from the root to the given file. The full name is thus a compound name, in which the simple names are separated by the delimiter adopted in the OS. A forward slash or backslash is often used as the separator, and the name of the root directory is customarily omitted. In Figure 2.16, b, two files have the simple name main.exe, but their compound names /depart/main.exe and /user/anna/main.exe are different.

In a tree file system, there is a one-to-one correspondence "one file - one full name" between a file and its full name. In file systems that have a network structure, a file can be included in several directories, and therefore have several full names; here the correspondence "one file - many full names" is valid. In both cases, the file is uniquely identified by its full name.

A file can also be identified by a relative name. A relative file name is defined through the concept of the "current directory". At any moment, one directory of the file system is current for each user, and the user selects it with an OS command. The file system keeps track of the name of the current directory so that it can combine it with relative names to form a full file name. When using relative names, the user identifies a file by the chain of directory names along the route from the current directory to the file. For example, if the current directory is /user, then the relative name of the file /user/anna/main.exe is anna/main.exe.
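The relationship between the current directory, a relative name and a full name can be sketched in a few lines of Python. This is only an illustration that assumes UNIX-style names with a forward slash as the separator; the helper names full_name and relative_name are hypothetical and exist only in this example.

```python
import posixpath


def full_name(current_dir: str, relative: str) -> str:
    """Combine the current directory with a relative name into a full name."""
    return posixpath.normpath(posixpath.join(current_dir, relative))


def relative_name(current_dir: str, full: str) -> str:
    """Express a full name relative to the current directory."""
    return posixpath.relpath(full, start=current_dir)


print(full_name("/user", "anna/main.exe"))
print(relative_name("/user", "/user/anna/main.exe"))
```

With /user as the current directory, the two calls reproduce the example from the text in both directions.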

Some operating systems allow you to give the same file multiple simple names that can be interpreted as aliases. In this case, just as in a system with a network structure, the correspondence is "one file - many full names", since each simple file name corresponds to at least one full name.

Although a fully qualified name uniquely identifies a file, it is easier for an operating system to work with a file if there is a one-to-one correspondence between files and their names. To this end, it assigns a unique name to the file, so that the relationship "one file - one unique name" is valid. The unique name exists along with one or more symbolic names assigned to the file by users or applications. The unique name is a numeric identifier and is intended only for the operating system. An example of such a unique filename is the inode number on a UNIX system.

The concept of a "file" includes not only the data it stores and its name, but also its attributes. Attributes are information that describes the properties of a file. Examples of possible file attributes:

File type (regular file, directory, special file, etc.);

File owner;

File creator;

Password to access the file;

Information about allowed file access operations;

Times of creation, last access and last modification;

Current file size;

Maximum file size;

Read-only flag;

"Hidden file" flag;

"System file" flag;

"Archive file" flag;

"Binary/character" flag;

"Temporary" flag (delete after the process is completed);

Lock flag;

The length of the record in the file;

A pointer to the key field in a record;

Key length.

The set of file attributes is determined by the specifics of the file system: different types of file systems may use different sets of attributes to characterize files. For example, file systems that support only unstructured (flat) files have no need for the last three attributes in the list above, which relate to file structuring. In a single-user OS, the attribute set will lack user and security characteristics such as the file owner, the file creator, the access password, and the information about permitted access operations.

The user can access attributes using the facilities the file system provides for this purpose. Usually the values of any attribute may be read, but only some may be changed. For example, a user can change the access rights to a file (given the necessary authority), but is not allowed to change the creation date or the current size of the file.

File attribute values can be stored directly in directories, as is done in the MS-DOS file system (Figure 2.17, a). The figure shows the structure of a directory entry containing a simple symbolic name and the file attributes. Here the letters denote file flags: R - read-only, A - archive, H - hidden, S - system.

Figure 2.17. Directory structure: a - MS-DOS directory entry structure (32 bytes), b - UNIX OS directory entry structure
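A 32-byte directory entry of this kind can be decoded with Python's struct module. The sketch below assumes the classic FAT field layout (8-byte name, 3-byte extension, one attribute byte, 10 reserved bytes, time, date, first cluster, size) and the usual attribute bit values; the sample entry is made up for the example.

```python
import struct

# Assumed classic FAT layout: name, extension, attribute byte,
# 10 reserved bytes, time, date, first cluster, file size (little-endian).
ENTRY = struct.Struct("<8s3sB10xHHHI")

# Attribute bits and the letters used for them in the figure.
ATTR_FLAGS = {0x01: "R", 0x02: "H", 0x04: "S", 0x20: "A"}


def parse_entry(raw: bytes) -> dict:
    """Decode one 32-byte directory entry into a dictionary."""
    name, ext, attr, _time, _date, first_cluster, size = ENTRY.unpack(raw)
    flags = "".join(letter for bit, letter in ATTR_FLAGS.items() if attr & bit)
    return {
        "name": name.decode("ascii").rstrip(),
        "ext": ext.decode("ascii").rstrip(),
        "flags": flags,
        "first_cluster": first_cluster,
        "size": size,
    }


# A made-up entry: MAIN.EXE, read-only + archive, 2560 bytes, cluster 5.
sample = ENTRY.pack(b"MAIN    ", b"EXE", 0x21, 0, 0, 5, 2560)
print(parse_entry(sample))
```

Because the attributes sit inside the entry itself, reading a directory is enough to learn everything the file system records about each file.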

Another option is to place the attributes in special tables, so that directories contain only references to these tables. This approach is implemented, for example, in the UNIX ufs file system. In this file system, the directory structure is very simple. The entry for each file contains a short symbolic file name and a pointer to the file's inode - the name ufs gives to the table that holds the values of the file's attributes (Figure 2.17, b).

In either case, directories provide a link between filenames and the actual files. However, the approach, when the file name is separated from its attributes, makes the system more flexible. For example, a file can easily be included in multiple directories at once. Entries about this file in different directories may contain different simple names, but the link field will contain the same inode number.
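The separation of names from attributes can be modelled with two tables. The following is a toy sketch rather than the actual ufs layout: the Inode fields, the link helper and the inode number 42 are all assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    size: int
    owner: str
    links: int = 0  # how many directory entries point at this inode


# Attribute table: inode number -> attributes.
inode_table: dict[int, Inode] = {}
# Directories: directory name -> {simple name -> inode number}.
directories: dict[str, dict[str, int]] = {}


def link(directory: str, name: str, inode_no: int) -> None:
    """Add a directory entry that refers to an existing inode."""
    directories.setdefault(directory, {})[name] = inode_no
    inode_table[inode_no].links += 1


inode_table[42] = Inode(size=2560, owner="anna")
link("/user/anna", "main.exe", 42)
link("/depart", "tool.exe", 42)  # a different simple name, the same file

same_file = directories["/user/anna"]["main.exe"] == directories["/depart"]["tool.exe"]
print(same_file, inode_table[42].links)
```

Both entries carry different simple names but the same inode number, so the attributes are stored once no matter how many directories include the file.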

The user's view of the file system as a hierarchically organized set of information objects has little to do with how files are actually stored on disk. A file that appears as a single, continuous sequence of bytes is in fact very often scattered in "pieces" across the disk, and this splitting has nothing to do with the file's logical structure: a single logical record, for example, can end up in non-adjacent sectors. Files that are logically grouped in the same directory do not have to be adjacent on the disk at all. The principles by which files, directories, and system information are placed on a real device are described by the physical organization of the file system. Naturally, different file systems have different physical organizations.

The main type of device used in modern computing systems for storing files are disk drives. These devices are designed to read and write data to hard and floppy disks. A hard drive consists of one or more glass or metal plates, each of which is coated on one or both sides with a magnetic material. Thus, the disk generally consists of a pack of plates (Figure 2.18).

On each side of each plate, thin concentric rings are marked - tracks - on which the data is stored. The number of tracks depends on the disk type. Tracks are numbered from 0, starting at the outer edge and moving toward the center of the disk. As the disk spins, an element called a head reads binary data from the magnetic track or writes it to the track.

Figure 2.18. Diagram of the hard disk device

The head can be positioned over a given track. The heads move over the disc surface in discrete steps, each step corresponding to a shift of one track. Recording to a disc is accomplished by the ability of the head to change the magnetic properties of the track. Some discs have one head moving along each surface, while others have one head per track. In the first case, to search for information, the head must move along the radius of the disk. Usually all heads are fixed on a single moving mechanism and move synchronously. Therefore, when the head is fixed on a given track of one surface, all other heads stop over the tracks with the same numbers. In cases where each track has a separate head, no movement of heads from one track to another is required, thereby saving time spent searching for data.

The set of tracks of the same radius on all surfaces of all plates in the pack is called a cylinder. Each track is divided into sections called sectors, or blocks, so that all tracks have an equal number of sectors, each holding the same maximum number of bytes. The sector size is fixed for a particular system and is a power of two. The most common sector size is 512 bytes. Since tracks of different radii have the same number of sectors, the recording density is higher the closer a track is to the center.

A sector is the smallest addressable unit of data exchange between a disk device and RAM. For the controller to find the required sector on the disk, it must be given all the components of the sector address: cylinder number, surface number, and sector number. Since an application usually needs not a sector but some number of bytes, not necessarily a multiple of the sector size, a typical request reads several sectors that contain only the required information and one or two sectors that contain redundant data along with the required data (Figure 2.19).

Figure 2.19. Reading redundant data when exchanging with disk
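The amount of redundant data in such a request follows from simple arithmetic. The sketch below assumes sectors are numbered linearly from 0 and uses the 512-byte sector size mentioned above; the function name sectors_for_range is hypothetical.

```python
def sectors_for_range(offset: int, length: int, sector_size: int = 512):
    """Return (first sector, sector count, redundant bytes) for a byte range.

    offset and length describe the bytes the application asked for;
    length must be positive.
    """
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    count = last - first + 1
    redundant = count * sector_size - length
    return first, count, redundant


# 1000 bytes starting at byte 700: whole sectors 1..3 must be transferred.
print(sectors_for_range(700, 1000))
# A request aligned to sector boundaries carries no redundant data.
print(sectors_for_range(512, 1024))
```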

When working with a disk, the operating system usually uses its own unit of disk space, called a cluster. When a file is created, disk space is allocated to it in clusters. For example, if a file is 2560 bytes long and the cluster size in the file system is 1024 bytes, the file will be allocated 3 clusters on disk.
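The cluster count is the file size divided by the cluster size, rounded up; the unused tail of the last cluster is wasted space. A minimal sketch (the function names are chosen for this example only):

```python
def clusters_needed(file_size: int, cluster_size: int) -> int:
    """Number of whole clusters required to hold file_size bytes."""
    return -(-file_size // cluster_size)  # ceiling division on integers


def slack_bytes(file_size: int, cluster_size: int) -> int:
    """Bytes allocated to the file but not used by its data."""
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size


print(clusters_needed(2560, 1024))  # the example from the text
print(slack_bytes(2560, 1024))
```

For the 2560-byte file this gives 3 clusters, of which the last one is only half used.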

Tracks and sectors are created as a result of performing a physical, or low-level, disk formatting procedure prior to using the disk. To determine the block boundaries, identification information is written to the disk. The low-level disk format does not depend on the type of operating system that this disk will use.

A disk is laid out for a specific type of file system by high-level, or logical, formatting procedures.

With high-level formatting, the size of the cluster is determined and the information necessary for the operation of the file system is written to the disk, including information about the available and unused space, the boundaries of the areas allocated for files and directories, and information about damaged areas. In addition, the operating system loader is written to the disk - a small program that starts the process of initializing the operating system after turning on the power or restarting the computer.

Before a disk is formatted for a specific file system, it can be divided into partitions. A partition is a contiguous part of a physical disk that the operating system presents to the user as a logical device (the terms logical disk and logical partition are also used). A logical device functions as if it were a separate physical disk. It is logical devices that the user works with, referring to them by symbolic names such as A, B, C, SYS, and so on. Operating systems of different types share a common representation of partitions, but on that basis each creates logical devices specific to its own type. Just as a file system operated by one OS generally cannot be interpreted by an OS of another type, logical devices cannot be used by operating systems of different types. Only one file system can be created on a logical device.

Why can't a smartphone run programs from a memory card? How is ext4 fundamentally different from ext3? Why does a flash drive last longer if it is formatted in NTFS and not in FAT? What is the main problem with F2FS? The answers lie in the structure of file systems. We will talk about them.

Introduction

File systems define how data is stored. They determine what limitations the user will encounter, how fast read and write operations will be, and how long the drive will work without failure. This is especially true for budget SSDs and their younger brothers - flash drives. Knowing these features, you can get the most out of any system and optimize its use for specific tasks.

You have to choose the type and parameters of the file system whenever you need to do something non-trivial. For example, you want to speed up the most frequent file operations. At the filesystem level, this can be achieved in a variety of ways: indexing will provide fast lookups, and pre-reservation of free blocks will make it easier to overwrite frequently changing files. Pre-optimizing data in RAM will reduce the amount of I/O required.

Features of modern file systems such as lazy writing, deduplication and other advanced algorithms help extend a drive's trouble-free service life. They are especially relevant for cheap SSDs with TLC memory chips, flash drives and memory cards.

Separate optimizations exist for disk arrays of different levels: for example, the file system can support simplified volume mirroring, instant snapshots, or dynamic scaling without disabling the volume.

Black box

Users mainly work with the file system that is offered by default by the operating system. They rarely create new disk partitions and even less often think about their settings - just use the recommended settings or even buy pre-formatted media.

For fans of Windows, everything is simple: NTFS on all disk partitions and FAT32 (or the same NTFS) on flash drives. If there is a NAS and some other file system is used in it, then for the majority this remains beyond perception. They simply connect to it over the network and download files, as if from a black box.

On Android mobile gadgets, ext4 is most often found in internal memory and FAT32 on microSD cards. Apple users do not care at all what file system they have: HFS+, HFSX, APFS, WTFS... for them there are only beautiful folder and file icons drawn by the best designers. Linux users have the richest choice, but support for non-native file systems can be bolted onto both Windows and macOS - more on that later.

Common roots

More than a hundred different file systems have been created, but only a little over a dozen can be called relevant today. Although each was designed for its own specific applications, many turned out to be related at the conceptual level. They are similar because they use the same type of structure to represent (meta)data - B-trees.

Like any hierarchical system, the B-tree starts with a root entry and then branches down to the final elements - individual entries about files and their attributes, or "leaves". The main reason for creating such a logical structure was to speed up the search for file system objects on large dynamic arrays - like hard drives with a capacity of several terabytes or even more impressive RAID arrays.

B-trees require far fewer disk accesses than other types of balanced trees to perform the same operations. This is achieved due to the fact that the final objects in B-trees are hierarchically located at the same height, and the speed of all operations is just proportional to the height of the tree.

Like other balanced trees, B-trees have paths of equal length from the root to any leaf. Instead of growing taller, they branch more and grow wider: every branch point in a B-tree stores many references to its child objects, so a child can be found in fewer accesses. A large number of pointers reduces the number of the slowest disk operations - head positioning when reading arbitrary blocks.
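The effect of wide branching on tree height is easy to estimate. The sketch below is an idealized model in which every node is completely full; the fan-out of 100 is an assumed figure for illustration, not a property of any particular file system.

```python
def levels_needed(entries: int, fanout: int) -> int:
    """Smallest number of levels such that fanout ** levels >= entries."""
    levels, reach = 0, 1
    while reach < entries:
        reach *= fanout
        levels += 1
    return levels


million = 1_000_000
print(levels_needed(million, 2))    # binary balanced tree
print(levels_needed(million, 100))  # wide node with an assumed fan-out of 100
```

If each level costs one disk access, a million entries take about 20 accesses in a binary tree and only 3 with a fan-out of 100, which is the saving in head positioning that the text describes.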

The concept of B-trees was formulated back in the seventies and has since been subject to various improvements. In one form or another, it is implemented in NTFS, BFS, XFS, JFS, ReiserFS, and a variety of DBMSs. All of them are relatives in terms of the basic principles of data organization. The differences relate to details, often quite important ones. The disadvantage of related file systems is also common: they were all created to work with disks even before the advent of SSDs.

Flash memory as an engine of progress

Solid-state drives are gradually replacing disk drives, but so far they are forced to use legacy file systems that are alien to them. They are built on flash memory arrays, the principles of which differ from those of disk devices. In particular, flash memory must be erased before being written to, and this operation in NAND chips cannot be performed at the individual cell level. It is possible only for large blocks as a whole.

This limitation is due to the fact that in NAND memory all cells are combined into blocks, each of which has only one common connection to the control bus. We will not go into the details of the page organization and paint the full hierarchy. What is important is the principle of group operations with cells and the fact that the sizes of flash memory blocks are usually larger than the blocks addressed in any file system. Therefore, all addresses and commands for drives with NAND flash must be translated through the FTL (Flash Translation Layer) abstraction layer.

Flash memory controllers provide compatibility with the logic of disk devices and support for their native interface commands. Usually FTL is implemented in their firmware, but it can (partially) be executed on the host - for example, Plextor writes write-speeding drivers for its SSDs.

You cannot do without FTL at all, since even writing one bit to a specific cell leads to the launch of a whole series of operations: the controller finds the block containing the desired cell; the block is read in full, written to the cache or to free space, then erased in its entirety, after which it is overwritten back with the necessary changes.
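The sequence above can be imitated with a toy model. This is a deliberately simplified sketch: the 4096-byte block size is an assumption, page-level programming is ignored, and the NandBlock class is hypothetical rather than a model of any real controller.

```python
class NandBlock:
    """Toy erase block: changing one byte costs a full erase and rewrite."""

    def __init__(self, size: int = 4096):
        self.cells = bytearray(b"\xff" * size)  # erased flash reads as all ones
        self.erase_count = 0
        self.bytes_programmed = 0

    def rewrite_byte(self, index: int, value: int) -> None:
        cached = bytearray(self.cells)                 # 1. read the whole block
        cached[index] = value                          # 2. change it in the cache
        self.cells = bytearray(b"\xff" * len(cached))  # 3. erase the whole block
        self.erase_count += 1
        self.cells[:] = cached                         # 4. program everything back
        self.bytes_programmed += len(cached)


block = NandBlock()
block.rewrite_byte(10, 0x00)
print(block.erase_count, block.bytes_programmed)
```

One changed byte costs one erase cycle and 4096 programmed bytes in this model, which is why batching changes matters so much for flash wear.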

This approach is reminiscent of everyday army life: to give an order to one soldier, the sergeant calls a general formation, orders the poor fellow to step out of the ranks, and dismisses the rest. In the now rare NOR memory, the organization was special-forces style: each cell was controlled independently (each transistor had an individual contact).

Controllers have more and more to do, because with each generation of flash memory the process node shrinks in order to increase density and reduce the cost of storage. As the process node shrinks, the estimated service life of the chips shrinks too.

Modules with single-level SLC cells had a declared endurance of 100 thousand rewrite cycles or even more. Many of them still work in old flash drives and CF cards. Enterprise-class MLC (eMLC) has a claimed endurance of 10 to 20 thousand cycles, while for regular consumer-grade MLC it is estimated at 3-5 thousand. This type of memory is being actively squeezed out by the even cheaper TLC, whose endurance barely reaches a thousand cycles. Keeping the life of flash memory at an acceptable level now depends on software tricks, and new file systems are becoming one of them.

Initially, manufacturers assumed that the file system was unimportant. The controller itself must serve a short-lived array of memory cells of any type, distributing the load between them in an optimal way. For the file system driver, it mimics a regular disk, and itself performs low-level optimizations on any access. However, in practice, optimization for different devices varies from magical to fictitious.

In enterprise SSDs, the embedded controller is a small computer. It has a huge memory buffer (half a gigabyte or more) and supports many techniques for handling data more efficiently, which avoids unnecessary write cycles. The chip arranges all blocks in the cache, performs lazy writes and on-the-fly deduplication, reserves some blocks and clears others in the background. All this magic happens completely unnoticed by the OS, programs and the user. With such an SSD, it really does not matter which file system is used. Internal optimizations have a much greater impact on performance and endurance than external ones.

Budget SSDs (and flash drives even more so) are fitted with much less intelligent controllers. Their cache is cut down or absent, and advanced server technologies are not used at all. In memory cards, the controllers are so primitive that it is often claimed that they do not exist at all. Therefore, for cheap devices with flash memory, external methods of load balancing remain relevant - primarily with the help of specialized file systems.

From JFFS to F2FS

One of the first attempts to write a file system that would take into account the principles of organizing flash memory was JFFS - Journaling Flash File System. Initially, this development of the Swedish company Axis Communications was focused on improving the memory efficiency of network devices that Axis produced in the nineties. The first version of JFFS supported only NOR memory, but already in the second version it made friends with NAND.

JFFS2 is currently of limited use. It is still mostly used in Linux distributions for embedded systems. It can be found in routers, IP cameras, NAS and other habitues of the Internet of things. In general, wherever a small amount of reliable memory is required.

A further development of JFFS2 was LogFS, which kept its inodes in a separate file. The authors of this idea are an employee of the German division of IBM Jörn Engel and a teacher at the University of Osnabrück Robert Mertens. The source code for LogFS is available on GitHub. Judging by the fact that the last change to it was made four years ago, LogFS has not gained popularity.

But these attempts spurred the emergence of another specialized file system - F2FS. It was developed by Samsung Corporation, which accounts for a large part of the flash memory produced in the world. Samsung makes NAND Flash chips for its own devices and by order of other companies, and also develops SSDs with fundamentally new interfaces instead of legacy disk ones. The creation of a specialized file system optimized for flash memory was a long overdue necessity from Samsung's point of view.

Four years ago, in 2012, Samsung created F2FS (Flash Friendly File System). The idea is good, but the execution turned out a bit rough. The key task in creating F2FS was simple: to reduce the number of cell rewrite operations and distribute the load across cells as evenly as possible. This requires operating on multiple cells within the same block at once, rather than hammering them one at a time. In other words, what is needed is not instant overwriting of existing blocks at the first request of the OS, but caching of commands and data, appending new blocks to free space, and deferred erasing of cells.

Today, F2FS support has already been officially implemented in Linux (and, therefore, in Android), but it still does not give any particular advantages in practice. The main feature of this file system (delayed overwriting) led to premature conclusions about its effectiveness. The old caching trick even fooled earlier versions of benchmarks, where F2FS showed an imaginary advantage not by a few percent (as expected) and not even by several times, but by orders of magnitude. It's just that the F2FS driver reported on the operation that the controller was just planning to do. However, if the real performance gain of F2FS is small, then the cell wear will definitely be less than when using the same ext4. Those optimizations that a cheap controller cannot do will be performed at the level of the file system itself.

Extents and bitmaps

For now, F2FS is perceived as exotic fare for geeks. Even Samsung's own smartphones still use ext4. Many consider ext4 a further development of ext3, but this is not entirely true. It is closer to a revolution than to merely breaking the 2 TB per-file barrier and increasing other quantitative limits.

When computers were big and files were small, addressing was easy. Each file was allocated a certain number of blocks, the addresses of which were entered in the correspondence table. This is how the ext3 file system, which remains in service until now, worked. But ext4 introduced a fundamentally different way of addressing - extents.

Extents can be thought of as extensions of inodes as separate sets of blocks that are addressed as a whole as contiguous sequences. One extent can contain a whole medium-sized file, and for large files it is enough to allocate a dozen or two extents. This is much more efficient than addressing hundreds of thousands of small blocks of four kilobytes.
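The saving is easy to see by collapsing a per-block address list into extents. The sketch below represents an extent as a (start block, length) pair; this representation and the function name to_extents are assumptions for illustration and do not reproduce the on-disk ext4 format.

```python
def to_extents(blocks: list[int]) -> list[tuple[int, int]]:
    """Collapse an ordered list of block numbers into (start, length) extents."""
    extents: list[tuple[int, int]] = []
    for block in blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)  # block continues the current run
        else:
            extents.append((block, 1))         # block starts a new run
    return extents


# Six block addresses shrink to two extents.
print(to_extents([100, 101, 102, 103, 200, 201]))

# A contiguous file of 100,000 blocks needs a single extent.
print(len(to_extents(list(range(5000, 105000)))))
```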

The write mechanism itself also changed in ext4. Blocks are now allocated in a single request, and not in advance but immediately before the data is written to disk. Delayed multiblock allocation eliminates the unnecessary operations that ext3 was guilty of: there, blocks for a new file were allocated immediately, even if the file fit entirely in the cache and was going to be deleted as temporary.


FAT Restricted Diet

In addition to balanced trees and their modifications, there are other popular logical structures. There are file systems with a fundamentally different type of organization - for example, linear. You probably use at least one of them often.

Mystery

Here is a riddle: at twelve she began to gain weight, by sixteen she was dim-witted and fat, and by thirty-two she was fatter still and just as simple. Who is she?

That's right, this is the story of the FAT file system. Compatibility requirements saddled it with bad heredity. On floppy disks it was 12-bit, on hard drives it was at first 16-bit, and it has survived to our day as 32-bit. Each successive version increased the number of addressable blocks, but in essence nothing changed.

The still popular FAT32 file system appeared a full twenty years ago. It remains primitive today and does not support access control lists, disk quotas, background compression, or other modern data optimization technologies.

Why is FAT32 needed these days? Still just for compatibility. Manufacturers rightly believe that any OS can read a FAT32 partition. Therefore, they create it on external hard drives, USB Flash and memory cards.

How to free up the flash memory of a smartphone

The microSD(HC) cards used in smartphones are formatted in FAT32 by default. This is the main obstacle to installing applications on them and transferring data from internal memory. To overcome it, you need to create a partition on the card with ext3 or ext4. All file attributes (including owner and access rights) can be transferred to it, so any application can work as if it was launched from internal memory.

Windows does not know how to make more than one partition on flash drives, but for this you can run Linux (at least in a virtual machine) or an advanced utility for working with logical partitioning - for example, MiniTool Partition Wizard Free. Having found an additional primary partition with ext3 / ext4 on the card, the Link2SD application and similar applications will offer much more options than in the case of a single FAT32 partition.


Another argument often cited in favor of FAT32 is its lack of journaling, which supposedly means faster writes and less wear on NAND flash memory cells. In practice, using FAT32 leads to the opposite result and gives rise to many other problems.

Flash drives and memory cards die quickly precisely because any change on FAT32 overwrites the same sectors, where the two copies of the file allocation table are located. Save a complete web page, and those sectors are rewritten a hundred times - once for every small GIF added to the flash drive. Launched portable software? It creates temporary files and keeps changing them while it runs. It is therefore much better to use NTFS on flash drives, with its fault-tolerant $MFT table. Small files can be stored directly in the master file table, and its extensions and copies are written to different areas of the flash memory. In addition, thanks to indexing, searches on NTFS are faster.

INFO

For FAT32 and NTFS, the theoretical nesting level limits are not specified, but in practice they are the same: only 7707 subdirectories can be created in the first level directory. Those who like to play nesting dolls will appreciate it.

Another problem that most users run into is that a file larger than 4 GB cannot be written to a FAT32 partition. The reason is that FAT32 stores the file size in a 32-bit field of the directory entry, and 2^32 bytes is four gigabytes (the actual maximum is one byte less). As a result, neither a film in decent quality nor a DVD image can be written to a freshly purchased flash drive.
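The limit is plain integer arithmetic on a 32-bit size field. A minimal sketch follows; the constant and function names are invented for the example, and the 4.7 GB figure is the nominal capacity of a single-layer DVD.

```python
# Largest value that fits in an unsigned 32-bit file size field.
MAX_FAT32_FILE_SIZE = 2**32 - 1


def fits_on_fat32(size_bytes: int) -> bool:
    """True if a file of this size can be stored as a single FAT32 file."""
    return size_bytes <= MAX_FAT32_FILE_SIZE


print(MAX_FAT32_FILE_SIZE)
print(fits_on_fat32(700 * 1024 * 1024))  # a CD-sized image
print(fits_on_fat32(4_700_000_000))      # a single-layer DVD image
```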

Copying large files is not so bad: when you try to do this, the error is at least immediately visible. In other situations, FAT32 acts as a time bomb. For example, you copied portable software onto a flash drive and at first you use it without problems. After a long time, one of the programs (for example, accounting or mail) has a database that swells up, and ... it simply stops updating. The file cannot be overwritten because it has reached the 4 GB limit.

A less obvious problem is that FAT32 records the modification time of a file or directory with a precision of only two seconds. This is insufficient for many cryptographic applications that rely on timestamps. The low precision of the "date" attribute is another reason why FAT32 is not considered a full-fledged file system from a security point of view. However, its weaknesses can be turned to your advantage. For example, if you copy files from an NTFS partition to a FAT32 volume, they will be stripped of all metadata, as well as of inherited and explicitly set permissions. FAT simply does not support them.
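The two-second granularity follows from the packed 16-bit time field of a FAT directory entry: 5 bits for hours, 6 for minutes, and only 5 for seconds, which therefore have to be stored divided by two. The sketch below assumes that bit layout; the function names are hypothetical.

```python
def encode_fat_time(hours: int, minutes: int, seconds: int) -> int:
    """Pack a time of day into 16 bits: hhhhh mmmmmm sssss (seconds // 2)."""
    return (hours << 11) | (minutes << 5) | (seconds // 2)


def decode_fat_time(value: int) -> tuple[int, int, int]:
    """Unpack the 16-bit field back into (hours, minutes, seconds)."""
    return (value >> 11) & 0x1F, (value >> 5) & 0x3F, (value & 0x1F) * 2


packed = encode_fat_time(13, 37, 59)
print(decode_fat_time(packed))  # the odd second is lost on the round trip
```

Two events one second apart can therefore end up with identical timestamps.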

exFAT

Unlike FAT12/16/32, exFAT was designed specifically for USB Flash and large (≥ 32 GB) memory cards. Extended FAT eliminates the disadvantage of FAT32 mentioned above - overwriting the same sectors with any change. As a 64-bit system, it has no practical limits on the size of a single file. Theoretically, it can be 2 ^ 64 bytes (16 EB) long, and cards of this size will not appear soon.

Another fundamental difference of exFAT is its support for access control lists (ACLs). This is no longer the simpleton from the nineties; however, the closed nature of the format hinders the adoption of exFAT. exFAT support is fully and legally implemented only in Windows (starting with XP SP2) and OS X (starting with 10.6.5). On Linux and *BSD, support is either limited or not entirely legal. Microsoft requires a license to use exFAT, and there are many legal disputes in this area.

btrfs

Another prominent B-tree file system is Btrfs. It appeared in 2007 and was originally developed at Oracle with SSDs and RAID in mind. For example, it can be scaled dynamically: you can create new inodes on a running system or divide a volume into subvolumes without allocating free space to them.

The copy-on-write mechanism implemented in Btrfs and full integration with the Device mapper kernel module allow you to make almost instantaneous snapshots through virtual block devices. Data precompression (zlib or lzo) and deduplication speed up basic operations, while extending the lifetime of flash memory. This is especially noticeable when working with databases (2–4 times compression is achieved) and small files (they are written in orderly large blocks and can be stored directly in the “leaves”).
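The reason copy-on-write makes snapshots nearly instantaneous can be shown with a toy model. This is not how Btrfs is implemented (it works with B-trees and reference-counted extents); the class `ToyVolume` and its methods are hypothetical and only illustrate the idea that a snapshot copies references to blocks, not the blocks themselves.

```python
class ToyVolume:
    """A toy volume: a map from block number to immutable block contents."""

    def __init__(self):
        self.blocks = {}

    def write(self, block_no: int, data: bytes) -> None:
        # Copy-on-write: never change a block in place,
        # just point the map at a new block.
        self.blocks[block_no] = bytes(data)

    def snapshot(self) -> dict:
        # Copies only the block map (references), not the data.
        return dict(self.blocks)


vol = ToyVolume()
vol.write(0, b"hello")
vol.write(1, b"world")

snap = vol.snapshot()      # cheap: two references are copied
vol.write(1, b"WORLD")     # the live volume now points at a new block

print(snap[1])                   # b'world'  (snapshot is unchanged)
print(vol.blocks[1])             # b'WORLD'
print(snap[0] is vol.blocks[0])  # True: the untouched block is shared
```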

Btrfs also supports full logging (data and metadata), volume check without unmounting, and many other modern features. The Btrfs code is published under the GPL license. This file system has been supported as stable on Linux since kernel version 4.3.1.

Journaling

Almost all more or less modern file systems (ext3/ext4, NTFS, HFSX, Btrfs and others) belong to the general group of journaled file systems, since they record the changes being made in a separate log (journal) and consult it if a failure occurs during disk operations. However, the granularity of journaling and the fault tolerance vary between these file systems.

ext3 supports three journaling modes: writeback, ordered, and full journaling. In the first mode only metadata changes are journaled, asynchronously with respect to changes in the data itself. The second mode journals the same metadata, but the data blocks are written to disk strictly before the corresponding metadata is committed. The third mode is full journaling (changes both in metadata and in the files themselves).

Only the last option provides data integrity. The other two only speed up the detection of errors during the check and guarantee the restoration of the integrity of the file system itself, but not the contents of the files.
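The three modes described above are selected on ext3 with the `data=` mount option (`writeback`, `ordered`, `journal`). The small table below restates what each mode records; the dictionary layout and function names are an assumption made for this sketch, not part of any real tool.

```python
# What each ext3 "data=" mode writes to the journal.
EXT3_JOURNAL_MODES = {
    "writeback": {"metadata": True, "file_data": False, "data_before_metadata": False},
    "ordered":   {"metadata": True, "file_data": False, "data_before_metadata": True},
    "journal":   {"metadata": True, "file_data": True,  "data_before_metadata": True},
}


def protects_file_contents(mode: str) -> bool:
    """True only when file contents themselves go through the journal."""
    return EXT3_JOURNAL_MODES[mode]["file_data"]


def mount_option(mode: str) -> str:
    """Build the mount option string for a given journaling mode."""
    if mode not in EXT3_JOURNAL_MODES:
        raise ValueError(f"unknown journaling mode: {mode}")
    return f"data={mode}"


for mode in EXT3_JOURNAL_MODES:
    print(mount_option(mode), protects_file_contents(mode))
```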

Journaling in NTFS is similar to the second logging mode in ext3. Only changes to the metadata are logged, and the data itself may be lost in the event of a failure. This method of journaling in NTFS was not intended as a way to achieve maximum reliability, but only as a compromise between speed and fault tolerance. This is why people who are used to working with fully journaled systems consider NTFS to be pseudo-journaled.

The approach implemented in NTFS is in some ways even better than the default in ext3. NTFS additionally creates checkpoints periodically to ensure that all previously pending disk operations are completed. Checkpoints have nothing to do with restore points in \System Volume Information\ . These are just service entries in the log.

Practice shows that in most cases such partial NTFS journaling is enough for trouble-free operation. After all, even when the power is cut abruptly, disk devices do not lose power instantly. The power supply and the numerous capacitors in the drives themselves provide just enough energy to complete the current write operation. With modern SSDs, given their speed and efficiency, the same amount of energy is usually enough to complete the pending operations. Switching to full journaling would slow most operations down several times over.

Connecting third-party file systems in Windows

The use of file systems is limited by their support at the OS level. For example, Windows does not understand ext2/3/4 and HFS+, but sometimes you need to use them. You can do this by adding the appropriate driver.

WARNING

Most drivers and plug-ins that add support for third-party file systems have limitations and do not always work stably. They may conflict with other drivers, antivirus software and virtualization programs.

An open source driver for reading and writing ext2/3 partitions with partial support for ext4. In the latest version, extents and partitions up to 16 TB are supported. LVM, access control lists, and extended attributes are not supported.


There is a free plugin for Total Commander. Supports reading ext2/3/4 partitions.


coLinux is an open and free port of the Linux kernel. Together with a 32-bit driver, it lets you run Linux on Windows 2000 through Windows 7 without using virtualization technologies. Only 32-bit versions are supported; development of the 64-bit modification has been cancelled. Among other things, coLinux makes it possible to access ext2/3/4 partitions from Windows. Support for the project was suspended in 2014.

Windows 10 may already have built-in support for Linux-specific file systems, it's just hidden. These thoughts are suggested by the Lxcore.sys kernel-level driver and the LxssManager service, which is loaded as a library by the Svchost.exe process. For more on this, see Alex Ionescu's "The Linux Kernel Hidden Inside Windows 10" talk at Black Hat 2016.


ExtFS for Windows is a paid driver released by Paragon. It works on Windows 7 to 10, supports read/write access to ext2/3/4 volumes. Provides almost complete support for ext4 on Windows.

HFS+ for Windows 10 is another proprietary driver from Paragon Software. Despite the name, it works in all Windows versions since XP. Provides full access to HFS+/HFSX file systems on disks with any layout (MBR/GPT).

WinBtrfs is a Btrfs driver for Windows at an early stage of development. Already in version 0.6 it supports both read and write access to Btrfs volumes. It can handle hard and symbolic links and supports alternate data streams, ACLs, two types of compression and asynchronous read/write mode. So far, WinBtrfs cannot use mkfs.btrfs, btrfs-balance and the other utilities for maintaining this file system.

Capabilities and limitations of file systems: summary table

| File system | Maximum volume size | Maximum size of one file | Maximum file name length | Maximum full path length (including the path from the root) | Maximum number of files and/or directories | Timestamp precision | Access rights | Hard links | Symbolic links | Snapshots | Background data compression | Background data encryption | Data deduplication |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FAT16 | 2 GB in 512 byte sectors or 4 GB in 64 KB clusters | 2 GB | 255 bytes with LFN | - | - | - | - | - | - | - | - | - | - |
| FAT32 | 8 TB in 2 KB sectors | 4 GB (2^32 - 1 byte) | 255 bytes with LFN | up to 32 subdirectories with CDS | 65460 | 10 ms (creation) / 2 s (modification) | No | No | No | No | No | No | No |
| exFAT | ≈ 128 PB (2^32-1 clusters of 2^25-1 bytes) theoretical / 512 TB due to third-party limits | 16 EB (2^64 - 1 byte) | | | 2796202 per directory | 10 ms | ACL | No | No | No | No | No | No |
| NTFS | 256 TB in 64 KB clusters or 16 TB in 4 KB clusters | 16 TB (Win 7) / 256 TB (Win 8) | 255 Unicode characters (UTF-16) | 32760 Unicode characters, but no more than 255 characters in each element | 2^32-1 | 100 ns | ACL | Yes | Yes | Yes | Yes | Yes | Yes |
| HFS+ | 8 EB (2^63 bytes) | 8 EB | 255 Unicode characters (UTF-16) | not limited separately | 2^32-1 | 1 s | Unix ACL | Yes | Yes | No | Yes | Yes | No |
| APFS | 8 EB (2^63 bytes) | 8 EB | 255 Unicode characters (UTF-16) | not limited separately | 2^63 | 1 ns | Unix ACL | Yes | Yes | Yes | Yes | Yes | Yes |
| Ext3 | 32 TB (theoretically) / 16 TB in 4 KB clusters (due to limitations of the e2fsprogs utilities) | 2 TB (theoretically) / 16 GB for older programs | 255 Unicode characters (UTF-16) | not limited separately | - | 1 s | Unix ACL | Yes | Yes | No | No | No | No |
| Ext4 | 1 EB (theoretically) / 16 TB in 4 KB clusters (due to limitations of the e2fsprogs utilities) | 16 TB | 255 Unicode characters (UTF-16) | not limited separately | 4 billion | 1 ns | POSIX | Yes | Yes | No | No | Yes | No |
| F2FS | 16 TB | 3.94 TB | 255 bytes | not limited separately | - | 1 ns | POSIX, ACL | Yes | Yes | No | No | Yes | No |
| BTRFS | 16 EB (2^64 - 1 byte) | 16 EB | 255 ASCII characters | 2^17 bytes | - | 1 ns | POSIX, ACL | Yes | Yes | Yes | Yes | Yes | Yes |

The file system lets you organize programs and data and manage these objects in an orderly way.

The file system concept underlying the Unix operating system left a deep imprint on personal computer operating systems. In Unix, the I/O subsystem unifies the way you access both files and peripherals. In this case, a file is understood as a set of data on a disk, terminal, or some other device.

A file system is a functional part of the operating system that provides operations on files. The file system lets you work with files and directories (folders) regardless of their content, size, type, etc.

File system is a data management system.

A data management system is a system whose users are relieved of most of the physical manipulation of files and can focus primarily on the logical properties of the data.

OS file systems create for users some virtual representation of external storage devices, allowing them to work with them not at the low level of physical device control commands, but at a high level of data sets and structures.

Purpose of the file system:

  • hides the picture of the real location of information in external memory;
  • ensures the independence of programs from the features of a specific computer configuration (logical level of working with files);
  • provides standard responses to errors that occur during data exchange.

File structure

The whole set of files on the disk and the relationships between them is called the file structure. Developed operating systems have a hierarchical, multi-level file structure organized as a tree.

A tree structure of directories, the directory tree, is used; it was borrowed from Unix. A hierarchical structure is a structure whose parts (components) are connected by relations of inclusion or subordination.

The hierarchical structure is represented by a directed tree, in which the vertices correspond to the components and the arcs correspond to the links.

Figure: directory tree of drive G.

A directed tree is a graph with a distinguished vertex (root) in which there is only one path between the root and any vertex. In this case, two orientation options are possible: either all paths are oriented from the root to the leaves, or all paths are oriented from the leaves to the root.

Trees are used in describing and designing hierarchical structures.

The root is the starting position, the leaves are the final position.
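A directory tree of this kind can be modeled as nested dictionaries, with the drive as the root and files as leaves. The folder and file names below are invented for illustration; the point is that walking from the root yields exactly one path to every vertex.

```python
# Hypothetical contents of drive G: folders are dicts, files are None.
tree = {
    "Documents": {
        "report.txt": None,
        "Photos": {"cat.jpg": None},
    },
    "Programs": {},
}


def walk(node: dict, path: str):
    """Yield the full path of every entry, from the root towards the leaves."""
    for name, child in node.items():
        full = f"{path}\\{name}"
        yield full
        if isinstance(child, dict):      # a directory: descend into it
            yield from walk(child, full)


paths = list(walk(tree, "G:"))
for p in paths:
    print(p)
```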

Partitions

Any hard or magneto-optical disk can be divided during formatting into several parts, which are then used as separate (independent) disks. These parts are called partitions or logical drives. Partitioning may be necessary because the OS cannot work with disks larger than a certain size. It is also convenient to store user data and programs separately from system programs (the OS), because the OS may fail and have to be reinstalled.

Partition: an area of the disk. A logical disk (partition) is any storage medium that the operating system works with as a single entity.

Drive name: the designation of a logical drive; an entry in the root directory.

Logical disks (partitions) are denoted by Latin letters A, B, C, D, E, ... (26 letters from A to Z).

The letters A and B are reserved for floppy disks.

C is the hard disk, usually the one from which the OS is loaded.

The remaining letters are assigned to other logical drives, CD drives, and so on. Windows therefore offers at most 26 drive letters, although additional volumes can be mounted into folders instead of being given a letter.

The partition table records where each partition begins and ends and the number of sectors in it (its location and size).
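As an illustration, here is how one 16-byte entry of a classic MBR partition table could be decoded. The sample bytes are made up, the function name is hypothetical, and 512-byte sectors are assumed; GPT disks use a different layout.

```python
import struct

SECTOR_SIZE = 512  # assumption: classic 512-byte sectors


def parse_mbr_entry(entry: bytes) -> dict:
    """Decode one 16-byte MBR partition table entry."""
    if len(entry) != 16:
        raise ValueError("an MBR partition entry is exactly 16 bytes")
    # status, CHS of first sector, type, CHS of last sector,
    # LBA of first sector, number of sectors (little-endian)
    status, _chs_first, ptype, _chs_last, first_lba, sectors = struct.unpack(
        "<B3sB3sII", entry
    )
    return {
        "bootable": status == 0x80,
        "type": ptype,
        "first_lba": first_lba,
        "last_lba": first_lba + sectors - 1,
        "sector_count": sectors,
        "size_bytes": sectors * SECTOR_SIZE,
    }


# Made-up entry: bootable, type 0x07, starts at sector 2048, 204800 sectors.
sample = bytes.fromhex("80 20 21 00 07 fe ff ff") + struct.pack("<II", 2048, 204800)
info = parse_mbr_entry(sample)
print(info)
```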

File structure of a logical drive

To access the information stored in a file on a disk, you need to know the physical address of its first sector (surface number + track number + sector number), the total number of clusters occupied by the file, and the address of the next cluster if the file is larger than one cluster.
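A physical address of the form (track, surface, sector) is commonly converted to a single linear sector number with the formula LBA = (C × heads + H) × sectors_per_track + (S − 1). The sketch below uses the geometry of a 1.44 MB floppy disk (80 tracks, 2 sides, 18 sectors per track) as an example; the function name is an assumption.

```python
def chs_to_lba(cylinder: int, head: int, sector: int,
               heads: int, sectors_per_track: int) -> int:
    """Convert a cylinder/head/sector address to a linear sector number.

    Sectors are numbered from 1 within a track; cylinders and heads from 0.
    """
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector number out of range")
    if not 0 <= head < heads:
        raise ValueError("head number out of range")
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


# Geometry of a 1.44 MB floppy: 2 sides, 18 sectors per track.
print(chs_to_lba(0, 0, 1, heads=2, sectors_per_track=18))    # 0
print(chs_to_lba(0, 1, 1, heads=2, sectors_per_track=18))    # 18
print(chs_to_lba(79, 1, 18, heads=2, sectors_per_track=18))  # 2879
```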

File structure elements:

    starting sector (bootstrap, boot sector);

    file allocation table (FAT - File Allocation Table);

    root directory (Root Directory);

    data area (remaining free disk space).

Boot-sector

Boot sector: the first (initial) sector of the disk. It is located on side 0, track 0.

The boot sector contains service information:

    disk cluster size (a cluster is a block that combines several sectors into a group to reduce the size of the FAT table);

    location of the FAT table (in the boot sector there is a pointer to where the FAT table is located);

    FAT table size;

    the number of FAT tables (there are always at least 2 copies of the table to ensure reliability and security, since the destruction of the FAT leads to information loss and is difficult to recover);

    the address of the beginning of the root directory and its maximum size.
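The fields listed above can be read from the boot sector of a FAT12/FAT16 volume with a few lines of code. This is a sketch under stated assumptions: it covers only the classic FAT12/16 layout (FAT32 stores some of these values elsewhere), the function name is hypothetical, and the sample sector is synthetic, filled with the typical values of a 1.44 MB floppy.

```python
import struct

# Little-endian fields starting at byte offset 11 of the boot sector:
# bytes/sector, sectors/cluster, reserved sectors, number of FATs,
# root directory entries, total sectors, media byte, sectors per FAT.
BPB_FORMAT = "<HBHBHHBH"


def parse_fat_boot_sector(sector: bytes) -> dict:
    """Extract the layout of a FAT12/FAT16 volume from its boot sector."""
    (bytes_per_sector, sectors_per_cluster, reserved, fat_count,
     root_entries, _total, _media, sectors_per_fat) = struct.unpack_from(
        BPB_FORMAT, sector, 11)
    root_start = reserved + fat_count * sectors_per_fat
    # Each root directory entry is 32 bytes; round up to whole sectors.
    root_sectors = (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector
    return {
        "cluster_size": bytes_per_sector * sectors_per_cluster,
        "fat_start_sector": reserved,
        "sectors_per_fat": sectors_per_fat,
        "fat_count": fat_count,
        "root_dir_start_sector": root_start,
        "root_dir_sectors": root_sectors,
        "data_start_sector": root_start + root_sectors,
    }


# Synthetic boot sector with typical 1.44 MB floppy values.
sample = bytearray(512)
struct.pack_into(BPB_FORMAT, sample, 11, 512, 1, 1, 2, 224, 2880, 0xF0, 9)
info = parse_fat_boot_sector(bytes(sample))
print(info)
```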

The boot sector also contains the boot block (bootloader), known as the Boot Record.

A bootloader is a utility program that places an executable program into RAM and brings it to a state of readiness for execution.

FAT (File Allocation Table)

FAT (File Allocation Table) - file allocation table. It defines which parts of the disk belong to each file. The disk data area is represented in the OS as a sequence of numbered clusters.

FAT is an array of elements addressing clusters of the disk's data area. Each data area cluster corresponds to one FAT entry. The FAT elements serve as a chain of links to file clusters in the data area.

File Allocation Table Structure:

FAT consists of elements 12, 16, or 32 bits long (FAT12, FAT16, and FAT32 respectively). In FAT16, for example, the table can have up to 65520 such elements, each of which (except the first two) corresponds to a disk cluster. A cluster is the unit in which space in the data area of a disk is allocated to files and directories. The first two elements of the table (numbers 0 and 1) are reserved, and each of the remaining elements describes the state of the disk cluster with the same number. An element may indicate that the cluster is free, that the cluster is defective, or that the cluster belongs to a file and is its last cluster. If the cluster belongs to a file and is not its last cluster, the table entry contains the number of the next cluster of that file.

FAT is an extremely important element of the file structure. Damage to the FAT can lead to complete or partial loss of information on the entire logical drive. That is why two copies of the FAT are stored on the disk. There are special programs that monitor the state of the FAT and repair damage.
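Following a file through the table can be sketched as below. The marker values are those used by FAT16 (0x0000 for a free cluster, 0xFFF7 for a bad one, 0xFFF8 and above for the last cluster of a file); the function name and the tiny sample table are made up for the example.

```python
FREE = 0x0000
BAD = 0xFFF7
END_OF_CHAIN = 0xFFF8  # FAT16: values from here up mark the last cluster


def cluster_chain(fat: list, first_cluster: int) -> list:
    """Return the list of clusters occupied by a file, in order."""
    chain = []
    seen = set()
    cluster = first_cluster
    while True:
        if cluster < 2 or cluster >= len(fat):
            raise ValueError(f"cluster {cluster} is outside the data area")
        if cluster in seen:
            raise ValueError("the chain loops back on itself")
        seen.add(cluster)
        chain.append(cluster)
        entry = fat[cluster]
        if entry >= END_OF_CHAIN:        # last cluster of the file
            return chain
        if entry in (FREE, BAD):
            raise ValueError(f"chain broken at cluster {cluster}")
        cluster = entry                  # number of the next cluster


# Entries 0 and 1 are reserved; the file starts at cluster 2: 2 -> 3 -> 6.
fat = [0xFFF8, 0xFFFF, 3, 6, FREE, BAD, 0xFFFF]
print(cluster_chain(fat, 2))  # [2, 3, 6]
```

The loop and range checks are there because a damaged table can contain chains that never end or that point outside the disk.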

Different operating systems use different file systems:

  • Windows 95: FAT16, FAT32
  • Windows NT (XP): NTFS
  • Novell NetWare: TurboFAT
  • UNIX: NFS, ReiserFS

The logical structure of the storage medium

Your removable drive should use FAT32 for better compatibility, but if you plan to store large files, format it as NTFS. Macs format drives as HFS+, which doesn't work with Windows. Linux also has its own file systems.

Why are there so many?

File system 101

Different file systems are simply different ways of organizing and storing files on a hard drive, flash drive or any other storage device. Each storage device has one or more partitions, and each partition must be "formatted" with a specific file system. The formatting process creates an empty file system of that type on the device.

A file system provides a way to divide the data on a disk into separate pieces, which are files. It also provides a way to store data about these files, such as their names, permissions, and other attributes. Finally, it provides an index: a list of the files on the disk and where they are located, so that the operating system can see what's on the disk in one place and doesn't have to "comb" the entire disk to find a file.

The operating system must understand the file system so that it can display its contents, open files, and save files to them. If your operating system does not understand the file system, you can install a file system driver that provides support for such a file system.

The file system of a computer disk can be compared to a document storage system: the pieces of data on a computer are called "files", and they are organized in a "file system" just as paper files can be organized in filing cabinets. There are different ways of organizing these files and storing data, and these are the "file systems".

Why are there so many file systems

Not all file systems are equal. Different file systems have different ways of organizing their data. Some file systems are faster than others, some have additional security features, and some support high-capacity disks, while others only work on smaller ones. Some file systems are more robust and resistant to file corruption, while others trade reliability for speed.

There is no single best file system that suits all purposes. Each operating system tends to use its own file system, which is developed by the same people who develop the operating system. Microsoft, Apple, and the Linux kernel developers all work on their own file systems. New file systems can be faster, more stable, scale better to larger storage devices, and have more features than older ones.

The file system is not like a partition, which is just a chunk of storage space. The file system determines how files are laid out, organized, indexed, and how metadata is associated with them. There is always room to tweak and improve how it's done.

Switching file systems

Each partition has a file system. Sometimes you can "convert" a partition's file system, but this is rarely possible. Instead, you will probably have to copy important data from the partition first.

Operating systems automatically format partitions to the appropriate file system during the installation process. If you have a Windows-formatted partition on which you want to install Linux, during the installation process, Linux will format the NTFS or FAT32 partition to the Linux file system preferred by your Linux distribution.

Thus, if you have a storage device and want to use a different file system, first copy the files off it to back them up. Then reformat it using Disk Management on Windows, GParted on Linux, or Disk Utility on macOS.

Overview of common file systems

Here is a short overview of some of the most common file systems you will encounter. It is not exhaustive: there are many other file systems for special purposes.

  • FAT32: one of the oldest Windows file systems, still used on small removable media. Large external hard drives of 1 TB or more will most likely come formatted with NTFS. FAT32 only makes sense for small storage devices or for compatibility with devices such as digital cameras, game consoles, and set-top boxes that support FAT32 but not NTFS.
  • NTFS: the modern Windows file system, the default since Windows XP. External drives can be formatted with FAT32 or NTFS.
  • HFS+: Macs use HFS+ for internal partitions and also format external drives with it. Using an external hard drive with Time Machine requires it, so that file system attributes can be preserved in the backup. Macs can also read and write FAT32 file systems, but you will need third-party software to write to NTFS file systems from a Mac.
  • Ext2/Ext3/Ext4: You will often see ext2, ext3, and ext4 file systems on Linux. Ext2 is an older file system that lacks important features such as journaling: if the power goes out or the computer crashes while writing to an ext2 drive, data can be lost. Ext3 adds this robustness at the expense of some speed. Ext4 is a more modern and faster option, and it is the default file system on most Linux distributions. Windows and Mac do not support these file systems, so you will need a third-party tool to access files on them. Linux, however, can read and write both FAT32 and NTFS.
  • Btrfs: a newer Linux file system that is still in development. It is not the default on most Linux distributions at the moment, but will likely replace Ext4 one day. The goal is to provide additional features that allow Linux to scale to large amounts of storage.
  • Swap: On Linux, the "swap" file system is not really a file system. A partition formatted as "swap" can be used as an operating system's swap space - like a Windows swap file, but requires a dedicated partition.

There are other file systems, especially on Linux and other Unix-like systems.

The typical computer user doesn't need to know much of this, but knowing the basics will help you answer questions like "why doesn't this Mac-formatted disk work with my Windows PC?" and "should I format this USB hard drive as FAT32 or NTFS?".