Distributed file system
•A distributed file system is a client/server-based application. It allows clients to access and process data stored on the server as if it were on their own computer.
•A distributed file system organizes file and directory services of individual servers into a global directory in such a way that remote data access is not location-specific but is identical from any client.
•When a user accesses a file on the server, the server sends the user a copy of the file, which is cached on the user's computer while the data is being processed and is then returned to the server.
•All files are accessible to all users of the global file system and organization is hierarchical and directory-based.
Distributed file system models
(a) The remote access model.
(b) The upload/download model.
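The difference between the two models can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FileServer`, `RemoteAccessClient`, and `UploadDownloadClient` are hypothetical names, and an in-memory dictionary stands in for both the server's disk and the network.

```python
class FileServer:
    """Hypothetical in-memory server; a dict stands in for its disk."""

    def __init__(self):
        self.files = {}

    # Operations used by the remote access model
    def size(self, name):
        return len(self.files.get(name, b""))

    def write(self, name, offset, data):
        old = self.files.get(name, b"")
        # Pad with zero bytes if the write starts past the end of the file
        old = old.ljust(offset, b"\0")
        self.files[name] = old[:offset] + data + old[offset + len(data):]

    # Operations used by the upload/download model
    def download(self, name):
        return self.files[name]

    def upload(self, name, data):
        self.files[name] = data


class RemoteAccessClient:
    """Every operation is forwarded to the server; the file stays there."""

    def __init__(self, server):
        self.server = server

    def append(self, name, data):
        self.server.write(name, self.server.size(name), data)


class UploadDownloadClient:
    """The whole file is copied to the client, changed locally, then returned."""

    def __init__(self, server):
        self.server = server

    def append(self, name, data):
        local_copy = self.server.download(name)   # fetch the whole file
        local_copy += data                        # work on the local copy
        self.server.upload(name, local_copy)      # send the whole file back


server = FileServer()
server.upload("notes.txt", b"hello")
RemoteAccessClient(server).append("notes.txt", b" world")
UploadDownloadClient(server).append("notes.txt", b"!")
print(server.download("notes.txt"))  # b'hello world!'
```

Both clients produce the same result; what differs is how much data crosses the network and where the work is done.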
•Since more than one client may access the same data simultaneously, the server must have a mechanism in place to organize updates so that the client always receives the most current version of data and that data conflicts do not arise.
•Examples include Sun Microsystems' Network File System (NFS), Novell NetWare, and Microsoft's Distributed File System (DFS).
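One possible way for a server to organize simultaneous updates is to attach a version number to each file and reject writes that are based on a stale version. The sketch below assumes this scheme purely for illustration; it is not a description of how NFS, NetWare, or Microsoft's DFS handle conflicts, and `VersionedServer` is a hypothetical name.

```python
class VersionedServer:
    """Hypothetical server that keeps a version number for every file."""

    def __init__(self):
        self.files = {}  # name -> (version, data)

    def read(self, name):
        return self.files.get(name, (0, b""))

    def write(self, name, base_version, data):
        """Accept the write only if the client saw the latest version."""
        current_version, _ = self.files.get(name, (0, b""))
        if base_version != current_version:
            return False   # conflict: someone else updated the file first
        self.files[name] = (current_version + 1, data)
        return True


server = VersionedServer()
v_a, _ = server.read("plan.txt")   # client A reads version 0
v_b, _ = server.read("plan.txt")   # client B reads version 0
print(server.write("plan.txt", v_a, b"A's edit"))  # True
print(server.write("plan.txt", v_b, b"B's edit"))  # False
```

Client B must read the file again to obtain the current version before it can retry its update.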
Structure of Distributed file system
•Service – software entity running on one or more machines and providing a particular type of function to a priori unknown clients
•Server – service software running on a single machine.
•Client – process that can invoke a service using a set of operations that forms its client interface
Two ways of implementing DFS on a server
A standalone DFS namespace allows for a DFS root that exists only on the local computer and thus does not use Active Directory.
•A Standalone DFS can only be accessed on the computer on which it is created. It does not offer any fault tolerance and cannot be linked to any other DFS.
•This is the only option available on Windows NT 4.0 Server systems. Standalone DFS roots are rarely encountered because of their limited utility.
A domain-based DFS namespace stores the DFS configuration within Active Directory, making the DFS namespace root accessible at \\domainname\.
•The namespace roots do not have to reside on domain controllers; they can reside on member servers.
What does distributed file system (DFS) do?
DFS Namespaces. Enables you to group shared folders that are located on different servers into one or more logically structured namespaces.
• Each namespace appears to users as a single shared folder with a series of subfolders.
• This structure increases availability and automatically connects users to shared folders in the same Active Directory site, when available.
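The idea of a namespace can be sketched as a lookup table from logical folders to the shared folders that actually hold the data. The table contents, the server names, and the `resolve` function below are hypothetical and only illustrate the mapping; they are not the DFS Namespaces API.

```python
# Hypothetical namespace: logical folder -> shared folders (targets) on real servers.
NAMESPACE = {
    r"\\example\public\tools": [r"\\server1\tools"],
    r"\\example\public\docs": [r"\\server2\docs", r"\\server3\docs"],
}


def resolve(logical_path):
    """Return the target paths that back a logical namespace path."""
    # Try the longest matching namespace folder first.
    for folder in sorted(NAMESPACE, key=len, reverse=True):
        if logical_path == folder or logical_path.startswith(folder + "\\"):
            rest = logical_path[len(folder):]
            return [target + rest for target in NAMESPACE[folder]]
    raise FileNotFoundError(logical_path)


print(resolve(r"\\example\public\docs\report.txt"))
```

The user only ever sees the logical path; which server answers is decided by the lookup.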
DFS Replication. DFS Replication is an efficient, multiple-master replication engine that you can use to keep folders synchronized between servers across limited bandwidth network connections.
•It replaces the File Replication Service (FRS) as the replication engine for DFS Namespaces.
Types of distributed file system
•File Transfer Protocol (FTP)
•Sun’s Network File System (NFS)
•Andrew File System (AFS)
Other older file systems:
1. Coda
2. Sprite
3. Echo
4. Amoeba Bullet File Server
5. xFS
File Transfer Protocol (FTP)
•Its motivation is to provide file sharing; it is not a distributed file system.
•It lets a user connect to a remote machine and interactively send or fetch an arbitrary file.
•FTP deals with authentication, listing directory contents, ASCII or binary transfers, etc.
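The operations listed above map directly onto Python's standard `ftplib` module. The sketch below is a minimal example: `list_and_fetch` is a hypothetical helper, and the host name and credentials in the usage comment are placeholders, not a real server.

```python
from ftplib import FTP


def list_and_fetch(ftp, remote_name, local_path):
    """List the current directory, then fetch one file in binary mode.

    `ftp` is expected to be a connected, logged-in ftplib.FTP object.
    """
    listing = []
    ftp.retrlines("LIST", listing.append)                 # directory listing (text)
    with open(local_path, "wb") as out:
        ftp.retrbinary("RETR " + remote_name, out.write)  # binary transfer
    return listing


# Usage against a real server (host and credentials are placeholders):
#
#   with FTP("ftp.example.org") as ftp:
#       ftp.login("user", "password")      # authentication
#       list_and_fetch(ftp, "readme.txt", "readme.txt")
```

Note that the user names the remote machine and the file explicitly; nothing makes the remote file appear as part of the local file system, which is why FTP is file sharing rather than a distributed file system.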
Sun’s Network File System (NFS)
•Sun's NFS is one of the most popular and widespread distributed file systems in use today.
The design goals of NFS were:
•Any machine can be a client and/or a server.
•NFS must support diskless workstations (that are booted from the network). Diskless workstations were Sun’s major product line.
•Heterogeneous systems should be supported: clients and servers may have different hardware and/or operating systems. Interfaces for NFS were published to encourage the widespread adoption of NFS.
•High performance: make remote access comparable to local access through caching and read-ahead.
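The caching and read-ahead mentioned in the last goal can be sketched as follows. This is an illustration, not NFS client code: `CountingServer` and `ReadAheadClient` are hypothetical, the block size is tiny for readability, and the prefetch runs synchronously, whereas a real client would typically issue it in the background.

```python
BLOCK_SIZE = 4  # tiny block size so the example is easy to follow


class CountingServer:
    """Hypothetical server that counts how many block requests it receives."""

    def __init__(self, data):
        self.data = data
        self.requests = 0

    def read_block(self, index):
        self.requests += 1
        start = index * BLOCK_SIZE
        return self.data[start:start + BLOCK_SIZE]


class ReadAheadClient:
    """Caches blocks and makes sure the next block is cached after each read."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.hits = 0

    def read_block(self, index):
        if index in self.cache:
            self.hits += 1
        else:
            self.cache[index] = self.server.read_block(index)
        # Read-ahead: assume sequential access and prefetch the next block.
        if index + 1 not in self.cache:
            self.cache[index + 1] = self.server.read_block(index + 1)
        return self.cache[index]


server = CountingServer(b"abcdefghijklmnop")
client = ReadAheadClient(server)
result = b"".join(client.read_block(i) for i in range(4))
print(result, client.hits)  # b'abcdefghijklmnop' 3
```

Only the first sequential read has to wait for the server; the following three are answered from the cache.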
Andrew File System (AFS)
•The goal of the Andrew File System was to support information sharing on a large scale (thousands to 10000+ users).
•There were several incarnations of AFS, with the first version being available around 1984, AFS-2 in 1986, and AFS-3 in 1989.
•The assumptions about file usage were:
•most files are small
•reads are much more common than writes
•most files are read/written by one user
From these assumptions, the original goal of AFS was to use whole file serving on
the server (send an entire file when it is opened) and whole file caching on the
client (save the entire file onto a local disk).
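Whole-file serving and whole-file caching can be sketched as below. This is a simplified illustration, not AFS code: `WholeFileCache` is a hypothetical class, a dictionary stands in for the server, and the sketch leaves out how a client learns that its cached copy has become stale.

```python
import os
import tempfile


class WholeFileCache:
    """Client-side cache that stores complete files on the local disk."""

    def __init__(self, server_files, cache_dir):
        self.server_files = server_files  # dict: name -> bytes, stands in for the server
        self.cache_dir = cache_dir

    def _local_path(self, name):
        return os.path.join(self.cache_dir, name)

    def open_file(self, name):
        """On open, fetch the entire file unless a cached copy already exists."""
        path = self._local_path(name)
        if not os.path.exists(path):
            with open(path, "wb") as f:
                f.write(self.server_files[name])   # whole-file transfer
        return path  # the caller reads and writes this local file

    def close_file(self, name):
        """On close, send the entire (possibly modified) file back."""
        with open(self._local_path(name), "rb") as f:
            self.server_files[name] = f.read()


server_files = {"paper.txt": b"draft 1"}
cache = WholeFileCache(server_files, tempfile.mkdtemp())

path = cache.open_file("paper.txt")
with open(path, "ab") as f:       # all I/O between open and close is local
    f.write(b", draft 2")
cache.close_file("paper.txt")
print(server_files["paper.txt"])  # b'draft 1, draft 2'
```

This design fits the usage assumptions above: small files are cheap to transfer whole, and files used mostly by one user rarely need to be fetched twice.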
Issues of distributed file system
•Naming: In designing a distributed file service, we should consider whether all machines (and processes) should have the exact same view of the directory hierarchy.
•We might also wish to consider whether the name space on all machines should have a global root directory (a.k.a. super root) so that files can be accessed as, for example, //server/path
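With a global root, a name such as //server/path can be split into the machine that holds the file and the path on that machine. The helper below is a hypothetical sketch of that parsing step only; it does not contact any server.

```python
def split_global_name(name):
    """Split '//server/path/to/file' into ('server', '/path/to/file')."""
    if not name.startswith("//"):
        raise ValueError("not a global name: " + name)
    server, _, path = name[2:].partition("/")
    if not server:
        raise ValueError("missing server in: " + name)
    return server, "/" + path


print(split_global_name("//server/path"))  # ('server', '/path')
```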
Caching
We can employ caching to improve system performance. There are four places in a distributed system where we can hold data.
1. on the server's disk
2. in a cache in the server's memory
3. in the client's memory
4. on the client's disk
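A read that checks these four places from nearest to farthest can be sketched as follows. The function and the use of dictionaries for each storage level are assumptions made for illustration, and the sketch ignores the problem of keeping cached copies consistent.

```python
def read_file(name, client_memory, client_disk, server_memory, server_disk):
    """Look for `name` in each location, nearest first; return (data, where)."""
    levels = [
        ("client memory", client_memory),
        ("client disk", client_disk),
        ("server memory", server_memory),
        ("server disk", server_disk),
    ]
    for i, (where, store) in enumerate(levels):
        if name in store:
            data = store[name]
            # Copy the data into every nearer level so the next read is faster.
            for _, nearer in levels[:i]:
                nearer[name] = data
            return data, where
    raise FileNotFoundError(name)


client_memory, client_disk, server_memory = {}, {}, {}
server_disk = {"a.txt": b"contents"}

print(read_file("a.txt", client_memory, client_disk, server_memory, server_disk))
print(read_file("a.txt", client_memory, client_disk, server_memory, server_disk))
```

The first call finds the file on the server's disk; the second is answered from the client's memory without touching the network.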
Should servers maintain state?
In a stateless system:
•Fault tolerance: if a server crashes and then recovers, no state was lost about client connections because there was no state to maintain.
• No remote open/close calls are needed
• No wasted server space per client.
• No limit on the number of open files, because the server keeps no per-client state.
•No problems if the client crashes: the server does not have any state to clean up.
On a stateful system:
•requests are shorter (less info to send).
•better performance in processing the requests.
•file locking is possible; the server can keep state that a certain client is locking a file
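The trade-off can be made concrete with two toy servers. Both classes below are hypothetical sketches, with a dictionary standing in for the file store; they only illustrate what each request has to carry and what the server has to remember.

```python
class StatelessServer:
    """Every request names the file and the position; nothing is remembered."""

    def __init__(self, files):
        self.files = files

    def read(self, name, offset, count):
        return self.files[name][offset:offset + count]


class StatefulServer:
    """The server remembers open files and the current position of each."""

    def __init__(self, files):
        self.files = files
        self.open_files = {}   # handle -> [name, position]; lost if the server crashes
        self.next_handle = 0

    def open(self, name):
        handle = self.next_handle
        self.next_handle += 1
        self.open_files[handle] = [name, 0]
        return handle

    def read(self, handle, count):
        name, pos = self.open_files[handle]
        data = self.files[name][pos:pos + count]
        self.open_files[handle][1] = pos + len(data)
        return data

    def close(self, handle):
        del self.open_files[handle]


files = {"log.txt": b"abcdefgh"}

stateless = StatelessServer(files)
first, second = stateless.read("log.txt", 0, 4), stateless.read("log.txt", 4, 4)
print(first, second)  # b'abcd' b'efgh'

stateful = StatefulServer(files)
h = stateful.open("log.txt")
third, fourth = stateful.read(h, 4), stateful.read(h, 4)  # shorter requests
print(third, fourth)  # b'abcd' b'efgh'
stateful.close(h)
```

The stateful requests are shorter, but the `open_files` table is exactly the state that must be rebuilt or cleaned up after a crash.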
Features of distributed file system
•Uniform access: a distributed computing environment should support global file names. One mechanism that allows the name of a file to look the same on all computers is called a uniform name space.
•Manageability: systems should provide a way to keep track of configuration information (e.g. location of files). DFS uses distributed databases for this task.
•Security: distributed file systems must provide authentication. Furthermore, once users are authenticated, the system must ensure that the performed operations are permitted on the resources accessed. This process is called authorization.
•Standard conformance: DFS complies with the IEEE POSIX 1003.1 file system semantics standard.
•Reliability: the distributed file system scheme itself improves reliability because of its distributed nature, that is, it eliminates the single point of failure of non-distributed systems.
•DFS uses file replication to achieve this goal, i.e., multiple copies of files on multiple servers.
•Server load balancing: A DFS root can support multiple targets that are physically distributed across a network.
•For example, suppose you have a file that you know will be accessed heavily by your users. Rather than having all users access this file on a single server, and thus taxing that server, DFS distributes user access to the file across multiple servers. To users, however, the file appears to reside in one location on the network.
•File and folder security: Because the shared resources DFS manages use standard NTFS and file sharing permissions, you can use pre-existing security groups and user accounts to ensure that only authorized users have access to sensitive data.
•Easy access to files: A distributed file system makes it easier for users to access files. Users need only go to one location on the network to access files, even though the files may be physically spread across multiple servers
•Performance: the network is considerably slower than the internal buses. Therefore, the less often clients have to access servers, the better the performance. DFS uses a cache (both of file status and real data) to lower the network load.
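The server load balancing feature above, where one DFS root is backed by multiple targets, can be sketched with a simple rotation. The round-robin policy and the `DfsRoot` class are assumptions chosen for illustration; the notes do not say how DFS actually picks a target.

```python
from itertools import cycle


class DfsRoot:
    """Hypothetical DFS root: one logical name backed by several targets."""

    def __init__(self, targets):
        self._next_target = cycle(targets)

    def refer(self):
        """Return the server that should handle the next client request."""
        return next(self._next_target)


root = DfsRoot(["server1", "server2", "server3"])
referrals = [root.refer() for _ in range(6)]
print(referrals)
# ['server1', 'server2', 'server3', 'server1', 'server2', 'server3']
```

Users keep using the same logical name while their requests are spread evenly over the three servers.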
