INTRODUCTION TO DISTRIBUTED
SYSTEMS
Definition
Motivation for Distributed Systems
Architectural Categories
Characteristics, Issues, Goals,
Advantages
Disadvantages
DEFINITION
A distributed system is a collection of independent
computers, interconnected via a network, capable of
collaborating on a task.
A distributed system can be characterized as a collection
of multiple autonomous computers that communicate
over a communication network and have the following
features:
No common Physical clock
Enhanced Reliability
Increased performance/cost ratio
Access to geographically remote data and resources
Scalability
DEFINITION (CONTD.)
A distributed system is a collection of independent
entities that cooperate to solve a problem that
cannot be solved individually.
So, basically, it is a collection of cooperating
computers.
The computers in a distributed computing system (DCS)
do not share a common memory or a common physical
clock. The only way they can communicate is by message
passing, which requires a communication network.
Definition of a Distributed System
A distributed system is (Tanenbaum):
A collection of independent computers that
appears to its users as a single coherent system.
A distributed system is (Lamport):
One in which the failure of a computer you
didn't even know existed can render your own
computer unusable.
Overview…
A distributed system connects autonomous processors
by a communication network.
The software components that run on each of the
computers use the local operating system and network
protocol stack.
The distributed software is termed middleware.
The distributed execution is the execution of the
processes across the distributed system to collectively
achieve a common goal.
Centralized system
In a centralized system, all data and computational
resources are kept and controlled in a single central
place, such as a server.
Applications and users connect to this hub in order to
access and handle data.
Although this configuration is easy to maintain and
secure, the central server can become a bottleneck if
too many users access it simultaneously, and it is a
single point of failure if it malfunctions.
Motivation for Distributed Systems
Inherently distributed computation: in many
applications, such as money transfer in banking or
reaching consensus among geographically distant
parties, the computation is inherently distributed.
Resource sharing: sharing of resources such as
peripherals and complete data sets.
Access to geographically remote data and resources,
such as a bank database or a supercomputer.
Enhanced reliability: resources and executions can be
replicated to enhance reliability.
Client-Server Architecture
In this setup, servers provide resources or services, and
clients request them. Clients and servers communicate
over a network.
Examples: Web applications, where browsers (clients)
request pages from web servers.
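The request/reply exchange described above can be sketched with TCP sockets. This is a minimal illustration, not a real web server: the function names (serve_once, request_page, run_demo), the "served:" reply format, and the use of the loopback address are all assumptions made for the example.

```python
import socket
import threading


def serve_once(server_sock):
    """Server side: accept one client, read its request, send a reply."""
    conn, _addr = server_sock.accept()
    with conn:
        request = conn.recv(1024).decode("utf-8")
        conn.sendall(f"served:{request}".encode("utf-8"))


def request_page(host, port, path):
    """Client side: connect, send a request, and return the reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(path.encode("utf-8"))
        return sock.recv(1024).decode("utf-8")


def run_demo():
    # Port 0 asks the operating system for any free port.
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.settimeout(5)
    server_sock.bind(("127.0.0.1", 0))
    server_sock.listen(1)
    host, port = server_sock.getsockname()
    worker = threading.Thread(target=serve_once, args=(server_sock,))
    worker.start()
    try:
        return request_page(host, port, "/index.html")
    finally:
        worker.join()
        server_sock.close()
```

Calling run_demo() starts the server in a background thread and plays the client role in the calling thread; the two sides interact only through the network connection.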
Peer-to-Peer (P2P) Architecture
Each node, or "peer," in the network acts as both a
client and a server, sharing resources directly with each
other.
Examples: File-sharing networks like BitTorrent,
where files are shared between users without a central
server.
Three-Tier Architecture
This model has three layers: presentation (user
interface), application (business logic), and data
(database). Each layer is separated to allow easier
scaling and maintenance.
Examples: Many web applications use this to separate
user interfaces, logic processing, and data storage.
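A minimal sketch of the three layers, assuming an in-memory dictionary in place of a real database. The class names, the product record, and the 10 percent tax rule are hypothetical and exist only to show that each tier talks to the one directly below it.

```python
class DataTier:
    """Data layer: owns storage. A dict stands in for a database."""

    def __init__(self):
        self._rows = {1: {"name": "notebook", "price_cents": 250}}

    def get_product(self, product_id):
        return self._rows.get(product_id)


class ApplicationTier:
    """Application layer: business logic, no storage or display details."""

    def __init__(self, data):
        self._data = data

    def price_with_tax(self, product_id, tax_percent):
        product = self._data.get_product(product_id)
        if product is None:
            raise KeyError(product_id)
        total = product["price_cents"] * (100 + tax_percent) // 100
        return product["name"], total


class PresentationTier:
    """Presentation layer: formats results for the user."""

    def __init__(self, app):
        self._app = app

    def render(self, product_id):
        name, cents = self._app.price_with_tax(product_id, tax_percent=10)
        return f"{name}: {cents // 100}.{cents % 100:02d}"


ui = PresentationTier(ApplicationTier(DataTier()))
```

Because the presentation tier never touches DataTier directly, the storage could be swapped for a real database without changing the user interface code.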
Microservices Architecture
The application is split into small, independent
services, each handling specific functions. These
services communicate over a network, often using
REST APIs or messaging.
Examples: Modern web applications like Netflix or
Amazon, where different services handle user
accounts, orders, and recommendations independently.
Service-Oriented Architecture (SOA)
Similar to microservices, SOA organizes functions as
services. However, SOA typically uses an enterprise
service bus (ESB) to manage communication between
services.
Examples: Large enterprise applications in finance or
government, where different services handle various
aspects of business processes.
Event-Driven Architecture
Components interact by sending and responding to
events rather than direct requests. An event triggers
specific actions or processes in various parts of the
system.
Examples: Real-time applications like IoT systems,
where sensors trigger actions based on detected events.
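The idea can be sketched with a small in-process event bus. In a real deployment the bus would typically be a message broker reached over the network; the EventBus class, the "temperature_high" event name, and the handlers below are assumptions made for illustration.

```python
from collections import defaultdict


class EventBus:
    """Routes each published event to every handler subscribed to its type."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The publisher does not know who reacts to the event, or how.
        for handler in self._handlers.get(event_type, []):
            handler(payload)


actions = []
bus = EventBus()
bus.subscribe("temperature_high", lambda e: actions.append(f"fan on in {e['room']}"))
bus.subscribe("temperature_high", lambda e: actions.append(f"alert sent for {e['room']}"))

# A sensor reports an event; both subscribed components react.
bus.publish("temperature_high", {"room": "lab", "celsius": 41})
```

The sensor code only publishes; new reactions can be added by subscribing more handlers, without changing the sensor.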
TIGHTLY COUPLED SYSTEMS
In these systems, there is a single system-wide
primary memory (address space) that is shared
by all the processors. Usually, tightly coupled
systems are referred to as parallel processing
systems.
[Figure: four CPUs connected through interconnection hardware to a system-wide shared memory]
LOOSELY COUPLED SYSTEMS
In these systems, the processors do not
share memory, and each processor has its
own local memory. Loosely coupled systems
are referred to as distributed computing
systems, or simply distributed systems.
[Figure: four CPUs, each with its own local memory, connected by a communication network]
EXAMPLES OF DISTRIBUTED
SYSTEMS
Database Management System
Automatic Teller Machine Network
Internet/World-Wide Web
Mobile and Ubiquitous Computing
WEB SERVERS AND WEB BROWSERS
[Figure: Browsers reach web servers (www.google.com, www.uu.se, www.w3c.org) over the Internet using URLs such as http://www.google.com/search?q=lyu, http://www.uu.se/, and http://www.w3c.org/Protocols/Activity.html. The last URL names the file Protocols/Activity.html in the file system of www.w3c.org.]
Distributed System
A distributed system organized as middleware. The
middleware layer extends over multiple machines,
and offers each application the same interface.
Making resources accessible
• The main goal of a distributed system is to make it
easy for the users (and applications) to access
remote resources, and to share them in a controlled
and efficient way.
• Resources can be just about anything, but typical
examples include things like printers, computers,
storage facilities, data, files, Web pages, and
networks.
Reasons to share resources.
• Economics.
OPENNESS
An open distributed system is a system that
offers services according to standard rules that
describe the syntax and semantics of those
services.
Detailed interfaces of components need to be
published.
New components have to be integrated with
existing components. An open distributed system
should also be extensible.
Differences in data representation of interface
types on different processors (of different
vendors) have to be resolved.
TRANSPARENCY
Distributed systems should be perceived by users
and application programmers as a whole rather
than as a collection of cooperating components.
Transparency is the ability to hide the fact that
processes and resources are distributed.
Transparency has different aspects.
These represent various properties that
distributed systems should have.
ACCESS TRANSPARENCY
Enables local and remote information objects
to be accessed using identical operations.
Example: File system operations in NFS.
Example: Navigation in the Web.
Example: SQL Queries
LOCATION TRANSPARENCY
Enables information objects to be accessed
without knowledge of their location.
Example: File system operations in NFS
Example: Pages in the Web
Example: Tables in distributed databases
CONCURRENCY TRANSPARENCY
Enables several processes to operate
concurrently using shared information
objects without interference between them.
Example: Automatic teller machine network
Example: Database management system
REPLICATION TRANSPARENCY
Enables multiple instances of information
objects to be used to increase reliability and
performance without knowledge of the
replicas by users or application programs.
Example: Distributed DBMS
Example: Mirroring Web Pages.
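A minimal sketch of the idea, assuming in-memory dictionaries as replicas and a simple write-all, read-one policy. The ReplicatedStore class and its policy are illustrative assumptions; real systems must also deal with replicas that are unreachable or out of date.

```python
class ReplicatedStore:
    """Clients call put/get; the replicas behind the interface stay hidden."""

    def __init__(self, replica_count=3):
        self._replicas = [dict() for _ in range(replica_count)]
        self._next = 0  # index of the replica that serves the next read

    def put(self, key, value):
        # Write-all: every replica receives the update.
        for replica in self._replicas:
            replica[key] = value

    def get(self, key):
        # Read-one: rotate reads across replicas to spread the load.
        replica = self._replicas[self._next]
        self._next = (self._next + 1) % len(self._replicas)
        return replica[key]


store = ReplicatedStore()
store.put("page", "v1")
reads = [store.get("page") for _ in range(4)]
```

The caller sees one store with one value per key; which replica answered a given read is not visible through the interface.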
FAILURE TRANSPARENCY
Enables the concealment of faults.
Allows users and applications to complete
their tasks despite the failure of other
components.
Partial failure transparency is achievable,
but complete failure transparency is not
possible.
Example: Database Management System
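One common way to conceal a fault is to retry the request on a redundant replica. The sketch below assumes replicas are plain callables and that a failed replica raises a hypothetical ReplicaDown exception; both are illustration choices, not a prescribed mechanism.

```python
class ReplicaDown(Exception):
    """Raised by a replica that cannot serve the request (hypothetical)."""


def make_replica(name, healthy):
    """Build a stand-in replica that either answers or fails."""
    def handle(request):
        if not healthy:
            raise ReplicaDown(name)
        return f"{name} handled {request}"
    return handle


def call_with_failover(replicas, request):
    """Try each replica in turn and hide individual failures from the caller."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except ReplicaDown as exc:
            errors.append(str(exc))
    # Complete transparency is not possible: if every replica is down,
    # the failure has to surface to the caller.
    raise RuntimeError(f"all replicas failed: {errors}")
```

With one failed and one healthy replica the caller gets a normal answer and never learns about the fault; with all replicas down the error is exposed, which matches the limit on failure transparency stated above.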
MIGRATION TRANSPARENCY
Allows the movement of information objects
within a system without affecting the
operations of users or application programs.
Relocation Transparency:
Situation in which resources can be
relocated while they are being accessed
without the user or application noticing
anything. In such cases, the system is said
to support relocation transparency.
PERFORMANCE TRANSPARENCY
Allows the system to be reconfigured to
improve performance as loads vary.
Load should be evenly distributed among
all the machines.
SCALING TRANSPARENCY
Allows the system and applications to
expand in scale without change to the
system structure or the application
algorithms.
Example: World-Wide-Web
Example: Distributed Database
HETEROGENEITY
Variety and differences in
Networks
Computer hardware
Operating systems
Programming languages
Implementations by different developers
SECURITY
In a distributed system, clients send
requests to access data and other resources
in the network that are managed by servers:
Doctors requesting records from hospitals
Users purchasing products through electronic commerce
Security is required for:
Concealing the contents of messages: security and
privacy
Identifying a remote user or other agent correctly
(authentication)
New challenges:
Denial of service attack
Security of mobile code
FAILURE HANDLING (FAULT
TOLERANCE)
Hardware, software and networks fail!
Distributed systems must maintain
availability even at low levels of
hardware/software/network reliability.
Fault tolerance is achieved by
recovery
redundancy
CONCURRENCY
Components in distributed systems are
executed in concurrent processes.
Components access and update shared
resources (e.g. variables, databases, device
drivers).
Integrity of the system may be violated if
concurrent updates are not coordinated.
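A minimal sketch of coordinating concurrent updates, assuming threads stand in for concurrent processes and a lock stands in for whatever coordination mechanism the system provides. The Account class and run_deposits helper are hypothetical names.

```python
import threading


class Account:
    """A shared resource updated by several concurrent workers."""

    def __init__(self, balance=0):
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount):
        # The lock makes the read-modify-write sequence one indivisible step,
        # so no update can be lost between the read and the write.
        with self._lock:
            current = self.balance
            self.balance = current + amount


def run_deposits(account, workers=8, deposits_each=1000):
    """Run several workers that all deposit into the same account."""
    def work():
        for _ in range(deposits_each):
            account.deposit(1)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance
```

Without the lock, two workers could read the same balance and each write back its own result, losing one of the deposits.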
SCALABILITY
Scalability of a system can be measured along at
least three different dimensions:
Size scalability: we can easily add more users
and resources to the system.
Geographical scalability: the users and resources
may lie far apart.
Administrative scalability: the system can
still be easy to manage even if it spans many
independent administrative organizations.
SCALING TECHNIQUES
Hiding communication latencies
Asynchronous communication
Moving more of the work to the client machine
Distribution
Distribution involves taking a component, splitting it
into smaller parts, and subsequently spreading those
parts across the system. An excellent example of
distribution is the Internet Domain Name System
(DNS)
Replication
4. BASIC DESIGN ISSUES
Specific issues for distributed
systems:
Naming
Communication
Software structure
System architecture
Workload allocation
Consistency maintenance
NAMING
A name is resolved when translated into an
interpretable form for resource/object reference.
Communication identifier (IP address + port number)
Name resolution involves several translation steps
Design considerations
Choice of name space for each resource type
Name service to resolve resource names to communication identifiers
Name services include naming context
resolution, hierarchical structure, resource
protection
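The step-by-step translation of a name into a communication identifier can be sketched with nested naming contexts. The name space below (ROOT), the host names, and the addresses are hypothetical; the addresses come from the 192.0.2.0/24 documentation range.

```python
# Hypothetical naming contexts: each label maps either to a nested
# context (dict) or to a communication identifier (IP address, port).
ROOT = {
    "org": {
        "example": {
            "www": ("192.0.2.10", 80),
            "mail": ("192.0.2.25", 25),
        }
    }
}


def resolve(name, root=ROOT):
    """Resolve a dotted name such as 'www.example.org' one label at a time."""
    context = root
    steps = []
    # Hierarchical names are resolved from the most general label inward.
    for label in reversed(name.split(".")):
        if not isinstance(context, dict) or label not in context:
            raise LookupError(f"cannot resolve {name!r} at {label!r}")
        context = context[label]
        steps.append(label)
    if isinstance(context, dict):
        raise LookupError(f"{name!r} names a context, not a resource")
    return context, steps


identifier, steps = resolve("www.example.org")
```

Here identifier is the (IP address, port) pair a client would connect to, and steps records the sequence of contexts visited, one per translation step.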
COMMUNICATION
Separated components communicate through
sending processes and receiving processes for
data transfer and synchronization.
Message passing: send and receive primitives
synchronous or blocking
asynchronous or non-blocking
Abstractions defined: channels, sockets, ports.
Communication patterns: client-server
communication (e.g., RPC, function shipping)
and group multicast
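The difference between blocking and non-blocking send can be sketched with a small channel abstraction. This is an in-process illustration that assumes threads as the communicating processes and a queue as the channel; the Channel class and its method names are hypothetical.

```python
import queue
import threading


class Channel:
    """A one-way message channel with blocking and non-blocking send."""

    def __init__(self):
        self._messages = queue.Queue()

    def send_async(self, message):
        # Non-blocking send: enqueue the message and return immediately.
        self._messages.put((message, None))

    def send_sync(self, message, timeout=5):
        # Blocking send: return only after the receiver has taken the message.
        delivered = threading.Event()
        self._messages.put((message, delivered))
        if not delivered.wait(timeout):
            raise TimeoutError("receiver did not take the message")

    def receive(self, timeout=5):
        # Blocking receive: wait until a message is available.
        message, delivered = self._messages.get(timeout=timeout)
        if delivered is not None:
            delivered.set()
        return message


def run_demo():
    channel = Channel()
    received = []

    def receiver():
        for _ in range(2):
            received.append(channel.receive())

    worker = threading.Thread(target=receiver)
    channel.send_async("log entry")    # returns before any receiver exists
    worker.start()
    channel.send_sync("transfer 100")  # returns only after it was received
    worker.join()
    return received
```

The asynchronous send lets the sender continue working while the message is in transit; the synchronous send also acts as a synchronization point between sender and receiver.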
SOFTWARE STRUCTURE
Layers in centralized computer systems:
Applications
Middleware
Operating system
Computer and Network Hardware
SOFTWARE STRUCTURE
Layers and dependencies in distributed systems:
Applications
Distributed programming support
Open services
Open system kernel services
Computer and network hardware
ADVANTAGES OF DISTRIBUTED
SYSTEM
Information Sharing among Distributed
Users
Resource Sharing
Extensibility and Incremental growth
Shorter Response Time and Higher Throughput
Higher Reliability
Better Flexibility in meeting Users' needs
Better price/performance ratio
Scalability
Transparency
DISADVANTAGES OF DISTRIBUTED
SYSTEM
Difficulties of developing distributed
software
Networking Problem
Security Problems
Performance
Openness
Reliability and Fault Tolerance
NEXT LECTURE
• Introduction to Big Data
Big Data Sources
5 V’s of Big Data
Big Data Processing Frameworks (Hadoop,
Spark, and NoSQL Databases)
Introduction to Apache Hadoop Stack (HDFS,
MapReduce, Sqoop, Zookeeper, HBase, Hive, Pig)
REFERENCES:
Tanenbaum, Andrew S., and Maarten Van
Steen. Distributed systems: principles and
paradigms. Prentice-Hall, 2007.
Sinha, Pradeep K. Distributed operating systems:
concepts and design. PHI Learning Pvt. Ltd.,
1998.
NOC: Distributed Systems, NPTEL.