SEM1020 - Current Issues in Software Engineering
This module focuses on current issues in the field of Software Engineering. The module will involve students in an examination of current research literature of interest to software engineers.
Introduction
It is okay to produce a paper based on my own paper :-)
Staff
Learning Outcomes
- Identify and use the main research resources that are available to software engineers
- Constructively participate in advanced technical debate in the field.
- Have a general overview of the field of Software Engineering and be aware of focused areas of research interest within it.
- Present current research at an appropriate level of detail to a technical audience.
- Produce a survey paper on some current area of Software Engineering research.
Assessment
- Participation and Contribution 15%
- Survey Paper (9000 words) 60%
- Poster 25%
How to Read a Scientific Paper
Papers provides a good view of what is happening now. Journal papers give a more out of date view than conference papers. Blogs typically give the most up-to-date view but are least verifiable.
Papers should give enough information that there is enough information to replicate the experiment.
It will show some actual data.
Steps
- Read the abstract, check the information will be relevant
- Read the conclusion and discussion, check the results agree with the abstract and if they are still relevant.
- Read the introduction, check you understand the background information and see if you need to look up more items to understand this paper.
- Read the results. Usually tables and figures are provided to show the data.
Now the methods become important, it gives details on how the experiment was set up and carried out. Try to pick out the basic methods and work upwards.
References provide a way of tracking back through the sources.
Paper Presentations
Set down for presenting Abstract State Machines (Feb 28) and Swarm Robotics (Mar 28) and asking questions on Biological Data Sharing (Mar 14) and Swarm Robotics (Apr 4).
Literature Review/Survey Paper
9000 words of pure gold.
May address any aspect of software engineering.
MSc should probably review the literature of their project.
SEM2220 - Mobile Solutions
Mobile devices (phones, tablets and other handheld devices) are the fastest growing area of computing. Typical applications involve a wide range of software and system issues. This module will investigate those issues, giving students experience and understanding of the best ways of building different types of mobile applications.
Introduction
Scope of mobile systems, , mobile web, native apps, design choices, main platforms, overview of implications (2 lectures)
Staff
- Chris Loftus (E38)
- Neil Taylor (C58)
- Andrew Star (C47)
Learning Outcomes
- Demonstrate an advanced understanding of the domain of mobile systems.
- Quite a broad overview of these systems.
- High level view.
- Give a flavour of specific systems (iOS, Android).
- Analyse and evaluate mobile solutions in a range of application areas and be able to critically evaluate their effectiveness.
- Have to be critical at M level.
- Research. Pros and cons.
- Evaluate the social, legal, ethical and professional issues involved in implementing mobile applications.
- Neil will be talking about this in the context of mobile.
- Implement representative mobile systems.
- Playing with the technologies.
- Building an android, iOS and mobile web app.
Assessment
- Mobile Web-based Programming project 20% 22/10/2013 to 04/11/2013
- iOS-based Programming project 20% 12/11/2013 to 25/11/2013
- Android-based Programming project 20% 02/12/2013 to 16/12/2013
- Case study analysis 40% 10/12/2013 to 22/01/2014
Motivation
In 2012 the whole mobiles ecosystem’s revenue was $1,551 billion; equivalent to 2.2% of global GDP.
5% CAGR (Compound Annual Growth Rate).
These are only projected figures, the picture is much bigger than this as this only includes money which takes into account the money which stays within the ecosystem (doesn’t include ecommerce, etc.). The affect is much higher.
Mobile Landscape
Constantly changing given that its a new market. The iPhone was the real kickstarter for this in 2007.
Timeline of the history mobile telephones.
Difference between “smart” and “feature” phones.
Mobile subscribers are growing 4 times faster than the global population. Reducing now (especially in the West) due to market saturation.
More scope for growth in developing countries.
Asia Pacific accounts for approximately half of global subscribers.
Total sim-enabled connections (including machine to machine (M2M) connections). Estimated 7.6% CAGR 2012-2017. Percentage of M2M connections is increasing rapidly.
Multiple devices (tables/phones/etc.) allow growth in sim-enabled connections.
Mobile data traffic is increasing rapidly. Video speculated to be one of the most rapidly increasing traffic.
Mobile versus Desktop
Over time the sales of desktops (PCs & Laptops) is decreasing while the sale of mobile devices is increasing.
Some predictions by GSMA
- Asia predicted to add approx. 50% of all connections/subscribers between 2013-17
- Same period Latin America and Africa next 20%
- Subscriber growth in developed nations slowing (e.g. 1% in UK)
- Total data traffic load in 2012 was 0.9 Exabyte/month).
- Predicted to be 11.2 EB/mth.
Chief Stakeholders
- Network operators
- Content providers
- OS vendors
- Device vendors
Fragmentation and Differentiation
Issue of fragmentation and differentiation, related to:
- the number of OS variants,
- the number of device vendors wishing to customise the OS,
- different browsers,
- Network operators and device vendors do not like commoditisation.
- Don’t want to be treated like commodities which are values on price.
Top 25 ranked apps in 2012 take 15% of all revenue. Only 2% of App Store newcomers get into the top 250 apps.
Free apps are becoming more of a norm, especially with the saturation of android.
Mobile Network Operators
Revenues are decreasing due to over-the-top services (i.e. using online services over SMS or phone calls).
Operators will push, via their shop sales-assistant commission, those devices they gain the biggest discounts on.
High profile phones promoted, often Android.
Apple disrupts this model due to a locked mindshare.
Psychological hook of “cheap” contract phones. Free now, pay later ideology.
Billing relationship with the customer. Paying for things using your mobile account rather than via a bank account (directly). Small transaction fee for this service to generate revenue.
- 48% of the world’s population do not have a bank account.
Something for the future, not quite so prevalent at the moment.
Rich Communication Service
Threat of over the top (OTT) services such as Skype. Quality of service is currently an issue here, but voice call is growing very slowly (13% in 2008 to 4% 2012).
GSMA responded by developing a Rich Communication Service specification. Only aimed at long term evolution (LTE) networks (4G). Aims to provide this as a universal service. 17 different implementations by MNOs including AT&T, Telefonica, Verizon and Verdafon.
Device Vendors (OEMs)
Nokia sales falling. Haven’t kept up with the trends.
Samsung now the major player. Mainly due to their adoption of Android.
Apple improving with mobiles.
RIM/Blackberry are focusing on development.
Building an Android smartphone involves:
- Choosing an OS.
- OEMs design phone (HW, UX, required changes to the Android OS UX) plus apps to include.
- Marketing.
- Negotiations with MNOs to arrange bulk discounts.
- Production test units sent to MNOs, app developers and Google.
- Phone is shipped to MNOs and independent stores
- MNOs customise with their own apps.
- Sales to public plus marketing.
Can also go down the stock phone route.
Inception to market is approximately around 6 months when based on an existing OS.
Apple is a little easier to to the locked-in syndrome.
Types of Phone
- Basic Phones
- Call and SMS support only.
- Feature Phones
- No touch screen
- Browser becoming standardised.
- Smart Phones
- Definition changes year-on-year.
- Expectations change a lot.
- More sensors and technology.
- Phablets
- Large smartphones.
- Tablets
- Same functionality as smartphones, often without SIM.
- Slates
- Large tablets.
- Non-phone devices
- iPod touch
- eBook readers
- Google Glass
- Watches like the Apple Pebble.
- Rely on bluetooth or similar.
Operating System Vendors
Android leading market player.
iOS popular.
Symbian (dead now) and Blackberry not doing well.
Windows Phone a late starter.
Android Fragmentation
There are 24 listed Android device manufacturers but at least 259 different device models currently on sale.
Lots of different screen sizes, resolutions, hardware capability, etc.
There is only one manufacturer for Apple products.
Version 4.1+ Jelly Bean most prevalent, but versions 4.0, 3.0 and 2.3 are still a large market share.
Security problems, most malware attacks on Android.
iOS is mostly all the same version. However, companies often prescribe a 2-3 purchase cycle. Corporate world might need to take into account older versions.
Google allow fragmentation to make Android more interesting to OEMs, However this can get in the way of updates.
Webkit configuration varies; browser fragmentation.
Tried to stop fragmentation in version 3, but this lead to new forks. So this has been re-introduced in version 4.
Fragmentation and Differentiation
Differentiation gives a unique selling point (USP) for vendors. Sticking with the same model.
Fragmentation leads to security issues and makes the OS more complex to handle screen sizes, etc. More testing needed. Therefore increased cost.
Windows Phone 8
Microsoft and Nokia. A closed environment (or “walled garden”).
No fragmentation/differentiation. Very few OEMs interested.
How to join the OS game?
Buy into it.
Web as OS platform. Firefox OS is trying hard with cheap alternatives. HTML5, JS, CSS3 and APIs to access hardware.
Very fragmented, depending on rendering engine.
Mobile Webkits are not always the same
Can be configured differently. Must be careful with mobile websites.
Depends on location too (UC Browser more popular than Android in Asia).
Proxy versus full browsers.
Opera mini, for example.
- HTTP request (encrypted)
- Proxy Server
- Target HTTP Server resources
- Renders page
- Compresses into an image map
- Image map returned to Opera mini (encrypted).
Cheaper and faster over a slow network. Very popular in certain developing countries with low bandwidth.
Cheap in terms in processing too.
JS is handled by the proxy server. Not always perfect and may not be enabled.
Content Providers
App Stores
- Apple App Store is a walled garden. 70% developer revenue.
- Google Play. 70% developer revenue.
- Amazon Android Appstore.
- Others (apk) security issues with this.
Mobile Web
Basic tools, responsive web, matching platform expectations, testing, delivery. (5 lectures plus practicals)
Objectives
- To be able to build some simple mobile web sites/applications and view them via desktop browsers, mobile browsers and as mobile applications (hybrid apps).
- To gain an insight into design issues especially multiple screen sizes and browser support.
- To use a common mobile web framework to help build a mobile web application.
- To user PhoneGap™ to turn a webb application into a “native” application which can be deployed.
- To use some common testing technologies for mobile web.
Overview
- Is mobile web being exploited?
- Three main approaches:
- Web site
- Hybrid App
- Native App
- Challenges for mobile web developers
- Standards are a moving target.
- How can we be informed what support there is?
- Given a complex situation, how can we make testing tractable?
Core philosophy
Progressive enhancement and not graceful degradation.
Web site/app Approach
Pros
- One codebase
- Cross platform
- Familiar technology
- Easy to distribute to both mobile and non-mobile
Cons
- Lack app store distributions
- Weak access to device services
- Performance (JS interpretation).
- Not as slick as a native app
Hybrid Applications
Native wrapper around HTML and CSS and JS.
Pros
- One codebase
- Cross platform
- Better accesses to device services
- Can distribute via app stores
- Familiar technologies
Cons
- No full access to device services
- Performance
- Not as slick as native apps
Native Applications
Pros
- Highly polished
- Good performance (potentially)
- Good developer support platform
- App store distribution.
Cons
- Platform specific
- Multiple codebases
Progressive Enhancement not Graceful Degradation
Graceful degradation (of a website):
- Focus is on building a great experience for the latest browsers, testing one browser version back.
- Old browsers provide poor, but passable, experience, only show-stopped bugs fixed.
Progressive enhancement (of website):
- Focus on content, not browsers.
Responsive Web Design (RWD)
- Fluid grids
- Media queries - use
em
overpx
.
Mobile virtual viewports
Mobiles have a virtual viewport which bypass the media queries.
Use the viewport meta-tag to ask the viewport width to be the same as the actual width.
Breakpoints
Research suggest lines of text should be between 45-75 characters for comfortable reading. Columns help with this.
Mobile-first progressive enhancement.
Design site for the mobile device. If media queries are supported and on a bigger screen, add to the existing CSS for mobile to add for the styling rules and changing font sizes.
Simplification of complex designs is harder than making a simple design more complex.
Server-side Detection of Device Capabilities
Client side RWD alone may not be sufficient.
There are some server-side approaches:
- User-agent sniffing.
- Client-side reconnaissance missions.
Classes of “device”.
RESS (Responsibe Web Design and Server-side Components).
The problem
Some services will only run on a mobile device (Phone calls, accelerometer).
Some services are more significant on a mobile device (location awareness, etc.), be careful not to run into the Hide and Cry anti-pattern.
Some classes of device are very different (feature versus smart phone).
What we spit out is so different that RWD alone won’y help. Need variants of the website.
Image sizing issues that RWD won’t solve.
User-agent sniffing
Can use HTTP User-Agent request header with a device description repository to work out:
- Device hardware characteristics
- Operating System
- Browser characteristics
Example: Detecting viewport width
var width = (winder.innerWidth > 0) ? Window.innderWidth : screen.width;
document.cookie = "sitewidth="width;
document.location.reload();
if(isset($_COOKIE["sitewidth"]) {
$_SESSION["screenwidth"] = $_COOKIE["sitewidth"];
}
// ...
if($_SESSION["screenwidth"] <= "380") {
include("includes/mobile-nav.inc.php");
} else {
include("includes/desktop-nav.inc.php");
}
Herding Devices into Device Classes
Device Class: An abstract collection of common characteristics of similar devices and their browsers.
Why? Because we don’t want our server-side code to have to deal with every device (this isn’t scalable).
Typical Classes
- Higher Mobile (WebKit-based, >= 320px width)
- Simpler Mobile (minimal JS support, >176px width)
- Tablets
- Desktop
- Unsupported (if no SLL, Cookies, JS, or < 176px width)
Does unsupported break progressive enhancement?
What to send?
- Stylesheets and Code based on class
- Sometimes single codebase is too small (watch out for duplication of common code).
- RESS can help with codebase maintenance.
Responsive Web Design and Server Side Components
Combine client-side RWD with templating in the codebase for different classes of device.
Sometimes HTML5, CSS3 and JS isn’t enough
Many HTML5 features aren’t widely supported.
Can’t publish to application stores.
Perhaps go for a hybridge: part native, part HTML.
PhoneGap/Cordova is the best known hybrid technology.
IOS Native Apps
Objective-C, relevant design patterns, platform design considerations, handling data, using sensors and location (5 lectures plus practicals)
Objective-C
Based on C and Object-Orientation (based on Smalltalk), has basic rules of C but most of the syntax is smalltalk-based.
Some repeated information is involved, but this is improving.
Uses late binding (names are looked up at runtime, not compile time).
Good, but unusual, object syntax with some useful extensions and libraries.
Verbose (A: yay boilerplate), designed to be descriptive (A: yay dot-enter programming).
Likely to change with each release (good and bad).
Basic Syntax and Objects
Logging and Strings
NSLog(@"Message");
NSString *var = @"Some message";
NSLog(@"Message: %@.", var);
Booleans
BOOL true = YES;
BOOL false = NO;
Numbers
NSNumber *num = @10;
// Conversion to unsigned int
NSUInteger uInt = [num unsignedIntegerValue];
Arrays
NSArray *arr = @[...];
// Accessing elements
arr[index];
// Something about NSArrays being immutable objects.
Dictionaries
NSDictionary *dict = @{@"key": @"value"};
// Accessing
NSString *val = dict[@"key"];
Objects and Messages
Messages are Objective-C’s version of methods, where the Java syntax would be:
ObjectName.methodName()
The Objective-C syntax is:
[objectName messageName]
Example:
NSArray *temp = @[@"Test", @"2"];
NSLog(@"%@", [temp description]);
// Logs information about temp
Storing the output of Messages
Just need to worry about typing
NSArray *temp = @[@"Test", @"2"];
NSString *result = [temp description]
NSLog(result);
Parameters
Messages can have parameters, however they may act differently than expected.
[object methodName:parameter namedParameter:value]
Initialisation
[[Class alloc] init];
Conditionals and Loops
if(condition) {
// ...
} else if(condition) {
// ...
} else {
// ...
}
switch(thing) {
case value:
// ...
break;
}
for(Class *item in iterable) {
// ...
}
Blocks
Blocks are like basic methods.
void (^block)(void) = ^{
// ...
};
Inkoving a block:
block();
Classes
In a header file (.h):
@interface Name : NSObject
@property Type *name;
@property NSString *myName;
@end
Implementing the interface (.m):
#import Name.h
@implementation Name
@end
Accessing properties
Name *name = [[Name alloc] init];
name.myName = @"Value";
Method Signatures
In many languages, you might see:
void m(int x, int y, String text)
A similar method in Obj-C is:
- (void) mAtX: (NSInteger) x
atY: (NSInteger) y
withText: (NSString *) text
Calling this:
[obj mAtX: 2 atY: 32 withText @"a message"]
The method name is: mAtX:atY:withText
Properties
Defined with the @property
syntax.
The attributes in the ()
control how the property manages the underlying data value.
The IBOutlet
is only used on properties that are linked to the user interface.
atomic
is used for multi-threading purposes (read locks, etc.).
Under the bonnet, this is an instance variable (ivar).
Good practise is to cover your ears and pretend they’re just properties.
Instances
Example *inst;
inst = [[Example alloc] init];
// or
inst = [Example new];
alloc
and init
is the typical Obj-C way as classes often have different initialisation techniques. alloc
is a class method, init
is an instance method.
Protocols
Similar to Java interfaces. There are some features where you can have required and optional methods.
Optional methods act as empty methods if they are not implemented.
Set in the header file (.h)
Primitive Types
Primitive types from C can be used in Obj-C, e.g. int, long, float, double.
Additional types:
BOOL
(YES
orNO
)NSInteger
(typedef to either long or int depending on architecture)CGFloat
(typedef to either float or double depending on architecture)
Memory Management
Memory management is achieved by checking the number of references to an object.
In older code, you needed to write lines of code to state when you wanted to keep an object in memory.
Modern applications use ARC (Automatic Reference Counting).
The compile tracks when objects should be kept in memory or removed from the application.
Application Structure
Typical project implementations have a simple structure which corresponds to MVC
AppDelegate
accesses the model (if there is one).
The storyboard is the view, describe many screens in a single file.
ControllerView
is the controller.
Developing an Application
- Make a new view-based progect,
- Add extra views in the Storyboard,
- Design the application by placing buttons, etc. allows easy prototyping
- Make new
ViewController
files and link them to the views - Via drag and drop, define Actions and Outlets in the
ViewController
and link outlets and actions to interface (associate views to controllers) - Implement model separately (if complex enough).
Rule of Thumb
If your newly created app crashes when trying to load a new screen, then you usually have:
- Messed up linking the view to the view controller.
- Done something wrong when linking an outlet or action to a screen.
The error message should give you a clue as to what is wrong.
Types of Interfaces
- Tabbed screens
- Tables
- Linked Views
- More examples from iOSDevUK
You can put each of the views into the storyboard and define how you segue from screen to screen then write code to pass data between them.
Tabbed Screens
Tables
UITableView
is used to display repeating section of data. UITableViewCell
is a cell in the table.
Unlike classic tables on other platforms, there is only a single column.
Tables contains many sections, which in turn contains many rows.
The entire table can have a header and footer. Sections can also have a header and footer.
Each table must have at least one section to be complete.
Table Styles
Table style is set at build time only.
Plain
Cells flush to each other
Grouped
Not in iOS7.
Cells
Each row contains a cell.
When a row’s data is to be displayed, the UITableView
will request the cell.
By default, the UITableViewCell
is used to degine a cell.
The cell is initialised in the method: tableView:cellForRowAtndexPath:
A table can contain different types of cells for different rows.
The cell is reused.
Cell Structure
- Content
- Layout depends on Style
UITableViewCellStyleDefault
UITableViewCellStyleSubtitle
UITableViewCellStyle1
(right detail)UITableViewCellStyle2
(left detail)
- Layout depends on Style
- Accessory
- Disclosure Indicator
- Detail Disclosure Button
- Check Mark
Custom Cells
Content of a cell can be defined in code, but is tedious.
It is more likely that IB will be used to define the layout for the cell and use a View Controller to control the behaviour.
Table View Contoller
UITableViewContoller
is a subclass of UIViewControler
that is designed to manage UITableView
s.
If a new view based controller is created, the XIB will contain the table linked to this controller.
In order to provide custom behaviour, the following methods needed to be implemented:
UITableViewDelegate
- Manage selections,
- Configure section headings and footers,
- Help to delete and reorder cells,
- Perform other actions
- (All optional)
UITableViewDataSource
- The data source provides the table-view object with the information it needs to construct and modify a table view. Two are required.
Linked Views
Data Storage on iOS
Techniques for storing and retrieving data on the iPhone.
XML and SQLite provide the best portability.
NSUserDefaults
User preferences.
Persistent map.
Property List Serialisation
Mechanism to store a selected number of data types in a property list (plist).
NSCoder
Technique to save more complex objects.
XML
XML data files.
SQLite
Small memory footprint SQL database.
Core Data
More complex system, but offering a powerful way to manage data and relationships. Best integration with iCloud.
Application Sandbox
iOS applications are run in a protected area on the device. Within this area there is a defined structure of Directories that you can access.
The developer is responsible for managing files that are created within the application area.
SQLite 3
Database and library available in iOS.
C library, which needs to be included in the project.
Create a SQLite database on the command line and include in the project.
Need to marshal data into and out of the database, with some conversions (e.g. Strings).
Where to store the database
By default, your database is in the default bundle, inside your application.
If read-only data is needed, it can remain there.
Write access needs to be moved to the Documents directory.
Android Native Apps
Relevant design patterns, platform design considerations, handling data, using sensors and location, handling device variation (5 lectures plus practicals)
Introducing Android
Android Inc. started developing an open source mobile OS, acquired by Google in 2005.
Uses the Dalvik Virtual Machine:
- Virtual Machine and Dex Language
- Linux kernel
- Apache Harmony class libraries.
The effect of OS segmentation
The Android platform is currently changing very quickly.
Each manufacturer has a coperate theme to it.
Why Android?
- Popularity and market saturation.
Android IDEs
- Eclipse and ADT.
- Android Studio.
Application Space
There are other apps which are isolated from the application, but are able to communicate with intents.
Linux Kernel and Android OS
Operating system process
Owned by the application.
Dalvik Virtual Machine
A special Java VM for running on Mobile Devices.
May be moved to the ART (Android Runtime).
Application
No single main method. The way that Android apps are developed, and the model following, doesn’t have a concept of a start point or end point, rather it has the idea of the behaviours.
Activity
A single screen layout and the code behind.
A functional piece of UI.
Service
A background process.
Does not have a UI, may interact with them though.
Content Provider
A way of sharing a local database.
Queries and cursors used to interact.
Broadcast Reciever
Application Lifecycle
Memory of mobile devices is much more limited than on PCs or servers.
Short battery life is an issue.
Lots of apps vying for limited resources.
Android Manages Memory Carefully
Android has the opportunity to remove resorces from memory when screens are switched.
If necessary a whole application and its process will be removed.
Need to be able to reincarnate apps.
List Views
Used to display scrollable lists of data.
List Activity
Extended by classes to handle clicks, etc.
Acts as a controller in an MVC sense.
List View
Used by the List Activity
Acts as a view in an MVC sense. Does the hard work of sizing rows, etc.
List Adapter
Created by the controller and acts as the data source for the list view.
Acts as a model in the MVC pattern.
Array Adapter <T>
Simple array of type T.
Generally a hard-coded list of static data.
Simple Adapter
A list of maps, where the map keys relate to UI element IDs.
Simple Cursor Adapter
A dynamic adapter which is based on a cursor of an iteratable object.
Filesystem
Given a private filestore under: /data/data/package-path
Public
On SD Card
Private
Stored under private filestore.
/data/data/package-path/files
Access
Apps access their private files folder with: Context.getFilesDir();
Read/Write
Context.openFileOutput(String name, int mode);
Context.openFileInput(String name)
Media
On SD Card under the standard location for media files.
Temporary
/data/data/package-path/cache
Shared Preferences
/data/data/package-path/shared-prefs
XML only
Databases
/data/data/package-path/databases
Bringing it all together
Legal, social, ethical and professional issues. Selecting platforms and delivery methods, getting the interface right. (3 lectures)
Motivations
The BCS, under its Royal Chater, is required to establish and maintain standards of competence, conduct and ethical practice for information systems professionals.
The ability to recognised the legal, social, ethical and professional issues involved in the exploitation of computer technology and be guided by the adoption of appropriate professional, ethical and legal practises.
Legal Issues
Which laws relate to the use of computers?
Data Protection Act
The Equality Act
Copyright, Design and Patents Act
Contract Law and Licenses
Computer Misuse Act
Social Issues
What is the impact of computers on society?
Ethical Issues
How should we act?
Professional Issues
What are our professional responsibilities?
SEM5640 - Developing Advanced Internet-Based Applications
This module builds on SE31520 examining further the development processes, techniques and technologies for constructing Internet-based, distributed software systems. Particular emphasis is placed on studying and applying enterprise design patterns. Students, working in small groups, will build, using a modern software engineering methodology, a distributed application that employs the technologies and concepts taught in this and previous modules.
Nigel Hardy awesome tally: 2
Introduction
Provides a road-map to the module, with a brief overview of main concepts and how they interrelate.
Staff
Learning Outcomes
- Analyse a complex software engineering problem.
- Within a group design and implement an original solution to the problem.
- Test and critically evaluate their solution.
- Critically assess the relative merits of various software development methodologies within the context of the given problem and nature of the project team.
- Working in a group, apply the chosen software development methodology to solve the given problem.
- Critically explain the relative merits of alternative server-side technologies.
- Demonstrate a critical appreciation of design issues encountered when developing multi-tier, distributed applications.
Assessment
- Group Programming project and report 75%
- Exam 25%
Internet Applications
- Uses network connection
- Browser-based
- Client-server model
- Reliance of collection of internet standards (e.g. HTTP)
Advanced Internet Applications
- AJAX/Plugins support to provide a client rich experience
- Multiple Application/Servers
- Distributed or more complex processing
- Transaction (DB) support
- REST/SOAP or other remote procedure calls
No precise definition.
Overview
There are several themes this module will look at:
- Multi-Tier Internet Applications
- Developing using .NET and Java EE
- Design Patterns
- MVC
- ORM
- Web Services
- REST
- SOAP
- RMI/RPC
- Building Applications
- Cloud Computing
- Structure and deployment
Multi-Tier Internet Applications
Applications that involve multiple servers which run different parts of the application so that the business logic is separated from the presentation logic.
Different parts tend to include:
- Client
- Web
- Business Logic
- Data/EIS
Lots of small applications which build up a larger, more complex, application.
Communication between the servers provides the linkage between the applications.
Supports scalability if done correctly. Also more resistant to failure. Potentially more secure (more important servers firewalled).
Should all be transparent to the user.
Java Enterprise Edition (J2EE)
- Building web applications
- JavaServer Faces
- Enterprise Java Beans
- Portability
.NET
- Building web applications
- Web Forms, MVC
- C#
- Tiers and layers
Design Patterns
Two types of issues:
Patterns of Enterprise Architecture
- Building and structuring the application
- Relate to .NET and Java
Patterns of Enterprise Integration
- Making applications communicate
- Examples:
- Apache Camel
- Apache ActiveMQ
Building Applications
Using Cloud computing and how relevant it really is.
Examples from eBay of structure and deployment.
Web Services
Application that provide services which have destination endpoints with operations.
SOAP XML, WSDL and REST are the big players.
Alternatives to REST?
Construction Technologies
Examination of technical solutions for building enterprise applications as provided by enterprise Java and .NET for building distributed applications.
These include:
- support for server-side MVC, e.g. JavaServer Faces and ASP.NET MVC;
- business objects, e.g. Enterprise JavaBeans;
- object to relational mapping, e.g. MS LINQ, Java Persistence API;
- messaging systems, e.g. Java Messaging Service;
- interoperability, e.g. SOAP-based web services.
Java Enterprise Edition (Java EE)
Java is a language (JDK) and a platform (JRE).
The platform is the Java Virtual Machine and a set of APIs.
Four platforms:
- Java Standard Edition (SE)
- Java Enterprise Edition (EE)
- Java Mobile Edition (ME)
- Java FX
Java EE
A superset of Java Set, providing a development model, APIs and runtime environment (the server).
“For developing and running large-scale, multi-tiered, scalable, reliable and secure network systems” - The Oracle BS
Java EE Web Tier
Access the web tier with HTTP(S).
Provides static resources and dynamic page generation. Handles client input and navigation flow with maintance of state.
JavaBeans provide temporary storage on the Web Tier.
Can do simple logic, could build the whole application in this tier, but would be a bad design.
Java EE Business Tier
Provides Enterprise Java Beans (EJB) components.
Access to Java Logic, etc. which provide the business logic.
Java EE Enterprise Information System (EIS) Tier
Logically and typically on other machines, typically “foreign” technology.
Other Clients
Can build Java Applications or Web Applications which access the EJB components directly. Uses remote procedure calls.
Java EE Application Server
Provides:
- The JVM and core APIs
- The EE APIs
- Component containers
- component lifecycle management
- creation
- association
- services to components
- security
- transaction management
- directory lookup
- remote connectivity
- component lifecycle management
Containers
A set of Java classes packaged into a jar file.
Can be deployed using either:
- XML descriptors
- Java annotations
Annotations are widely used, but can be overriden by descriptors.
Components plus deployment descriptors make up a module, each for one container type.
Four types of EE modules:
- Web modules
.war
- EJB modules
.jar
- Application client modules
.jar
- Resource Adapter Modules
.rar
A set of modules makes up a Enterprise archive: .ear
Servlets
Servlets are the olderest EE web component.
Servlets act as a dynamic page, which have a URL, and must receive a HTTP request, from this it generates a HTTP response.
Is a Java object with a managed lifecycle.
Extension of javax.servlet.http.HttpServlet
class.
Lifecycle
Managed by the web container.
The first call is:
- Load class
- Create an instance
- Call
init()
Any call:
- Handle the HTTP request
Perhaps calls destroy()
Only one instance is needed by a running server.
Handling Requests
The HttpServlet
class has stubs for all the HTTP methods doGet(...)
etc.
service()
method used to handle the finding of appropriate methods for the HTTP headers.
Common to use the same method to handle GET
and POST
(for some strange reason, this sounds bad).
Request
Provides access to:
- Call details
- Parameters
- Body
- A reader
- Context
Response
Provides access to:
- Response code
- Headers
- A writer (for the body)
Forwarding
Can forward requests to other servlets, passing on the request and response objects. Used for branching on context, for example.
RequestDispatcher d = req.getRequestDispatcher('name');
d.forward(req, resp)
Instance Variables and Servlets
A servlet could have instance variables, however these variables are not specific to a connection, rather the whole servlet.
There may also be multiple servlets so there is no guarantee that these will be the same.
Need features for session data.
Reminder that HTTP is stateless.
Sessions
The request must carry some identification of the session, server usually generates this information.
- Parameters
- Cookies
Remember EU Cookie Law
Error Redirects
Specified in the web.xml
descriptor.
Useful Features
- Templating
- Validation & Schemas
- i18n
- Security
- Database access
- Support for good design patterns
- Tight Integration
- Testing Frameworks
- Web Service support
Generally: higher level libraries and code reuse.
i18n and L10n
Internationalisation and localisation are both important. Don’t hardcode for the locale, instead store translations in DB or files and load them dynamically.
Authentication
Authorization
Testing
Use JUnit where possible (with mocks, stubs, etc.)
Can test Java Beans using an embedded EJB Container.
Service testing is slightly more difficult
Service Testing
Testing the resources in an automated way to avoid manual input.
Provides rapid and reliable regression testing.
Handles:
- HTTP(S)
- FTP
- LDAP
- JMS
- SOAP
Specify calls, with parameters or sequences.
Selenium or JMeter are options (JMeter also does load testing).
JMeter also has a NetBeans plugin.
.NET
Microsoft framework for developing and deploying applications with:
- Cloud (Azure)
- Web Server
- Desktop
- Restricted Resource Devices and Mobile Phones
Similarities with Java (JRE).
Focused on Windows, but can run on other platforms through mono.
Overview
Common Language Runtime (CLR)
The .NET JRE effectively.
Loads and executes .NET “managed code”.
CLR loads code into application domains to provide a level of isolation and the ability to stop and remove a domain.
Provides Just In Time (JIT) compilation, memory management and GC.
Security based one roles and defined by policies.
Building and Deploying Code
- Compile source code files.
- Combine IL (MSIL), metadata, config files and other resources into assembly files.
- DLL or EXE extensions
- DLL is a library module
- EXE is only type that will activate the CLR
- DLL or EXE extensions
- Deploy Applications to CLR
Internet Information Services (IIS)
The application server which runs ASP.NET applications.
Server-Side MVC
Methods for implementing the MVC pattern in an Web Application.
JavaServer Faces (JSF)
Provides:
- Templating
- Facelets
- JavaServer Pages (JSP)
- Validators
- MVC Pull
- Navigation
- Fixed
- Dynamic
- AJAX
- i18n/L10n
- Session Management
- DB Access
- JDBC
- ORM frameworks
- Testing Frameworks
- Code reuse
- Web Services and resources
Basic Technology
Basic Api for represention components and managing their state; handling events, server-side validation and data conversion; defining page navigation; supporting i18n and accessibility; and providing extensibility.
A tag library for adding components to web pages and connecting components to server-side objects.
Expression Language
Like most scripting languages $
and #
are used.
${expr}
are rvalues (read-only) and are processed immediately.
#{expr}
are *lvalues (read-write) and have deferred evaluation. Also used for method expressions.
${object.variable}
calls object.getVariable()
under the covers.
#{object.method}
calls object.method()
.
Faces
A replacement for JavaServer Pages (which are now deprecated).
Write (X)HTML with special tags which have an XML appearance which will be rendered as HTML with functionality behind it.
<h:body>
is the top level tag.
All pages will tend to act like a form, so <h:form>
is very commonly used, even in cases where there would not normally be a form.
Common Attributes
id
style
to associate CSSrendered
is a condition to check is the element should be rendered on the page. The condition can be an expression from the Expression Language.value
the value of the component, again this can be from the Expression Language, which links the view to the model.
Composite Components
Built from standard components.
These can be parametrised and stored in resources
; they have a namespace (default: http://xmlns.jcp.org/jsf/composite/ezcomp
)
<!DOCTYUPE html PUBLIC ...>
<html xmlns="..."
xmlns:cc = "..."
xmlns:h = "...">
<!-- INTERFACE -->
<cc:interface>
<cc:attribute name="x" required="true" />
</cc:interface>
<!-- IMPLEMENTATION -->
<cc:implementation>
<p>
<h:outputLabel for="i" value="Repeats" />
<h:inputText id="i" value="{cc.attrs.x}" />
<h:message for="repeat" />
</p>
</cc:implementation>
Templating
Can create template pages which has values to fill in. The inheriting page has to fill in these values.
Navigation
Write navigation rules in XML, each rule is a page (view). For each page there is a set of outcomes (in string form). For each outcome a next page is specified.
Typically stored in faces-config.xml
Outcomes are generated from the action
attributes from components.
There are implicit navigation rules which falls back to a matching page if it can be found.
Converters
Implement javax.faces.convert.Converter
Have associated error string displayed if conversion fails (<h:messages>
).
Number of default converters for most Java Objects and primitives.
Converters are used in four ways:
- Bind the component to a managed bean property of the right type (default).
- Put the converter class in the component’s
converter
attribute. - For numbers and dates, nest the
f:convertDateTime
orf:convertNumber
tag inside the component. - Put the
f:converter
tag inside the component and refer to the converter (general purpose tag).
Events & Listeners
Part of the component model, application events are generated by components (ultimately from the rendered page).
The JSF application can map HTTP requests to the specific handling code.
Two types of event:
- Application event; when the user activates a component that implements
ActionSource
(buttons, links, etc.). - Value-change event; when the users changes the value of a component represented by
UIInput
.
Listeners cause the application to respond to events:
- Implement an event listener class to handle the event and register it to a component.
- Implement a method of a managed bean and refer to it in the EL attribute in the component tag.
Validators
Similar to converters, but only provide validation.
Implement javax.faces.validator.Validator
Again, default validators are provided.
Registered by:
- nesting the validator’s tag inside the component
- Nesting the
f:validator
tag inside the component.
BeanValidator
uses validation methods in the bean instead of having to write a separate class. Also allows the validation to be model-state specific.
Bean Validation
public void validateVar(FacesContext context, UIComponent toValidate, Object value) {
int input = (Integer) value; // cast the given value
if(!condition) {
((UIInput) toValidate).setValid(false);
FacesMessage message = newFacesMessage(
"error message");
context.addMessage(
toValidate.getClientId(context),
message);
}
}
Lifecycle
Starts when the client request a URL and ends with the server response.
ASP.NET
.NET Applications for the web, which uses:
- Web Forms - Event based
- Web Services - Application to Application
- MVC
- ASP.NET AJAX
View State
Stores the entire state in a hidden item in the form. Simplistic, but does affect performance and security.
Sent in base64.
Can disable for the entire page, or for specific elements.
Also have the ability to encrypt, EnableViewStateMac
or store server-side.
Session State
There are different techniques which can be used to store session state:
- Cookies (bad)
- REST Requests (not stateful)
- Server Side session tokens (persistence/scalability questions)
ASP.NET uses server side session tokens:
String myVallue = "A. Name";
Session["name"] = myValue;
myVale = (String) Session["name"];
Can configure where the data is held. Can also turn off the requirement for cookies.
In Process (InProc)
Same as ASP.
SQL Server (SQLServer)
Store state in Microsoft’s SQL Server, which can be on another machine.
StateServer
Store in a separate process, which can be on another machine.
Custom
Write your own mechanism.
Off
System.Web.UI.Page
Default base class for code-bheind filels.
Implements System.Web.IHttpHandler
:
IsReusable()
- true if the handler can be pooledProcessRequest(
- Processes the actual HTTP request.
HTTP Pipeline
GridView and SQL DataSource
Ways of display DB data.
Business Objects
Objects which relate to the business layer of a multi-tier application.
Enterprise Java Beans (EJB)
EJBs act as an ORM framework, where the Java Objects are annotated with the DB schema details.
Three types of EJB Session:
- Stateful
- Stateless
- Singleton
Can also have message-driven beans.
Access can be done through:
- No-interface view (all the public methods are available)
- Through a business interface
- Injection
- JNDI
Transactions
A transaction is a unit of work which supports “ACID”.
Atomic
Every task within a unit of work must complete successfully otherwise the transaction is aborted
Consistent
Atomicity, isolation, durability lead to consistent data. Developer must also define database consistency checks.
Isolated
Preventing interference from other transactions.
Durable
Data written to disc before a transaction can fully complete.
EJB Transactions
In EJB Transactions have a scope. A transaction manager manages this process.
EJBs support two types of transactions:
- Container managed
- Bean Managed
Container Managed Transactions
Transactional attributed specified using annotations or in the deployment descriptor.
Outside of the EJB code.
Reccommended
Bean Managed Tranasctions
Annotate EJB class with:
@TransactionManagement(TransactionManagerType.BEAN)
Offers fine grained control.
EJB Transactions Attributes
Six types of transaction attributes, specified by annotations or descriptor.
The default is Required
.
Can annotate the class to set the default for the class.
NotSupported
Transactions are not supported by the method, any current transaction will be suspended during this method call and resumed once it terminates.
Supports
Follows the caller; if it was in a transaction, it will remain so, if it was not, this will not create a new one.
Required
The method must be part of a transaction.
If the caller was not in a transaction, this will start one.
RequiresNew
This method will always crate a new transaction. An existing one will be suspended and will resume after this is complete.
The outer transaction will not roll-back if this one does.
Useful to nest transactions
Mandatory
Must already be in a transaction, otherwise a TransactionRequiredException
will be thrown.
Never
Must never be in a transaction, otherwise a RemoteException
will be thrown.
EJB Isolation
Problems:
- Dirty Reads
- Non-repeatable Reads
- Phantom Reads
Provide four (fairly standard) levels of transactions:
TRANSACTION_READ_UNCOMMITTED
TRANSACTION_READ_COMMITTED
TRANSACTION_REPEATABLE_READ
TRANSACTION_SERIALIZABLE
In bean-managed transaction, can specify isolation using the JDBC API:
connection.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE) ;
Using serializable guarantees data consistency, but may affect performance.
Read Uncommitted
Uncommitted changes are visible to other transactions, other transactions can change rows that another transaction has read.
May result in inconsistencies as a rollback by T1 may mean that T2s data is out of date.
Read Committed
Rows updated in a transaction cannot be read by another transaction, but rows read by one transaction can be changed by another.
Rows updated by T1 cannot be read by T2.
Rows read by T1 can be updated by T1 or T2.
Repeatable Read
Rows read by one transaction cannot be updated by any transactions.
Rows read by T1 cannot be updated by T1 or T2.
Serializable
Transaction (appears) to have a lock on the data.
EJB Transactions with Exceptions
Checked exceptions (including application exceptions) do not cause an automatic rollback, but can be set using @ApplicationException(rollback=true)
Unchecked exceptions and RemoteException
s automatically cause a rollback and the EJB instance is discarded.
Injections
Getting a reference to object instances without instantiation or passing in as parameters, providing good decoupling.
There are two mechanisms for this:
- Resource Injection
- Dependency Injection
Annotations to fields or methods specify the injection points.
Provides a runtime binding which provides inversion of control.
Compatible abstractions are required to avoid typing errors; the code using an object is not responsible for choosing an implementation or creating the instance.
Injection is often the implementation for this, but can use:
- Factories
- Service Locators
- Contextualised lookup
Resource Injection
Inject references to objects in the JNDI namespace into any container-managed object.
Gets a reference to a resource using:
- Annotations
- Deployment Descriptor
InitialContext
lookup
Inject as an interface type, code independent of specific implementation.
Not type safe!
Variable Resource Injection
public class MyServlet extends HttpServlet
@Resource(name="java:comp/DfeaultDataSource")
private javax.sql.DataSource dsc;
}
Method Resource Injection
public class MyServlet extends HttpServlet {
private javax.sql.DataSource dsc;
@Resource(name="java:comp/DefaultDataSource")
public void setDsc(javax.sql.DataSource dsc) {
this.dsc = dsc
}
}
Method name must being with set
, with a void
return type and only one parameter.
Object Relational Mapping (ORM)
Libraries which provide a mapping from a relational database to objects in an object-orientated paradigm.
Language-Integrated Query (LINQ)
A query language for .NET languages which is akin to SQL.
Java Persistence API
Access to database at SQL level or through an ORM Framework
JDBC
Provides direct access to a database.
Only provides relational records, not Objects.
Java Naming and Directory Interface (JNDI)
Access to naming and directory services, part of the Java SE API. Looks up objects by name, returning values or references.
API for the clients.
Service provider interface (SPI) wadds new directory and naming services.
Java Persistence API (JPA)
Provides an ORM mapping of Java Object using Java annotations. Provides:
- Create
- Retrieve
- Modify
- Write
(So CURD then? -A)
Entities are JavaBeans, annotated with @Entity
. Automatically mapped, but can label instance variables with extra details; @Column
can map Java types to SQL types.
@Id
is required.
(JPA can be a real pain though, he glossed over this just a little bit -A).
Contexts and Dependency Injection (CDI)
Any Java object can be made managed by its container. The CDI defines scpoes and a managed object has a declared (or default) scope.
This can then be injected into any other managed object.
Application Scoped
@ApplicationScoped
Session Scoped
@SessionScoped
Request Scoped
@RequestScoped
Conversation Scoped
@ConversationScoped
Covers the AJAX exchanged (can be extended using an ID)
Dependent
@Dependent
Inherits the scope from the object it is inserted into.
This is the default scope.
Injection
import javax.inject.Inject
public class Printer {
@Inject
Greeting greeting;
}
Qualified Injection
public class Printer {
@Inject
@Informal
Greeting greeting;
}
Can create a subtype of Greeting which is annotated using @Informal
to allow variants of a type.
Messaging Systems
Acts as a middleware which decouples:
- time
- platform
- language
Which is usually machine-to-machine.
There are a lot of standards.
Models for Messaging Systems.
- Point-to-Point (queue)
- Publish/Subscribe
- Fanout
- Request/Response
Java Messaging Service
A Java API which “allows applications to create, send, receive and read messages using reliable, asynchronous, loosely coupled communication.”
Requires an implementation.
Access Connection Factories or Queues using injection.
Message Brokers
Most patterns require middleware (all can use it).
Possible roles include:
- Routing
- Receiving
- Storing
- Persistent
- Transmit
- Broadcast
- Translate
- Monitor
Brokers
JavaEE applcation servers must have one. Glassfish has Open MQ.
.NET has one part of the service bus.
Apache Active MQ is a good offering.
Security
Authentication & Authorisation, encryption of messages, integrity of messages, transactional, etc.
Some protocols and some implementation already on offer.
Interoperability
Methods of performing B2B operation.
Socket Connections
Powerful, but not generally available as an API. Need a Domain Specific Language.
Remote Method Invocation (RMI)
Marshalling and unmarshalling are handled by generated code.
Some distribution services are provided.
Access to CORBA IIOP.
Builds on services provided by sockets.
Has the problem of specificity to Java. Even with IIOP it is still better in an intranet.
Focuses on synchronous transmission.
RESTful Web Services
Use HTTP methods (GET
, POST
, PUT
, DELETE
, PATCH
and HEAD
) with a URI for the API. Return any form of data; usually XML, JSON or YAML.
Very constrained on header information.
Simple Object Access Protocol (SOAP) based Web Services
Expressed as XML supporting interoperability (both Business to Business (B2B) and EAI).
Typically sent over HTTP but can use other network protocols.
Usually generated automatically using facilities in a program, but it is possible to create messages manually.
More popular than CORBA, extensible through the use of XML.
HTTP avoids firewalls.
Industry backing:
- Miscrosoft
- I’ve Been Moved (IBM)
- Oracle
- BEA
- HP
- SAP
- etc.
Structure
POST /webservice/OrderProcessing HTTP/1.1
...
<?xml version="1.0" encoding="UTF8" ?>
<soapenv:Envelope xmlns:soapenv=“http://schemas.xmlsoap.org/soap/envelope/”
xmlns:xsd=“http://www.w3.org/2001/XMLSchema”
xmlns:ns1=“http://orders.borthdistributors.co.uk”>
<soapenv:Header>
<!-- Optional. Meta-data in form of header blocks. e.g.: security credentials, id, etc. -->
</soapenv:Header>
<soapenv:Body>
<!-- Mandatory. Arbitrary application-specific XML data or elements that map to method/procedure calls or code that processes the XML document. Could be fault data -->
</soapenv:Body>
</soapenv:Enveloper>
Web Services Description Language (WSDL)
SOAP manages the process of communicating information but doesn’t define operations, data types or faults of the service.
WSDL is an XML description of a service, generally autogenerated by tools.
WSDL Elements
In the specification, there are the following eleents that define a service.
- types - the data types used that can be used in the messages
- message - the messages that can be transmitted, which also specifies the types used
- portType - the list of possible operations, linking the input and output messages
- binding - the link between a portType and the specific protocol settings
- port - the definition of a specific location that a particular binding is available.
- service - a collection of ports.
In .NET
Generates proxy code for you. A set of asynchronous methods are provided:
BeginMethodName(...)
EndMethodName(...)
Interoperability in SOAP
How do you expose types as parameters and return values in a way which in interoperable?
Value types map onto XML Schema types, classes or structs are converted into complex types in the schema.
- Classes must be public
- Public fields and properties are serialised as elements
- Methods are not exposed.
Arrays and collections are supported by this, but not all types can be mapped easily (e.g. custome types where methods are used to provide access to calculated data). (A: as you’d expect really)
Windows Communication Foundation (WCF)
Splits Application domain from communication domain.
Brings together lots of different interoperability technologies.
Adress
Binding
Contracts
Software Development Process
There is no such thing as a silver bullet for software development processes.
Testing
Methods for testing a system.
White/Black Box Testing
A basis for which testing is built on
Unit Testing
Testing the smallest possible unit.
Functional Testing
Test that a part provides an expected function.
Non-functional Testing
Tests that a part will not fail to function under certain circumstances.
Integration/Subsystem Testing
Testing larger units of work combine together.
Verification Testing
Have we built the right thing?
Does it conform to standards.
System Testing
Our testing of the entire system.
Acceptance Testing
Testing the system is acceptable for the client.
Formal Methods
Write a formal specification and then program to that specification so that something can be proved to have worked.
VDM
Z Notation
Traditional Methods
Focus on up front design before development
Waterfall Model
- Requirements
- Planning
- Design
- Development
- Testing
- Deployment
- Maintainace
Spiral Model
Iterative Development
Agile Methods
Focus on providing features without large amounts of up-front development and client involvement.
Extreme Programming
Promotes no-one authorship of code with refractoring.
Planning game to put features.
Can use a metaphor to describe the system.
Test Driven Development
Write tests first then produce code to pass those tests (but no more).
Feature Driven Development
Break the system down into user stories, make these stories into features which can then be developed independently.
Need some overall architecture.
Rapid Application Development
Hack it until it works.
Scrum
Self organising teams with a focus on co-location of all team members and communication.
Model-Driven Engineering
- Build a model
- Test the model
- Translate the model into executables
Platform Independent Model -> Platform Dependent Model.
Can use tools like executable UML.
Open Source
Methods for developing software in a more unmanaged and dynamic way.
Cathedral
A big in-house style of development where every developer understands the big picture.
Bazaar
A very dynamic way of developing, which not every developer has the big picture, but understands their small parts, usually needs some figureheads.
Crowd Sourcing
Getting a large amount of people to come up with potential solutions and selecting the best ones.
Hacking
Playing with a system and getting it to do tasks it was not necessarily designed to do.
Design Issues
A number of enterprise application design patterns will be reviewed within the context of technologies discussed in 2. Building multi-tier applications requires developers to know more than just technologies, they must also know and use design techniques suited to their development. Students will learn about reusable enterprise design patterns, and in particular those used for the development of distributed, multi-tier applications.
Domain Logic Patterns
Transaction Script
Organizes business logic by procedures where each procedure handles a single request from the presentation.
Domain Model
An object model of the domain that incorporates both behavior and data.
Table Model
A single instance that handles the business logic for all rows in a database table or view.
Service Layer
Defines an application’s boundary with a layer of services that establishes a set of available operations and coordinates the application’s response in each operation.
Data Source Architectural Patterns
Table Data Gateway
An object that acts as a Gateway to a database table. One instance handles all the rows in the table.
Row Data Gateway
An object that acts as a Gateway to a single record in a data source. There is one instance per row.
Active Record
An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data.
Data Mapper
A layer of Mappers that moves data between objects and a database while keeping them independent of each other and the mapper itself.
Object-Relational Structural Patterns
Identity Field
Saves a database ID field in an object to maintain identity between an in-memory object and a database row.
Foreign Key Mapping
Maps an association between objects to a foreign key reference between tables.
Association Table Mapping
Saves an association as a table with foreign keys to the tables that are linked by the association.
Dependent Mapping
Has one class perform the database mapping for a child class.
Embedded Value
Maps an object into several fields of another object’s table.
Serialized LOB
Saves a graph of objects by serializing them into a single large object (LOB), which it stores in a database field.
Single Table Inheritance
Represents an inheritance hierarchy of classes as a single table that has columns for all the fields of the various classes.
Class Table Inheritance
Represents an inheritance hierarchy of classes with one table for each class.
Concrete Table Inheritance
Represents an inheritance hierarchy of classes with one table per concrete class in the hierarchy.
Inheritance Mappers
A structure to organize database mappers that handle inheritance hierarchies.
Object-Relational Metadata Mapping Patterns
Metadata Mapping
Holds details of object-relational mapping in metadata.
Query Object
An object that represents a database query.
Repository
Mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects.
Web Presentation Patterns
Model View Controller
Splits user interface interaction into three distinct roles.
Page Controller
An object that handles a request for a specific page or action on a Web site.
Front Controller
A controller that handles all requests for a Web site.
Template View
Renders information into HTML by embedding markers in an HTML page.
Transform View
A view that processes domain data element by element and transforms it into HTML.
Two Step View
Turns domain data into HTML in two steps: first by forming some kind of logical page, then rendering the logical page into HTML.
Application Controller
A centralized point for handling screen navigation and the flow of an application.
Distribution Patterns
Remote Facade
Provides a coarse-grained facade on fine-grained objects to improve efficiency over a network.
Data Transfer Object
An object that carries data between processes in order to reduce the number of method calls.
Offline Concurrency Patterns
Optimistic Offline Lock
Prevents conflicts between concurrent business transactions by detecting a conflict and rolling back the transaction.
Pessimistic Offline Lock
Prevents conflicts between concurrent business transactions by allowing only one business transaction at a time to access data.
Coarse-Grained Lock
Locks a set of related objects with a single lock.
Implicit Lock
Allows framework or layer supertype code to acquire offline locks.
Session State Patterns
Client Session State
Stores session state on the client.
Server Session State
Keeps the session state on a server system in a serialized form.
Database Session State
Stores session data as committed data in the database.
Base Patterns
Gateway
An object that encapsulates access to an external system or resource.
Mapper
An object that sets up a communication between two independent objects.
Layer Supertype
A type that acts as the supertype for all types in its layer.
Separated Interface
Defines an interface in a separate package from its implementation.
Registry
A well-known object that other objects can use to find common objects and services.
Value Object
A small simple object, like money or a date range, whose equality isn’t based on identity.
Money
Represents a monetary value.
Special Case
A subclass that provides special behavior for particular cases.
Plugin
Links classes during configuration rather than compilation.
Service Stub
Removes dependence upon problematic services during testing.
Record Set
An in-memory representation of tabular data.
eBay Example
Partition Everything
Split everything into manageable chunks: data, load and usage.
If you can’t split it, you can’t scale it
Motivations:
- Scalability - can scale horizontally and independently
- Availability - can isolate failures
- Manageability - can decouple segments and functional areas
- Cost - can use less expensive hardware
Functional Segmentation
Group data using standard data modelling techniques.
Logical hosting: abstract the logical representation from hosts physical location.
Support combining and splitting without code change.
Horizontal Split
Split databases horizontally along primary access path.
Multiple split patterns:
- Hashing based on a key
- Lookup- or range-based
Aggregation/routing in Data Access Layer:
- Abstracts developers from split logic.
- Routes CRUD operations to appropriate split
- Supports rebalancing through configuration changes
No (cross-database) Transactions
Absolutely no client-side transactions, two-phase commits.
Auto-commit for vast majority of DB writes.
Anonymous PL/SQL blocks for transactions within single database.
Consistency without transactions using:
- Careful ordering
- Recovery through:
- Asynchronous recovery
- Reconciliation batch
- Fallover to guaranteed asnyc flow.
Has the added benefits:
- No deadlocks
- Avoids coupling availability
- Maximises update correctly.
No session state
User session flow moves through multiple application pools
Absolutely no session state in application tier.
Transient state maintained/references by:
- URL rewriting
- Cookies (sadface)
- Scratch database
Async Everywhere
- Scalability - can scale components independently
- Availability - Decouples state (up/down), characteristics (always available, best effort), retry operations
- Latency - Can improve user experience at cost of data/execution latency. Can allocate more time than the user would be willing to do.
- Cost - can spread peak load over time.
Message Dispatch
Consumers subscribe to event.
- Multiple logical consumers can process each event
- Each logical consumer processes events independently
- Within each individual consumer, single consumer instance processes event
- Guaranteed at least once delivery, no guaranteed order.
Managing timing conditions
- Idempotent - processing event N times should give the same results as processing once.
- Readback - consumer typically reads back to primary database for latest data.
Search Feeder
Read and transform item updates from primary database.
Reliable multicast
- Publish updates to search service
- Persist messages to intermediate data store for recovery
- Resent recovery messages when messages are missed
Search nodes listen to updates
- Listen to assigned subset of messages
- Update in-memory index
- Request recovery.
Periodic Batch
Schedule offline batch processes
Most appropriate for:
- Infrequent or scheduled processing
- Non-incremental computation
Often drives further downstream processing through message dispatching.
Automate Everything
Prefer adaptive/automated systems to manual systems.
- Scalability - can scale with machines, not humans
- Availability/Latency - can adapt to changing environments
- Cost - machines cheaper than humans
- Functionality - consider more feature space
Adaptive Configuration
Define service-level agreement (SLA) for a given logical event.
Consumer dynamically adjusts to meet defined SLA with minimal resources.
Automatically adapt to changes in:
- Load
- Event processing time
- Number of consumer instances
Machine Learning
Dynamically adapt experience
- choose pages, modules and which provide best experience for that user and context
- Order results by combination of demand, supply and other factors
Feedback loop enables system to learn and improve over time:
- Collect user behaviour
- Aggregate and analyse offline
- Deploy updated metadata
- Decide and serve appropriate experiences
Best practises:
- Perturbation (anneling) for continual improvement
- Dampening of positive feedback
Remember Everything Fails
Build all systems to be tolerant of failure.
- Assume every operation will fail
- Assume every resource will be unavailable
- Detect failure as rapidly as possible
- Recover from failure as rapidly as possible
- Do as much as possible during failure.
Motivation:
- Availability - duh
Failure Detection
Application servers log all requests
- Detailed logging of all application activity
Messages broadcast on multicast message bus
Listeners automate failure detection and notification.
Rollback
Absolutely no changes to the site cannot be done.
Code deployment: Rollout/Rollback
- Entire site rolled every 2 weeks
- Many deployments have dependencies between pools
- Rollout plan contains explicit set (transitive closure) of all rollout dependencies.
- Automatic tool executes staged rollout, with built-in checkpoints and immediate rollback if necessary.
Feature deployment: Wire on/Wire off
- Every feature has an on/off state driven by central configuration
- Allows feature to be immediately turned off for operational or business reasons
- Decouples code deployment from feature deployment
- Applications can check for feature “availability” in the same way as they check for resource availability.
Markdown
Detecting that a resource is down or slow.
Application “marks down” the resource.
- Stops it from making calls to it
- Sends alert
Non-critical functionality is removed or ignored
- “Limp mode” operation
Critical functionality is retired or deferred
- Fallover to alternative resource
- Defer process to guarantee aynch message
Explicit “markup”
- Allows resource to be restored and brought online in a controlled way.
Wider Issues
Examination of current issues that are related to 2 and 3, e.g. scalability; approaches to testing and deployment; the use of cloud computing.
Load Balancing
Balance the load of the request across a number of different machines.
There are hardware and software solutions for performing this.
Benefits:
- Scalability
- Manageability
- Server and Content balancing
Has a single point of failure.
Round Robin
Allocate in a circular pattern
Least Connections
Allocate to the machine with the least connections
Response Times
Allocate to the machine with the quickest response time.
Server Probes
Gain even more information about the servers and allocate based on this.
Weighting
Apply a weighting to each machine and allocate accordingly.
Scalability
- Partition by function
- Split horizontally
- Avoid distributed transactions
- Decouple functions asynchronously
- Move processing to asynchronous flow
- Virtualise at all levels
- Cache appropriately
Scaling
- Scale horizontally (scale out) is preferential. Adding more machines
- Scale vertically (scale up). Upgrade machines
Cost basis - try to keep on commodity hardware.
Partition Everywhere
Functional segmentation into logical function (example is the database; split the database into the segments, continue splitting until no more splits can be made).
Horizontal split using an internal ORM implementation, requires additional routing.
Deployment
Integration
- Integration and the enterprise
- Distribution vs Integration
- Coupling
8 Types of integration - Main pattern topics in the EAI book
Integration and the enterprise
Integrating separate applications so that they work together.
Enterprises tend to have numerous systems which weren’t implicitly designed, for example:
- Databases
- Calendars
- Accounting
- HR
- Website
- CRM
- Warehouse Management
- Shipping
- Stock
- Disaster Recovery
These may
- Support different areas of the enterprise
- Mergers, take overs, …
- Different technologies
- Legacy systems
Coupling
Thins “fused” (read dependent on another) together.
Multi-tiered applications tend to be tightly coupled.
Messaging systems offer the chance to be loosely coupled by decoupling the ends of the communication, even offering asynchronous communication.
Types of integration
- File integration
- Remote procedure invocation
- Shared database
- Messaging
File Integration
Export data from one system and import into another using a file system.
Works with different technologies.
Issues of file format, shared file systems, timing.
Question of how data is exported and imported?
- Timed task
- User managed
- Other communication
Automation problems. File locking might be required.
Shared Database
Collection of applications that need to share common data.
Quick access to information for each application.
DBMS removes a lot of issues from file integration.
Tightly coupling the applications to a common area.
Remote Procedure Invocation
Invoking actions in addition to sharing data, using some form of RPC methodology:
- Sockets
- SOAP
- REST
- etc.
Again needs a shared understanding of the data. No shared file format.
Timing question of when do we start the transfer.
Communication might fail, does that affect the coupling of the two applications?
Messaging
Share data by sending Messages over a Message Channel. Application(s) pick up messages and process the information.
Fire and forget for the sender.
Message channel stores and forward to provide reliable message delivery which mimics the real world.
Message Formats
- Custom data formats
- Industry standards for exchange
- XML
Messaging Systems
Example system using message systems:
Message Channel
Logical pipe between senders and receivers, the implementation of this vary:
- Direct Connections
- Central message hub
Each channel has an identity, typically a name.
Different channels exist for different methods of solving problems.
Point to Point
Publish and Subscribe
Messages
A unit of information containing a header and a body.
Different types of message:
- Messages to send commands
- Messages to send documents
- Messages to send events
There are ways to specify how long messages are relevant for
Routing
Sending messages might involve more than just forwarding.
Message Filtering
Appling filters to messages
Content-Based Router
Routing messages to other channels and/or filters based on the content of a message
Splitter and Aggregator
Recipient List
Others
Endpoints
The interfaces to the senders and receivers
- How do they link to the message channel
- Polling consumer
- Event driven consumer
- Messaging gateway
- Idempotent receiver
- Other
Transformation
Isolate different stages and then translate later.
System Management
The challenge for these systems is how do we know what’s going on, given that there may be many different queues in a system.
Object to object communication is easy(ish) to use a debugger or log.
Web systems are a little harder, but possible to debug and monitor.
But how do you monitor message based systems?
Message History
Wire tap
Message Store
Others
Example - Widgets and Gadgets
Infrastructure
WGRUS has four different channels to interact with customers:
- Web site
- Call centre
- Fax
- Notification by email
Taking orders
Processing Orders
Implementing this we gets something like this:
Looking in at the inventory request:
Because an order may have many items, split the order up:
To know which item is for which type, we need to enrich the data at the order stage:
The high level view of the order process now looks like this:
Checking Status
Need to know the status of the order as it can take some time to process.
Using a message store we can log the system:
In situations where point-to-point MQs are used a wire tap is needed:
A process manage can be added to manage the flow of messages in the queues, providing two main bits of functionality:
- Storing data between messages
- Keeping track of progress and deciding the next step
Change of Address
What happens if an address changes?
What happen if the address change is not part of an order?
New Catalog
Announcements
Cloud
How can we use the cloud for advanced internet applications?
Overview
What is the cloud?
- The cloud
- Cloud computing
Principles of cloud computing
X as a service
Cloud - Computing as a Utility
Largely a new business model.
Just computers, networking and applications, but arranged different; pay for usage rather than buying the infrastructure, etc.
The benefit to business
What problems is the cloud trying to solve?
How can the cloud be used for:
- Small to medium enterprise (SME) manufacturing business
- Startup company looking to launch an eCommerce site
- Company gathering data from devices
5 Aspects of Cloud Computing
- Pooled computing resources
- Virtualised computing resources
- Elastic scaling up or down according to need
- Automated creation of new virtual machines or deletion of existing ones
- Resources only billed as used
The Cloud at Your Service, Resenberg and Mateos, 2011, Manning Publications
Data Centres
Cloud providers have large data centres at locations around the world.
SaaS: Software as a Service
The web browser is your computer.
(Pay?) for resources used.
Is this different from managed servers?
Is this different from web applications?
Platform as a Service (PaaS)
Infrastructure as a Service (IaaS)
A lower lever set of infrastructure to choose from:
- Services
- Data storage
- Load balancing
- Content delivery networks
- Message queues
Examples:
- Amazon Web Services
- Windows Azure
Revision
Preparing for the examination in June
2012-13 Paper
1. This question is about integration of enterprise systems and is based on an extension of the fish marketing system developed during the course.
a) Discuss what the terms interoperability and integration mean within the context of enterprise systems
Example extract from 2011-2012 paper
This question is about development technologies and enterprise application patterns applied to the HoBo Room Brokering application, developed during the module.
a. Describe and compare the facilities provided by .NET and by the Java Platform, Enterprise Edition 6 (Java EE 6) for supporting the implementation of each of the following features within the application.
i Registration of human users and authentication and authorisation for use of the various aspects of the application by a range of user types
- Java
- Registration
- Realms (File, JDBC, LDAP, …)
- Group of users
- Facilities to manage/persist user data
- Authorisation
- Roles
- Authentication
- Configuration
- Web server
- Registration
- .NET
- Wizards/template (not many marks for such a galling answer)
- ASP.NET Identity System
- Authentication
- Web.config (protected)
- SQLServer (
.mdf
) - ActiveDirectory
- OAuth
- Authorisation
- Roles
ii Concurrent use of the application by many users, each with their own server-side interaction context.
- Java
- Session beans
- Distinguishing between request, session and application
- Session Management?
- Session beans
- .NET
- Session Management
Session["myKey"] = ...
- InProc
- SQL Server (DB)
- Session State Server
- View State
- Session Management
iii Provision to client applications of a SOAP based search service that returns a set of descriptions of establishments with current availability, which meet given criteria.
- Java
- Annotations
- WSDL generation of exposed class
- Operations through
@WebMethod
on methods @WebService
on class
- .NET
- Attributes
- Operations through
[WebMethod...]
on methods [WebService...]
on class.asmx
still exists, but older.WCF
Windows Communication Foundation- Interface
- methods
- Implementing class
- Address Bound Contract (ABC)
- Interface
- WS* specifications
- Generate proxy from WSDL (client)
Problems of mapping data types between languages.
Public properties/fields exposed for sending over a web service (attributes in .NET)
Raw data only.
Underlying standards and tool support.
SEM5720 - The Internet And How It Really Works
The Internet is a complex, multi-organisation network reaching nearly all parts of the world. The functioning of this network and the applications running upon it depend on a complex set of protocols. This module addresses the fundamental aspects of the most important issues that permit the network and its applications to operate successfully. The module also addresses the current threats to the Internet and topics still emerging from R&D studies around the world.
Postal service analogies: 4
Introduction
This module discusses the detailed underlying operation of the Internet and its constituent components and is an essential topic in its own right as well as providing a solid foundation for much of the other material covered in the MEng.
Staff
Learning Outcomes
- Participate in planning networks that are cost effective and realistic in terms of products and services currently available.
- Critically assess proposed networking solutions.
- Assess the effect of likely technological developments on existing network applications.
- Make decisions and provide guidance to others in the choice of appropriate communications technologies to deploy, to solve real world requirements.
- Demonstrate extensive knowledge of the internal operation of the Internet and its protocols.
- Demonstrate an appreciation of the problems that appear in the management of routing and naming in large networks.
- Exercise judgment in the choice of appropriate protocols and services to address the real needs of Internet operators and users.
- investigate, and propose solutions to problems of quality of service.
- Demonstrate an appreciation of the security issues that surround the Internet and its applications and how these can be mitigated.
- Explain the need for a new generation of the Internet and describe current progress towards it.
Assessment
- Assignment 40% ??/??/2013 to ??/??/????
- Exam 60%
Commitment
20 credits = 200 hours of work.
44 hours of lectures, around 20 hours of practicals.
This leaves about 140 hours of personal study, including extra practical work.
There is an assignment worth 40% of the marks (2000 word report).
Text book study and revision.
2 hour exam.
Recommended Reading
- TCP/IP Protocol Suite - Foruzan B., Fegan S.
- RFC or internet drafts available online from IEFT
- Other textbooks and journals
- The Internet Protocol Journal by Cisco Systems
Practicals
Practical work sessions focusing mainly on the electronics and hardware of network issues.
Practical 1
Using the computer connected digital oscilloscopes or picoscopes.
More DNS
nslookup minted.dcs.aber.ac.uk
whilst capturing packets on port 53 captured 8 packets.
The type of the message was of type A and of class IN and was a recursive lookup. There was no response because the message was a query.
The server this was sent to was 193.60.11.253
. In /etc/resolv.conf/
the nameserver: 127.0.1.1 c57net.aber.ac.uk
.
1 question and 3 answers were held in the first answering packet. 4 authoritative records were received specifying the 2 authoritative nameservers from dcs.aber.ac.uk
and 2 from yale.ac.uk
.
The additional records provided all possible IP addresses of all nameservers contacted during the lookup to cache the information.
nslookup set type=MX aber.ac.uk
was then captured. The answer record contained: Mail exchange: aber-ac-uk.mail.protection.outlook.com
.
The authorative record contained nameservers for: aber.ac.uk
(3) and yale.ac.uk
(1).
nslookup set type=SOA dcs.aber.ac.uk
was then captured.
The email of the person responsible for this domain is: cs-root.aber.ac.uk
.
There are 4 authoritative nameservers for this domain: dcs.aber.ac.uk
(2), aber.ac.uk
(1) and yale.ac.uk
(1).
nslookup aber-ac-uk.mail.protection.outlook.com.
The response from this was from microsoft. 2 IP addresses which can switch (load balanced?).
nslookup 193.60.11.36
had a type of PTR
for the name 36.11.60.193.in-addr.arpa
. Still used the standard nameservers.
nslookup sillyname
gave an authoritative and non-authoritative response (one authoritative for sillyname.c57net.dcs.aber.ac.uk
and non-authoritative for sillyname
). Uses two different sets of nameservers, the Aberystwyth local ones and the root nameserver.
dig @dns0.aber.ac.uk 98.34.124.144.in-addr.arpa PTR
shows lots of packets, initially gathering DNS information, then on 144.124.34.98
.
dig www.yahoo.com A
shows that yahoo uses CNAME
instead, with an IP address of 87.248.122.122
.
Basic Issues in Data Communication
A revision of the basic issues in data communication.
Protocol Models and Frameworks
In the 1970s there was no master plan, overall structure nor agreements on application protocols.
Proprietary protocols and architectures lead to a large amount of anarchy.
ISO Committees
In 1977 ISO establishes committees and subcommittees and so on and so forth.
Not just ISO doing this, telecommunications (CCITT) also got involved.
OSI Reference Model - IS 7498
Provides a basic framework using a “divide and conquer” principle.
Uses layering to reduce complexity, where each layer handles one (group of) problem(s).
- Keep things simple.
- Choose boundaries at places that minimise interaction between adjacent layers.
- Functions of a different nature or purpose in different layers
- Similar function in same layer.
- Use all part knowledge and experience.
- Hide implementation within layers
- Special hardware/processors
- Data abstraction levels.
- Internal changes do not affect other layers
- Only create interfaces to directly surrounding layers (controversial).
OSI Reference Model Layers
- Physical - wires, radio frequency
- Data Link - direct link from one to another
- Network - global issues like addressing
- Transport - methods for ensuring quality of service.
- Session - availability of resources, “checkpoints”, etc.
- Presentation - Language/character set encoding
- Application - Not supposed to contain or control whole applications, just the useful parts to applications, e.g. directory service
Exercise
Discuss the statement: The existence of a communications framework like the OSI model promotes competition between companies.
For
- Companies spend more time actually building products than on firmware to deal with communications.
- Standardised method of communication allow products to be linked: more scope for different products.
Against
- No competition to improve the network process.
- No differentiation.
Local Area Networks
A detailed study of variants of the technologies collectively known as Ethernet.
Ethernet
A passive, contention-based broadcast technology that uses baseband signalling.
Passive
No one device controlling the network.
Contention-based
Each device must compete with every other device for access to the network.
Broadcast
Ever device on a shared network fears broadcast transmissions.
Baseband Signalling
Entire bandwidth of a cable for a single transmission.
802.3 CSMA/CD Bus (Ethernet)
Derived from work by:
- Xerox
- Intel
- Digital (DEC)
Publised in Ethernet (DIS Blue Book) V1.0 September 30th 1980.
802.3 Revision D dated December 1982 publish by IEEE.
Revision F published July 1984 and also issued as draft proposal 8802/3 by ISO.
Operation: Carrier Sense Multiple Access with Collision Detection (CSMA/CD)
Distributed Control
- Contend for use
- Acquire access and send a data packet
- No priority
Simple Algorithm:
1.
if(network active)
goto 1
transmit(data)
check for collision
if(collision)
transmit(JAM)
delay(a random time)
goto 1
transmission complete
Checking for collision is the difficult part. Can check:
- I’m transmitting a
1
, is there a1
on the wire. - Do the signals on the wire look like the output of one station and one station only.
Delay must be random to avoid future collisions and not have priority.
Coaxial cable
Very rigorous specifications. Maximum length of 500m and limits on bend.
These are to stop the electric current being interfered with.
Hard limit on propagation: c
(speed of light).
Transceiver (MAU) to link devices to co-ax conveniently.
Terminators absorb the signal at the ends of the cable.
Repeaters repeat the signal onto other cables, repeat a perfect version of the signal (if possible).
Broadcast Networks
If data is inserted into the network, it should reach every member on the network.
Need to know how long it will take for data to get across the network.
Rules tend to be the distance between two extremes in the network is 5 of the co-ax cables and 4 repeaters. 2 of the co-ax cables must be point to point without any devices attached.
Bridges, switches, hubs and routers help build larger networks.
Co-ac Medium Options 802.3
- Original (10BASE5)
- 10Mbps
- Baseband coaxial cable
- segment length 500m
- Max 10 taps per segment
- Max 4 repeaters
- 10BASE2
- 10Mbps
- Thinner coaxial cable
- 185m segments
- Max 30 taps per segment.
Slot Time
Worst case:
B
stars transmitting when A
s packet has nearly reached B
.
B
must corrupt, at least, the last bit of A
s packet.
Therefore, packet length must be such that time to transmit is greater than twice the transmission delay for longest route in the network.
For a 2.5km network, 10MBit/sec, the slot time is 51.2 micro seconds.
Smallest allowed packet needs to be the number of bits it takes to reach from one end of the network to the other, times 2, plus a little extra for luck.
512 bit packets capture the network for 10BASE5.
“Your bits are now all over the network” - Dave Price, 10/10/2013
Calculatng Slot Time
Slot time s
.
Path time p
.
s > 2p
- 3 coax (with stations) = 21.65 bits
- 2 links (no stations) = 25.65 bits
- 4 repeater delays = 7.5 bits
- Transceiver delays = 6.0 bits
- Transceiver cable delays = 2.57 bits
- etc.
Total about 499 bits worst case. Specified as 512 bits to give a safety margin.
Network JAMs
- Repeaters required to propagate JAMs.
- Must make sure all repeaters see JAMs.
- Time delay of 500m coax about 2.165 micoseconds
- 32 bits JAM
…
MAC Frame Format for 802.3
Number of Octets | Field Usage |
---|---|
7 | Preamble |
1 | Start of frame delimiter |
6 | Destination Address |
6 | Source Address |
2 | Length in 802.3 |
1500 | LLC data and padding |
4 | Frame Check Sequence |
- Preamble synchronises.
Ethernet Addresses
48 bits long. Designed to be globally unique.
Globally unique should be completely unique, not always though.
Not expected to be able to be changed, can now be programmed (simply, apparently; nmap --spoof-mac 0
)
Frame details
Minimum frame size is 512 bits, not including the preamble.
Maximum frame size is 1518 octets (1500 octets of data).
Above assumes 48 bit addresses which IEEE 302.3 says “shall be used” for 10 Mbps networks.
9.6µs inter-frame gap to allow desynchronisation.
Time for frame is 51.2µs plus 6.4µs for preamble.
Collision Backoff and Retransmission
Transmitter tries to send for a maximum of 16 attempts to send a frame.
The transmitter waits for an integer multiple slot times determined by the following algorithm prior to each retry.
Rescheduling of each transmission uses “truncated binary exponential backoff”
This ensures a random but increasing delay if many collisions occur:
DelaySlots = rand[0 <= r < 2^k]
Where k = min(n, 10)
for the n
th retansmission
Probabilistic Characteristics of CSMA/CD
All access to the 802.3 LAN only completes with some probability. It is thus impossible to guarantee transfer rates. When the net is very busy collisions might go on forever.
Debates often take place on the usability of CSMA/CD LANs for real time use. The answer depends on the true use intended rather than the LAN.
Bridges
Partition LAN to segregate load, add reliability, add security.
Combine remote LAN segments into a single logical network.
Combine separately developed and controlled LANs.
IEEE 802 LANs often include bridges.
Repeaters clean and forward all data.
Bridges selectively forward data. It will store and forward complete packets.
Forwarding based on header information,
Sometimes known as MAC level relay.
Source Routing Bridges
Hosts discover the route to each other host and are very aware of the presence of multiple LANs couple by bridges.
Needs identifiers for bridges and LANs.
Data is transmitted with routing attached.
Bridges obey the routing.
Transparent Bridges
Bridge learns (or is told) the LAN on which each address exists. Hosts need not to know anything about the location of other hosts and indeed are not even aware of the presence of the bridges.
Frames which arrive are handled in one of 3 different ways:
- Same LAN
- Different LAN
- Unknown destination.
- Broadcast address.
Same LAN
If destination address on same LAN as source address then discard the packet
Differnet LAN
If destination address on different LAN, forward packet.
Unknown Destination
If location of destination address is not known then ‘flood’.
Broadcast Address
‘flood’.
Flood
Forward the packet to every other network attached to the bridge.
Address Learning
Bridges listen to the traffic and can learn where machines where by looking at the source addresses in the packets.
Address Tables
Need to be quite large depending on the network.
Generally there is a time-out associated with the addresses known.
Bridge Cycles
Address Learning works well if there are no alternative routes in the internetwork connections (i.e. a tree structure).
However there often are alternative routes, so bridges might cause loops.
Solution to Bridge Cycles
Need a protocol to avoid cycles.
Result for graph theory states:
For any connected graph, there is a spanning tree of edges which maintains the connectivity but contains no closed loop.
Each LAN represents a graph node and each bridge corresponds to an edge.
Spanning Tree
- Bridges have numbers
- Broadcast number every few seconds
- One bridge becomes the root bridge
- Bridges discover route to root via root port.
- Routes might have costs.
- A designated bridge is determined for each LAN (minimum cost of path to root).
- Only the designated bridge can forward to and from its LAN.
- Bridges communicate with a Bridge Protocol Data Unit (BPDU) consisting of:
- The originating bridge number
- The number of the bridge thought to be the root
- The path cost to root.
Initially every bridge thinks it is the root and it broadcasts a BPDU to assert this fact.
If a bridge gets a BPDU indicated a ‘superior’ bridge exists it assigns its root port and the path cost to root.
If a bridge gets a BPDU from a bridge with a shorter root path it releases any claim to be the designated bridge for the segment.
The lowest numbered bridge becomes the root.
Bridge ports which are not root or designated ports are blocks.
Local Bridge
Connect two (or more) adjacent LANs. Throughput is likely to be high. Hosts not likely to notice much performance degradation unless waiting for each packet to be acknowledged.
Remote Bridge
Connect two (or more) LANs which are widely separated. Bridge consists of two ‘half bridges’ connected by a WAN type link. Link typically 64Kbps to 2Mbps.
Managed Bridges
Bridges often available in a managed form. These are managed from a station and can load the bridges with lots of things.
Lots of monitoring, etc.
Another protocol needed and this needs to be standardised.
Other Network Technologies
A brief look at fast and wireless network technologies.
Alternative LAN Technologies
- 802.4 - Token Bus
- 802.5 - Token Ring
Standards differ at the physical layer
Differ at MAC sublayer of Data Link Layer.
Compatible at the LLC sublayer of the Data Link Layer.
Token Bus
Uses a bus physical topology and ring logical topology. The physical order of nodes does not matter.
Coaxial cable served as a common communication bus.
Token was created bus the token bus protocol to manage access to the bus
Any station that holds the token packet has permission to transmit data.
The station releases the token when it is done communication or when a higher priority device needs to transmit.
Two major problems:
- Bus failure leads to a network failure.
- Adding a new node is not simple.
Token Ring
Similar to token bus, but in a ring physical ring topology.
Messages are transferred in one direction along the ring at all times.
Token Ring networks sequentially pass a token to each connected device.
When the token arrives at a particular node, the recipient is allowed to transmit data onto the network.
Since only one device may be transmitting at any given time, no data collisions occur.
Access to the network is guaranteed
Time-sensitive applications can be supported.
Still used for some real-time applications.
Based initially for the mainframe infrastructure.
Problems:
- Costly
- Complex.
Quick notes on Ethernet
A quick summary of 802.3
Why Ethernet is used
- Inexpensive
- Easy to install, maintain, troubleshoot and expand
- Widely accepted industry standard.
- Structured to allow compatibility with Network Operating Systems.
- Reliable
Ethernet Versions
- Ethernet 10 BASE-T/F 802.3 (10 Mbps)
- Fast Ethernet 100 BASE-T/F 802.3u (100 Mbps)
- Gigabit Ethernet 1000 BASE-T/F 802.3z (1000 Mbps)
- 10 Gigabit Ethernet 10 GBASE-F 802.3ae (10 Gbps)
Coding Convention
- Network speed
- Baseband Signalling
- 2 or 5 stands for coaxial cable and length
- T stands for Twisted Pair
- F/FL/S/L Stands for Fiber-Optic
- X stands for full duplex.
Ethernet Components
- CSMA/CD (No protocol for 10 GigE) as 10 GigE is completely full duplex so it isn’t needed.
- Baseband Signalling.
- Ethernet cables.
- Ethernet card or adapter.
- Ethernet devices.
Ethernet Cables: Twisted Pair (CAT5/CAT5e)
- CAT5: Ethernet and Fast Ethernet.
- CAT5e: Gigabit Ethernet.
Contains four pairs of copper wire.
Cable runs are limited to a maximum run length of 100m.
Operates at 100MHz.
CAT5 comes in two main varieties, solid and stranded.
Fast Ethernet communications only utilise 2 out of 4 pairs.
CAT5 enhanced (CAT5e) uses all 4 pairs to support GigE over short distances.
CAT5e is backwards compatible with CAT5.
CAT6 (250MHz) and CAT7 (600MHz).
Signalling
Baseband is a single transmission involved (digital signal)
Broadband can handle multiple transmissions with a modulated analogue signal.
Baseband Signalling
Network uses all available signal frequencies or the entire bandwidth.
One signal can be transmitted at a time.
Light or electrical pulses based transmission.
Related to digital technologies, but can be used for analogue technologies.
Bi-directional communication (Half duplex) is possible, but difficult.
Multiplexing (using Time Division Multiplexing) allows transfer of multiple signals on a single medium.
Broadband Signalling
Network uses only one frequency or a part of the entire bandwidth.
Multiple signals can be transmitted at a time.
Electromagnetic or optical waves transmission.
Related to analogue technologies but can be used for digital technologies.
Bi-directional communication is possible and fairly easy.
Multiplexing (using Frequency Division Multiplexing) allows transfers of multiple signals on a single medium.
Multiplexing and Demultiplexing
Putting more than one signal on a wire at once and getting it back again at the other end.
Time Division Multiplexing
Gives the illusion of multiplexing.
Each signal is allowed a slot of time to transmit for.
Frequency Division Multiplexing
Actual multiplexing.
Each signal uses a single frequency channel to transmit.
Fibre Optic
Supports all Ethernet versions above 10BASE5.
Carries high-bandwidth data through beams of light carrying electromagnetic signals. Not so affected by noise. Can carry signals up to approximately 70 miles without amplification.
Light can be pulsed in a single wavelength to represent 45 Gigabits of information.
Single-Mode Fibre Optic
Transmits one signal per fibre.
Diameter: 8.3 to 10 microns that has one mode of transmission.
Speed and distance: up to 40Gbps over up to 100km
Used in telephones and cable TV applications.
Multi-Mode Fibre Optic
Transmits many signals per fibre.
Multi-modes result from the fact that light will only propergate in the fibre core at discrete angles within the cone of acceptance.
Diameter: 50 to 100 microns.
Speed and distance: 10Mbps to 1 Gbps over 275m to 2km.
Used for Computer Networks and LAN Applications.
Network Adapters
Adapters for connecting to ethernets.
Ethernet cards, etc.
10/100 Ethernet
Connect computer using a PCI, PCIe or ISA moethboard interface slot, CAT5/6
Gigabit Ethernet
PCIe slots, CAT5/5e/6/7 and Fibre Optic cables.
Fibre Optics
For tier 1 and 2 internet backbones, External device, Fibre Optic cables.
Slightly different, can use not only for network but for the backbone (a sophisticated network).
Wireless Network Interface Cards
Internal or external device
Wireless Dongles
External device
Ethernet Devices
Note that most devices are completely tied to a specific function, some repeaters might have some Data Link Layer functions depending on the manufacturer.
Repeater
Repeat an electrical signal including noise.
Takes an incoming signal and then generates a new, clean copy of that exact signal.
Overcomes signal attenuation problem.
Allows LANs to extend beyond normal distance limitations.
Physical layer device.
Doesn’t read data frames, just repeat the signal unintelligently.
Hub
Multiport repeater.
Broadcasts frame to all ports and devices.
Allows users to share Ethernet for transmission of data onto a single network (collision domain).
Physical Layer device.
Doesn’t read data frames.
Repeater is usually used for the extension of the length of a network, while the hub is a simple connectivity gadget that is used to broaden a network.
Bridge
Connect two LAN segments of similar or dissimilar types such as Ethernet and Token Ring.
Split a networking into separate collision domains.
Map the Ethernet addresses of the nodes residing on each network segment and allow only the necessary traffic to pass through the bridge.
Do not forward bad or misaligned packets.
Works on the Physical and lower Data Link (MAC) Layers
Switch
Expansion of the Bridge.
Used in heavily loaded networks to isolate data flow and improve performance (parallelism).
Up to on-half of the computers connected to a switch can send data at the same time.
Data Link and Network Layer device.
Cut-through Switches
Are faster because they examined the packet destination address only before forwarding it on to its destination segment.
Store-and-forward Switches
Work like bridges in that they accept and analyse the packet before forwarding it to its destination.
Takes more time to examine the whole packet, although it does allow the switch to catch certain packet errors and keep them from propagating through the network.
Ethernet Encoding
Methods for encoding bits in different Ethernet technologies.
Manchester Coding
Used in 10-Base*
Bipolar scheme, synchronisation via a clock.
Fast Ethernet 802.3u
CSMA/CD and Full Duplex
Backwards compatible.
Physical layer structure (PHY) has been redesigned.
Complex signal encoding mechanisms than the simple Manchester code.
Uses three sublayers.
Convergence Sublayer
Introduces the concept of auto-negotiation; negotiates between two different medium to make the properties similar to for communication
Media Independent Interface
Interfaces the physical medium used so that it can be handled generically.
Media Dependent Sublayer
Deals with encoding, transmission in a way specific to the medium.
Gigabit Ethernet 802.3z
Two distinctive approaches for medium access:
- Half Duplex
- Full Duplex
Most follow the full-duplex approach.
Star topology.
Support standard Ethernet frame format.
Uses 802.3x flow control.
Backwards compatible.
Physical layer is more complex
Gigabit Media Independent Interface
Physical Layer
Media Dependent Interface
Medium
Encoding
Digital to digital encoding schemes.
Unipolar Encoding
Only use one pole (+
or -
). One pole is used to represent 1
or 0
, zero is used to represent the other.
Two problems:
- DC Component
- Synchronisation
DC cannot be transmitted properly through an AC line without noise.
The receiver cannot synchronise its clock to the sender.
Polar Encoding
Uses both poles to represent bits.
Non-Return to Zero (NRZ)
NRZ-Level
1 as positive voltage, 0 as negative voltage.
NRZ-Inverse
Voltage inverts on a 1.
Return to Zero (RZ)
- Positive to 0 is 1.
- Negative to 0 is 0
Biphase
Manchester
Differential Manchester
Bipolar Encoding
Alternating Mark Inversion
Bipolar 8 Zero Subsitution
High Density Bipolar 3
Broadband
Digital Subscriber Line
Uses local telephone lines.
Allows simultaneous void and data transmission.
Asymmetric Digital Subscriber Line (ADSL)
Designed to deliver more bandwidth down than up.
Rates range from 1.5 to 9 Mbps down, 16 to 640kbps up.
Distances of 18,000 feet over a single copper twisted pair.
A splitter is needed.
Very high bit-rate Digital Subscriber Line (VDSL)
VDSL is designed to provide higher bandwidths.
Up to 52Mbps down, 16Mbps up.
VDSL 2
Support the wide deployment og triple play services.
100 Mbps down, 20 Mbps up
150 to 500m distance.
500 Mbps transmission rates over copper cabling by using crosstalk cancellation or “vectorized” VDSL2 based modems.
BT Infinity
Now rolling out ultrafast fibre optic broadband with speeds of up to 100Mb.
Next step up from FTCC is FTTh (Fibre to the Home).
Negative factor is the costs.
Infinity-1
Top speeds are up to 40Mbps down and 10Mbps up
Infinity-2
Topspeeds are up to 76Mbps down and 19Mbps up.
Cable
Wireless LAN (802.11)
Station (STA)
Access Point (AP)
Access points act as a central transmitter and receiver of WLAN radio signals.
Basic Service Set (BSS)
A set of STAs that communicate with each other.
When two or more stations communicate together they form a BSS.
Infrastructure Mode
All wireless clients are connected to an AP.
Generally the default mode for the 802.11b cards.
All mobile STAs communicate with each other via the AP.
Network consumes double bandwidth for one communication.
AP provide buffering of traffic for a mobile broadcast.
Ad-hoc Mode
Wireless clients are connected without an AP.
An ad-hoc network is where stations only communicate peer-to-peer (P2P).
There is no base and no one gives permission to talk.
Mostly these networks are set up spontaneously.
Distributed System (DS)
Two or more BSSs are interconnected using a distribution system.
APs communicate via the DS.
The DS is the backbone of the WLAN and may be constructed of either wired or wireless networks.
The DS is also a thin layer in each AP.
That determines if communications received from the BSS are to be relayed back to a destination in the BSS, forwarded on to the DS to another AP or sent into the wired network infrastructure to a destination not in the ESS
Extended Service Set (ESS)
An ESS is a BSS where APs communicate amongst themselves from one BSS to another..
Entire network looks like an interdependent BSS to the Logical Link Control (LLC).
This means stations can communicate or even move between BSSs transparently to the LLC.
Portal
The logical point at which medium access control (MAC) service data units (MSDUs) from a non-IEEE 802.11 local are network (LAN) enter the distribution system (DS) of an extended service set (ESS).
WLAN Frame
MSDU and AMSDU
Multiple MSDUs are aggregated at the MAC layer and are pushed into a single MPDU.
They have a single frame header with multiple frames and they are destined for the same client and the same service class.
Mac Service Data Unit (MSDU)
It is the MAC service data unit. This is the unit o transmission used at the MAC layer which is received from the upper layer.
Aggregated Mac Service Data Unit
Aggregation of the MSDU directly performed at the MAC layer is called AMSDU.
Such AMSDUs are now passed to the lower PHY later where they are dealt with as MPDUs
MPDU and AMPDU
Mac Protocol Data Unit (MPDU)
These are the frames passed from the MAC layers into the PHY later.
Aggregated Mac Protocol Data Unit
These are the aggregated MPDU units which are pushed into a single Physical Protocol Data Unit (PPDU).
DS Services
While the implementation for the DS is not specified, 802.11 does specify the services which the DS must support.
Station Mobility
No Transition
If a station is not moving within its own BSS or it is not moving
BSS Transition
If a station moves between BSSs within the same ESS
ESS Transition
If the station moves between BSSs of differing ESS.
Station Services (SS)
Authentication
With a wireless system, the medium is not exactly bounded as with a wired system.
In order to control access to the network, stations must first establish their identity.
The authentication relationship may be between two stations inside an IBSS or to the AP of the BSS.
Authentication outside of the BSS does not take place.
Two types of authentication:
- Open System Authentication
- Shared Key Authentication
Deauthentication
When either the station or AP wishes to terminate a stations authentication.
Privacy
An encryption algorithm, which is used so that other 802.11 users cannot eavesdrop on your LAN traffic.
Distribution Station Services (DSS)
Association
A station must affiliate itself with the BSS infrastructure if it wants to use the LAN.
This is done by Associating itself with an AP. Associations are dynamic in nature because stations can move, turn on or turn off.
A station can only be associated with one AP.
This ensures that the DS always knowns where the station is.
Association supports no-transition mobility, but is not enough to support BSS transition.
Reassociation
The service allows the station to switch its association from one AP to another.
Both association and reassociation are initiated by the station.
Disassociation
When the association between the station and the AP is terminated.
Can be initiated by either party.
A disassociated station cannot send or receive data.
ESS transition is not supported.
A station can move to a new ESS but will have to reinitiate connections
Distribution
Getting data from the sender to the receiver.
The message is sent to the local AP (input AP) then distributed via the DS to the AP (output AP) that the recipient is associated with.
If the sender and receiver are on the same BSS the input and output APs are the same.
The distribution service is not logically invoked whether the data is going through the DS or not.
Integration
Where the output AP is a portal. Thus 802.X LANs are integrated into the 802.11 DS.
Physical Layer
Spread Spectrum
The three physical layers originally defined in 802.11 included two spread-spectrum radio techniques and a diffuse infrared specification.
Spread spectrum is a communication technique that spreads a narrowband communication signal over a wide range of frequencies for transmission then de-spreads it into the original data bandwidth at the receive.
Spread spectrum increases the bandwidth of the signal compared to narrow band by spreading the signal.
Techniques
The Code of Federal Regulations (CFR) Part 15 originally only described two spread spectrum techniques to be used in the licensed free Industrial, Scientific, Medical (ISM) band, 2.4 GHz, thus 802.11 and 802.11b.
- Frequency Hopping Spread Spectrum (FHSS)
- Direct Sequence spread Spectrum (DSSS)
Orthogonal Frequency Division Multiplexing (OFDM) was not covered by the CFR and would have required licensing.
802.11a, employing OFDM, was created to work in the 5GHz.
Frequency Hopping Spread Spectrum (FHSS)
FHSS spreads the signal by hopping from one frequency to another across a bandwidth of 83 MHz.
The data is spread over 83 MHz in the 2.4 GHz ISM band.
A short burst of data is sent on one frequency (usually less than half a second).
Then the sender changes to another pseudorandom frequency and broadcasts another burst of data before changing to another frequency, and so on.
The carrier then repeats this pattern.
Direct Sequence Spread Spectrum (DSSS)
DSSS spreads the signal by adding redundant bits to the signal prior to transmission.
The signal is divided into many different parts and sent on different frequencies simultaneously.
Spectrum is divided into 13/14 channels.
However, the FCC specifies only 11 channels for non-licensed (ISM band) use in the US.
Narrowband and Wideband
In communications, band is referred to as the range of frequencies (bandwidth) used in the channel.
Depending on the size of the band (in terms of kHz, MHz or GHz) and some other properties of the communication channel, they can be categorized as narrowband and wideband etc.
Narrowband uses a smaller frequency range (bandwidth)- 300–3400 Hz.
Wideband is a relative term, and the size of the band may be in kHz, MHz or GHz depending on the application.
Narrow Band vs. Spread Spectrum
Narrow Band
Uses only enough frequency spectrum to carry the signal
- High peak power
- Easily jammed
- Easy to detect
- Easy to intercept
Spread Spectrum
The bandwidth is much wider than required to send to the signal
- Low peak power
- Difficult to jam
- Hard to detect
- Hard to intercept
Wireless Mobile Communication/Cellular Telephony.
Utilises lower frequency radio spectrum.
Mobile RF Spectrum
Typically delivered over a wide range of radio frequency spectrum bands (e.g. 900MHz, 1800MHz, 2.6GHz, etc.).
Most of which can also reach indoors to some degree.
Architecture of the GSM Network
Base Station Subsystem (BSS)
Composed of one or more BSC.
BSS is the point where all radio transmission related functions are performed.
BTS and BSC connected through the Abis interface.
BSS connected to the MSC through the A interface.
Base Station Controller (BSC)
The management of several BTS is done by the BSC.
It also provides all the control functions and physical links amoung the different BTS and between the mobile switching centre (MSC) and the BTSs.
Being a high-capacity switch, it provides functions such as cell configuration data, control of radio frequency power levels in BTS, frequency hopping, and handovers.
One MSC serves a number of BSCs.
Base Transceiver Station (BTS)
It is a station or site where antennas and radio transmitters and receivers are placed to create a radio coverage area in the mobile network.
Contains one or more transceivers (TRC) and antennas.
The cell site has a 360 degree omni-directional (omni-sector) antenna that is turned to create a cellular area of a specific size.
Omni-sector means the same frequencies are used in all directions.
Communication from the mobile terminal to the cell site is referred to as uplink.
Cell site to mobile terminal is downlink
Mobile Station (MS)
Mobile phone with ME and SIM.
It consists of the mobile equipment (uniquely identified by International Mobile Equipment Identity (IMEI)
The SIM card contains the International Mobile Subscriber Identity (IMSI) used to identify the subscriber to the system, a secret key for authentication and other information.
The IMEI and the IMSI are independent, thereby allowing personal mobility.
The SIM card may be protected against unauthorised by a password or PIN.
Network Switching Subsystem
Contains the network elements MSC, LR, VLR, EIR, AuC and GMSC.
Mobile Switching Centre (MSC)
Primary service delivery node for GSM and central components of NSS.
It provides all the functionality needed to handle a mobile subscriber, such as:
- Registraion
- Authentication
- Sets up and releases the end-to-end connection
- Location Updating
- Handovers
- Cell routing and roaming subscribers
HLR and VLR, together with the MSC, provide the call-routing and roaming capabilities of GSM.
It takes care of charging and real time pre-paid account monitoring.
These services are provided in conjunction with several functional entities, which together form the Network Switching Subsystem (NSS).
The MSC provides the connection to the fixed networks (such as the PSTN or ISDN).
Home Location Register (HLR)
The database that contains a subscription record for each subscriber of the GSM network.
All the administrative information related to each subscriber registered in the respective communication network, including the current location of the subscriber, is contained in the HLR.
The HLR is responsible for the sending of subscription data to the VLR (during registration)
or GMSC (during mobile terminating call handling).
The location of the mobile is typically in the form of the signalling address of the VLR associated with the mobile station.
A GSM subscriber is normally associated with one particular HLR.
There is logically one HLR per GSM network, although it may be implemented as a distributed database.
Visitor Location Register (VLR)
The database that contains subscriber data for subscribers registered in a MSC.
It contains all the temporary information about the subscribers.
This information is needed by the MSC to service the visiting subscribers.
Every MSC contains a VLR. Although MSC and VLR are individually addressable.
They are always contained in one integrated node.
Equipment Identity Register (EIR)
A database of all valid mobile equipment on the network.
Where each mobile station is identified by its International Mobile Equipment Identity (IMEI).
An IMEI is marked as invalid if it has been reported stolen or is not type approved.
Authentication Centre (AuC)
A protected database that stores a copy of the secret key stored in each subscriber’s SIM card which is used for authentication and encryption.
Gateway MSC (GMSC)
Switching entity that controls mobile terminating calls.
When a call is estabished towards a GSM subscriber, a GMSC contacts the HLR of that subscriber, to obtain the address of the MSC where that subscriber is currently registered.
That MSC address is used to route the call to that subsciber
GSM Interfaces
A Interface
The connection between MSC and BSC.
Abis Interface
The connection between BSC and BTS.
D interface
The connection between MSC and HLR.
Um interface
The radio connection between MS and BTS.
Cell
Cellular telephony derives its name from the partition of geographical area into small cells.
A cell is roughly circular (exactly hexagonal) area with a central transmitter and receiver base station.
The size and shape of each cell is determined by the features of the surrounding area, such as buildings, trees and hills, which can block signals.
In a city, there are many small cells, while rural areas may have very large cells.
Sector
Site coverage (cell) is partitioned into different directions called sectors.
Each cell is usually split into tree sectors. Which overlap with other sectors of neighbouring cells so network is uninterrupted.
Switched Communications Networks
Long distance transmission between stations (called end devices) is typically done over a network of switching nodes.
Switching nodes do not concern with content of data.
Their purpose is to provide a switching facility that will move the data from node to node until they reach their destination.
A collection of nodes and connections forms a communications network.
In a switched communications network, data entering the network from a station is routed to the destination by being switched from node to node.
Circuit Switching Network
A dedicated radio channel is allocated to a single transmission
As long as data transmissions are long and continuous (file transfers) a circuit is used efficiently.
However, most data transmissions are bursty, and dedicating an entire circuit to them is usually a waste of valuable wireless bandwidth.
During idle periods when no data is being sent, bandwidth is still dedicated to the user and not available for others to use.
Packet Switching Network
An entire network may be designed just for packet data.
Packets do not have a dedicated path, it is decided by the routers.
Most old wireless data systems offered minimal data rates, usually in the 10Kbps range. However new wireless protocols bond multiple channels to increase data rates.
Multiple Access (Multiplexing) Protocol
Mobile development is facing the major problem to find protocols that could be used to maximise bandwidth efficiently.
Multiple access enables …
Frequency Division Multiple Access (FDMA)
Time Division Multiple Access (TDMA)
Code Division Multiple Access (CDMA)
Each user is assigned a different psuedorandom binary sequence that modulates the carrier, spreading the spectrum of the waveform, giving each user a unique code pattern.
Cellular Generations
…
1G
2G
2.5G
2.75G
3G
3.5G
3.75G/Pre-4G
4G
Standards
The ISO OSI model.
IEEE 802 Standards
Encapsulation at each level.
Data Link and Physical Layer
More hardware-based layers.
Data Link layer could be divided into two sub-layers:
- Logical Link Control (LLC)
- Media Access Control (MAC)
LLC is more software orientated and interacts with higher layers.
MAC is more hardware orientated and interacts with lower layer.
(R)ARP protocol links hardware addresses to logical addresses at the Network layer.
Physical Layer
Defines:
- Physical and electrical properties of the media
- Mechanical properties of the connectors
- Bit representation by the signals (encoding)
- Transmission rate
- Physical topology
- Transmission mode
Data Link Layer
Defines:
- Encoding bits into packets prior to transmission
- Decoding the packets back into bits at the destination
- Flow control
- Error control (Automatic repeat request (ARQ))
- Access control
- Hardware addressing
- Defining physical layer standards
Logical Link Control
- Frames the Network layer packet
- Identifies the Network layer protocol.
Provides multiplexing and demultiplexing to and from the network layer.
Manages flow control and error control.
Media Access Control
- Addresses the frame
- Marks the beginning and ending of the frame
Managed access control.
Related to the MAC address.
LAN Technologies
- Ethernet
- Fast Ethernet
- Gig Ethernet
- 10 Gig Ethernet
- WiFi
- FDDI
- Token Ring
- ATM LANE
WAN Technologies
- X.25
- Frame Relay
- T-carrier
- ISDN
- DSL Technologies
- SONET/SDH
- ATM
OSI Layers
Protocol Data Unit (PDU) is a term to define any one of the following:
- Data (at Application, Presentation and Session layers)
- Segment (at Transport layer)
- Packet (at Network layer)
- Frame (at Data Link Layer)
- Bits (at Physical Layer)
Each layer, with the exception of the physical layer, adds its own header. This is done because each layer has its own specific function.
Unicast Network Level Protocols
Unicast Network Level Protocols in use in today’s Internet. Including further study of protocols such as IPv4, ICMP, ARP, RARP used in unicast applications and IPv4 and IGMP used in multicast applications.
Internet Protocol version 4 (IPv4)
IP Headers
Organised in octlets as bytes didn’t used to be just 8 bits long.
IPv4 designed for 32 bits.
Version indicates the version of the IP protocol
Time to live stops network loops.
Header checksum to check the integrity of the header but not the data. Very basic checksum which is used by the router to ensure the packet has not be damaged.
This does add significant load to the router, not only this, but due to TTL it has to create a new checkum without much gain. There are better checksums to be used and they do not appear in IPv6.
The IP Address
Tied to an interface, not an actual machine.
Common to have multiple interfaces (eth0, wlan0, etc.)
Can have multiple addresses on a single interface.
Classes of Network
5 classes.
- Class A -
0, 7 bits netid, 24 bits hostid
many host. - Class B -
10, 14 bits netid, 16 bits hostid
- Class C -
110, 21 bits netid, 8 bits hostid
many network. - Class D
- Class E - intended for future use
Class Es will never be used because of Microsoft and lack of value.
Special Addresses
X.X.255.255
= Broadcast AddressX.X.0.0
= Network255.255.255.255
= Limited broadcast address, used in auto-configuration (e.g. DHCP)
Network masks
IF you own the address:
10.0.0.0
But want to share the address so that:
10.1.0.0
and 10.2.0.0
are different networks.
10.0.0.0
is a class A address. therefore is is actually: 10.0.0.0/8
.
So if we change the networks to: 10.1.0.0/16
the network id will be expanded to the first 16 bits.
Loopback Interface
Most TCP implementations have a loopback interface with the IP address 127.0.0.1
(127.X
) and name localhost (or anything else in /etc/hosts
which specifies 127.X
as an address).
The localhost behaves as a separate data link interface,
A packet that is sent to the loopback interface moves down the protocol stack and is returned back by the driver software for the localhost “device”.
Used for debugging.
Loopback is done in software, never appears on the network.
Subnets: An example
Take the university address:
144.124.76.0
Say we want a network per department, can’t use a 24 bit address as that doesn’t leave enough addresses.
So they can use a 22 bit network id.
144.124.76.30/22
or 144.124.76.30/255.255.252.0
Apply the mask 255.255.252.0
to the IP address to get the host id. Invert for the host address.
Network address is: 144.124.76.0
But what’s the broadcast address?
144.124.79.255
(Look at the bits for this)
Here, 255.255.252.0
is the subnet mask.
Used to be allowed to have subnet masks like 255.0.255.255
as it doesn’t add anything.
Classless Interdomain Routing (CIDR) - Supernetting
Treat two contiguous class C networks as a single network.
This eases routing (supernets). For example supernet on geographic locations to make routing tables easier at a router level.
Usual class C is /24, we make them a /23 if we have to contiguous networks
This is the answer to the 192 soup.
Non-contiguous versions of this are disallowed. Must choose numbers for which the maths works.
Private Address Space
There are some special IP addresses
- e.g. loopback
RFC 1918 written when started IP addresses to reserve certain IP addresses:
These can be used for private addressing schemes, but not routable on the internet.
- 10.0.0.0 - 10.255.255.255 (10/8)
- 172.16.0.0 - 172.31.255.255 (172.16/12)
- 192.168.0.0 - 192.168.255.255 (192.168/16)
Network Address Translation
Router has a pool of public IP addresses, when a private IP address attempts to access an external resource. The router maps the private IP address to a public IP address which accesses the resource and returns. The public IP address is then translated back to the private IP address and routed.
In industry the pool may be a class C network. In home networks there is only a pool of one. They also do port address translation.
Address Resolution Protocol (ARP)
IPv4 has the problem that we know our IP address and the ones which we want to talk to, either router or machines on the link.
Very first header is the Ethernet header, need this because every machine would need to look at the IP header, which is a process which needs to be done in software. Other network protocols exist too.
Ethernet headers can be processed in hardware.
Ethernet headers don’t contain much, but the machines can automate the process of deciding to ignore the packet in firmware not by the CPU.
There is a problem of mapping IPv4 address to Ethernet address. There’s no direct link between IP addresses and Ethernet addresses so we need more at the data link layer.
ARP Packets
ARP is cached for a limited amount of time.
If no reply, retransmit after stand-off.
Hardware Type
Which type of hardware are we using
Protocol Type
ARP can be used with other protocols (other than IP).
Hardware Size & Protocol Size
Size of the addresses (hardware and protocol).
Doesn’t map just MAC addresses, etc.
Sender Hardware/IP Address
Hardware address is repeated as the Ethernet header is harder to access in software.
Operation
Code to differentiate request/reply.
- ARP Request
- ARP Reply
- RARP Request
- RARP Reply
RARP - Reverse ARP (looking up IP from MAC). Can be used to discover the machine’s own IP address, for example.
Used before DHCP.
Target Hardware Address
If unknown to the sender, filled with 0
s, otherwise the actual address
Target IP Address
The address being looked for
ARP Delay
First packet:
- ARP Request.
- Processing Time
- ARP Reply
- ICMP Echo Request
- ICMP Echo Reply
Following packets:
- ICMP Echo Request
- ICMP Echo Reply
Fooling ARP
Locate an interface in promiscuous mode, ARP request not broadcast, a dummy address is used.
Some TCP/IP stacks pass the ARP request up the line.
A reply indicates a machine is in promiscuous mode.
neped
Gratuitous ARP
A host sends a request for its own MAC.
The sender’s IP and MAC address are broadcast, and other hosts will insert this mapping into their ARP tables.
Useful to detect duplicate IP addresses (should be no reply).
Causes other hosts to update their ARP cache (useful if the network adapter has been changed or if this is a hot spare).
Why switched networks are not safe
Man-in-the-Middle attacks.
ARP allows for MITMA as the IP and MAC are learned by the switch. Sending out a false ARP reply last will allow you to to spoof the IP address to your MAC address.
Proxy ARP
Same problems with MITMA
MAC Addresses (IEEE 802)
- Gotcha: each Octet of MAC frame transmitted low order first (back-to-front).
- I/G - Individual (
0
), Group (1
) - Unicast or Multicast - U/L - Universal (
0
), Local (1
)
Internet Control Message Protocol (ICMP)
The Internet Control Message Protocol (ICMP) is the protocol used for error and control messages in the Internet
ICMP provides an error reporting mechanism of routers to the sources.
All ICMP packets are encapsulated as IP datagrams.
ICMP Message Types
ICMP messages are either query messages or error messages.
- ICMP query messages:
- Echo request / Echo reply
- Router advertisement / Router solicitation
- Timestamp request / Timestamp reply
- Address mask request / Address mask reply
- ICMP error messages:
- Host unreachable
- Source quench
- Time exceeded
- Parameter problem
Message Codes
Specific to message type.
E.g. for message type 3 (destination unreachable):
- 0 = network unreachable
- 1 = host unreachable
- 3 = port unreachable
- 4 = fragmentation needed but don’t fragment bit is set
- Etc
ICMP Error Codes
Each ICMP error message contains the header and at least the first 8 bytes of the IP datagram payload that triggered the error message.
To avoid too many ICMP messages, ICMP error messages are not sent
- for multiple fragments of the same IP datagrams
- in response to an error message
- in response to a broadcast packet
ICMP Codes
Echo Request/Reply Message Format
Identifier is set to process the ID of querying process.
Sequence numbers are created for each new echo request.
Port unreachable error
Reply telling that a port is not available.
Network scanners may make use of this, e.g. NMAP.
Many hosts will not reply.
Traceroute
Sends out ICMP ping messages with increasing TTL starting at 1. For each host that isn’t the target, an ICMP time exceeded message is sent back.
On UNIX uses the port unreachable message instead (using a UDP packet) by default.
Fragmentation
If a packet needs to be split then a flag is set to say “more fragments”. Identification is the same through the fragments. More fragments flag is not set for the last fragment, but is known to be a fragment due to the fragmentation offset.
If a part of a fragment is missing, after an amount of time packet loss is assumed.
Occurs if MTU < datagram size to be sent out on an interface, “Don’t fragment” flag allowed: causes ICMP “Destination unreachable: fragmentation needed but don’t fragment bit set”.
Reassembly takes place at destination.
Fragments may be fragmented.
Experience shows that fragmentation is best avoided.
Avoiding Fragmentation
- Never send a datagram which is greater than the minimum MTU maximum size it can be sent wholly.
- Path MTU discovery to make sure you never send a datagram which is grater than the smallest MTU maximum size.
Path MTU Discovery
Intermediate routers may have lower MTUs.
Uses ICMP uncreachable Error: Fragmentation Required
ping -s n -M do
:
- Set packet size and “do not fragment” bi
- Returns “message too long” is
s > MTU
. - RFC 1191 has likely values
traceroute.pmtu
- Version of tracerout that automates discovery.
Firewalls can break PMTU discovery.
Sock Traffic Generator
Sock is a test program which can be run as a client or as a server.
Ping
TTL doesn’t always start at 255, might be around 64 but not standardised.
Ping through a router
Routers have many different interfaces. This means, with traceroute, only the nearside interface can be found.
There is a record route option which can find this: ping -R
Unicast Routing in the Internet
Example routing problems. Interior and exterior routing protocols. Protocols covered will include RIP, OSPF and BGP.
IP Routing
Split into two parts:
- Packet Forwarding or Packet Switching
- Routing
Packet Forwarding
Needs to be fast and efficient, every packet involves a forwarding option.
Based on prior knowledge the packet is moved towards the destination.
Routing
Filling routing tables. This requires knowledge of the connectivity in some sense.
This needs to be stable and should converge relatively quickly.
This task should not be performed frequently
Classifying Routes
Routes can be classified into three different categories:
- Default Routes
- Static Routes
- Dynamic Routes
Could also classify by interior and exterior routing.
Default Routes
A default route which is used as a fallback if all other routing fails.
Static Routes
Manually configured information.
Not very scalable.
Dynamic Routes
Automatically use information from other participating routers.
Routing Algorithms/Protocols
Different methods for finding out routes.
Distance View Routing
Uses the Bellman-Ford algorithm.
Regularly broadcast the entire routing table, containing known networks and the “distance” to each (usually a number of hops).
Has the problem of routing loops and convergence after change.
Routing Internet Protocol (RIP)
Uses Distance View Routing
RIPv1 (RFC 1058)
Classful routing with no ability to pass netmasks.
Typically broadcasts every 30 seconds.
Metrics range from 1 to 16 (infinity).
“Split Horizon” often used.
“Triggered Updates” send an update when something (close) changes to improve convergence.
Sent via UDP to the IP broadcast address (usually) to port 520.
Single Splint Horizon
Don’t repeat information to the originator.
Poison Reverse Horizon
Repeat back information, but metric is set to infinity
RIPv2 (RFC 1723)
Refinement of RIPv1.
Includes netmasks and some support for authentication and multicast.
Link State Routing
Each router sends the state of all its links to all adjacent routers.
- Link State Advertisements (LSA)
Each router independently calculates its routing table based on the link state database it has constructed.
Normally uses Dijkstra Shortest Path (Shortest Path First).
Open Shortest Path First (OSPF)
Uses IP directly.
Early versions could calculate a different set of routes for each value of the IP Type-Of-Service (ToS) field, but this was abandoned in the latest RFC.
Interfaces have a dimension-less costs (these were potentially different for each value of ToS).
Supports equal cost load balancing.
Supports subnet mask and thus CIDR.
Point-to-point links can be used without IP addresses.
Supports authentication
Uses multicasting to reduce load.
On a multi-access network (e.g. ethernet), two routers are elected as Designated Router and Backup Designated Router.
IP Address Class Based Only
Classless Inter-domain Routing
Multicast Routing in the Internet
Example routing problems. Protocols covered will include PIM-DM, PIM-SM and MSDP. We will also cover the role of the Rendezvous Point, Anycast IP, and issues still under debate in the technical community.
Multicast
Refers to the sending of data from one to many or many to many registered recipients.
Contrast to broadcast, which floods the network with data, which all hosts will receive whether they want it or not.
Unicast is the sending of data to a single recipient.
Unlike unicast, multicast is scalable,if ten clients request a connection then unicast must provide those ten connections with their own copy of the data.
Multicast, whether there are 1, 10 or 100 members of a ground, only one copy of the data is transmitted and is only replicated where there are group members on different paths to and from the router.
Media Access Control (MAC)
All devices have a unique 48-bit MAC address.
Devices on the LAN keep a table that maps unicast IP to MAC.
A special range of MAC addresses is used for multicast as a multicast address does not correspond to a single end host.
1:32 IP Multicast to MAC Multicast Mapping
Class D address range: 224.0.0.0 - 239.255.255.255
High order 4 bits of the first octet of a class D address are always 1110
(0xE
)
To provide a 1:1 mapping between MAC and multicast IP addresses the remaining 28 bits of the IP address would need a unique representation in MAC addresses.
MAC address range assigned for multicast is only 24 bits. The high order bit is reserved, leaving 23 bits.
Thus, 28 bits of IP addresses need to be mapped to just 23 bits of the MAC address.
This corresponds to a 1:32 mapping MAC multicast addresses to IP multicast addresses.
Translation
Six octets in an Ethernet header. Three high order octets are constant 01-00-5E
.
The remaining three octets have a range of 00-00-00
to 7F-FF-FF
(high order bit is always 0
for IP multicast.
An IP address of 224.1.1.1
maps to 01-00-5E-01-01-01
. Byt the MAC address also maps to 31 other IP addresses.
Host Behaviour Reception
Discarding of unwanted packets has to be handled by the IP stack.
Hosts interesting in 224.1.1.1 will also receive (if on LAN) the other 31 possibilities.
After de-capsulation the Ethernet fram and discovering the IP address of an unwanted packet, the host discards the packet.
The probability of this are quite slim at the moment
Special Multicast Addresses
Local Network Control Block
224.0.0.0 - 244.0.0.255 (224.0.0.0/24)
Should never leave the local network.
Internetwork Control Block
224.0.1.0 - 244.0.1.255 (244.0.0.1/24)
Session Description Protocol (SDP)/Session Announcement Protocol (SAP) Block
224.2.0.0 - 224.2.255.255 (224.2/16)
Source Specific Multicast
232.000.000.000 - 232.255.255.255
Specifically permit or block source addresses.
GLOP Block
233.000.000.000 - 233.255.255.255
Ensure that addresses are specific to an administrative domain. Low number of specific addresses (255).
Administratively Scoped
239.000.000.000 - 239.255.255.255
Private multicast addresses.
IP Group Management
Protocol used is Internet Group Management Protocol (IGMP)
- IGMPv1 - RFC1112
- IGMPv2 - RFC2236
- IGMPv3 - RFC3376
IGMPv1
No explicit leave message.
IGMP Report Suppression
Membership Queries
Sent by routers to ask for existence of hosts interested in receiving multicast groups.
Membership Reports
Sent by hosts in response to Queries,
Also sent by the host if they wish to join a new group.
Message Format
IGMPv2
Major change, introduces an explicit Leave Message
Membership Queries
General Query as before.
Group-Specific Query. Used to query if there are any members of a group left after a leave message.
IGMPv1 and IGMPv2 Interoperability
IGMPv2 Hosts must send IGMPv1 if they spot an IGMPv1 router is the querier.
IGMPv2 Hosts may suppress Leave Messages if they spot an IGMPv1 router is the querier.
IGMPv2 Hosts must suppress reports if they see others using IGMPv1 or IGMPv2 reports for a given group.
IGMPv2 routers must ignore Leave messages if IGMPv1 hosts are present.
If any IGMPv1 routers are present must use IGMPv1
IGMPv3
Major change: supports the specification of sources in messages.
Host now do not suppress Reports.
Thus, supports the concept of Source Specific Multicast.
Protocol messages are now more complex.
Membership Queries
Now contain source address information.
Membership Reports
Now contain information relating to multiple groups and each group report can contain source address information.
Multicast Distribution Trees
The set of links and connections that traffic will follow to get to a destination.
Source Trees
Sometimes called Shortest Path Trees (SPT)
A tree where the root is the source of the data.
A different forwarding tree for every combination of source (S) and group (G). N groups and M sources implies trees.
This makes it difficult to calculate the best tree.
Routers need to maintain separate states for all trees.
(S,G) notation.
Shared Trees
Rendezvous Point Trees or Core Based Trees.
Traffic follows some form of common path, unlike source trees in which two different trees can send data through different paths, at least most of the route will follow a common tree.
Each group in the system uses the same tree. N trees implies N groups, no matter how many sources.
Normally, each group has a nominated router as the Rendezvous Point, which is the root of the tree.
All sources send towards the Rendezvous Point.
Can potentially lead to a single point of failure if no backup is specified.
(*, G) Notation
Multicast Forwarding
Routers must no just forward all multicast traffic. Potentially there could be customers anywhere, unlike with unicast. However, forwarding all multicast would be broadcast instead and would lead to chaos.
Techniques include:
- Reverse Path Forwarding
- Multicast Forwarding Caches
- TTL Thresholds
- Administratively Scoped Boundaries
Reverse Path Forwarding
Note the source address of the arriving packet and interface of arrival. The IP tables are checked for the correct interface towards the source.
If the packet has arrived into the expected interface, it can be forwarded onwards. If not it can be discarded.
Multicast Forwarding Caches
Used to help avoid some RPF calculations
Sometimes called multicast routing tables.
Used to help make decisions for routing tables.
TTL Thresholds
TTL normally used by IP and decremented as each router passed. When this TTL reaches zero, the packet is dropped.
Multicast routers often set TTL Thresholds on interfaces, if the TTL of the packet is less than this threshold then the packet isn’t forwarding.
If traffic was only designed to reach internal locations, the TTL would be set such that it could reach all internal routers, but external routers would have high enough threshold to stop this traffic, but not those expected to reach external locations.
Administratively Scoped Boundaries
These boundaries don’t pass certain multicast addresses.
Multicast Routing Protocols
Dense Mode Protocol
Send all valid packets out of all interfaces, flooding the network.
When a router receives unwanted packets it sends a prune message upstream.
When a prune message is received, the router removes the interface it was received on from the forwarding table for the specified group.
Prune times out in typically 2 to 3 minutes.
DVMRP
Protocol Independant Multicast - Dense Mode (PIM-DM)
Sparse Mode Protocol
Routers send no traffic onwards unless asked.
Shared tree branch constructed from rendezvous point (root) to any interested receiver.
Join messages are sent from the receiver to the root via other routes, creating a shared tree where it goes.
Prune sent when the traffic is no longer wanted.
Protocol Independant Multicast - Sparse Mode (PIM-SM)
Has a rendezvous point using a shared tree, but uses source trees to get data from sources to the rendezvous point. Also allows the final router feeding receivers to switchover to a source tree from the source if the traffic level exceeds some threshold (in Cisco routers the default value for the threshold is zero).
CBT
Link State Protocol
MOSPF
Transport Level Protocols
An in-depth study addressing the behaviour of TCP and UDP. Connection establishment and termination, flow control under various load conditions, timeouts and retransmission, newer features and performance.
User Datagram Protocol (UDP)
RFC 768
- Datagram orientated
- Unreliable, connectionless
- Simple
- Unicast and multicast
- Useful for only a few application
- Used a lot for services
No handshake required which allows it to be fast, especially for many short connections.
Applications must handle failures.
UDP Header
Checksum is the same as the IPv4 checksum, but is over the UDP header, the data, and the UDP pseudo header
IPv4 UDP Pseudo Header
IPv6 UDP Pseudo Header
Example: Trivial File Transfer Protocol (TFTP)
TFTP used UDP, uses a stop-and-wait flow window control algorithm:
- Stop for ACK before sending the next data packet
- A lost packet causes timeout and retransmission.
Designed for diskless systems to download configuration files during bootstrapping.
Does waste a lot of network time.
Transmission Control Protocol (TCP)
- Stream orientated
- Reliable, connection-orientated
- Complex
- Only unicast
- Used for most internet applications
Done by creating connections between two points and aim to provide data integrity.
Also provides flow control.
TCP Header
Source Port Number
Destination Port Number
Sequence Number
Ensures ordering of packets (is actually a byte count of the first byte count in the packet of data).
Doesn’t start at 1.
Acknowledgement Number
Helps find missing packets, is the last byte of the packet.
Header Length
Describes the length of the header.
Reserved
Flags
Urgent (URG)
The value in the Urgent field is valid.
Acknowledge (ACK)
The value in the Acknowledge field is valid.
Push (PSH)
Push the data up to the application as quickly as possible.
Reset (RST)
Reset the connection
Synchronise (SYN)
Used to set up connections
Finish (FIN)
Used to tear down connections
Window Size
TCP Checksum
Same as UDP Checksum
Urgent Pointer
Options
Connections
How TCP establishes, keeps and tears down connections.
States
Listening
Waiting for a connection
SYN Sent
Waiting for acknowledgement of the first send of a SYN
SYN Received
Established
FIN Wait 1
Wait for an ACK
of a sent FIN
FIN Wait 2
Wait for a FIN
from an acknowledged FIN
.
Close Wait
Acknowledged a FIN
but still need the connection to send data.
Last ACK
Final FIN
sent, waiting for the last ACK
.
Time Wait
Final FIN
received and ACK
sent, wait for a number of seconds before closing.
Done in case the ACK
is not received by the other party.
Closed
Not listening, connection closed.
Wait States
TIME_WAIT
is also called the 2MSL wait state.
- MSL - Maximum segment life.
- RFC 793 specifies MSL as 2 minutes.
- 30 seconds, 1 minute, 2 minutes are common implementations.
Connection Establishment
- Server listens on a given port (no network traffic).
- Client sends a TCP packet with the
SYN
flag set to a random number. - Server receives the
SYN
, the 0 offset is the value of theSYN
flag. Server goes intoSYN Received
state. - Server sends a TCP packet with a different
SYN
flag and anACK
flag, returning the Client’sSYN
flag plus one*. - Client returns an
ACK
flag.
* The SYN
flag is assumed to have consumed 1 byte.
If no ACK
is received at SYN Received
state then the SYN
is resent after a timeout.
Known as the three way handshake.
Note: Initial sequence numbers are important! Both client and server choose different ISNs.
Half-Open
When you have a connection between two machines, but one crashes. There’s no data sent between them so one still believes it is connected.
Simultaneous Open
Possible (but unlikely) that two applications will perform active open to one another. Not Client/Server, each opens to a known port.
Results in a single open connection.
Four-way handshake required.
Connection Termination
- Client sends a
FIN
- Server sends an
AWK
and aFIN
- Client sends an
AWK
.
Three way closedown.
ACK
and FIN
might not be sent together (server might still be transmitting data).
Half-Close
When a FIN
is sent to the server but there is still more data to be sent should not close fully until a FIN AWK
is sent and the AWK
returned. Microsoft broke this by sending a RST
instead of a proper AWK
Simultaneous Close
Both sides of a connection might perform an active close.
Four segments required, which s not unusual.
TCP Options
TCP Header also supports options.
Originally only MSS
, NOOP
and EOS
were specified.
End of Option List (0)
No Operation (NOOP) (1)
Maximum Segment Size (MSS) (2)
Length is 4
2 byte MSS. Usually based on the MTU size.
Window Scale Factor (3)
Length is 3
1 byte shift count.
Timestamp (4)
Length is 10
4 bytes timestamp value
4 bytes timestamp echo reply
Bulk Data Transfer
Bulk data transfer (e.g. FTP) has different requirements.
Low overhead from headers.
Receiving systems have limited buffers.
Sliding windows used for flow control.
Packet loss is expensive; timeouts are the simplest way, but there are better.
Want to send as many packets as possible without flooding the network.
Flow Control In TCP for Bulk Data Transfer
Sliding window flow control is carried out by the receiver.
CWND flow control is carried out by the receiver.
Sliding Windows Flow Control
The idea is that in every ACK
we also advertise a window. The window is a number of bytes, usually a multiple of the MSS
.
The initial window size will be the size of the buffer TCP has available.
Sender can send up to the limit of the window.
When a segment is acknowledged, the window slides making another slot in the window available.
The window is said to close when the left-hand edge moves to the right.
The window is said to open when the right-hand edge moves to the right.
The window is said to shrink if the right-hand edge moves to the left.
Window Scaling
Scale factor is actually a shift.
Largest scaling factor is 14 (maximum window size of 1,073,741,823 bytes).
Remember this increases the size of the header.
Used for high capacity networks.
RFC 1323, TCP Extensions for High Performance.
Calculating Optimal Window Size
Can never reach optimal capacity but can approach it.
Bandwidth delay product
Capacity(bits) = bandwidth (bits/s) * RTT (sec)
Increase Round Trip Time (RTT) and data travels more slowly, thus higher capacity.
Likewise increased bandwidth involves higher capacity.
Max window size of 65535 bytes, but window scale option can increase this to 1024MB.
Around 10% of the capacity is a good size for the TCP buffer size.
TCP Slow Start (RFC 2001)
Intermediate routers must queue packets - congestion may occur at routers.
Congestion window (CWND).
- Initially set to 1 segment (based on announced MSS).
- On first ACK, increase CWND by 1 segments of bytes.
- Now 2 segments are sent and ACK’d. CWND is increased to 4 segments.
- Exponentiall increase until peak flow reached or router discards packets.
- Sender can transmit up to the minimu of the sliding window size and CWND.
Has two phases:
- Exponential increase.
- Linear increase.
Based on a threshold (the slow start threshold).
Drops drastically when there is packet loss.
Fast Retransmit and Fast Recovery (TCP Reno)
Problem: TCP timeouts lead to idle periods.
Fast retransmit: use 3 duplicate ACKs to trigger retransmission.
Fast recovery: start CWND at SSTHRESH and do incremental increase after fast retransmit.
Problems with Fast Retransmit and Fast Recovery
WiFi and very fast networks.
Can be a problem on high capacity networks.
TCP Westwood +
Improved congestion control algorithm for fast, high latency links and lossy links.
Can take many hours to reach optimal throughput with TCP Reno as packet loss has a large affect on throughput.
Westwood carries out end to end bandwidth estimate (BWE) using received ACKs and RTT monitoring.
TCP Reno overreacts to random loss by cutting cwnd in half.
A small fraction of random packet loss does not impact the BWE.
Thus the ssthresh remains unchanged, allowing Westwood to be much more efficient than Reno.
On ACK Reception
Increase cwnd according to Reno algorithm.
Estimate available bandwidth.
When 3 Duplicate ACKs received
cwnd is then set to ssthresh.
On RTO (coarse timeout)
cwnd is then set to 1.
Flow Control in TCP for Interactive Data Transfer
Interactive data transfer can result in many small segments which, together with their ACKs, can lead to congestion.
ACKs are, therefore, piggybacked onto data segments - delayed ACKs.
Nagle’s algorithm result in some collection of data to produce larger segments.
Nagle Algorithm
Many small datagrams results in very large overhead.
Can cause congestion, particularly on a WAN.
- Only one outstanding segement not ACK’d
- Cannot send any more small segments until ACK received.
- Whilst waiting for ACK, TCP will collect small segments together to send as a single segment.
- On a fast network, more segments are sent.
- On a congested network less segments are sent, but data still get through (with less overhead).
Sometimes desirable to disable Nagle algorithm, e.g., X mouse movements need to be sent without delay.
Persist Timer
Keeps window size information flowing even if the other end closes its receive window.
Possible situation where an ACK
is lost and both ends waiting.
- Sender waiting for acknowledgement with window update so that it can send more data.
- Receiver waiting to receive the data as it indicated a non-zero window in the acknowledgement that go lost.
Sending TCP uses a persist timer to periodically query the receiver to see if the window size has updated.
Timeout and Retransmission
Fundamental to the use of acknowledgements is the need for timeout and re-transmission of packets.
TCP keeps four different timers for a connection and performs exponential backoff when a packet goes unacknowledged.
TCP also has a congestion avoidance algorithms to complement the slow start algorithm to cope with packet loss caused by congestion.
Timers
Retransmission Timer
Used when expecting an acknowledgement from the other end.
Persist Timer
Keeps window size information flowing even if the other end closes its receive window.
Keepalive Timer
Detects when the other end of a connection has crashed or re-booted.
2MSL Timer
Round Trip Time Measurement
Used to calculate retransmission timeout (RTO).
Mean deviation is used to allow arithmetic to be carried out with integers and without square roots.
Where:
- D is the smoothed mean deviation
- M is the latest measured RTT value
- Gain (g) for average is set to 1/8
- Gain (h) for deviation is set to 1/4
Karn’s Algorithm
If a packet times out and there is a retransmission, when an ACK is received, whose was it.
Karn’s algorithms specifies that RTT estimate cannot be update when a timeout and retransmission occur.
Re-use the RTO after such an exponential back off until an acknowledgement is received.
Transactional Transmission Control Protocol (T/TCP) RFC 1644
TCP for transactions.
Nearly as fast as UDP.
TCP Accelerated Open (TAO)
Reduces minimum number of packets required to three in many circumstances.
Data and FIN are piggybacked on the ACK.
Connection Count (CC) used to avoid duplicate SYNs.
Security issues caused due to this.
Truncation of TIME_WAIT State
2MSL state is abandonded.
8 times RTO used instead.
Allows retransmission of final ACK.
A new incarnation of the same connection, using TAO, implicitly acknowledges the ACK of FIN.
Port Numbers
…
Demultiplexing
Layered like the OSI model (but pre-dates the OSI model).
Ethernet driver captures the incoming frame, strips the Ethernet header and passes to IP
IP layer strips out the IP header and passes it to the transport layer, etc.
Levels:
- Application
- Transport
- Network
- Link
- Physical
The process of moving things up and down layers is demultiplexing and multiplexing.
Data Encapsulation
Don’t sent arbitrary length methods, to allow the multiplexing of networking.
(In TCP) Data is encapsulated into frames, frames have a frame header, trailer and a datagram.
This datagram contains an IP Header and a segment or protocol data unit (PDU)
The PDU has the (TCP) protocol header and the actual data.
The frame header is used to drop the packet onto the local link. The address used in the frame header is embedded in the hardware in the network card (MAC address), this is why the IP address is not used. This is for efficiency.
Frame headers are: source address, destination address, protocol and checksum.
Naming and Directory Services
Including the DNS and LDAP and their use.
Domain Name Service (DNS)
A distributed database mapping hostnames to IP addresses and vice versa.
How DNS is used
- Applications access the information in the DNS by way of resolver programs
gethostbyname()
returns an IP address in response on a hostnamegethostbyaddr()
returns a hostname in response to an IP address.
- People access information by way of client programs
nslookup
dig
- Common implementation of DNS client and server is BIND (Berkeley Internet Domain Server).
- Server is named
- The DNS defines a protocol (see RFCs) that is used for communication between client and server.
The DNS namespace
DNS is stored as a tree:
13 canonical name servers which server information for the root “unnamed” root.
.arpa
A special domain which is used for reverse lookup.
Now .ipv6
Fully qualified domain names
www.google.com.
is a fully qualified domain name. Missing the trailing .
allows the DNS to lookup other entries based on the domain it resides on.
DNS Lookup
Is a recursive process and therefore quite slow. Results are cached to improve performance.
To try and reduce the effects of this when changing hostnames the TTL of the cache is reduced prior to performing this.
Name Ownership
No one company owns all names
Name Server Zones and Boundaries
Can have zones within a name server.
Resource Record
This is where DNS gets its information from.
Examples:
3www4aber2ac2uk0
3www6google3com0
The 0
occurs because the top-level domain has no name. This is how the end of the string is reached.
SOA
Name of primary source of info for zeon
A
IP address of host
CNAME
Canonical name
PTR
Alias for an IP address for reverse lookups
MX
Mail exchange information
NS
The Name of the nameserver
AAAA
IPv6 address of host
A6
Experimental IPv6 address.
HINFO
Host information (CPU and OS).
LOC
Location of host.
Dynamic Host Configuration Protocol (DHCP)
DHCP Timeline
Request are broadcast address 255.255.255.255
.
Often used during host configuration.
Never forwarded by a route, by a DHCP/BOOTP relay agent can be used to forward this onto the DHCP server.
Replies are routed normally.
Lightweight Directory Access Protocol (LDAP)
The Directory Information Tree
Similar to the DNS tree, but the whole world doesn’t typically use the same tree.
The Information Model
Service model based on entries.
An entry is a collection of attributes, that has a distinguished name (DN).
Attributes
Type/value pairs, e.g.:
C = GB
o = University of Wales
Attributes described by a schema.
Quality of Service
The need for and the provision of Quality of Service (QoS) within packet based networks such as the Internet which are inherently best efforts at heart.
Challenges
- QoS needs to be end-to-end, involving numerous management domains.
- QoS service should address all requirements - not just ‘top quality’
- QoS policy distribution, maintenance and monitoring will increase in importance.
- Better to implement QoS earlier - provides scalability, experience. Pro-active not reactive.
Queueing Disciplines
Ways in which resources can be allocated to priority traffic best.
Priority Queue
Other queues only get any resources at all when the priority queue is empty.
Simple and effective.
Traffic must be policed to stay within low limit as other traffic can get starved or the priority queue reverts to Best Effort
Weighted Round Robin
Multiple queues, each queue gets a guaranteed minimum of resource even under congestion conditions.
Can be combined with Priority Queues.
Security Issues
The inherent risks within networks such as the Internet, cracking, viruses, trojans, worms and denial of service attacks. The role of the Firewall and the problems it can bring.
Data Encryption
Four aspects of security to consider:
- Privacy
- Authentication
- Integrity
- Nonrepudiation
Aspects of Security
Privacy
Snoopers should not be able to read confidential data
Authentication
Verifies that the apparent sender really sent a message, and not an imposter.
Integrity
Verifies that data has not been corrupted or altered in transmission
Nonrepudiation
Ensures that the sender or receiver cannot deny sending or receiving a piece of information.
Encryption Techniques
Classical encryption techniques:
- Permutation: The order of the plaintext characters is changed
- Substitution: a plaintext alphabet is mapped to a different one.
A cipher used to perform the encryption.
- Stream ciphers encrypt data bit by bit or byte by byte.
- Block ciphers first pack data bits into a fixed length block, then encrypt the whole block into a ciphertext block.
Symmetric Key Encryption
Use a shared secret key to encrypt and decrypt.
For users,
keys required.
Can be done in hardware.
Public Key Encryption
The key is split into two: a public key and a private key.
Anything encrypted by the public key can only be decrypted by the private key.
The private key must be kept private, but the public key can be shared without worry.
Cannot be done in hardware.
Combining Both
Create a shared secret based from public key encryption.
Transport Layer Security (TLS)
Client sends a “hello” message with TLS version number and prefernces.
Server sends a certificate including a public key.
- Public key encrypted by some CAs.
- Client has a list of CAs and their keys and uses a corresponding key to decrypt the server key.
- This authenticates the server, unless it is self-signed.
Client sends a secret key encrypted with the server’s public key.
Server decrypts message and then encrypts a response with secret key which the client decrypts.
Hashing and Message Authentication
Hashing is the operation that maps the message of variable length into a hash value with fixed length.
Hashing is not reversible.
Hashing can be used to generate a digest of the message, called the Message Authentication Code.
The receiver can use the digest to verify if the message is authentic.
Current and Future Issues
The (still) emerging IPv6 protocol and other active issues.
Internet Protocol Version 6 (IPv6)
Problems with IPv4 is that 32 bit addresses is too small.
- 128 bit address space solves the problem for the long term.
- Large space allows addresses to be more structured.
An IPv6 Address
fe80::2c0:dfff:fee4:bd87/10
Loopback address: ::1/128
::
is a string of 0
s of indeterminate length.
IPv4 addresses are encapsulated in IPv6 are expressed with dotted decimal for last four octlets: `::194.123.1.2
IPv6 addresses are leased (possibly infinitely).
Current Allocation
0000 0000
- Reserved0000 001
- NSAP Allocation001
- Aggregatable Global Unicast1111 1110 10
- Link-Local Unicast1111 1110 11
- Site-Local Unicast1111 1111
- Multicast
Link-Local Addresses
Format Prefix: 1111 1110 10
or FE80::/64
Postfix is usually the last 24 bits of MAC address. middle byte is fffe
due to IEEE standards.
fe80::00ff:fe00:0000
for MAC address: 00:00:00:00:00:00
Site-Local Addresses
Format Prefix 1111 1110 11
or FEC0
Neighbour Discovery
Router Advertisements
DHCPv6
Where autoconfiguration is undesirable for operational reasons.
Similar in concept to DHCP for IPv4
Compatible with autoconfiguration
Nodes may request multiple addresses.
Authentication of nodes.
Makes use of address deprecation and reconfiguration-init message…
Co-existence and Tunnelling
Exam Preparation
Some chances to get a feel for what is expected in the exam.
Steve Kingston
s/(Use a diagram) is necessary/\1/g
Past Paper 2012-113
1. This is a question about the Transport Layer Protocols
a) With reference to the User Datagram Protocol (UDP) and the Transmission Control Protocol (TCP), describe features that are provided by transport layer protocols. In your answer distinguish between features provided just by TCP and those provided by both UDP and TCP.
- Both use port numbers. Server listens of a specific port and client sends data to that port.
- UDP Connectionless
- No acknowledgement that data is recieved
- Up to the application to manage successful transmission of data
- TCP sets up a connection between two machines
- Connection must first be established
- All sent packets (except ACK’s) must be acknowledged by the other party
- Support for flow control using sliding window protocol
- Must also terminate the connection
- Protocol ensures sucessful delivery of data (or failure).
b) Describe with the aid of diagrams the TCP/IP connection establishment processes, and explain how the initial sequence number (ISN) is exchanged between two nodes during connection establishment. Label the diagrams with the TCP connection states at each stage.
- Server listens on a specified port
- Client sends a SYN packet to that with a random ISN and a receiving port number (usually fairly high).
- Server responds to the client with an ACK with a sequence one higher than the SYN’s ISN (the SYN is seen to have taken up one byte of data).
- Server also responds with a SYN to the client, the ISN here is generated by the server and is different to the clients.
- Client responds to the server’s SYN with an ACK, again here the sequence number of this packet will be the ISN of the Servers SYN packet + 1.
- At this point the connection is established.
The ISNs must be different as they are used to acknowledge how much data was actually received later in the connection.
Sequence numbers returned in the ACK are the sequence number of the acknowledge packets plus the size of that packet (which can then be used for flow control).
c) What is meant by a flow control mechanism? In your answer describe the likely effect of not employing flow control on TCP connections.
- Flow control used to send multiple packets of data and acknowledging them all in a single packet.
- Flow control is also used to stop flooding the network with packets that would just be dropped by the receiver because it’s buffers are full.
Not using flow control would either:
- Require the sender to wait for the receiver to acknowledge a single packet before sending the next one, meaning the whole transmission would be very slow.
- Allow the sender to flood the network with packets which would eventually be dropped by the receiver as its buffers fill up to maximum (or by intermediary devices to the same effect). The sender would receive acknowledgements for some of the packets and would have to constantly resend dropped packets after a timeout period, leading to a lot of traffic on the network, especially when the receiver does not process the information quickly.
d) Describe, with the aid of diagrams, the problem with transferring very large files using the slow start /congestion avoidance flow control mechanism on fast (e.g. gigabit) but high latency networks. Explain in your answer how the TCP Westwood+ algorithm modifies the mechanism to provide better throughput whilst retaining a concept of fair access.
- Slow start sets the window to 1 to begin with and grows it exponentially after each ACK received. Once it reaches the slow start threshold (ssthres), this growth is linear (by 1 from each ACK) - this is congestion avoidance.
- Eventually the congestion will be encountered, and a packet will not be ACK’d before a timeout. The size of sstresh is set to one half of the current window size and then the size of the window is reset to 1. The process then repeats endlessly.
- This tends to lead to “sawtooth” graphs.
- Problem of slow start is that it has to wait for a timeout to occur before it knows congestion has occurred.
- To change this, TCP Reno was implemented; if the receiver does not receive the next expected packet (i.e. it receives one with a higher sequence number than expected) it will send a duplicate ACK of the last ACK it sent.
- If this ACK is received by the sender 3 times it triggers retransmission and will retransmit the next packet from that sequence number.
- The receiver may then acknowledge further packets in the sequence in the next ACK.
- Instead of reseting the window size to 1, it is started at sstresh instead and grown incramentally.
- TCP Reno still leads to sawtooth graphs, but is less affected by congestion.
- TCP Westwood+ implemented to further improve TCP connections.
- Westwood+ estimates the bandwidth available using received ACKs and Round Trip Time monitoring.
- In both cases of 3 duplicate ACKs or coarse timeout, the ssthresh is set to
max(2, (BWE * RTTmin) / Seg_size)
. The congestion window is set to ssthresh in the case of 3 dup ACKs or 1 in the case of coarse timeout. - A small fraction of randomly lost packets doesn’t affect the BWE and therefore the ssthresh, unlike with Reno.
SEM6120 - Introduction to Intelligent Systems
This module introduces the key ideas in Artificial Intelligence and ensures all students are at roughly the same level before moving on to the specialist modules.
Introduction
General introduction to Artificial Intelligence (AI), including discussion of what AI is, its history, definitions, and philosophical debates on the issue (the Turing test and the Chinese room). Ethical issues (3 hours).
Staff
Learning Outcomes
- Describe and use the basic principles of Artificial Intelligence and Machine Learning.
- Be able to reflect on project needs.
- Practically apply AI and ML principles to meet those needs.
- Present the material they have learned in an informed, clear manner.
- Demonstrate understanding and insight into the material that they are presenting.
Assessment
- Presentation 20% 01/10/2013-18/10/2013
- Report 20% 01/10/2013-25/10/2013
- Essay 60% (coding + report) 10/10/2013-01/11/2013
Tips: if you disagree with a paper, you should comment on why, this will be a good thing :)
Commitment
- 26h seminars
- 6h practicals
- The rest of the time is spent on background reading and assignments.
Booklist
- Artificial Intelligence: A Modern Approach Russell, S. and Norvig, P.
- Artificial Intelligence: Structures and Strategies for Complex Problem Solving. - Luger G.
- Artificial Intelligence Illuminated - Coppin B.
What is Artificial Intelligence?
- Understand intelligent entities; learning more about ourselves?
- Building intelligent entites; creating things which exhibit “intelligence”.
Two ways of looking at this:
- Scientific goal
- Engineering goal
Many definitions, all potentially valid.
Problem: what is intelligence and how do we prove a system is intelligent.
Strong versus Weak AI
- Strong AI can actually think intelligently.
- Weak AI can possibly act intelligently.
Turing Test
Human interrogator talks to another system (human or AI), if the interrogator cannot tell the difference then the argument is we must acknowledge it is able to think like a human.
Has not yet been passed.
The Chinese Room
Behaving intelligently may not be enough.
Computers are just a symbol manipulation device and therefore cannot have mental states.
Ethics and AI
We have investigated whether we can develop AI, but not whether we should.
Problems of AI:
- Job losses
- Too much/little leisure
- Privacy rights
- Accountability
- End of the human race?
- Harm in the wrong hands
Branches of AI
- Logical AI
- Search
- Pattern recognition
- Representation
- Inference
- Common-sense knowledge and reasoning
- Learning from experience
- Planning
- Epistemology
- Ontology
- Heuristics
- Genetic Programming
Search
Why search is important in AI and how to go about it. This includes both informed and uninformed strategies. Evolutionary search (6 hours).
Defining the Problem
Many AI problems can be framed in terms of a search problem.
Representation is very important.
The search strategy is the way in which searching is performed:
- Uninformed
- Informed (Heuristic)
Can’t always evaluate all the search space. Massive search space (e.g. Chess) or even infinite.
Terminology
Search State
Summarises the state of search.
May not lead to a solution.
e.g. Representation of a chess board.
Initial State
The first search state.
Solution
A special example of a search state. It solves the problem.
Goal State
The state trying to be reached.
State Space
All possible search states.
Successor Function
Ways to move around in the state space (action/operators).
Goal Function
To check if the goal has been reached.
Cost Function
Measures the path cost.
Search Trees
Visualise the progression of a particular algorithm.
Shouldn’t evaluate every possibility of the state space.
Initial state is the root, goal is a leaf.
Don’t store the whole search tree. Requires a lot of space. Can discard explored nodes.
Store the frontier of search (i.e. nodes in search tree with some unexplored children).
Evaluation
Time Complexity
In big O notation. Number of nodes generated during a search (worst case).
Space complexity
In big O notation. Maximum number of nodes stored in memory.
Optimality
Is it guaranteed to find the optimal solution?
Completeness
If there is a solution, will it be found?
Branching Factor
b
Maximum number of successors of any node
Depth of shallowest goal
d
Maximum length of any path in the state space
m
Breadth First Search
Branching factor b
and depth of solution d
.
- Time complexity:
O(b^d)
- Space complexity:
O(b^d)
- Optimal (given step costs are identical)
- Complete (provided
b
is finite)
Depth First Search
Branching factor b
, depth of solution d
and maximum depth m
.
- Time complexity:
O(b^m)
- Space complexity:
O(bm)
- Not optimal
- Not complete (complete if no loops)
Depth Limited Search
A Depth First Search with a limited maximum depth.
Depth limit l
.
- Time Complexity:
O(b^l)
- Space Complexity:
O(bl)
- Complete if
l >= d
- Not optimal
Uniform Cost Search
A form of Breadth First Search, using a Priority Queue.
Node with the lowest total path cost is expanded.
If all the cost steps are equal, it is exactly the same as BFS.
Complete and Optimal if no negative path costs.
Iterative Deepening
A depth-limited search where the limit is increased iteratively.
Avoids the space complexity of BFS.
- Time Complexity:
O(b^d)
- Space Complexity:
O(bd)
- Optimal
- Complete
Iterative Improvement
Paths not retained - low memory
Heuristic Search
Heuristic is a sort of rule of thumb.
Hill Climbing
Heuristic best child chosen at each point until goal reached or no change in current state.
No backtracking.
Best First Search
Priority queue based on heuristic. Like BFS other than that.
Greedy Best First Search
f(n) = h(n)
A*
A best first search which takes into account current path cost.
f(n) = g(n) + h(n)
Dominance/Informedness
If h2(n) >= h1(n)
then h2
dominates h1
.
Genetic Algorithms
Encode solutions in Chromosomes.
Mutate and crossover chromosomes.
Evaluate the new population and select the best.
Genetic Programming
A branch of genetic algorithms which uses programming statements as genes.
Easy to do in LISP.
Usually represented as trees.
Koza’s Algorithms
- Tree consists of functions and terminals
- Choose a set of functions and terminals
- e.g.:
{+,-,*,/,sqrt},{A,B}
- e.g.:
- Generate random population which are syntactically correct.
- Follow GA-like procedure.
Ant Colony Optimisation
In nature, ants could solve complex problems unsupervised. Capable of finding the shortest route between a food source and the nest.
Can react to changes in the environment,
- Each ant moves “randomly”
- Pheromone is deposited on path
- Ants detect lead ant’s path, inclined to follow.
- More pheromone on a path increases the probability of that path being followed.
- Pheromone decays over time.
Mechanics of ACO
- Graph Representation
- Heuristic desirability of edges
- Construction of feasible solutions
- Pheromone update rule (attached to edges).
Algorithm for one ant
- Select starting node at random.
- While not finished:
- Evaluate all edges from this node.
- Select the best-looking edge via probabilistic transition rule
- Deposite artificial pheromon on the chosen edge.
- Finished path is a potential solution, analysed for optimality.
Particle Swarm Optimisation
Each particle is searching for the optimum and encodes a solution (like a GA).
Each particle is moving (can’t search otherwise), and hence has a velocity. It also maintains the position it was in where it had its best result so far (its personal best).
The particles co-operate, exchanging information about what they’ve discovered in the places they’ve visited.
This co-operation only needs to be very simple;
- A particle has a neighbourhood associated with it
- A particle knows the fitness of those in its neighbourhood, and uses the position of the one with the best fitness.
- Adjusts using this.
Multi-objective Optimisation
Sometimes an answer has to be optimal in several aspects.
Examples:
- Quickest and cheapest flights
- Lightest and strongest construction material.
Knowledge Representation
Ways of representing knowledge in a computer-understandable way. Semantic networks, rules. Examples of the importance of KR (4 hours).
What is Knowledge?
AI Agents deal with knowledge (data).
- Facts
- Procedures
- Meaning
Logic Representation
Non-logical Representation
Logical representations have restrictions which can be hard to work with.
Classes Ignored in Logic
Objects in the world tend to be related to each other:
- Classes, super- and sub- classes.
- Part or whole hierarchies
- Properties which are inherited.
The state of the world changing over time.
- Explicit representation of time.
- Frame problem.
- Non-monotonic reasoning.
Closed world assumption.
Uncertainty or fuzzy knowledge.
Classes and Object-Orientated Representations
Classes define Objects, Objects are instances of Classes.
Object ∈ Class
Class ⊂ Superclass
Facts and rules can be encoded.
Semantic Networks
Essentially a generalisation of inheritance hierarchies.
Each node is an object, class, concept or event.
Each link is a relationship which makes sense in context.
Inheritance is as expected.
Example:
Note: pre-dated OOP.
Frames
Incorporates certain valuable human thinking characteristics:
- Expectations
- Assumptions
- Stereotypes
- Exceptions
- Fuzzy boundaries
Represent what is typical unless an exception is known.
Frames allow more convenient “packaging” of facts about an object.
Frames often allow things which are typical of a class and which are definitional and can’t be overridden.
Frames also support multiple inheritance.
Frames are represented as semantic networks where nodes have structure. A frame has a number of slots (age, height, etc.), each of these slots stores specific information.
When new information is gained slots can be filled in, this can cause the triggering of actions, which may trigger the retrieval of other frames.
Scripts
Can fill in missing detail that is assumed.
Non-monotonic Logic
Once true
doesn’t mean always true
.
As information arrives, truth values can change.
A number of implementations for this,
- Circumscription:
Brid(x) and not abnormal(x) -> flies(x)
- We can assume
not abnormal(x)
unless we knowabnormal(x)
.
- Default logic:
- “
x
is true givenx
does not conflict with anything we already know.”
- “
Truth Maintenance Systems
These systems allow truth values to be changed during reasoning (belief revision).
When retracting to a fact, must also retract to any other fact derived from it.
Penelope is a bird (can fly) Penelope is a penguin (cannot fly) Penelope is magical (can fly) Retract magical (cannot fly) Retract penguin (can fly)
Justification-based TMS
For each fact, track its justification
When a fact is retracted, retract all facts that have justifications leading back to that fact, unless they have independent justifications.
Assumption-based TMS
Represent all possible states simultaneously
Neural Networks and subsymbolic learning
We can find solutions using search, but how can we remember solutions, learn from them and adapt them to new situations? This will cover perceptrons, single-layer and multi-layer networks (5 hours).
Symbolic Learning
When we use some sort of rule-based system, we generally have to understand the rules. This means we understand the conclusions it draws, because it can tell us.
When a system learns from such rules, it processes in a way which can be understood.
Subsymbolic Learning
Don’t really understand, or have control over, the way in which solutions are found.
ANNs, GAs, GP and sometimes statistical methods.
Might be related to the randomness factor.
Artificial Neural Networks
(A: time to not understand backprop again)
Inputs (variables) -> network -> Outputs (results).
X and Y matrices of the statistical models are analogous to the training inputs and outputs of ANNs.
Backpropagation
Most common learning rule for ANNs.
Connections between nodes given random initial weights.
We therefore get a value at the output node(s) which happens when these random weights are applied to the data at the input.
Epoch
An iteration, that is, finding the error then adjusting weighting, is called an epoch.
There may be many thousands of epochs in one training.
Overfitting
Need to be able to generalise the model to unseen data.
Too much training will lead to a lack of generalisation.
Kohonen Neural Networks
Self-organising neural networks.
Random initialisation of a grid, see which nodes best match the input and change nodes around it.
Propositional and First-Order Logic
The backbone of knowledge representation (4 hours).
Types of Knowledge
- Declarative - Facts
- Procedural
- Meta - knowledge about knowledge
- Heuristic - rules of thumb
Representing Knowledge: Object-attribute-value
Encoding a fact in three pieces of information; the object, it’s attributes and the values of these attributes.
The facts a human knows are not obvious to a computer; needs to be encoded.
e.g.: Tree(species, oak)
The encoding doesn’t matter, so long as its uniform to the system.
Can include an uncertainty factors is a number which can be taken into account by the system when making decisions.
The final conclusion of any program where uncertainty was used in the input is likely to also have an uncertainty factor (if you’re not sure of the facts, can can the result be certain?)
Encoding uncertainty might be encoded something like this: Tree(species, oak, 0.8)
(the certainty that the tree is an oak tree is 80%).
Again encoding doesn’t matter, so long as its uniform to the system.
Rules
A knowledge base may have rules associated.
IF
premises THEN
conclusion. There may be more than one premises and may contain logical function
AND
,OR
andNOT
for example
If a premise evaluates to TRUE
the rule fires.
e.g.
IF tree(species, oak) THEN tree(type, deciduous)
Rules may contradict another rule. Different strategies can be applied to choose the most specific or most relevant rules.
Meta Rules
IF tree is conifer THEN load conifer data ELSE load deciduous data
Logic
We can represent knowledge using logic. There are two types: propositional and predicate (or first-order logic or predicate calculus)
Propositional Logic
In propositional logic formulas are constructed using variables, TRUE
and FALSE
constants and connectors:
AND
(∧
)OR
(∨
)NOT
(¬
)IMPLIES
(→
)
Predicate Logic
Prolog is based on this.
A predicate is like a function that returns TRUE
or FALSE
Tree(a)
is true if a
is oak, false if a
is daffodil.
Implication
Oak(a) → Tree(a)
If the first clause is satisfied, the second clause is also satisfied.
Assertions
The symbol ∀
can be read as “for all”.
∀a(Oak(a) → Tree(a)
Existence
The symbol ∃
can be read as “exists”
∃a(Beech(a) ∧ ¬Green(a))
The Atomic Formula
beech(a)
is known as an atomic formula.
Can have multiple parameters.
Human Reasoning
We use two standard rules:
- Deductive
- Modus Ponens - if we know
P→Q
then ifP
is true,Q
must also be true - Modus Tollens - if we know
P→Q
then ifQ
is false,P
must also be false
- Modus Ponens - if we know
- Inductive
- Difficult for machines
- Observations:
- Oak trees have green leaves
- Pine trees have green leaves
- Induce
- All trees have green leaves
- Unfortunately, that’s not true, but it is useful.
Non-monotonic
Classic monotonic reasoning cannot contain contradictions
Put formally:
X ⊆ Y → Deriv(X) ⊆ Deriv(Y)
where Deriv(X)
is a set of facts derived from X
Temporal Reasoning
Reasoning changes over time. Can introduce this into machines by introducing a concept of time.
Machine Inference
Machine inference is used to deduce new facts from a knowledge base which is held in working memory.
Knowledge Base -> Inference Engine -> Working memory
Can be very complex
Deducing New Facts
Two principal methods
- Forwards chaining - based on modus ponens.
- Backwards chaining - based on modus tollens.
Forward Chaining
Modens Ponens:
student(S) ∧ studies(S, ai) → studies(S, prolog)
student(T) ∧ studies(S, expsys) → studies(T, ai)
student(joe)
studies(joe, expsys)
Therefore we can deduce:
studies(joe, prolog)
Proof:
student(joe) ∧ studies(joe, expsys) → studies(joe, ai)
student(joe) ∧ studies(joe, ai) → studies(joe, prolog)
Q.E.D.
Forwards chaining can fire any rules which match the knowledge held in its working memory. This can potentially come up with a huge amount of new knowledge, most of which is probably completely irrelevant.
Backwards chaining
Backwards chaining sets out to prove a piece of information.
The information will either be true or false, but doesn’t generate unwanted results.
Use the resolution proof method for now.
Say we have:
A1 ∨ A2 ∨ ... ∨ An ∨ B
and ¬B ∨ C1 ∨ C2 ∨ ... ∨ Cm
Resolvent of clauses is:
A1 ∨ .. ∨ An ∨ C1 ∨ ... ∨ Cm
Resolution
Now take the two clauses:
A1 ∨ A2 ∨ ... ∨ An ∨ B
and D ∨ C1 ∨ C2 ∨ ... ∨ Cm
If there is some subset where B
and D
are negations of each other Theta
If we have two clauses Clause1
and Clause2
, and these both have a resolvant R
, then if Clause1
and Clause2
are both satisfiable, so must R
be.
The idea: take a clause, containing a goal we want to prove, and negate that goal. If we then resole this with other clauses, over and over and we get to the empty clause (which is never satisfiable), we have proved our goal.
Clause form
We can express any predicate calculus statement in clause form.
This enables us to work with OR and NOT rather than any other clause.
p → q ≡ ¬p ∨ q
A ∧ B ≡ ¬(¬A ∨ ¬B)
Example of Resolution
Use a previous example in clause form:
¬student(S) ∨ ¬studies(S, ai) ∨ studies(S, prolog)
¬student(T) ∨ ¬studies(T, expsys) ∨ studies(T, ai)
student(joe)
studies(joe, expsys)
Solution to studies(S, prolog)
means we must negate it:
¬studies(S, prolog)
Resolve the clause 1:
¬student(S) ∨ ¬studies(S, ai)
Resolve with clause (2) (S=T
)
¬student(S) ∨ ¬studies(S, expsys)
Resolve with clause (4) (S = joe
):
¬student(joe)
Resolve with clause (3):
Ø
Therefore studies(joe, prolog)
is true.
Q.E.D.
Horn Clauses
The same thing, but expressed differently. This is how Prolog does it.
A horn clause is a series of disjuncts (ORs)
We can take:
A ∨ ¬B ∨ ¬C
and write it:
A ← B ∧ C
If B
and C
then A
All the same thing:
studies(S, prolog) ← student(S) and studies(S, ai)
studies(T, ai) ← student(T and studies(T, expsys)
student(joe) ←
studies(joe, expsys) ←
Resolving Horn Clauses
All the negatives are one side of the ←
sign.
← studies(S, prolog)
studies(S, prolog) ← student(S) ∧ studies(S, ai)
← student(S) ∧ studies(S, ai)
studies(T, ai) ← student(T) ∧ studies(T, expsys)
S = T
← student(S) ∧ studies(S, expsys)
student(joe) ←
S = joe
← studies(joe, expsys)
studies(joe, expsys) ←
←
Q.E.D.
Prolog Example
Prolog
Programming with logic.
Uses backwards chaining through horn clauses.
Programming for Intelligent Systems
Practical introduction to programming for Intelligent Systems, used to illustrate search, KR and first-order logic (3 hours).
Rule-based Systems
How can human expertise be automated? How to build an expert system - system concepts and architectures. Rule-based systems: design, operation, reasoning, backward and forward chaining (3 hours).
RBS
Knowledge base contains the rules.
Database contains the facts.
Inference engine uses both of these to match facts to rules to derive new facts, etc.
Justifiable and transparent.
Knowledge Acquisition
Knowledge Acquisition and its importance in KR and RBS (2 hours).
Statistical Methods
Multivariate analysis and statistical methods for solving problems.
Multivariate Analysis
Analysis of high-dimensional data. Often not possible to identify or quantify an object from one dimension.
Two methods:
- Statistical
- AI methods
Statistical Models
Always produces the same result from a given set of data.
Will always find the best result within the constraints of their abilities.
X Values
All the factors which might affect the outcome so that the analysis can find some form of correlation.
Y Values
The set of objects and associated variables.
Realigning Axes.
Usually statistics are in high-dimensional space. No way of visualising this data.
In a 3D space, can fit a plane to any 3 points to make the problem a 2D one.
Can approximate the plane to more than 3 points and fit the best plane using root mean square error. This is done using regression, but does mean there will be error in the model.
Can do this with a 3D plane, but the idea is to keep the model as simple as possible.
Principal Component Analysis
The line through multi-dimensional space which describes the most variation in the data (the one with the widest space) is the First Principal Component.
The following Principal Components are always perpendicular and are ordered by the amount of variation from the data and have less importance.
Can analyse the principal components to see which of the original X values contributed most.
Principal components help understand the data better.
Principal Components Regression
The main principal components help to explain what factors had most bearing on the results.
They form a model that could be used for predicting results.
Using a PCA model to predict is known as Principal Components Regression (PRC).
Closely related is PCL.
Principal of Parsimony
Fewer variables, if they produce a model just as good in testing, should be preferred.
Fuzzy Logic
Not clear cut for a definite value. (i.e. not boolean logic).
Seminars
Seminars from Aberystwyth University
Learning from Big Random Weight Networks and Their Training Algorithms
by Xizhao Wang Fellow; IEEE Editor-in-Chief, IJMLC
Introduction to Big Data
The internet, financial institutions, media, medical treatment, scientific research, etc. provide a huge amount of data.
This data is only increasing in size. Big Data is the explosion of information.
Storing Big Data is, as expected, very difficult.
Variety
Structured data is becoming unstructured.
Velocity
Moving from batch jobs to the streaming of data.
Volume
Now around Zettabytes of data around.
Value
Uncertainty of Big Data
There is lots of uncertainty in Big Data, there can be a lot of ambiguity in symbolic data, etc.
Some cybersecurity issues of Smart Buildings, Smart Metering, not-so-Smart Cars and the Smart Grid
Martyn Thomas CBE FREng
Smart Buildings
Offer significant benefits and new vulnerabilities.
Benefits
- Access Passes, Attendance monitoring, follow me printing
- Calendar, room booking, etc.
- CCTV, WiFi, Remote and central monitoring and control
- Dynamic displays, emergency messages, fire alarms
Vulnerabilities
- Denial of access, unauthorised access, data compromise
- Disruption, wasted energy, damage
- Intrusion, loss of control,
- Safety.
Smart Cars
Michael Barr’s expert report in the Bookout v Toyota lawsuit. See here.
Safety World
Safety world works on probabilities, but in security there is a direct attacker - can’t argue independence.
Smart Meter Security
End-to-end security: “No trusted components”.
Whether this works or not is unknown.
Vulnerabilities
- Meters contain clocks and billing data, if these are changed, bills can be manipulated.
- Meters contain an off switch, misuse could cause distress, harm or substantial disruption.
- Meter firmware will need to be updated.
- The Gas meter can only handle low-grade encryption.
Who controls the off switch?
Anyone who has the authority to send the relevant messaged through the DCC.
Anyone who can mount a successful cyber attack on the network.
How will security be assured?
- Human review of the spec.s
- Testing the system
- Focuesed pentesting
Testing only shows that faults do exist, not that there are no faults.
For high confidence you need formal methods.
Cyber Security
Is a through-life discipline.
- Planning
- Control
- Monitoring
- Response
- From design through to decommissioning
- Lifetimes of 20+ years
- Don’t run anti-virus
- Monitoring is difficult (and uncommon)
- How to respond to incidents?
- Insider threats
- Lifetime security needs discipline and excellent configuration control through years of maintenance and upgrades.
M2M and the IoT
Cyber Security
Is a tier one threat.