Your browser doesn't support the features required by impress.js, so you are presented with a simplified version of this presentation.

For the best experience please use the latest Chrome, Safari or Firefox browser.

SEM1020 - Current Issues in Software Engineering

This module focuses on current issues in the field of Software Engineering. The module will involve students in an examination of current research literature of interest to software engineers.

Introduction

It is okay to produce a paper based on my own paper :-)

Learning Outcomes

  1. Identify and use the main research resources that are available to software engineers
  2. Constructively participate in advanced technical debate in the field.
  3. Have a general overview of the field of Software Engineering and be aware of focused areas of research interest within it.
  4. Present current research at an appropriate level of detail to a technical audience.
  5. Produce a survey paper on some current area of Software Engineering research.

Assessment

  • Participation and Contribution 15%
  • Survey Paper (9000 words) 60%
  • Poster 25%

How to Read a Scientific Paper

Papers provides a good view of what is happening now. Journal papers give a more out of date view than conference papers. Blogs typically give the most up-to-date view but are least verifiable.

Papers should give enough information that there is enough information to replicate the experiment.

It will show some actual data.

Steps

  1. Read the abstract, check the information will be relevant
  2. Read the conclusion and discussion, check the results agree with the abstract and if they are still relevant.
  3. Read the introduction, check you understand the background information and see if you need to look up more items to understand this paper.
  4. Read the results. Usually tables and figures are provided to show the data.

Now the methods become important, it gives details on how the experiment was set up and carried out. Try to pick out the basic methods and work upwards.

References provide a way of tracking back through the sources.

Paper Presentations

Set down for presenting Abstract State Machines (Feb 28) and Swarm Robotics (Mar 28) and asking questions on Biological Data Sharing (Mar 14) and Swarm Robotics (Apr 4).

Literature Review/Survey Paper

9000 words of pure gold.

May address any aspect of software engineering.

MSc should probably review the literature of their project.

SEM2220 - Mobile Solutions

Mobile devices (phones, tablets and other handheld devices) are the fastest growing area of computing. Typical applications involve a wide range of software and system issues. This module will investigate those issues, giving students experience and understanding of the best ways of building different types of mobile applications.

Introduction

Scope of mobile systems, , mobile web, native apps, design choices, main platforms, overview of implications (2 lectures)

Staff

Learning Outcomes

  1. Demonstrate an advanced understanding of the domain of mobile systems.
    • Quite a broad overview of these systems.
    • High level view.
    • Give a flavour of specific systems (iOS, Android).
  2. Analyse and evaluate mobile solutions in a range of application areas and be able to critically evaluate their effectiveness.
    • Have to be critical at M level.
    • Research. Pros and cons.
  3. Evaluate the social, legal, ethical and professional issues involved in implementing mobile applications.
    • Neil will be talking about this in the context of mobile.
  4. Implement representative mobile systems.
    • Playing with the technologies.
    • Building an android, iOS and mobile web app.

Assessment

  • Mobile Web-based Programming project 20% 22/10/2013 to 04/11/2013
  • iOS-based Programming project 20% 12/11/2013 to 25/11/2013
  • Android-based Programming project 20% 02/12/2013 to 16/12/2013
  • Case study analysis 40% 10/12/2013 to 22/01/2014

Motivation

In 2012 the whole mobiles ecosystem’s revenue was $1,551 billion; equivalent to 2.2% of global GDP.

5% CAGR (Compound Annual Growth Rate).

These are only projected figures, the picture is much bigger than this as this only includes money which takes into account the money which stays within the ecosystem (doesn’t include ecommerce, etc.). The affect is much higher.

Mobile Landscape

Constantly changing given that its a new market. The iPhone was the real kickstarter for this in 2007.

Timeline of the history mobile telephones.

Difference between “smart” and “feature” phones.

Mobile subscribers are growing 4 times faster than the global population. Reducing now (especially in the West) due to market saturation.

More scope for growth in developing countries.

Asia Pacific accounts for approximately half of global subscribers.

Total sim-enabled connections (including machine to machine (M2M) connections). Estimated 7.6% CAGR 2012-2017. Percentage of M2M connections is increasing rapidly.

Multiple devices (tables/phones/etc.) allow growth in sim-enabled connections.

Mobile data traffic is increasing rapidly. Video speculated to be one of the most rapidly increasing traffic.

Mobile versus Desktop

Over time the sales of desktops (PCs & Laptops) is decreasing while the sale of mobile devices is increasing.

Some predictions by GSMA

  • Asia predicted to add approx. 50% of all connections/subscribers between 2013-17
  • Same period Latin America and Africa next 20%
  • Subscriber growth in developed nations slowing (e.g. 1% in UK)
  • Total data traffic load in 2012 was 0.9 Exabyte/month).
  • Predicted to be 11.2 EB/mth.

Chief Stakeholders

  • Network operators
  • Content providers
  • OS vendors
  • Device vendors

Fragmentation and Differentiation

Issue of fragmentation and differentiation, related to:

  • the number of OS variants,
  • the number of device vendors wishing to customise the OS,
  • different browsers,
  • Network operators and device vendors do not like commoditisation.
    • Don’t want to be treated like commodities which are values on price.

Top 25 ranked apps in 2012 take 15% of all revenue. Only 2% of App Store newcomers get into the top 250 apps.

Free apps are becoming more of a norm, especially with the saturation of android.

Mobile Network Operators

Revenues are decreasing due to over-the-top services (i.e. using online services over SMS or phone calls).

Operators will push, via their shop sales-assistant commission, those devices they gain the biggest discounts on.

High profile phones promoted, often Android.

Apple disrupts this model due to a locked mindshare.

Psychological hook of “cheap” contract phones. Free now, pay later ideology.

Billing relationship with the customer. Paying for things using your mobile account rather than via a bank account (directly). Small transaction fee for this service to generate revenue.

  • 48% of the world’s population do not have a bank account.

Something for the future, not quite so prevalent at the moment.

Rich Communication Service

Threat of over the top (OTT) services such as Skype. Quality of service is currently an issue here, but voice call is growing very slowly (13% in 2008 to 4% 2012).

GSMA responded by developing a Rich Communication Service specification. Only aimed at long term evolution (LTE) networks (4G). Aims to provide this as a universal service. 17 different implementations by MNOs including AT&T, Telefonica, Verizon and Verdafon.

Device Vendors (OEMs)

Nokia sales falling. Haven’t kept up with the trends.

Samsung now the major player. Mainly due to their adoption of Android.

Apple improving with mobiles.

RIM/Blackberry are focusing on development.

Building an Android smartphone involves:

  • Choosing an OS.
  • OEMs design phone (HW, UX, required changes to the Android OS UX) plus apps to include.
  • Marketing.
  • Negotiations with MNOs to arrange bulk discounts.
  • Production test units sent to MNOs, app developers and Google.
  • Phone is shipped to MNOs and independent stores
  • MNOs customise with their own apps.
  • Sales to public plus marketing.

Can also go down the stock phone route.

Inception to market is approximately around 6 months when based on an existing OS.

Apple is a little easier to to the locked-in syndrome.

Types of Phone

  • Basic Phones
    • Call and SMS support only.
  • Feature Phones
    • No touch screen
    • Browser becoming standardised.
  • Smart Phones
    • Definition changes year-on-year.
    • Expectations change a lot.
    • More sensors and technology.
  • Phablets
    • Large smartphones.
  • Tablets
    • Same functionality as smartphones, often without SIM.
  • Slates
    • Large tablets.
  • Non-phone devices
    • iPod touch
    • eBook readers
    • Google Glass
    • Watches like the Apple Pebble.
    • Rely on bluetooth or similar.

Operating System Vendors

Android leading market player.

iOS popular.

Symbian (dead now) and Blackberry not doing well.

Windows Phone a late starter.

Android Fragmentation

There are 24 listed Android device manufacturers but at least 259 different device models currently on sale.

Lots of different screen sizes, resolutions, hardware capability, etc.

There is only one manufacturer for Apple products.

Version 4.1+ Jelly Bean most prevalent, but versions 4.0, 3.0 and 2.3 are still a large market share.

Security problems, most malware attacks on Android.

iOS is mostly all the same version. However, companies often prescribe a 2-3 purchase cycle. Corporate world might need to take into account older versions.

Google allow fragmentation to make Android more interesting to OEMs, However this can get in the way of updates.

Webkit configuration varies; browser fragmentation.

Tried to stop fragmentation in version 3, but this lead to new forks. So this has been re-introduced in version 4.

Fragmentation and Differentiation

Differentiation gives a unique selling point (USP) for vendors. Sticking with the same model.

Fragmentation leads to security issues and makes the OS more complex to handle screen sizes, etc. More testing needed. Therefore increased cost.

Windows Phone 8

Microsoft and Nokia. A closed environment (or “walled garden”).

No fragmentation/differentiation. Very few OEMs interested.

How to join the OS game?

Buy into it.

Web as OS platform. Firefox OS is trying hard with cheap alternatives. HTML5, JS, CSS3 and APIs to access hardware.

Very fragmented, depending on rendering engine.

Mobile Webkits are not always the same

Can be configured differently. Must be careful with mobile websites.

Depends on location too (UC Browser more popular than Android in Asia).

Proxy versus full browsers.

Opera mini, for example.

  • HTTP request (encrypted)
  • Proxy Server
    • Target HTTP Server resources
    • Renders page
    • Compresses into an image map
  • Image map returned to Opera mini (encrypted).

Cheaper and faster over a slow network. Very popular in certain developing countries with low bandwidth.

Cheap in terms in processing too.

JS is handled by the proxy server. Not always perfect and may not be enabled.

Content Providers

App Stores

  • Apple App Store is a walled garden. 70% developer revenue.
  • Google Play. 70% developer revenue.
  • Amazon Android Appstore.
  • Others (apk) security issues with this.

Mobile Web

Basic tools, responsive web, matching platform expectations, testing, delivery. (5 lectures plus practicals)

Objectives

  • To be able to build some simple mobile web sites/applications and view them via desktop browsers, mobile browsers and as mobile applications (hybrid apps).
  • To gain an insight into design issues especially multiple screen sizes and browser support.
  • To use a common mobile web framework to help build a mobile web application.
  • To user PhoneGap™ to turn a webb application into a “native” application which can be deployed.
  • To use some common testing technologies for mobile web.

Overview

  • Is mobile web being exploited?
  • Three main approaches:
    1. Web site
    2. Hybrid App
    3. Native App
  • Challenges for mobile web developers
  • Standards are a moving target.
  • How can we be informed what support there is?
  • Given a complex situation, how can we make testing tractable?

Core philosophy

Progressive enhancement and not graceful degradation.

Web site/app Approach

Pros

  • One codebase
  • Cross platform
  • Familiar technology
  • Easy to distribute to both mobile and non-mobile

Cons

  • Lack app store distributions
  • Weak access to device services
  • Performance (JS interpretation).
  • Not as slick as a native app

Hybrid Applications

Native wrapper around HTML and CSS and JS.

Pros

  • One codebase
  • Cross platform
  • Better accesses to device services
  • Can distribute via app stores
  • Familiar technologies

Cons

  • No full access to device services
  • Performance
  • Not as slick as native apps

Native Applications

Pros

  • Highly polished
  • Good performance (potentially)
  • Good developer support platform
  • App store distribution.

Cons

  • Platform specific
  • Multiple codebases

Progressive Enhancement not Graceful Degradation

Graceful degradation (of a website):

  • Focus is on building a great experience for the latest browsers, testing one browser version back.
  • Old browsers provide poor, but passable, experience, only show-stopped bugs fixed.

Progressive enhancement (of website):

  • Focus on content, not browsers.

Responsive Web Design (RWD)

  • Fluid grids
  • Media queries - use em over px.

Mobile virtual viewports

Mobiles have a virtual viewport which bypass the media queries.

Use the viewport meta-tag to ask the viewport width to be the same as the actual width.

Breakpoints

Research suggest lines of text should be between 45-75 characters for comfortable reading. Columns help with this.

Mobile-first progressive enhancement.

Design site for the mobile device. If media queries are supported and on a bigger screen, add to the existing CSS for mobile to add for the styling rules and changing font sizes.

Simplification of complex designs is harder than making a simple design more complex.

Server-side Detection of Device Capabilities

Client side RWD alone may not be sufficient.

There are some server-side approaches:

  • User-agent sniffing.
  • Client-side reconnaissance missions.

Classes of “device”.

RESS (Responsibe Web Design and Server-side Components).

The problem

Some services will only run on a mobile device (Phone calls, accelerometer).

Some services are more significant on a mobile device (location awareness, etc.), be careful not to run into the Hide and Cry anti-pattern.

Some classes of device are very different (feature versus smart phone).

What we spit out is so different that RWD alone won’y help. Need variants of the website.

Image sizing issues that RWD won’t solve.

User-agent sniffing

Can use HTTP User-Agent request header with a device description repository to work out:

  • Device hardware characteristics
  • Operating System
  • Browser characteristics

Example: Detecting viewport width

var width = (winder.innerWidth > 0) ? Window.innderWidth : screen.width;
document.cookie = "sitewidth="width;
document.location.reload();
if(isset($_COOKIE["sitewidth"]) {
  $_SESSION["screenwidth"] = $_COOKIE["sitewidth"];
}
// ...
if($_SESSION["screenwidth"] <= "380") {
  include("includes/mobile-nav.inc.php");
} else {
  include("includes/desktop-nav.inc.php");
}

Herding Devices into Device Classes

Device Class: An abstract collection of common characteristics of similar devices and their browsers.

Why? Because we don’t want our server-side code to have to deal with every device (this isn’t scalable).

Typical Classes

  • Higher Mobile (WebKit-based, >= 320px width)
  • Simpler Mobile (minimal JS support, >176px width)
  • Tablets
  • Desktop
  • Unsupported (if no SLL, Cookies, JS, or < 176px width)

Does unsupported break progressive enhancement?

What to send?

  • Stylesheets and Code based on class
  • Sometimes single codebase is too small (watch out for duplication of common code).
  • RESS can help with codebase maintenance.

Responsive Web Design and Server Side Components

Combine client-side RWD with templating in the codebase for different classes of device.

Sometimes HTML5, CSS3 and JS isn’t enough

Many HTML5 features aren’t widely supported.

Can’t publish to application stores.

Perhaps go for a hybridge: part native, part HTML.

PhoneGap/Cordova is the best known hybrid technology.

IOS Native Apps

Objective-C, relevant design patterns, platform design considerations, handling data, using sensors and location (5 lectures plus practicals)

Objective-C

Based on C and Object-Orientation (based on Smalltalk), has basic rules of C but most of the syntax is smalltalk-based.

Some repeated information is involved, but this is improving.

Uses late binding (names are looked up at runtime, not compile time).

Good, but unusual, object syntax with some useful extensions and libraries.

Verbose (A: yay boilerplate), designed to be descriptive (A: yay dot-enter programming).

Likely to change with each release (good and bad).

Basic Syntax and Objects

Logging and Strings
NSLog(@"Message");

NSString *var = @"Some message";
NSLog(@"Message: %@.", var);
Booleans
BOOL true = YES;
BOOL false = NO;
Numbers
NSNumber *num = @10;

// Conversion to unsigned int
NSUInteger uInt = [num unsignedIntegerValue];

Arrays

NSArray *arr = @[...];

// Accessing elements
arr[index];

// Something about NSArrays being immutable objects.

Dictionaries

NSDictionary *dict = @{@"key": @"value"};

// Accessing
NSString *val = dict[@"key"];

Objects and Messages

Messages are Objective-C’s version of methods, where the Java syntax would be:

ObjectName.methodName()

The Objective-C syntax is:

[objectName messageName]

Example:

NSArray *temp = @[@"Test", @"2"];

NSLog(@"%@", [temp description]);
// Logs information about temp
Storing the output of Messages

Just need to worry about typing

NSArray *temp = @[@"Test", @"2"];
NSString *result = [temp description]
NSLog(result);

Parameters

Messages can have parameters, however they may act differently than expected.

[object methodName:parameter namedParameter:value]
Initialisation
[[Class alloc] init];

Conditionals and Loops

if(condition) {
    // ...
} else if(condition) {
    // ...
} else {
    // ...
}
switch(thing) {
    case value:
        // ...
        break;
}
for(Class *item in iterable) {
    // ...
}

Blocks

Blocks are like basic methods.

void (^block)(void) = ^{
    // ...
};

Inkoving a block:

block();

Classes

In a header file (.h):

@interface Name : NSObject
@property Type *name;
@property NSString *myName;
@end

Implementing the interface (.m):

#import Name.h

@implementation Name
@end

Accessing properties

Name *name = [[Name alloc] init];
name.myName = @"Value";
Method Signatures

In many languages, you might see:

void m(int x, int y, String text)

A similar method in Obj-C is:

- (void) mAtX: (NSInteger) x
         atY: (NSInteger) y
         withText: (NSString *) text

Calling this:

[obj mAtX: 2 atY: 32 withText @"a message"]

The method name is: mAtX:atY:withText

Properties

Defined with the @property syntax.

The attributes in the () control how the property manages the underlying data value.

The IBOutlet is only used on properties that are linked to the user interface.

atomic is used for multi-threading purposes (read locks, etc.).

Under the bonnet, this is an instance variable (ivar).

Good practise is to cover your ears and pretend they’re just properties.

Instances
Example *inst;
inst = [[Example alloc] init];
// or
inst = [Example new];

alloc and init is the typical Obj-C way as classes often have different initialisation techniques. alloc is a class method, init is an instance method.

Protocols

Similar to Java interfaces. There are some features where you can have required and optional methods.

Optional methods act as empty methods if they are not implemented.

Set in the header file (.h)

Primitive Types

Primitive types from C can be used in Obj-C, e.g. int, long, float, double.

Additional types:

  1. BOOL (YES or NO)
  2. NSInteger (typedef to either long or int depending on architecture)
  3. CGFloat (typedef to either float or double depending on architecture)

Memory Management

Memory management is achieved by checking the number of references to an object.

In older code, you needed to write lines of code to state when you wanted to keep an object in memory.

Modern applications use ARC (Automatic Reference Counting).

The compile tracks when objects should be kept in memory or removed from the application.

Application Structure

Typical project implementations have a simple structure which corresponds to MVC

AppDelegate accesses the model (if there is one).

The storyboard is the view, describe many screens in a single file.

ControllerView is the controller.

Developing an Application

  1. Make a new view-based progect,
  2. Add extra views in the Storyboard,
  3. Design the application by placing buttons, etc. allows easy prototyping
  4. Make new ViewController files and link them to the views
  5. Via drag and drop, define Actions and Outlets in the ViewController and link outlets and actions to interface (associate views to controllers)
  6. Implement model separately (if complex enough).

Rule of Thumb

If your newly created app crashes when trying to load a new screen, then you usually have:

  1. Messed up linking the view to the view controller.
  2. Done something wrong when linking an outlet or action to a screen.

The error message should give you a clue as to what is wrong.

Types of Interfaces

  • Tabbed screens
  • Tables
  • Linked Views
  • More examples from iOSDevUK

You can put each of the views into the storyboard and define how you segue from screen to screen then write code to pass data between them.

Tabbed Screens

Tables

UITableView is used to display repeating section of data. UITableViewCell is a cell in the table.

Unlike classic tables on other platforms, there is only a single column.

Tables contains many sections, which in turn contains many rows.

The entire table can have a header and footer. Sections can also have a header and footer.

Each table must have at least one section to be complete.

Table Styles

Table style is set at build time only.

Plain

Cells flush to each other

Grouped

Not in iOS7.

Cells

Each row contains a cell.

When a row’s data is to be displayed, the UITableView will request the cell.

By default, the UITableViewCell is used to degine a cell.

The cell is initialised in the method: tableView:cellForRowAtndexPath:

A table can contain different types of cells for different rows.

The cell is reused.

Cell Structure
  • Content
    • Layout depends on Style
      • UITableViewCellStyleDefault
      • UITableViewCellStyleSubtitle
      • UITableViewCellStyle1 (right detail)
      • UITableViewCellStyle2 (left detail)
  • Accessory
    • Disclosure Indicator
    • Detail Disclosure Button
    • Check Mark
Custom Cells

Content of a cell can be defined in code, but is tedious.

It is more likely that IB will be used to define the layout for the cell and use a View Controller to control the behaviour.

Table View Contoller

UITableViewContoller is a subclass of UIViewControler that is designed to manage UITableViews.

If a new view based controller is created, the XIB will contain the table linked to this controller.

In order to provide custom behaviour, the following methods needed to be implemented:

  • UITableViewDelegate
    • Manage selections,
    • Configure section headings and footers,
    • Help to delete and reorder cells,
    • Perform other actions
    • (All optional)
  • UITableViewDataSource
    • The data source provides the table-view object with the information it needs to construct and modify a table view. Two are required.

Linked Views

Data Storage on iOS

Techniques for storing and retrieving data on the iPhone.

XML and SQLite provide the best portability.

NSUserDefaults

User preferences.

Persistent map.

Property List Serialisation

Mechanism to store a selected number of data types in a property list (plist).

NSCoder

Technique to save more complex objects.

XML

XML data files.

SQLite

Small memory footprint SQL database.

Core Data

More complex system, but offering a powerful way to manage data and relationships. Best integration with iCloud.

Application Sandbox

iOS applications are run in a protected area on the device. Within this area there is a defined structure of Directories that you can access.

The developer is responsible for managing files that are created within the application area.

SQLite 3

Database and library available in iOS.

C library, which needs to be included in the project.

Create a SQLite database on the command line and include in the project.

Need to marshal data into and out of the database, with some conversions (e.g. Strings).

Where to store the database

By default, your database is in the default bundle, inside your application.

If read-only data is needed, it can remain there.

Write access needs to be moved to the Documents directory.

Android Native Apps

Relevant design patterns, platform design considerations, handling data, using sensors and location, handling device variation (5 lectures plus practicals)

Introducing Android

Android Inc. started developing an open source mobile OS, acquired by Google in 2005.

Uses the Dalvik Virtual Machine:

  • Google
  • Virtual Machine and Dex Language
  • Linux kernel
  • Apache Harmony class libraries.

The effect of OS segmentation

The Android platform is currently changing very quickly.

Each manufacturer has a coperate theme to it.

Why Android?

  • Popularity and market saturation.

Android IDEs

  • Eclipse and ADT.
  • Android Studio.

Application Space

There are other apps which are isolated from the application, but are able to communicate with intents.

Linux Kernel and Android OS

Operating system process

Owned by the application.

Dalvik Virtual Machine

A special Java VM for running on Mobile Devices.

May be moved to the ART (Android Runtime).

Application

No single main method. The way that Android apps are developed, and the model following, doesn’t have a concept of a start point or end point, rather it has the idea of the behaviours.

Activity

A single screen layout and the code behind.

A functional piece of UI.

Service

A background process.

Does not have a UI, may interact with them though.

Content Provider

A way of sharing a local database.

Queries and cursors used to interact.

Broadcast Reciever

Application Lifecycle

Memory of mobile devices is much more limited than on PCs or servers.

Short battery life is an issue.

Lots of apps vying for limited resources.

Android Manages Memory Carefully

Android has the opportunity to remove resorces from memory when screens are switched.

If necessary a whole application and its process will be removed.

Need to be able to reincarnate apps.

List Views

Used to display scrollable lists of data.

List Activity

Extended by classes to handle clicks, etc.

Acts as a controller in an MVC sense.

List View

Used by the List Activity

Acts as a view in an MVC sense. Does the hard work of sizing rows, etc.

List Adapter

Created by the controller and acts as the data source for the list view.

Acts as a model in the MVC pattern.

Array Adapter <T>

Simple array of type T.

Generally a hard-coded list of static data.

Simple Adapter

A list of maps, where the map keys relate to UI element IDs.

Simple Cursor Adapter

A dynamic adapter which is based on a cursor of an iteratable object.

Filesystem

Given a private filestore under: /data/data/package-path

Public

On SD Card

Private

Stored under private filestore.

/data/data/package-path/files

Access

Apps access their private files folder with: Context.getFilesDir();

Read/Write

Context.openFileOutput(String name, int mode);

Context.openFileInput(String name)

Media

On SD Card under the standard location for media files.

Temporary

/data/data/package-path/cache

Shared Preferences

/data/data/package-path/shared-prefs

XML only

Databases

/data/data/package-path/databases

Bringing it all together

Legal, social, ethical and professional issues. Selecting platforms and delivery methods, getting the interface right. (3 lectures)

Motivations

The BCS, under its Royal Chater, is required to establish and maintain standards of competence, conduct and ethical practice for information systems professionals.

The ability to recognised the legal, social, ethical and professional issues involved in the exploitation of computer technology and be guided by the adoption of appropriate professional, ethical and legal practises.

Which laws relate to the use of computers?

Data Protection Act

The Equality Act

Copyright, Design and Patents Act

Contract Law and Licenses

Computer Misuse Act

Social Issues

What is the impact of computers on society?

Ethical Issues

How should we act?

Professional Issues

What are our professional responsibilities?

SEM5640 - Developing Advanced Internet-Based Applications

This module builds on SE31520 examining further the development processes, techniques and technologies for constructing Internet-based, distributed software systems. Particular emphasis is placed on studying and applying enterprise design patterns. Students, working in small groups, will build, using a modern software engineering methodology, a distributed application that employs the technologies and concepts taught in this and previous modules.

Nigel Hardy awesome tally: 2

Introduction

Provides a road-map to the module, with a brief overview of main concepts and how they interrelate.

Learning Outcomes

  1. Analyse a complex software engineering problem.
  2. Within a group design and implement an original solution to the problem.
  3. Test and critically evaluate their solution.
  4. Critically assess the relative merits of various software development methodologies within the context of the given problem and nature of the project team.
  5. Working in a group, apply the chosen software development methodology to solve the given problem.
  6. Critically explain the relative merits of alternative server-side technologies.
  7. Demonstrate a critical appreciation of design issues encountered when developing multi-tier, distributed applications.

Assessment

  • Group Programming project and report 75%
  • Exam 25%

Internet Applications

  • Uses network connection
  • Browser-based
  • Client-server model
  • Reliance of collection of internet standards (e.g. HTTP)

Advanced Internet Applications

  • AJAX/Plugins support to provide a client rich experience
  • Multiple Application/Servers
  • Distributed or more complex processing
  • Transaction (DB) support
  • REST/SOAP or other remote procedure calls

No precise definition.

Overview

There are several themes this module will look at:

  • Multi-Tier Internet Applications
  • Developing using .NET and Java EE
  • Design Patterns
    • MVC
    • ORM
  • Web Services
    • REST
    • SOAP
    • RMI/RPC
  • Building Applications
    • Cloud Computing
    • Structure and deployment

Multi-Tier Internet Applications

Applications that involve multiple servers which run different parts of the application so that the business logic is separated from the presentation logic.

Different parts tend to include:

  • Client
  • Web
  • Business Logic
  • Data/EIS

Lots of small applications which build up a larger, more complex, application.

Communication between the servers provides the linkage between the applications.

Supports scalability if done correctly. Also more resistant to failure. Potentially more secure (more important servers firewalled).

Should all be transparent to the user.

Java Enterprise Edition (J2EE)

  • Building web applications
  • JavaServer Faces
  • Enterprise Java Beans
  • Portability

.NET

  • Building web applications
  • Web Forms, MVC
  • C#
  • Tiers and layers

Design Patterns

Two types of issues:

Patterns of Enterprise Architecture
  • Building and structuring the application
  • Relate to .NET and Java
Patterns of Enterprise Integration
  • Making applications communicate
  • Examples:
    • Apache Camel
    • Apache ActiveMQ

Building Applications

Using Cloud computing and how relevant it really is.

Examples from eBay of structure and deployment.

Web Services

Application that provide services which have destination endpoints with operations.

SOAP XML, WSDL and REST are the big players.

Alternatives to REST?

Construction Technologies

Examination of technical solutions for building enterprise applications as provided by enterprise Java and .NET for building distributed applications.

These include:

  • support for server-side MVC, e.g. JavaServer Faces and ASP.NET MVC;
  • business objects, e.g. Enterprise JavaBeans;
  • object to relational mapping, e.g. MS LINQ, Java Persistence API;
  • messaging systems, e.g. Java Messaging Service;
  • interoperability, e.g. SOAP-based web services.

Java Enterprise Edition (Java EE)

Java is a language (JDK) and a platform (JRE).

The platform is the Java Virtual Machine and a set of APIs.

Four platforms:

  1. Java Standard Edition (SE)
  2. Java Enterprise Edition (EE)
  3. Java Mobile Edition (ME)
  4. Java FX

Java EE

A superset of Java Set, providing a development model, APIs and runtime environment (the server).

“For developing and running large-scale, multi-tiered, scalable, reliable and secure network systems” - The Oracle BS

Java EE Web Tier

Access the web tier with HTTP(S).

Provides static resources and dynamic page generation. Handles client input and navigation flow with maintance of state.

JavaBeans provide temporary storage on the Web Tier.

Can do simple logic, could build the whole application in this tier, but would be a bad design.

Java EE Business Tier

Provides Enterprise Java Beans (EJB) components.

Access to Java Logic, etc. which provide the business logic.

Java EE Enterprise Information System (EIS) Tier

Logically and typically on other machines, typically “foreign” technology.

Other Clients

Can build Java Applications or Web Applications which access the EJB components directly. Uses remote procedure calls.

Java EE Application Server

Provides:

  • The JVM and core APIs
  • The EE APIs
  • Component containers
    • component lifecycle management
      • creation
      • association
    • services to components
      • security
      • transaction management
      • directory lookup
      • remote connectivity
Containers

A set of Java classes packaged into a jar file.

Can be deployed using either:

  • XML descriptors
  • Java annotations

Annotations are widely used, but can be overriden by descriptors.

Components plus deployment descriptors make up a module, each for one container type.

Four types of EE modules:

  • Web modules .war
  • EJB modules .jar
  • Application client modules .jar
  • Resource Adapter Modules .rar

A set of modules makes up a Enterprise archive: .ear

Servlets

Servlets are the olderest EE web component.

Servlets act as a dynamic page, which have a URL, and must receive a HTTP request, from this it generates a HTTP response.

Is a Java object with a managed lifecycle.

Extension of javax.servlet.http.HttpServlet class.

Lifecycle

Managed by the web container.

The first call is:

  • Load class
  • Create an instance
  • Call init()

Any call:

  • Handle the HTTP request

Perhaps calls destroy()

Only one instance is needed by a running server.

Handling Requests

The HttpServlet class has stubs for all the HTTP methods doGet(...) etc.

service() method used to handle the finding of appropriate methods for the HTTP headers.

Common to use the same method to handle GET and POST (for some strange reason, this sounds bad).

Request

Provides access to:

  • Call details
  • Parameters
  • Body
  • A reader
  • Context
Response

Provides access to:

  • Response code
  • Headers
  • A writer (for the body)
Forwarding

Can forward requests to other servlets, passing on the request and response objects. Used for branching on context, for example.

RequestDispatcher d = req.getRequestDispatcher('name');
d.forward(req, resp)
Instance Variables and Servlets

A servlet could have instance variables, however these variables are not specific to a connection, rather the whole servlet.

There may also be multiple servlets so there is no guarantee that these will be the same.

Need features for session data.

Reminder that HTTP is stateless.

Sessions

The request must carry some identification of the session, server usually generates this information.

  • Parameters
  • Cookies

Remember EU Cookie Law

Error Redirects

Specified in the web.xml descriptor.

Useful Features

  • Templating
  • Validation & Schemas
  • i18n
  • Security
  • Database access
  • Support for good design patterns
  • Tight Integration
  • Testing Frameworks
  • Web Service support

Generally: higher level libraries and code reuse.

i18n and L10n

Internationalisation and localisation are both important. Don’t hardcode for the locale, instead store translations in DB or files and load them dynamically.

Authentication

Authorization

Testing

Use JUnit where possible (with mocks, stubs, etc.)

Can test Java Beans using an embedded EJB Container.

Service testing is slightly more difficult

Service Testing

Testing the resources in an automated way to avoid manual input.

Provides rapid and reliable regression testing.

Handles:

  • HTTP(S)
  • FTP
  • LDAP
  • JMS
  • Mail
  • SOAP

Specify calls, with parameters or sequences.

Selenium or JMeter are options (JMeter also does load testing).

JMeter also has a NetBeans plugin.

.NET

Microsoft framework for developing and deploying applications with:

  • Cloud (Azure)
  • Web Server
  • Desktop
  • Restricted Resource Devices and Mobile Phones

Similarities with Java (JRE).

Focused on Windows, but can run on other platforms through mono.

Overview

Common Language Runtime (CLR)

The .NET JRE effectively.

Loads and executes .NET “managed code”.

CLR loads code into application domains to provide a level of isolation and the ability to stop and remove a domain.

Provides Just In Time (JIT) compilation, memory management and GC.

Security based one roles and defined by policies.

Building and Deploying Code
  1. Compile source code files.
  2. Combine IL (MSIL), metadata, config files and other resources into assembly files.
    • DLL or EXE extensions
      • DLL is a library module
      • EXE is only type that will activate the CLR
  3. Deploy Applications to CLR

Internet Information Services (IIS)

The application server which runs ASP.NET applications.

Server-Side MVC

Methods for implementing the MVC pattern in an Web Application.

JavaServer Faces (JSF)

Provides:

  • Templating
    • Facelets
    • JavaServer Pages (JSP)
  • Validators
  • MVC Pull
  • Navigation
    • Fixed
    • Dynamic
  • AJAX
  • i18n/L10n
  • Session Management
  • DB Access
    • JDBC
    • ORM frameworks
  • Testing Frameworks
  • Code reuse
  • Web Services and resources
Basic Technology

Basic Api for represention components and managing their state; handling events, server-side validation and data conversion; defining page navigation; supporting i18n and accessibility; and providing extensibility.

A tag library for adding components to web pages and connecting components to server-side objects.

Expression Language

Like most scripting languages $ and # are used.

${expr} are rvalues (read-only) and are processed immediately.

#{expr} are *lvalues (read-write) and have deferred evaluation. Also used for method expressions.

${object.variable} calls object.getVariable() under the covers.

#{object.method} calls object.method().

Faces

A replacement for JavaServer Pages (which are now deprecated).

Write (X)HTML with special tags which have an XML appearance which will be rendered as HTML with functionality behind it.

<h:body> is the top level tag.

All pages will tend to act like a form, so <h:form> is very commonly used, even in cases where there would not normally be a form.

Common Attributes
  • id
  • style to associate CSS
  • rendered is a condition to check is the element should be rendered on the page. The condition can be an expression from the Expression Language.
  • value the value of the component, again this can be from the Expression Language, which links the view to the model.
Composite Components

Built from standard components.

These can be parametrised and stored in resources; they have a namespace (default: http://xmlns.jcp.org/jsf/composite/ezcomp)

<!DOCTYUPE html PUBLIC ...>
<html xmlns="..."
      xmlns:cc = "..."
      xmlns:h = "...">

<!-- INTERFACE -->
<cc:interface>
  <cc:attribute name="x" required="true" />
</cc:interface>

<!-- IMPLEMENTATION -->
<cc:implementation>
  <p>
    <h:outputLabel for="i" value="Repeats" />
    <h:inputText id="i" value="{cc.attrs.x}" />
    <h:message for="repeat" />
  </p>
</cc:implementation>
Templating

Can create template pages which has values to fill in. The inheriting page has to fill in these values.

Write navigation rules in XML, each rule is a page (view). For each page there is a set of outcomes (in string form). For each outcome a next page is specified.

Typically stored in faces-config.xml

Outcomes are generated from the action attributes from components.

There are implicit navigation rules which falls back to a matching page if it can be found.

Converters

Implement javax.faces.convert.Converter

Have associated error string displayed if conversion fails (<h:messages>).

Number of default converters for most Java Objects and primitives.

Converters are used in four ways:

  1. Bind the component to a managed bean property of the right type (default).
  2. Put the converter class in the component’s converter attribute.
  3. For numbers and dates, nest the f:convertDateTime or f:convertNumber tag inside the component.
  4. Put the f:converter tag inside the component and refer to the converter (general purpose tag).
Events & Listeners

Part of the component model, application events are generated by components (ultimately from the rendered page).

The JSF application can map HTTP requests to the specific handling code.

Two types of event:

  1. Application event; when the user activates a component that implements ActionSource (buttons, links, etc.).
  2. Value-change event; when the users changes the value of a component represented by UIInput.

Listeners cause the application to respond to events:

  1. Implement an event listener class to handle the event and register it to a component.
  2. Implement a method of a managed bean and refer to it in the EL attribute in the component tag.
Validators

Similar to converters, but only provide validation.

Implement javax.faces.validator.Validator

Again, default validators are provided.

Registered by:

  1. nesting the validator’s tag inside the component
  2. Nesting the f:validator tag inside the component.

BeanValidator uses validation methods in the bean instead of having to write a separate class. Also allows the validation to be model-state specific.

Bean Validation
public void validateVar(FacesContext context, UIComponent toValidate, Object value) {
    int input = (Integer) value; // cast the given value
    if(!condition) {
      ((UIInput) toValidate).setValid(false);
      FacesMessage message = newFacesMessage(
        "error message");
      context.addMessage(
        toValidate.getClientId(context), 
        message);
    }
}
Lifecycle

Starts when the client request a URL and ends with the server response.

ASP.NET

.NET Applications for the web, which uses:

  • Web Forms - Event based
  • Web Services - Application to Application
  • MVC
  • ASP.NET AJAX
View State

Stores the entire state in a hidden item in the form. Simplistic, but does affect performance and security.

Sent in base64.

Can disable for the entire page, or for specific elements.

Also have the ability to encrypt, EnableViewStateMac or store server-side.

Session State

There are different techniques which can be used to store session state:

  • Cookies (bad)
  • REST Requests (not stateful)
  • Server Side session tokens (persistence/scalability questions)

ASP.NET uses server side session tokens:

String myVallue = "A. Name";
Session["name"] = myValue;
myVale = (String) Session["name"];

Can configure where the data is held. Can also turn off the requirement for cookies.

In Process (InProc)

Same as ASP.

SQL Server (SQLServer)

Store state in Microsoft’s SQL Server, which can be on another machine.

StateServer

Store in a separate process, which can be on another machine.

Custom

Write your own mechanism.

Off
System.Web.UI.Page

Default base class for code-bheind filels.

Implements System.Web.IHttpHandler:

  • IsReusable() - true if the handler can be pooled
  • ProcessRequest( - Processes the actual HTTP request.
HTTP Pipeline

HTTP Pipeline

GridView and SQL DataSource

Ways of display DB data.

Business Objects

Objects which relate to the business layer of a multi-tier application.

Enterprise Java Beans (EJB)

EJBs act as an ORM framework, where the Java Objects are annotated with the DB schema details.

Three types of EJB Session:

  1. Stateful
  2. Stateless
  3. Singleton

Can also have message-driven beans.

Access can be done through:

  • No-interface view (all the public methods are available)
  • Through a business interface
  • Injection
  • JNDI
Transactions

A transaction is a unit of work which supports “ACID”.

Atomic

Every task within a unit of work must complete successfully otherwise the transaction is aborted

Consistent

Atomicity, isolation, durability lead to consistent data. Developer must also define database consistency checks.

Isolated

Preventing interference from other transactions.

Durable

Data written to disc before a transaction can fully complete.

EJB Transactions

In EJB Transactions have a scope. A transaction manager manages this process.

EJBs support two types of transactions:

  • Container managed
  • Bean Managed
Container Managed Transactions

Transactional attributed specified using annotations or in the deployment descriptor.

Outside of the EJB code.

Reccommended

Bean Managed Tranasctions

Annotate EJB class with:

@TransactionManagement(TransactionManagerType.BEAN)

Offers fine grained control.

EJB Transactions Attributes

Six types of transaction attributes, specified by annotations or descriptor.

The default is Required.

Can annotate the class to set the default for the class.

NotSupported

Transactions are not supported by the method, any current transaction will be suspended during this method call and resumed once it terminates.

Supports

Follows the caller; if it was in a transaction, it will remain so, if it was not, this will not create a new one.

Required

The method must be part of a transaction.

If the caller was not in a transaction, this will start one.

RequiresNew

This method will always crate a new transaction. An existing one will be suspended and will resume after this is complete.

The outer transaction will not roll-back if this one does.

Useful to nest transactions

Mandatory

Must already be in a transaction, otherwise a TransactionRequiredException will be thrown.

Never

Must never be in a transaction, otherwise a RemoteException will be thrown.

EJB Isolation

Problems:

  • Dirty Reads
  • Non-repeatable Reads
  • Phantom Reads

Provide four (fairly standard) levels of transactions:

  • TRANSACTION_READ_UNCOMMITTED
  • TRANSACTION_READ_COMMITTED
  • TRANSACTION_REPEATABLE_READ
  • TRANSACTION_SERIALIZABLE

In bean-managed transaction, can specify isolation using the JDBC API:

connection.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE) ;

Using serializable guarantees data consistency, but may affect performance.

Read Uncommitted

Uncommitted changes are visible to other transactions, other transactions can change rows that another transaction has read.

May result in inconsistencies as a rollback by T1 may mean that T2s data is out of date.

Read Committed

Rows updated in a transaction cannot be read by another transaction, but rows read by one transaction can be changed by another.

Rows updated by T1 cannot be read by T2.

Rows read by T1 can be updated by T1 or T2.

Repeatable Read

Rows read by one transaction cannot be updated by any transactions.

Rows read by T1 cannot be updated by T1 or T2.

Serializable

Transaction (appears) to have a lock on the data.

EJB Transactions with Exceptions

Checked exceptions (including application exceptions) do not cause an automatic rollback, but can be set using @ApplicationException(rollback=true)

Unchecked exceptions and RemoteExceptions automatically cause a rollback and the EJB instance is discarded.

Injections

Getting a reference to object instances without instantiation or passing in as parameters, providing good decoupling.

There are two mechanisms for this:

  1. Resource Injection
  2. Dependency Injection

Annotations to fields or methods specify the injection points.

Provides a runtime binding which provides inversion of control.

Compatible abstractions are required to avoid typing errors; the code using an object is not responsible for choosing an implementation or creating the instance.

Injection is often the implementation for this, but can use:

  • Factories
  • Service Locators
  • Contextualised lookup
Resource Injection

Inject references to objects in the JNDI namespace into any container-managed object.

Gets a reference to a resource using:

  • Annotations
  • Deployment Descriptor
  • InitialContext lookup

Inject as an interface type, code independent of specific implementation.

Not type safe!

Variable Resource Injection
public class MyServlet extends HttpServlet
  @Resource(name="java:comp/DfeaultDataSource")
  private javax.sql.DataSource dsc;
}
Method Resource Injection
public class MyServlet extends HttpServlet {
  private javax.sql.DataSource dsc;

  @Resource(name="java:comp/DefaultDataSource")
  public void setDsc(javax.sql.DataSource dsc) {
    this.dsc = dsc
  }
}

Method name must being with set, with a void return type and only one parameter.

Object Relational Mapping (ORM)

Libraries which provide a mapping from a relational database to objects in an object-orientated paradigm.

Language-Integrated Query (LINQ)

A query language for .NET languages which is akin to SQL.

Java Persistence API

Access to database at SQL level or through an ORM Framework

JDBC

Provides direct access to a database.

Only provides relational records, not Objects.

Java Naming and Directory Interface (JNDI)

Access to naming and directory services, part of the Java SE API. Looks up objects by name, returning values or references.

API for the clients.

Service provider interface (SPI) wadds new directory and naming services.

Java Persistence API (JPA)

Provides an ORM mapping of Java Object using Java annotations. Provides:

  • Create
  • Retrieve
  • Modify
  • Write

(So CURD then? -A)

Entities are JavaBeans, annotated with @Entity. Automatically mapped, but can label instance variables with extra details; @Column can map Java types to SQL types.

@Id is required.

(JPA can be a real pain though, he glossed over this just a little bit -A).

Contexts and Dependency Injection (CDI)

Any Java object can be made managed by its container. The CDI defines scpoes and a managed object has a declared (or default) scope.

This can then be injected into any other managed object.

Application Scoped

@ApplicationScoped

Session Scoped

@SessionScoped

Request Scoped

@RequestScoped

Conversation Scoped

@ConversationScoped

Covers the AJAX exchanged (can be extended using an ID)

Dependent

@Dependent

Inherits the scope from the object it is inserted into.

This is the default scope.

Injection
import javax.inject.Inject

public class Printer {
  @Inject
  Greeting greeting;
}
Qualified Injection
public class Printer {
  @Inject
  @Informal
  Greeting greeting;
}

Can create a subtype of Greeting which is annotated using @Informal to allow variants of a type.

Messaging Systems

Acts as a middleware which decouples:

  • time
  • platform
  • language

Which is usually machine-to-machine.

There are a lot of standards.

Models for Messaging Systems.

  • Point-to-Point (queue)
  • Publish/Subscribe
  • Fanout
  • Request/Response

Java Messaging Service

A Java API which “allows applications to create, send, receive and read messages using reliable, asynchronous, loosely coupled communication.”

Requires an implementation.

Access Connection Factories or Queues using injection.

Message Brokers

Most patterns require middleware (all can use it).

Possible roles include:

  • Routing
  • Receiving
  • Storing
    • Persistent
  • Transmit
  • Broadcast
  • Translate
  • Monitor
Brokers

JavaEE applcation servers must have one. Glassfish has Open MQ.

.NET has one part of the service bus.

Apache Active MQ is a good offering.

Security

Authentication & Authorisation, encryption of messages, integrity of messages, transactional, etc.

Some protocols and some implementation already on offer.

Interoperability

Methods of performing B2B operation.

Socket Connections

Powerful, but not generally available as an API. Need a Domain Specific Language.

Remote Method Invocation (RMI)

Marshalling and unmarshalling are handled by generated code.

Some distribution services are provided.

Access to CORBA IIOP.

Builds on services provided by sockets.

Has the problem of specificity to Java. Even with IIOP it is still better in an intranet.

Focuses on synchronous transmission.

RESTful Web Services

Use HTTP methods (GET, POST, PUT, DELETE, PATCH and HEAD) with a URI for the API. Return any form of data; usually XML, JSON or YAML.

Very constrained on header information.

Simple Object Access Protocol (SOAP) based Web Services

Expressed as XML supporting interoperability (both Business to Business (B2B) and EAI).

Typically sent over HTTP but can use other network protocols.

Usually generated automatically using facilities in a program, but it is possible to create messages manually.

More popular than CORBA, extensible through the use of XML.

HTTP avoids firewalls.

Industry backing:

  • Miscrosoft
  • I’ve Been Moved (IBM)
  • Oracle
  • BEA
  • HP
  • SAP
  • etc.
Structure
POST /webservice/OrderProcessing HTTP/1.1
...
<?xml version="1.0" encoding="UTF8" ?>
<soapenv:Envelope xmlns:soapenv=“http://schemas.xmlsoap.org/soap/envelope/”
  xmlns:xsd=“http://www.w3.org/2001/XMLSchema”
  xmlns:ns1=“http://orders.borthdistributors.co.uk”>
  <soapenv:Header>
    <!-- Optional. Meta-data in form of header blocks. e.g.: security credentials, id, etc. -->
  </soapenv:Header>
  <soapenv:Body>
    <!-- Mandatory. Arbitrary application-specific XML data or elements that map to method/procedure calls or code that processes the XML document. Could be fault data -->
  </soapenv:Body>
</soapenv:Enveloper>
Web Services Description Language (WSDL)

SOAP manages the process of communicating information but doesn’t define operations, data types or faults of the service.

WSDL is an XML description of a service, generally autogenerated by tools.

WSDL Elements

In the specification, there are the following eleents that define a service.

  • types - the data types used that can be used in the messages
  • message - the messages that can be transmitted, which also specifies the types used
  • portType - the list of possible operations, linking the input and output messages
  • binding - the link between a portType and the specific protocol settings
  • port - the definition of a specific location that a particular binding is available.
  • service - a collection of ports.
In .NET

Generates proxy code for you. A set of asynchronous methods are provided:

BeginMethodName(...)
EndMethodName(...)
Interoperability in SOAP

How do you expose types as parameters and return values in a way which in interoperable?

Value types map onto XML Schema types, classes or structs are converted into complex types in the schema.

  • Classes must be public
  • Public fields and properties are serialised as elements
  • Methods are not exposed.

Arrays and collections are supported by this, but not all types can be mapped easily (e.g. custome types where methods are used to provide access to calculated data). (A: as you’d expect really)

Windows Communication Foundation (WCF)

Splits Application domain from communication domain.

Brings together lots of different interoperability technologies.

Adress
Binding
Contracts

Software Development Process

There is no such thing as a silver bullet for software development processes.

Testing

Methods for testing a system.

White/Black Box Testing

A basis for which testing is built on

Unit Testing

Testing the smallest possible unit.

Functional Testing

Test that a part provides an expected function.

Non-functional Testing

Tests that a part will not fail to function under certain circumstances.

Integration/Subsystem Testing

Testing larger units of work combine together.

Verification Testing

Have we built the right thing?

Does it conform to standards.

System Testing

Our testing of the entire system.

Acceptance Testing

Testing the system is acceptable for the client.

Formal Methods

Write a formal specification and then program to that specification so that something can be proved to have worked.

VDM

Z Notation

Traditional Methods

Focus on up front design before development

Waterfall Model

  1. Requirements
  2. Planning
  3. Design
  4. Development
  5. Testing
  6. Deployment
  7. Maintainace

Spiral Model

Iterative Development

Agile Methods

Focus on providing features without large amounts of up-front development and client involvement.

Extreme Programming

Promotes no-one authorship of code with refractoring.

Planning game to put features.

Can use a metaphor to describe the system.

Test Driven Development

Write tests first then produce code to pass those tests (but no more).

Feature Driven Development

Break the system down into user stories, make these stories into features which can then be developed independently.

Need some overall architecture.

Rapid Application Development

Hack it until it works.

Scrum

Self organising teams with a focus on co-location of all team members and communication.

Model-Driven Engineering

  1. Build a model
  2. Test the model
  3. Translate the model into executables

Platform Independent Model -> Platform Dependent Model.

Can use tools like executable UML.

Open Source

Methods for developing software in a more unmanaged and dynamic way.

Cathedral

A big in-house style of development where every developer understands the big picture.

Bazaar

A very dynamic way of developing, which not every developer has the big picture, but understands their small parts, usually needs some figureheads.

Crowd Sourcing

Getting a large amount of people to come up with potential solutions and selecting the best ones.

Hacking

Playing with a system and getting it to do tasks it was not necessarily designed to do.

Design Issues

A number of enterprise application design patterns will be reviewed within the context of technologies discussed in 2. Building multi-tier applications requires developers to know more than just technologies, they must also know and use design techniques suited to their development. Students will learn about reusable enterprise design patterns, and in particular those used for the development of distributed, multi-tier applications.

Domain Logic Patterns

Transaction Script

Organizes business logic by procedures where each procedure handles a single request from the presentation.

Domain Model

An object model of the domain that incorporates both behavior and data.

Table Model

A single instance that handles the business logic for all rows in a database table or view.

Service Layer

Defines an application’s boundary with a layer of services that establishes a set of available operations and coordinates the application’s response in each operation.

Data Source Architectural Patterns

Table Data Gateway

An object that acts as a Gateway to a database table. One instance handles all the rows in the table.

Row Data Gateway

An object that acts as a Gateway to a single record in a data source. There is one instance per row.

Active Record

An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data.

Data Mapper

A layer of Mappers that moves data between objects and a database while keeping them independent of each other and the mapper itself.

Object-Relational Structural Patterns

Identity Field

Saves a database ID field in an object to maintain identity between an in-memory object and a database row.

Foreign Key Mapping

Maps an association between objects to a foreign key reference between tables.

Association Table Mapping

Saves an association as a table with foreign keys to the tables that are linked by the association.

Dependent Mapping

Has one class perform the database mapping for a child class.

Embedded Value

Maps an object into several fields of another object’s table.

Serialized LOB

Saves a graph of objects by serializing them into a single large object (LOB), which it stores in a database field.

Single Table Inheritance

Represents an inheritance hierarchy of classes as a single table that has columns for all the fields of the various classes.

Class Table Inheritance

Represents an inheritance hierarchy of classes with one table for each class.

Concrete Table Inheritance

Represents an inheritance hierarchy of classes with one table per concrete class in the hierarchy.

Inheritance Mappers

A structure to organize database mappers that handle inheritance hierarchies.

Object-Relational Metadata Mapping Patterns

Metadata Mapping

Holds details of object-relational mapping in metadata.

Query Object

An object that represents a database query.

Repository

Mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects.

Web Presentation Patterns

Model View Controller

Splits user interface interaction into three distinct roles.

Page Controller

An object that handles a request for a specific page or action on a Web site.

Front Controller

A controller that handles all requests for a Web site.

Template View

Renders information into HTML by embedding markers in an HTML page.

Transform View

A view that processes domain data element by element and transforms it into HTML.

Two Step View

Turns domain data into HTML in two steps: first by forming some kind of logical page, then rendering the logical page into HTML.

Application Controller

A centralized point for handling screen navigation and the flow of an application.

Distribution Patterns

Remote Facade

Provides a coarse-grained facade on fine-grained objects to improve efficiency over a network.

Data Transfer Object

An object that carries data between processes in order to reduce the number of method calls.

Offline Concurrency Patterns

Optimistic Offline Lock

Prevents conflicts between concurrent business transactions by detecting a conflict and rolling back the transaction.

Pessimistic Offline Lock

Prevents conflicts between concurrent business transactions by allowing only one business transaction at a time to access data.

Coarse-Grained Lock

Locks a set of related objects with a single lock.

Implicit Lock

Allows framework or layer supertype code to acquire offline locks.

Session State Patterns

Client Session State

Stores session state on the client.

Server Session State

Keeps the session state on a server system in a serialized form.

Database Session State

Stores session data as committed data in the database.

Base Patterns

Gateway

An object that encapsulates access to an external system or resource.

Mapper

An object that sets up a communication between two independent objects.

Layer Supertype

A type that acts as the supertype for all types in its layer.

Separated Interface

Defines an interface in a separate package from its implementation.

Registry

A well-known object that other objects can use to find common objects and services.

Value Object

A small simple object, like money or a date range, whose equality isn’t based on identity.

Money

Represents a monetary value.

Special Case

A subclass that provides special behavior for particular cases.

Plugin

Links classes during configuration rather than compilation.

Service Stub

Removes dependence upon problematic services during testing.

Record Set

An in-memory representation of tabular data.

eBay Example

Partition Everything

Split everything into manageable chunks: data, load and usage.

If you can’t split it, you can’t scale it

Motivations:

  • Scalability - can scale horizontally and independently
  • Availability - can isolate failures
  • Manageability - can decouple segments and functional areas
  • Cost - can use less expensive hardware
Functional Segmentation

Group data using standard data modelling techniques.

Logical hosting: abstract the logical representation from hosts physical location.

Support combining and splitting without code change.

Horizontal Split

Split databases horizontally along primary access path.

Multiple split patterns:

  • Hashing based on a key
  • Lookup- or range-based

Aggregation/routing in Data Access Layer:

  • Abstracts developers from split logic.
  • Routes CRUD operations to appropriate split
  • Supports rebalancing through configuration changes
No (cross-database) Transactions

Absolutely no client-side transactions, two-phase commits.

Auto-commit for vast majority of DB writes.

Anonymous PL/SQL blocks for transactions within single database.

Consistency without transactions using:

  • Careful ordering
  • Recovery through:
    • Asynchronous recovery
    • Reconciliation batch
    • Fallover to guaranteed asnyc flow.

Has the added benefits:

  • No deadlocks
  • Avoids coupling availability
  • Maximises update correctly.
No session state

User session flow moves through multiple application pools

Absolutely no session state in application tier.

Transient state maintained/references by:

  • URL rewriting
  • Cookies (sadface)
  • Scratch database

Async Everywhere

  • Scalability - can scale components independently
  • Availability - Decouples state (up/down), characteristics (always available, best effort), retry operations
  • Latency - Can improve user experience at cost of data/execution latency. Can allocate more time than the user would be willing to do.
  • Cost - can spread peak load over time.
Message Dispatch

Consumers subscribe to event.

  • Multiple logical consumers can process each event
  • Each logical consumer processes events independently
  • Within each individual consumer, single consumer instance processes event
  • Guaranteed at least once delivery, no guaranteed order.

Managing timing conditions

  • Idempotent - processing event N times should give the same results as processing once.
  • Readback - consumer typically reads back to primary database for latest data.
Search Feeder

Read and transform item updates from primary database.

Reliable multicast

  • Publish updates to search service
  • Persist messages to intermediate data store for recovery
  • Resent recovery messages when messages are missed

Search nodes listen to updates

  • Listen to assigned subset of messages
  • Update in-memory index
  • Request recovery.
Periodic Batch

Schedule offline batch processes

Most appropriate for:

  • Infrequent or scheduled processing
  • Non-incremental computation

Often drives further downstream processing through message dispatching.

Automate Everything

Prefer adaptive/automated systems to manual systems.

  • Scalability - can scale with machines, not humans
  • Availability/Latency - can adapt to changing environments
  • Cost - machines cheaper than humans
  • Functionality - consider more feature space
Adaptive Configuration

Define service-level agreement (SLA) for a given logical event.

Consumer dynamically adjusts to meet defined SLA with minimal resources.

Automatically adapt to changes in:

  • Load
  • Event processing time
  • Number of consumer instances
Machine Learning

Dynamically adapt experience

  • choose pages, modules and which provide best experience for that user and context
  • Order results by combination of demand, supply and other factors

Feedback loop enables system to learn and improve over time:

  • Collect user behaviour
  • Aggregate and analyse offline
  • Deploy updated metadata
  • Decide and serve appropriate experiences

Best practises:

  • Perturbation (anneling) for continual improvement
  • Dampening of positive feedback

Remember Everything Fails

Build all systems to be tolerant of failure.

  • Assume every operation will fail
  • Assume every resource will be unavailable
  • Detect failure as rapidly as possible
  • Recover from failure as rapidly as possible
  • Do as much as possible during failure.

Motivation:

  • Availability - duh
Failure Detection

Application servers log all requests

  • Detailed logging of all application activity

Messages broadcast on multicast message bus

Listeners automate failure detection and notification.

Rollback

Absolutely no changes to the site cannot be done.

Code deployment: Rollout/Rollback

  • Entire site rolled every 2 weeks
  • Many deployments have dependencies between pools
  • Rollout plan contains explicit set (transitive closure) of all rollout dependencies.
  • Automatic tool executes staged rollout, with built-in checkpoints and immediate rollback if necessary.

Feature deployment: Wire on/Wire off

  • Every feature has an on/off state driven by central configuration
  • Allows feature to be immediately turned off for operational or business reasons
  • Decouples code deployment from feature deployment
  • Applications can check for feature “availability” in the same way as they check for resource availability.
Markdown

Detecting that a resource is down or slow.

Application “marks down” the resource.

  • Stops it from making calls to it
  • Sends alert

Non-critical functionality is removed or ignored

  • “Limp mode” operation

Critical functionality is retired or deferred

  • Fallover to alternative resource
  • Defer process to guarantee aynch message

Explicit “markup”

  • Allows resource to be restored and brought online in a controlled way.

Wider Issues

Examination of current issues that are related to 2 and 3, e.g. scalability; approaches to testing and deployment; the use of cloud computing.

Load Balancing

Balance the load of the request across a number of different machines.

There are hardware and software solutions for performing this.

Benefits:

  • Scalability
  • Manageability
  • Server and Content balancing

Has a single point of failure.

Round Robin

Allocate in a circular pattern

Least Connections

Allocate to the machine with the least connections

Response Times

Allocate to the machine with the quickest response time.

Server Probes

Gain even more information about the servers and allocate based on this.

Weighting

Apply a weighting to each machine and allocate accordingly.

Scalability

  • Partition by function
  • Split horizontally
  • Avoid distributed transactions
  • Decouple functions asynchronously
  • Move processing to asynchronous flow
  • Virtualise at all levels
  • Cache appropriately

Scaling

  • Scale horizontally (scale out) is preferential. Adding more machines
  • Scale vertically (scale up). Upgrade machines

Cost basis - try to keep on commodity hardware.

Partition Everywhere

Functional segmentation into logical function (example is the database; split the database into the segments, continue splitting until no more splits can be made).

Horizontal split using an internal ORM implementation, requires additional routing.

Deployment

Integration

  • Integration and the enterprise
  • Distribution vs Integration
  • Coupling
    8 Types of integration
  • Main pattern topics in the EAI book

Integration and the enterprise

Integrating separate applications so that they work together.

Enterprises tend to have numerous systems which weren’t implicitly designed, for example:

  • Email
  • Databases
  • Calendars
  • Accounting
  • HR
  • Website
  • CRM
  • Warehouse Management
  • Shipping
  • Stock
  • Disaster Recovery

These may

  • Support different areas of the enterprise
  • Mergers, take overs, …
  • Different technologies
  • Legacy systems

Coupling

Thins “fused” (read dependent on another) together.

Multi-tiered applications tend to be tightly coupled.

Messaging systems offer the chance to be loosely coupled by decoupling the ends of the communication, even offering asynchronous communication.

Types of integration

  • File integration
  • Remote procedure invocation
  • Shared database
  • Messaging
File Integration

File Integration

Export data from one system and import into another using a file system.

Works with different technologies.

Issues of file format, shared file systems, timing.

Question of how data is exported and imported?

  • Timed task
  • User managed
  • Other communication

Automation problems. File locking might be required.

Shared Database

Shared Database
Collection of applications that need to share common data.

Quick access to information for each application.

DBMS removes a lot of issues from file integration.

Tightly coupling the applications to a common area.

Remote Procedure Invocation

RPC

Invoking actions in addition to sharing data, using some form of RPC methodology:

  • Sockets
  • SOAP
  • REST
  • etc.

Again needs a shared understanding of the data. No shared file format.

Timing question of when do we start the transfer.

Communication might fail, does that affect the coupling of the two applications?

Messaging

Messaging

Share data by sending Messages over a Message Channel. Application(s) pick up messages and process the information.

Fire and forget for the sender.

Message channel stores and forward to provide reliable message delivery which mimics the real world.

Message Formats
  • Custom data formats
  • Industry standards for exchange
  • XML

Messaging Systems

Example system using message systems:

Message Channel

Logical pipe between senders and receivers, the implementation of this vary:

  • Direct Connections
  • Central message hub

Each channel has an identity, typically a name.

Different channels exist for different methods of solving problems.

Point to Point

Publish and Subscribe

Messages

A unit of information containing a header and a body.

Different types of message:

  1. Messages to send commands
  2. Messages to send documents
  3. Messages to send events

There are ways to specify how long messages are relevant for

Routing

Sending messages might involve more than just forwarding.

Message Filtering

Appling filters to messages

Content-Based Router

Routing messages to other channels and/or filters based on the content of a message

Splitter and Aggregator
Recipient List
Others

See here

Endpoints

The interfaces to the senders and receivers

  • How do they link to the message channel
  • Polling consumer
  • Event driven consumer
  • Messaging gateway
  • Idempotent receiver
  • Other
Transformation

Isolate different stages and then translate later.

System Management

The challenge for these systems is how do we know what’s going on, given that there may be many different queues in a system.

Object to object communication is easy(ish) to use a debugger or log.

Web systems are a little harder, but possible to debug and monitor.

But how do you monitor message based systems?

Message History

Message History

Wire tap

Wire tap

Message Store

Message Store

Example - Widgets and Gadgets

WGRUS Ecosystem

Infrastructure

WGRUS IT Infrastructure

WGRUS has four different channels to interact with customers:

  1. Web site
  2. Call centre
  3. Fax
  4. Notification by email
Taking orders

Taking Orders From Three Difrerent Channels

Processing Orders

Activity Diagram for Order Processing

Implementing this we gets something like this:

Order Processing Implementation using Asynchronous Messaging

Looking in at the inventory request:

Routing the Inventory Request

Because an order may have many items, split the order up:

Processing Order Items Individually

To know which item is for which type, we need to enrich the data at the order stage:

Taking Orders With Enricher

The high level view of the order process now looks like this:

Revised Order Process Implementation

Checking Status

Need to know the status of the order as it can take some time to process.

Using a message store we can log the system:

Adding a Message Store To Track Order Status

In situations where point-to-point MQs are used a wire tap is needed:

Tracking Messages with a Wire Tap

A process manage can be added to manage the flow of messages in the queues, providing two main bits of functionality:

  1. Storing data between messages
  2. Keeping track of progress and deciding the next step

Processing Orders With a Process Manager

Change of Address

What happens if an address changes?

Including Address Data in the New Order Message

What happen if the address change is not part of an order?

Propagating Address Changes via a Separate Publish-Subscribe Channel

New Catalog

Updating Catalog Data via File Transfer

Announcements

Sending Announcements With a Dynamic Recipient List

Cloud

How can we use the cloud for advanced internet applications?

Overview

What is the cloud?

  • The cloud
  • Cloud computing

Principles of cloud computing

X as a service

Cloud - Computing as a Utility

Largely a new business model.

Just computers, networking and applications, but arranged different; pay for usage rather than buying the infrastructure, etc.

The benefit to business

What problems is the cloud trying to solve?

How can the cloud be used for:

  • Small to medium enterprise (SME) manufacturing business
  • Startup company looking to launch an eCommerce site
  • Company gathering data from devices

5 Aspects of Cloud Computing

  1. Pooled computing resources
  2. Virtualised computing resources
  3. Elastic scaling up or down according to need
  4. Automated creation of new virtual machines or deletion of existing ones
  5. Resources only billed as used

The Cloud at Your Service, Resenberg and Mateos, 2011, Manning Publications

Data Centres

Cloud providers have large data centres at locations around the world.

SaaS: Software as a Service

The web browser is your computer.

(Pay?) for resources used.

Is this different from managed servers?

Is this different from web applications?

Platform as a Service (PaaS)

Infrastructure as a Service (IaaS)

A lower lever set of infrastructure to choose from:

  • Services
  • Data storage
  • Load balancing
  • Content delivery networks
  • Message queues

Examples:

  • Amazon Web Services
  • Windows Azure

Revision

Preparing for the examination in June

2012-13 Paper

1. This question is about integration of enterprise systems and is based on an extension of the fish marketing system developed during the course.

a) Discuss what the terms interoperability and integration mean within the context of enterprise systems

Example extract from 2011-2012 paper

This question is about development technologies and enterprise application patterns applied to the HoBo Room Brokering application, developed during the module.

a. Describe and compare the facilities provided by .NET and by the Java Platform, Enterprise Edition 6 (Java EE 6) for supporting the implementation of each of the following features within the application.

i Registration of human users and authentication and authorisation for use of the various aspects of the application by a range of user types
  • Java
    • Registration
      • Realms (File, JDBC, LDAP, …)
      • Group of users
      • Facilities to manage/persist user data
    • Authorisation
      • Roles
    • Authentication
      • Configuration
      • Web server
  • .NET
    • Wizards/template (not many marks for such a galling answer)
    • ASP.NET Identity System
    • Authentication
      • Web.config (protected)
      • SQLServer (.mdf)
      • ActiveDirectory
      • OAuth
    • Authorisation
      • Roles
ii Concurrent use of the application by many users, each with their own server-side interaction context.
  • Java
    • Session beans
      • Distinguishing between request, session and application
    • Session Management?
  • .NET
    • Session Management
      • Session["myKey"] = ...
      • InProc
      • SQL Server (DB)
      • Session State Server
    • View State
iii Provision to client applications of a SOAP based search service that returns a set of descriptions of establishments with current availability, which meet given criteria.
  • Java
    • Annotations
    • WSDL generation of exposed class
    • Operations through @WebMethod on methods
    • @WebService on class
  • .NET
    • Attributes
    • Operations through [WebMethod...] on methods
    • [WebService...] on class
    • .asmx still exists, but older
    • .WCF Windows Communication Foundation
      • Interface
        • methods
      • Implementing class
      • Address Bound Contract (ABC)
    • WS* specifications
    • Generate proxy from WSDL (client)

Problems of mapping data types between languages.

Public properties/fields exposed for sending over a web service (attributes in .NET)

Raw data only.

Underlying standards and tool support.

SEM5720 - The Internet And How It Really Works

The Internet is a complex, multi-organisation network reaching nearly all parts of the world. The functioning of this network and the applications running upon it depend on a complex set of protocols. This module addresses the fundamental aspects of the most important issues that permit the network and its applications to operate successfully. The module also addresses the current threats to the Internet and topics still emerging from R&D studies around the world.

Postal service analogies: 4

Introduction

This module discusses the detailed underlying operation of the Internet and its constituent components and is an essential topic in its own right as well as providing a solid foundation for much of the other material covered in the MEng.

Learning Outcomes

  1. Participate in planning networks that are cost effective and realistic in terms of products and services currently available.
  2. Critically assess proposed networking solutions.
  3. Assess the effect of likely technological developments on existing network applications.
  4. Make decisions and provide guidance to others in the choice of appropriate communications technologies to deploy, to solve real world requirements.
  5. Demonstrate extensive knowledge of the internal operation of the Internet and its protocols.
  6. Demonstrate an appreciation of the problems that appear in the management of routing and naming in large networks.
  7. Exercise judgment in the choice of appropriate protocols and services to address the real needs of Internet operators and users.
  8. investigate, and propose solutions to problems of quality of service.
  9. Demonstrate an appreciation of the security issues that surround the Internet and its applications and how these can be mitigated.
  10. Explain the need for a new generation of the Internet and describe current progress towards it.

Assessment

  • Assignment 40% ??/??/2013 to ??/??/????
  • Exam 60%

Commitment

20 credits = 200 hours of work.

44 hours of lectures, around 20 hours of practicals.

This leaves about 140 hours of personal study, including extra practical work.

There is an assignment worth 40% of the marks (2000 word report).

Text book study and revision.

2 hour exam.

  • TCP/IP Protocol Suite - Foruzan B., Fegan S.
  • RFC or internet drafts available online from IEFT
  • Other textbooks and journals
  • The Internet Protocol Journal by Cisco Systems

Practicals

Practical work sessions focusing mainly on the electronics and hardware of network issues.

Practical 1

Using the computer connected digital oscilloscopes or picoscopes.

More DNS

nslookup minted.dcs.aber.ac.uk whilst capturing packets on port 53 captured 8 packets.

The type of the message was of type A and of class IN and was a recursive lookup. There was no response because the message was a query.

The server this was sent to was 193.60.11.253. In /etc/resolv.conf/ the nameserver: 127.0.1.1 c57net.aber.ac.uk.

1 question and 3 answers were held in the first answering packet. 4 authoritative records were received specifying the 2 authoritative nameservers from dcs.aber.ac.uk and 2 from yale.ac.uk.

The additional records provided all possible IP addresses of all nameservers contacted during the lookup to cache the information.

nslookup set type=MX aber.ac.uk was then captured. The answer record contained: Mail exchange: aber-ac-uk.mail.protection.outlook.com.

The authorative record contained nameservers for: aber.ac.uk (3) and yale.ac.uk (1).

nslookup set type=SOA dcs.aber.ac.uk was then captured.

The email of the person responsible for this domain is: cs-root.aber.ac.uk.

There are 4 authoritative nameservers for this domain: dcs.aber.ac.uk (2), aber.ac.uk (1) and yale.ac.uk (1).

nslookup aber-ac-uk.mail.protection.outlook.com. The response from this was from microsoft. 2 IP addresses which can switch (load balanced?).

nslookup 193.60.11.36 had a type of PTR for the name 36.11.60.193.in-addr.arpa. Still used the standard nameservers.

nslookup sillyname gave an authoritative and non-authoritative response (one authoritative for sillyname.c57net.dcs.aber.ac.uk and non-authoritative for sillyname). Uses two different sets of nameservers, the Aberystwyth local ones and the root nameserver.

dig @dns0.aber.ac.uk 98.34.124.144.in-addr.arpa PTR shows lots of packets, initially gathering DNS information, then on 144.124.34.98.

dig www.yahoo.com A shows that yahoo uses CNAME instead, with an IP address of 87.248.122.122.

Basic Issues in Data Communication

A revision of the basic issues in data communication.

Protocol Models and Frameworks

In the 1970s there was no master plan, overall structure nor agreements on application protocols.

Proprietary protocols and architectures lead to a large amount of anarchy.

ISO Committees

In 1977 ISO establishes committees and subcommittees and so on and so forth.

Not just ISO doing this, telecommunications (CCITT) also got involved.

OSI Reference Model - IS 7498

Provides a basic framework using a “divide and conquer” principle.

Uses layering to reduce complexity, where each layer handles one (group of) problem(s).

  1. Keep things simple.
  2. Choose boundaries at places that minimise interaction between adjacent layers.
  3. Functions of a different nature or purpose in different layers
  4. Similar function in same layer.
  5. Use all part knowledge and experience.
  6. Hide implementation within layers
  7. Special hardware/processors
  8. Data abstraction levels.
  9. Internal changes do not affect other layers
  10. Only create interfaces to directly surrounding layers (controversial).

OSI Reference Model Layers

  1. Physical - wires, radio frequency
  2. Data Link - direct link from one to another
  3. Network - global issues like addressing
  4. Transport - methods for ensuring quality of service.
  5. Session - availability of resources, “checkpoints”, etc.
  6. Presentation - Language/character set encoding
  7. Application - Not supposed to contain or control whole applications, just the useful parts to applications, e.g. directory service

Exercise

Discuss the statement: The existence of a communications framework like the OSI model promotes competition between companies.

For

  • Companies spend more time actually building products than on firmware to deal with communications.
  • Standardised method of communication allow products to be linked: more scope for different products.

Against

  • No competition to improve the network process.
  • No differentiation.

Local Area Networks

A detailed study of variants of the technologies collectively known as Ethernet.

Ethernet

A passive, contention-based broadcast technology that uses baseband signalling.

Passive

No one device controlling the network.

Contention-based

Each device must compete with every other device for access to the network.

Broadcast

Ever device on a shared network fears broadcast transmissions.

Baseband Signalling

Entire bandwidth of a cable for a single transmission.

802.3 CSMA/CD Bus (Ethernet)

Derived from work by:

  • Xerox
  • Intel
  • Digital (DEC)

Publised in Ethernet (DIS Blue Book) V1.0 September 30th 1980.

802.3 Revision D dated December 1982 publish by IEEE.

Revision F published July 1984 and also issued as draft proposal 8802/3 by ISO.

Operation: Carrier Sense Multiple Access with Collision Detection (CSMA/CD)

Distributed Control

  • Contend for use
  • Acquire access and send a data packet
  • No priority

Simple Algorithm:

1.
if(network active)
  goto 1
transmit(data)
check for collision
if(collision)
  transmit(JAM)
  delay(a random time)
  goto 1
transmission complete

Checking for collision is the difficult part. Can check:

  1. I’m transmitting a 1, is there a 1 on the wire.
  2. Do the signals on the wire look like the output of one station and one station only.

Delay must be random to avoid future collisions and not have priority.

Coaxial cable

Very rigorous specifications. Maximum length of 500m and limits on bend.

These are to stop the electric current being interfered with.

Hard limit on propagation: c (speed of light).

Transceiver (MAU) to link devices to co-ax conveniently.

Terminators absorb the signal at the ends of the cable.

Repeaters repeat the signal onto other cables, repeat a perfect version of the signal (if possible).

Broadcast Networks

If data is inserted into the network, it should reach every member on the network.

Need to know how long it will take for data to get across the network.

Rules tend to be the distance between two extremes in the network is 5 of the co-ax cables and 4 repeaters. 2 of the co-ax cables must be point to point without any devices attached.

Bridges, switches, hubs and routers help build larger networks.

Co-ac Medium Options 802.3

  • Original (10BASE5)
    • 10Mbps
    • Baseband coaxial cable
    • segment length 500m
    • Max 10 taps per segment
    • Max 4 repeaters
  • 10BASE2
    • 10Mbps
    • Thinner coaxial cable
    • 185m segments
    • Max 30 taps per segment.

Slot Time

Worst case:

B stars transmitting when As packet has nearly reached B.

B must corrupt, at least, the last bit of As packet.

Therefore, packet length must be such that time to transmit is greater than twice the transmission delay for longest route in the network.

For a 2.5km network, 10MBit/sec, the slot time is 51.2 micro seconds.

Smallest allowed packet needs to be the number of bits it takes to reach from one end of the network to the other, times 2, plus a little extra for luck.

512 bit packets capture the network for 10BASE5.

“Your bits are now all over the network” - Dave Price, 10/10/2013

Calculatng Slot Time

Slot time s.

Path time p.

s > 2p

  • 3 coax (with stations) = 21.65 bits
  • 2 links (no stations) = 25.65 bits
  • 4 repeater delays = 7.5 bits
  • Transceiver delays = 6.0 bits
  • Transceiver cable delays = 2.57 bits
  • etc.

Total about 499 bits worst case. Specified as 512 bits to give a safety margin.

Network JAMs

  • Repeaters required to propagate JAMs.
  • Must make sure all repeaters see JAMs.
  • Time delay of 500m coax about 2.165 micoseconds
  • 32 bits JAM

MAC Frame Format for 802.3

Number of OctetsField Usage
7Preamble
1Start of frame delimiter
6Destination Address
6Source Address
2Length in 802.3
1500LLC data and padding
4Frame Check Sequence
  • Preamble synchronises.

Ethernet Addresses

48 bits long. Designed to be globally unique.

Globally unique should be completely unique, not always though.

Not expected to be able to be changed, can now be programmed (simply, apparently; nmap --spoof-mac 0)

Frame details

Minimum frame size is 512 bits, not including the preamble.

Maximum frame size is 1518 octets (1500 octets of data).

Above assumes 48 bit addresses which IEEE 302.3 says “shall be used” for 10 Mbps networks.

9.6µs inter-frame gap to allow desynchronisation.

Time for frame is 51.2µs plus 6.4µs for preamble.

Collision Backoff and Retransmission

Transmitter tries to send for a maximum of 16 attempts to send a frame.

The transmitter waits for an integer multiple slot times determined by the following algorithm prior to each retry.

Rescheduling of each transmission uses “truncated binary exponential backoff”

This ensures a random but increasing delay if many collisions occur:

DelaySlots = rand[0 <= r < 2^k]

Where k = min(n, 10) for the nth retansmission

Probabilistic Characteristics of CSMA/CD

All access to the 802.3 LAN only completes with some probability. It is thus impossible to guarantee transfer rates. When the net is very busy collisions might go on forever.

Debates often take place on the usability of CSMA/CD LANs for real time use. The answer depends on the true use intended rather than the LAN.

Bridges

Partition LAN to segregate load, add reliability, add security.

Combine remote LAN segments into a single logical network.

Combine separately developed and controlled LANs.

IEEE 802 LANs often include bridges.

Repeaters clean and forward all data.

Bridges selectively forward data. It will store and forward complete packets.

Forwarding based on header information,

Sometimes known as MAC level relay.

Source Routing Bridges

Hosts discover the route to each other host and are very aware of the presence of multiple LANs couple by bridges.

Needs identifiers for bridges and LANs.

Data is transmitted with routing attached.

Bridges obey the routing.

Transparent Bridges

Bridge learns (or is told) the LAN on which each address exists. Hosts need not to know anything about the location of other hosts and indeed are not even aware of the presence of the bridges.

Frames which arrive are handled in one of 3 different ways:

  1. Same LAN
  2. Different LAN
  3. Unknown destination.
  4. Broadcast address.
Same LAN

If destination address on same LAN as source address then discard the packet

Differnet LAN

If destination address on different LAN, forward packet.

Unknown Destination

If location of destination address is not known then ‘flood’.

Broadcast Address

‘flood’.

Flood

Forward the packet to every other network attached to the bridge.

Address Learning

Bridges listen to the traffic and can learn where machines where by looking at the source addresses in the packets.

Address Tables

Need to be quite large depending on the network.

Generally there is a time-out associated with the addresses known.

Bridge Cycles

Address Learning works well if there are no alternative routes in the internetwork connections (i.e. a tree structure).

However there often are alternative routes, so bridges might cause loops.

Solution to Bridge Cycles

Need a protocol to avoid cycles.

Result for graph theory states:

For any connected graph, there is a spanning tree of edges which maintains the connectivity but contains no closed loop.

Each LAN represents a graph node and each bridge corresponds to an edge.

Spanning Tree
  1. Bridges have numbers
  2. Broadcast number every few seconds
  3. One bridge becomes the root bridge
  4. Bridges discover route to root via root port.
  5. Routes might have costs.
  6. A designated bridge is determined for each LAN (minimum cost of path to root).
  7. Only the designated bridge can forward to and from its LAN.
  8. Bridges communicate with a Bridge Protocol Data Unit (BPDU) consisting of:
    • The originating bridge number
    • The number of the bridge thought to be the root
    • The path cost to root.

Initially every bridge thinks it is the root and it broadcasts a BPDU to assert this fact.

If a bridge gets a BPDU indicated a ‘superior’ bridge exists it assigns its root port and the path cost to root.

If a bridge gets a BPDU from a bridge with a shorter root path it releases any claim to be the designated bridge for the segment.

The lowest numbered bridge becomes the root.

Bridge ports which are not root or designated ports are blocks.

Local Bridge

Connect two (or more) adjacent LANs. Throughput is likely to be high. Hosts not likely to notice much performance degradation unless waiting for each packet to be acknowledged.

Remote Bridge

Connect two (or more) LANs which are widely separated. Bridge consists of two ‘half bridges’ connected by a WAN type link. Link typically 64Kbps to 2Mbps.

Managed Bridges

Bridges often available in a managed form. These are managed from a station and can load the bridges with lots of things.

Lots of monitoring, etc.

Another protocol needed and this needs to be standardised.

Other Network Technologies

A brief look at fast and wireless network technologies.

Alternative LAN Technologies

  • 802.4 - Token Bus
  • 802.5 - Token Ring

Standards differ at the physical layer

Differ at MAC sublayer of Data Link Layer.

Compatible at the LLC sublayer of the Data Link Layer.

Token Bus

Uses a bus physical topology and ring logical topology. The physical order of nodes does not matter.

Coaxial cable served as a common communication bus.

Token was created bus the token bus protocol to manage access to the bus

Any station that holds the token packet has permission to transmit data.

The station releases the token when it is done communication or when a higher priority device needs to transmit.

Two major problems:

  1. Bus failure leads to a network failure.
  2. Adding a new node is not simple.

Token Ring

Similar to token bus, but in a ring physical ring topology.

Messages are transferred in one direction along the ring at all times.

Token Ring networks sequentially pass a token to each connected device.

When the token arrives at a particular node, the recipient is allowed to transmit data onto the network.

Since only one device may be transmitting at any given time, no data collisions occur.

Access to the network is guaranteed

Time-sensitive applications can be supported.

Still used for some real-time applications.

Based initially for the mainframe infrastructure.

Problems:

  1. Costly
  2. Complex.

Quick notes on Ethernet

A quick summary of 802.3

Why Ethernet is used

  • Inexpensive
  • Easy to install, maintain, troubleshoot and expand
  • Widely accepted industry standard.
  • Structured to allow compatibility with Network Operating Systems.
  • Reliable

Ethernet Versions

  1. Ethernet 10 BASE-T/F 802.3 (10 Mbps)
  2. Fast Ethernet 100 BASE-T/F 802.3u (100 Mbps)
  3. Gigabit Ethernet 1000 BASE-T/F 802.3z (1000 Mbps)
  4. 10 Gigabit Ethernet 10 GBASE-F 802.3ae (10 Gbps)

Coding Convention

  • Network speed
  • Baseband Signalling
  • 2 or 5 stands for coaxial cable and length
  • T stands for Twisted Pair
  • F/FL/S/L Stands for Fiber-Optic
  • X stands for full duplex.

Ethernet Components

  • CSMA/CD (No protocol for 10 GigE) as 10 GigE is completely full duplex so it isn’t needed.
  • Baseband Signalling.
  • Ethernet cables.
  • Ethernet card or adapter.
  • Ethernet devices.

Ethernet Cables: Twisted Pair (CAT5/CAT5e)

  • CAT5: Ethernet and Fast Ethernet.
  • CAT5e: Gigabit Ethernet.

Contains four pairs of copper wire.

Cable runs are limited to a maximum run length of 100m.

Operates at 100MHz.

CAT5 comes in two main varieties, solid and stranded.

Fast Ethernet communications only utilise 2 out of 4 pairs.

CAT5 enhanced (CAT5e) uses all 4 pairs to support GigE over short distances.

CAT5e is backwards compatible with CAT5.

CAT6 (250MHz) and CAT7 (600MHz).

Signalling

Baseband is a single transmission involved (digital signal)

Broadband can handle multiple transmissions with a modulated analogue signal.

Baseband Signalling

Network uses all available signal frequencies or the entire bandwidth.

One signal can be transmitted at a time.

Light or electrical pulses based transmission.

Related to digital technologies, but can be used for analogue technologies.

Bi-directional communication (Half duplex) is possible, but difficult.

Multiplexing (using Time Division Multiplexing) allows transfer of multiple signals on a single medium.

Broadband Signalling

Network uses only one frequency or a part of the entire bandwidth.

Multiple signals can be transmitted at a time.

Electromagnetic or optical waves transmission.

Related to analogue technologies but can be used for digital technologies.

Bi-directional communication is possible and fairly easy.

Multiplexing (using Frequency Division Multiplexing) allows transfers of multiple signals on a single medium.

Multiplexing and Demultiplexing

Putting more than one signal on a wire at once and getting it back again at the other end.

Time Division Multiplexing

Gives the illusion of multiplexing.

Each signal is allowed a slot of time to transmit for.

Frequency Division Multiplexing

Actual multiplexing.

Each signal uses a single frequency channel to transmit.

Fibre Optic

Supports all Ethernet versions above 10BASE5.

Carries high-bandwidth data through beams of light carrying electromagnetic signals. Not so affected by noise. Can carry signals up to approximately 70 miles without amplification.

Light can be pulsed in a single wavelength to represent 45 Gigabits of information.

Single-Mode Fibre Optic

Transmits one signal per fibre.

Diameter: 8.3 to 10 microns that has one mode of transmission.

Speed and distance: up to 40Gbps over up to 100km

Used in telephones and cable TV applications.

Multi-Mode Fibre Optic

Transmits many signals per fibre.

Multi-modes result from the fact that light will only propergate in the fibre core at discrete angles within the cone of acceptance.

Diameter: 50 to 100 microns.

Speed and distance: 10Mbps to 1 Gbps over 275m to 2km.

Used for Computer Networks and LAN Applications.

Network Adapters

Adapters for connecting to ethernets.

Ethernet cards, etc.

10/100 Ethernet

Connect computer using a PCI, PCIe or ISA moethboard interface slot, CAT5/6

Gigabit Ethernet

PCIe slots, CAT5/5e/6/7 and Fibre Optic cables.

Fibre Optics

For tier 1 and 2 internet backbones, External device, Fibre Optic cables.

Slightly different, can use not only for network but for the backbone (a sophisticated network).

Wireless Network Interface Cards

Internal or external device

Wireless Dongles

External device

Ethernet Devices

Note that most devices are completely tied to a specific function, some repeaters might have some Data Link Layer functions depending on the manufacturer.

Repeater

Repeat an electrical signal including noise.

Takes an incoming signal and then generates a new, clean copy of that exact signal.

Overcomes signal attenuation problem.

Allows LANs to extend beyond normal distance limitations.

Physical layer device.

Doesn’t read data frames, just repeat the signal unintelligently.

Hub

Multiport repeater.

Broadcasts frame to all ports and devices.

Allows users to share Ethernet for transmission of data onto a single network (collision domain).

Physical Layer device.

Doesn’t read data frames.

Repeater is usually used for the extension of the length of a network, while the hub is a simple connectivity gadget that is used to broaden a network.

Bridge

Connect two LAN segments of similar or dissimilar types such as Ethernet and Token Ring.

Split a networking into separate collision domains.

Map the Ethernet addresses of the nodes residing on each network segment and allow only the necessary traffic to pass through the bridge.

Do not forward bad or misaligned packets.

Works on the Physical and lower Data Link (MAC) Layers

Switch

Expansion of the Bridge.

Used in heavily loaded networks to isolate data flow and improve performance (parallelism).

Up to on-half of the computers connected to a switch can send data at the same time.

Data Link and Network Layer device.

Cut-through Switches

Are faster because they examined the packet destination address only before forwarding it on to its destination segment.

Store-and-forward Switches

Work like bridges in that they accept and analyse the packet before forwarding it to its destination.

Takes more time to examine the whole packet, although it does allow the switch to catch certain packet errors and keep them from propagating through the network.

Ethernet Encoding

Methods for encoding bits in different Ethernet technologies.

Manchester Coding

Used in 10-Base*

Bipolar scheme, synchronisation via a clock.

Fast Ethernet 802.3u

CSMA/CD and Full Duplex

Backwards compatible.

Physical layer structure (PHY) has been redesigned.

Complex signal encoding mechanisms than the simple Manchester code.

Uses three sublayers.

Convergence Sublayer

Introduces the concept of auto-negotiation; negotiates between two different medium to make the properties similar to for communication

Media Independent Interface

Interfaces the physical medium used so that it can be handled generically.

Media Dependent Sublayer

Deals with encoding, transmission in a way specific to the medium.

Gigabit Ethernet 802.3z

Two distinctive approaches for medium access:

  1. Half Duplex
  2. Full Duplex

Most follow the full-duplex approach.

Star topology.

Support standard Ethernet frame format.

Uses 802.3x flow control.

Backwards compatible.

Physical layer is more complex

Gigabit Media Independent Interface

Physical Layer

Media Dependent Interface

Medium

Encoding

Digital to digital encoding schemes.

Unipolar Encoding

Only use one pole (+ or -). One pole is used to represent 1 or 0, zero is used to represent the other.

Two problems:

  1. DC Component
  2. Synchronisation

DC cannot be transmitted properly through an AC line without noise.

The receiver cannot synchronise its clock to the sender.

Polar Encoding

Uses both poles to represent bits.

Non-Return to Zero (NRZ)
NRZ-Level

1 as positive voltage, 0 as negative voltage.

NRZ-Inverse

Voltage inverts on a 1.

Return to Zero (RZ)
  • Positive to 0 is 1.
  • Negative to 0 is 0

Biphase
Manchester

Differential Manchester

Bipolar Encoding

Alternating Mark Inversion
Bipolar 8 Zero Subsitution
High Density Bipolar 3

Broadband

Digital Subscriber Line

Uses local telephone lines.

Allows simultaneous void and data transmission.

Asymmetric Digital Subscriber Line (ADSL)

Designed to deliver more bandwidth down than up.

Rates range from 1.5 to 9 Mbps down, 16 to 640kbps up.

Distances of 18,000 feet over a single copper twisted pair.

A splitter is needed.

Very high bit-rate Digital Subscriber Line (VDSL)

VDSL is designed to provide higher bandwidths.

Up to 52Mbps down, 16Mbps up.

VDSL 2

Support the wide deployment og triple play services.

100 Mbps down, 20 Mbps up

150 to 500m distance.

500 Mbps transmission rates over copper cabling by using crosstalk cancellation or “vectorized” VDSL2 based modems.

BT Infinity

Now rolling out ultrafast fibre optic broadband with speeds of up to 100Mb.

Next step up from FTCC is FTTh (Fibre to the Home).

Negative factor is the costs.

Infinity-1

Top speeds are up to 40Mbps down and 10Mbps up

Infinity-2

Topspeeds are up to 76Mbps down and 19Mbps up.

Cable

Wireless LAN (802.11)

Station (STA)

Access Point (AP)

Access points act as a central transmitter and receiver of WLAN radio signals.

Basic Service Set (BSS)

A set of STAs that communicate with each other.

When two or more stations communicate together they form a BSS.

Infrastructure Mode

All wireless clients are connected to an AP.

Generally the default mode for the 802.11b cards.

All mobile STAs communicate with each other via the AP.

Network consumes double bandwidth for one communication.

AP provide buffering of traffic for a mobile broadcast.

Ad-hoc Mode

Wireless clients are connected without an AP.

An ad-hoc network is where stations only communicate peer-to-peer (P2P).

There is no base and no one gives permission to talk.

Mostly these networks are set up spontaneously.

Distributed System (DS)

Two or more BSSs are interconnected using a distribution system.

APs communicate via the DS.

The DS is the backbone of the WLAN and may be constructed of either wired or wireless networks.

The DS is also a thin layer in each AP.

That determines if communications received from the BSS are to be relayed back to a destination in the BSS, forwarded on to the DS to another AP or sent into the wired network infrastructure to a destination not in the ESS

Extended Service Set (ESS)

An ESS is a BSS where APs communicate amongst themselves from one BSS to another..

Entire network looks like an interdependent BSS to the Logical Link Control (LLC).

This means stations can communicate or even move between BSSs transparently to the LLC.

Portal

The logical point at which medium access control (MAC) service data units (MSDUs) from a non-IEEE 802.11 local are network (LAN) enter the distribution system (DS) of an extended service set (ESS).

WLAN Frame

MSDU and AMSDU

Multiple MSDUs are aggregated at the MAC layer and are pushed into a single MPDU.

They have a single frame header with multiple frames and they are destined for the same client and the same service class.

Mac Service Data Unit (MSDU)

It is the MAC service data unit. This is the unit o transmission used at the MAC layer which is received from the upper layer.

Aggregated Mac Service Data Unit

Aggregation of the MSDU directly performed at the MAC layer is called AMSDU.

Such AMSDUs are now passed to the lower PHY later where they are dealt with as MPDUs

MPDU and AMPDU

Mac Protocol Data Unit (MPDU)

These are the frames passed from the MAC layers into the PHY later.

Aggregated Mac Protocol Data Unit

These are the aggregated MPDU units which are pushed into a single Physical Protocol Data Unit (PPDU).

DS Services

While the implementation for the DS is not specified, 802.11 does specify the services which the DS must support.

Station Mobility
No Transition

If a station is not moving within its own BSS or it is not moving

BSS Transition

If a station moves between BSSs within the same ESS

ESS Transition

If the station moves between BSSs of differing ESS.

Station Services (SS)
Authentication

With a wireless system, the medium is not exactly bounded as with a wired system.

In order to control access to the network, stations must first establish their identity.

The authentication relationship may be between two stations inside an IBSS or to the AP of the BSS.

Authentication outside of the BSS does not take place.

Two types of authentication:

  1. Open System Authentication
  2. Shared Key Authentication
Deauthentication

When either the station or AP wishes to terminate a stations authentication.

Privacy

An encryption algorithm, which is used so that other 802.11 users cannot eavesdrop on your LAN traffic.

Distribution Station Services (DSS)
Association

A station must affiliate itself with the BSS infrastructure if it wants to use the LAN.

This is done by Associating itself with an AP. Associations are dynamic in nature because stations can move, turn on or turn off.

A station can only be associated with one AP.

This ensures that the DS always knowns where the station is.

Association supports no-transition mobility, but is not enough to support BSS transition.

Reassociation

The service allows the station to switch its association from one AP to another.

Both association and reassociation are initiated by the station.

Disassociation

When the association between the station and the AP is terminated.

Can be initiated by either party.

A disassociated station cannot send or receive data.

ESS transition is not supported.

A station can move to a new ESS but will have to reinitiate connections

Distribution

Getting data from the sender to the receiver.

The message is sent to the local AP (input AP) then distributed via the DS to the AP (output AP) that the recipient is associated with.

If the sender and receiver are on the same BSS the input and output APs are the same.

The distribution service is not logically invoked whether the data is going through the DS or not.

Integration

Where the output AP is a portal. Thus 802.X LANs are integrated into the 802.11 DS.

Physical Layer

Spread Spectrum

The three physical layers originally defined in 802.11 included two spread-spectrum radio techniques and a diffuse infrared specification.

Spread spectrum is a communication technique that spreads a narrowband communication signal over a wide range of frequencies for transmission then de-spreads it into the original data bandwidth at the receive.

Spread spectrum increases the bandwidth of the signal compared to narrow band by spreading the signal.

Techniques

The Code of Federal Regulations (CFR) Part 15 originally only described two spread spectrum techniques to be used in the licensed free Industrial, Scientific, Medical (ISM) band, 2.4 GHz, thus 802.11 and 802.11b.

  • Frequency Hopping Spread Spectrum (FHSS)
  • Direct Sequence spread Spectrum (DSSS)

Orthogonal Frequency Division Multiplexing (OFDM) was not covered by the CFR and would have required licensing.

802.11a, employing OFDM, was created to work in the 5GHz.

Frequency Hopping Spread Spectrum (FHSS)

FHSS spreads the signal by hopping from one frequency to another across a bandwidth of 83 MHz.

The data is spread over 83 MHz in the 2.4 GHz ISM band.

A short burst of data is sent on one frequency (usually less than half a second).

Then the sender changes to another pseudorandom frequency and broadcasts another burst of data before changing to another frequency, and so on.

The carrier then repeats this pattern.

Direct Sequence Spread Spectrum (DSSS)

DSSS spreads the signal by adding redundant bits to the signal prior to transmission.

The signal is divided into many different parts and sent on different frequencies simultaneously.
Spectrum is divided into 13/14 channels.

However, the FCC specifies only 11 channels for non-licensed (ISM band) use in the US.

Narrowband and Wideband

In communications, band is referred to as the range of frequencies (bandwidth) used in the channel.

Depending on the size of the band (in terms of kHz, MHz or GHz) and some other properties of the communication channel, they can be categorized as narrowband and wideband etc.

Narrowband uses a smaller frequency range (bandwidth)- 300–3400 Hz.

Wideband is a relative term, and the size of the band may be in kHz, MHz or GHz depending on the application.

Narrow Band vs. Spread Spectrum
Narrow Band

Uses only enough frequency spectrum to carry the signal

  • High peak power
  • Easily jammed
  • Easy to detect
  • Easy to intercept
Spread Spectrum

The bandwidth is much wider than required to send to the signal

  • Low peak power
  • Difficult to jam
  • Hard to detect
  • Hard to intercept

Wireless Mobile Communication/Cellular Telephony.

Utilises lower frequency radio spectrum.

Mobile RF Spectrum

Typically delivered over a wide range of radio frequency spectrum bands (e.g. 900MHz, 1800MHz, 2.6GHz, etc.).

Most of which can also reach indoors to some degree.

Architecture of the GSM Network

Base Station Subsystem (BSS)

Composed of one or more BSC.

BSS is the point where all radio transmission related functions are performed.

BTS and BSC connected through the Abis interface.

BSS connected to the MSC through the A interface.

Base Station Controller (BSC)

The management of several BTS is done by the BSC.

It also provides all the control functions and physical links amoung the different BTS and between the mobile switching centre (MSC) and the BTSs.

Being a high-capacity switch, it provides functions such as cell configuration data, control of radio frequency power levels in BTS, frequency hopping, and handovers.

One MSC serves a number of BSCs.

Base Transceiver Station (BTS)

It is a station or site where antennas and radio transmitters and receivers are placed to create a radio coverage area in the mobile network.

Contains one or more transceivers (TRC) and antennas.

The cell site has a 360 degree omni-directional (omni-sector) antenna that is turned to create a cellular area of a specific size.

Omni-sector means the same frequencies are used in all directions.

Communication from the mobile terminal to the cell site is referred to as uplink.

Cell site to mobile terminal is downlink

Mobile Station (MS)

Mobile phone with ME and SIM.

It consists of the mobile equipment (uniquely identified by International Mobile Equipment Identity (IMEI)

The SIM card contains the International Mobile Subscriber Identity (IMSI) used to identify the subscriber to the system, a secret key for authentication and other information.

The IMEI and the IMSI are independent, thereby allowing personal mobility.

The SIM card may be protected against unauthorised by a password or PIN.

Network Switching Subsystem

Contains the network elements MSC, LR, VLR, EIR, AuC and GMSC.

Mobile Switching Centre (MSC)

Primary service delivery node for GSM and central components of NSS.

It provides all the functionality needed to handle a mobile subscriber, such as:

  • Registraion
  • Authentication
  • Sets up and releases the end-to-end connection
  • Location Updating
  • Handovers
  • Cell routing and roaming subscribers

HLR and VLR, together with the MSC, provide the call-routing and roaming capabilities of GSM.

It takes care of charging and real time pre-paid account monitoring.

These services are provided in conjunction with several functional entities, which together form the Network Switching Subsystem (NSS).

The MSC provides the connection to the fixed networks (such as the PSTN or ISDN).

Home Location Register (HLR)

The database that contains a subscription record for each subscriber of the GSM network.

All the administrative information related to each subscriber registered in the respective communication network, including the current location of the subscriber, is contained in the HLR.

The HLR is responsible for the sending of subscription data to the VLR (during registration)
or GMSC (during mobile terminating call handling).

The location of the mobile is typically in the form of the signalling address of the VLR associated with the mobile station.

A GSM subscriber is normally associated with one particular HLR.

There is logically one HLR per GSM network, although it may be implemented as a distributed database.

Visitor Location Register (VLR)

The database that contains subscriber data for subscribers registered in a MSC.

It contains all the temporary information about the subscribers.

This information is needed by the MSC to service the visiting subscribers.

Every MSC contains a VLR. Although MSC and VLR are individually addressable.

They are always contained in one integrated node.

Equipment Identity Register (EIR)

A database of all valid mobile equipment on the network.

Where each mobile station is identified by its International Mobile Equipment Identity (IMEI).

An IMEI is marked as invalid if it has been reported stolen or is not type approved.

Authentication Centre (AuC)

A protected database that stores a copy of the secret key stored in each subscriber’s SIM card which is used for authentication and encryption.

Gateway MSC (GMSC)

Switching entity that controls mobile terminating calls.

When a call is estabished towards a GSM subscriber, a GMSC contacts the HLR of that subscriber, to obtain the address of the MSC where that subscriber is currently registered.

That MSC address is used to route the call to that subsciber

GSM Interfaces
A Interface

The connection between MSC and BSC.

Abis Interface

The connection between BSC and BTS.

D interface

The connection between MSC and HLR.

Um interface

The radio connection between MS and BTS.

Cell

Cellular telephony derives its name from the partition of geographical area into small cells.

A cell is roughly circular (exactly hexagonal) area with a central transmitter and receiver base station.

The size and shape of each cell is determined by the features of the surrounding area, such as buildings, trees and hills, which can block signals.

In a city, there are many small cells, while rural areas may have very large cells.

Sector

Site coverage (cell) is partitioned into different directions called sectors.

Each cell is usually split into tree sectors. Which overlap with other sectors of neighbouring cells so network is uninterrupted.

Switched Communications Networks

Long distance transmission between stations (called end devices) is typically done over a network of switching nodes.

Switching nodes do not concern with content of data.

Their purpose is to provide a switching facility that will move the data from node to node until they reach their destination.

A collection of nodes and connections forms a communications network.

In a switched communications network, data entering the network from a station is routed to the destination by being switched from node to node.

Circuit Switching Network

A dedicated radio channel is allocated to a single transmission

As long as data transmissions are long and continuous (file transfers) a circuit is used efficiently.

However, most data transmissions are bursty, and dedicating an entire circuit to them is usually a waste of valuable wireless bandwidth.

During idle periods when no data is being sent, bandwidth is still dedicated to the user and not available for others to use.

Packet Switching Network

An entire network may be designed just for packet data.

Packets do not have a dedicated path, it is decided by the routers.

Most old wireless data systems offered minimal data rates, usually in the 10Kbps range. However new wireless protocols bond multiple channels to increase data rates.

Multiple Access (Multiplexing) Protocol

Mobile development is facing the major problem to find protocols that could be used to maximise bandwidth efficiently.

Multiple access enables …

Frequency Division Multiple Access (FDMA)
Time Division Multiple Access (TDMA)
Code Division Multiple Access (CDMA)

Each user is assigned a different psuedorandom binary sequence that modulates the carrier, spreading the spectrum of the waveform, giving each user a unique code pattern.

Cellular Generations

1G
2G
2.5G
2.75G
3G
3.5G
3.75G/Pre-4G
4G

Standards

The ISO OSI model.

IEEE 802 Standards

IEEE 802

Encapsulation at each level.

IEEE 802 Standards

More hardware-based layers.

Data Link layer could be divided into two sub-layers:

  1. Logical Link Control (LLC)
  2. Media Access Control (MAC)

LLC is more software orientated and interacts with higher layers.

MAC is more hardware orientated and interacts with lower layer.

(R)ARP protocol links hardware addresses to logical addresses at the Network layer.

Physical Layer

Defines:

  • Physical and electrical properties of the media
  • Mechanical properties of the connectors
  • Bit representation by the signals (encoding)
  • Transmission rate
  • Physical topology
  • Transmission mode

Defines:

  • Encoding bits into packets prior to transmission
  • Decoding the packets back into bits at the destination
  • Flow control
  • Error control (Automatic repeat request (ARQ))
  • Access control
  • Hardware addressing
  • Defining physical layer standards
  • Frames the Network layer packet
  • Identifies the Network layer protocol.

Provides multiplexing and demultiplexing to and from the network layer.

Manages flow control and error control.

Media Access Control
  • Addresses the frame
  • Marks the beginning and ending of the frame

Managed access control.

Related to the MAC address.

LAN Technologies

  • Ethernet
  • Fast Ethernet
  • Gig Ethernet
  • 10 Gig Ethernet
  • WiFi
  • FDDI
  • Token Ring
  • ATM LANE

WAN Technologies

  • X.25
  • Frame Relay
  • T-carrier
  • ISDN
  • DSL Technologies
  • SONET/SDH
  • ATM

OSI Layers

Protocol Data Unit (PDU) is a term to define any one of the following:

  • Data (at Application, Presentation and Session layers)
  • Segment (at Transport layer)
  • Packet (at Network layer)
  • Frame (at Data Link Layer)
  • Bits (at Physical Layer)

Each layer, with the exception of the physical layer, adds its own header. This is done because each layer has its own specific function.

Unicast Network Level Protocols

Unicast Network Level Protocols in use in today’s Internet. Including further study of protocols such as IPv4, ICMP, ARP, RARP used in unicast applications and IPv4 and IGMP used in multicast applications.

Internet Protocol version 4 (IPv4)

IP Headers

Organised in octlets as bytes didn’t used to be just 8 bits long.

IPv4 designed for 32 bits.

IPv4 Header from wikipedia

Version indicates the version of the IP protocol

Time to live stops network loops.

Header checksum to check the integrity of the header but not the data. Very basic checksum which is used by the router to ensure the packet has not be damaged.

This does add significant load to the router, not only this, but due to TTL it has to create a new checkum without much gain. There are better checksums to be used and they do not appear in IPv6.

The IP Address

Tied to an interface, not an actual machine.

Common to have multiple interfaces (eth0, wlan0, etc.)

Can have multiple addresses on a single interface.

Classes of Network

5 classes.

  • Class A - 0, 7 bits netid, 24 bits hostid many host.
  • Class B - 10, 14 bits netid, 16 bits hostid
  • Class C - 110, 21 bits netid, 8 bits hostid many network.
  • Class D
  • Class E - intended for future use

Class Es will never be used because of Microsoft and lack of value.

Special Addresses

  • X.X.255.255 = Broadcast Address
  • X.X.0.0 = Network
  • 255.255.255.255 = Limited broadcast address, used in auto-configuration (e.g. DHCP)

Network masks

IF you own the address:

10.0.0.0

But want to share the address so that:

10.1.0.0 and 10.2.0.0 are different networks.

10.0.0.0 is a class A address. therefore is is actually: 10.0.0.0/8.

So if we change the networks to: 10.1.0.0/16 the network id will be expanded to the first 16 bits.

Loopback Interface

Most TCP implementations have a loopback interface with the IP address 127.0.0.1 (127.X) and name localhost (or anything else in /etc/hosts which specifies 127.X as an address).

The localhost behaves as a separate data link interface,

A packet that is sent to the loopback interface moves down the protocol stack and is returned back by the driver software for the localhost “device”.

Used for debugging.

Loopback is done in software, never appears on the network.

Subnets: An example

Take the university address:

144.124.76.0

Say we want a network per department, can’t use a 24 bit address as that doesn’t leave enough addresses.

So they can use a 22 bit network id.

144.124.76.30/22 or 144.124.76.30/255.255.252.0

Apply the mask 255.255.252.0 to the IP address to get the host id. Invert for the host address.

Network address is: 144.124.76.0

But what’s the broadcast address?

144.124.79.255

(Look at the bits for this)

Here, 255.255.252.0 is the subnet mask.

Used to be allowed to have subnet masks like 255.0.255.255 as it doesn’t add anything.

Classless Interdomain Routing (CIDR) - Supernetting

Treat two contiguous class C networks as a single network.

This eases routing (supernets). For example supernet on geographic locations to make routing tables easier at a router level.

Usual class C is /24, we make them a /23 if we have to contiguous networks

This is the answer to the 192 soup.

Non-contiguous versions of this are disallowed. Must choose numbers for which the maths works.

Private Address Space

There are some special IP addresses

  • e.g. loopback

RFC 1918 written when started IP addresses to reserve certain IP addresses:

These can be used for private addressing schemes, but not routable on the internet.

  • 10.0.0.0 - 10.255.255.255 (10/8)
  • 172.16.0.0 - 172.31.255.255 (172.16/12)
  • 192.168.0.0 - 192.168.255.255 (192.168/16)

Network Address Translation

Router has a pool of public IP addresses, when a private IP address attempts to access an external resource. The router maps the private IP address to a public IP address which accesses the resource and returns. The public IP address is then translated back to the private IP address and routed.

In industry the pool may be a class C network. In home networks there is only a pool of one. They also do port address translation.

Address Resolution Protocol (ARP)

IPv4 has the problem that we know our IP address and the ones which we want to talk to, either router or machines on the link.

Very first header is the Ethernet header, need this because every machine would need to look at the IP header, which is a process which needs to be done in software. Other network protocols exist too.

Ethernet headers can be processed in hardware.

Ethernet headers don’t contain much, but the machines can automate the process of deciding to ignore the packet in firmware not by the CPU.

There is a problem of mapping IPv4 address to Ethernet address. There’s no direct link between IP addresses and Ethernet addresses so we need more at the data link layer.

ARP Packets

ARP Packet

ARP is cached for a limited amount of time.

If no reply, retransmit after stand-off.

Hardware Type

Which type of hardware are we using

Protocol Type

ARP can be used with other protocols (other than IP).

Hardware Size & Protocol Size

Size of the addresses (hardware and protocol).

Doesn’t map just MAC addresses, etc.

Sender Hardware/IP Address

Hardware address is repeated as the Ethernet header is harder to access in software.

Operation

Code to differentiate request/reply.

  1. ARP Request
  2. ARP Reply
  3. RARP Request
  4. RARP Reply

RARP - Reverse ARP (looking up IP from MAC). Can be used to discover the machine’s own IP address, for example.

Used before DHCP.

Target Hardware Address

If unknown to the sender, filled with 0s, otherwise the actual address

Target IP Address

The address being looked for

ARP Delay

First packet:

  1. ARP Request.
  2. Processing Time
  3. ARP Reply
  4. ICMP Echo Request
  5. ICMP Echo Reply

Following packets:

  1. ICMP Echo Request
  2. ICMP Echo Reply

Fooling ARP

Locate an interface in promiscuous mode, ARP request not broadcast, a dummy address is used.

Some TCP/IP stacks pass the ARP request up the line.

A reply indicates a machine is in promiscuous mode.

neped

Gratuitous ARP

A host sends a request for its own MAC.

The sender’s IP and MAC address are broadcast, and other hosts will insert this mapping into their ARP tables.

Useful to detect duplicate IP addresses (should be no reply).

Causes other hosts to update their ARP cache (useful if the network adapter has been changed or if this is a hot spare).

Why switched networks are not safe

Man-in-the-Middle attacks.

ARP allows for MITMA as the IP and MAC are learned by the switch. Sending out a false ARP reply last will allow you to to spoof the IP address to your MAC address.

Proxy ARP

Same problems with MITMA

MAC Addresses (IEEE 802)

MAC Address

  • Gotcha: each Octet of MAC frame transmitted low order first (back-to-front).
  • I/G - Individual (0), Group (1) - Unicast or Multicast
  • U/L - Universal (0), Local (1)

Internet Control Message Protocol (ICMP)

The Internet Control Message Protocol (ICMP) is the protocol used for error and control messages in the Internet

ICMP provides an error reporting mechanism of routers to the sources.

All ICMP packets are encapsulated as IP datagrams.

ICMP Message Types

ICMP messages are either query messages or error messages.

  • ICMP query messages:
    • Echo request / Echo reply
    • Router advertisement / Router solicitation
    • Timestamp request / Timestamp reply
    • Address mask request / Address mask reply
  • ICMP error messages:
    • Host unreachable
    • Source quench
    • Time exceeded
    • Parameter problem

Message Codes

Specific to message type.

E.g. for message type 3 (destination unreachable):

  • 0 = network unreachable
  • 1 = host unreachable
  • 3 = port unreachable
  • 4 = fragmentation needed but don’t fragment bit is set
  • Etc

ICMP Error Codes

Each ICMP error message contains the header and at least the first 8 bytes of the IP datagram payload that triggered the error message.

To avoid too many ICMP messages, ICMP error messages are not sent

  • for multiple fragments of the same IP datagrams
  • in response to an error message
  • in response to a broadcast packet

ICMP Codes

Echo Request/Reply Message Format

Identifier is set to process the ID of querying process.

Sequence numbers are created for each new echo request.

Port unreachable error

Reply telling that a port is not available.

Network scanners may make use of this, e.g. NMAP.

Many hosts will not reply.

Traceroute

Sends out ICMP ping messages with increasing TTL starting at 1. For each host that isn’t the target, an ICMP time exceeded message is sent back.

On UNIX uses the port unreachable message instead (using a UDP packet) by default.

Fragmentation

If a packet needs to be split then a flag is set to say “more fragments”. Identification is the same through the fragments. More fragments flag is not set for the last fragment, but is known to be a fragment due to the fragmentation offset.

If a part of a fragment is missing, after an amount of time packet loss is assumed.

Occurs if MTU < datagram size to be sent out on an interface, “Don’t fragment” flag allowed: causes ICMP “Destination unreachable: fragmentation needed but don’t fragment bit set”.

Reassembly takes place at destination.

Fragments may be fragmented.

Experience shows that fragmentation is best avoided.

Avoiding Fragmentation

  • Never send a datagram which is greater than the minimum MTU maximum size it can be sent wholly.
  • Path MTU discovery to make sure you never send a datagram which is grater than the smallest MTU maximum size.

Path MTU Discovery

Intermediate routers may have lower MTUs.

Uses ICMP uncreachable Error: Fragmentation Required

ping -s n -M do:

  • Set packet size and “do not fragment” bi
  • Returns “message too long” is s > MTU.
  • RFC 1191 has likely values

traceroute.pmtu

  • Version of tracerout that automates discovery.

Firewalls can break PMTU discovery.

Sock Traffic Generator

Sock is a test program which can be run as a client or as a server.

Ping

TTL doesn’t always start at 255, might be around 64 but not standardised.

Ping through a router

Routers have many different interfaces. This means, with traceroute, only the nearside interface can be found.

There is a record route option which can find this: ping -R

Unicast Routing in the Internet

Example routing problems. Interior and exterior routing protocols. Protocols covered will include RIP, OSPF and BGP.

IP Routing

Split into two parts:

  1. Packet Forwarding or Packet Switching
  2. Routing

Packet Forwarding

Needs to be fast and efficient, every packet involves a forwarding option.

Based on prior knowledge the packet is moved towards the destination.

Routing

Filling routing tables. This requires knowledge of the connectivity in some sense.

This needs to be stable and should converge relatively quickly.

This task should not be performed frequently

Classifying Routes

Routes can be classified into three different categories:

  1. Default Routes
  2. Static Routes
  3. Dynamic Routes

Could also classify by interior and exterior routing.

Default Routes

A default route which is used as a fallback if all other routing fails.

Static Routes

Manually configured information.

Not very scalable.

Dynamic Routes

Automatically use information from other participating routers.

Routing Algorithms/Protocols

Different methods for finding out routes.

Distance View Routing

Uses the Bellman-Ford algorithm.

Regularly broadcast the entire routing table, containing known networks and the “distance” to each (usually a number of hops).

Has the problem of routing loops and convergence after change.

Routing Internet Protocol (RIP)

Uses Distance View Routing

RIPv1 (RFC 1058)

Classful routing with no ability to pass netmasks.

Typically broadcasts every 30 seconds.

Metrics range from 1 to 16 (infinity).

“Split Horizon” often used.

“Triggered Updates” send an update when something (close) changes to improve convergence.

Sent via UDP to the IP broadcast address (usually) to port 520.

Single Splint Horizon

Don’t repeat information to the originator.

Poison Reverse Horizon

Repeat back information, but metric is set to infinity

RIPv2 (RFC 1723)

Refinement of RIPv1.

Includes netmasks and some support for authentication and multicast.

Each router sends the state of all its links to all adjacent routers.

  • Link State Advertisements (LSA)

Each router independently calculates its routing table based on the link state database it has constructed.

Normally uses Dijkstra Shortest Path (Shortest Path First).

Open Shortest Path First (OSPF)

Uses IP directly.

Early versions could calculate a different set of routes for each value of the IP Type-Of-Service (ToS) field, but this was abandoned in the latest RFC.

Interfaces have a dimension-less costs (these were potentially different for each value of ToS).

Supports equal cost load balancing.

Supports subnet mask and thus CIDR.

Point-to-point links can be used without IP addresses.

Supports authentication

Uses multicasting to reduce load.

On a multi-access network (e.g. ethernet), two routers are elected as Designated Router and Backup Designated Router.

IP Address Class Based Only

Classless Inter-domain Routing

Multicast Routing in the Internet

Example routing problems. Protocols covered will include PIM-DM, PIM-SM and MSDP. We will also cover the role of the Rendezvous Point, Anycast IP, and issues still under debate in the technical community.

Multicast

Refers to the sending of data from one to many or many to many registered recipients.

Contrast to broadcast, which floods the network with data, which all hosts will receive whether they want it or not.

Unicast is the sending of data to a single recipient.

Unlike unicast, multicast is scalable,if ten clients request a connection then unicast must provide those ten connections with their own copy of the data.

Multicast, whether there are 1, 10 or 100 members of a ground, only one copy of the data is transmitted and is only replicated where there are group members on different paths to and from the router.

Media Access Control (MAC)

All devices have a unique 48-bit MAC address.

Devices on the LAN keep a table that maps unicast IP to MAC.

A special range of MAC addresses is used for multicast as a multicast address does not correspond to a single end host.

1:32 IP Multicast to MAC Multicast Mapping

Class D address range: 224.0.0.0 - 239.255.255.255

High order 4 bits of the first octet of a class D address are always 1110 (0xE)

To provide a 1:1 mapping between MAC and multicast IP addresses the remaining 28 bits of the IP address would need a unique representation in MAC addresses.

MAC address range assigned for multicast is only 24 bits. The high order bit is reserved, leaving 23 bits.

Thus, 28 bits of IP addresses need to be mapped to just 23 bits of the MAC address.

This corresponds to a 1:32 mapping MAC multicast addresses to IP multicast addresses.

Translation

Six octets in an Ethernet header. Three high order octets are constant 01-00-5E.

The remaining three octets have a range of 00-00-00 to 7F-FF-FF (high order bit is always 0 for IP multicast.

An IP address of 224.1.1.1 maps to 01-00-5E-01-01-01. Byt the MAC address also maps to 31 other IP addresses.

Host Behaviour Reception

Discarding of unwanted packets has to be handled by the IP stack.

Hosts interesting in 224.1.1.1 will also receive (if on LAN) the other 31 possibilities.

After de-capsulation the Ethernet fram and discovering the IP address of an unwanted packet, the host discards the packet.

The probability of this are quite slim at the moment

Special Multicast Addresses

Local Network Control Block

224.0.0.0 - 244.0.0.255 (224.0.0.0/24)

Should never leave the local network.

Internetwork Control Block

224.0.1.0 - 244.0.1.255 (244.0.0.1/24)

Session Description Protocol (SDP)/Session Announcement Protocol (SAP) Block

224.2.0.0 - 224.2.255.255 (224.2/16)

Source Specific Multicast

232.000.000.000 - 232.255.255.255

Specifically permit or block source addresses.

GLOP Block

233.000.000.000 - 233.255.255.255

Ensure that addresses are specific to an administrative domain. Low number of specific addresses (255).

Administratively Scoped

239.000.000.000 - 239.255.255.255

Private multicast addresses.

IP Group Management

Protocol used is Internet Group Management Protocol (IGMP)

  • IGMPv1 - RFC1112
  • IGMPv2 - RFC2236
  • IGMPv3 - RFC3376

IGMPv1

No explicit leave message.

IGMP Report Suppression

Membership Queries

Sent by routers to ask for existence of hosts interested in receiving multicast groups.

Membership Reports

Sent by hosts in response to Queries,

Also sent by the host if they wish to join a new group.

Message Format

IGMPv2

Major change, introduces an explicit Leave Message

Membership Queries

General Query as before.

Group-Specific Query. Used to query if there are any members of a group left after a leave message.

IGMPv1 and IGMPv2 Interoperability

IGMPv2 Hosts must send IGMPv1 if they spot an IGMPv1 router is the querier.

IGMPv2 Hosts may suppress Leave Messages if they spot an IGMPv1 router is the querier.

IGMPv2 Hosts must suppress reports if they see others using IGMPv1 or IGMPv2 reports for a given group.

IGMPv2 routers must ignore Leave messages if IGMPv1 hosts are present.

If any IGMPv1 routers are present must use IGMPv1

IGMPv3

Major change: supports the specification of sources in messages.

Host now do not suppress Reports.

Thus, supports the concept of Source Specific Multicast.

Protocol messages are now more complex.

Membership Queries

Now contain source address information.

Membership Reports

Now contain information relating to multiple groups and each group report can contain source address information.

Multicast Distribution Trees

The set of links and connections that traffic will follow to get to a destination.

Source Trees

Sometimes called Shortest Path Trees (SPT)

A tree where the root is the source of the data.

A different forwarding tree for every combination of source (S) and group (G). N groups and M sources implies trees.

This makes it difficult to calculate the best tree.

Routers need to maintain separate states for all trees.

(S,G) notation.

Shared Trees

Rendezvous Point Trees or Core Based Trees.

Traffic follows some form of common path, unlike source trees in which two different trees can send data through different paths, at least most of the route will follow a common tree.

Each group in the system uses the same tree. N trees implies N groups, no matter how many sources.

Normally, each group has a nominated router as the Rendezvous Point, which is the root of the tree.

All sources send towards the Rendezvous Point.

Can potentially lead to a single point of failure if no backup is specified.

(*, G) Notation

Multicast Forwarding

Routers must no just forward all multicast traffic. Potentially there could be customers anywhere, unlike with unicast. However, forwarding all multicast would be broadcast instead and would lead to chaos.

Techniques include:

  • Reverse Path Forwarding
  • Multicast Forwarding Caches
  • TTL Thresholds
  • Administratively Scoped Boundaries

Reverse Path Forwarding

Note the source address of the arriving packet and interface of arrival. The IP tables are checked for the correct interface towards the source.

If the packet has arrived into the expected interface, it can be forwarded onwards. If not it can be discarded.

Multicast Forwarding Caches

Used to help avoid some RPF calculations

Sometimes called multicast routing tables.

Used to help make decisions for routing tables.

TTL Thresholds

TTL normally used by IP and decremented as each router passed. When this TTL reaches zero, the packet is dropped.

Multicast routers often set TTL Thresholds on interfaces, if the TTL of the packet is less than this threshold then the packet isn’t forwarding.

If traffic was only designed to reach internal locations, the TTL would be set such that it could reach all internal routers, but external routers would have high enough threshold to stop this traffic, but not those expected to reach external locations.

Administratively Scoped Boundaries

These boundaries don’t pass certain multicast addresses.

Multicast Routing Protocols

Dense Mode Protocol

Send all valid packets out of all interfaces, flooding the network.

When a router receives unwanted packets it sends a prune message upstream.

When a prune message is received, the router removes the interface it was received on from the forwarding table for the specified group.

Prune times out in typically 2 to 3 minutes.

DVMRP
Protocol Independant Multicast - Dense Mode (PIM-DM)

Sparse Mode Protocol

Routers send no traffic onwards unless asked.

Shared tree branch constructed from rendezvous point (root) to any interested receiver.

Join messages are sent from the receiver to the root via other routes, creating a shared tree where it goes.

Prune sent when the traffic is no longer wanted.

Protocol Independant Multicast - Sparse Mode (PIM-SM)

Has a rendezvous point using a shared tree, but uses source trees to get data from sources to the rendezvous point. Also allows the final router feeding receivers to switchover to a source tree from the source if the traffic level exceeds some threshold (in Cisco routers the default value for the threshold is zero).

CBT
MOSPF

Transport Level Protocols

An in-depth study addressing the behaviour of TCP and UDP. Connection establishment and termination, flow control under various load conditions, timeouts and retransmission, newer features and performance.

User Datagram Protocol (UDP)

RFC 768

  • Datagram orientated
  • Unreliable, connectionless
  • Simple
  • Unicast and multicast
  • Useful for only a few application
  • Used a lot for services

No handshake required which allows it to be fast, especially for many short connections.

Applications must handle failures.

UDP Header

UDP Header

Checksum is the same as the IPv4 checksum, but is over the UDP header, the data, and the UDP pseudo header

IPv4 UDP Pseudo Header

IPv4 UDP Pseudo Header

IPv6 UDP Pseudo Header

IPv6 UDP Pseudo Header

Example: Trivial File Transfer Protocol (TFTP)

TFTP used UDP, uses a stop-and-wait flow window control algorithm:

  • Stop for ACK before sending the next data packet
  • A lost packet causes timeout and retransmission.

Designed for diskless systems to download configuration files during bootstrapping.

Does waste a lot of network time.

Transmission Control Protocol (TCP)

  • Stream orientated
  • Reliable, connection-orientated
  • Complex
  • Only unicast
  • Used for most internet applications

Done by creating connections between two points and aim to provide data integrity.

Also provides flow control.

TCP Header

TCP Header

Source Port Number
Destination Port Number
Sequence Number

Ensures ordering of packets (is actually a byte count of the first byte count in the packet of data).

Doesn’t start at 1.

Acknowledgement Number

Helps find missing packets, is the last byte of the packet.

Header Length

Describes the length of the header.

Reserved
Flags
Urgent (URG)

The value in the Urgent field is valid.

Acknowledge (ACK)

The value in the Acknowledge field is valid.

Push (PSH)

Push the data up to the application as quickly as possible.

Reset (RST)

Reset the connection

Synchronise (SYN)

Used to set up connections

Finish (FIN)

Used to tear down connections

Window Size
TCP Checksum

Same as UDP Checksum

Urgent Pointer
Options

Connections

How TCP establishes, keeps and tears down connections.

States

Listening

Waiting for a connection

SYN Sent

Waiting for acknowledgement of the first send of a SYN

SYN Received
Established
FIN Wait 1

Wait for an ACK of a sent FIN

FIN Wait 2

Wait for a FIN from an acknowledged FIN.

Close Wait

Acknowledged a FIN but still need the connection to send data.

Last ACK

Final FIN sent, waiting for the last ACK.

Time Wait

Final FIN received and ACK sent, wait for a number of seconds before closing.

Done in case the ACK is not received by the other party.

Closed

Not listening, connection closed.

Wait States

TIME_WAIT is also called the 2MSL wait state.

  • MSL - Maximum segment life.
  • RFC 793 specifies MSL as 2 minutes.
  • 30 seconds, 1 minute, 2 minutes are common implementations.
Connection Establishment
  1. Server listens on a given port (no network traffic).
  2. Client sends a TCP packet with the SYN flag set to a random number.
  3. Server receives the SYN, the 0 offset is the value of the SYN flag. Server goes into SYN Received state.
  4. Server sends a TCP packet with a different SYN flag and an ACK flag, returning the Client’s SYN flag plus one*.
  5. Client returns an ACK flag.

* The SYN flag is assumed to have consumed 1 byte.

If no ACK is received at SYN Received state then the SYN is resent after a timeout.

Known as the three way handshake.

Note: Initial sequence numbers are important! Both client and server choose different ISNs.

Half-Open

When you have a connection between two machines, but one crashes. There’s no data sent between them so one still believes it is connected.

Simultaneous Open

Possible (but unlikely) that two applications will perform active open to one another. Not Client/Server, each opens to a known port.

Results in a single open connection.

Four-way handshake required.

Connection Termination
  1. Client sends a FIN
  2. Server sends an AWK and a FIN
  3. Client sends an AWK.

Three way closedown.

ACK and FIN might not be sent together (server might still be transmitting data).

Half-Close

When a FIN is sent to the server but there is still more data to be sent should not close fully until a FIN AWK is sent and the AWK returned. Microsoft broke this by sending a RST instead of a proper AWK

Simultaneous Close

Both sides of a connection might perform an active close.

Four segments required, which s not unusual.

TCP Options

TCP Header also supports options.

Originally only MSS, NOOP and EOS were specified.

End of Option List (0)
No Operation (NOOP) (1)
Maximum Segment Size (MSS) (2)

Length is 4

2 byte MSS. Usually based on the MTU size.

Window Scale Factor (3)

Length is 3

1 byte shift count.

Timestamp (4)

Length is 10

4 bytes timestamp value
4 bytes timestamp echo reply

Bulk Data Transfer

Bulk data transfer (e.g. FTP) has different requirements.

Low overhead from headers.

Receiving systems have limited buffers.

Sliding windows used for flow control.

Packet loss is expensive; timeouts are the simplest way, but there are better.

Want to send as many packets as possible without flooding the network.

Flow Control In TCP for Bulk Data Transfer

Sliding window flow control is carried out by the receiver.

CWND flow control is carried out by the receiver.

Sliding Windows Flow Control

The idea is that in every ACK we also advertise a window. The window is a number of bytes, usually a multiple of the MSS.

The initial window size will be the size of the buffer TCP has available.

Sender can send up to the limit of the window.

When a segment is acknowledged, the window slides making another slot in the window available.

The window is said to close when the left-hand edge moves to the right.

The window is said to open when the right-hand edge moves to the right.

The window is said to shrink if the right-hand edge moves to the left.

Window Scaling

Scale factor is actually a shift.

Largest scaling factor is 14 (maximum window size of 1,073,741,823 bytes).

Remember this increases the size of the header.

Used for high capacity networks.

RFC 1323, TCP Extensions for High Performance.

Calculating Optimal Window Size

Can never reach optimal capacity but can approach it.

Bandwidth delay product

Capacity(bits) = bandwidth (bits/s) * RTT (sec)

Increase Round Trip Time (RTT) and data travels more slowly, thus higher capacity.

Likewise increased bandwidth involves higher capacity.

Max window size of 65535 bytes, but window scale option can increase this to 1024MB.

Around 10% of the capacity is a good size for the TCP buffer size.

TCP Slow Start (RFC 2001)

Intermediate routers must queue packets - congestion may occur at routers.

Congestion window (CWND).

  1. Initially set to 1 segment (based on announced MSS).
  2. On first ACK, increase CWND by 1 segments of bytes.
  3. Now 2 segments are sent and ACK’d. CWND is increased to 4 segments.
  4. Exponentiall increase until peak flow reached or router discards packets.
  5. Sender can transmit up to the minimu of the sliding window size and CWND.

Has two phases:

  1. Exponential increase.
  2. Linear increase.

Based on a threshold (the slow start threshold).

Drops drastically when there is packet loss.

Fast Retransmit and Fast Recovery (TCP Reno)

Problem: TCP timeouts lead to idle periods.

Fast retransmit: use 3 duplicate ACKs to trigger retransmission.

Fast recovery: start CWND at SSTHRESH and do incremental increase after fast retransmit.

Problems with Fast Retransmit and Fast Recovery

WiFi and very fast networks.

Can be a problem on high capacity networks.

TCP Westwood +

Improved congestion control algorithm for fast, high latency links and lossy links.

Can take many hours to reach optimal throughput with TCP Reno as packet loss has a large affect on throughput.

Westwood carries out end to end bandwidth estimate (BWE) using received ACKs and RTT monitoring.

TCP Reno overreacts to random loss by cutting cwnd in half.

A small fraction of random packet loss does not impact the BWE.

Thus the ssthresh remains unchanged, allowing Westwood to be much more efficient than Reno.

On ACK Reception

Increase cwnd according to Reno algorithm.

Estimate available bandwidth.

When 3 Duplicate ACKs received

cwnd is then set to ssthresh.

On RTO (coarse timeout)

cwnd is then set to 1.

Flow Control in TCP for Interactive Data Transfer

Interactive data transfer can result in many small segments which, together with their ACKs, can lead to congestion.

ACKs are, therefore, piggybacked onto data segments - delayed ACKs.

Nagle’s algorithm result in some collection of data to produce larger segments.

Nagle Algorithm

Many small datagrams results in very large overhead.

Can cause congestion, particularly on a WAN.

  • Only one outstanding segement not ACK’d
  • Cannot send any more small segments until ACK received.
  • Whilst waiting for ACK, TCP will collect small segments together to send as a single segment.
  • On a fast network, more segments are sent.
  • On a congested network less segments are sent, but data still get through (with less overhead).

Sometimes desirable to disable Nagle algorithm, e.g., X mouse movements need to be sent without delay.

Persist Timer

Keeps window size information flowing even if the other end closes its receive window.

Possible situation where an ACK is lost and both ends waiting.

  • Sender waiting for acknowledgement with window update so that it can send more data.
  • Receiver waiting to receive the data as it indicated a non-zero window in the acknowledgement that go lost.

Sending TCP uses a persist timer to periodically query the receiver to see if the window size has updated.

Timeout and Retransmission

Fundamental to the use of acknowledgements is the need for timeout and re-transmission of packets.

TCP keeps four different timers for a connection and performs exponential backoff when a packet goes unacknowledged.

TCP also has a congestion avoidance algorithms to complement the slow start algorithm to cope with packet loss caused by congestion.

Timers
Retransmission Timer

Used when expecting an acknowledgement from the other end.

Persist Timer

Keeps window size information flowing even if the other end closes its receive window.

Keepalive Timer

Detects when the other end of a connection has crashed or re-booted.

2MSL Timer
Round Trip Time Measurement

Used to calculate retransmission timeout (RTO).

Mean deviation is used to allow arithmetic to be carried out with integers and without square roots.

Where:

  • D is the smoothed mean deviation
  • M is the latest measured RTT value
  • Gain (g) for average is set to 1/8
  • Gain (h) for deviation is set to 1/4
Karn’s Algorithm

If a packet times out and there is a retransmission, when an ACK is received, whose was it.

Karn’s algorithms specifies that RTT estimate cannot be update when a timeout and retransmission occur.

Re-use the RTO after such an exponential back off until an acknowledgement is received.

Transactional Transmission Control Protocol (T/TCP) RFC 1644

TCP for transactions.

Nearly as fast as UDP.

TCP Accelerated Open (TAO)

Reduces minimum number of packets required to three in many circumstances.

Data and FIN are piggybacked on the ACK.

Connection Count (CC) used to avoid duplicate SYNs.

Security issues caused due to this.

Truncation of TIME_WAIT State

2MSL state is abandonded.

8 times RTO used instead.

Allows retransmission of final ACK.

A new incarnation of the same connection, using TAO, implicitly acknowledges the ACK of FIN.

Port Numbers

Demultiplexing

Layered like the OSI model (but pre-dates the OSI model).

Ethernet driver captures the incoming frame, strips the Ethernet header and passes to IP

IP layer strips out the IP header and passes it to the transport layer, etc.

Levels:

  • Application
  • Transport
  • Network
  • Link
  • Physical

The process of moving things up and down layers is demultiplexing and multiplexing.

Data Encapsulation

Don’t sent arbitrary length methods, to allow the multiplexing of networking.

(In TCP) Data is encapsulated into frames, frames have a frame header, trailer and a datagram.

This datagram contains an IP Header and a segment or protocol data unit (PDU)

The PDU has the (TCP) protocol header and the actual data.

The frame header is used to drop the packet onto the local link. The address used in the frame header is embedded in the hardware in the network card (MAC address), this is why the IP address is not used. This is for efficiency.

Frame headers are: source address, destination address, protocol and checksum.

Naming and Directory Services

Including the DNS and LDAP and their use.

Domain Name Service (DNS)

A distributed database mapping hostnames to IP addresses and vice versa.

How DNS is used

  • Applications access the information in the DNS by way of resolver programs
    • gethostbyname() returns an IP address in response on a hostname
    • gethostbyaddr() returns a hostname in response to an IP address.
  • People access information by way of client programs
    • nslookup
    • dig
  • Common implementation of DNS client and server is BIND (Berkeley Internet Domain Server).
    • Server is named
  • The DNS defines a protocol (see RFCs) that is used for communication between client and server.

The DNS namespace

DNS is stored as a tree:

13 canonical name servers which server information for the root “unnamed” root.

.arpa

A special domain which is used for reverse lookup.

Now .ipv6

Fully qualified domain names

www.google.com. is a fully qualified domain name. Missing the trailing . allows the DNS to lookup other entries based on the domain it resides on.

DNS Lookup

Is a recursive process and therefore quite slow. Results are cached to improve performance.

To try and reduce the effects of this when changing hostnames the TTL of the cache is reduced prior to performing this.

Name Ownership

No one company owns all names

Name Server Zones and Boundaries

Can have zones within a name server.

Resource Record

This is where DNS gets its information from.

Examples:

3www4aber2ac2uk0

3www6google3com0

The 0 occurs because the top-level domain has no name. This is how the end of the string is reached.

SOA

Name of primary source of info for zeon

A

IP address of host

CNAME

Canonical name

PTR

Alias for an IP address for reverse lookups

MX

Mail exchange information

NS

The Name of the nameserver

AAAA

IPv6 address of host

A6

Experimental IPv6 address.

HINFO

Host information (CPU and OS).

LOC

Location of host.

Dynamic Host Configuration Protocol (DHCP)

DHCP Timeline

Request are broadcast address 255.255.255.255.

Often used during host configuration.

Never forwarded by a route, by a DHCP/BOOTP relay agent can be used to forward this onto the DHCP server.

Replies are routed normally.

Lightweight Directory Access Protocol (LDAP)

The Directory Information Tree

Similar to the DNS tree, but the whole world doesn’t typically use the same tree.

The Information Model

Service model based on entries.

An entry is a collection of attributes, that has a distinguished name (DN).

Attributes

Type/value pairs, e.g.:

C = GB

o = University of Wales

Attributes described by a schema.

Quality of Service

The need for and the provision of Quality of Service (QoS) within packet based networks such as the Internet which are inherently best efforts at heart.

Challenges

  • QoS needs to be end-to-end, involving numerous management domains.
  • QoS service should address all requirements - not just ‘top quality’
  • QoS policy distribution, maintenance and monitoring will increase in importance.
  • Better to implement QoS earlier - provides scalability, experience. Pro-active not reactive.

Queueing Disciplines

Ways in which resources can be allocated to priority traffic best.

Priority Queue

Other queues only get any resources at all when the priority queue is empty.

Simple and effective.

Traffic must be policed to stay within low limit as other traffic can get starved or the priority queue reverts to Best Effort

Weighted Round Robin

Multiple queues, each queue gets a guaranteed minimum of resource even under congestion conditions.

Can be combined with Priority Queues.

Security Issues

The inherent risks within networks such as the Internet, cracking, viruses, trojans, worms and denial of service attacks. The role of the Firewall and the problems it can bring.

Data Encryption

Four aspects of security to consider:

  1. Privacy
  2. Authentication
  3. Integrity
  4. Nonrepudiation

Aspects of Security

Privacy

Snoopers should not be able to read confidential data

Authentication

Verifies that the apparent sender really sent a message, and not an imposter.

Integrity

Verifies that data has not been corrupted or altered in transmission

Nonrepudiation

Ensures that the sender or receiver cannot deny sending or receiving a piece of information.

Encryption Techniques

Classical encryption techniques:

  • Permutation: The order of the plaintext characters is changed
  • Substitution: a plaintext alphabet is mapped to a different one.

A cipher used to perform the encryption.

  • Stream ciphers encrypt data bit by bit or byte by byte.
  • Block ciphers first pack data bits into a fixed length block, then encrypt the whole block into a ciphertext block.

Symmetric Key Encryption

Use a shared secret key to encrypt and decrypt.

For users, keys required.

Can be done in hardware.

Public Key Encryption

The key is split into two: a public key and a private key.

Anything encrypted by the public key can only be decrypted by the private key.

The private key must be kept private, but the public key can be shared without worry.

Cannot be done in hardware.

Combining Both

Create a shared secret based from public key encryption.

Transport Layer Security (TLS)

Client sends a “hello” message with TLS version number and prefernces.

Server sends a certificate including a public key.

  • Public key encrypted by some CAs.
  • Client has a list of CAs and their keys and uses a corresponding key to decrypt the server key.
    • This authenticates the server, unless it is self-signed.

Client sends a secret key encrypted with the server’s public key.

Server decrypts message and then encrypts a response with secret key which the client decrypts.

Hashing and Message Authentication

Hashing is the operation that maps the message of variable length into a hash value with fixed length.

Hashing is not reversible.

Hashing can be used to generate a digest of the message, called the Message Authentication Code.

The receiver can use the digest to verify if the message is authentic.

Current and Future Issues

The (still) emerging IPv6 protocol and other active issues.

Internet Protocol Version 6 (IPv6)

Problems with IPv4 is that 32 bit addresses is too small.

  • 128 bit address space solves the problem for the long term.
  • Large space allows addresses to be more structured.

An IPv6 Address

fe80::2c0:dfff:fee4:bd87/10

Loopback address: ::1/128

:: is a string of 0s of indeterminate length.

IPv4 addresses are encapsulated in IPv6 are expressed with dotted decimal for last four octlets: `::194.123.1.2

IPv6 addresses are leased (possibly infinitely).

Current Allocation

  • 0000 0000 - Reserved
  • 0000 001 - NSAP Allocation
  • 001 - Aggregatable Global Unicast
  • 1111 1110 10 - Link-Local Unicast
  • 1111 1110 11 - Site-Local Unicast
  • 1111 1111 - Multicast

Format Prefix: 1111 1110 10 or FE80::/64

Postfix is usually the last 24 bits of MAC address. middle byte is fffe due to IEEE standards.

fe80::00ff:fe00:0000 for MAC address: 00:00:00:00:00:00

Site-Local Addresses

Format Prefix 1111 1110 11 or FEC0

Neighbour Discovery

Router Advertisements

DHCPv6

Where autoconfiguration is undesirable for operational reasons.

Similar in concept to DHCP for IPv4

Compatible with autoconfiguration

Nodes may request multiple addresses.

Authentication of nodes.

Makes use of address deprecation and reconfiguration-init message…

Co-existence and Tunnelling

Exam Preparation

Some chances to get a feel for what is expected in the exam.

Steve Kingston

s/(Use a diagram) is necessary/\1/g

Past Paper 2012-113

1. This is a question about the Transport Layer Protocols

a) With reference to the User Datagram Protocol (UDP) and the Transmission Control Protocol (TCP), describe features that are provided by transport layer protocols. In your answer distinguish between features provided just by TCP and those provided by both UDP and TCP.

  • Both use port numbers. Server listens of a specific port and client sends data to that port.
  • UDP Connectionless
    • No acknowledgement that data is recieved
    • Up to the application to manage successful transmission of data
  • TCP sets up a connection between two machines
    • Connection must first be established
    • All sent packets (except ACK’s) must be acknowledged by the other party
    • Support for flow control using sliding window protocol
    • Must also terminate the connection
    • Protocol ensures sucessful delivery of data (or failure).

b) Describe with the aid of diagrams the TCP/IP connection establishment processes, and explain how the initial sequence number (ISN) is exchanged between two nodes during connection establishment. Label the diagrams with the TCP connection states at each stage.

  • Server listens on a specified port
  • Client sends a SYN packet to that with a random ISN and a receiving port number (usually fairly high).
  • Server responds to the client with an ACK with a sequence one higher than the SYN’s ISN (the SYN is seen to have taken up one byte of data).
  • Server also responds with a SYN to the client, the ISN here is generated by the server and is different to the clients.
  • Client responds to the server’s SYN with an ACK, again here the sequence number of this packet will be the ISN of the Servers SYN packet + 1.
  • At this point the connection is established.

The ISNs must be different as they are used to acknowledge how much data was actually received later in the connection.

Sequence numbers returned in the ACK are the sequence number of the acknowledge packets plus the size of that packet (which can then be used for flow control).

c) What is meant by a flow control mechanism? In your answer describe the likely effect of not employing flow control on TCP connections.

  • Flow control used to send multiple packets of data and acknowledging them all in a single packet.
  • Flow control is also used to stop flooding the network with packets that would just be dropped by the receiver because it’s buffers are full.

Not using flow control would either:

  1. Require the sender to wait for the receiver to acknowledge a single packet before sending the next one, meaning the whole transmission would be very slow.
  2. Allow the sender to flood the network with packets which would eventually be dropped by the receiver as its buffers fill up to maximum (or by intermediary devices to the same effect). The sender would receive acknowledgements for some of the packets and would have to constantly resend dropped packets after a timeout period, leading to a lot of traffic on the network, especially when the receiver does not process the information quickly.

d) Describe, with the aid of diagrams, the problem with transferring very large files using the slow start /congestion avoidance flow control mechanism on fast (e.g. gigabit) but high latency networks. Explain in your answer how the TCP Westwood+ algorithm modifies the mechanism to provide better throughput whilst retaining a concept of fair access.

  • Slow start sets the window to 1 to begin with and grows it exponentially after each ACK received. Once it reaches the slow start threshold (ssthres), this growth is linear (by 1 from each ACK) - this is congestion avoidance.
  • Eventually the congestion will be encountered, and a packet will not be ACK’d before a timeout. The size of sstresh is set to one half of the current window size and then the size of the window is reset to 1. The process then repeats endlessly.
  • This tends to lead to “sawtooth” graphs.
  • Problem of slow start is that it has to wait for a timeout to occur before it knows congestion has occurred.
  • To change this, TCP Reno was implemented; if the receiver does not receive the next expected packet (i.e. it receives one with a higher sequence number than expected) it will send a duplicate ACK of the last ACK it sent.
  • If this ACK is received by the sender 3 times it triggers retransmission and will retransmit the next packet from that sequence number.
  • The receiver may then acknowledge further packets in the sequence in the next ACK.
  • Instead of reseting the window size to 1, it is started at sstresh instead and grown incramentally.
  • TCP Reno still leads to sawtooth graphs, but is less affected by congestion.
  • TCP Westwood+ implemented to further improve TCP connections.
  • Westwood+ estimates the bandwidth available using received ACKs and Round Trip Time monitoring.
  • In both cases of 3 duplicate ACKs or coarse timeout, the ssthresh is set to max(2, (BWE * RTTmin) / Seg_size). The congestion window is set to ssthresh in the case of 3 dup ACKs or 1 in the case of coarse timeout.
  • A small fraction of randomly lost packets doesn’t affect the BWE and therefore the ssthresh, unlike with Reno.

SEM6120 - Introduction to Intelligent Systems

This module introduces the key ideas in Artificial Intelligence and ensures all students are at roughly the same level before moving on to the specialist modules.

Introduction

General introduction to Artificial Intelligence (AI), including discussion of what AI is, its history, definitions, and philosophical debates on the issue (the Turing test and the Chinese room). Ethical issues (3 hours).

Learning Outcomes

  1. Describe and use the basic principles of Artificial Intelligence and Machine Learning.
  2. Be able to reflect on project needs.
  3. Practically apply AI and ML principles to meet those needs.
  4. Present the material they have learned in an informed, clear manner.
  5. Demonstrate understanding and insight into the material that they are presenting.

Assessment

  • Presentation 20% 01/10/2013-18/10/2013
  • Report 20% 01/10/2013-25/10/2013
  • Essay 60% (coding + report) 10/10/2013-01/11/2013

Tips: if you disagree with a paper, you should comment on why, this will be a good thing :)

Commitment

  • 26h seminars
  • 6h practicals
  • The rest of the time is spent on background reading and assignments.

Booklist

  • Artificial Intelligence: A Modern Approach Russell, S. and Norvig, P.
  • Artificial Intelligence: Structures and Strategies for Complex Problem Solving. - Luger G.
  • Artificial Intelligence Illuminated - Coppin B.

What is Artificial Intelligence?

  • Understand intelligent entities; learning more about ourselves?
  • Building intelligent entites; creating things which exhibit “intelligence”.

Two ways of looking at this:

  • Scientific goal
  • Engineering goal

Many definitions, all potentially valid.

Problem: what is intelligence and how do we prove a system is intelligent.

Strong versus Weak AI

  • Strong AI can actually think intelligently.
  • Weak AI can possibly act intelligently.

Turing Test

Human interrogator talks to another system (human or AI), if the interrogator cannot tell the difference then the argument is we must acknowledge it is able to think like a human.

Has not yet been passed.

The Chinese Room

Behaving intelligently may not be enough.

Computers are just a symbol manipulation device and therefore cannot have mental states.

Ethics and AI

We have investigated whether we can develop AI, but not whether we should.

Problems of AI:

  • Job losses
  • Too much/little leisure
  • Privacy rights
  • Accountability
  • End of the human race?
    • Harm in the wrong hands

Branches of AI

  • Logical AI
  • Search
  • Pattern recognition
  • Representation
  • Inference
  • Common-sense knowledge and reasoning
  • Learning from experience
  • Planning
  • Epistemology
  • Ontology
  • Heuristics
  • Genetic Programming

Why search is important in AI and how to go about it. This includes both informed and uninformed strategies. Evolutionary search (6 hours).

Defining the Problem

Many AI problems can be framed in terms of a search problem.

Representation is very important.

The search strategy is the way in which searching is performed:

  • Uninformed
  • Informed (Heuristic)

Can’t always evaluate all the search space. Massive search space (e.g. Chess) or even infinite.

Terminology

Search State

Summarises the state of search.

May not lead to a solution.

e.g. Representation of a chess board.

Initial State

The first search state.

Solution

A special example of a search state. It solves the problem.

Goal State

The state trying to be reached.

State Space

All possible search states.

Successor Function

Ways to move around in the state space (action/operators).

Goal Function

To check if the goal has been reached.

Cost Function

Measures the path cost.

Search Trees

Visualise the progression of a particular algorithm.

Shouldn’t evaluate every possibility of the state space.

Initial state is the root, goal is a leaf.

Don’t store the whole search tree. Requires a lot of space. Can discard explored nodes.

Store the frontier of search (i.e. nodes in search tree with some unexplored children).

Evaluation

Time Complexity

In big O notation. Number of nodes generated during a search (worst case).

Space complexity

In big O notation. Maximum number of nodes stored in memory.

Optimality

Is it guaranteed to find the optimal solution?

Completeness

If there is a solution, will it be found?

Branching Factor

b

Maximum number of successors of any node

Depth of shallowest goal

d

Maximum length of any path in the state space

m

Branching factor b and depth of solution d.

  • Time complexity: O(b^d)
  • Space complexity: O(b^d)
  • Optimal (given step costs are identical)
  • Complete (provided b is finite)

Branching factor b, depth of solution d and maximum depth m.

  • Time complexity: O(b^m)
  • Space complexity: O(bm)
  • Not optimal
  • Not complete (complete if no loops)

A Depth First Search with a limited maximum depth.

Depth limit l.

  • Time Complexity: O(b^l)
  • Space Complexity: O(bl)
  • Complete if l >= d
  • Not optimal

A form of Breadth First Search, using a Priority Queue.

Node with the lowest total path cost is expanded.

If all the cost steps are equal, it is exactly the same as BFS.

Complete and Optimal if no negative path costs.

Iterative Deepening

A depth-limited search where the limit is increased iteratively.

Avoids the space complexity of BFS.

  • Time Complexity: O(b^d)
  • Space Complexity: O(bd)
  • Optimal
  • Complete

Iterative Improvement

Paths not retained - low memory

Heuristic is a sort of rule of thumb.

Hill Climbing

Heuristic best child chosen at each point until goal reached or no change in current state.

No backtracking.

Priority queue based on heuristic. Like BFS other than that.

f(n) = h(n)

A*

A best first search which takes into account current path cost.

f(n) = g(n) + h(n)

Dominance/Informedness

If h2(n) >= h1(n) then h2 dominates h1.

Genetic Algorithms

Encode solutions in Chromosomes.

Mutate and crossover chromosomes.

Evaluate the new population and select the best.

Genetic Programming

A branch of genetic algorithms which uses programming statements as genes.

Easy to do in LISP.

Usually represented as trees.

Koza’s Algorithms

  • Tree consists of functions and terminals
  • Choose a set of functions and terminals
    • e.g.: {+,-,*,/,sqrt},{A,B}
  • Generate random population which are syntactically correct.
  • Follow GA-like procedure.

Ant Colony Optimisation

In nature, ants could solve complex problems unsupervised. Capable of finding the shortest route between a food source and the nest.

Can react to changes in the environment,

  • Each ant moves “randomly”
  • Pheromone is deposited on path
  • Ants detect lead ant’s path, inclined to follow.
  • More pheromone on a path increases the probability of that path being followed.
  • Pheromone decays over time.

Mechanics of ACO

  • Graph Representation
  • Heuristic desirability of edges
  • Construction of feasible solutions
  • Pheromone update rule (attached to edges).

Algorithm for one ant

  • Select starting node at random.
  • While not finished:
    • Evaluate all edges from this node.
    • Select the best-looking edge via probabilistic transition rule
    • Deposite artificial pheromon on the chosen edge.
  • Finished path is a potential solution, analysed for optimality.

Particle Swarm Optimisation

Each particle is searching for the optimum and encodes a solution (like a GA).

Each particle is moving (can’t search otherwise), and hence has a velocity. It also maintains the position it was in where it had its best result so far (its personal best).

The particles co-operate, exchanging information about what they’ve discovered in the places they’ve visited.

This co-operation only needs to be very simple;

  • A particle has a neighbourhood associated with it
  • A particle knows the fitness of those in its neighbourhood, and uses the position of the one with the best fitness.
  • Adjusts using this.

Multi-objective Optimisation

Sometimes an answer has to be optimal in several aspects.

Examples:

  • Quickest and cheapest flights
  • Lightest and strongest construction material.

Knowledge Representation

Ways of representing knowledge in a computer-understandable way. Semantic networks, rules. Examples of the importance of KR (4 hours).

What is Knowledge?

AI Agents deal with knowledge (data).

  • Facts
  • Procedures
  • Meaning

Logic Representation

Non-logical Representation

Logical representations have restrictions which can be hard to work with.

Classes Ignored in Logic

Objects in the world tend to be related to each other:

  • Classes, super- and sub- classes.
  • Part or whole hierarchies
  • Properties which are inherited.

The state of the world changing over time.

  • Explicit representation of time.
  • Frame problem.
  • Non-monotonic reasoning.

Closed world assumption.

Uncertainty or fuzzy knowledge.

Classes and Object-Orientated Representations

Classes define Objects, Objects are instances of Classes.

Object ∈ Class

Class ⊂ Superclass

Facts and rules can be encoded.

Semantic Networks

Essentially a generalisation of inheritance hierarchies.

Each node is an object, class, concept or event.

Each link is a relationship which makes sense in context.

Inheritance is as expected.

Example:

Note: pre-dated OOP.

Frames

Incorporates certain valuable human thinking characteristics:

  • Expectations
  • Assumptions
  • Stereotypes
  • Exceptions
  • Fuzzy boundaries

Represent what is typical unless an exception is known.

Frames allow more convenient “packaging” of facts about an object.

Frames often allow things which are typical of a class and which are definitional and can’t be overridden.

Frames also support multiple inheritance.

Frames are represented as semantic networks where nodes have structure. A frame has a number of slots (age, height, etc.), each of these slots stores specific information.

When new information is gained slots can be filled in, this can cause the triggering of actions, which may trigger the retrieval of other frames.

Scripts

Can fill in missing detail that is assumed.

Non-monotonic Logic

Once true doesn’t mean always true.

As information arrives, truth values can change.

A number of implementations for this,

  • Circumscription:
    • Brid(x) and not abnormal(x) -> flies(x)
    • We can assume not abnormal(x) unless we know abnormal(x).
  • Default logic:
    • x is true given x does not conflict with anything we already know.”

Truth Maintenance Systems

These systems allow truth values to be changed during reasoning (belief revision).

When retracting to a fact, must also retract to any other fact derived from it.

Penelope is a bird    (can fly)
Penelope is a penguin (cannot fly)
Penelope is magical   (can fly)
Retract magical       (cannot fly)
Retract penguin       (can fly)
Justification-based TMS

For each fact, track its justification

When a fact is retracted, retract all facts that have justifications leading back to that fact, unless they have independent justifications.

Assumption-based TMS

Represent all possible states simultaneously

Neural Networks and subsymbolic learning

We can find solutions using search, but how can we remember solutions, learn from them and adapt them to new situations? This will cover perceptrons, single-layer and multi-layer networks (5 hours).

Symbolic Learning

When we use some sort of rule-based system, we generally have to understand the rules. This means we understand the conclusions it draws, because it can tell us.

When a system learns from such rules, it processes in a way which can be understood.

Subsymbolic Learning

Don’t really understand, or have control over, the way in which solutions are found.

ANNs, GAs, GP and sometimes statistical methods.

Might be related to the randomness factor.

Artificial Neural Networks

(A: time to not understand backprop again)

Inputs (variables) -> network -> Outputs (results).

X and Y matrices of the statistical models are analogous to the training inputs and outputs of ANNs.

Backpropagation

Most common learning rule for ANNs.

Connections between nodes given random initial weights.

We therefore get a value at the output node(s) which happens when these random weights are applied to the data at the input.

Epoch

An iteration, that is, finding the error then adjusting weighting, is called an epoch.

There may be many thousands of epochs in one training.

Overfitting

Need to be able to generalise the model to unseen data.

Too much training will lead to a lack of generalisation.

Kohonen Neural Networks

Self-organising neural networks.

Random initialisation of a grid, see which nodes best match the input and change nodes around it.

Propositional and First-Order Logic

The backbone of knowledge representation (4 hours).

Types of Knowledge

  • Declarative - Facts
  • Procedural
  • Meta - knowledge about knowledge
  • Heuristic - rules of thumb

Representing Knowledge: Object-attribute-value

Encoding a fact in three pieces of information; the object, it’s attributes and the values of these attributes.

The facts a human knows are not obvious to a computer; needs to be encoded.

e.g.: Tree(species, oak)

The encoding doesn’t matter, so long as its uniform to the system.

Can include an uncertainty factors is a number which can be taken into account by the system when making decisions.

The final conclusion of any program where uncertainty was used in the input is likely to also have an uncertainty factor (if you’re not sure of the facts, can can the result be certain?)

Encoding uncertainty might be encoded something like this: Tree(species, oak, 0.8) (the certainty that the tree is an oak tree is 80%).

Again encoding doesn’t matter, so long as its uniform to the system.

Rules

A knowledge base may have rules associated.

IF premises THEN conclusion. There may be more than one premises and may contain logical function

  • AND, OR and NOT for example

If a premise evaluates to TRUE the rule fires.

e.g.

IF tree(species, oak) THEN tree(type, deciduous)

Rules may contradict another rule. Different strategies can be applied to choose the most specific or most relevant rules.

Meta Rules

IF
    tree is conifer
THEN
    load conifer data
ELSE
    load deciduous data

Logic

We can represent knowledge using logic. There are two types: propositional and predicate (or first-order logic or predicate calculus)

Propositional Logic

In propositional logic formulas are constructed using variables, TRUE and FALSE constants and connectors:

  • AND ()
  • OR ()
  • NOT (¬)
  • IMPLIES ()

Predicate Logic

Prolog is based on this.

A predicate is like a function that returns TRUE or FALSE

Tree(a) is true if a is oak, false if a is daffodil.

Implication

Oak(a) → Tree(a)

If the first clause is satisfied, the second clause is also satisfied.

Assertions

The symbol can be read as “for all”.

∀a(Oak(a) → Tree(a)

Existence

The symbol can be read as “exists”

∃a(Beech(a) ∧ ¬Green(a))

The Atomic Formula

beech(a) is known as an atomic formula.

Can have multiple parameters.

Human Reasoning

We use two standard rules:

  • Deductive
    1. Modus Ponens - if we know P→Q then if P is true, Q must also be true
    2. Modus Tollens - if we know P→Q then if Q is false, P must also be false
  • Inductive
    • Difficult for machines
    • Observations:
      • Oak trees have green leaves
      • Pine trees have green leaves
    • Induce
      • All trees have green leaves
    • Unfortunately, that’s not true, but it is useful.

Non-monotonic

Classic monotonic reasoning cannot contain contradictions

Put formally:

X ⊆ Y → Deriv(X) ⊆ Deriv(Y) where Deriv(X) is a set of facts derived from X

Temporal Reasoning

Reasoning changes over time. Can introduce this into machines by introducing a concept of time.

Machine Inference

Machine inference is used to deduce new facts from a knowledge base which is held in working memory.

Knowledge Base -> Inference Engine -> Working memory

Can be very complex

Deducing New Facts

Two principal methods

  1. Forwards chaining - based on modus ponens.
  2. Backwards chaining - based on modus tollens.

Forward Chaining

Modens Ponens:

student(S) ∧ studies(S, ai) → studies(S, prolog)
student(T) ∧ studies(S, expsys) → studies(T, ai)
student(joe)
studies(joe, expsys)

Therefore we can deduce:

studies(joe, prolog)

Proof:

student(joe) ∧ studies(joe, expsys) → studies(joe, ai)
student(joe) ∧ studies(joe, ai) → studies(joe, prolog)

Q.E.D.

Forwards chaining can fire any rules which match the knowledge held in its working memory. This can potentially come up with a huge amount of new knowledge, most of which is probably completely irrelevant.

Backwards chaining

Backwards chaining sets out to prove a piece of information.

The information will either be true or false, but doesn’t generate unwanted results.

Use the resolution proof method for now.

Say we have:

A1 ∨ A2 ∨ ... ∨ An ∨ B and ¬B ∨ C1 ∨ C2 ∨ ... ∨ Cm

Resolvent of clauses is:

A1 ∨ .. ∨ An ∨ C1 ∨ ... ∨ Cm

Resolution

Now take the two clauses:

A1 ∨ A2 ∨ ... ∨ An ∨ B and D ∨ C1 ∨ C2 ∨ ... ∨ Cm

If there is some subset where B and D are negations of each other Theta

If we have two clauses Clause1 and Clause2, and these both have a resolvant R, then if Clause1 and Clause2 are both satisfiable, so must R be.

The idea: take a clause, containing a goal we want to prove, and negate that goal. If we then resole this with other clauses, over and over and we get to the empty clause (which is never satisfiable), we have proved our goal.

Clause form

We can express any predicate calculus statement in clause form.

This enables us to work with OR and NOT rather than any other clause.

p → q ≡ ¬p ∨ q

A ∧ B ≡ ¬(¬A ∨ ¬B)

Example of Resolution

Use a previous example in clause form:

  1. ¬student(S) ∨ ¬studies(S, ai) ∨ studies(S, prolog)
  2. ¬student(T) ∨ ¬studies(T, expsys) ∨ studies(T, ai)
  3. student(joe)
  4. studies(joe, expsys)

Solution to studies(S, prolog) means we must negate it:

¬studies(S, prolog)

Resolve the clause 1:

¬student(S) ∨ ¬studies(S, ai)

Resolve with clause (2) (S=T)

¬student(S) ∨ ¬studies(S, expsys)

Resolve with clause (4) (S = joe):

¬student(joe)

Resolve with clause (3):

Ø

Therefore studies(joe, prolog) is true.

Q.E.D.

Horn Clauses

The same thing, but expressed differently. This is how Prolog does it.

A horn clause is a series of disjuncts (ORs)

We can take:

A ∨ ¬B ∨ ¬C

and write it:

A ← B ∧ C

If B and C then A

All the same thing:

  1. studies(S, prolog) ← student(S) and studies(S, ai)
  2. studies(T, ai) ← student(T and studies(T, expsys)
  3. student(joe) ←
  4. studies(joe, expsys) ←

Resolving Horn Clauses

All the negatives are one side of the sign.

← studies(S, prolog)
studies(S, prolog) ← student(S) ∧ studies(S, ai)

← student(S) ∧ studies(S, ai)
studies(T, ai) ← student(T) ∧ studies(T, expsys)

S = T

← student(S) ∧ studies(S, expsys)
student(joe) ←

S = joe

← studies(joe, expsys)
studies(joe, expsys) ←

Q.E.D.

Prolog Example

Prolog Tutorial

Prolog

Programming with logic.

Uses backwards chaining through horn clauses.

Programming for Intelligent Systems

Practical introduction to programming for Intelligent Systems, used to illustrate search, KR and first-order logic (3 hours).

Rule-based Systems

How can human expertise be automated? How to build an expert system - system concepts and architectures. Rule-based systems: design, operation, reasoning, backward and forward chaining (3 hours).

RBS

Knowledge base contains the rules.

Database contains the facts.

Inference engine uses both of these to match facts to rules to derive new facts, etc.

Justifiable and transparent.

Knowledge Acquisition

Knowledge Acquisition and its importance in KR and RBS (2 hours).

Statistical Methods

Multivariate analysis and statistical methods for solving problems.

Multivariate Analysis

Analysis of high-dimensional data. Often not possible to identify or quantify an object from one dimension.

Two methods:

  1. Statistical
  2. AI methods

Statistical Models

Always produces the same result from a given set of data.

Will always find the best result within the constraints of their abilities.

X Values

All the factors which might affect the outcome so that the analysis can find some form of correlation.

Y Values

The set of objects and associated variables.

Realigning Axes.

Usually statistics are in high-dimensional space. No way of visualising this data.

In a 3D space, can fit a plane to any 3 points to make the problem a 2D one.

Can approximate the plane to more than 3 points and fit the best plane using root mean square error. This is done using regression, but does mean there will be error in the model.

Can do this with a 3D plane, but the idea is to keep the model as simple as possible.

Principal Component Analysis

The line through multi-dimensional space which describes the most variation in the data (the one with the widest space) is the First Principal Component.

The following Principal Components are always perpendicular and are ordered by the amount of variation from the data and have less importance.

Can analyse the principal components to see which of the original X values contributed most.

Principal components help understand the data better.

Principal Components Regression

The main principal components help to explain what factors had most bearing on the results.

They form a model that could be used for predicting results.

Using a PCA model to predict is known as Principal Components Regression (PRC).

Closely related is PCL.

Principal of Parsimony

Fewer variables, if they produce a model just as good in testing, should be preferred.

Fuzzy Logic

Not clear cut for a definite value. (i.e. not boolean logic).

Seminars

Seminars from Aberystwyth University

Learning from Big Random Weight Networks and Their Training Algorithms

by Xizhao Wang Fellow; IEEE Editor-in-Chief, IJMLC

Introduction to Big Data

The internet, financial institutions, media, medical treatment, scientific research, etc. provide a huge amount of data.

This data is only increasing in size. Big Data is the explosion of information.

Storing Big Data is, as expected, very difficult.

Variety

Structured data is becoming unstructured.

Velocity

Moving from batch jobs to the streaming of data.

Volume

Now around Zettabytes of data around.

Value

Uncertainty of Big Data

There is lots of uncertainty in Big Data, there can be a lot of ambiguity in symbolic data, etc.

Some cybersecurity issues of Smart Buildings, Smart Metering, not-so-Smart Cars and the Smart Grid

Martyn Thomas CBE FREng

Smart Buildings

Offer significant benefits and new vulnerabilities.

Benefits

  • Access Passes, Attendance monitoring, follow me printing
  • Calendar, room booking, etc.
  • CCTV, WiFi, Remote and central monitoring and control
  • Dynamic displays, emergency messages, fire alarms

Vulnerabilities

  • Denial of access, unauthorised access, data compromise
  • Disruption, wasted energy, damage
  • Intrusion, loss of control,
  • Safety.

Smart Cars

Michael Barr’s expert report in the Bookout v Toyota lawsuit. See here.

Safety World

Safety world works on probabilities, but in security there is a direct attacker - can’t argue independence.

Smart Meter Security

End-to-end security: “No trusted components”.

Whether this works or not is unknown.

Vulnerabilities

  • Meters contain clocks and billing data, if these are changed, bills can be manipulated.
  • Meters contain an off switch, misuse could cause distress, harm or substantial disruption.
  • Meter firmware will need to be updated.
  • The Gas meter can only handle low-grade encryption.
Who controls the off switch?

Anyone who has the authority to send the relevant messaged through the DCC.

Anyone who can mount a successful cyber attack on the network.

How will security be assured?
  • Human review of the spec.s
  • Testing the system
  • Focuesed pentesting

Testing only shows that faults do exist, not that there are no faults.

For high confidence you need formal methods.

Cyber Security

Is a through-life discipline.

  • Planning
  • Control
  • Monitoring
  • Response
  • From design through to decommissioning
  • Lifetimes of 20+ years
  • Don’t run anti-virus
  • Monitoring is difficult (and uncommon)
  • How to respond to incidents?
  • Insider threats
  • Lifetime security needs discipline and excellent configuration control through years of maintenance and upgrades.

Cyber Security

Is a tier one threat.