This module focuses on current issues in the field of Software Engineering. The module will involve students in an examination of current research literature of interest to software engineers.
It is okay to produce a paper based on my own paper :-)
Papers provide a good view of what is happening now. Journal papers give a more out-of-date view than conference papers. Blogs typically give the most up-to-date view but are the least verifiable.
Papers should give enough information to replicate the experiment.
It will show some actual data.
The methods now become important: they give details on how the experiment was set up and carried out. Try to pick out the basic methods and work upwards.
References provide a way of tracking back through the sources.
Set down for presenting Abstract State Machines (Feb 28) and Swarm Robotics (Mar 28) and asking questions on Biological Data Sharing (Mar 14) and Swarm Robotics (Apr 4).
9000 words of pure gold.
May address any aspect of software engineering.
MSc should probably review the literature of their project.
Mobile devices (phones, tablets and other handheld devices) are the fastest growing area of computing. Typical applications involve a wide range of software and system issues. This module will investigate those issues, giving students experience and understanding of the best ways of building different types of mobile applications.
Scope of mobile systems, mobile web, native apps, design choices, main platforms, overview of implications (2 lectures)
In 2012 the whole mobile ecosystem’s revenue was $1,551 billion; equivalent to 2.2% of global GDP.
5% CAGR (Compound Annual Growth Rate).
These are only projected figures. The real picture is much bigger, as this only counts money which stays within the ecosystem (it doesn’t include e-commerce, etc.). The effect is much higher.
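A quick worked example of what a compound annual growth rate means. This is only arithmetic, and it assumes the 5% CAGR applies to the $1,551 billion 2012 figure above (the notes do not say so explicitly). The class and method names are made up for illustration.

```java
// Compound Annual Growth Rate helpers (illustrative sketch).
final class Cagr {
    /** Value after `years` of growth at a constant annual `rate` (0.05 = 5%). */
    static double project(double start, double rate, int years) {
        return start * Math.pow(1.0 + rate, years);
    }

    /** CAGR implied by growing from `start` to `end` over `years`. */
    static double rate(double start, double end, int years) {
        return Math.pow(end / start, 1.0 / years) - 1.0;
    }

    public static void main(String[] args) {
        // Assumption: the 5% CAGR applies to the 2012 revenue figure.
        double revenue2012 = 1551.0; // $ billions, from the notes
        for (int n = 0; n <= 5; n++) {
            System.out.printf("%d: $%.0f billion%n", 2012 + n, project(revenue2012, 0.05, n));
        }
    }
}
```

The same `rate` helper works backwards: give it a start value, an end value and the number of years to recover the growth rate.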
Constantly changing given that it’s a new market. The iPhone was the real kickstarter for this in 2007.
Timeline of the history of mobile telephones.
Difference between “smart” and “feature” phones.
Mobile subscribers are growing 4 times faster than the global population. Reducing now (especially in the West) due to market saturation.
More scope for growth in developing countries.
Asia Pacific accounts for approximately half of global subscribers.
Total sim-enabled connections (including machine to machine (M2M) connections). Estimated 7.6% CAGR 2012-2017. Percentage of M2M connections is increasing rapidly.
Multiple devices (tablets/phones/etc.) allow growth in sim-enabled connections.
Mobile data traffic is increasing rapidly. Video is predicted to be one of the fastest-growing types of traffic.
Over time, sales of desktops (PCs & laptops) are decreasing while sales of mobile devices are increasing.
Issue of fragmentation and differentiation, related to:
Top 25 ranked apps in 2012 take 15% of all revenue. Only 2% of App Store newcomers get into the top 250 apps.
Free apps are becoming more of a norm, especially with the saturation of Android.
Revenues are decreasing due to over-the-top services (i.e. using online services over SMS or phone calls).
Operators will push, via their shop sales-assistant commission, those devices they gain the biggest discounts on.
High profile phones promoted, often Android.
Apple disrupts this model due to a locked mindshare.
Psychological hook of “cheap” contract phones. Free now, pay later ideology.
Billing relationship with the customer. Paying for things using your mobile account rather than via a bank account (directly). Small transaction fee for this service to generate revenue.
Something for the future, not quite so prevalent at the moment.
Threat of over-the-top (OTT) services such as Skype. Quality of service is currently an issue here, but voice call is growing very slowly (13% in 2008 to 4% in 2012).
GSMA responded by developing the Rich Communication Services (RCS) specification. Only aimed at Long Term Evolution (LTE) networks (4G). Aims to provide this as a universal service. 17 different implementations by MNOs including AT&T, Telefonica, Verizon and Vodafone.
Nokia sales falling. Haven’t kept up with the trends.
Samsung now the major player. Mainly due to their adoption of Android.
Apple improving with mobiles.
RIM/Blackberry are focusing on development.
Building an Android smartphone involves:
Can also go down the stock phone route.
Inception to market is approximately 6 months when based on an existing OS.
Apple is a little easier due to the locked-in syndrome.
Android leading market player.
iOS popular.
Symbian (dead now) and Blackberry not doing well.
Windows Phone a late starter.
There are 24 listed Android device manufacturers but at least 259 different device models currently on sale.
Lots of different screen sizes, resolutions, hardware capability, etc.
There is only one manufacturer for Apple products.
Version 4.1+ Jelly Bean most prevalent, but versions 4.0, 3.0 and 2.3 are still a large market share.
Security problems, most malware attacks on Android.
iOS is mostly all the same version. However, companies often prescribe a 2-3 year purchase cycle. The corporate world might need to take older versions into account.
Google allows fragmentation to make Android more interesting to OEMs. However, this can get in the way of updates.
Webkit configuration varies; browser fragmentation.
Tried to stop fragmentation in version 3, but this led to new forks. So this has been re-introduced in version 4.
Differentiation gives a unique selling point (USP) for vendors. Sticking with the same model.
Fragmentation leads to security issues and makes the OS more complex to handle screen sizes, etc. More testing needed. Therefore increased cost.
Microsoft and Nokia. A closed environment (or “walled garden”).
No fragmentation/differentiation. Very few OEMs interested.
Buy into it.
Web as OS platform. Firefox OS is trying hard with cheap alternatives. HTML5, JS, CSS3 and APIs to access hardware.
Very fragmented, depending on rendering engine.
Can be configured differently. Must be careful with mobile websites.
Depends on location too (UC Browser more popular than Android in Asia).
Opera mini, for example.
Cheaper and faster over a slow network. Very popular in certain developing countries with low bandwidth.
Cheap in terms of processing too.
JS is handled by the proxy server. Not always perfect and may not be enabled.
App Stores
Basic tools, responsive web, matching platform expectations, testing, delivery. (5 lectures plus practicals)
Progressive enhancement and not graceful degradation.
Native wrapper around HTML and CSS and JS.
Graceful degradation (of a website):
Progressive enhancement (of website):
em over px.
Mobiles have a virtual viewport which bypasses the media queries.
Use the viewport meta-tag to ask for the viewport width to be the same as the actual width.
Research suggests lines of text should be between 45 and 75 characters for comfortable reading. Columns help with this.
Design the site for the mobile device. If media queries are supported and the screen is bigger, add to the existing mobile CSS with extra styling rules and changed font sizes.
Simplification of complex designs is harder than making a simple design more complex.
Client side RWD alone may not be sufficient.
There are some server-side approaches:
Classes of “device”.
RESS (Responsive Web Design and Server-side Components).
Some services will only run on a mobile device (Phone calls, accelerometer).
Some services are more significant on a mobile device (location awareness, etc.), be careful not to run into the Hide and Cry anti-pattern.
Some classes of device are very different (feature versus smart phone).
What we spit out is so different that RWD alone won’t help. Need variants of the website.
Image sizing issues that RWD won’t solve.
Can use HTTP User-Agent request header with a device description repository to work out:
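var width = (window.innerWidth > 0) ? window.innerWidth : screen.width;
// Only reload once: without this check the page would set the cookie and reload forever.
if (document.cookie.indexOf("sitewidth=") === -1) {
    document.cookie = "sitewidth=" + width;
    document.location.reload();
}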
if (isset($_COOKIE["sitewidth"])) {
    $_SESSION["screenwidth"] = $_COOKIE["sitewidth"];
}
// ...
if ($_SESSION["screenwidth"] <= 380) {
    include("includes/mobile-nav.inc.php");
} else {
    include("includes/desktop-nav.inc.php");
}
Device Class: An abstract collection of common characteristics of similar devices and their browsers.
Why? Because we don’t want our server-side code to have to deal with every device (this isn’t scalable).
Does unsupported break progressive enhancement?
Combine client-side RWD with templating in the codebase for different classes of device.
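A minimal sketch of the device class idea, written in Java rather than PHP. It assumes the class is picked from a reported screen width alone, which is a simplification of a User-Agent plus device description repository lookup. Only the 380 threshold comes from the PHP example above; the other breakpoints, the class names and the template names are hypothetical.

```java
// Maps a reported screen width to a coarse device class (illustrative sketch).
enum DeviceClass {
    FEATURE_PHONE, SMARTPHONE, TABLET, DESKTOP;

    // Hypothetical breakpoints in CSS pixels; only 380 comes from the PHP example.
    static DeviceClass fromWidth(int widthPx) {
        if (widthPx <= 240) return FEATURE_PHONE;
        if (widthPx <= 380) return SMARTPHONE;
        if (widthPx <= 1024) return TABLET;
        return DESKTOP;
    }

    /** Template chosen per class, so server code never branches on individual devices. */
    String navTemplate() {
        switch (this) {
            case FEATURE_PHONE:
            case SMARTPHONE:
                return "includes/mobile-nav";
            default:
                return "includes/desktop-nav";
        }
    }
}
```

Server-side code then only has four cases to think about, however many real devices exist, e.g. `DeviceClass.fromWidth(320).navTemplate()`.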
Many HTML5 features aren’t widely supported.
Can’t publish to application stores.
Perhaps go for a hybrid: part native, part HTML.
PhoneGap/Cordova is the best known hybrid technology.
Objective-C, relevant design patterns, platform design considerations, handling data, using sensors and location (5 lectures plus practicals)
Based on C and object-orientation (based on Smalltalk); it has the basic rules of C, but most of the object syntax is Smalltalk-based.
Some repeated information is involved, but this is improving.
Uses late binding (names are looked up at runtime, not compile time).
Good, but unusual, object syntax with some useful extensions and libraries.
Verbose (A: yay boilerplate), designed to be descriptive (A: yay dot-enter programming).
Likely to change with each release (good and bad).
NSLog(@"Message");
NSString *var = @"Some message";
NSLog(@"Message: %@.", var);
BOOL isTrue = YES;
BOOL isFalse = NO;
NSNumber *num = @10;
// Conversion to unsigned int
NSUInteger uInt = [num unsignedIntegerValue];
NSArray *arr = @[...];
// Accessing elements
arr[index];
// Something about NSArrays being immutable objects.
NSDictionary *dict = @{@"key": @"value"};
// Accessing
NSString *val = dict[@"key"];
Messages are Objective-C’s version of methods, where the Java syntax would be:
ObjectName.methodName()
The Objective-C syntax is:
[objectName messageName]
Example:
NSArray *temp = @[@"Test", @"2"];
NSLog(@"%@", [temp description]);
// Logs information about temp
Just need to worry about typing
NSArray *temp = @[@"Test", @"2"];
NSString *result = [temp description];
NSLog(@"%@", result);
Messages can have parameters, however they may act differently than expected.
[object methodName:parameter namedParameter:value]
[[Class alloc] init];
if(condition) {
// ...
} else if(condition) {
// ...
} else {
// ...
}
switch(thing) {
case value:
// ...
break;
}
for(Class *item in iterable) {
// ...
}
Blocks are like basic methods.
void (^block)(void) = ^{
// ...
};
Invoking a block:
block();
In a header file (.h):
@interface Name : NSObject
@property Type *name;
@property NSString *myName;
@end
Implementing the interface (.m):
#import "Name.h"
@implementation Name
@end
Accessing properties
Name *name = [[Name alloc] init];
name.myName = @"Value";
In many languages, you might see:
void m(int x, int y, String text)
A similar method in Obj-C is:
- (void) mAtX: (NSInteger) x
atY: (NSInteger) y
withText: (NSString *) text
Calling this:
[obj mAtX: 2 atY: 32 withText: @"a message"];
The method name is: mAtX:atY:withText:
Defined with the @property syntax.
The attributes in the () control how the property manages the underlying data value.
The IBOutlet is only used on properties that are linked to the user interface.
atomic is used for multi-threading purposes (read locks, etc.).
Under the bonnet, this is an instance variable (ivar).
Good practice is to cover your ears and pretend they’re just properties.
Example *inst;
inst = [[Example alloc] init];
// or
inst = [Example new];
alloc and init is the typical Obj-C way, as classes often have different initialisation techniques. alloc is a class method, init is an instance method.
Similar to Java interfaces. There are some features where you can have required and optional methods.
Optional methods act as empty methods if they are not implemented.
Set in the header file (.h)
Primitive types from C can be used in Obj-C, e.g. int, long, float, double.
Additional types:
BOOL (YES or NO)
NSInteger (typedef to either long or int depending on architecture)
CGFloat (typedef to either float or double depending on architecture)
Memory management is achieved by checking the number of references to an object.
In older code, you needed to write lines of code to state when you wanted to keep an object in memory.
Modern applications use ARC (Automatic Reference Counting).
The compiler tracks when objects should be kept in memory or removed from the application.
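To make the counting idea concrete, a toy model in Java. Java is garbage-collected, so this only imitates the bookkeeping; the class and method names are made up and loosely mirror the old manual retain/release style rather than any real API.

```java
// A toy reference-counted object, mimicking manual retain/release (illustrative only).
final class RefCounted {
    private int retainCount = 1;        // the creator holds the first reference
    private boolean deallocated = false;

    RefCounted retain() {
        if (deallocated) throw new IllegalStateException("object already deallocated");
        retainCount++;
        return this;
    }

    void release() {
        if (deallocated) throw new IllegalStateException("over-release");
        if (--retainCount == 0) {
            deallocated = true;         // a real runtime would free the memory here
        }
    }

    int retainCount() { return retainCount; }
    boolean isDeallocated() { return deallocated; }
}
```

With ARC the equivalent of these `retain()` and `release()` calls is inserted for you at compile time, which is why modern code rarely mentions them.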
Typical project implementations have a simple structure which corresponds to MVC
AppDelegate accesses the model (if there is one).
The storyboard is the view; it describes many screens in a single file.
ViewController is the controller.
ViewController files and link them to the views
ViewController and link outlets and actions to interface (associate views to controllers)
If your newly created app crashes when trying to load a new screen, then you usually have:
The error message should give you a clue as to what is wrong.
You can put each of the views into the storyboard and define how you segue from screen to screen then write code to pass data between them.
UITableView is used to display repeating sections of data. UITableViewCell is a cell in the table.
Unlike classic tables on other platforms, there is only a single column.
Tables contain many sections, which in turn contain many rows.
The entire table can have a header and footer. Sections can also have a header and footer.
Each table must have at least one section to be complete.
Table style is set at build time only.
Cells flush to each other
Not in iOS7.
Each row contains a cell.
When a row’s data is to be displayed, the UITableView will request the cell.
By default, the UITableViewCell is used to define a cell.
The cell is initialised in the method: tableView:cellForRowAtIndexPath:
A table can contain different types of cells for different rows.
The cell is reused.
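A rough model of why cells are reused: only a screenful of cells ever needs to exist, and cells that scroll off screen go back into a pool. This is a Java sketch of the idea only; in UIKit the pool is managed for you (via dequeueReusableCellWithIdentifier:), and the names below are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of table cell reuse: cells scrolled off screen are recycled (sketch).
final class CellPool {
    static final class Cell {
        String text;
    }

    private final Deque<Cell> free = new ArrayDeque<>();
    private int created = 0;

    /** Returns a recycled cell if one is available, otherwise allocates a new one. */
    Cell dequeue() {
        Cell cell = free.poll();
        if (cell == null) {
            cell = new Cell();
            created++;
        }
        return cell;
    }

    /** Called when a row scrolls off screen. */
    void recycle(Cell cell) {
        cell.text = null;               // clear stale content before reuse
        free.push(cell);
    }

    int createdCount() { return created; }
}
```

The practical consequence is that a reused cell may still carry state from its previous row, so every field should be reset when the cell is configured.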
UITableViewCellStyleDefault
UITableViewCellStyleSubtitle
UITableViewCellStyleValue1 (right detail)
UITableViewCellStyleValue2 (left detail)
Content of a cell can be defined in code, but is tedious.
It is more likely that IB will be used to define the layout for the cell and use a View Controller to control the behaviour.
UITableViewController is a subclass of UIViewController that is designed to manage UITableViews.
If a new view based controller is created, the XIB will contain the table linked to this controller.
In order to provide custom behaviour, the following protocols need to be implemented:
UITableViewDelegate
UITableViewDataSource
Techniques for storing and retrieving data on the iPhone.
XML and SQLite provide the best portability.
User preferences.
Persistent map.
Mechanism to store a selected number of data types in a property list (plist).
Technique to save more complex objects.
XML data files.
Small memory footprint SQL database.
More complex system, but offering a powerful way to manage data and relationships. Best integration with iCloud.
iOS applications are run in a protected area on the device. Within this area there is a defined structure of Directories that you can access.
The developer is responsible for managing files that are created within the application area.
Database and library available in iOS.
C library, which needs to be included in the project.
Create a SQLite database on the command line and include in the project.
Need to marshal data into and out of the database, with some conversions (e.g. Strings).
By default, your database is in the default bundle, inside your application.
If read-only data is needed, it can remain there.
Write access needs to be moved to the Documents directory.
Relevant design patterns, platform design considerations, handling data, using sensors and location, handling device variation (5 lectures plus practicals)
Android Inc. started developing an open source mobile OS, acquired by Google in 2005.
Uses the Dalvik Virtual Machine:
The Android platform is currently changing very quickly.
Each manufacturer has a corporate theme to it.
There are other apps which are isolated from the application, but are able to communicate with intents.
Owned by the application.
A special Java VM for running on Mobile Devices.
May be moved to the ART (Android Runtime).
No single main method. The way that Android apps are developed, and the model they follow, doesn’t have a concept of a start point or end point; rather it has the idea of behaviours.
A single screen layout and the code behind.
A functional piece of UI.
A background process.
Does not have a UI, may interact with them though.
A way of sharing a local database.
Queries and cursors used to interact.
Memory of mobile devices is much more limited than on PCs or servers.
Short battery life is an issue.
Lots of apps vying for limited resources.
Android has the opportunity to remove resources from memory when screens are switched.
If necessary a whole application and its process will be removed.
Need to be able to reincarnate apps.
Used to display scrollable lists of data.
Extended by classes to handle clicks, etc.
Acts as a controller in an MVC sense.
Used by the List Activity
Acts as a view in an MVC sense. Does the hard work of sizing rows, etc.
Created by the controller and acts as the data source for the list view.
Acts as a model in the MVC pattern.
Simple array of type T.
Generally a hard-coded list of static data.
A list of maps, where the map keys relate to UI element IDs.
A dynamic adapter which is based on a cursor of an iterable object.
Given a private filestore under: /data/data/package-path
On SD Card
Stored under private filestore.
/data/data/package-path/files
Apps access their private files folder with: Context.getFilesDir();
Context.openFileOutput(String name, int mode);
Context.openFileInput(String name)
On SD Card under the standard location for media files.
/data/data/package-path/cache
/data/data/package-path/shared-prefs
XML only
/data/data/package-path/databases
Legal, social, ethical and professional issues. Selecting platforms and delivery methods, getting the interface right. (3 lectures)
The BCS, under its Royal Charter, is required to establish and maintain standards of competence, conduct and ethical practice for information systems professionals.
The ability to recognise the legal, social, ethical and professional issues involved in the exploitation of computer technology and be guided by the adoption of appropriate professional, ethical and legal practices.
Which laws relate to the use of computers?
What is the impact of computers on society?
How should we act?
What are our professional responsibilities?
This module builds on SE31520 examining further the development processes, techniques and technologies for constructing Internet-based, distributed software systems. Particular emphasis is placed on studying and applying enterprise design patterns. Students, working in small groups, will build, using a modern software engineering methodology, a distributed application that employs the technologies and concepts taught in this and previous modules.
Nigel Hardy awesome tally: 2
Provides a road-map to the module, with a brief overview of main concepts and how they interrelate.
No precise definition.
There are several themes this module will look at:
Applications that involve multiple servers which run different parts of the application so that the business logic is separated from the presentation logic.
Different parts tend to include:
Lots of small applications which build up a larger, more complex, application.
Communication between the servers provides the linkage between the applications.
Supports scalability if done correctly. Also more resistant to failure. Potentially more secure (more important servers firewalled).
Should all be transparent to the user.
Two types of issues:
Using Cloud computing and how relevant it really is.
Examples from eBay of structure and deployment.
Applications that provide services which have destination endpoints with operations.
SOAP XML, WSDL and REST are the big players.
Alternatives to REST?
Examination of technical solutions for building enterprise applications as provided by enterprise Java and .NET for building distributed applications.
These include:
Java is a language (JDK) and a platform (JRE).
The platform is the Java Virtual Machine and a set of APIs.
Four platforms:
A superset of Java SE, providing a development model, APIs and runtime environment (the server).
“For developing and running large-scale, multi-tiered, scalable, reliable and secure network systems” - The Oracle BS
Access the web tier with HTTP(S).
Provides static resources and dynamic page generation. Handles client input and navigation flow with maintenance of state.
JavaBeans provide temporary storage on the Web Tier.
Can do simple logic, could build the whole application in this tier, but would be a bad design.
Provides Enterprise Java Beans (EJB) components.
Access to Java Logic, etc. which provide the business logic.
Logically and typically on other machines, typically “foreign” technology.
Can build Java Applications or Web Applications which access the EJB components directly. Uses remote procedure calls.
Provides:
A set of Java classes packaged into a jar file.
Can be deployed using either:
Annotations are widely used, but can be overridden by descriptors.
Components plus deployment descriptors make up a module, each for one container type.
Four types of EE modules:
.war
.jar
.jar
.rar
A set of modules makes up an Enterprise archive: .ear
Servlets are the oldest EE web component.
A servlet acts as a dynamic page: it has a URL and must receive an HTTP request, from which it generates an HTTP response.
Is a Java object with a managed lifecycle.
Extension of the javax.servlet.http.HttpServlet class.
Managed by the web container.
The first call is:
init()
Any call:
Perhaps calls destroy()
Only one instance is needed by a running server.
The HttpServlet class has stubs for all the HTTP methods, doGet(...) etc.
The service() method handles finding the appropriate method for the HTTP request method.
Common to use the same method to handle GET and POST (for some strange reason, this sounds bad).
Provides access to:
Provides access to:
Can forward requests to other servlets, passing on the request and response objects. Used for branching on context, for example.
RequestDispatcher d = req.getRequestDispatcher("name");
d.forward(req, resp);
A servlet could have instance variables, however these variables are not specific to a connection, rather the whole servlet.
There may also be multiple servlets so there is no guarantee that these will be the same.
Need features for session data.
Reminder that HTTP is stateless.
The request must carry some identification of the session, server usually generates this information.
Remember EU Cookie Law
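A minimal sketch of how a server can layer sessions on top of stateless HTTP: it generates an ID, hands it to the client, and looks the state up again when the ID comes back. This is plain Java, not the servlet API (a real web container does this for you through HttpSession); the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Server-side session table keyed by a generated ID (sketch).
final class SessionStore {
    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();

    /** Creates a session and returns the ID the client must send back (e.g. in a cookie). */
    String create() {
        String id = UUID.randomUUID().toString();
        sessions.put(id, new ConcurrentHashMap<>());
        return id;
    }

    /** Looks up the session for a request; null means an unknown or expired ID. */
    Map<String, Object> find(String id) {
        return id == null ? null : sessions.get(id);
    }

    void invalidate(String id) {
        sessions.remove(id);
    }
}
```

Note that the state lives in the store, not in servlet instance variables, which is what makes it specific to one client rather than shared by everyone.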
Specified in the web.xml descriptor.
Generally: higher level libraries and code reuse.
Internationalisation and localisation are both important. Don’t hardcode for the locale, instead store translations in DB or files and load them dynamically.
Use JUnit where possible (with mocks, stubs, etc.)
Can test Java Beans using an embedded EJB Container.
Service testing is slightly more difficult
Testing the resources in an automated way to avoid manual input.
Provides rapid and reliable regression testing.
Handles:
Specify calls, with parameters or sequences.
Selenium or JMeter are options (JMeter also does load testing).
JMeter also has a NetBeans plugin.
Microsoft framework for developing and deploying applications with:
Similarities with Java (JRE).
Focused on Windows, but can run on other platforms through mono.
The .NET JRE effectively.
Loads and executes .NET “managed code”.
CLR loads code into application domains to provide a level of isolation and the ability to stop and remove a domain.
Provides Just In Time (JIT) compilation, memory management and GC.
Security based on roles and defined by policies.
The application server which runs ASP.NET applications.
Methods for implementing the MVC pattern in a Web Application.
Provides:
Basic API for representing components and managing their state; handling events, server-side validation and data conversion; defining page navigation; supporting i18n and accessibility; and providing extensibility.
A tag library for adding components to web pages and connecting components to server-side objects.
Like most scripting languages, $ and # are used.
${expr} are rvalues (read-only) and are processed immediately.
#{expr} are lvalues (read-write) and have deferred evaluation. Also used for method expressions.
${object.variable} calls object.getVariable() under the covers.
#{object.method} calls object.method().
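A sketch of the getter lookup described above, using plain Java reflection. This is not how any particular EL implementation is written; it only shows the naming convention that turns a property name into a getX() call. The resolver and the Person bean are hypothetical, and boolean "is" getters are ignored to keep it short.

```java
import java.lang.reflect.Method;

// Resolves a property name the way ${object.variable} does: via the JavaBean getter (sketch).
final class PropertyResolver {
    static Object read(Object bean, String property) {
        String getter = "get" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
        try {
            Method method = bean.getClass().getMethod(getter);
            return method.invoke(bean);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("No readable property: " + property, e);
        }
    }
}

// Hypothetical bean used only for this example.
class Person {
    private final String name;

    Person(String name) { this.name = name; }

    public String getName() { return name; }
}
```

So `PropertyResolver.read(person, "name")` ends up calling `person.getName()`, which is why beans used from pages need conventionally named public getters.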
A replacement for JavaServer Pages (which are now deprecated).
Write (X)HTML with special tags which have an XML appearance which will be rendered as HTML with functionality behind it.
<h:body> is the top level tag.
All pages will tend to act like a form, so <h:form> is very commonly used, even in cases where there would not normally be a form.
id
style to associate CSS
rendered is a condition to check if the element should be rendered on the page. The condition can be an expression from the Expression Language.
value the value of the component, again this can be from the Expression Language, which links the view to the model.
Built from standard components.
These can be parametrised and stored in resources; they have a namespace (default: http://xmlns.jcp.org/jsf/composite/ezcomp)
<!DOCTYPE html PUBLIC ...>
<html xmlns="..."
      xmlns:cc="..."
      xmlns:h="...">
    <!-- INTERFACE -->
    <cc:interface>
        <cc:attribute name="x" required="true" />
    </cc:interface>
    <!-- IMPLEMENTATION -->
    <cc:implementation>
        <p>
            <h:outputLabel for="i" value="Repeats" />
            <h:inputText id="i" value="#{cc.attrs.x}" />
            <h:message for="i" />
        </p>
    </cc:implementation>
</html>
Can create template pages which have values to fill in. The inheriting page has to fill in these values.
Write navigation rules in XML, each rule is a page (view). For each page there is a set of outcomes (in string form). For each outcome a next page is specified.
Typically stored in faces-config.xml
Outcomes are generated from the action attributes of components.
There are implicit navigation rules which fall back to a matching page if it can be found.
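The rule lookup can be pictured as a table from (current page, outcome) to next page, with the implicit fallback tried second. A small Java sketch of that lookup follows; it is a model of the idea, not the JSF navigation handler, and all names are hypothetical. Staying on the current page when nothing matches is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Outcome-based navigation: (current view, outcome) -> next view (sketch).
final class Navigator {
    private final Map<String, Map<String, String>> rules = new HashMap<>();
    private final Set<String> knownViews;

    Navigator(Set<String> knownViews) {
        this.knownViews = knownViews;
    }

    void addRule(String fromView, String outcome, String toView) {
        rules.computeIfAbsent(fromView, k -> new HashMap<>()).put(outcome, toView);
    }

    /** Explicit rules win; otherwise fall back to a view whose name matches the outcome. */
    String next(String fromView, String outcome) {
        Map<String, String> outcomes = rules.get(fromView);
        String explicit = (outcomes == null) ? null : outcomes.get(outcome);
        if (explicit != null) return explicit;
        if (knownViews.contains(outcome)) return outcome;   // implicit navigation
        return fromView;                                     // assumption: no match means stay put
    }
}
```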
Implement javax.faces.convert.Converter
Have associated error string displayed if conversion fails (<h:messages>).
Number of default converters for most Java Objects and primitives.
Converters are used in four ways:
converter attribute.
f:convertDateTime or f:convertNumber tag inside the component.
f:converter tag inside the component and refer to the converter (general purpose tag).
Part of the component model, application events are generated by components (ultimately from the rendered page).
The JSF application can map HTTP requests to the specific handling code.
Two types of event:
ActionSource (buttons, links, etc.).
UIInput.
Listeners cause the application to respond to events:
Similar to converters, but only provide validation.
Implement javax.faces.validator.Validator
Again, default validators are provided.
Registered by:
f:validator tag inside the component.
BeanValidator uses validation methods in the bean instead of having to write a separate class. Also allows the validation to be model-state specific.
public void validateVar(FacesContext context, UIComponent toValidate, Object value) {
    int input = (Integer) value; // cast the given value
    if (!condition) {
        // Mark the component invalid and attach a message to it
        ((UIInput) toValidate).setValid(false);
        FacesMessage message = new FacesMessage("error message");
        context.addMessage(toValidate.getClientId(context), message);
    }
}
Starts when the client requests a URL and ends with the server response.
.NET Applications for the web, which uses:
Stores the entire state in a hidden item in the form. Simplistic, but does affect performance and security.
Sent in base64.
Can disable for the entire page, or for specific elements.
Also have the ability to encrypt, to tamper-proof with EnableViewStateMac, or to store server-side.
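The security point is worth seeing once: base64 is an encoding, not encryption, so anything in a hidden field can be read back by anyone who views the page source. A short Java sketch using plain text as stand-in state (the real ViewState is a serialised structure, not a string like this; the class name and sample value are hypothetical).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Shows that Base64 is an encoding, not encryption: anyone can reverse it (sketch).
final class HiddenFieldState {
    static String encode(String state) {
        return Base64.getEncoder().encodeToString(state.getBytes(StandardCharsets.UTF_8));
    }

    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("name=A. Name");   // hypothetical page state
        System.out.println(encoded);
        System.out.println(decode(encoded));       // the original text comes straight back
    }
}
```

It also shows the performance cost: the encoded form is roughly a third larger than the raw bytes, and it travels with every request and response.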
There are different techniques which can be used to store session state:
ASP.NET uses server side session tokens:
String myValue = "A. Name";
Session["name"] = myValue;
myValue = (String) Session["name"];
Can configure where the data is held. Can also turn off the requirement for cookies.
Same as ASP.
Store state in Microsoft’s SQL Server, which can be on another machine.
Store in a separate process, which can be on another machine.
Write your own mechanism.
Default base class for code-behind files.
Implements System.Web.IHttpHandler:
IsReusable - true if the handler can be pooled
ProcessRequest() - Processes the actual HTTP request.
Ways of displaying DB data.
Objects which relate to the business layer of a multi-tier application.
EJBs act as an ORM framework, where the Java Objects are annotated with the DB schema details.
Three types of EJB Session:
Can also have message-driven beans.
Access can be done through:
A transaction is a unit of work which supports “ACID”.
Every task within a unit of work must complete successfully otherwise the transaction is aborted
Atomicity, isolation, durability lead to consistent data. Developer must also define database consistency checks.
Preventing interference from other transactions.
Data written to disc before a transaction can fully complete.
In EJB Transactions have a scope. A transaction manager manages this process.
EJBs support two types of transactions:
Transactional attributes are specified using annotations or in the deployment descriptor.
Outside of the EJB code.
Recommended
Annotate EJB class with:
@TransactionManagement(TransactionManagementType.BEAN)
Offers fine grained control.
Six types of transaction attributes, specified by annotations or descriptor.
The default is Required.
Can annotate the class to set the default for the class.
Transactions are not supported by the method, any current transaction will be suspended during this method call and resumed once it terminates.
Follows the caller; if it was in a transaction, it will remain so, if it was not, this will not create a new one.
The method must be part of a transaction.
If the caller was not in a transaction, this will start one.
This method will always create a new transaction. An existing one will be suspended and will resume after this is complete.
The outer transaction will not roll-back if this one does.
Useful to nest transactions
Must already be in a transaction, otherwise a TransactionRequiredException will be thrown.
Must never be in a transaction, otherwise a RemoteException will be thrown.
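The six descriptions above correspond, in order, to the NotSupported, Supports, Required, RequiresNew, Mandatory and Never attributes. As a study aid, here is a small Java model of what each one does depending on whether the caller is already in a transaction. It is a decision table only, not container code; the enum and outcome names are hypothetical.

```java
// Models how each EJB transaction attribute reacts to the caller's transaction state (sketch).
enum TxAttribute {
    REQUIRED, REQUIRES_NEW, MANDATORY, NOT_SUPPORTED, SUPPORTS, NEVER;

    enum Outcome {
        JOIN_CALLER, START_NEW, RUN_WITHOUT,
        SUSPEND_AND_START_NEW, SUSPEND_AND_RUN_WITHOUT, REJECT
    }

    Outcome onCall(boolean callerInTransaction) {
        switch (this) {
            case REQUIRED:
                return callerInTransaction ? Outcome.JOIN_CALLER : Outcome.START_NEW;
            case REQUIRES_NEW:
                return callerInTransaction ? Outcome.SUSPEND_AND_START_NEW : Outcome.START_NEW;
            case MANDATORY:
                return callerInTransaction ? Outcome.JOIN_CALLER : Outcome.REJECT;
            case NOT_SUPPORTED:
                return callerInTransaction ? Outcome.SUSPEND_AND_RUN_WITHOUT : Outcome.RUN_WITHOUT;
            case SUPPORTS:
                return callerInTransaction ? Outcome.JOIN_CALLER : Outcome.RUN_WITHOUT;
            case NEVER:
                return callerInTransaction ? Outcome.REJECT : Outcome.RUN_WITHOUT;
            default:
                throw new AssertionError(this);
        }
    }
}
```

REJECT stands for the exception cases: Mandatory with no caller transaction, and Never with one.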
Problems:
Provide four (fairly standard) levels of transactions:
TRANSACTION_READ_UNCOMMITTED
TRANSACTION_READ_COMMITTED
TRANSACTION_REPEATABLE_READ
TRANSACTION_SERIALIZABLE
In bean-managed transaction, can specify isolation using the JDBC API:
connection.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE) ;
Using serializable guarantees data consistency, but may affect performance.
Uncommitted changes are visible to other transactions, other transactions can change rows that another transaction has read.
May result in inconsistencies, as a rollback by T1 may mean that T2’s data is out of date.
Rows updated in a transaction cannot be read by another transaction, but rows read by one transaction can be changed by another.
Rows updated by T1 cannot be read by T2.
Rows read by T1 can be updated by T1 or T2.
Rows read by one transaction cannot be updated by any transactions.
Rows read by T1 cannot be updated by T1 or T2.
The transaction appears to have a lock on the data.
Checked exceptions (including application exceptions) do not cause an automatic rollback, but can be set using @ApplicationException(rollback=true)
Unchecked exceptions and RemoteExceptions automatically cause a rollback and the EJB instance is discarded.
Getting a reference to object instances without instantiation or passing in as parameters, providing good decoupling.
There are two mechanisms for this:
Annotations to fields or methods specify the injection points.
Provides a runtime binding which provides inversion of control.
Compatible abstractions are required to avoid typing errors; the code using an object is not responsible for choosing an implementation or creating the instance.
Injection is often the implementation for this, but can use:
Inject references to objects in the JNDI namespace into any container-managed object.
Gets a reference to a resource using:
InitialContext
lookup
Inject as an interface type; the code is independent of the specific implementation.
Not type safe!
public class MyServlet extends HttpServlet {
    // Field injection: the container looks up the resource and sets the field.
    @Resource(name="java:comp/DefaultDataSource")
    private javax.sql.DataSource dsc;
}
public class MyServlet extends HttpServlet {
    private javax.sql.DataSource dsc;

    // Setter injection: the container calls this method with the resource.
    @Resource(name="java:comp/DefaultDataSource")
    public void setDsc(javax.sql.DataSource dsc) {
        this.dsc = dsc;
    }
}
The method name must begin with set, with a void return type and only one parameter.
Libraries which provide a mapping from a relational database to objects in an object-orientated paradigm.
A query language for .NET languages which is akin to SQL.
Access to database at SQL level or through an ORM Framework
Provides direct access to a database.
Only provides relational records, not Objects.
Access to naming and directory services, part of the Java SE API. Looks up objects by name, returning values or references.
API for the clients.
Service provider interface (SPI) adds new directory and naming services.
Provides an ORM mapping of Java Object using Java annotations. Provides:
(So CRUD then? -A)
Entities are JavaBeans annotated with @Entity. They are mapped automatically, but instance variables can be labelled with extra details; @Column can map Java types to SQL types.
@Id is required.
(JPA can be a real pain though, he glossed over this just a little bit -A).
Any Java object can be made managed by its container. CDI defines scopes, and a managed object has a declared (or default) scope.
This can then be injected into any other managed object.
@ApplicationScoped
@SessionScoped
@RequestScoped
@ConversationScoped
Covers the AJAX exchange (can be extended using an ID).
@Dependent
Inherits the scope from the object it is inserted into.
This is the default scope.
import javax.inject.Inject;

public class Printer {
    @Inject
    Greeting greeting;
}
public class Printer {
@Inject
@Informal
Greeting greeting;
}
Can create a subtype of Greeting which is annotated using @Informal to allow variants of a type.
Acts as a middleware which decouples:
Which is usually machine-to-machine.
There are a lot of standards.
A Java API which “allows applications to create, send, receive and read messages using reliable, asynchronous, loosely coupled communication.”
Requires an implementation.
Access Connection Factories or Queues using injection.
Most patterns require middleware (all can use it).
Possible roles include:
JavaEE application servers must have one. GlassFish has Open MQ.
.NET has one as part of the service bus.
Apache ActiveMQ is a good offering.
Authentication & Authorisation, encryption of messages, integrity of messages, transactional, etc.
Some protocols and some implementations are already on offer.
Methods of performing B2B operation.
Powerful, but not generally available as an API. Need a Domain Specific Language.
Marshalling and unmarshalling are handled by generated code.
Some distribution services are provided.
Access to CORBA IIOP.
Builds on services provided by sockets.
Has the problem of specificity to Java. Even with IIOP it is still better in an intranet.
Focuses on synchronous transmission.
Use HTTP methods (GET, POST, PUT, DELETE, PATCH and HEAD) with a URI for the API. Return any form of data; usually XML, JSON or YAML.
Very constrained on header information.
Expressed as XML supporting interoperability (both Business to Business (B2B) and EAI).
Typically sent over HTTP but can use other network protocols.
Usually generated automatically using facilities in a program, but it is possible to create messages manually.
More popular than CORBA, extensible through the use of XML.
HTTP avoids firewalls.
Industry backing:
POST /webservice/OrderProcessing HTTP/1.1
...
<?xml version="1.0" encoding="UTF-8" ?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                  xmlns:ns1="http://orders.borthdistributors.co.uk">
  <soapenv:Header>
    <!-- Optional. Meta-data in form of header blocks. e.g.: security credentials, id, etc. -->
  </soapenv:Header>
  <soapenv:Body>
    <!-- Mandatory. Arbitrary application-specific XML data or elements that map to method/procedure calls or code that processes the XML document. Could be fault data -->
  </soapenv:Body>
</soapenv:Envelope>
SOAP manages the process of communicating information but doesn’t define operations, data types or faults of the service.
WSDL is an XML description of a service, generally autogenerated by tools.
In the specification, there are the following elements that define a service.
Generates proxy code for you. A set of asynchronous methods are provided:
BeginMethodName(...)
EndMethodName(...)
How do you expose types as parameters and return values in a way which is interoperable?
Value types map onto XML Schema types; classes or structs are converted into complex types in the schema.
Arrays and collections are supported by this, but not all types can be mapped easily (e.g. custom types where methods are used to provide access to calculated data). (A: as you’d expect really)
Splits Application domain from communication domain.
Brings together lots of different interoperability technologies.
There is no such thing as a silver bullet for software development processes.
Methods for testing a system.
A basis on which testing is built.
Testing the smallest possible unit.
Test that a part provides an expected function.
Tests that a part will not fail to function under certain circumstances.
Testing larger units of work combined together.
Have we built the right thing?
Does it conform to standards?
Our testing of the entire system.
Testing the system is acceptable for the client.
Write a formal specification and then program to that specification so that something can be proved to have worked.
Focus on up front design before development
Focus on providing features without large amounts of up-front development and client involvement.
Promotes collective (no single author) ownership of code, with refactoring.
Planning game to put features.
Can use a metaphor to describe the system.
Write tests first then produce code to pass those tests (but no more).
Break the system down into user stories, make these stories into features which can then be developed independently.
Need some overall architecture.
Hack it until it works.
Self organising teams with a focus on co-location of all team members and communication.
Platform Independent Model -> Platform Dependent Model.
Can use tools like executable UML.
Methods for developing software in a more unmanaged and dynamic way.
A big in-house style of development where every developer understands the big picture.
A very dynamic way of developing, in which not every developer has the big picture but each understands their own small part; usually needs some figureheads.
Getting a large number of people to come up with potential solutions and selecting the best ones.
Playing with a system and getting it to do tasks it was not necessarily designed to do.
A number of enterprise application design patterns will be reviewed within the context of technologies discussed in 2. Building multi-tier applications requires developers to know more than just technologies, they must also know and use design techniques suited to their development. Students will learn about reusable enterprise design patterns, and in particular those used for the development of distributed, multi-tier applications.
Organizes business logic by procedures where each procedure handles a single request from the presentation.
An object model of the domain that incorporates both behavior and data.
A single instance that handles the business logic for all rows in a database table or view.
Defines an application’s boundary with a layer of services that establishes a set of available operations and coordinates the application’s response in each operation.
An object that acts as a Gateway to a database table. One instance handles all the rows in the table.
An object that acts as a Gateway to a single record in a data source. There is one instance per row.
An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data.
A layer of Mappers that moves data between objects and a database while keeping them independent of each other and the mapper itself.
Saves a database ID field in an object to maintain identity between an in-memory object and a database row.
Maps an association between objects to a foreign key reference between tables.
Saves an association as a table with foreign keys to the tables that are linked by the association.
Has one class perform the database mapping for a child class.
Maps an object into several fields of another object’s table.
Saves a graph of objects by serializing them into a single large object (LOB), which it stores in a database field.
Represents an inheritance hierarchy of classes as a single table that has columns for all the fields of the various classes.
Represents an inheritance hierarchy of classes with one table for each class.
Represents an inheritance hierarchy of classes with one table per concrete class in the hierarchy.
A structure to organize database mappers that handle inheritance hierarchies.
Holds details of object-relational mapping in metadata.
An object that represents a database query.
Mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects.
Splits user interface interaction into three distinct roles.
An object that handles a request for a specific page or action on a Web site.
A controller that handles all requests for a Web site.
Renders information into HTML by embedding markers in an HTML page.
A view that processes domain data element by element and transforms it into HTML.
Turns domain data into HTML in two steps: first by forming some kind of logical page, then rendering the logical page into HTML.
A centralized point for handling screen navigation and the flow of an application.
Provides a coarse-grained facade on fine-grained objects to improve efficiency over a network.
An object that carries data between processes in order to reduce the number of method calls.
Prevents conflicts between concurrent business transactions by detecting a conflict and rolling back the transaction.
Prevents conflicts between concurrent business transactions by allowing only one business transaction at a time to access data.
Locks a set of related objects with a single lock.
Allows framework or layer supertype code to acquire offline locks.
Stores session state on the client.
Keeps the session state on a server system in a serialized form.
Stores session data as committed data in the database.
An object that encapsulates access to an external system or resource.
An object that sets up a communication between two independent objects.
A type that acts as the supertype for all types in its layer.
Defines an interface in a separate package from its implementation.
A well-known object that other objects can use to find common objects and services.
A small simple object, like money or a date range, whose equality isn’t based on identity.
Represents a monetary value.
A subclass that provides special behavior for particular cases.
Links classes during configuration rather than compilation.
Removes dependence upon problematic services during testing.
An in-memory representation of tabular data.
Split everything into manageable chunks: data, load and usage.
If you can’t split it, you can’t scale it
Motivations:
Group data using standard data modelling techniques.
Logical hosting: abstract the logical representation from the host's physical location.
Support combining and splitting without code change.
Split databases horizontally along primary access path.
Multiple split patterns:
Aggregation/routing in Data Access Layer:
Absolutely no client-side transactions, two-phase commits.
Auto-commit for vast majority of DB writes.
Anonymous PL/SQL blocks for transactions within single database.
Consistency without transactions using:
Has the added benefits:
User session flow moves through multiple application pools
Absolutely no session state in application tier.
Transient state maintained/referenced by:
Consumers subscribe to event.
Managing timing conditions
Read and transform item updates from primary database.
Reliable multicast
Search nodes listen to updates
Schedule offline batch processes
Most appropriate for:
Often drives further downstream processing through message dispatching.
Prefer adaptive/automated systems to manual systems.
Define service-level agreement (SLA) for a given logical event.
Consumer dynamically adjusts to meet defined SLA with minimal resources.
Automatically adapt to changes in:
Dynamically adapt experience
Feedback loop enables system to learn and improve over time:
Best practices:
Build all systems to be tolerant of failure.
Motivation:
Application servers log all requests
Messages broadcast on multicast message bus
Listeners automate failure detection and notification.
Absolutely no changes to the site which cannot be undone.
Code deployment: Rollout/Rollback
Feature deployment: Wire on/Wire off
Detecting that a resource is down or slow.
Application “marks down” the resource.
Non-critical functionality is removed or ignored
Critical functionality is retried or deferred
Explicit “markup”
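The mark-down idea can be sketched as a small guard object placed in front of a resource. This is only an illustration under stated assumptions: ResourceGuard, report, nonCritical and the slow-response threshold are hypothetical names, not part of any real framework, and the sketch is not thread-safe.

```java
import java.util.function.Supplier;

// Hypothetical helper illustrating "mark down" / "mark up"; not taken from a real library.
final class ResourceGuard {
    private final long slowThresholdMillis;
    private boolean markedDown = false;

    ResourceGuard(long slowThresholdMillis) {
        this.slowThresholdMillis = slowThresholdMillis;
    }

    // Report the outcome of a call; a failure or a slow response marks the resource down.
    void report(boolean succeeded, long elapsedMillis) {
        if (!succeeded || elapsedMillis > slowThresholdMillis) {
            markedDown = true;
        }
    }

    // Marking up is explicit: the guard never re-enables the resource on its own.
    void markUp() {
        markedDown = false;
    }

    boolean isMarkedDown() {
        return markedDown;
    }

    // Non-critical functionality is skipped while marked down; the fallback is used instead.
    <T> T nonCritical(Supplier<T> call, T fallback) {
        return markedDown ? fallback : call.get();
    }
}
```

A caller would wrap optional features (recommendations, say) in nonCritical so that a slow back end degrades the page rather than failing it.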
Examination of current issues that are related to 2 and 3, e.g. scalability; approaches to testing and deployment; the use of cloud computing.
Balance the load of the request across a number of different machines.
There are hardware and software solutions for performing this.
Benefits:
Has a single point of failure.
Allocate in a circular pattern
Allocate to the machine with the least connections
Allocate to the machine with the quickest response time.
Gain even more information about the servers and allocate based on this.
Apply a weighting to each machine and allocate accordingly.
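Three of the allocation strategies above (circular, least connections and weighted) can be sketched in a few lines. The Backend and LoadBalancer classes are hypothetical names for illustration; a real balancer would also track health and update connection counts itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical description of one machine behind the balancer.
final class Backend {
    final String name;
    final int weight;              // relative capacity, used by weighted allocation
    int activeConnections = 0;     // maintained by the caller in this sketch

    Backend(String name, int weight) {
        if (weight < 1) throw new IllegalArgumentException("weight must be positive");
        this.name = name;
        this.weight = weight;
    }
}

final class LoadBalancer {
    private final List<Backend> backends;
    private final AtomicInteger rrCounter = new AtomicInteger();
    private final AtomicInteger weightedCounter = new AtomicInteger();

    LoadBalancer(List<Backend> backends) {
        if (backends.isEmpty()) throw new IllegalArgumentException("no backends");
        this.backends = new ArrayList<>(backends);
    }

    // Round robin: allocate in a circular pattern.
    Backend roundRobin() {
        int i = Math.floorMod(rrCounter.getAndIncrement(), backends.size());
        return backends.get(i);
    }

    // Least connections: pick the machine currently serving the fewest connections.
    Backend leastConnections() {
        Backend best = backends.get(0);
        for (Backend b : backends) {
            if (b.activeConnections < best.activeConnections) best = b;
        }
        return best;
    }

    // Weighted round robin: a backend with weight w receives w turns per cycle.
    Backend weighted() {
        int total = 0;
        for (Backend b : backends) total += b.weight;
        int slot = Math.floorMod(weightedCounter.getAndIncrement(), total);
        for (Backend b : backends) {
            if (slot < b.weight) return b;
            slot -= b.weight;
        }
        throw new IllegalStateException("unreachable: weights are positive");
    }
}
```

Response-time-based allocation follows the same shape as leastConnections, comparing a measured latency instead of a connection count.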
Cost basis - try to keep on commodity hardware.
Functional segmentation into logical function (example is the database; split the database into the segments, continue splitting until no more splits can be made).
Horizontal split using an internal ORM implementation, requires additional routing.
Integrating separate applications so that they work together.
Enterprises tend to have numerous systems which weren’t implicitly designed, for example:
These may
Things “fused” (read: dependent on one another) together.
Multi-tiered applications tend to be tightly coupled.
Messaging systems offer the chance to be loosely coupled by decoupling the ends of the communication, even offering asynchronous communication.
Export data from one system and import into another using a file system.
Works with different technologies.
Issues of file format, shared file systems, timing.
Question of how data is exported and imported?
Automation problems. File locking might be required.
Collection of applications that need to share common data.
Quick access to information for each application.
DBMS removes a lot of issues from file integration.
Tightly coupling the applications to a common area.
Invoking actions in addition to sharing data, using some form of RPC methodology:
Again needs a shared understanding of the data. No shared file format.
Timing question of when do we start the transfer.
Communication might fail, does that affect the coupling of the two applications?
Share data by sending Messages over a Message Channel. Application(s) pick up messages and process the information.
Fire and forget for the sender.
The message channel stores and forwards messages to provide reliable delivery, which mimics the real world.
Example system using message systems:
Logical pipe between senders and receivers; the implementations of this vary:
Each channel has an identity, typically a name.
Different channels exist for different methods of solving problems.
A unit of information containing a header and a body.
Different types of message:
There are ways to specify how long messages are relevant for
Sending messages might involve more than just forwarding.
Applying filters to messages
Routing messages to other channels and/or filters based on the content of a message
The interfaces to the senders and receivers
Isolate different stages and then translate later.
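Filtering and content-based routing can be sketched with in-memory queues standing in for message channels. Everything here is an assumption for illustration: the Message and Router classes, the "type" header used for routing and the "unrouted" fallback channel are hypothetical, and a real system would use a messaging middleware rather than ArrayDeque.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Predicate;

// A unit of information containing a header and a body.
final class Message {
    final Map<String, String> headers;
    final String body;

    Message(Map<String, String> headers, String body) {
        this.headers = new HashMap<>(headers);
        this.body = body;
    }
}

// Hypothetical router: applies a filter, then routes on the "type" header.
final class Router {
    private final Map<String, Queue<Message>> channels = new HashMap<>();
    private final Predicate<Message> filter;

    Router(Predicate<Message> filter) {
        this.filter = filter;
    }

    // Each channel has an identity, here simply its name.
    Queue<Message> channel(String name) {
        return channels.computeIfAbsent(name, k -> new ArrayDeque<>());
    }

    // Returns false if the filter dropped the message, true if it was queued.
    boolean send(Message m) {
        if (!filter.test(m)) {
            return false;
        }
        String target = m.headers.getOrDefault("type", "unrouted");
        channel(target).add(m);
        return true;
    }
}
```

The sender only ever talks to the router, so new channels or filters can be added without changing sender or receiver code.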
The challenge for these systems is how do we know what’s going on, given that there may be many different queues in a system.
Object-to-object communication is easy(ish) to follow using a debugger or log.
Web systems are a little harder, but possible to debug and monitor.
But how do you monitor message based systems?
WGRUS has four different channels to interact with customers:
Implementing this, we get something like this:
Looking in at the inventory request:
Because an order may have many items, split the order up:
To know which item is for which type, we need to enrich the data at the order stage:
The high level view of the order process now looks like this:
Need to know the status of the order as it can take some time to process.
Using a message store we can log the system:
In situations where point-to-point MQs are used a wire tap is needed:
A process manager can be added to manage the flow of messages in the queues, providing two main bits of functionality:
What happens if an address changes?
What happens if the address change is not part of an order?
How can we use the cloud for advanced internet applications?
What is the cloud?
Principles of cloud computing
X as a service
Largely a new business model.
Just computers, networking and applications, but arranged differently; pay for usage rather than buying the infrastructure, etc.
What problems is the cloud trying to solve?
How can the cloud be used for:
The Cloud at Your Service, Rosenberg and Mateos, 2011, Manning Publications
Cloud providers have large data centres at locations around the world.
The web browser is your computer.
(Pay?) for resources used.
Is this different from managed servers?
Is this different from web applications?
A lower level set of infrastructure to choose from:
Examples:
Preparing for the examination in June
This question is about development technologies and enterprise application patterns applied to the HoBo Room Brokering application, developed during the module.
.mdf
)Session["myKey"] = ...
@WebMethod on methods
@WebService on class
[WebMethod...] on methods
[WebService...] on class
.asmx still exists, but older.
WCF: Windows Communication Foundation
Problems of mapping data types between languages.
Public properties/fields exposed for sending over a web service (attributes in .NET)
Raw data only.
Underlying standards and tool support.
The Internet is a complex, multi-organisation network reaching nearly all parts of the world. The functioning of this network and the applications running upon it depend on a complex set of protocols. This module addresses the fundamental aspects of the most important issues that permit the network and its applications to operate successfully. The module also addresses the current threats to the Internet and topics still emerging from R&D studies around the world.
Postal service analogies: 4
This module discusses the detailed underlying operation of the Internet and its constituent components and is an essential topic in its own right as well as providing a solid foundation for much of the other material covered in the MEng.
20 credits = 200 hours of work.
44 hours of lectures, around 20 hours of practicals.
This leaves about 140 hours of personal study, including extra practical work.
There is an assignment worth 40% of the marks (2000 word report).
Text book study and revision.
2 hour exam.
Practical work sessions focusing mainly on the electronics and hardware of network issues.
Using the computer connected digital oscilloscopes or picoscopes.
Running nslookup minted.dcs.aber.ac.uk whilst capturing packets on port 53 captured 8 packets.
The type of the message was of type A and of class IN and was a recursive lookup. There was no response because the message was a query.
The server this was sent to was 193.60.11.253. In /etc/resolv.conf the nameserver: 127.0.1.1 c57net.aber.ac.uk.
1 question and 3 answers were held in the first answering packet. 4 authoritative records were received specifying the 2 authoritative nameservers from dcs.aber.ac.uk and 2 from yale.ac.uk.
The additional records provided all possible IP addresses of all nameservers contacted during the lookup to cache the information.
nslookup set type=MX aber.ac.uk was then captured. The answer record contained: Mail exchange: aber-ac-uk.mail.protection.outlook.com.
The authoritative record contained nameservers for: aber.ac.uk (3) and yale.ac.uk (1).
nslookup set type=SOA dcs.aber.ac.uk was then captured.
The email of the person responsible for this domain is: cs-root.aber.ac.uk.
There are 4 authoritative nameservers for this domain: dcs.aber.ac.uk (2), aber.ac.uk (1) and yale.ac.uk (1).
nslookup aber-ac-uk.mail.protection.outlook.com.
The response from this was from Microsoft. 2 IP addresses which can switch (load balanced?).
nslookup 193.60.11.36 had a type of PTR for the name 36.11.60.193.in-addr.arpa. Still used the standard nameservers.
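The PTR name used for a reverse lookup is just the IPv4 octets in reverse order with in-addr.arpa appended. A minimal sketch of that transformation follows; the class and method names (ReverseDns, ptrName) are hypothetical, and it builds the query name only rather than performing a lookup.

```java
// Hypothetical helper: builds the reverse-lookup (PTR) name for an IPv4 address.
final class ReverseDns {
    private ReverseDns() {}

    static String ptrName(String ipv4) {
        // The -1 limit keeps trailing empty strings so "1.2.3.4." is rejected.
        String[] octets = ipv4.split("\\.", -1);
        if (octets.length != 4) {
            throw new IllegalArgumentException("Expected four octets: " + ipv4);
        }
        StringBuilder name = new StringBuilder();
        // Octets are written least-significant first, as in 36.11.60.193.in-addr.arpa.
        for (int i = 3; i >= 0; i--) {
            int value = Integer.parseInt(octets[i]);
            if (value < 0 || value > 255) {
                throw new IllegalArgumentException("Octet out of range: " + octets[i]);
            }
            name.append(value).append('.');
        }
        return name.append("in-addr.arpa").toString();
    }
}
```

Reversing the octets puts the most general part of the address last, matching how DNS names delegate from right to left.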
nslookup sillyname gave an authoritative and non-authoritative response (one authoritative for sillyname.c57net.dcs.aber.ac.uk and non-authoritative for sillyname). Uses two different sets of nameservers, the Aberystwyth local ones and the root nameserver.
dig @dns0.aber.ac.uk 98.34.124.144.in-addr.arpa PTR shows lots of packets, initially gathering DNS information, then on 144.124.34.98.
dig www.yahoo.com A shows that Yahoo uses CNAME instead, with an IP address of 87.248.122.122.
A revision of the basic issues in data communication.
In the 1970s there was no master plan, overall structure nor agreements on application protocols.
Proprietary protocols and architectures led to a large amount of anarchy.
In 1977 ISO established committees and subcommittees and so on and so forth.
Not just ISO doing this, telecommunications (CCITT) also got involved.
Provides a basic framework using a “divide and conquer” principle.
Uses layering to reduce complexity, where each layer handles one (group of) problem(s).
Discuss the statement: The existence of a communications framework like the OSI model promotes competition between companies.
A detailed study of variants of the technologies collectively known as Ethernet.
A passive, contention-based broadcast technology that uses baseband signalling.
No one device controlling the network.
Each device must compete with every other device for access to the network.
Every device on a shared network hears broadcast transmissions.
Entire bandwidth of a cable for a single transmission.
Derived from work by:
Published in Ethernet (DIX Blue Book) V1.0, September 30th 1980.
802.3 Revision D dated December 1982 published by IEEE.
Revision F published July 1984 and also issued as draft proposal 8802/3 by ISO.
Distributed Control
Simple Algorithm:
1: if (network active)
       goto 1
   transmit(data)
   check for collision
   if (collision)
       transmit(JAM)
       delay(a random time)
       goto 1
   transmission complete
Checking for collision is the difficult part. Can check:
1, is there a 1 on the wire.
Delay must be random to avoid future collisions and not have priority.
Very rigorous specifications. Maximum length of 500m and limits on bend.
These are to stop the electric current being interfered with.
Hard limit on propagation: c (speed of light).
Transceiver (MAU) to link devices to co-ax conveniently.
Terminators absorb the signal at the ends of the cable.
Repeaters repeat the signal onto other cables, repeat a perfect version of the signal (if possible).
If data is inserted into the network, it should reach every member on the network.
Need to know how long it will take for data to get across the network.
Rules tend to be the distance between two extremes in the network is 5 of the co-ax cables and 4 repeaters. 2 of the co-ax cables must be point to point without any devices attached.
Bridges, switches, hubs and routers help build larger networks.
Worst case:
B starts transmitting when A's packet has nearly reached B.
B must corrupt, at least, the last bit of A's packet.
Therefore, packet length must be such that time to transmit is greater than twice the transmission delay for longest route in the network.
For a 2.5km network at 10Mbit/sec, the slot time is 51.2 microseconds.
Smallest allowed packet needs to be the number of bits it takes to reach from one end of the network to the other, times 2, plus a little extra for luck.
512 bit packets capture the network for 10BASE5.
“Your bits are now all over the network” - Dave Price, 10/10/2013
Slot time s.
Path time p.
s > 2p
Total about 499 bits worst case. Specified as 512 bits to give a safety margin.
Number of Octets | Field Usage |
---|---|
7 | Preamble |
1 | Start of frame delimiter |
6 | Destination Address |
6 | Source Address |
2 | Length in 802.3 |
1500 | LLC data and padding |
4 | Frame Check Sequence |
48 bits long. Designed to be globally unique.
Globally unique should be completely unique, not always though.
Not expected to be changeable, but can now be programmed (simply, apparently; nmap --spoof-mac 0).
Minimum frame size is 512 bits, not including the preamble.
Maximum frame size is 1518 octets (1500 octets of data).
Above assumes 48 bit addresses which IEEE 802.3 says “shall be used” for 10 Mbps networks.
9.6µs inter-frame gap to allow desynchronisation.
Time for frame is 51.2µs plus 6.4µs for preamble.
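The timing figures above all follow from dividing a bit count by the line rate. A small sketch of that arithmetic, assuming a 10 Mbit/s network; EthernetTiming and microseconds are hypothetical names, and the 96-bit figure for the inter-frame gap is the bit count that corresponds to 9.6µs at this rate.

```java
// Hypothetical helper for the arithmetic only; not part of any networking API.
final class EthernetTiming {
    private EthernetTiming() {}

    // Time to place `bits` on the wire at `mbps` megabits per second.
    // bits / (10^6 bits per second) gives seconds * 10^6, i.e. microseconds.
    static double microseconds(long bits, double mbps) {
        return bits / mbps;
    }

    // Convenience: convert a size in octets to bits.
    static long bitsOf(long octets) {
        return octets * 8;
    }
}
```

At 10 Mbit/s, microseconds(512, 10.0) gives the 51.2µs minimum frame (slot) time, microseconds(bitsOf(8), 10.0) gives 6.4µs for the 8 octets of preamble plus start delimiter, and microseconds(96, 10.0) gives the 9.6µs inter-frame gap.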
The transmitter makes a maximum of 16 attempts to send a frame.
Prior to each retry, the transmitter waits for an integer multiple of slot times determined by the following algorithm.
Rescheduling of each transmission uses “truncated binary exponential backoff”
This ensures a random but increasing delay if many collisions occur:
DelaySlots = rand[0 <= r < 2^k]
Where k = min(n, 10) for the nth retransmission.
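The backoff rule can be sketched directly from the formula: pick a random number of slots in the range 0 to 2^k - 1, with k capped at 10. The Backoff class and its method names are hypothetical, and java.util.Random stands in for whatever randomness source a real interface would use.

```java
import java.util.Random;

// Hypothetical sketch of truncated binary exponential backoff.
final class Backoff {
    static final int BACKOFF_LIMIT = 10;   // k stops growing after the 10th retransmission

    private Backoff() {}

    // Exclusive upper bound 2^k on the delay, where k = min(n, 10).
    static int slotRange(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("retransmission number must be >= 1");
        }
        return 1 << Math.min(n, BACKOFF_LIMIT);
    }

    // Number of slot times to wait before the nth retransmission: 0 <= r < 2^k.
    static int delaySlots(int n, Random rng) {
        return rng.nextInt(slotRange(n));
    }
}
```

The range doubles after each collision, so stations that keep colliding spread out over more slots, while the cap at k = 10 bounds the worst-case wait at 1023 slot times. Giving up after the maximum number of attempts is left to the caller.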
All access to the 802.3 LAN only completes with some probability. It is thus impossible to guarantee transfer rates. When the net is very busy collisions might go on forever.
Debates often take place on the usability of CSMA/CD LANs for real time use. The answer depends on the true use intended rather than the LAN.
Partition LAN to segregate load, add reliability, add security.
Combine remote LAN segments into a single logical network.
Combine separately developed and controlled LANs.
IEEE 802 LANs often include bridges.
Repeaters clean and forward all data.
Bridges selectively forward data. It will store and forward complete packets.
Forwarding based on header information,
Sometimes known as MAC level relay.
Hosts discover the route to each other host and are very aware of the presence of multiple LANs coupled by bridges.
Needs identifiers for bridges and LANs.
Data is transmitted with routing attached.
Bridges obey the routing.
The bridge learns (or is told) the LAN on which each address exists. Hosts need not know anything about the location of other hosts and indeed are not even aware of the presence of the bridges.
Frames which arrive are handled in one of 3 different ways:
If destination address on same LAN as source address then discard the packet.
If destination address on different LAN, forward packet.
If location of destination address is not known then ‘flood’.
‘flood’: forward the packet to every other network attached to the bridge.
Bridges listen to the traffic and can learn where machines are by looking at the source addresses in the packets.
The address tables need to be quite large, depending on the network.
Generally there is a time-out associated with the addresses known.
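The three cases and the learning step fit in a single method. The sketch below is illustrative only: LearningBridge, Action and handle are hypothetical names, MAC addresses are plain strings, ports are integers, and the time-out on learned addresses is omitted for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a transparent (learning) bridge's forwarding decision.
final class LearningBridge {
    enum Action { DISCARD, FORWARD, FLOOD }

    // Learned table: which port (LAN) each address was last seen on.
    private final Map<String, Integer> portOf = new HashMap<>();

    Action handle(String source, String destination, int inPort) {
        // Learn: the source address must be reachable through the arrival port.
        portOf.put(source, inPort);

        Integer outPort = portOf.get(destination);
        if (outPort == null) {
            return Action.FLOOD;          // location unknown: send to every other port
        }
        if (outPort == inPort) {
            return Action.DISCARD;        // destination is on the same LAN as the source
        }
        return Action.FORWARD;            // destination is on a different LAN
    }
}
```

A first frame to an unknown host is flooded; once that host replies, its port is learned and later frames are forwarded only where needed.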
Address Learning works well if there are no alternative routes in the internetwork connections (i.e. a tree structure).
However there often are alternative routes, so bridges might cause loops.
Need a protocol to avoid cycles.
A result from graph theory states:
For any connected graph, there is a spanning tree of edges which maintains the connectivity but contains no closed loop.
Each LAN represents a graph node and each bridge corresponds to an edge.
Initially every bridge thinks it is the root and it broadcasts a BPDU to assert this fact.
If a bridge gets a BPDU indicating that a ‘superior’ bridge exists, it assigns its root port and the path cost to root.
If a bridge gets a BPDU from a bridge with a shorter root path it releases any claim to be the designated bridge for the segment.
The lowest numbered bridge becomes the root.
Bridge ports which are not root or designated ports are blocked.
Connect two (or more) adjacent LANs. Throughput is likely to be high. Hosts not likely to notice much performance degradation unless waiting for each packet to be acknowledged.
Connect two (or more) LANs which are widely separated. Bridge consists of two ‘half bridges’ connected by a WAN type link. Link typically 64Kbps to 2Mbps.
Bridges often available in a managed form. These are managed from a station and can load the bridges with lots of things.
Lots of monitoring, etc.
Another protocol needed and this needs to be standardised.
A brief look at fast and wireless network technologies.
Standards differ at the physical layer
Differ at MAC sublayer of Data Link Layer.
Compatible at the LLC sublayer of the Data Link Layer.
Uses a bus physical topology and ring logical topology. The physical order of nodes does not matter.
Coaxial cable served as a common communication bus.
A token is created by the token bus protocol to manage access to the bus.
Any station that holds the token packet has permission to transmit data.
The station releases the token when it is done communicating or when a higher priority device needs to transmit.
Two major problems:
Similar to token bus, but in a physical ring topology.
Messages are transferred in one direction along the ring at all times.
Token Ring networks sequentially pass a token to each connected device.
When the token arrives at a particular node, the recipient is allowed to transmit data onto the network.
Since only one device may be transmitting at any given time, no data collisions occur.
Access to the network is guaranteed
Time-sensitive applications can be supported.
Still used for some real-time applications.
Designed initially for mainframe infrastructure.
Problems:
A quick summary of 802.3
Contains four pairs of copper wire.
Cable runs are limited to a maximum run length of 100m.
Operates at 100MHz.
CAT5 comes in two main varieties, solid and stranded.
Fast Ethernet communications only utilise 2 out of 4 pairs.
CAT5 enhanced (CAT5e) uses all 4 pairs to support GigE over short distances.
CAT5e is backwards compatible with CAT5.
CAT6 (250MHz) and CAT7 (600MHz).
Baseband involves a single transmission (digital signal).
Broadband can handle multiple transmissions with a modulated analogue signal.
Network uses all available signal frequencies or the entire bandwidth.
One signal can be transmitted at a time.
Light or electrical pulses based transmission.
Related to digital technologies, but can be used for analogue technologies.
Bi-directional communication (Half duplex) is possible, but difficult.
Multiplexing (using Time Division Multiplexing) allows transfer of multiple signals on a single medium.
Network uses only one frequency or a part of the entire bandwidth.
Multiple signals can be transmitted at a time.
Electromagnetic or optical waves transmission.
Related to analogue technologies but can be used for digital technologies.
Bi-directional communication is possible and fairly easy.
Multiplexing (using Frequency Division Multiplexing) allows transfers of multiple signals on a single medium.
Putting more than one signal on a wire at once and getting it back again at the other end.
Gives the illusion of multiplexing.
Each signal is allowed a slot of time to transmit for.
Actual multiplexing.
Each signal uses a single frequency channel to transmit.
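The time-slot idea behind Time Division Multiplexing can be sketched as below. This is only an illustration, assuming equal-length streams and one symbol per slot; the function names `tdm_multiplex` and `tdm_demultiplex` are made up for this example.

```python
from itertools import chain


def tdm_multiplex(streams):
    """Interleave equal-length streams round-robin, one symbol per time slot."""
    return list(chain.from_iterable(zip(*streams)))


def tdm_demultiplex(line, n_streams):
    """Recover each stream by reading every n-th slot from the shared medium."""
    return [line[i::n_streams] for i in range(n_streams)]
```

Each sender only ever holds the medium for its own slot, which is why TDM gives the illusion of simultaneous transmission rather than true parallel use of the medium.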
Supports all Ethernet versions above 10BASE5.
Carries high-bandwidth data through beams of light carrying electromagnetic signals. Not so affected by noise. Can carry signals up to approximately 70 miles without amplification.
Light can be pulsed in a single wavelength to represent 45 Gigabits of information.
Transmits one signal per fibre.
Diameter: 8.3 to 10 microns that has one mode of transmission.
Speed and distance: up to 40Gbps over up to 100km
Used in telephones and cable TV applications.
Transmits many signals per fibre.
Multi-modes result from the fact that light will only propagate in the fibre core at discrete angles within the cone of acceptance.
Diameter: 50 to 100 microns.
Speed and distance: 10Mbps to 1 Gbps over 275m to 2km.
Used for Computer Networks and LAN Applications.
Adapters for connecting to ethernets.
Ethernet cards, etc.
Connect the computer using a PCI, PCIe or ISA motherboard interface slot, CAT5/6
PCIe slots, CAT5/5e/6/7 and Fibre Optic cables.
For tier 1 and 2 internet backbones, External device, Fibre Optic cables.
Slightly different, can use not only for network but for the backbone (a sophisticated network).
Internal or external device
External device
Note that while most devices are tied to a specific function, some repeaters may have some Data Link Layer functions, depending on the manufacturer.
Repeat an electrical signal including noise.
Takes an incoming signal and then generates a new, clean copy of that exact signal.
Overcomes signal attenuation problem.
Allows LANs to extend beyond normal distance limitations.
Physical layer device.
Doesn’t read data frames, just repeats the signal unintelligently.
Multiport repeater.
Broadcasts frame to all ports and devices.
Allows users to share Ethernet for transmission of data onto a single network (collision domain).
Physical Layer device.
Doesn’t read data frames.
Repeater is usually used for the extension of the length of a network, while the hub is a simple connectivity gadget that is used to broaden a network.
Connect two LAN segments of similar or dissimilar types such as Ethernet and Token Ring.
Split a network into separate collision domains.
Map the Ethernet addresses of the nodes residing on each network segment and allow only the necessary traffic to pass through the bridge.
Do not forward bad or misaligned packets.
Works on the Physical and lower Data Link (MAC) Layers
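The address-mapping behaviour of a bridge described above can be sketched as a small learning table. This is a toy model, not any vendor's implementation; the class name `LearningBridge` and its `receive` method are hypothetical.

```python
class LearningBridge:
    """Toy bridge that learns which MAC address lives on which port."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the set of ports the frame is forwarded to."""
        self.table[src_mac] = in_port        # learn the sender's segment
        out_port = self.table.get(dst_mac)
        if out_port is None:
            return self.ports - {in_port}    # unknown destination: flood
        if out_port == in_port:
            return set()                     # same segment: filter the frame
        return {out_port}                    # known destination: forward
```

Frames whose source and destination sit on the same segment never cross the bridge, which is how only the necessary traffic is allowed through.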
Expansion of the Bridge.
Used in heavily loaded networks to isolate data flow and improve performance (parallelism).
Up to one-half of the computers connected to a switch can send data at the same time.
Data Link and Network Layer device.
Are faster because they examine only the packet destination address before forwarding it on to its destination segment.
Work like bridges in that they accept and analyse the packet before forwarding it to its destination.
Takes more time to examine the whole packet, although it does allow the switch to catch certain packet errors and keep them from propagating through the network.
Methods for encoding bits in different Ethernet technologies.
Used in 10-Base*
Bipolar scheme, synchronisation via a clock.
CSMA/CD and Full Duplex
Backwards compatible.
Physical layer structure (PHY) has been redesigned.
More complex signal encoding mechanisms than the simple Manchester code.
Uses three sublayers.
Introduces the concept of auto-negotiation, which negotiates between two different media so that their properties match for communication.
Interfaces the physical medium used so that it can be handled generically.
Deals with encoding, transmission in a way specific to the medium.
Two distinctive approaches for medium access:
Most follow the full-duplex approach.
Star topology.
Support standard Ethernet frame format.
Uses 802.3x flow control.
Backwards compatible.
Physical layer is more complex
Digital to digital encoding schemes.
Only use one pole (+ or -). One pole is used to represent 1 or 0, zero is used to represent the other.
Two problems:
DC cannot be transmitted properly through an AC line without noise.
The receiver cannot synchronise its clock to the sender.
Uses both poles to represent bits.
1 as positive voltage, 0 as negative voltage.
Voltage inverts on a 1.
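The encoding schemes above can be sketched as follows. The two polar variants are usually called NRZ-L (level) and NRZ-I (invert); those names, the function names and the assumed starting level of -1 for NRZ-I are choices made for this example, not taken from the notes.

```python
def unipolar(bits, high=1):
    """One pole represents 1, zero volts represents 0."""
    return [high if b else 0 for b in bits]


def nrz_l(bits):
    """Polar, level-based: 1 is positive voltage, 0 is negative voltage."""
    return [+1 if b else -1 for b in bits]


def nrz_i(bits, start=-1):
    """Polar, transition-based: the voltage inverts on every 1 bit."""
    level, out = start, []
    for b in bits:
        if b:
            level = -level  # a 1 flips the line; a 0 leaves it unchanged
        out.append(level)
    return out
```

A long run of zeros leaves the NRZ-I line unchanged, so the receiver still has no transitions from which to recover the clock in that case.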
Uses local telephone lines.
Allows simultaneous voice and data transmission.
Designed to deliver more bandwidth down than up.
Rates range from 1.5 to 9 Mbps down, 16 to 640kbps up.
Distances of 18,000 feet over a single copper twisted pair.
A splitter is needed.
VDSL is designed to provide higher bandwidths.
Up to 52Mbps down, 16Mbps up.
Support the wide deployment of triple play services.
100 Mbps down, 20 Mbps up
150 to 500m distance.
500 Mbps transmission rates over copper cabling by using crosstalk cancellation or “vectorized” VDSL2 based modems.
Now rolling out ultrafast fibre optic broadband with speeds of up to 100Mb.
Next step up from FTTC is FTTH (Fibre to the Home).
Negative factor is the costs.
Top speeds are up to 40Mbps down and 10Mbps up
Top speeds are up to 76Mbps down and 19Mbps up.
Access points act as a central transmitter and receiver of WLAN radio signals.
A set of STAs that communicate with each other.
When two or more stations communicate together they form a BSS.
All wireless clients are connected to an AP.
Generally the default mode for the 802.11b cards.
All mobile STAs communicate with each other via the AP.
Network consumes double bandwidth for one communication.
AP provide buffering of traffic for a mobile broadcast.
Wireless clients are connected without an AP.
An ad-hoc network is where stations only communicate peer-to-peer (P2P).
There is no base and no one gives permission to talk.
Mostly these networks are set up spontaneously.
Two or more BSSs are interconnected using a distribution system.
APs communicate via the DS.
The DS is the backbone of the WLAN and may be constructed of either wired or wireless networks.
The DS is also a thin layer in each AP.
That determines if communications received from the BSS are to be relayed back to a destination in the BSS, forwarded on to the DS to another AP or sent into the wired network infrastructure to a destination not in the ESS
An ESS is a set of BSSs whose APs communicate amongst themselves, from one BSS to another.
Entire network looks like an independent BSS to the Logical Link Control (LLC).
This means stations can communicate or even move between BSSs transparently to the LLC.
The logical point at which medium access control (MAC) service data units (MSDUs) from a non-IEEE 802.11 local area network (LAN) enter the distribution system (DS) of an extended service set (ESS).
Multiple MSDUs are aggregated at the MAC layer and are pushed into a single MPDU.
They have a single frame header with multiple frames and they are destined for the same client and the same service class.
It is the MAC service data unit. This is the unit of transmission used at the MAC layer which is received from the upper layer.
Aggregation of the MSDU directly performed at the MAC layer is called AMSDU.
Such AMSDUs are now passed to the lower PHY layer where they are dealt with as MPDUs.
These are the frames passed from the MAC layer into the PHY layer.
These are the aggregated MPDU units which are pushed into a single Physical Protocol Data Unit (PPDU).
While the implementation for the DS is not specified, 802.11 does specify the services which the DS must support.
If a station is not moving, or is moving only within its own BSS.
If a station moves between BSSs within the same ESS
If the station moves between BSSs of differing ESS.
With a wireless system, the medium is not exactly bounded as with a wired system.
In order to control access to the network, stations must first establish their identity.
The authentication relationship may be between two stations inside an IBSS or to the AP of the BSS.
Authentication outside of the BSS does not take place.
Two types of authentication:
When either the station or AP wishes to terminate a station’s authentication.
An encryption algorithm, which is used so that other 802.11 users cannot eavesdrop on your LAN traffic.
A station must affiliate itself with the BSS infrastructure if it wants to use the LAN.
This is done by Associating itself with an AP. Associations are dynamic in nature because stations can move, turn on or turn off.
A station can only be associated with one AP.
This ensures that the DS always knows where the station is.
Association supports no-transition mobility, but is not enough to support BSS transition.
The service allows the station to switch its association from one AP to another.
Both association and reassociation are initiated by the station.
When the association between the station and the AP is terminated.
Can be initiated by either party.
A disassociated station cannot send or receive data.
ESS transition is not supported.
A station can move to a new ESS but will have to reinitiate connections
Getting data from the sender to the receiver.
The message is sent to the local AP (input AP) then distributed via the DS to the AP (output AP) that the recipient is associated with.
If the sender and receiver are on the same BSS the input and output APs are the same.
The distribution service is logically invoked whether the data is going through the DS or not.
Where the output AP is a portal. Thus 802.X LANs are integrated into the 802.11 DS.
The three physical layers originally defined in 802.11 included two spread-spectrum radio techniques and a diffuse infrared specification.
Spread spectrum is a communication technique that spreads a narrowband communication signal over a wide range of frequencies for transmission, then de-spreads it into the original data bandwidth at the receiver.
Spread spectrum increases the bandwidth of the signal compared to narrow band by spreading the signal.
The Code of Federal Regulations (CFR) Part 15 originally only described two spread spectrum techniques to be used in the licence-free Industrial, Scientific, Medical (ISM) band at 2.4 GHz, hence 802.11 and 802.11b.
Orthogonal Frequency Division Multiplexing (OFDM) was not covered by the CFR and would have required licensing.
802.11a, employing OFDM, was created to work in the 5 GHz band.
FHSS spreads the signal by hopping from one frequency to another across a bandwidth of 83 MHz.
The data is spread over 83 MHz in the 2.4 GHz ISM band.
A short burst of data is sent on one frequency (usually less than half a second).
Then the sender changes to another pseudorandom frequency and broadcasts another burst of data before changing to another frequency, and so on.
The carrier then repeats this pattern.
DSSS spreads the signal by adding redundant bits to the signal prior to transmission.
The signal is divided into many different parts and sent on different frequencies simultaneously.
Spectrum is divided into 13/14 channels.
However, the FCC specifies only 11 channels for non-licensed (ISM band) use in the US.
In communications, band is referred to as the range of frequencies (bandwidth) used in the channel.
Depending on the size of the band (in terms of kHz, MHz or GHz) and some other properties of the communication channel, they can be categorized as narrowband and wideband etc.
Narrowband uses a smaller frequency range (bandwidth)- 300–3400 Hz.
Wideband is a relative term, and the size of the band may be in kHz, MHz or GHz depending on the application.
Uses only enough frequency spectrum to carry the signal
The bandwidth is much wider than required to send the signal
Utilises lower frequency radio spectrum.
Typically delivered over a wide range of radio frequency spectrum bands (e.g. 900MHz, 1800MHz, 2.6GHz, etc.).
Most of which can also reach indoors to some degree.
Composed of one or more BSC.
BSS is the point where all radio transmission related functions are performed.
BTS and BSC connected through the Abis interface.
BSS connected to the MSC through the A interface.
The management of several BTS is done by the BSC.
It also provides all the control functions and physical links among the different BTSs and between the mobile switching centre (MSC) and the BTSs.
Being a high-capacity switch, it provides functions such as cell configuration data, control of radio frequency power levels in BTS, frequency hopping, and handovers.
One MSC serves a number of BSCs.
It is a station or site where antennas and radio transmitters and receivers are placed to create a radio coverage area in the mobile network.
Contains one or more transceivers (TRX) and antennas.
The cell site has a 360 degree omni-directional (omni-sector) antenna that is tuned to create a cellular area of a specific size.
Omni-sector means the same frequencies are used in all directions.
Communication from the mobile terminal to the cell site is referred to as uplink.
Cell site to mobile terminal is downlink
Mobile phone with ME and SIM.
It consists of the mobile equipment (uniquely identified by the International Mobile Equipment Identity (IMEI)) and the SIM card.
The SIM card contains the International Mobile Subscriber Identity (IMSI) used to identify the subscriber to the system, a secret key for authentication and other information.
The IMEI and the IMSI are independent, thereby allowing personal mobility.
The SIM card may be protected against unauthorised use by a password or PIN.
Contains the network elements MSC, HLR, VLR, EIR, AuC and GMSC.
Primary service delivery node for GSM and central components of NSS.
It provides all the functionality needed to handle a mobile subscriber, such as:
HLR and VLR, together with the MSC, provide the call-routing and roaming capabilities of GSM.
It takes care of charging and real time pre-paid account monitoring.
These services are provided in conjunction with several functional entities, which together form the Network Switching Subsystem (NSS).
The MSC provides the connection to the fixed networks (such as the PSTN or ISDN).
The database that contains a subscription record for each subscriber of the GSM network.
All the administrative information related to each subscriber registered in the respective communication network, including the current location of the subscriber, is contained in the HLR.
The HLR is responsible for the sending of subscription data to the VLR (during registration) or GMSC (during mobile terminating call handling).
The location of the mobile is typically in the form of the signalling address of the VLR associated with the mobile station.
A GSM subscriber is normally associated with one particular HLR.
There is logically one HLR per GSM network, although it may be implemented as a distributed database.
The database that contains subscriber data for subscribers registered in a MSC.
It contains all the temporary information about the subscribers.
This information is needed by the MSC to service the visiting subscribers.
Every MSC contains a VLR.
Although MSC and VLR are individually addressable, they are always contained in one integrated node.
A database of all valid mobile equipment on the network.
Where each mobile station is identified by its International Mobile Equipment Identity (IMEI).
An IMEI is marked as invalid if it has been reported stolen or is not type approved.
A protected database that stores a copy of the secret key stored in each subscriber’s SIM card which is used for authentication and encryption.
Switching entity that controls mobile terminating calls.
When a call is established towards a GSM subscriber, a GMSC contacts the HLR of that subscriber, to obtain the address of the MSC where that subscriber is currently registered.
That MSC address is used to route the call to that subscriber.
The connection between MSC and BSC.
The connection between BSC and BTS.
The connection between MSC and HLR.
The radio connection between MS and BTS.
Cellular telephony derives its name from the partition of geographical area into small cells.
A cell is a roughly circular (idealised as hexagonal) area with a central transmitter and receiver base station.
The size and shape of each cell is determined by the features of the surrounding area, such as buildings, trees and hills, which can block signals.
In a city, there are many small cells, while rural areas may have very large cells.
Site coverage (cell) is partitioned into different directions called sectors.
Each cell is usually split into three sectors, which overlap with other sectors of neighbouring cells so that the network is uninterrupted.
Long distance transmission between stations (called end devices) is typically done over a network of switching nodes.
Switching nodes are not concerned with the content of the data.
Their purpose is to provide a switching facility that will move the data from node to node until they reach their destination.
A collection of nodes and connections forms a communications network.
In a switched communications network, data entering the network from a station is routed to the destination by being switched from node to node.
A dedicated radio channel is allocated to a single transmission
As long as data transmissions are long and continuous (file transfers) a circuit is used efficiently.
However, most data transmissions are bursty, and dedicating an entire circuit to them is usually a waste of valuable wireless bandwidth.
During idle periods when no data is being sent, bandwidth is still dedicated to the user and not available for others to use.
An entire network may be designed just for packet data.
Packets do not have a dedicated path, it is decided by the routers.
Most old wireless data systems offered minimal data rates, usually in the 10Kbps range. However new wireless protocols bond multiple channels to increase data rates.
Mobile development faces the major problem of finding protocols that use the available bandwidth as efficiently as possible.
Multiple access enables …
Each user is assigned a different pseudorandom binary sequence that modulates the carrier, spreading the spectrum of the waveform, giving each user a unique code pattern.
…
The ISO OSI model.
Encapsulation at each level.
More hardware-based layers.
Data Link layer could be divided into two sub-layers:
LLC is more software orientated and interacts with higher layers.
MAC is more hardware orientated and interacts with lower layer.
(R)ARP protocol links hardware addresses to logical addresses at the Network layer.
Defines:
Defines:
Provides multiplexing and demultiplexing to and from the network layer.
Manages flow control and error control.
Manages access control.
Related to the MAC address.
Protocol Data Unit (PDU) is a term to define any one of the following:
Each layer, with the exception of the physical layer, adds its own header. This is done because each layer has its own specific function.
Unicast Network Level Protocols in use in today’s Internet. Including further study of protocols such as IPv4, ICMP, ARP, RARP used in unicast applications and IPv4 and IGMP used in multicast applications.
Organised in octets, as bytes were not always 8 bits long.
IPv4 designed for 32 bits.
Version indicates the version of the IP protocol
Time to live stops network loops.
Header checksum checks the integrity of the header but not the data. It is a very basic checksum which routers use to ensure the header has not been damaged.
This adds significant load to the router: because the TTL changes at every hop, the router has to compute a new checksum each time, without much gain. There are better checksums available elsewhere, and the header checksum does not appear in IPv6.
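A minimal sketch of the header checksum, assuming the standard Internet checksum (the ones' complement of the ones' complement sum of the 16-bit header words). The function name and the sample 20-byte header are illustrative only.

```python
import struct


def internet_checksum(header: bytes) -> int:
    """Ones' complement of the ones' complement sum of 16-bit words."""
    if len(header) % 2:
        header += b"\x00"                      # pad to a whole 16-bit word
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                         # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample header with the checksum field (bytes 10-11) set to zero.
sample = bytes.fromhex("45000073000040004011" "0000" "c0a80001c0a800c7")
checksum = internet_checksum(sample)

# A router decrementing the TTL (byte 8) must recompute the checksum.
forwarded = bytearray(sample)
forwarded[8] -= 1
new_checksum = internet_checksum(bytes(forwarded))
```

A receiver verifies a header by running the same sum over all of it, checksum field included; an undamaged header gives zero.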
Tied to an interface, not an actual machine.
Common to have multiple interfaces (eth0, wlan0, etc.)
Can have multiple addresses on a single interface.
5 classes.
Class A: leading bit 0, 7 bits netid, 24 bits hostid - many hosts.
Class B: leading bits 10, 14 bits netid, 16 bits hostid.
Class C: leading bits 110, 21 bits netid, 8 bits hostid - many networks.
Class Es will never be used because of Microsoft and lack of value.
X.X.255.255 = Broadcast address
X.X.0.0 = Network
255.255.255.255 = Limited broadcast address, used in auto-configuration (e.g. DHCP)
If you own the address:
10.0.0.0
But want to share the address so that 10.1.0.0 and 10.2.0.0 are different networks.
10.0.0.0 is a class A address, therefore it is actually 10.0.0.0/8.
So if we change the networks to 10.1.0.0/16, the network id will be expanded to the first 16 bits.
Most TCP implementations have a loopback interface with the IP address 127.0.0.1 (127.X) and name localhost (or anything else in /etc/hosts which specifies 127.X as an address).
The localhost behaves as a separate data link interface.
A packet that is sent to the loopback interface moves down the protocol stack and is returned back by the driver software for the localhost “device”.
Used for debugging.
Loopback is done in software, never appears on the network.
Take the university address:
144.124.76.0
Say we want a network per department; we can’t use a 24-bit network id as that doesn’t leave enough host addresses.
So they can use a 22 bit network id.
144.124.76.30/22 or 144.124.76.30/255.255.252.0
Apply the mask 255.255.252.0 to the IP address to get the network id. Apply the inverted mask to get the host id.
Network address is: 144.124.76.0
But what’s the broadcast address? 144.124.79.255 (look at the bits for this).
Here, 255.255.252.0 is the subnet mask.
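The bit arithmetic for the university example can be checked with a short sketch using Python's standard `ipaddress` module. The variable names are illustrative; the addresses are the ones from the notes.

```python
import ipaddress

iface = ipaddress.ip_interface("144.124.76.30/22")
network = iface.network                  # the /22 network the address sits in

addr = int(iface.ip)
mask = int(network.netmask)              # 255.255.252.0 as a 32-bit integer
inverted = ~mask & 0xFFFFFFFF            # host bits: 0.0.3.255

network_id = ipaddress.ip_address(addr & mask)                # mask the address
host_id = ipaddress.ip_address(addr & inverted)               # inverted mask
broadcast = ipaddress.ip_address((addr & mask) | inverted)    # all host bits 1
```

The third octet explains the broadcast address: 76 is 010011 00 in binary, the /22 boundary falls after the first six bits of that octet, and setting the remaining two bits gives 010011 11, which is 79.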
Non-contiguous subnet masks like 255.0.255.255 used to be allowed, but no longer are, as they don’t add anything.
Treat two contiguous class C networks as a single network.
This eases routing (supernets). For example supernet on geographic locations to make routing tables easier at a router level.
Usual class C is /24, we make them a /23 if we have two contiguous networks.
This is the answer to the 192 soup.
Non-contiguous versions of this are disallowed. Must choose numbers for which the maths works.
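A small sketch of why the numbers must be chosen so that the maths works, using `ipaddress.collapse_addresses` from the standard library. The helper name `merge` and the 192.168.x.0 networks are illustrative, not taken from the notes.

```python
import ipaddress


def merge(*cidrs):
    """Collapse networks into the smallest equivalent list of CIDR blocks."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


# 192.168.2.0/24 and 192.168.3.0/24 differ only in the last network bit,
# so they share a /23 boundary and can be merged into one supernet.
aligned = merge("192.168.2.0/24", "192.168.3.0/24")

# 192.168.1.0/24 and 192.168.2.0/24 are adjacent but straddle a /23
# boundary, so no single /23 covers exactly these two networks.
misaligned = merge("192.168.1.0/24", "192.168.2.0/24")
```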
There are some special IP addresses
RFC 1918 was written to reserve certain IP address ranges:
These can be used for private addressing schemes, but not routable on the internet.
The router has a pool of public IP addresses. When a private IP address attempts to access an external resource, the router maps the private IP address to a public IP address, which accesses the resource. The reply arrives at the public IP address, which is then translated back to the private IP address and routed.
In industry the pool may be a class C network. In home networks there is only a pool of one. They also do port address translation.
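The home-network case (a pool of one public address plus port address translation) can be sketched as a pair of lookup tables. This is a toy model only; the class name, the starting port 40000 and the method names are assumptions made for the example.

```python
import itertools


class PortAddressTranslator:
    """Toy PAT: many private (ip, port) pairs share one public IP address."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)  # next free public port
        self.outbound = {}   # (private_ip, private_port) -> public_port
        self.inbound = {}    # public_port -> (private_ip, private_port)

    def translate_out(self, private_ip, private_port):
        """Map an outgoing private endpoint to (public_ip, public_port)."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            port = next(self._ports)
            self.outbound[key] = port
            self.inbound[port] = key
        return self.public_ip, self.outbound[key]

    def translate_in(self, public_port):
        """Map a reply back to the private endpoint, or None if unknown."""
        return self.inbound.get(public_port)
```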
IPv4 has the problem that we know our own IP address and the ones which we want to talk to, either routers or machines on the link, but not their hardware addresses.
The very first header is the Ethernet header. It is needed because otherwise every machine would have to look at the IP header, which has to be done in software. Other network protocols exist too.
Ethernet headers can be processed in hardware.
Ethernet headers don’t contain much, but the machines can automate the process of deciding to ignore the packet in firmware not by the CPU.
There is a problem of mapping IPv4 address to Ethernet address. There’s no direct link between IP addresses and Ethernet addresses so we need more at the data link layer.
ARP is cached for a limited amount of time.
If no reply, retransmit after stand-off.
Which type of hardware are we using
ARP can be used with other protocols (other than IP).
Size of the addresses (hardware and protocol).
Doesn’t map just MAC addresses, etc.
Hardware address is repeated as the Ethernet header is harder to access in software.
Code to differentiate request/reply.
RARP - Reverse ARP (looking up IP from MAC). Can be used to discover the machine’s own IP address, for example.
Used before DHCP.
If unknown to the sender, filled with 0s, otherwise the actual address
The address being looked for
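The fields listed above can be packed into a 28-byte ARP request for IPv4 over Ethernet as in this sketch. It only builds the bytes and does not send them; the function name and the example addresses used in any call are illustrative.

```python
import socket
import struct

# hardware type, protocol type, hardware size, protocol size, operation,
# sender hardware addr, sender protocol addr, target hardware addr, target protocol addr
ARP_FORMAT = "!HHBBH6s4s6s4s"


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build an ARP request asking who has target_ip."""
    return struct.pack(
        ARP_FORMAT,
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address size in bytes
        4,                            # protocol address size in bytes
        1,                            # operation: 1 = request, 2 = reply
        sender_mac,
        socket.inet_aton(sender_ip),
        b"\x00" * 6,                  # target hardware address unknown: zeros
        socket.inet_aton(target_ip),  # the address being looked for
    )
```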
First packet:
Following packets:
Locate an interface in promiscuous mode, ARP request not broadcast, a dummy address is used.
Some TCP/IP stacks pass the ARP request up the line.
A reply indicates a machine is in promiscuous mode.
neped
A host sends a request for its own MAC.
The sender’s IP and MAC address are broadcast, and other hosts will insert this mapping into their ARP tables.
Useful to detect duplicate IP addresses (should be no reply).
Causes other hosts to update their ARP cache (useful if the network adapter has been changed or if this is a hot spare).
Man-in-the-Middle attacks.
ARP allows for MITMA as the IP and MAC are learned by the switch. Sending out a false ARP reply last will allow you to spoof the IP address to your MAC address.
Same problems with MITMA
Individual (0), Group (1) - Unicast or Multicast
Universal (0), Local (1)
The Internet Control Message Protocol (ICMP) is the protocol used for error and control messages in the Internet.
ICMP provides an error reporting mechanism of routers to the sources.
All ICMP packets are encapsulated as IP datagrams.
ICMP messages are either query messages or error messages.
Specific to message type.
E.g. for message type 3 (destination unreachable):
Each ICMP error message contains the header and at least the first 8 bytes of the IP datagram payload that triggered the error message.
To avoid too many ICMP messages, ICMP error messages are not sent
Identifier is set to the process ID of the querying process.
Sequence numbers are created for each new echo request.
Reply telling that a port is not available.
Network scanners may make use of this, e.g. NMAP.
Many hosts will not reply.
Sends out ICMP ping messages with increasing TTL starting at 1. For each host that isn’t the target, an ICMP time exceeded message is sent back.
On UNIX uses the port unreachable message instead (using a UDP packet) by default.
If a packet needs to be split then a flag is set to say “more fragments”. Identification is the same through the fragments. More fragments flag is not set for the last fragment, but is known to be a fragment due to the fragmentation offset.
If a part of a fragment is missing, after an amount of time packet loss is assumed.
Occurs if MTU < datagram size to be sent out on an interface, “Don’t fragment” flag allowed: causes ICMP “Destination unreachable: fragmentation needed but don’t fragment bit set”.
Reassembly takes place at destination.
Fragments may be fragmented.
Experience shows that fragmentation is best avoided.
Intermediate routers may have lower MTUs.
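A sketch of how a datagram payload is split, assuming a 20-byte header with no options. The fragment offset is counted in 8-byte units, so every fragment except the last carries a multiple of 8 bytes. The function name and dictionary keys are made up for the example.

```python
def fragment(payload_len, mtu, header_len=20):
    """Return the offset, length and more-fragments flag of each fragment."""
    max_data = (mtu - header_len) // 8 * 8   # payload per fragment, 8-byte aligned
    if max_data <= 0:
        raise ValueError("MTU too small to carry any payload")
    fragments, offset = [], 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        fragments.append({
            "offset": offset // 8,                        # in 8-byte units
            "length": size,
            "more_fragments": offset + size < payload_len,  # clear on last one
        })
        offset += size
    return fragments
```

For a 3980-byte payload and an MTU of 1500 this gives three fragments; only the last has the more-fragments flag clear, and its non-zero offset is what marks it as a fragment.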
Uses ICMP unreachable Error: Fragmentation Required
ping -s n -M do: s > MTU.
traceroute.pmtu
Firewalls can break PMTU discovery.
Sock is a test program which can be run as a client or as a server.
TTL doesn’t always start at 255, might be around 64 but not standardised.
Routers have many different interfaces. This means, with traceroute, only the nearside interface can be found.
There is a record route option which can find this: ping -R
Example routing problems. Interior and exterior routing protocols. Protocols covered will include RIP, OSPF and BGP.
Split into two parts:
Needs to be fast and efficient, every packet involves a forwarding decision.
Based on prior knowledge the packet is moved towards the destination.
Filling routing tables. This requires knowledge of the connectivity in some sense.
This needs to be stable and should converge relatively quickly.
This task should not be performed frequently
Routes can be classified into three different categories:
Could also classify by interior and exterior routing.
A default route which is used as a fallback if all other routing fails.
Manually configured information.
Not very scalable.
Automatically use information from other participating routers.
Different methods for finding out routes.
Uses the Bellman-Ford algorithm.
Regularly broadcast the entire routing table, containing known networks and the “distance” to each (usually a number of hops).
Has the problem of routing loops and convergence after change.
Uses Distance Vector Routing
Classful routing with no ability to pass netmasks.
Typically broadcasts every 30 seconds.
Metrics range from 1 to 16 (infinity).
“Split Horizon” often used.
“Triggered Updates” send an update when something (close) changes to improve convergence.
Sent via UDP to the IP broadcast address (usually) to port 520.
Don’t repeat information to the originator.
Repeat back information, but metric is set to infinity
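The distance vector update and the two loop-avoidance variants above (usually called split horizon and split horizon with poison reverse) can be sketched as follows. This is a simplified model, not the RIP specification: the table layout (destination mapped to a metric and next hop) and the function names are assumptions for the example.

```python
INFINITY = 16  # RIP treats a metric of 16 as unreachable


def merge_update(table, neighbour, advertised, link_cost=1):
    """Apply a neighbour's advertised distances; return True if table changed."""
    changed = False
    for dest, metric in advertised.items():
        new_metric = min(metric + link_cost, INFINITY)
        current = table.get(dest)
        if current is None:
            better = new_metric < INFINITY
        else:
            # Accept a shorter route, or any change from our current next hop.
            better = new_metric < current[0] or (
                current[1] == neighbour and new_metric != current[0]
            )
        if better:
            table[dest] = (new_metric, neighbour)
            changed = True
    return changed


def advertise(table, to_neighbour, poison_reverse=False):
    """Build the update sent to one neighbour."""
    update = {}
    for dest, (metric, next_hop) in table.items():
        if next_hop != to_neighbour:
            update[dest] = metric
        elif poison_reverse:
            update[dest] = INFINITY   # repeat back, but as unreachable
        # plain split horizon: say nothing about routes learned from them
    return update
```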
Refinement of RIPv1.
Includes netmasks and some support for authentication and multicast.
Each router sends the state of all its links to all adjacent routers.
Each router independently calculates its routing table based on the link state database it has constructed.
Normally uses Dijkstra Shortest Path (Shortest Path First).
Uses IP directly.
Early versions could calculate a different set of routes for each value of the IP Type-Of-Service (ToS) field, but this was abandoned in the latest RFC.
Interfaces have a dimensionless cost (these were potentially different for each value of ToS).
Supports equal cost load balancing.
Supports subnet mask and thus CIDR.
Point-to-point links can be used without IP addresses.
Supports authentication
Uses multicasting to reduce load.
On a multi-access network (e.g. ethernet), two routers are elected as Designated Router and Backup Designated Router.
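The shortest path first calculation that each link state router runs over its link state database can be sketched with Dijkstra's algorithm. The graph layout (a dict of node to neighbour costs, with string node names) and the function name are assumptions for this example; it does not model OSPF areas, designated routers or equal cost load balancing.

```python
import heapq


def shortest_paths(links, source):
    """Return (cost to each node, first hop from source towards each node)."""
    dist = {source: 0}
    first_hop = {}
    visited = set()
    queue = [(0, source, None)]          # (cost so far, node, first hop used)
    while queue:
        cost, node, hop = heapq.heappop(queue)
        if node in visited:
            continue                     # stale entry for a settled node
        visited.add(node)
        if hop is not None:
            first_hop[node] = hop
        for neighbour, link_cost in links.get(node, {}).items():
            new_cost = cost + link_cost
            if neighbour not in visited and new_cost < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_cost
                next_hop = neighbour if node == source else hop
                heapq.heappush(queue, (new_cost, neighbour, next_hop))
    return dist, first_hop
```

The first-hop map is what would populate the routing table: for every destination the router only needs to know which neighbour to hand the packet to.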
Example routing problems. Protocols covered will include PIM-DM, PIM-SM and MSDP. We will also cover the role of the Rendezvous Point, Anycast IP, and issues still under debate in the technical community.
Refers to the sending of data from one to many or many to many registered recipients.
Contrast to broadcast, which floods the network with data, which all hosts will receive whether they want it or not.
Unicast is the sending of data to a single recipient.
Unlike unicast, multicast is scalable: if ten clients request a connection then unicast must provide those ten connections with their own copy of the data.
With multicast, whether there are 1, 10 or 100 members of a group, only one copy of the data is transmitted, and it is only replicated where there are group members on different paths from the router.
All devices have a unique 48-bit MAC address.
Devices on the LAN keep a table that maps unicast IP to MAC.
A special range of MAC addresses is used for multicast as a multicast address does not correspond to a single end host.
Class D address range: 224.0.0.0 - 239.255.255.255
High order 4 bits of the first octet of a class D address are always 1110 (0xE)
To provide a 1:1 mapping between MAC and multicast IP addresses the remaining 28 bits of the IP address would need a unique representation in MAC addresses.
MAC address range assigned for multicast is only 24 bits. The high order bit is reserved, leaving 23 bits.
Thus, 28 bits of IP addresses need to be mapped to just 23 bits of the MAC address.
This corresponds to a 1:32 mapping of MAC multicast addresses to IP multicast addresses.
Six octets in an Ethernet header. Three high order octets are constant 01-00-5E.
The remaining three octets have a range of 00-00-00 to 7F-FF-FF (high order bit is always 0 for IP multicast).
An IP address of 224.1.1.1 maps to 01-00-5E-01-01-01. But the MAC address also maps to 31 other IP addresses.
Discarding of unwanted packets has to be handled by the IP stack.
Hosts interested in 224.1.1.1 will also receive (if on the LAN) traffic for the other 31 possibilities.
After de-capsulating the Ethernet frame and discovering the IP address of an unwanted packet, the host discards the packet.
The probability of this is quite slim at the moment.
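The 23-bit mapping and the resulting 1:32 overlap can be sketched as below. The function names are illustrative; the constants are the 01-00-5E prefix and the class D leading bits 1110 from the notes.

```python
import ipaddress


def multicast_mac(ip: str) -> str:
    """Map an IPv4 multicast address onto its Ethernet multicast MAC."""
    addr = ipaddress.IPv4Address(ip)
    if not addr.is_multicast:
        raise ValueError(f"{ip} is not a class D address")
    low23 = int(addr) & 0x7FFFFF               # keep only the low 23 bits
    mac = (0x01005E << 24) | low23             # prefix 01-00-5E, top bit 0
    return "-".join(f"{(mac >> shift) & 0xFF:02X}" for shift in range(40, -8, -8))


def aliases(ip: str):
    """List all 32 multicast IP addresses that share this address's MAC."""
    low23 = int(ipaddress.IPv4Address(ip)) & 0x7FFFFF
    # The 5 bits between the 1110 prefix and the low 23 bits are discarded
    # by the mapping, so every value of them yields the same MAC address.
    return [
        str(ipaddress.IPv4Address((0xE << 28) | (n << 23) | low23))
        for n in range(32)
    ]
```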
224.0.0.0 - 224.0.0.255 (224.0.0.0/24)
Should never leave the local network.
224.0.1.0 - 224.0.1.255 (224.0.1.0/24)
224.2.0.0 - 224.2.255.255 (224.2/16)
232.000.000.000 - 232.255.255.255
Specifically permit or block source addresses.
233.000.000.000 - 233.255.255.255
Ensure that addresses are specific to an administrative domain. Low number of specific addresses (255).
239.000.000.000 - 239.255.255.255
Private multicast addresses.
Protocol used is Internet Group Management Protocol (IGMP)
No explicit leave message.
IGMP Report Suppression
Sent by routers to ask for existence of hosts interested in receiving multicast groups.
Sent by hosts in response to Queries,
Also sent by the host if they wish to join a new group.
Major change, introduces an explicit Leave Message
General Query as before.
Group-Specific Query. Used to query if there are any members of a group left after a leave message.
IGMPv2 Hosts must send IGMPv1 if they spot an IGMPv1 router is the querier.
IGMPv2 Hosts may suppress Leave Messages if they spot an IGMPv1 router is the querier.
IGMPv2 Hosts must suppress reports if they see others using IGMPv1 or IGMPv2 reports for a given group.
IGMPv2 routers must ignore Leave messages if IGMPv1 hosts are present.
If any IGMPv1 routers are present must use IGMPv1
Major change: supports the specification of sources in messages.
Hosts no longer suppress Reports.
Thus, supports the concept of Source Specific Multicast.
Protocol messages are now more complex.
Now contain source address information.
Now contain information relating to multiple groups and each group report can contain source address information.
The set of links and connections that traffic will follow to get to a destination.
Sometimes called Shortest Path Trees (SPT)
A tree where the root is the source of the data.
A different forwarding tree for every combination of source (S) and group (G). N groups and M sources implies N × M trees.
This makes it difficult to calculate the best tree.
Routers need to maintain separate states for all trees.
(S,G) notation.
Rendezvous Point Trees or Core Based Trees.
Traffic follows some form of common path. Unlike source trees, in which two different trees can send data through different paths, at least most of the route will follow a common tree.
Each group in the system uses the same tree. N groups implies N trees, no matter how many sources.
Normally, each group has a nominated router as the Rendezvous Point, which is the root of the tree.
All sources send towards the Rendezvous Point.
Can potentially lead to a single point of failure if no backup is specified.
(*, G) Notation
Routers must not just forward all multicast traffic. Potentially there could be customers anywhere, unlike with unicast. However, forwarding all multicast would be broadcast instead and would lead to chaos.
Techniques include:
Note the source address of the arriving packet and interface of arrival. The IP tables are checked for the correct interface towards the source.
If the packet has arrived into the expected interface, it can be forwarded onwards. If not it can be discarded.
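The check just described can be sketched as a longest-prefix lookup on the source address. This is a simplified illustration: the routing table is assumed to be a plain list of `(network, interface)` pairs, and the names `rpf_interface` and `rpf_check` are hypothetical.

```python
import ipaddress


def rpf_interface(routes, source):
    """Return the interface the unicast table would use to reach `source`."""
    src = ipaddress.ip_address(source)
    matches = [(net, iface) for net, iface in routes if src in net]
    if not matches:
        return None
    # Longest prefix wins, as in ordinary unicast routing.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rpf_check(routes, source, arrival_iface):
    """True if the packet arrived on the interface that leads back to its source."""
    return rpf_interface(routes, source) == arrival_iface


routes = [
    (ipaddress.ip_network("10.0.0.0/8"), "eth0"),
    (ipaddress.ip_network("10.1.0.0/16"), "eth1"),
]
print(rpf_check(routes, "10.1.2.3", "eth1"))  # forward
print(rpf_check(routes, "10.1.2.3", "eth0"))  # discard
```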
Used to help avoid some RPF calculations
Sometimes called multicast routing tables.
Used to help make decisions for routing tables.
TTL normally used by IP and decremented as each router passed. When this TTL reaches zero, the packet is dropped.
Multicast routers often set TTL thresholds on interfaces; if the TTL of the packet is less than this threshold then the packet isn’t forwarded.
If traffic is only meant to reach internal locations, its TTL is set high enough to reach all internal routers but below the threshold configured on external routers, which stops it leaving. Traffic expected to reach external locations is sent with a higher TTL.
These boundaries don’t pass certain multicast addresses.
Send all valid packets out of all interfaces, flooding the network.
When a router receives unwanted packets it sends a prune message upstream.
When a prune message is received, the router removes the interface it was received on from the forwarding table for the specified group.
Prune times out in typically 2 to 3 minutes.
Routers send no traffic onwards unless asked.
Shared tree branch constructed from rendezvous point (root) to any interested receiver.
Join messages are sent from the receiver to the root via other routers, creating a shared tree branch along the way.
Prune sent when the traffic is no longer wanted.
Has a rendezvous point using a shared tree, but uses source trees to get data from sources to the rendezvous point. Also allows the final router feeding receivers to switchover to a source tree from the source if the traffic level exceeds some threshold (in Cisco routers the default value for the threshold is zero).
An in-depth study addressing the behaviour of TCP and UDP. Connection establishment and termination, flow control under various load conditions, timeouts and retransmission, newer features and performance.
RFC 768
No handshake required which allows it to be fast, especially for many short connections.
Applications must handle failures.
Checksum is the same as the IPv4 checksum, but is over the UDP header, the data, and the UDP pseudo header
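As a rough sketch of the checksum described above: the 16-bit ones' complement sum is computed over the pseudo header, the UDP header and the data. The helper names are hypothetical; the pseudo header layout (source IP, destination IP, a zero byte, protocol 17, UDP length) is the standard IPv4 one.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of `data`."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                        # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def udp_pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """Build the 12-byte IPv4 pseudo header that is prepended for the UDP checksum."""
    return struct.pack("!4s4sBBH", socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
                       0, 17, udp_length)


words = bytes.fromhex("0001f203f4f5f6f7")
print(hex(internet_checksum(words)))
```

A receiver can verify a packet by summing the same bytes with the checksum included; the result should then be zero.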
TFTP uses UDP with a stop-and-wait flow control algorithm:
Designed for diskless systems to download configuration files during bootstrapping.
Does waste a lot of network time.
Done by creating connections between two points and aim to provide data integrity.
Also provides flow control.
Ensures ordering of packets (it is actually the byte offset of the first byte of data in the packet).
Doesn’t start at 1.
Helps find missing packets, is the last byte of the packet.
Describes the length of the header.
The value in the Urgent field is valid.
The value in the Acknowledge field is valid.
Push the data up to the application as quickly as possible.
Reset the connection
Used to set up connections
Used to tear down connections
Same as UDP Checksum
How TCP establishes, keeps and tears down connections.
Waiting for a connection
Waiting for acknowledgement of the first send of a SYN
Wait for an ACK of a sent FIN.
Wait for a FIN from an acknowledged FIN.
Acknowledged a FIN but still need the connection to send data.
Final FIN sent, waiting for the last ACK.
Final FIN received and ACK sent, wait for a number of seconds before closing.
Done in case the ACK is not received by the other party.
Not listening, connection closed.
TIME_WAIT is also called the 2MSL wait state.
Client sends a segment with the SYN flag set and a random initial sequence number.
Server receives the SYN; the 0 offset is the value of the SYN's sequence number. Server goes into SYN Received state.
Server sends a segment with the SYN flag and an ACK flag, returning the client’s sequence number plus one*.
Client replies with the ACK flag.
* The SYN flag is assumed to have consumed 1 byte.
If no ACK is received in the SYN Received state then the SYN is resent after a timeout.
Known as the three way handshake.
Note: Initial sequence numbers are important! Both client and server choose different ISNs.
When you have a connection between two machines, but one crashes. There’s no data sent between them so one still believes it is connected.
Possible (but unlikely) that two applications will perform active open to one another. Not Client/Server, each opens to a known port.
Results in a single open connection.
Four-way handshake required.
FIN
ACK and a FIN
ACK
Three way closedown.
ACK and FIN might not be sent together (server might still be transmitting data).
When a FIN is sent to the server but there is still more data to be sent, the connection should not close fully until a FIN ACK is sent and the ACK returned. Microsoft broke this by sending a RST instead of a proper ACK.
Both sides of a connection might perform an active close.
Four segments required, which is not unusual.
TCP Header also supports options.
Originally only MSS, NOOP and EOS were specified.
Length is 4
2 byte MSS. Usually based on the MTU size.
Length is 3
1 byte shift count.
Length is 10
4 bytes timestamp value
4 bytes timestamp echo reply
Bulk data transfer (e.g. FTP) has different requirements.
Low overhead from headers.
Receiving systems have limited buffers.
Sliding windows used for flow control.
Packet loss is expensive; timeouts are the simplest way, but there are better.
Want to send as many packets as possible without flooding the network.
Sliding window flow control is carried out by the receiver.
CWND (congestion window) control is carried out by the sender.
The idea is that in every ACK we also advertise a window. The window is a number of bytes, usually a multiple of the MSS.
The initial window size will be the size of the buffer TCP has available.
Sender can send up to the limit of the window.
When a segment is acknowledged, the window slides making another slot in the window available.
The window is said to close when the left-hand edge moves to the right.
The window is said to open when the right-hand edge moves to the right.
The window is said to shrink if the right-hand edge moves to the left.
Scale factor is actually a shift.
Largest scaling factor is 14 (maximum window size of 1,073,741,823 bytes).
Remember this increases the size of the header.
Used for high capacity networks.
RFC 1323, TCP Extensions for High Performance.
Can never reach optimal capacity but can approach it.
Capacity(bits) = bandwidth (bits/s) * RTT (sec)
Increase Round Trip Time (RTT) and data travels more slowly, thus higher capacity.
Likewise increased bandwidth involves higher capacity.
Max window size of 65535 bytes, but window scale option can increase this to 1024MB.
Around 10% of the capacity is a good size for the TCP buffer size.
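The capacity formula above (the bandwidth-delay product) can be turned into a quick calculation. This is a small sketch with hypothetical function names; the example link of 100 Mbit/s with a 100 ms RTT is an assumption chosen for illustration.

```python
def capacity_bits(bandwidth_bps: float, rtt_s: float) -> float:
    """Capacity (bits) = bandwidth (bits/s) * RTT (s)."""
    return bandwidth_bps * rtt_s


def min_window_scale(window_bytes: int) -> int:
    """Smallest shift count (0..14) so that a 16-bit window covers `window_bytes`."""
    shift = 0
    while (65535 << shift) < window_bytes and shift < 14:
        shift += 1
    return shift


bits = capacity_bits(100_000_000, 0.1)   # assumed example link
window = int(bits / 8)                   # bytes in flight needed to fill the pipe
print(window, min_window_scale(window))
```

Without the window scale option the 65535 byte limit would leave most of such a link idle.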
Intermediate routers must queue packets - congestion may occur at routers.
Congestion window (CWND).
Has two phases:
Based on a threshold (the slow start threshold).
Drops drastically when there is packet loss.
Problem: TCP timeouts lead to idle periods.
Fast retransmit: use 3 duplicate ACKs to trigger retransmission.
Fast recovery: start CWND at SSTHRESH and do incremental increase after fast retransmit.
WiFi and very fast networks.
Can be a problem on high capacity networks.
Improved congestion control algorithm for fast, high latency links and lossy links.
Can take many hours to reach optimal throughput with TCP Reno as packet loss has a large effect on throughput.
Westwood carries out end to end bandwidth estimate (BWE) using received ACKs and RTT monitoring.
TCP Reno overreacts to random loss by cutting cwnd in half.
A small fraction of random packet loss does not impact the BWE.
Thus the ssthresh remains unchanged, allowing Westwood to be much more efficient than Reno.
Increase cwnd according to Reno algorithm.
Estimate available bandwidth.
cwnd is then set to ssthresh.
cwnd is then set to 1.
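A minimal sketch of the Westwood reaction to loss, following these notes: ssthresh is derived from the bandwidth estimate as max(2, (BWE * RTTmin) / Seg_size), then cwnd is set to ssthresh after three duplicate ACKs or to 1 after a coarse timeout. Function names and units (BWE in bytes per second, windows in segments) are assumptions made for the example.

```python
def westwood_ssthresh(bwe_bytes_per_s: float, rtt_min_s: float, seg_size: int) -> int:
    """ssthresh in segments, from the end-to-end bandwidth estimate."""
    return max(2, int(bwe_bytes_per_s * rtt_min_s / seg_size))


def on_triple_dup_ack(bwe, rtt_min, seg_size):
    """Three duplicate ACKs: cwnd is set to the estimated ssthresh."""
    ssthresh = westwood_ssthresh(bwe, rtt_min, seg_size)
    return ssthresh, ssthresh            # (cwnd, ssthresh)


def on_timeout(bwe, rtt_min, seg_size):
    """Coarse timeout: ssthresh from the estimate, cwnd restarts at 1."""
    return 1, westwood_ssthresh(bwe, rtt_min, seg_size)


print(on_triple_dup_ack(400_000, 0.25, 1000))
print(on_timeout(400_000, 0.25, 1000))
```

Because ssthresh depends on the measured bandwidth rather than on halving cwnd, an occasional random loss leaves it largely unchanged.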
Interactive data transfer can result in many small segments which, together with their ACKs, can lead to congestion.
ACKs are, therefore, piggybacked onto data segments - delayed ACKs.
Nagle’s algorithm results in some collection of data to produce larger segments.
Many small datagrams result in very large overhead.
Can cause congestion, particularly on a WAN.
Sometimes desirable to disable Nagle algorithm, e.g., X mouse movements need to be sent without delay.
Keeps window size information flowing even if the other end closes its receive window.
Possible situation where an ACK is lost and both ends are waiting.
Sending TCP uses a persist timer to periodically query the receiver to see if the window size has updated.
Fundamental to the use of acknowledgements is the need for timeout and re-transmission of packets.
TCP keeps four different timers for a connection and performs exponential backoff when a packet goes unacknowledged.
TCP also has a congestion avoidance algorithm to complement the slow start algorithm to cope with packet loss caused by congestion.
Used when expecting an acknowledgement from the other end.
Keeps window size information flowing even if the other end closes its receive window.
Detects when the other end of a connection has crashed or re-booted.
Used to calculate retransmission timeout (RTO).
Mean deviation is used to allow arithmetic to be carried out with integers and without square roots.
Where:
If a packet times out and there is a retransmission, when an ACK is received, which transmission was it for?
Karn’s algorithm specifies that the RTT estimate cannot be updated when a timeout and retransmission occur.
Re-use the RTO after such an exponential back off until an acknowledgement is received.
TCP for transactions.
Nearly as fast as UDP.
Reduces minimum number of packets required to three in many circumstances.
Data and FIN are piggybacked on the ACK.
Connection Count (CC) used to avoid duplicate SYNs.
Security issues caused due to this.
2MSL state is abandoned.
8 times RTO used instead.
Allows retransmission of final ACK.
A new incarnation of the same connection, using TAO, implicitly acknowledges the ACK of FIN.
…
Layered like the OSI model (but pre-dates the OSI model).
Ethernet driver captures the incoming frame, strips the Ethernet header and passes to IP
IP layer strips out the IP header and passes it to the transport layer, etc.
Levels:
The process of moving things up and down layers is demultiplexing and multiplexing.
Don’t send arbitrary length messages, to allow the multiplexing of networking.
(In TCP) Data is encapsulated into frames, frames have a frame header, trailer and a datagram.
This datagram contains an IP Header and a segment or protocol data unit (PDU)
The PDU has the (TCP) protocol header and the actual data.
The frame header is used to drop the packet onto the local link. The address used in the frame header is embedded in the hardware in the network card (MAC address), this is why the IP address is not used. This is for efficiency.
Frame headers are: source address, destination address, protocol and checksum.
Including the DNS and LDAP and their use.
A distributed database mapping hostnames to IP addresses and vice versa.
gethostbyname() returns an IP address in response to a hostname.
gethostbyaddr() returns a hostname in response to an IP address.
nslookup
dig
DNS is stored as a tree:
13 canonical name servers which serve information for the “unnamed” root.
A special domain which is used for reverse lookup.
Now .ipv6
www.google.com. is a fully qualified domain name. Omitting the trailing . allows the DNS to look up other entries relative to the domain the host resides in.
Is a recursive process and therefore quite slow. Results are cached to improve performance.
To reduce the effects of caching when changing hostnames, the TTL of the record is reduced before the change is made.
No one company owns all names
Can have zones within a name server.
This is where DNS gets its information from.
Examples:
3www4aber2ac2uk0
3www6google3com0
The 0 occurs because the root domain has no name. This is how the end of the string is reached.
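The length-prefixed label format in the examples above can be sketched as follows. Function names are hypothetical; `readable` only exists to print the encoding in the same style as the notes.

```python
def encode_name(name: str) -> bytes:
    """Encode a hostname as DNS labels: a length byte, then the label, ending with 0."""
    stripped = name.rstrip(".")
    if not stripped:
        return b"\x00"                       # the unnamed root on its own
    out = bytearray()
    for label in stripped.split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"bad label length in {name!r}")
        out.append(len(raw))
        out += raw
    out.append(0)                            # zero-length label marks the root
    return bytes(out)


def readable(wire: bytes) -> str:
    """Render the wire format with lengths shown as decimal digits."""
    parts, i = [], 0
    while True:
        n = wire[i]
        parts.append(str(n))
        if n == 0:
            return "".join(parts)
        parts.append(wire[i + 1:i + 1 + n].decode("ascii"))
        i += 1 + n


print(readable(encode_name("www.aber.ac.uk")))
```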
Name of primary source of info for zone
IP address of host
Canonical name
Alias for an IP address for reverse lookups
Mail exchange information
The Name of the nameserver
IPv6 address of host
Experimental IPv6 address.
Host information (CPU and OS).
Location of host.
Requests are sent to the broadcast address 255.255.255.255.
Often used during host configuration.
Never forwarded by a router, but a DHCP/BOOTP relay agent can be used to forward requests on to the DHCP server.
Replies are routed normally.
Similar to the DNS tree, but the whole world doesn’t typically use the same tree.
Service model based on entries.
An entry is a collection of attributes, that has a distinguished name (DN).
Type/value pairs, e.g.:
C = GB
o = University of Wales
Attributes described by a schema.
The need for and the provision of Quality of Service (QoS) within packet based networks such as the Internet which are inherently best efforts at heart.
Ways in which resources can be allocated to priority traffic best.
Other queues only get any resources at all when the priority queue is empty.
Simple and effective.
Traffic must be policed to stay within a low limit, otherwise other traffic can get starved or the priority queue reverts to Best Effort.
Multiple queues, each queue gets a guaranteed minimum of resource even under congestion conditions.
Can be combined with Priority Queues.
The inherent risks within networks such as the Internet, cracking, viruses, trojans, worms and denial of service attacks. The role of the Firewall and the problems it can bring.
Four aspects of security to consider:
Snoopers should not be able to read confidential data
Verifies that the apparent sender really sent a message, and not an imposter.
Verifies that data has not been corrupted or altered in transmission
Ensures that the sender or receiver cannot deny sending or receiving a piece of information.
Classical encryption techniques:
A cipher used to perform the encryption.
Use a shared secret key to encrypt and decrypt.
For n users, n(n-1)/2 keys required.
Can be done in hardware.
The key is split into two: a public key and a private key.
Anything encrypted by the public key can only be decrypted by the private key.
The private key must be kept private, but the public key can be shared without worry.
Cannot be done in hardware.
Create a shared secret based from public key encryption.
Client sends a “hello” message with TLS version number and preferences.
Server sends a certificate including a public key.
Client sends a secret key encrypted with the server’s public key.
Server decrypts message and then encrypts a response with secret key which the client decrypts.
Hashing is the operation that maps the message of variable length into a hash value with fixed length.
Hashing is not reversible.
Hashing can be used to generate a digest of the message, called the Message Authentication Code.
The receiver can use the digest to verify if the message is authentic.
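A short sketch of digest generation and verification using Python's standard `hashlib` and `hmac` modules. The choice of SHA-256 and of a keyed HMAC (which assumes sender and receiver share a secret key) is an assumption for illustration; the notes do not name a specific algorithm.

```python
import hashlib
import hmac


def digest(message: bytes) -> str:
    """Fixed-length hash of a variable-length message."""
    return hashlib.sha256(message).hexdigest()


def make_mac(key: bytes, message: bytes) -> bytes:
    """Keyed digest that only holders of `key` can produce."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Receiver recomputes the digest and compares in constant time."""
    return hmac.compare_digest(make_mac(key, message), tag)


key = b"shared-secret"                        # hypothetical key
tag = make_mac(key, b"pay Alice 10")
print(verify(key, b"pay Alice 10", tag), verify(key, b"pay Alice 1000", tag))
```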
The (still) emerging IPv6 protocol and other active issues.
The problem with IPv4 is that 32 bit addresses are too small.
fe80::2c0:dfff:fee4:bd87/10
Loopback address: ::1/128
:: is a string of 0s of indeterminate length.
IPv4 addresses encapsulated in IPv6 are expressed with dotted decimal for the last four octets: ::194.123.1.2
IPv6 addresses are leased (possibly infinitely).
0000 0000 - Reserved
0000 001 - NSAP Allocation
001 - Aggregatable Global Unicast
1111 1110 10 - Link-Local Unicast
1111 1110 11 - Site-Local Unicast
1111 1111 - Multicast
Format Prefix: 1111 1110 10 or FE80::/64
Postfix is usually the last 24 bits of MAC address. The middle bytes are fffe due to IEEE standards.
fe80::00ff:fe00:0000 for MAC address: 00:00:00:00:00:00
Format Prefix 1111 1110 11 or FEC0
Where autoconfiguration is undesirable for operational reasons.
Similar in concept to DHCP for IPv4
Compatible with autoconfiguration
Nodes may request multiple addresses.
Authentication of nodes.
Makes use of address deprecation and reconfiguration-init message…
Some chances to get a feel for what is expected in the exam.
s/(Use a diagram) is necessary/\1/g
b) Describe with the aid of diagrams the TCP/IP connection establishment processes, and explain how the initial sequence number (ISN) is exchanged between two nodes during connection establishment. Label the diagrams with the TCP connection states at each stage.
The ISNs must be different as they are used to acknowledge how much data was actually received later in the connection.
Sequence numbers returned in the ACK are the sequence number of the acknowledge packets plus the size of that packet (which can then be used for flow control).
Not using flow control would either:
max(2, (BWE * RTTmin) / Seg_size). The congestion window is set to ssthresh in the case of 3 dup ACKs or 1 in the case of coarse timeout.
This module introduces the key ideas in Artificial Intelligence and ensures all students are at roughly the same level before moving on to the specialist modules.
General introduction to Artificial Intelligence (AI), including discussion of what AI is, its history, definitions, and philosophical debates on the issue (the Turing test and the Chinese room). Ethical issues (3 hours).
Tips: if you disagree with a paper, you should comment on why, this will be a good thing :)
Two ways of looking at this:
Many definitions, all potentially valid.
Problem: what is intelligence and how do we prove a system is intelligent.
A human interrogator talks to another system (human or AI). If the interrogator cannot tell the difference, the argument is that we must acknowledge the system is able to think like a human.
Has not yet been passed.
Behaving intelligently may not be enough.
Computers are just a symbol manipulation device and therefore cannot have mental states.
We have investigated whether we can develop AI, but not whether we should.
Problems of AI:
Why search is important in AI and how to go about it. This includes both informed and uninformed strategies. Evolutionary search (6 hours).
Many AI problems can be framed in terms of a search problem.
Representation is very important.
The search strategy is the way in which searching is performed:
Can’t always evaluate all the search space. Massive search space (e.g. Chess) or even infinite.
Summarises the state of search.
May not lead to a solution.
e.g. Representation of a chess board.
The first search state.
A special example of a search state. It solves the problem.
The state trying to be reached.
All possible search states.
Ways to move around in the state space (action/operators).
To check if the goal has been reached.
Measures the path cost.
Visualise the progression of a particular algorithm.
Shouldn’t evaluate every possibility of the state space.
Initial state is the root, goal is a leaf.
Don’t store the whole search tree. Requires a lot of space. Can discard explored nodes.
Store the frontier of search (i.e. nodes in search tree with some unexplored children).
In big O notation. Number of nodes generated during a search (worst case).
In big O notation. Maximum number of nodes stored in memory.
Is it guaranteed to find the optimal solution?
If there is a solution, will it be found?
b: maximum number of successors of any node
d: depth of the solution
m: maximum depth
Branching factor b and depth of solution d.
Time: O(b^d)
Space: O(b^d)
Complete (if b is finite)
Branching factor b, depth of solution d and maximum depth m.
Time: O(b^m)
Space: O(bm)
O(b^m)
O(bm)
A Depth First Search with a limited maximum depth.
Depth limit l.
Time: O(b^l)
Space: O(bl)
Complete if l >= d
A form of Breadth First Search, using a Priority Queue.
Node with the lowest total path cost is expanded.
If all the cost steps are equal, it is exactly the same as BFS.
Complete and Optimal if no negative path costs.
A depth-limited search where the limit is increased iteratively.
Avoids the space complexity of BFS.
Time: O(b^d)
Space: O(bd)
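Iterative deepening can be sketched as a depth-limited depth first search run with limits 0, 1, 2 and so on. The graph below is a made-up example and the function names are hypothetical.

```python
def depth_limited(graph, node, goal, limit, path=None):
    """Depth first search that gives up below `limit` levels."""
    path = (path or []) + [node]
    if node == goal:
        return path
    if limit == 0:
        return None
    for child in graph.get(node, []):
        if child in path:                    # avoid looping along the current path
            continue
        found = depth_limited(graph, child, goal, limit - 1, path)
        if found:
            return found
    return None


def iterative_deepening(graph, start, goal, max_depth=20):
    """Repeat depth-limited search with an increasing limit."""
    for limit in range(max_depth + 1):
        found = depth_limited(graph, start, goal, limit)
        if found:
            return found
    return None


graph = {"A": ["B", "C"], "B": ["D"], "C": ["G"], "D": ["G"]}
print(iterative_deepening(graph, "A", "G"))
```

Only the current path is held in memory, yet the shallowest goal is found first, as with BFS.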
Paths not retained - low memory
Heuristic is a sort of rule of thumb.
Heuristic best child chosen at each point until goal reached or no change in current state.
No backtracking.
Priority queue based on heuristic. Like BFS other than that.
f(n) = h(n)
A best first search which takes into account current path cost.
f(n) = g(n) + h(n)
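A minimal sketch of best first search ordered by f(n) = g(n) + h(n), using a priority queue from Python's `heapq`. The graph, step costs and heuristic values are invented for the example, and the heuristic is assumed never to overestimate the remaining cost.

```python
import heapq


def a_star(graph, h, start, goal):
    """Return (path, cost) for the cheapest path found, or None."""
    frontier = [(h[start], 0, start, [start])]      # entries are (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if g > best_g.get(node, float("inf")):      # stale entry, a cheaper route exists
            continue
        for child, cost in graph.get(node, []):
            g2 = g + cost
            if g2 < best_g.get(child, float("inf")):
                best_g[child] = g2
                heapq.heappush(frontier, (g2 + h[child], g2, child, path + [child]))
    return None


graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)], "B": [("G", 2)]}
h = {"S": 4, "A": 3, "B": 2, "G": 0}
print(a_star(graph, h, "S", "G"))
```

Setting every h value to 0 reduces this to expanding the node with the lowest total path cost.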
If h2(n) >= h1(n)
then h2
dominates h1
.
Encode solutions in Chromosomes.
Mutate and crossover chromosomes.
Evaluate the new population and select the best.
A branch of genetic algorithms which uses programming statements as genes.
Easy to do in LISP.
Usually represented as trees.
{+,-,*,/,sqrt},{A,B}
In nature, ants could solve complex problems unsupervised. Capable of finding the shortest route between a food source and the nest.
Can react to changes in the environment,
Each particle is searching for the optimum and encodes a solution (like a GA).
Each particle is moving (can’t search otherwise), and hence has a velocity. It also maintains the position it was in where it had its best result so far (its personal best).
The particles co-operate, exchanging information about what they’ve discovered in the places they’ve visited.
This co-operation only needs to be very simple;
Sometimes an answer has to be optimal in several aspects.
Examples:
Ways of representing knowledge in a computer-understandable way. Semantic networks, rules. Examples of the importance of KR (4 hours).
AI Agents deal with knowledge (data).
Logical representations have restrictions which can be hard to work with.
Objects in the world tend to be related to each other:
The state of the world changing over time.
Closed world assumption.
Uncertainty or fuzzy knowledge.
Classes define Objects, Objects are instances of Classes.
Object ∈ Class
Class ⊂ Superclass
Facts and rules can be encoded.
Essentially a generalisation of inheritance hierarchies.
Each node is an object, class, concept or event.
Each link is a relationship which makes sense in context.
Inheritance is as expected.
Example:
Note: pre-dated OOP.
Incorporates certain valuable human thinking characteristics:
Represent what is typical unless an exception is known.
Frames allow more convenient “packaging” of facts about an object.
Frames often allow things which are typical of a class and which are definitional and can’t be overridden.
Frames also support multiple inheritance.
Frames are represented as semantic networks where nodes have structure. A frame has a number of slots (age, height, etc.), each of these slots stores specific information.
When new information is gained slots can be filled in, this can cause the triggering of actions, which may trigger the retrieval of other frames.
Can fill in missing detail that is assumed.
Once true doesn’t mean always true.
As information arrives, truth values can change.
A number of implementations for this,
Bird(x) and not abnormal(x) -> flies(x)
Assume not abnormal(x) unless we know abnormal(x).
“x is true given x does not conflict with anything we already know.”
These systems allow truth values to be changed during reasoning (belief revision).
When retracting a fact, must also retract any other fact derived from it.
Penelope is a bird (can fly)
Penelope is a penguin (cannot fly)
Penelope is magical (can fly)
Retract magical (cannot fly)
Retract penguin (can fly)
For each fact, track its justification
When a fact is retracted, retract all facts that have justifications leading back to that fact, unless they have independent justifications.
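Justification tracking can be sketched with a small class that records, for each derived fact, the sets of facts that support it. This is a simplified, monotonic illustration (it does not model default assumptions such as not abnormal(x)); the class name and the example facts are hypothetical.

```python
class JustificationStore:
    def __init__(self):
        self.premises = set()
        self.justifications = {}             # fact -> list of supporting fact sets

    def assert_premise(self, fact):
        self.premises.add(fact)

    def justify(self, fact, support):
        self.justifications.setdefault(fact, []).append(frozenset(support))

    def retract(self, fact):
        self.premises.discard(fact)

    def believed(self):
        """Facts that are premises or have at least one fully believed justification."""
        facts = set(self.premises)
        changed = True
        while changed:
            changed = False
            for fact, supports in self.justifications.items():
                if fact not in facts and any(s <= facts for s in supports):
                    facts.add(fact)
                    changed = True
        return facts


tms = JustificationStore()
tms.assert_premise("rain")
tms.assert_premise("sprinkler")
tms.justify("wet_grass", {"rain"})
tms.justify("wet_grass", {"sprinkler"})
tms.justify("slippery", {"wet_grass"})
tms.retract("rain")
print(sorted(tms.believed()))
```

After retracting rain, wet_grass survives because it has an independent justification; retracting sprinkler as well removes both derived facts.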
Represent all possible states simultaneously
We can find solutions using search, but how can we remember solutions, learn from them and adapt them to new situations? This will cover perceptrons, single-layer and multi-layer networks (5 hours).
When we use some sort of rule-based system, we generally have to understand the rules. This means we understand the conclusions it draws, because it can tell us.
When a system learns from such rules, it processes in a way which can be understood.
Don’t really understand, or have control over, the way in which solutions are found.
ANNs, GAs, GP and sometimes statistical methods.
Might be related to the randomness factor.
(A: time to not understand backprop again)
Inputs (variables) -> network -> Outputs (results).
X and Y matrices of the statistical models are analogous to the training inputs and outputs of ANNs.
Most common learning rule for ANNs.
Connections between nodes given random initial weights.
We therefore get a value at the output node(s) which happens when these random weights are applied to the data at the input.
An iteration, that is, finding the error then adjusting weighting, is called an epoch.
There may be many thousands of epochs in one training.
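The train-by-epochs loop can be sketched with the simplest possible case: a single neuron trained with the perceptron rule, rather than full backpropagation through hidden layers. The learning rate, the seed and the AND training data are assumptions made for the example.

```python
import random


def predict(weights, bias, inputs):
    """Step activation: 1 if the weighted sum exceeds 0, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0


def train_perceptron(samples, epochs=1000, rate=0.1, seed=0):
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in samples[0][0]]   # random initial weights
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):                  # one pass over the data is one epoch
        errors = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error:
                errors += 1
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                bias += rate * error
        if errors == 0:                      # stop once every sample is classified correctly
            break
    return weights, bias


and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
print([predict(weights, bias, x) for x, _ in and_gate])
```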
Need to be able to generalise the model to unseen data.
Too much training will lead to a lack of generalisation.
Self-organising neural networks.
Random initialisation of a grid, see which nodes best match the input and change nodes around it.
The backbone of knowledge representation (4 hours).
Encoding a fact in three pieces of information: the object, its attributes and the values of these attributes.
The facts a human knows are not obvious to a computer; they need to be encoded.
e.g.: Tree(species, oak)
The encoding doesn’t matter, so long as it’s uniform to the system.
Can include an uncertainty factor: a number which can be taken into account by the system when making decisions.
The final conclusion of any program where uncertainty was used in the input is likely to also have an uncertainty factor (if you’re not sure of the facts, how can the result be certain?)
Uncertainty might be encoded something like this: Tree(species, oak, 0.8) (the certainty that the tree is an oak tree is 80%).
Again, the encoding doesn’t matter, so long as it’s uniform to the system.
A knowledge base may have rules associated.
IF premises THEN conclusion. There may be more than one premise, and premises may contain logical functions, for example AND, OR and NOT.
If the premises evaluate to TRUE the rule fires.
e.g.
IF tree(species, oak) THEN tree(type, deciduous)
Rules may contradict another rule. Different strategies can be applied to choose the most specific or most relevant rules.
IF tree is conifer THEN load conifer data ELSE load deciduous data
We can represent knowledge using logic. There are two types: propositional and predicate (or first-order logic or predicate calculus)
In propositional logic formulas are constructed using variables, TRUE and FALSE constants and connectors:
AND (∧)
OR (∨)
NOT (¬)
IMPLIES (→)
Prolog is based on this.
A predicate is like a function that returns TRUE or FALSE.
Tree(a) is true if a is oak, false if a is daffodil.
Oak(a) → Tree(a)
If the first clause is satisfied, the second clause is also satisfied.
The symbol ∀ can be read as “for all”.
∀a(Oak(a) → Tree(a))
The symbol ∃ can be read as “exists”.
∃a(Beech(a) ∧ ¬Green(a))
Beech(a) is known as an atomic formula.
Can have multiple parameters.
We use two standard rules:
If P→Q, then if P is true, Q must also be true (modus ponens).
If P→Q, then if Q is false, P must also be false (modus tollens).
Classic monotonic reasoning cannot contain contradictions.
Put formally:
X ⊆ Y → Deriv(X) ⊆ Deriv(Y) where Deriv(X) is a set of facts derived from X.
Reasoning changes over time. Can introduce this into machines by introducing a concept of time.
Machine inference is used to deduce new facts from a knowledge base which is held in working memory.
Knowledge Base -> Inference Engine -> Working memory
Can be very complex
Two principal methods
Modus Ponens:
student(S) ∧ studies(S, ai) → studies(S, prolog)
student(T) ∧ studies(T, expsys) → studies(T, ai)
student(joe)
studies(joe, expsys)
Therefore we can deduce:
studies(joe, prolog)
Proof:
student(joe) ∧ studies(joe, expsys) → studies(joe, ai)
student(joe) ∧ studies(joe, ai) → studies(joe, prolog)
Q.E.D.
Forwards chaining can fire any rules which match the knowledge held in its working memory. This can potentially come up with a huge amount of new knowledge, most of which is probably completely irrelevant.
Backwards chaining sets out to prove a piece of information.
The information will either be true or false, but doesn’t generate unwanted results.
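Forward chaining over the student rules above can be sketched as follows: rules fire whenever their premises match working memory, until nothing new is derived. Facts are represented as tuples and variables as strings beginning with "?"; this representation and the function names are assumptions for the example.

```python
def match(pattern, fact, bindings):
    """Try to match one premise against one fact, extending the variable bindings."""
    if len(pattern) != len(fact):
        return None
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if new.setdefault(p, f) != f:    # variable already bound to something else
                return None
        elif p != f:
            return None
    return new


def satisfy(premises, facts, bindings):
    """Yield every set of bindings that makes all premises true."""
    if not premises:
        yield bindings
        return
    for fact in facts:
        b = match(premises[0], fact, bindings)
        if b is not None:
            yield from satisfy(premises[1:], facts, b)


def forward_chain(rules, facts):
    """Fire rules repeatedly until working memory stops growing."""
    facts = set(facts)
    while True:
        new = set()
        for premises, conclusion in rules:
            for b in satisfy(premises, facts, {}):
                derived = tuple(b.get(term, term) for term in conclusion)
                if derived not in facts:
                    new.add(derived)
        if not new:
            return facts
        facts |= new


rules = [
    ([("student", "?S"), ("studies", "?S", "ai")], ("studies", "?S", "prolog")),
    ([("student", "?T"), ("studies", "?T", "expsys")], ("studies", "?T", "ai")),
]
memory = forward_chain(rules, {("student", "joe"), ("studies", "joe", "expsys")})
print(("studies", "joe", "prolog") in memory)
```

Backward chaining would instead start from studies(joe, prolog) and only explore rules whose conclusions match that goal.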
Use the resolution proof method for now.
Say we have:
A1 ∨ A2 ∨ ... ∨ An ∨ B
and ¬B ∨ C1 ∨ C2 ∨ ... ∨ Cm
Resolvent of clauses is:
A1 ∨ .. ∨ An ∨ C1 ∨ ... ∨ Cm
Now take the two clauses:
A1 ∨ A2 ∨ ... ∨ An ∨ B
and D ∨ C1 ∨ C2 ∨ ... ∨ Cm
If there is some subset where B and D are negations of each other Theta
If we have two clauses Clause1 and Clause2, and these both have a resolvent R, then if Clause1 and Clause2 are both satisfiable, so must R be.
The idea: take a clause containing a goal we want to prove, and negate that goal. If we then resolve this with other clauses, over and over, and we get to the empty clause (which is never satisfiable), we have proved our goal.
We can express any predicate calculus statement in clause form.
This enables us to work with OR and NOT rather than any other clause.
p → q ≡ ¬p ∨ q
A ∧ B ≡ ¬(¬A ∨ ¬B)
Use a previous example in clause form:
¬student(S) ∨ ¬studies(S, ai) ∨ studies(S, prolog)
¬student(T) ∨ ¬studies(T, expsys) ∨ studies(T, ai)
student(joe)
studies(joe, expsys)
Solution to studies(S, prolog) means we must negate it:
¬studies(S, prolog)
Resolve with clause (1):
¬student(S) ∨ ¬studies(S, ai)
Resolve with clause (2) (S = T):
¬student(S) ∨ ¬studies(S, expsys)
Resolve with clause (4) (S = joe):
¬student(joe)
Resolve with clause (3):
Ø
Therefore studies(joe, prolog) is true.
Q.E.D.
The same thing, but expressed differently. This is how Prolog does it.
A horn clause is a series of disjuncts (ORs)
We can take:
A ∨ ¬B ∨ ¬C
and write it:
A ← B ∧ C
If B and C then A
All the same thing:
studies(S, prolog) ← student(S) and studies(S, ai)
studies(T, ai) ← student(T) and studies(T, expsys)
student(joe) ←
studies(joe, expsys) ←
All the negatives are on one side of the ← sign.
← studies(S, prolog)
studies(S, prolog) ← student(S) ∧ studies(S, ai)
← student(S) ∧ studies(S, ai)
studies(T, ai) ← student(T) ∧ studies(T, expsys)
S = T
← student(S) ∧ studies(S, expsys)
student(joe) ←
S = joe
← studies(joe, expsys)
studies(joe, expsys) ←
←
Q.E.D.
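The Horn clause proof above is essentially what a Prolog interpreter does: pick the first goal, find a clause whose head unifies with it, and replace the goal with that clause's body. Below is a rough Python sketch of that loop, not a real Prolog engine. The term encoding (tuples for atoms, capitalised strings for variables) and the names `unify`, `solve` and `PROGRAM` are assumptions for the example, and there is no occurs check.

```python
import itertools


def is_var(term):
    """Variables are strings starting with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()


def walk(term, subst):
    """Follow variable bindings until an unbound variable or a value."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Unify two terms or atoms; return an extended substitution or None."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None


_fresh = itertools.count()


def rename(rule):
    """Give a rule fresh variable names so separate uses do not clash."""
    head, body = rule
    suffix = f"_{next(_fresh)}"

    def fresh(atom):
        return tuple(t + suffix if is_var(t) else t for t in atom)

    return fresh(head), [fresh(atom) for atom in body]


def solve(goals, program, subst):
    """Yield substitutions proving every goal, depth-first, clauses in order."""
    if not goals:
        yield subst  # the empty goal list: proof complete
        return
    first, rest = goals[0], goals[1:]
    for rule in program:
        head, body = rename(rule)
        new_subst = unify(first, head, subst)
        if new_subst is not None:
            yield from solve(body + rest, program, new_subst)


# (head, body) pairs: head <- body[0] and body[1] and ...
PROGRAM = [
    (("studies", "S", "prolog"), [("student", "S"), ("studies", "S", "ai")]),
    (("studies", "T", "ai"), [("student", "T"), ("studies", "T", "expsys")]),
    (("student", "joe"), []),
    (("studies", "joe", "expsys"), []),
]

query = ("studies", "Who", "prolog")
answers = [walk("Who", s) for s in solve([query], PROGRAM, {})]
print(answers)  # ['joe']
```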
Programming with logic.
Uses backward chaining through Horn clauses.
Practical introduction to programming for Intelligent Systems, used to illustrate search, KR and first-order logic (3 hours).
How can human expertise be automated? How to build an expert system - system concepts and architectures. Rule-based systems: design, operation, reasoning, backward and forward chaining (3 hours).
Knowledge base contains the rules.
Database contains the facts.
Inference engine uses both of these to match facts to rules to derive new facts, etc.
Justifiable and transparent.
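A toy version of this architecture, with the knowledge base, database and inference engine kept separate, might look like the following. It is only a sketch: the rule names `R1` and `R2`, the facts and the `explain` helper are invented for illustration. Recording which rule produced each fact is what makes the result justifiable.

```python
# Knowledge base: named rules (hypothetical examples).
KNOWLEDGE_BASE = {
    "R1": ({"student", "studies_expsys"}, "studies_ai"),
    "R2": ({"student", "studies_ai"}, "studies_prolog"),
}
# Database: the facts known at the start.
DATABASE = {"student", "studies_expsys"}


def infer(knowledge_base, database):
    """Match facts to rules until no new facts appear, keeping justifications."""
    facts = set(database)
    why = {}  # derived fact -> (rule name, premises used)
    changed = True
    while changed:
        changed = False
        for name, (premises, conclusion) in knowledge_base.items():
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                why[conclusion] = (name, sorted(premises))
                changed = True
    return facts, why


def explain(fact, why):
    """Return a line-by-line justification for a fact."""
    if fact not in why:
        return [f"{fact}: given"]
    name, premises = why[fact]
    lines = [f"{fact}: by {name} from {', '.join(premises)}"]
    for premise in premises:
        lines.extend(explain(premise, why))
    return lines


facts, why = infer(KNOWLEDGE_BASE, DATABASE)
print("\n".join(explain("studies_prolog", why)))
```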
Knowledge Acquisition and its importance in KR and RBS (2 hours).
Multivariate analysis and statistical methods for solving problems.
Analysis of high-dimensional data. Often not possible to identify or quantify an object from one dimension.
Two methods:
Always produces the same result from a given set of data.
Will always find the best result within the constraints of their abilities.
All the factors which might affect the outcome so that the analysis can find some form of correlation.
The set of objects and associated variables.
Usually statistics are in high-dimensional space. No way of visualising this data.
In a 3D space, can fit a plane to any 3 points to make the problem a 2D one.
With more than 3 points, can fit the best approximating plane by minimising the root mean square error. This is done using regression, but it does mean there will be error in the model.
Can do this with a 3D plane, but the idea is to keep the model as simple as possible.
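Fitting a plane to more than three points is ordinary least-squares regression. The sketch below fits z = a*x + b*y + c in pure Python by solving the normal equations with Cramer's rule; the function names and the sample points are assumptions for the example, and a real analysis would normally use a numerical library instead.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to (x, y, z) points."""
    n = len(points)
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    # Normal equations M * [a, b, c] = rhs, solved with Cramer's rule
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(m)
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        coeffs.append(det3(mc) / d)
    return tuple(coeffs)


def rmse(points, a, b, c):
    """Root mean square error of the plane against the points."""
    errors = [(z - (a * x + b * y + c)) ** 2 for x, y, z in points]
    return (sum(errors) / len(errors)) ** 0.5


# Five points lying exactly on z = 2x - y + 3, so the error should be ~0
exact = [(0, 0, 3), (1, 0, 5), (0, 1, 2), (1, 1, 4), (2, 1, 6)]
a, b, c = fit_plane(exact)
print(a, b, c, rmse(exact, a, b, c))
```

With noisy data the same call still returns the best plane, but `rmse` will be above zero: that is the error in the model mentioned above.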
The line through multi-dimensional space which describes the most variation in the data (the direction in which the data is most spread out) is the First Principal Component.
Each subsequent Principal Component is perpendicular to all the earlier ones. They are ordered by the amount of variation in the data they explain, so each has less importance than the one before.
Can analyse the principal components to see which of the original X values contributed most.
Principal components help understand the data better.
The main principal components help to explain what factors had most bearing on the results.
They form a model that could be used for predicting results.
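The first principal component can be found as the dominant eigenvector of the covariance matrix. A minimal pure-Python sketch using power iteration follows; the data values and function names are made up for illustration, and in practice a library routine would be used.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of rows of observations."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    centred = [[r[j] - means[j] for j in range(dims)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]


def first_component(cov, iterations=200):
    """Power iteration: converges to the direction of greatest variance."""
    size = len(cov)
    v = [1.0] * size
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Variance of the data along v (the matching eigenvalue)
    variance = sum(
        v[i] * cov[i][j] * v[j] for i in range(size) for j in range(size)
    )
    return v, variance


# Two variables that rise together, so most variation lies along one line
data = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.0)]
cov = covariance_matrix(data)
pc1, explained = first_component(cov)
total = sum(cov[i][i] for i in range(len(cov)))
print(pc1, explained / total)
```

The entries of `pc1` show how much each original variable contributes to the component, and `explained / total` is the share of the total variation it accounts for.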
Using a PCA model to predict is known as Principal Components Regression (PCR).
Closely related is PCL.
Fewer variables, if they produce a model just as good in testing, should be preferred.
Not clear cut for a definite value. (i.e. not boolean logic).
Seminars from Aberystwyth University
by Xizhao Wang, IEEE Fellow; Editor-in-Chief, IJMLC
The internet, financial institutions, media, medical treatment, scientific research, etc. provide a huge amount of data.
This data is only increasing in size. Big Data is the explosion of information.
Storing Big Data is, as expected, very difficult.
Structured data is becoming unstructured.
Moving from batch jobs to the streaming of data.
There are now zettabytes of data in existence.
There is a lot of uncertainty in Big Data; for example, there can be a lot of ambiguity in symbolic data.
Martyn Thomas CBE FREng
Offer significant benefits and new vulnerabilities.
Michael Barr’s expert report in the Bookout v Toyota lawsuit.
The safety world works on probabilities, but in security there is a deliberate attacker, so you can’t argue that failures are independent.
End-to-end security: “No trusted components”.
Whether this works or not is unknown.
Anyone who has the authority to send the relevant messages through the DCC.
Anyone who can mount a successful cyber attack on the network.
Testing only shows that faults do exist, not that there are no faults.
For high confidence you need formal methods.
Is a through-life discipline.
Is a tier one threat.