# SEM1020 - Current Issues in Software Engineering This module focuses on current issues in the field of Software Engineering. The module will involve students in an examination of current research literature of interest to software engineers. ## Introduction It is okay to produce a paper based on my own paper :-) ### Staff * **[Edel Sherrat](mailto:eds@aber.ac.uk)** * [Elio Tuci](mailto:elt7@aber.ac.uk) * [Nigel Hardy](mailto:nwh@aber.ac.uk) ### Learning Outcomes 1. Identify and use the main research resources that are available to software engineers 2. Constructively participate in advanced technical debate in the field. 3. Have a general overview of the field of Software Engineering and be aware of focused areas of research interest within it. 4. Present current research at an appropriate level of detail to a technical audience. 5. Produce a survey paper on some current area of Software Engineering research. ### Assessment * Participation and Contribution **15%** * Survey Paper (9000 words) **60%** * Poster **25%** ## How to Read a Scientific Paper Papers provides a good view of what is happening now. Journal papers give a more out of date view than conference papers. Blogs typically give the most up-to-date view but are least verifiable. Papers should give enough information that there is enough information to replicate the experiment. It will show some actual data. ### Steps 1. Read the abstract, check the information will be relevant 2. Read the conclusion and discussion, check the results agree with the abstract and if they are still relevant. 3. Read the introduction, check you understand the background information and see if you need to look up more items to understand this paper. 4. Read the results. Usually tables and figures are provided to show the data. Now the methods become important, it gives details on how the experiment was set up and carried out. Try to pick out the basic methods and work upwards. References provide a way of tracking back through the sources. ## Paper Presentations Set down for presenting Abstract State Machines (Feb 28) and Swarm Robotics (Mar 28) and asking questions on Biological Data Sharing (Mar 14) and Swarm Robotics (Apr 4). ## Literature Review/Survey Paper 9000 words of pure gold. May address *any* aspect of software engineering. MSc should probably review the literature of their project. # SEM2220 - Mobile Solutions Mobile devices (phones, tablets and other handheld devices) are the fastest growing area of computing. Typical applications involve a wide range of software and system issues. This module will investigate those issues, giving students experience and understanding of the best ways of building different types of mobile applications. ## Introduction *Scope of mobile systems, , mobile web, native apps, design choices, main platforms, overview of implications (2 lectures)* ### Staff - **[Chris Loftus](mailto:cwl@aber.ac.uk) (E38)** - [Neil Taylor](mailto:nst@aber.ac.uk) (C58) - [Andrew Star](mailto:aos@aber.ac.uk) (C47) ### Learning Outcomes 1. Demonstrate an advanced understanding of the domain of mobile systems. * Quite a broad overview of these systems. * High level view. * Give a flavour of specific systems (iOS, Android). 2. Analyse and evaluate mobile solutions in a range of application areas and be able to critically evaluate their effectiveness. * Have to be critical at M level. * Research. Pros and cons. 3. Evaluate the social, legal, ethical and professional issues involved in implementing mobile applications. * Neil will be talking about this in the context of mobile. 4. Implement representative mobile systems. * Playing with the technologies. * Building an android, iOS and mobile web app. ### Assessment - Mobile Web-based Programming project **20%** *22/10/2013 to 04/11/2013* - iOS-based Programming project **20%** *12/11/2013 to 25/11/2013* - Android-based Programming project **20%** *02/12/2013 to 16/12/2013* - Case study analysis **40%** *10/12/2013 to 22/01/2014* ### Motivation In 2012 the whole mobiles ecosystem's revenue was $1,551 billion; equivalent to 2.2% of global GDP. 5% CAGR (Compound Annual Growth Rate). These are only projected figures, the picture is much bigger than this as this only includes money which takes into account the money which stays within the ecosystem (doesn't include ecommerce, etc.). The affect is much higher. ### Mobile Landscape Constantly changing given that its a new market. The iPhone was the real kickstarter for this in 2007. [Timeline](https://gsmaintelligence.com/timeline/) of the history mobile telephones. Difference between "smart" and "feature" phones. Mobile subscribers are growing 4 times faster than the global population. Reducing now (especially in the West) due to market saturation. More scope for growth in developing countries. Asia Pacific accounts for approximately half of global subscribers. Total sim-enabled connections (including machine to machine (M2M) connections). Estimated 7.6% CAGR 2012-2017. Percentage of M2M connections is increasing rapidly. Multiple devices (tables/phones/etc.) allow growth in sim-enabled connections. Mobile data traffic is increasing rapidly. Video speculated to be one of the most rapidly increasing traffic. ### Mobile versus Desktop Over time the sales of desktops (PCs & Laptops) is decreasing while the sale of mobile devices is increasing. ### Some predictions by GSMA * Asia predicted to add approx. 50% of all connections/subscribers between 2013-17 * Same period Latin America and Africa next 20% * Subscriber growth in developed nations slowing (e.g. 1% in UK) * Total data traffic load in 2012 was 0.9 Exabyte/month). * Predicted to be 11.2 EB/mth. ### Chief Stakeholders * Network operators * Content providers * OS vendors * Device vendors ### Fragmentation and Differentiation Issue of fragmentation and differentiation, related to: * the number of OS variants, * the number of device vendors wishing to customise the OS, * different browsers, * Network operators and device vendors do not like commoditisation. * Don't want to be treated like commodities which are values on price. Top 25 ranked apps in 2012 take 15% of all revenue. Only 2% of App Store newcomers get into the top 250 apps. Free apps are becoming more of a norm, especially with the saturation of android. ### Mobile Network Operators Revenues are decreasing due to over-the-top services (i.e. using online services over SMS or phone calls). Operators will push, via their shop sales-assistant commission, those devices they gain the biggest discounts on. High profile phones promoted, often Android. Apple disrupts this model due to a locked mindshare. Psychological hook of "cheap" contract phones. Free now, pay later ideology. Billing relationship with the customer. Paying for things using your mobile account rather than via a bank account (directly). Small transaction fee for this service to generate revenue. * 48% of the world's population do not have a bank account. Something for the future, not quite so prevalent at the moment. ### Rich Communication Service Threat of over the top (OTT) services such as Skype. Quality of service is currently an issue here, but voice call is growing very slowly (13% in 2008 to 4% 2012). GSMA responded by developing a Rich Communication Service specification. Only aimed at long term evolution (LTE) networks (4G). Aims to provide this as a universal service. 17 different implementations by MNOs including AT&T, Telefonica, Verizon and Verdafon. ### Device Vendors (OEMs) Nokia sales falling. Haven't kept up with the trends. Samsung now the major player. Mainly due to their adoption of Android. Apple improving with mobiles. RIM/Blackberry are focusing on development. Building an Android smartphone involves: * Choosing an OS. * OEMs design phone (HW, UX, required changes to the Android OS UX) plus apps to include. * Marketing. * Negotiations with MNOs to arrange bulk discounts. * Production test units sent to MNOs, app developers and Google. * Phone is shipped to MNOs and independent stores * MNOs customise with their own apps. * Sales to public plus marketing. Can also go down the stock phone route. Inception to market is approximately around 6 months when based on an existing OS. Apple is a little easier to to the locked-in syndrome. ### Types of Phone * Basic Phones * Call and SMS support only. * Feature Phones * No touch screen * Browser becoming standardised. * Smart Phones * Definition changes year-on-year. * Expectations change a lot. * More sensors and technology. * Phablets * Large smartphones. * Tablets * Same functionality as smartphones, often without SIM. * Slates * Large tablets. * Non-phone devices * iPod touch * eBook readers * Google Glass * Watches like the Apple Pebble. * Rely on bluetooth or similar. ### Operating System Vendors Android leading market player. iOS popular. Symbian (dead now) and Blackberry not doing well. Windows Phone a late starter. ### Android Fragmentation There are 24 listed Android device manufacturers but at least 259 different device models currently on sale. Lots of different screen sizes, resolutions, hardware capability, etc. There is only one manufacturer for Apple products. Version 4.1+ *Jelly Bean* most prevalent, but versions 4.0, 3.0 and 2.3 are still a large market share. Security problems, most malware attacks on Android. iOS is mostly all the same version. However, companies often prescribe a 2-3 purchase cycle. Corporate world might need to take into account older versions. Google allow fragmentation to make Android more interesting to OEMs, However this can get in the way of updates. Webkit configuration varies; browser fragmentation. Tried to stop fragmentation in version 3, but this lead to new forks. So this has been re-introduced in version 4. ### Fragmentation and Differentiation Differentiation gives a unique selling point (USP) for vendors. Sticking with the same model. Fragmentation leads to security issues and makes the OS more complex to handle screen sizes, etc. More testing needed. Therefore increased cost. ### Windows Phone 8 Microsoft and Nokia. A closed environment (or "walled garden"). No fragmentation/differentiation. Very few OEMs interested. ### How to join the OS game? Buy into it. Web as OS platform. Firefox OS is trying hard with cheap alternatives. HTML5, JS, CSS3 and APIs to access hardware. Very fragmented, depending on rendering engine. ### Mobile Webkits are not always the same Can be configured differently. Must be careful with mobile websites. Depends on location too (UC Browser more popular than Android in Asia). ### Proxy versus full browsers. Opera mini, for example. * HTTP request (encrypted) * Proxy Server * Target HTTP Server resources * Renders page * Compresses into an image map * Image map returned to Opera mini (encrypted). Cheaper and faster over a slow network. Very popular in certain developing countries with low bandwidth. Cheap in terms in processing too. JS is handled by the proxy server. Not always perfect and may not be enabled. ### Content Providers App Stores * Apple App Store is a walled garden. 70% developer revenue. * Google Play. 70% developer revenue. * Amazon Android Appstore. * Others (apk) security issues with this. ## Mobile Web *Basic tools, responsive web, matching platform expectations, testing, delivery. (5 lectures plus practicals)* ### Objectives * To be able to build some simple mobile web sites/applications and view them via desktop browsers, mobile browsers and as mobile applications (hybrid apps). * To gain an insight into design issues especially multiple screen sizes and browser support. * To use a common mobile web framework to help build a mobile web application. * To user PhoneGap™ to turn a webb application into a "native" application which can be deployed. * To use some common testing technologies for mobile web. ### Overview * Is mobile web being exploited? * Three main approaches: 1. Web site 2. Hybrid App 3. Native App * Challenges for mobile web developers * Standards are a moving target. * How can we be informed what support there is? * Given a complex situation, how can we make testing tractable? #### Core philosophy Progressive enhancement and not graceful degradation. ### Web site/app Approach #### Pros * One codebase * Cross platform * Familiar technology * Easy to distribute to both mobile and non-mobile #### Cons * Lack app store distributions * Weak access to device services * Performance (JS interpretation). * Not as slick as a native app ### Hybrid Applications Native wrapper around HTML and CSS and JS. #### Pros * One codebase * Cross platform * Better accesses to device services * Can distribute via app stores * Familiar technologies #### Cons * No full access to device services * Performance * Not as slick as native apps ### Native Applications #### Pros * Highly polished * Good performance (potentially) * Good developer support platform * App store distribution. #### Cons * Platform specific * Multiple codebases ### Progressive Enhancement not Graceful Degradation Graceful degradation (of a website): * Focus is on building a great experience for the latest browsers, testing one browser version back. * Old browsers provide poor, but passable, experience, only show-stopped bugs fixed. Progressive enhancement (of website): * Focus on content, not browsers. ### Responsive Web Design (RWD) * Fluid grids * Media queries - use `em` over `px`. ### Mobile virtual viewports Mobiles have a virtual viewport which bypass the media queries. Use the viewport meta-tag to ask the viewport width to be the same as the actual width. ### Breakpoints Research suggest lines of text should be between 45-75 characters for comfortable reading. Columns help with this. ### Mobile-first progressive enhancement. Design site for the mobile device. If media queries are supported and on a bigger screen, add to the existing CSS for mobile to add for the styling rules and changing font sizes. Simplification of complex designs is harder than making a simple design more complex. ### Server-side Detection of Device Capabilities Client side RWD alone may not be sufficient. There are some server-side approaches: * User-agent sniffing. * Client-side reconnaissance missions. Classes of "device". **RESS** (Responsibe Web Design and Server-side Components). ### The problem Some services will only run on a mobile device (Phone calls, accelerometer). Some services are more significant on a mobile device (location awareness, etc.), be careful not to run into the *Hide and Cry* anti-pattern. Some classes of device are very different (feature versus smart phone). What we spit out is so different that RWD alone won'y help. Need variants of the website. Image sizing issues that RWD won't solve. ### User-agent sniffing Can use HTTP User-Agent request header with a device description repository to work out: * Device hardware characteristics * Operating System * Browser characteristics ### Example: Detecting viewport width ```javascript var width = (winder.innerWidth > 0) ? Window.innderWidth : screen.width; document.cookie = "sitewidth="width; document.location.reload(); ``` ```php if(isset($_COOKIE["sitewidth"]) { $_SESSION["screenwidth"] = $_COOKIE["sitewidth"]; } // ... if($_SESSION["screenwidth"] <= "380") { include("includes/mobile-nav.inc.php"); } else { include("includes/desktop-nav.inc.php"); } ``` ### Herding Devices into Device Classes **Device Class**: An abstract collection of common characteristics of similar devices and their browsers. Why? Because we don't want our server-side code to have to deal with every device (this isn't scalable). ### Typical Classes * Higher Mobile (WebKit-based, >= 320px width) * Simpler Mobile (minimal JS support, >176px width) * Tablets * Desktop * Unsupported (if no SLL, Cookies, JS, or < 176px width) Does unsupported break progressive enhancement? ### What to send? * Stylesheets and Code based on class * Sometimes single codebase is too small (watch out for duplication of common code). * RESS can help with codebase maintenance. ### Responsive Web Design and Server Side Components Combine client-side RWD with templating in the codebase for different classes of device. ### Sometimes HTML5, CSS3 and JS isn't enough Many HTML5 features aren't widely supported. Can't publish to application stores. Perhaps go for a hybridge: part native, part HTML. PhoneGap/Cordova is the best known hybrid technology. ## IOS Native Apps *Objective-C, relevant design patterns, platform design considerations, handling data, using sensors and location (5 lectures plus practicals)* ### Objective-C Based on C and Object-Orientation (based on Smalltalk), has basic rules of C but most of the syntax is smalltalk-based. Some repeated information is involved, but this is improving. Uses late binding (names are looked up at runtime, not compile time). Good, but unusual, object syntax with some useful extensions and libraries. Verbose (*A: yay boilerplate*), designed to be descriptive (*A: **yay** dot-enter programming*). Likely to change with each release (good and bad). #### Basic Syntax and Objects ##### Logging and Strings ```objc NSLog(@"Message"); NSString *var = @"Some message"; NSLog(@"Message: %@.", var); ``` ##### Booleans ```objc BOOL true = YES; BOOL false = NO; ``` ##### Numbers ```objc NSNumber *num = @10; // Conversion to unsigned int NSUInteger uInt = [num unsignedIntegerValue]; ``` #### Arrays ```objc NSArray *arr = @[...]; // Accessing elements arr[index]; // Something about NSArrays being immutable objects. ``` #### Dictionaries ```objc NSDictionary *dict = @{@"key": @"value"}; // Accessing NSString *val = dict[@"key"]; ``` #### Objects and Messages Messages are Objective-C's version of methods, where the Java syntax would be: ```java ObjectName.methodName() ``` The Objective-C syntax is: ```objc [objectName messageName] ``` Example: ```objc NSArray *temp = @[@"Test", @"2"]; NSLog(@"%@", [temp description]); // Logs information about temp ``` ##### Storing the output of Messages Just need to worry about typing ```objc NSArray *temp = @[@"Test", @"2"]; NSString *result = [temp description] NSLog(result); ``` #### Parameters Messages can have parameters, however they may act differently than expected. ```objc [object methodName:parameter namedParameter:value] ``` ##### Initialisation ```objc [[Class alloc] init]; ``` #### Conditionals and Loops ``` if(condition) { // ... } else if(condition) { // ... } else { // ... } ``` ``` switch(thing) { case value: // ... break; } ``` ``` for(Class *item in iterable) { // ... } ``` #### Blocks Blocks are like basic methods. ```objc void (^block)(void) = ^{ // ... }; ``` Inkoving a block: ```objc block(); ``` #### Classes In a header file (.h): ```objc @interface Name : NSObject @property Type *name; @property NSString *myName; @end ``` Implementing the interface (.m): ```objc #import Name.h @implementation Name @end ``` Accessing properties ```objc Name *name = [[Name alloc] init]; name.myName = @"Value"; ``` ##### Method Signatures In many languages, you might see: ``` java void m(int x, int y, String text) ``` A similar method in Obj-C is: ``` objc - (void) mAtX: (NSInteger) x atY: (NSInteger) y withText: (NSString *) text ``` Calling this: ``` objc [obj mAtX: 2 atY: 32 withText @"a message"] ``` The method name is: `mAtX:atY:withText` ##### Properties Defined with the `@property` syntax. The attributes in the `()` control how the property manages the underlying data value. The `IBOutlet` is only used on properties that are linked to the user interface. `atomic` is used for multi-threading purposes (read locks, etc.). Under the bonnet, this is an instance variable (ivar). Good practise is to cover your ears and pretend they're just properties. ##### Instances ``` objc Example *inst; inst = [[Example alloc] init]; // or inst = [Example new]; ``` `alloc` and `init` is the typical Obj-C way as classes often have different initialisation techniques. `alloc` is a class method, `init` is an instance method. #### Protocols Similar to Java interfaces. There are some features where you can have required and optional methods. Optional methods act as empty methods if they are not implemented. Set in the header file (.h) #### Primitive Types Primitive types from C can be used in Obj-C, e.g. int, long, float, double. Additional types: 1. `BOOL` (`YES` or `NO`) 2. `NSInteger` (typedef to either long or int depending on architecture) 3. `CGFloat` (typedef to either float or double depending on architecture) #### Memory Management Memory management is achieved by checking the number of references to an object. In older code, you needed to write lines of code to state when you wanted to keep an object in memory. Modern applications use ARC (Automatic Reference Counting). The compile tracks when objects should be kept in memory or removed from the application. ### Application Structure Typical project implementations have a simple structure which corresponds to MVC `AppDelegate` accesses the model (if there is one). The storyboard is the view, describe many screens in a single file. `ControllerView` is the controller. ### Developing an Application 1. Make a new view-based progect, 2. Add extra views in the Storyboard, 3. Design the application by placing buttons, etc. *allows easy prototyping* 4. Make new `ViewController` files and link them to the views 5. Via drag and drop, define Actions and Outlets in the `ViewController` and link outlets and actions to interface (associate views to controllers) 6. Implement model separately (if complex enough). ### Rule of Thumb If your newly created app crashes when trying to load a new screen, then you usually have: 1. Messed up linking the view to the view controller. 2. Done something wrong when linking an outlet or action to a screen. The error message should give you a clue as to what is wrong. ### Types of Interfaces * Tabbed screens * Tables * Linked Views * More examples from iOSDevUK You can put each of the views into the storyboard and define how you segue from screen to screen then write code to pass data between them. #### Tabbed Screens #### Tables `UITableView` is used to display repeating section of data. `UITableViewCell` is a cell in the table. Unlike classic tables on other platforms, there is only a single column. Tables contains many sections, which in turn contains many rows. The entire table can have a header and footer. Sections can also have a header and footer. Each table must have at least one section to be complete. ##### Table Styles Table style is set at build time only. ###### Plain Cells flush to each other ###### Grouped Not in iOS7. ##### Cells Each row contains a cell. When a row's data is to be displayed, the `UITableView` will request the cell. By default, the `UITableViewCell` is used to degine a cell. The cell is initialised in the method: `tableView:cellForRowAtndexPath:` A table can contain different types of cells for different rows. The cell is reused. ###### Cell Structure * Content * Layout depends on Style * `UITableViewCellStyleDefault` * `UITableViewCellStyleSubtitle` * `UITableViewCellStyle1` (right detail) * `UITableViewCellStyle2` (left detail) * Accessory * Disclosure Indicator * Detail Disclosure Button * Check Mark ###### Custom Cells Content of a cell can be defined in code, but is tedious. It is more likely that IB will be used to define the layout for the cell and use a View Controller to control the behaviour. #### Table View Contoller `UITableViewContoller` is a subclass of `UIViewControler` that is designed to manage `UITableView`s. If a new view based controller is created, the XIB will contain the table linked to this controller. In order to provide custom behaviour, the following methods needed to be implemented: * `UITableViewDelegate` * Manage selections, * Configure section headings and footers, * Help to delete and reorder cells, * Perform other actions * (All optional) * `UITableViewDataSource` * The data source provides the table-view object with the information it needs to construct and modify a table view. Two are required. #### Linked Views ### Data Storage on iOS Techniques for storing and retrieving data on the iPhone. XML and SQLite provide the best portability. #### NSUserDefaults User preferences. Persistent map. #### Property List Serialisation Mechanism to store a selected number of data types in a property list (plist). #### NSCoder Technique to save more complex objects. #### XML XML data files. #### SQLite Small memory footprint SQL database. #### Core Data More complex system, but offering a powerful way to manage data and relationships. Best integration with iCloud. ### Application Sandbox iOS applications are run in a protected area on the device. Within this area there is a defined structure of Directories that you can access. The developer is responsible for managing files that are created within the application area. ### SQLite 3 Database and library available in iOS. C library, which needs to be included in the project. Create a SQLite database on the command line and include in the project. Need to marshal data into and out of the database, with some conversions (e.g. Strings). #### Where to store the database By default, your database is in the default bundle, inside your application. If read-only data is needed, it can remain there. Write access needs to be moved to the Documents directory. ## Android Native Apps *Relevant design patterns, platform design considerations, handling data, using sensors and location, handling device variation (5 lectures plus practicals)* ### Introducing Android Android Inc. started developing an open source mobile OS, acquired by Google in 2005. Uses the Dalvik Virtual Machine: * Google * Virtual Machine and Dex Language * Linux kernel * Apache Harmony class libraries. #### The effect of OS segmentation The Android platform is currently changing very quickly. Each manufacturer has a coperate theme to it. ### Why Android? * Popularity and market saturation. ### Android IDEs * Eclipse and ADT. * Android Studio. ### Application Space There are other apps which are isolated from the application, but are able to communicate with intents. #### Linux Kernel and Android OS #### Operating system process Owned by the application. #### Dalvik Virtual Machine A special Java VM for running on Mobile Devices. May be moved to the ART (Android Runtime). #### Application No single main method. The way that Android apps are developed, and the model following, doesn't have a concept of a start point or end point, rather it has the idea of the behaviours. ##### Activity A single screen layout and the code behind. A functional piece of UI. ##### Service A background process. Does not have a UI, may interact with them though. ##### Content Provider A way of sharing a local database. Queries and cursors used to interact. ##### Broadcast Reciever ### Application Lifecycle Memory of mobile devices is much more limited than on PCs or servers. Short battery life is an issue. Lots of apps vying for limited resources. #### Android Manages Memory Carefully Android has the opportunity to remove resorces from memory when screens are switched. If necessary a whole application and its process will be removed. Need to be able to reincarnate apps. ### List Views Used to display scrollable lists of data. #### List Activity Extended by classes to handle clicks, etc. Acts as a controller in an MVC sense. #### List View Used by the List Activity Acts as a view in an MVC sense. Does the hard work of sizing rows, etc. #### List Adapter Created by the controller and acts as the data source for the list view. Acts as a model in the MVC pattern. ##### Array Adapter <T> Simple array of type T. Generally a hard-coded list of static data. ##### Simple Adapter A list of maps, where the map keys relate to UI element IDs. ##### Simple Cursor Adapter A dynamic adapter which is based on a cursor of an iteratable object. ### Filesystem Given a private filestore under: `/data/data/package-path` #### Public On SD Card #### Private Stored under private filestore. `/data/data/package-path/files` ##### Access Apps access their private files folder with: `Context.getFilesDir();` ###### Read/Write `Context.openFileOutput(String name, int mode);` `Context.openFileInput(String name)` #### Media On SD Card under the standard location for media files. #### Temporary `/data/data/package-path/cache` #### Shared Preferences `/data/data/package-path/shared-prefs` XML only #### Databases `/data/data/package-path/databases` ## Bringing it all together *Legal, social, ethical and professional issues. Selecting platforms and delivery methods, getting the interface right. (3 lectures)* ### Motivations The BCS, under its Royal Chater, is required to establish and maintain standards of competence, conduct and ethical practice for information systems professionals. The ability to recognised the **legal**, **social**, **ethical** and **professional** issues involved in the exploitation of computer technology and be guided by the adoption of appropriate professional, ethical and legal practises. ### Legal Issues Which laws relate to the use of computers? #### Data Protection Act #### The Equality Act #### Copyright, Design and Patents Act #### Contract Law and Licenses #### Computer Misuse Act ### Social Issues What is the impact of computers on society? ### Ethical Issues How should we act? ### Professional Issues What are our professional responsibilities? # SEM5640 - Developing Advanced Internet-Based Applications This module builds on SE31520 examining further the development processes, techniques and technologies for constructing Internet-based, distributed software systems. Particular emphasis is placed on studying and applying enterprise design patterns. Students, working in small groups, will build, using a modern software engineering methodology, a distributed application that employs the technologies and concepts taught in this and previous modules. *Nigel Hardy awesome tally: 2* ## Introduction *Provides a road-map to the module, with a brief overview of main concepts and how they interrelate.* ### Staff * **[Neil Taylor](mailto:nst@aber.ac.uk)** * [Nigel Hardy](nwh@aber.ac.uk) ### Learning Outcomes 1. Analyse a complex software engineering problem. 2. Within a group design and implement an original solution to the problem. 3. Test and critically evaluate their solution. 4. Critically assess the relative merits of various software development methodologies within the context of the given problem and nature of the project team. 5. Working in a group, apply the chosen software development methodology to solve the given problem. 6. Critically explain the relative merits of alternative server-side technologies. 7. Demonstrate a critical appreciation of design issues encountered when developing multi-tier, distributed applications. ### Assessment * Group Programming project and report **75%** * Exam **25%** ### Internet Applications * Uses network connection * Browser-based * Client-server model * Reliance of collection of internet standards (e.g. HTTP) ### Advanced Internet Applications * AJAX/Plugins support to provide a client rich experience * Multiple Application/Servers * Distributed or more complex processing * Transaction (DB) support * REST/SOAP or other remote procedure calls No precise definition. ### Overview There are several themes this module will look at: * Multi-Tier Internet Applications * Developing using .NET and Java EE * Design Patterns * MVC * ORM * Web Services * REST * SOAP * RMI/RPC * Building Applications * Cloud Computing * Structure and deployment #### Multi-Tier Internet Applications Applications that involve multiple servers which run different parts of the application so that the business logic is separated from the presentation logic. Different parts tend to include: * Client * Web * Business Logic * Data/EIS Lots of small applications which build up a larger, more complex, application. Communication between the servers provides the linkage between the applications. Supports scalability if done correctly. Also more resistant to failure. Potentially more secure (more important servers firewalled). Should all be transparent to the user. ### Java Enterprise Edition (J2EE) * Building web applications * JavaServer Faces * Enterprise Java Beans * Portability #### .NET * Building web applications * Web Forms, MVC * C# * Tiers and layers #### Design Patterns Two types of issues: ##### Patterns of Enterprise Architecture * Building and structuring the application * Relate to .NET and Java ##### Patterns of Enterprise Integration * Making applications communicate * Examples: * Apache Camel * Apache ActiveMQ #### Building Applications Using Cloud computing and how relevant it really is. Examples from eBay of structure and deployment. #### Web Services Application that provide services which have destination endpoints with operations. SOAP XML, WSDL and REST are the big players. Alternatives to REST? ## Construction Technologies *Examination of technical solutions for building enterprise applications as provided by enterprise Java and .NET for building distributed applications.* These include: * support for server-side MVC, e.g. JavaServer Faces and ASP.NET MVC; * business objects, e.g. Enterprise JavaBeans; * object to relational mapping, e.g. MS LINQ, Java Persistence API; * messaging systems, e.g. Java Messaging Service; * interoperability, e.g. SOAP-based web services. ### Java Enterprise Edition (Java EE) Java is a language (JDK) and a platform (JRE). The platform is the Java Virtual Machine and a set of APIs. Four platforms: 1. Java Standard Edition (SE) 2. Java Enterprise Edition (EE) 3. Java Mobile Edition (ME) 4. Java FX #### Java EE A superset of Java Set, providing a development model, APIs and runtime environment (the server). > *"For developing and running large-scale, multi-tiered, scalable, reliable and secure network systems"* - The Oracle BS #### Java EE Web Tier Access the web tier with HTTP(S). Provides static resources and dynamic page generation. Handles client input and navigation flow with maintance of state. JavaBeans provide temporary storage on the Web Tier. Can do simple logic, could build the whole application in this tier, but would be a bad design. #### Java EE Business Tier Provides Enterprise Java Beans (EJB) components. Access to Java Logic, etc. which provide the business logic. #### Java EE Enterprise Information System (EIS) Tier Logically and typically on other machines, typically "foreign" technology. #### Other Clients Can build Java Applications or Web Applications which access the EJB components directly. Uses remote procedure calls. #### Java EE Application Server Provides: * The JVM and core APIs * The EE APIs * Component containers * component lifecycle management * creation * association * services to components * security * transaction management * directory lookup * remote connectivity ##### Containers A set of Java classes packaged into a jar file. Can be deployed using either: * XML descriptors * Java annotations Annotations are widely used, but can be overriden by descriptors. Components plus deployment descriptors make up a module, each for one container type. Four types of EE modules: * Web modules `.war` * EJB modules `.jar` * Application client modules `.jar` * Resource Adapter Modules `.rar` A set of modules makes up a Enterprise archive: `.ear` #### Servlets Servlets are the olderest EE web component. Servlets act as a dynamic page, which have a URL, and must receive a HTTP request, from this it generates a HTTP response. Is a Java object with a managed lifecycle. Extension of `javax.servlet.http.HttpServlet` class. ##### Lifecycle Managed by the web container. The first call is: * Load class * Create an instance * Call `init()` Any call: * Handle the HTTP request Perhaps calls `destroy()` Only one instance is needed by a running server. ##### Handling Requests The `HttpServlet` class has stubs for all the HTTP methods `doGet(...)` etc. `service()` method used to handle the finding of appropriate methods for the HTTP headers. Common to use the same method to handle `GET` and `POST` (for some strange reason, this sounds bad). ###### Request Provides access to: * Call details * Parameters * Body * A reader * Context ###### Response Provides access to: * Response code * Headers * A writer (for the body) ##### Forwarding Can forward requests to other servlets, passing on the request and response objects. Used for branching on context, for example. ```java RequestDispatcher d = req.getRequestDispatcher('name'); d.forward(req, resp) ``` ##### Instance Variables and Servlets A servlet could have instance variables, however these variables are not specific to a connection, rather the whole servlet. There may also be multiple servlets so there is no guarantee that these will be the same. Need features for session data. Reminder that HTTP is stateless. ##### Sessions The request must carry some identification of the session, server usually generates this information. * Parameters * Cookies Remember EU Cookie Law ##### Error Redirects Specified in the `web.xml` descriptor. #### Useful Features * Templating * Validation & Schemas * i18n * Security * Database access * Support for good design patterns * Tight Integration * Testing Frameworks * Web Service support Generally: higher level libraries and code reuse. #### i18n and L10n Internationalisation and localisation are both important. Don't hardcode for the locale, instead store translations in DB or files and load them dynamically. #### Authentication #### Authorization #### Testing Use JUnit where possible (with mocks, stubs, etc.) Can test Java Beans using an embedded EJB Container. Service testing is slightly more difficult ##### Service Testing Testing the resources in an automated way to avoid manual input. Provides rapid and reliable regression testing. Handles: * HTTP(S) * FTP * LDAP * JMS * Mail * SOAP Specify calls, with parameters or sequences. Selenium or JMeter are options (JMeter also does load testing). JMeter also has a NetBeans plugin. ### .NET Microsoft framework for developing and deploying applications with: * Cloud (Azure) * Web Server * Desktop * Restricted Resource Devices and Mobile Phones Similarities with Java (JRE). Focused on Windows, but can run on other platforms through mono. #### Overview ![](http://misc.alexanderdbrown.com/dotNET.png) ##### Common Language Runtime (CLR) The .NET JRE effectively. Loads and executes .NET "managed code". CLR loads code into application domains to provide a level of isolation and the ability to stop and remove a domain. Provides Just In Time (JIT) compilation, memory management and GC. Security based one roles and defined by policies. ##### Building and Deploying Code 1. Compile source code files. 2. Combine IL (MSIL), metadata, config files and other resources into assembly files. * DLL or EXE extensions * DLL is a library module * EXE is only type that will activate the CLR 3. Deploy Applications to CLR #### Internet Information Services (IIS) The application server which runs ASP.NET applications. ### Server-Side MVC Methods for implementing the MVC pattern in an Web Application. #### JavaServer Faces (JSF) Provides: * Templating * Facelets * JavaServer Pages (JSP) * Validators * MVC Pull * Navigation * Fixed * Dynamic * AJAX * i18n/L10n * Session Management * DB Access * JDBC * ORM frameworks * Testing Frameworks * Code reuse * Web Services and resources ##### Basic Technology Basic Api for represention components and managing their state; handling events, server-side validation and data conversion; defining page navigation; supporting i18n and accessibility; and providing extensibility. A tag library for adding components to web pages and connecting components to server-side objects. ##### Expression Language Like most scripting languages `$` and `#` are used. `${expr}` are *rvalues* (read-only) and are processed immediately. `#{expr}` are *lvalues (read-write) and have deferred evaluation. Also used for method expressions. `${object.variable}` calls `object.getVariable()` under the covers. `#{object.method}` calls `object.method()`. ##### Faces A replacement for JavaServer Pages (which are now deprecated). Write (X)HTML with special tags which have an XML appearance which will be rendered as HTML with functionality behind it. `` is the top level tag. All pages will tend to act like a form, so `` is very commonly used, even in cases where there would not normally be a form. ##### Common Attributes * `id` * `style` to associate CSS * `rendered` is a condition to check is the element should be rendered on the page. The condition can be an expression from the Expression Language. * `value` the value of the component, again this can be from the Expression Language, which links the view to the model. ##### Composite Components Built from standard components. These can be parametrised and stored in `resources`; they have a namespace (default: `http://xmlns.jcp.org/jsf/composite/ezcomp`) ```xml

``` ##### Templating Can create template pages which has values to fill in. The inheriting page has to fill in these values. ##### Navigation Write navigation rules in XML, each rule is a page (view). For each page there is a set of outcomes (in string form). For each outcome a next page is specified. Typically stored in `faces-config.xml` Outcomes are generated from the `action` attributes from components. There are implicit navigation rules which falls back to a matching page if it can be found. ##### Converters Implement `javax.faces.convert.Converter` Have associated error string displayed if conversion fails (``). Number of default converters for most Java Objects and primitives. Converters are used in four ways: 1. Bind the component to a managed bean property of the right type (default). 2. Put the converter class in the component's `converter` attribute. 3. For numbers and dates, nest the `f:convertDateTime` or `f:convertNumber` tag inside the component. 4. Put the `f:converter` tag inside the component and refer to the converter (general purpose tag). ##### Events & Listeners Part of the component model, application events are generated by components (ultimately from the rendered page). The JSF application can map HTTP requests to the specific handling code. Two types of event: 1. Application event; when the user activates a component that implements `ActionSource` (buttons, links, etc.). 2. Value-change event; when the users changes the value of a component represented by `UIInput`. Listeners cause the application to respond to events: 1. Implement an event listener class to handle the event and register it to a component. 2. Implement a method of a managed bean and refer to it in the EL attribute in the component tag. ##### Validators Similar to converters, but only provide validation. Implement `javax.faces.validator.Validator` Again, default validators are provided. Registered by: 1. nesting the validator's tag inside the component 2. Nesting the `f:validator` tag inside the component. `BeanValidator` uses validation methods in the bean instead of having to write a separate class. Also allows the validation to be model-state specific. ###### Bean Validation ```java public void validateVar(FacesContext context, UIComponent toValidate, Object value) { int input = (Integer) value; // cast the given value if(!condition) { ((UIInput) toValidate).setValid(false); FacesMessage message = newFacesMessage( "error message"); context.addMessage( toValidate.getClientId(context), message); } } ``` ##### Lifecycle Starts when the client request a URL and ends with the server response. #### ASP.NET .NET Applications for the web, which uses: * Web Forms - Event based * Web Services - Application to Application * MVC * ASP.NET AJAX ##### View State Stores the entire state in a hidden item in the form. Simplistic, but does affect performance and security. Sent in base64. Can disable for the entire page, or for specific elements. Also have the ability to encrypt, `EnableViewStateMac` or store server-side. ##### Session State There are different techniques which can be used to store session state: * Cookies (bad) * REST Requests (not stateful) * Server Side session tokens (persistence/scalability questions) ASP.NET uses server side session tokens: ```C# String myVallue = "A. Name"; Session["name"] = myValue; myVale = (String) Session["name"]; ``` Can configure where the data is held. Can also turn off the requirement for cookies. ###### In Process (InProc) Same as ASP. ###### SQL Server (SQLServer) Store state in Microsoft's SQL Server, which can be on another machine. ###### StateServer Store in a separate process, which can be on another machine. ###### Custom Write your own mechanism. ###### Off ##### System.Web.UI.Page Default base class for code-bheind filels. Implements `System.Web.IHttpHandler`: * `IsReusable()` - true if the handler can be pooled * `ProcessRequest(` - Processes the actual HTTP request. ##### HTTP Pipeline ![HTTP Pipeline](http://misc.alexanderdbrown.com/asp.net/http_pipeline.png) #### GridView and SQL DataSource Ways of display DB data. ### Business Objects Objects which relate to the business layer of a multi-tier application. #### Enterprise Java Beans (EJB) EJBs act as an ORM framework, where the Java Objects are annotated with the DB schema details. Three types of EJB Session: 1. Stateful 2. Stateless 3. Singleton Can also have message-driven beans. Access can be done through: * No-interface view (all the public methods are available) * Through a business interface * Injection * JNDI ##### Transactions A transaction is a unit of work which supports "ACID". ###### Atomic Every task within a unit of work must complete successfully otherwise the transaction is aborted ###### Consistent Atomicity, isolation, durability lead to consistent data. Developer must also define database consistency checks. ###### Isolated Preventing interference from other transactions. ###### Durable Data written to disc before a transaction can fully complete. ##### EJB Transactions In EJB Transactions have a scope. A transaction manager manages this process. EJBs support two types of transactions: * Container managed * Bean Managed ###### Container Managed Transactions Transactional attributed specified using annotations or in the deployment descriptor. Outside of the EJB code. Reccommended ###### Bean Managed Tranasctions Annotate EJB class with: ```java @TransactionManagement(TransactionManagerType.BEAN) ``` Offers fine grained control. ##### EJB Transactions Attributes Six types of transaction attributes, specified by annotations or descriptor. The default is `Required`. Can annotate the class to set the default for the class. ###### NotSupported Transactions are not supported by the method, any current transaction will be suspended during this method call and resumed once it terminates. ###### Supports Follows the caller; if it was in a transaction, it will remain so, if it was not, this will **not** create a new one. ###### Required The method **must** be part of a transaction. If the caller was not in a transaction, this will start one. ###### RequiresNew This method will always crate a new transaction. An existing one will be suspended and will resume after this is complete. The outer transaction will not roll-back if this one does. Useful to nest transactions ###### Mandatory Must already be in a transaction, otherwise a `TransactionRequiredException` will be thrown. ###### Never Must never be in a transaction, otherwise a `RemoteException` will be thrown. ##### EJB Isolation Problems: * Dirty Reads * Non-repeatable Reads * Phantom Reads Provide four (fairly standard) levels of transactions: * `TRANSACTION_READ_UNCOMMITTED` * `TRANSACTION_READ_COMMITTED` * `TRANSACTION_REPEATABLE_READ` * `TRANSACTION_SERIALIZABLE` In bean-managed transaction, can specify isolation using the JDBC API: ```java connection.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE) ; ``` Using serializable guarantees data consistency, but may affect performance. ###### Read Uncommitted Uncommitted changes are visible to other transactions, other transactions can change rows that another transaction has read. May result in inconsistencies as a rollback by T1 may mean that T2s data is out of date. ###### Read Committed Rows updated in a transaction cannot be read by another transaction, but rows read by one transaction can be changed by another. Rows updated by T1 cannot be read by T2. Rows read by T1 can be updated by T1 or T2. ###### Repeatable Read Rows read by one transaction cannot be updated by any transactions. Rows read by T1 cannot be updated by T1 or T2. ###### Serializable Transaction (appears) to have a lock on the data. ##### EJB Transactions with Exceptions Checked exceptions (including application exceptions) do not cause an automatic rollback, but can be set using `@ApplicationException(rollback=true)` Unchecked exceptions and `RemoteException`s automatically cause a rollback and the EJB instance is discarded. ##### Injections Getting a reference to object instances without instantiation or passing in as parameters, providing good decoupling. There are two mechanisms for this: 1. Resource Injection 2. Dependency Injection Annotations to fields or methods specify the injection points. Provides a runtime binding which provides inversion of control. Compatible abstractions are required to avoid typing errors; the code using an object is not responsible for choosing an implementation or creating the instance. Injection is often the implementation for this, but can use: * Factories * Service Locators * Contextualised lookup ##### Resource Injection Inject references to objects in the JNDI namespace into any container-managed object. Gets a reference to a resource using: * Annotations * Deployment Descriptor * `InitialContext` lookup Inject as an interface type, code independent of specific implementation. Not type safe! ##### Variable Resource Injection ```java public class MyServlet extends HttpServlet @Resource(name="java:comp/DfeaultDataSource") private javax.sql.DataSource dsc; } ``` ##### Method Resource Injection ```java public class MyServlet extends HttpServlet { private javax.sql.DataSource dsc; @Resource(name="java:comp/DefaultDataSource") public void setDsc(javax.sql.DataSource dsc) { this.dsc = dsc } } ``` Method name must being with `set`, with a `void` return type and only one parameter. ### Object Relational Mapping (ORM) Libraries which provide a mapping from a relational database to objects in an object-orientated paradigm. #### Language-Integrated Query (LINQ) A query language for .NET languages which is akin to SQL. #### Java Persistence API Access to database at SQL level or through an ORM Framework ##### JDBC Provides direct access to a database. Only provides relational records, not Objects. ##### Java Naming and Directory Interface (JNDI) Access to naming and directory services, part of the Java SE API. Looks up objects by name, returning values or references. API for the clients. Service provider interface (SPI) wadds new directory and naming services. ##### Java Persistence API (JPA) Provides an ORM mapping of Java Object using Java annotations. Provides: * Create * Retrieve * Modify * Write *(So CURD then? -A)* Entities are JavaBeans, annotated with `@Entity`. Automatically mapped, but can label instance variables with extra details; `@Column` can map Java types to SQL types. `@Id` is required. *(JPA can be a real pain though, he glossed over this just a little bit -A).* #### Contexts and Dependency Injection (CDI) Any Java object can be made managed by its container. The CDI defines scpoes and a managed object has a declared (or default) scope. This can then be injected into any other managed object. ##### Application Scoped `@ApplicationScoped` ##### Session Scoped `@SessionScoped` ##### Request Scoped `@RequestScoped` ##### Conversation Scoped `@ConversationScoped` Covers the AJAX exchanged (can be extended using an ID) ##### Dependent `@Dependent` Inherits the scope from the object it is inserted into. This is the default scope. ##### Injection ```java import javax.inject.Inject public class Printer { @Inject Greeting greeting; } ``` ##### Qualified Injection ```java public class Printer { @Inject @Informal Greeting greeting; } ``` Can create a subtype of Greeting which is annotated using `@Informal` to allow variants of a type. ### Messaging Systems Acts as a middleware which decouples: * time * platform * language Which is usually machine-to-machine. There are **a lot** of standards. #### Models for Messaging Systems. * Point-to-Point (queue) * Publish/Subscribe * Fanout * Request/Response #### Java Messaging Service A Java API which "allows applications to create, send, receive and read messages using reliable, asynchronous, loosely coupled communication." Requires an implementation. Access Connection Factories or Queues using injection. #### Message Brokers Most patterns require middleware (all can use it). Possible roles include: * Routing * Receiving * Storing * Persistent * Transmit * Broadcast * Translate * Monitor ##### Brokers JavaEE applcation servers must have one. Glassfish has Open MQ. .NET has one part of the service bus. Apache Active MQ is a good offering. ##### Security Authentication & Authorisation, encryption of messages, integrity of messages, transactional, etc. Some protocols and some implementation already on offer. ### Interoperability Methods of performing B2B operation. #### Socket Connections Powerful, but not generally available as an API. Need a Domain Specific Language. #### Remote Method Invocation (RMI) Marshalling and unmarshalling are handled by generated code. Some distribution services are provided. Access to CORBA IIOP. Builds on services provided by sockets. Has the problem of specificity to Java. Even with IIOP it is still better in an intranet. Focuses on synchronous transmission. #### RESTful Web Services Use HTTP methods (`GET`, `POST`, `PUT`, `DELETE`, `PATCH` and `HEAD`) with a URI for the API. Return any form of data; usually XML, JSON or YAML. Very constrained on header information. #### Simple Object Access Protocol (SOAP) based Web Services Expressed as XML supporting interoperability (both Business to Business (B2B) and EAI). Typically sent over HTTP but can use other network protocols. Usually generated automatically using facilities in a program, but it is possible to create messages manually. More popular than CORBA, extensible through the use of XML. HTTP avoids firewalls. Industry backing: * Miscrosoft * I've Been Moved (IBM) * Oracle * BEA * HP * SAP * etc. ##### Structure ```HTTP POST /webservice/OrderProcessing HTTP/1.1 ... ``` ```XML ``` ##### Web Services Description Language (WSDL) SOAP manages the process of communicating information but doesn't define operations, data types or faults of the service. WSDL is an XML description of a service, generally autogenerated by tools. ###### WSDL Elements In the specification, there are the following eleents that define a service. * **types** - the data types used that can be used in the messages * **message** - the messages that can be transmitted, which also specifies the types used * **portType** - the list of possible operations, linking the input and output messages * **binding** - the link between a portType and the specific protocol settings * **port** - the definition of a specific location that a particular binding is available. * **service** - a collection of ports. ##### In .NET Generates proxy code for you. A set of asynchronous methods are provided: ```C# BeginMethodName(...) EndMethodName(...) ``` ##### Interoperability in SOAP > How do you expose types as parameters and return values in a way which in interoperable? Value types map onto XML Schema types, classes or structs are converted into complex types in the schema. * Classes must be public * Public fields and properties are serialised as elements * Methods are not exposed. Arrays and collections are supported by this, but not all types can be mapped easily (e.g. custome types where methods are used to provide access to calculated data). (*A: as you'd expect really*) #### Windows Communication Foundation (WCF) Splits Application domain from communication domain. Brings together lots of different interoperability technologies. ##### Adress ##### Binding ##### Contracts ## Software Development Process There is no such thing as a silver bullet for software development processes. ### Testing Methods for testing a system. #### White/Black Box Testing A basis for which testing is built on #### Unit Testing Testing the smallest possible unit. #### Functional Testing Test that a part provides an expected function. #### Non-functional Testing Tests that a part will not fail to function under certain circumstances. #### Integration/Subsystem Testing Testing larger units of work combine together. #### Verification Testing Have we built the right thing? Does it conform to standards. #### System Testing Our testing of the entire system. #### Acceptance Testing Testing the system is acceptable for the client. ### Formal Methods Write a formal specification and then program to that specification so that something can be proved to have worked. #### VDM #### Z Notation ### Traditional Methods Focus on up front design before development #### Waterfall Model 1. Requirements 2. Planning 3. Design 4. Development 5. Testing 6. Deployment 7. Maintainace #### Spiral Model #### Iterative Development ### Agile Methods Focus on providing features without large amounts of up-front development and client involvement. #### Extreme Programming Promotes no-one authorship of code with refractoring. Planning game to put features. Can use a metaphor to describe the system. #### Test Driven Development Write tests first then produce code to pass those tests (but no more). #### Feature Driven Development Break the system down into user stories, make these stories into features which can then be developed independently. Need some overall architecture. #### Rapid Application Development Hack it until it works. #### Scrum Self organising teams with a focus on co-location of all team members and communication. #### Model-Driven Engineering 1. Build a model 2. Test the model 3. Translate the model into executables Platform Independent Model -> Platform Dependent Model. Can use tools like executable UML. ### Open Source Methods for developing software in a more unmanaged and dynamic way. #### Cathedral A big in-house style of development where every developer understands the big picture. #### Bazaar A very dynamic way of developing, which not every developer has the big picture, but understands their small parts, usually needs some figureheads. #### Crowd Sourcing Getting a large amount of people to come up with potential solutions and selecting the best ones. #### Hacking Playing with a system and getting it to do tasks it was not necessarily designed to do. ## Design Issues *A number of enterprise application design patterns will be reviewed within the context of technologies discussed in 2. Building multi-tier applications requires developers to know more than just technologies, they must also know and use design techniques suited to their development. Students will learn about reusable enterprise design patterns, and in particular those used for the development of distributed, multi-tier applications.* ### Domain Logic Patterns #### Transaction Script Organizes business logic by procedures where each procedure handles a single request from the presentation. ![](http://martinfowler.com/eaaCatalog/transactionScriptSketch.gif) #### Domain Model An object model of the domain that incorporates both behavior and data. ![](http://martinfowler.com/eaaCatalog/domainModelSketch.gif) #### Table Model A single instance that handles the business logic for all rows in a database table or view. ![](http://martinfowler.com/eaaCatalog/tableModuleSketch.gif) #### Service Layer Defines an application's boundary with a layer of services that establishes a set of available operations and coordinates the application's response in each operation. ![](http://martinfowler.com/eaaCatalog/ServiceLayerSketch.gif) ### Data Source Architectural Patterns #### Table Data Gateway An object that acts as a Gateway to a database table. One instance handles all the rows in the table. ![](http://martinfowler.com/eaaCatalog/dbgateTable.gif) #### Row Data Gateway An object that acts as a Gateway to a single record in a data source. There is one instance per row. ![](http://martinfowler.com/eaaCatalog/dbgateRow.gif) #### Active Record An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data. ![](http://martinfowler.com/eaaCatalog/activeRecordSketch.gif) #### Data Mapper A layer of Mappers that moves data between objects and a database while keeping them independent of each other and the mapper itself. ![](http://martinfowler.com/eaaCatalog/databaseMapperSketch.gif) ### Object-Relational Structural Patterns #### Identity Field Saves a database ID field in an object to maintain identity between an in-memory object and a database row. ![](http://martinfowler.com/eaaCatalog/idFieldSketch.gif) #### Foreign Key Mapping Maps an association between objects to a foreign key reference between tables. ![](http://martinfowler.com/eaaCatalog/foreignKeyMappingSketch.gif) #### Association Table Mapping Saves an association as a table with foreign keys to the tables that are linked by the association. ![](http://martinfowler.com/eaaCatalog/associationTableSketch.gif) #### Dependent Mapping Has one class perform the database mapping for a child class. ![](http://martinfowler.com/eaaCatalog/dependentObjectSketch.gif) #### Embedded Value Maps an object into several fields of another object's table. ![](http://martinfowler.com/eaaCatalog/aggregateMappingSketch.gif) #### Serialized LOB Saves a graph of objects by serializing them into a single large object (LOB), which it stores in a database field. ![](http://martinfowler.com/eaaCatalog/serializedLobSketch.gif) #### Single Table Inheritance Represents an inheritance hierarchy of classes as a single table that has columns for all the fields of the various classes. ![](http://martinfowler.com/eaaCatalog/singleInheritanceTableSketch.gif) #### Class Table Inheritance Represents an inheritance hierarchy of classes with one table for each class. ![](http://martinfowler.com/eaaCatalog/classInheritanceTableSketch.gif) #### Concrete Table Inheritance Represents an inheritance hierarchy of classes with one table per concrete class in the hierarchy. ![](http://martinfowler.com/eaaCatalog/leafInheritanceTableSketch.gif) #### Inheritance Mappers A structure to organize database mappers that handle inheritance hierarchies. ![](http://martinfowler.com/eaaCatalog/inheritanceMappersSketch.gif) ### Object-Relational Metadata Mapping Patterns #### Metadata Mapping Holds details of object-relational mapping in metadata. ![](http://martinfowler.com/eaaCatalog/metaMapperSketch.gif) #### Query Object An object that represents a database query. ![](http://martinfowler.com/eaaCatalog/queryObjectSketch.gif) #### Repository Mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects. ![](http://martinfowler.com/eaaCatalog/repositorySketch.gif) ### Web Presentation Patterns #### Model View Controller Splits user interface interaction into three distinct roles. ![](http://martinfowler.com/eaaCatalog/mvc-sketch.gif) #### Page Controller An object that handles a request for a specific page or action on a Web site. ![](http://martinfowler.com/eaaCatalog/actionController-sketch.gif) #### Front Controller A controller that handles all requests for a Web site. ![](http://martinfowler.com/eaaCatalog/frontController-sketch.gif) #### Template View Renders information into HTML by embedding markers in an HTML page. ![](http://martinfowler.com/eaaCatalog/templateViewSketch.gif) #### Transform View A view that processes domain data element by element and transforms it into HTML. ![](http://martinfowler.com/eaaCatalog/transformViewSketch.gif) #### Two Step View Turns domain data into HTML in two steps: first by forming some kind of logical page, then rendering the logical page into HTML. ![](http://martinfowler.com/eaaCatalog/twoStageViewSketch.gif) #### Application Controller A centralized point for handling screen navigation and the flow of an application. ![](http://martinfowler.com/eaaCatalog/appControllerSketch.gif) ### Distribution Patterns #### Remote Facade Provides a coarse-grained facade on fine-grained objects to improve efficiency over a network. ![](http://martinfowler.com/eaaCatalog/distributedFacadeSketch.gif) #### Data Transfer Object An object that carries data between processes in order to reduce the number of method calls. ![](http://martinfowler.com/eaaCatalog/dtoSketch.gif) ### Offline Concurrency Patterns #### Optimistic Offline Lock Prevents conflicts between concurrent business transactions by detecting a conflict and rolling back the transaction. ![](http://martinfowler.com/eaaCatalog/OptimisticSketch.gif) #### Pessimistic Offline Lock Prevents conflicts between concurrent business transactions by allowing only one business transaction at a time to access data. ![](http://martinfowler.com/eaaCatalog/PessimisticSketch.gif) #### Coarse-Grained Lock Locks a set of related objects with a single lock. ![](http://martinfowler.com/eaaCatalog/coarseGrainedLockSketch.gif) #### Implicit Lock Allows framework or layer supertype code to acquire offline locks. ![](http://martinfowler.com/eaaCatalog/ImplicitLockSketch.gif) ### Session State Patterns #### Client Session State Stores session state on the client. #### Server Session State Keeps the session state on a server system in a serialized form. #### Database Session State Stores session data as committed data in the database. ### Base Patterns #### Gateway An object that encapsulates access to an external system or resource. ![](http://martinfowler.com/eaaCatalog/gatewaySketch.gif) #### Mapper An object that sets up a communication between two independent objects. ![](http://martinfowler.com/eaaCatalog/mapperSketch.gif) #### Layer Supertype A type that acts as the supertype for all types in its layer. #### Separated Interface Defines an interface in a separate package from its implementation. ![](http://martinfowler.com/eaaCatalog/separatedInterfaceSketch.gif) #### Registry A well-known object that other objects can use to find common objects and services. ![](http://martinfowler.com/eaaCatalog/registrySketch.gif) #### Value Object A small simple object, like money or a date range, whose equality isn't based on identity. #### Money Represents a monetary value. ![](http://martinfowler.com/eaaCatalog/moneySketch.gif) #### Special Case A subclass that provides special behavior for particular cases. ![](http://martinfowler.com/eaaCatalog/specialCaseSketch.gif) #### Plugin Links classes during configuration rather than compilation. ![](http://martinfowler.com/eaaCatalog/pluginsSketch.gif) #### Service Stub Removes dependence upon problematic services during testing. ![](http://martinfowler.com/eaaCatalog/serviceStubSketch.gif) #### Record Set An in-memory representation of tabular data. ![](http://martinfowler.com/eaaCatalog/recordSetSketch.gif) ### eBay Example #### Partition Everything Split everything into manageable chunks: data, load and usage. > If you can't split it, you can't scale it Motivations: * Scalability - can scale horizontally and independently * Availability - can isolate failures * Manageability - can decouple segments and functional areas * Cost - can use less expensive hardware ##### Functional Segmentation Group data using standard data modelling techniques. Logical hosting: abstract the logical representation from hosts physical location. Support combining and splitting without code change. ##### Horizontal Split Split databases horizontally along primary access path. Multiple split patterns: * Hashing based on a key * Lookup- or range-based Aggregation/routing in Data Access Layer: * Abstracts developers from split logic. * Routes CRUD operations to appropriate split * Supports rebalancing through configuration changes ##### No (cross-database) Transactions Absolutely no client-side transactions, two-phase commits. Auto-commit for vast majority of DB writes. Anonymous PL/SQL blocks for transactions within single database. Consistency without transactions using: * Careful ordering * Recovery through: * Asynchronous recovery * Reconciliation batch * Fallover to guaranteed asnyc flow. Has the added benefits: * No deadlocks * Avoids coupling availability * Maximises update correctly. ##### No session state User session flow moves through multiple application pools Absolutely no session state in application tier. Transient state maintained/references by: * URL rewriting * Cookies (*sadface*) * Scratch database #### Async Everywhere * Scalability - can scale components independently * Availability - Decouples state (up/down), characteristics (always available, best effort), retry operations * Latency - Can improve user experience at cost of data/execution latency. Can allocate more time than the user would be willing to do. * Cost - can spread peak load over time. ##### Message Dispatch Consumers subscribe to event. * Multiple logical consumers can process each event * Each logical consumer processes events independently * Within each individual consumer, single consumer instance processes event * Guaranteed at least once delivery, no guaranteed order. Managing timing conditions * Idempotent - processing event N times should give the same results as processing once. * Readback - consumer typically reads back to primary database for latest data. ##### Search Feeder Read and transform item updates from primary database. Reliable multicast * Publish updates to search service * Persist messages to intermediate data store for recovery * Resent recovery messages when messages are missed Search nodes listen to updates * Listen to assigned subset of messages * Update in-memory index * Request recovery. ##### Periodic Batch Schedule offline batch processes Most appropriate for: * Infrequent or scheduled processing * Non-incremental computation Often drives further downstream processing through message dispatching. #### Automate Everything Prefer adaptive/automated systems to manual systems. * Scalability - can scale with machines, not humans * Availability/Latency - can adapt to changing environments * Cost - machines cheaper than humans * Functionality - consider more feature space ##### Adaptive Configuration Define service-level agreement (SLA) for a given logical event. Consumer dynamically adjusts to meet defined SLA with minimal resources. Automatically adapt to changes in: * Load * Event processing time * Number of consumer instances ##### Machine Learning Dynamically adapt experience * choose pages, modules and which provide best experience for that user and context * Order results by combination of demand, supply and other factors Feedback loop enables system to learn and improve over time: * Collect user behaviour * Aggregate and analyse offline * Deploy updated metadata * Decide and serve appropriate experiences Best practises: * Perturbation (anneling) for continual improvement * Dampening of positive feedback #### Remember Everything Fails Build all systems to be tolerant of failure. * Assume every operation will fail * Assume every resource will be unavailable * Detect failure as rapidly as possible * Recover from failure as rapidly as possible * Do as much as possible during failure. Motivation: * Availability - duh ##### Failure Detection Application servers log all requests * Detailed logging of all application activity Messages broadcast on multicast message bus Listeners automate failure detection and notification. ##### Rollback Absolutely no changes to the site cannot be done. Code deployment: Rollout/Rollback * Entire site rolled every 2 weeks * Many deployments have dependencies between pools * Rollout plan contains explicit set (transitive closure) of all rollout dependencies. * Automatic tool executes staged rollout, with built-in checkpoints and immediate rollback if necessary. Feature deployment: Wire on/Wire off * Every feature has an on/off state driven by central configuration * Allows feature to be immediately turned off for operational or business reasons * Decouples code deployment from feature deployment * Applications can check for feature "availability" in the same way as they check for resource availability. ##### Markdown Detecting that a resource is down or slow. Application "marks down" the resource. * Stops it from making calls to it * Sends alert Non-critical functionality is removed or ignored * "Limp mode" operation Critical functionality is retired or deferred * Fallover to alternative resource * Defer process to guarantee aynch message Explicit "markup" * Allows resource to be restored and brought online in a controlled way. ## Wider Issues *Examination of current issues that are related to 2 and 3, e.g. scalability; approaches to testing and deployment; the use of cloud computing.* ### Load Balancing Balance the load of the request across a number of different machines. There are hardware and software solutions for performing this. Benefits: * Scalability * Manageability * Server and Content balancing Has a single point of failure. #### Round Robin Allocate in a circular pattern #### Least Connections Allocate to the machine with the least connections #### Response Times Allocate to the machine with the quickest response time. #### Server Probes Gain even more information about the servers and allocate based on this. #### Weighting Apply a weighting to each machine and allocate accordingly. ### Scalability * Partition by function * Split horizontally * Avoid distributed transactions * Decouple functions asynchronously * Move processing to asynchronous flow * Virtualise at all levels * Cache appropriately #### Scaling * Scale horizontally (scale out) is preferential. Adding more machines * Scale vertically (scale up). Upgrade machines Cost basis - try to keep on commodity hardware. #### Partition Everywhere Functional segmentation into logical function (example is the database; split the database into the segments, continue splitting until no more splits can be made). Horizontal split using an internal ORM implementation, requires additional routing. ### Deployment ### Integration * Integration and the enterprise * Distribution vs Integration * Coupling 8 Types of integration * Main pattern topics in the [EAI book](http://eiapatterns.com) #### Integration and the enterprise Integrating separate applications so that they work together. Enterprises tend to have numerous systems which weren't implicitly designed, for example: * Email * Databases * Calendars * Accounting * HR * Website * CRM * Warehouse Management * Shipping * Stock * Disaster Recovery These may * Support different areas of the enterprise * Mergers, take overs, ... * Different technologies * Legacy systems #### Coupling Thins "fused" (read dependent on another) together. Multi-tiered applications tend to be tightly coupled. Messaging systems offer the chance to be loosely coupled by decoupling the ends of the communication, even offering asynchronous communication. #### Types of integration * File integration * Remote procedure invocation * Shared database * Messaging ##### File Integration ![File Integration](http://eaipatterns.com/img/FileTransferIntegration.gif "File Integration") Export data from one system and import into another using a file system. Works with different technologies. Issues of file format, shared file systems, timing. Question of how data is exported and imported? * Timed task * User managed * Other communication Automation problems. File locking might be required. ##### Shared Database ![Shared Database](http://eaipatterns.com/img/SharedDatabaseIntegration.gif "Shared Database") Collection of applications that need to share common data. Quick access to information for each application. DBMS removes a lot of issues from file integration. Tightly coupling the applications to a common area. ##### Remote Procedure Invocation ![RPC](http://eaipatterns.com/img/EncapsulatedSynchronousIntegration.gif "RCP") Invoking actions in addition to sharing data, using some form of RPC methodology: * Sockets * SOAP * REST * etc. Again needs a shared understanding of the data. No shared file format. Timing question of when do we start the transfer. Communication might fail, does that affect the coupling of the two applications? ##### Messaging ![Messaging](http://eaipatterns.com/img/Messaging.gif) Share data by sending Messages over a Message Channel. Application(s) pick up messages and process the information. Fire and forget for the sender. Message channel stores and forward to provide reliable message delivery which mimics the real world. ###### Message Formats * Custom data formats * Industry standards for exchange * XML #### Messaging Systems Example system using message systems: ![](http://eaipatterns.com/img/WgrusOrders1.gif) ##### Message Channel ![](http://eaipatterns.com/img/MessageChannelSolution.gif) *Logical pipe* between *senders* and *receivers*, the implementation of this vary: * Direct Connections * Central message hub Each channel has an identity, typically a name. Different channels exist for different methods of solving problems. ###### Point to Point ![](http://eaipatterns.com/img/PointToPointSolution.gif) ###### Publish and Subscribe ![](http://eaipatterns.com/img/PublishSubscribeSolution.gif) ##### Messages ![](http://eaipatterns.com/img/MessageSolution.gif) A unit of information containing a header and a body. Different types of message: 1. Messages to send commands 2. Messages to send documents 3. Messages to send events There are ways to specify how long messages are relevant for ##### Routing ![](http://eaipatterns.com/img/MessageRouter.gif) Sending messages might involve more than just forwarding. ###### Message Filtering Appling filters to messages ###### Content-Based Router Routing messages to other channels and/or filters based on the content of a message ###### Splitter and Aggregator ###### Recipient List ###### Others [See here](http://www.eaipatterns.com/MessageRoutingIntro.html) ##### Endpoints ![](http://eaipatterns.com/img/MessageEndpointSolution.gif) The interfaces to the senders and receivers * How do they link to the message channel * Polling consumer * Event driven consumer * Messaging gateway * Idempotent receiver * [Other](http://eaipatterns.com/MessagingEndpointsIntro.html) ##### Transformation ![](http://eaipatterns.com/img/MessageTranslator.gif) Isolate different stages and then translate later. ##### System Management The challenge for these systems is how do we know what's going on, given that there may be many different queues in a system. Object to object communication is easy(ish) to use a debugger or log. Web systems are a little harder, but possible to debug and monitor. But how do you monitor message based systems? ###### Message History ![Message History](http://eaipatterns.com/img/MessageHistory.gif "Message History") ###### Wire tap ![Wire tap](http://eaipatterns.com/img/WireTap.gif "Wire tap") ###### Message Store ![Message Store](http://eaipatterns.com/img/MessageStore.gif "Message Store") ###### [Others](http://eaipatterns.com/SystemManagementIntro.html) ##### Example - Widgets and Gadgets ![WGRUS Ecosystem](http://eaipatterns.com/img/WgrusContext.gif) ###### Infrastructure ![WGRUS IT Infrastructure](http://eaipatterns.com/img/WgrusSystems.gif) WGRUS has four different channels to interact with customers: 1. Web site 2. Call centre 3. Fax 4. Notification by email ###### Taking orders ![Taking Orders From Three Difrerent Channels](http://eaipatterns.com/img/WgrusOrders1.gif) ###### Processing Orders ![Activity Diagram for Order Processing](http://eaipatterns.com/img/WgrusActivity.gif) Implementing this we gets something like this: ![Order Processing Implementation using Asynchronous Messaging](http://eaipatterns.com/img/WgrusProcessingOrders.gif) Looking in at the inventory request: ![Routing the Inventory Request](http://eaipatterns.com/img/WgrusInventory1.gif) Because an order may have many items, split the order up: ![Processing Order Items Individually](http://eaipatterns.com/img/WgrusInventory2.gif) To know which item is for which type, we need to enrich the data at the order stage: ![Taking Orders With Enricher](http://eaipatterns.com/img/WgrusOrders2.gif) The high level view of the order process now looks like this: ![Revised Order Process Implementation](http://eaipatterns.com/img/WgrusProcessingOrders2.gif) ###### Checking Status Need to know the status of the order as it can take some time to process. Using a message store we can log the system: ![Adding a Message Store To Track Order Status](http://eaipatterns.com/img/WgrusMessageStore.gif) In situations where point-to-point MQs are used a wire tap is needed: ![Tracking Messages with a Wire Tap](http://eaipatterns.com/img/WgrusWireTap.gif) A process manage can be added to manage the flow of messages in the queues, providing two main bits of functionality: 1. Storing data between messages 2. Keeping track of progress and deciding the next step ![Processing Orders With a Process Manager](http://eaipatterns.com/img/WgrusOrderStatus.gif) ###### Change of Address What happens if an address changes? ![Including Address Data in the New Order Message](http://eaipatterns.com/img/WgrusAddressOrder.gif) What happen if the address change is not part of an order? ![Propagating Address Changes via a Separate Publish-Subscribe Channel](http://eaipatterns.com/img/WgrusAddressPropagate.gif) ###### New Catalog ![Updating Catalog Data via File Transfer](http://eaipatterns.com/img/WgrusCatalog.gif) ###### Announcements ![Sending Announcements With a Dynamic Recipient List](http://eaipatterns.com/img/WgrusAnnouncements.gif) ### Cloud How can we use *the cloud* for advanced internet applications? #### Overview What is the cloud? * The cloud * Cloud computing Principles of cloud computing X as a service #### Cloud - Computing as a Utility Largely a new business model. Just computers, networking and applications, but arranged different; pay for usage rather than buying the infrastructure, etc. #### The benefit to business What problems is the cloud trying to solve? How can the cloud be used for: * Small to medium enterprise (SME) manufacturing business * Startup company looking to launch an eCommerce site * Company gathering data from devices #### 5 Aspects of Cloud Computing 1. Pooled computing resources 2. Virtualised computing resources 3. Elastic scaling up or down according to need 4. Automated creation of new virtual machines or deletion of existing ones 5. Resources only billed as used *The Cloud at Your Service, Resenberg and Mateos, 2011, Manning Publications* #### Data Centres Cloud providers have large data centres at locations around the world. #### SaaS: Software as a Service The web browser is your computer. (Pay?) for resources used. Is this different from managed servers? Is this different from web applications? #### Platform as a Service (PaaS) #### Infrastructure as a Service (IaaS) A lower lever set of infrastructure to choose from: * Services * Data storage * Load balancing * Content delivery networks * Message queues Examples: * Amazon Web Services * Windows Azure ## Revision *Preparing for the examination in June* ### 2012-13 Paper #### 1. This question is about integration of enterprise systems and is based on an extension of the fish marketing system developed during the course. ##### a) Discuss what the terms *interoperability* and *integration* mean within the context of enterprise systems ### Example extract from 2011-2012 paper This question is about development technologies and enterprise application patterns applied to the HoBo Room Brokering application, developed during the module. #### a. Describe and compare the facilities provided by .NET and by the Java Platform, Enterprise Edition 6 (Java EE 6) for supporting the implementation of each of the following features within the application. ##### i Registration of human users and authentication and authorisation for use of the various aspects of the application by a range of user types * Java * Registration * Realms (File, JDBC, LDAP, ...) * Group of users * Facilities to manage/persist user data * Authorisation * Roles * Authentication * Configuration * Web server * .NET * Wizards/template (not many marks for such a galling answer) * ASP.NET Identity System * Authentication * Web.config (protected) * SQLServer (`.mdf`) * ActiveDirectory * OAuth * Authorisation * Roles ##### ii Concurrent use of the application by many users, each with their own server-side interaction context. * Java * Session beans * Distinguishing between request, session and application * Session Management? * .NET * Session Management * `Session["myKey"] = ...` * InProc * SQL Server (DB) * Session State Server * View State ##### iii Provision to client applications of a SOAP based search service that returns a set of descriptions of establishments with current availability, which meet given criteria. * Java * Annotations * WSDL generation of exposed class * Operations through `@WebMethod` on methods * `@WebService` on class * .NET * Attributes * Operations through `[WebMethod...]` on methods * `[WebService...]` on class * `.asmx` still exists, but older * `.WCF` Windows Communication Foundation * **Interface** * methods * Implementing class * Address Bound Contract (ABC) * WS* specifications * Generate proxy from WSDL (client) Problems of mapping data types between languages. Public properties/fields exposed for sending over a web service (attributes in .NET) Raw data only. Underlying standards and tool support. # SEM5720 - The Internet And How It Really Works The Internet is a complex, multi-organisation network reaching nearly all parts of the world. The functioning of this network and the applications running upon it depend on a complex set of protocols. This module addresses the fundamental aspects of the most important issues that permit the network and its applications to operate successfully. The module also addresses the current threats to the Internet and topics still emerging from R&D studies around the world. *Postal service analogies: 4* ## Introduction *This module discusses the detailed underlying operation of the Internet and its constituent components and is an essential topic in its own right as well as providing a solid foundation for much of the other material covered in the MEng.* ### Staff - **[Dave Price](mailto:dap@aber.ac.uk)** - [Nitin Naik](mailto:nkn@aber.ac.uk) - [Stephen Kingston](mailto:spk@aber.ac.uk) ### Learning Outcomes 1. Participate in planning networks that are cost effective and realistic in terms of products and services currently available. 2. Critically assess proposed networking solutions. 3. Assess the effect of likely technological developments on existing network applications. 4. Make decisions and provide guidance to others in the choice of appropriate communications technologies to deploy, to solve real world requirements. 5. Demonstrate extensive knowledge of the internal operation of the Internet and its protocols. 6. Demonstrate an appreciation of the problems that appear in the management of routing and naming in large networks. 7. Exercise judgment in the choice of appropriate protocols and services to address the real needs of Internet operators and users. 8. investigate, and propose solutions to problems of quality of service. 9. Demonstrate an appreciation of the security issues that surround the Internet and its applications and how these can be mitigated. 10. Explain the need for a new generation of the Internet and describe current progress towards it. ### Assessment - Assignment **40%** *??/??/2013 to ??/??/????* - Exam **60%** ### Commitment 20 credits = 200 hours of work. 44 hours of lectures, around 20 hours of practicals. This leaves about 140 hours of personal study, including extra practical work. There is an assignment worth 40% of the marks (2000 word report). Text book study and revision. 2 hour exam. ### Recommended Reading * TCP/IP Protocol Suite - *Foruzan B., Fegan S.* * RFC or internet drafts available online from [IEFT](http://www.ietf.org/) * Other textbooks and journals * [The Internet Protocol Journal](http://cisco.com/ipj/) by Cisco Systems ## Practicals *Practical work sessions focusing mainly on the electronics and hardware of network issues.* ### Practical 1 Using the computer connected digital oscilloscopes or *picoscopes*. ### More DNS `nslookup minted.dcs.aber.ac.uk` whilst capturing packets on port 53 captured **8** packets. The type of the message was of type **A** and of class **IN** and was a **recursive** lookup. There was no response because the message was a query. The server this was sent to was `193.60.11.253`. In `/etc/resolv.conf/` the nameserver: `127.0.1.1 c57net.aber.ac.uk`. **1** question and **3** answers were held in the first answering packet. **4** authoritative records were received specifying the 2 authoritative nameservers from `dcs.aber.ac.uk` and 2 from `yale.ac.uk`. The additional records provided all possible IP addresses of all nameservers contacted during the lookup to cache the information. `nslookup set type=MX aber.ac.uk` was then captured. The answer record contained: `Mail exchange: aber-ac-uk.mail.protection.outlook.com`. The authorative record contained nameservers for: `aber.ac.uk` (3) and `yale.ac.uk` (1). `nslookup set type=SOA dcs.aber.ac.uk` was then captured. The email of the person responsible for this domain is: `cs-root.aber.ac.uk`. There are 4 authoritative nameservers for this domain: `dcs.aber.ac.uk` (2), `aber.ac.uk` (1) and `yale.ac.uk` (1). `nslookup aber-ac-uk.mail.protection.outlook.com.` The response from this was from microsoft. 2 IP addresses which can switch (load balanced?). `nslookup 193.60.11.36` had a type of `PTR` for the name `36.11.60.193.in-addr.arpa`. Still used the standard nameservers. `nslookup sillyname` gave an authoritative and non-authoritative response (one authoritative for `sillyname.c57net.dcs.aber.ac.uk` and non-authoritative for `sillyname`). Uses two different sets of nameservers, the Aberystwyth local ones and the root nameserver. `dig @dns0.aber.ac.uk 98.34.124.144.in-addr.arpa PTR` shows lots of packets, initially gathering DNS information, then on `144.124.34.98`. `dig www.yahoo.com A` shows that yahoo uses `CNAME` instead, with an IP address of `87.248.122.122`. ## Basic Issues in Data Communication *A revision of the basic issues in data communication.* ### Protocol Models and Frameworks In the 1970s there was no master plan, overall structure nor agreements on application protocols. Proprietary protocols and architectures lead to a large amount of anarchy. ### ISO Committees In 1977 ISO establishes committees and subcommittees and so on and so forth. Not just ISO doing this, telecommunications (CCITT) also got involved. ### OSI Reference Model - IS 7498 Provides a basic framework using a "divide and conquer" principle. Uses layering to reduce complexity, where each layer handles one (group of) problem(s). 1. Keep things simple. 2. Choose boundaries at places that minimise interaction between adjacent layers. 3. Functions of a different nature or purpose in different layers 4. Similar function in same layer. 5. Use all part knowledge and experience. 6. Hide implementation within layers 7. Special hardware/processors 8. Data abstraction levels. 9. Internal changes do not affect other layers 10. Only create interfaces to directly surrounding layers (controversial). ### OSI Reference Model Layers 1. Physical - wires, radio frequency 2. Data Link - direct link from one to another 3. Network - global issues like addressing 4. Transport - methods for ensuring quality of service. 5. Session - availability of resources, "checkpoints", etc. 6. Presentation - Language/character set encoding 7. Application - Not supposed to contain or control whole applications, just the useful parts to applications, e.g. directory service ### Exercise Discuss the statement: *The existence of a communications framework like the OSI model promotes competition between companies*. #### For * Companies spend more time actually building products than on firmware to deal with communications. * Standardised method of communication allow products to be linked: more scope for different products. #### Against * No competition to improve the network process. * No differentiation. ## Local Area Networks *A detailed study of variants of the technologies collectively known as Ethernet.* ### Ethernet A passive, contention-based broadcast technology that uses baseband signalling. #### Passive No one device controlling the network. #### Contention-based Each device must compete with every other device for access to the network. #### Broadcast Ever device on a shared network fears broadcast transmissions. #### Baseband Signalling Entire bandwidth of a cable for a single transmission. ### 802.3 CSMA/CD Bus (Ethernet) Derived from work by: * Xerox * Intel * Digital (DEC) Publised in Ethernet (DIS Blue Book) V1.0 September 30th 1980. 802.3 Revision D dated December 1982 publish by IEEE. Revision F published July 1984 and also issued as draft proposal 8802/3 by ISO. ### Operation: Carrier Sense Multiple Access with Collision Detection (CSMA/CD) *Distributed Control* * Contend for use * Acquire access and send a data packet * No priority Simple Algorithm: ``` 1. if(network active) goto 1 transmit(data) check for collision if(collision) transmit(JAM) delay(a random time) goto 1 transmission complete ``` Checking for collision is the difficult part. Can check: 1. I'm transmitting a `1`, is there a `1` on the wire. 2. Do the signals on the wire look like the output of one station and one station only. Delay must be random to avoid future collisions and not have priority. ### Coaxial cable Very rigorous specifications. Maximum length of 500m and limits on bend. These are to stop the electric current being interfered with. Hard limit on propagation: `c` (speed of light). Transceiver (MAU) to link devices to co-ax conveniently. Terminators absorb the signal at the ends of the cable. Repeaters repeat the signal onto other cables, repeat a perfect version of the signal (if possible). ### Broadcast Networks If data is inserted into the network, it should reach every member on the network. Need to know how long it will take for data to get across the network. Rules tend to be the distance between two extremes in the network is 5 of the co-ax cables and 4 repeaters. 2 of the co-ax cables must be point to point without any devices attached. Bridges, switches, hubs and routers help build larger networks. ### Co-ac Medium Options 802.3 * Original (10BASE5) * 10Mbps * Baseband coaxial cable * segment length 500m * Max 10 taps per segment * Max 4 repeaters * 10BASE2 * 10Mbps * Thinner coaxial cable * 185m segments * Max 30 taps per segment. ### Slot Time Worst case: `B` stars transmitting when `A`s packet has nearly reached `B`. `B` must corrupt, at least, the last bit of `A`s packet. Therefore, packet length must be such that time to transmit is greater than twice the **transmission delay** for longest route in the network. For a 2.5km network, 10MBit/sec, the slot time is 51.2 micro seconds. Smallest allowed packet needs to be the number of bits it takes to reach from one end of the network to the other, times 2, plus a little extra for luck. 512 bit packets capture the network for 10BASE5. > *"Your bits are now all over the network"* - Dave Price, 10/10/2013 ### Calculatng Slot Time Slot time `s`. Path time `p`. `s > 2p` * 3 coax (with stations) = 21.65 bits * 2 links (no stations) = 25.65 bits * 4 repeater delays = 7.5 bits * Transceiver delays = 6.0 bits * Transceiver cable delays = 2.57 bits * etc. Total about 499 bits worst case. Specified as 512 bits to give a safety margin. ### Network JAMs * Repeaters required to propagate JAMs. * Must make sure all repeaters see JAMs. * Time delay of 500m coax about 2.165 micoseconds * 32 bits JAM ... ### MAC Frame Format for 802.3
Number of OctetsField Usage
7Preamble
1Start of frame delimiter
6Destination Address
6Source Address
2Length in 802.3
1500LLC data and padding
4Frame Check Sequence
* Preamble synchronises. ### Ethernet Addresses 48 bits long. Designed to be globally unique. ![](http://upload.wikimedia.org/wikipedia/commons/thumb/9/94/MAC-48_Address.svg/475px-MAC-48_Address.svg.png) Globally unique should be completely unique, not always though. Not expected to be able to be changed, can now be programmed (simply, apparently; `nmap --spoof-mac 0`) ### Frame details Minimum frame size is 512 bits, not including the preamble. Maximum frame size is 1518 octets (1500 octets of data). Above assumes 48 bit addresses which IEEE 302.3 says "shall be used" for 10 Mbps networks. 9.6µs inter-frame gap to allow desynchronisation. Time for frame is 51.2µs plus 6.4µs for preamble. ### Collision Backoff and Retransmission Transmitter tries to send for a maximum of 16 attempts to send a frame. The transmitter waits for an integer multiple slot times determined by the following algorithm prior to each retry. Rescheduling of each transmission uses "truncated binary exponential backoff" This ensures a random but increasing delay if many collisions occur: `DelaySlots = rand[0 <= r < 2^k]` Where `k = min(n, 10)` for the `n`th retansmission ### Probabilistic Characteristics of CSMA/CD All access to the 802.3 LAN only completes with some probability. It is thus impossible to guarantee transfer rates. When the net is very busy collisions might go on forever. Debates often take place on the usability of CSMA/CD LANs for real time use. The answer depends on the true use intended rather than the LAN. ### Bridges Partition LAN to segregate load, add reliability, add security. Combine remote LAN segments into a single logical network. Combine separately developed and controlled LANs. IEEE 802 LANs often include bridges. Repeaters clean and forward all data. Bridges selectively forward data. It will store and forward complete packets. Forwarding based on header information, Sometimes known as *MAC level relay*. #### Source Routing Bridges Hosts discover the route to each other host and are very aware of the presence of multiple LANs couple by bridges. Needs identifiers for bridges and LANs. Data is transmitted with routing attached. Bridges obey the routing. #### Transparent Bridges Bridge learns (or is told) the LAN on which each address exists. Hosts need not to know anything about the location of other hosts and indeed are not even aware of the presence of the bridges. Frames which arrive are handled in one of 3 different ways: 1. Same LAN 2. Different LAN 3. Unknown destination. 4. Broadcast address. ##### Same LAN If destination address on same LAN as source address then discard the packet ##### Differnet LAN If destination address on different LAN, forward packet. ##### Unknown Destination If location of destination address is not known then 'flood'. ##### Broadcast Address 'flood'. ##### Flood Forward the packet to every other network attached to the bridge. ##### Address Learning Bridges listen to the traffic and can learn where machines where by looking at the source addresses in the packets. ##### Address Tables Need to be quite large depending on the network. Generally there is a time-out associated with the addresses known. ##### Bridge Cycles Address Learning works well if there are no alternative routes in the internetwork connections (i.e. a tree structure). However there often are alternative routes, so bridges might cause loops. ##### Solution to Bridge Cycles Need a protocol to avoid cycles. Result for graph theory states: > For any connected graph, there is a *spanning tree* of edges which maintains the connectivity but contains no closed loop. Each LAN represents a graph node and each bridge corresponds to an edge. ###### Spanning Tree 1. Bridges have numbers 2. Broadcast number every few seconds 3. One bridge becomes the *root bridge* 4. Bridges discover route to *root* via *root port*. 5. Routes might have costs. 6. A *designated bridge* is determined for each LAN (minimum cost of path to root). 7. Only the designated bridge can forward to and from its LAN. 8. Bridges communicate with a Bridge Protocol Data Unit (BPDU) consisting of: * The originating bridge number * The number of the bridge thought to be the root * The path cost to root. Initially every bridge thinks it is the root and it broadcasts a BPDU to assert this fact. If a bridge gets a BPDU indicated a 'superior' bridge exists it assigns its root port and the path cost to root. If a bridge gets a BPDU from a bridge with a shorter root path it releases any claim to be the designated bridge for the segment. The lowest numbered bridge becomes the root. Bridge ports which are not root or designated ports are blocks. #### Local Bridge Connect two (or more) adjacent LANs. Throughput is likely to be high. Hosts not likely to notice much performance degradation unless waiting for each packet to be acknowledged. #### Remote Bridge Connect two (or more) LANs which are widely separated. Bridge consists of two 'half bridges' connected by a WAN type link. Link typically 64Kbps to 2Mbps. #### Managed Bridges Bridges often available in a managed form. These are managed from a station and can load the bridges with lots of things. Lots of monitoring, etc. Another protocol needed and this needs to be standardised. ## Other Network Technologies *A brief look at fast and wireless network technologies.* ### Alternative LAN Technologies * 802.4 - Token Bus * 802.5 - Token Ring Standards differ at the physical layer Differ at MAC sublayer of Data Link Layer. Compatible at the LLC sublayer of the Data Link Layer. #### Token Bus Uses a bus physical topology and ring logical topology. The physical order of nodes does not matter. Coaxial cable served as a common communication bus. Token was created bus the token bus protocol to manage access to the bus Any station that holds the token packet has permission to transmit data. The station releases the token when it is done communication or when a higher priority device needs to transmit. Two major problems: 1. Bus failure leads to a network failure. 2. Adding a new node is not simple. #### Token Ring Similar to token bus, but in a ring physical ring topology. Messages are transferred in one direction along the ring at all times. Token Ring networks sequentially pass a token to each connected device. When the token arrives at a particular node, the recipient is allowed to transmit data onto the network. Since only one device may be transmitting at any given time, no data collisions occur. Access to the network is guaranteed Time-sensitive applications can be supported. Still used for some real-time applications. Based initially for the mainframe infrastructure. Problems: 1. Costly 2. Complex. ### Quick notes on Ethernet A quick summary of 802.3 ### Why Ethernet is used * Inexpensive * Easy to install, maintain, troubleshoot and expand * Widely accepted industry standard. * Structured to allow compatibility with Network Operating Systems. * Reliable ### Ethernet Versions 1. Ethernet 10 BASE-T/F 802.3 (10 Mbps) 2. Fast Ethernet 100 BASE-T/F 802.3u (100 Mbps) 3. Gigabit Ethernet 1000 BASE-T/F 802.3z (1000 Mbps) 4. 10 Gigabit Ethernet 10 GBASE-F 802.3ae (10 Gbps) #### Coding Convention * Network speed * Baseband Signalling * 2 or 5 stands for coaxial cable and length * T stands for Twisted Pair * F/FL/S/L Stands for Fiber-Optic * X stands for full duplex. ### Ethernet Components * CSMA/CD (No protocol for 10 GigE) as 10 GigE is completely full duplex so it isn't needed. * Baseband Signalling. * Ethernet cables. * Ethernet card or adapter. * Ethernet devices. #### Ethernet Cables: Twisted Pair (CAT5/CAT5e) * CAT5: Ethernet and Fast Ethernet. * CAT5e: Gigabit Ethernet. Contains four pairs of copper wire. Cable runs are limited to a maximum run length of 100m. Operates at 100MHz. CAT5 comes in two main varieties, solid and stranded. Fast Ethernet communications only utilise 2 out of 4 pairs. CAT5 enhanced (CAT5e) uses all 4 pairs to support GigE over short distances. CAT5e is backwards compatible with CAT5. CAT6 (250MHz) and CAT7 (600MHz). ### Signalling Baseband is a single transmission involved (digital signal) Broadband can handle multiple transmissions with a modulated analogue signal. #### Baseband Signalling Network uses all available signal frequencies or the entire bandwidth. One signal can be transmitted at a time. Light or electrical pulses based transmission. Related to digital technologies, but can be used for analogue technologies. Bi-directional communication (Half duplex) is possible, but difficult. Multiplexing (using Time Division Multiplexing) allows transfer of multiple signals on a single medium. #### Broadband Signalling Network uses only one frequency or a part of the entire bandwidth. Multiple signals can be transmitted at a time. Electromagnetic or optical waves transmission. Related to analogue technologies but can be used for digital technologies. Bi-directional communication is possible and fairly easy. Multiplexing (using Frequency Division Multiplexing) allows transfers of multiple signals on a single medium. ### Multiplexing and Demultiplexing Putting more than one signal on a wire at once and getting it back again at the other end. #### Time Division Multiplexing Gives the illusion of multiplexing. Each signal is allowed a slot of time to transmit for. #### Frequency Division Multiplexing Actual multiplexing. Each signal uses a single frequency channel to transmit. ### Fibre Optic Supports all Ethernet versions above 10BASE5. Carries high-bandwidth data through beams of light carrying electromagnetic signals. Not so affected by noise. Can carry signals up to approximately 70 miles without amplification. Light can be pulsed in a single wavelength to represent 45 Gigabits of information. #### Single-Mode Fibre Optic Transmits one signal per fibre. Diameter: 8.3 to 10 microns that has one mode of transmission. Speed and distance: up to 40Gbps over up to 100km Used in telephones and cable TV applications. ####Multi-Mode Fibre Optic Transmits many signals per fibre. Multi-modes result from the fact that light will only propergate in the fibre core at discrete angles within the cone of acceptance. Diameter: 50 to 100 microns. Speed and distance: 10Mbps to 1 Gbps over 275m to 2km. Used for Computer Networks and LAN Applications. ### Network Adapters Adapters for connecting to ethernets. Ethernet cards, etc. #### 10/100 Ethernet Connect computer using a PCI, PCIe or ISA moethboard interface slot, CAT5/6 #### Gigabit Ethernet PCIe slots, CAT5/5e/6/7 and Fibre Optic cables. #### Fibre Optics For tier 1 and 2 internet backbones, External device, Fibre Optic cables. Slightly different, can use not only for network but for the backbone (a sophisticated network). #### Wireless Network Interface Cards Internal or external device #### Wireless Dongles External device ### Ethernet Devices Note that most devices are completely tied to a specific function, some repeaters might have some Data Link Layer functions depending on the manufacturer. #### Repeater Repeat an electrical signal including noise. Takes an incoming signal and then generates a new, clean copy of that exact signal. Overcomes signal attenuation problem. Allows LANs to extend beyond normal distance limitations. Physical layer device. Doesn't read data frames, just repeat the signal unintelligently. #### Hub Multiport repeater. Broadcasts frame to all ports and devices. Allows users to share Ethernet for transmission of data onto a single network (collision domain). Physical Layer device. Doesn't read data frames. Repeater is usually used for the extension of the length of a network, while the hub is a simple connectivity gadget that is used to broaden a network. #### Bridge Connect two LAN segments of similar or dissimilar types such as Ethernet and Token Ring. Split a networking into separate collision domains. Map the Ethernet addresses of the nodes residing on each network segment and allow only the necessary traffic to pass through the bridge. Do not forward bad or misaligned packets. Works on the Physical and lower Data Link (MAC) Layers #### Switch Expansion of the Bridge. Used in heavily loaded networks to isolate data flow and improve performance (parallelism). Up to on-half of the computers connected to a switch can send data at the same time. Data Link and Network Layer device. ##### Cut-through Switches Are faster because they examined the packet destination address only before forwarding it on to its destination segment. ##### Store-and-forward Switches Work like bridges in that they accept and analyse the packet before forwarding it to its destination. Takes more time to examine the whole packet, although it does allow the switch to catch certain packet errors and keep them from propagating through the network. ### Ethernet Encoding Methods for encoding bits in different Ethernet technologies. #### Manchester Coding Used in 10-Base* Bipolar scheme, synchronisation via a clock. ### Fast Ethernet 802.3u CSMA/CD and Full Duplex Backwards compatible. Physical layer structure (PHY) has been redesigned. Complex signal encoding mechanisms than the simple Manchester code. Uses three sublayers. #### Convergence Sublayer Introduces the concept of auto-negotiation; negotiates between two different medium to make the properties similar to for communication #### Media Independent Interface Interfaces the physical medium used so that it can be handled generically. #### Media Dependent Sublayer Deals with encoding, transmission in a way specific to the medium. ### Gigabit Ethernet 802.3z Two distinctive approaches for medium access: 1. Half Duplex 2. Full Duplex Most follow the full-duplex approach. Star topology. Support standard Ethernet frame format. Uses 802.3x flow control. Backwards compatible. Physical layer is more complex #### Gigabit Media Independent Interface #### Physical Layer #### Media Dependent Interface #### Medium ### Encoding Digital to digital encoding schemes. #### Unipolar Encoding Only use one pole (`+` or `-`). One pole is used to represent `1` or `0`, zero is used to represent the other. ![](http://file1.hpage.com/000143/66/bilder/picture3.png) ![](https://lh6.googleusercontent.com/i_wHvlOcxGA9HilYhdmFM5wjbo1dJ3q9q8qc6vKMkl8gbgqkBW2vhGjzoioF9gS8DA=s400) Two problems: 1. DC Component 2. Synchronisation DC cannot be transmitted properly through an AC line without noise. The receiver cannot synchronise its clock to the sender. #### Polar Encoding Uses both poles to represent bits. ##### Non-Return to Zero (NRZ) ###### NRZ-Level 1 as positive voltage, 0 as negative voltage. ![](https://lh6.googleusercontent.com/mP9wm9qDd9b5JoiZKmh5FfRDC6aLo7LUd5Vy3lMwRK_7ZBf5rayz67dOzaonQSEY0g=s400) ###### NRZ-Inverse Voltage inverts on a 1. ##### Return to Zero (RZ) * Positive to 0 is 1. * Negative to 0 is 0 ![](https://lh6.googleusercontent.com/HsrRIApzAFwlGCCoFtthVZmKRsROFZdEpN6cpv8tsQsXOl-PVGbD7wkO6adq2rMqUw=s400) ##### Biphase ###### Manchester ![](https://lh4.googleusercontent.com/MA7WfPVKktOw6LLGbu3pQCVMswe4cqHsZnrQ9oNDPqjR28qPOw7j7IjaakklukHDYA=s400) ###### Differential Manchester ![](https://lh6.googleusercontent.com/GsN5RdpXdhqMLMub0RgampWgREFpBMwcot-BmHSzY2GxGPC18HYRUh06YLzj3Mpuig=s400) #### Bipolar Encoding ![](https://lh4.googleusercontent.com/dsAffc-caZWoTueproHHYXmv7b46ebT0bHws-HJNeDVE6YO3rmWLCe9RgoSy0ljo2g=s400) ##### Alternating Mark Inversion ##### Bipolar 8 Zero Subsitution ##### High Density Bipolar 3 ### Broadband #### Digital Subscriber Line Uses local telephone lines. Allows simultaneous void and data transmission. ##### Asymmetric Digital Subscriber Line (ADSL) Designed to deliver more bandwidth down than up. Rates range from 1.5 to 9 Mbps down, 16 to 640kbps up. Distances of 18,000 feet over a single copper twisted pair. A splitter is needed. ##### Very high bit-rate Digital Subscriber Line (VDSL) VDSL is designed to provide higher bandwidths. Up to 52Mbps down, 16Mbps up. ##### VDSL 2 Support the wide deployment og triple play services. 100 Mbps down, 20 Mbps up 150 to 500m distance. 500 Mbps transmission rates over copper cabling by using crosstalk cancellation or "vectorized" VDSL2 based modems. ##### BT Infinity Now rolling out ultrafast fibre optic broadband with speeds of up to 100Mb. Next step up from FTCC is FTTh (Fibre to the Home). Negative factor is the costs. ###### Infinity-1 Top speeds are up to 40Mbps down and 10Mbps up ###### Infinity-2 Topspeeds are up to 76Mbps down and 19Mbps up. #### Cable ### Wireless LAN (802.11) #### Station (STA) #### Access Point (AP) Access points act as a central transmitter and receiver of WLAN radio signals. #### Basic Service Set (BSS) A set of STAs that communicate with each other. When two or more stations communicate together they form a BSS. ##### Infrastructure Mode All wireless clients are connected to an AP. Generally the default mode for the 802.11b cards. All mobile STAs communicate with each other via the AP. Network consumes double bandwidth for one communication. AP provide buffering of traffic for a mobile broadcast. ##### Ad-hoc Mode Wireless clients are connected without an AP. An ad-hoc network is where stations only communicate peer-to-peer (P2P). There is no base and no one gives permission to talk. Mostly these networks are set up spontaneously. #### Distributed System (DS) Two or more BSSs are interconnected using a distribution system. APs communicate via the DS. The DS is the backbone of the WLAN and may be constructed of either wired or wireless networks. The DS is also a thin layer in each AP. That determines if communications received from the BSS are to be relayed back to a destination in the BSS, forwarded on to the DS to another AP or sent into the wired network infrastructure to a destination not in the ESS #### Extended Service Set (ESS) An ESS is a BSS where APs communicate amongst themselves from one BSS to another.. Entire network looks like an interdependent BSS to the Logical Link Control (LLC). This means stations can communicate or even move between BSSs transparently to the LLC. #### Portal The logical point at which medium access control (MAC) service data units (MSDUs) from a non-IEEE 802.11 local are network (LAN) enter the distribution system (DS) of an extended service set (ESS). #### WLAN Frame #### MSDU and AMSDU Multiple MSDUs are aggregated at the MAC layer and are pushed into a single MPDU. They have a single frame header with multiple frames and they are destined for the same client and the same service class. ##### Mac Service Data Unit (MSDU) It is the MAC service data unit. This is the unit o transmission used at the MAC layer which is received from the upper layer. ##### Aggregated Mac Service Data Unit Aggregation of the MSDU directly performed at the MAC layer is called AMSDU. Such AMSDUs are now passed to the lower PHY later where they are dealt with as MPDUs #### MPDU and AMPDU ##### Mac Protocol Data Unit (MPDU) These are the frames passed from the MAC layers into the PHY later. ##### Aggregated Mac Protocol Data Unit These are the aggregated MPDU units which are pushed into a single Physical Protocol Data Unit (PPDU). #### DS Services While the implementation for the DS is not specified, 802.11 does specify the services which the DS must support. ##### Station Mobility ###### No Transition If a station is not moving within its own BSS or it is not moving ###### BSS Transition If a station moves between BSSs within the same ESS ###### ESS Transition If the station moves between BSSs of differing ESS. ##### Station Services (SS) ###### Authentication With a wireless system, the medium is not exactly bounded as with a wired system. In order to control access to the network, stations must first establish their identity. The authentication relationship may be between two stations inside an IBSS or to the AP of the BSS. Authentication outside of the BSS does not take place. Two types of authentication: 1. Open System Authentication 2. Shared Key Authentication ###### Deauthentication When either the station or AP wishes to terminate a stations authentication. ###### Privacy An encryption algorithm, which is used so that other 802.11 users cannot eavesdrop on your LAN traffic. ##### Distribution Station Services (DSS) ###### Association A station must affiliate itself with the BSS infrastructure if it wants to use the LAN. This is done by Associating itself with an AP. Associations are dynamic in nature because stations can move, turn on or turn off. A station can only be associated with one AP. This ensures that the DS always knowns where the station is. Association supports no-transition mobility, but is not enough to support BSS transition. ###### Reassociation The service allows the station to switch its association from one AP to another. Both association and reassociation are initiated by the station. ###### Disassociation When the association between the station and the AP is terminated. Can be initiated by either party. A disassociated station cannot send or receive data. ESS transition is not supported. A station can move to a new ESS but will have to reinitiate connections ###### Distribution Getting data from the sender to the receiver. The message is sent to the local AP (input AP) then distributed via the DS to the AP (output AP) that the recipient is associated with. If the sender and receiver are on the same BSS the input and output APs are the same. The distribution service is not logically invoked whether the data is going through the DS or not. ###### Integration Where the output AP is a portal. Thus 802.X LANs are integrated into the 802.11 DS. #### Physical Layer ##### Spread Spectrum The three physical layers originally defined in 802.11 included two spread-spectrum radio techniques and a diffuse infrared specification. Spread spectrum is a communication technique that spreads a narrowband communication signal over a wide range of frequencies for transmission then de-spreads it into the original data bandwidth at the receive. Spread spectrum increases the bandwidth of the signal compared to narrow band by spreading the signal. ###### Techniques The Code of Federal Regulations (CFR) Part 15 originally only described two spread spectrum techniques to be used in the licensed free Industrial, Scientific, Medical (ISM) band, 2.4 GHz, thus 802.11 and 802.11b. * Frequency Hopping Spread Spectrum (FHSS) * Direct Sequence spread Spectrum (DSSS) Orthogonal Frequency Division Multiplexing (OFDM) was not covered by the CFR and would have required licensing. 802.11a, employing OFDM, was created to work in the 5GHz. ###### Frequency Hopping Spread Spectrum (FHSS) FHSS spreads the signal by hopping from one frequency to another across a bandwidth of 83 MHz. The data is spread over 83 MHz in the 2.4 GHz ISM band. A short burst of data is sent on one frequency (usually less than half a second). Then the sender changes to another pseudorandom frequency and broadcasts another burst of data before changing to another frequency, and so on. The carrier then repeats this pattern. ###### Direct Sequence Spread Spectrum (DSSS) DSSS spreads the signal by adding redundant bits to the signal prior to transmission. The signal is divided into many different parts and sent on different frequencies simultaneously. Spectrum is divided into 13/14 channels. However, the FCC specifies only 11 channels for non-licensed (ISM band) use in the US. ##### Narrowband and Wideband In communications, band is referred to as the range of frequencies (bandwidth) used in the channel. Depending on the size of the band (in terms of kHz, MHz or GHz) and some other properties of the communication channel, they can be categorized as narrowband and wideband etc. Narrowband uses a smaller frequency range (bandwidth)- 300–3400 Hz. Wideband is a relative term, and the size of the band may be in kHz, MHz or GHz depending on the application. ##### Narrow Band vs. Spread Spectrum ###### Narrow Band Uses only enough frequency spectrum to carry the signal * High peak power * Easily jammed * Easy to detect * Easy to intercept ###### Spread Spectrum The bandwidth is much wider than required to send to the signal * Low peak power * Difficult to jam * Hard to detect * Hard to intercept ### Wireless Mobile Communication/Cellular Telephony. Utilises lower frequency radio spectrum. #### Mobile RF Spectrum Typically delivered over a wide range of radio frequency spectrum bands (e.g. 900MHz, 1800MHz, 2.6GHz, etc.). Most of which can also reach indoors to some degree. #### Architecture of the GSM Network ##### Base Station Subsystem (BSS) Composed of one or more BSC. BSS is the point where all radio transmission related functions are performed. BTS and BSC connected through the **Abis interface**. BSS connected to the MSC through the **A interface**. ###### Base Station Controller (BSC) The management of several BTS is done by the BSC. It also provides all the control functions and physical links amoung the different BTS and between the mobile switching centre (MSC) and the BTSs. Being a high-capacity switch, it provides functions such as cell configuration data, control of radio frequency power levels in BTS, frequency hopping, and handovers. One MSC serves a number of BSCs. ###### Base Transceiver Station (BTS) It is a station or site where antennas and radio transmitters and receivers are placed to create a radio coverage area in the mobile network. Contains one or more transceivers (TRC) and antennas. The cell site has a 360 degree omni-directional (omni-sector) antenna that is turned to create a cellular area of a specific size. Omni-sector means the same frequencies are used in all directions. Communication from the mobile terminal to the cell site is referred to as uplink. Cell site to mobile terminal is downlink ###### Mobile Station (MS) Mobile phone with ME and SIM. It consists of the mobile equipment (uniquely identified by International Mobile Equipment Identity (IMEI) The SIM card contains the International Mobile Subscriber Identity (IMSI) used to identify the subscriber to the system, a secret key for authentication and other information. The IMEI and the IMSI are independent, thereby allowing personal mobility. The SIM card may be protected against unauthorised by a password or PIN. ##### Network Switching Subsystem Contains the network elements MSC, LR, VLR, EIR, AuC and GMSC. ###### Mobile Switching Centre (MSC) Primary service delivery node for GSM and central components of NSS. It provides all the functionality needed to handle a mobile subscriber, such as: * Registraion * Authentication * Sets up and releases the end-to-end connection * Location Updating * Handovers * Cell routing and roaming subscribers HLR and VLR, together with the MSC, provide the call-routing and roaming capabilities of GSM. It takes care of charging and real time pre-paid account monitoring. These services are provided in conjunction with several functional entities, which together form the Network Switching Subsystem (NSS). The MSC provides the connection to the fixed networks (such as the PSTN or ISDN). ###### Home Location Register (HLR) The database that contains a subscription record for each subscriber of the GSM network. All the administrative information related to each subscriber registered in the respective communication network, including the current location of the subscriber, is contained in the HLR. The HLR is responsible for the sending of subscription data to the VLR (during registration) or GMSC (during mobile terminating call handling). The location of the mobile is typically in the form of the signalling address of the VLR associated with the mobile station. A GSM subscriber is normally associated with one particular HLR. There is logically one HLR per GSM network, although it may be implemented as a distributed database. ###### Visitor Location Register (VLR) The database that contains subscriber data for subscribers registered in a MSC. It contains all the temporary information about the subscribers. This information is needed by the MSC to service the visiting subscribers. Every MSC contains a VLR. Although MSC and VLR are individually addressable. They are always contained in one integrated node. ###### Equipment Identity Register (EIR) A database of all valid mobile equipment on the network. Where each mobile station is identified by its International Mobile Equipment Identity (IMEI). An IMEI is marked as invalid if it has been reported stolen or is not type approved. ###### Authentication Centre (AuC) A protected database that stores a copy of the secret key stored in each subscriber's SIM card which is used for authentication and encryption. ######Gateway MSC (GMSC) Switching entity that controls mobile terminating calls. When a call is estabished towards a GSM subscriber, a GMSC contacts the HLR of that subscriber, to obtain the address of the MSC where that subscriber is currently registered. That MSC address is used to route the call to that subsciber ##### GSM Interfaces ###### A Interface The connection between MSC and BSC. ###### Abis Interface The connection between BSC and BTS. ######D interface The connection between MSC and HLR. ###### Um interface The radio connection between MS and BTS. #### Cell Cellular telephony derives its name from the partition of geographical area into small cells. A cell is roughly circular (exactly hexagonal) area with a central transmitter and receiver base station. The size and shape of each cell is determined by the features of the surrounding area, such as buildings, trees and hills, which can block signals. In a city, there are many small cells, while rural areas may have very large cells. #### Sector Site coverage (cell) is partitioned into different directions called sectors. Each cell is usually split into tree sectors. Which overlap with other sectors of neighbouring cells so network is uninterrupted. #### Switched Communications Networks Long distance transmission between stations (called end devices) is typically done over a network of switching nodes. Switching nodes do not concern with content of data. Their purpose is to provide a switching facility that will move the data from node to node until they reach their destination. A collection of nodes and connections forms a communications network. In a switched communications network, data entering the network from a station is routed to the destination by being switched from node to node. ##### Circuit Switching Network A dedicated radio channel is allocated to a single transmission As long as data transmissions are long and continuous (file transfers) a circuit is used efficiently. However, most data transmissions are bursty, and dedicating an entire circuit to them is usually a waste of valuable wireless bandwidth. During idle periods when no data is being sent, bandwidth is still dedicated to the user and not available for others to use. ##### Packet Switching Network An entire network may be designed just for packet data. Packets do not have a dedicated path, it is decided by the routers. Most old wireless data systems offered minimal data rates, usually in the 10Kbps range. However new wireless protocols bond multiple channels to increase data rates. #### Multiple Access (Multiplexing) Protocol Mobile development is facing the major problem to find protocols that could be used to maximise bandwidth efficiently. Multiple access enables ... ##### Frequency Division Multiple Access (FDMA) ##### Time Division Multiple Access (TDMA) ##### Code Division Multiple Access (CDMA) Each user is assigned a different psuedorandom binary sequence that modulates the carrier, spreading the spectrum of the waveform, giving each user a unique code pattern. #### Cellular Generations ... ##### 1G ##### 2G ##### 2.5G ##### 2.75G ##### 3G ##### 3.5G ##### 3.75G/Pre-4G ##### 4G ## Standards *The ISO OSI model.* ### IEEE 802 Standards ![IEEE 802](http://misc.alexanderdbrown.com/ieee802.png "IEEE 802") Encapsulation at each level. ![IEEE 802 Standards](http://misc.alexanderdbrown.com/802-standards.png "IEEE 802 Standards") ### Data Link and Physical Layer More hardware-based layers. Data Link layer could be divided into two sub-layers: 1. Logical Link Control (LLC) 2. Media Access Control (MAC) LLC is more software orientated and interacts with higher layers. MAC is more hardware orientated and interacts with lower layer. (R)ARP protocol links hardware addresses to logical addresses at the Network layer. #### Physical Layer Defines: * Physical and electrical properties of the media * Mechanical properties of the connectors * Bit representation by the signals (encoding) * Transmission rate * Physical topology * Transmission mode #### Data Link Layer Defines: * Encoding bits into packets prior to transmission * Decoding the packets back into bits at the destination * Flow control * Error control (**A**utomatic **r**epeat re**q**uest (ARQ)) * Access control * Hardware addressing * Defining physical layer standards ##### Logical Link Control * Frames the Network layer packet * Identifies the Network layer protocol. Provides multiplexing and demultiplexing to and from the network layer. Manages flow control and error control. ##### Media Access Control * Addresses the frame * Marks the beginning and ending of the frame Managed access control. Related to the MAC address. #### LAN Technologies * Ethernet * Fast Ethernet * Gig Ethernet * 10 Gig Ethernet * WiFi * FDDI * Token Ring * ATM LANE #### WAN Technologies * X.25 * Frame Relay * T-carrier * ISDN * DSL Technologies * SONET/SDH * ATM ### OSI Layers Protocol Data Unit (PDU) is a term to define any one of the following: * Data (at Application, Presentation and Session layers) * Segment (at Transport layer) * Packet (at Network layer) * Frame (at Data Link Layer) * Bits (at Physical Layer) Each layer, with the exception of the physical layer, adds its own header. This is done because each layer has its own specific function. ## Unicast Network Level Protocols *Unicast Network Level Protocols in use in today's Internet. Including further study of protocols such as IPv4, ICMP, ARP, RARP used in unicast applications and IPv4 and IGMP used in multicast applications.* ### Internet Protocol version 4 (IPv4) #### IP Headers Organised in octlets as bytes didn't used to be just 8 bits long. IPv4 designed for 32 bits. ![IPv4 Header from wikipedia](http://misc.alexanderdbrown.com/ipv4.png) Version indicates the version of the IP protocol Time to live stops network loops. Header checksum to check the integrity of the header but not the data. Very basic checksum which is used by the router to ensure the packet has not be damaged. This does add significant load to the router, not only this, but due to TTL it has to create a new checkum without much gain. There are better checksums to be used and they do not appear in IPv6. #### The IP Address Tied to an interface, not an actual machine. Common to have multiple interfaces (eth0, wlan0, etc.) Can have multiple addresses on a single interface. #### Classes of Network 5 classes. * Class A - `0, 7 bits netid, 24 bits hostid` many host. * Class B - `10, 14 bits netid, 16 bits hostid` * Class C - `110, 21 bits netid, 8 bits hostid` many network. * Class D * Class E - intended for future use Class Es will never be used because of Microsoft and lack of value. #### Special Addresses * `X.X.255.255` = Broadcast Address * `X.X.0.0` = Network * `255.255.255.255` = Limited broadcast address, used in auto-configuration (e.g. DHCP) #### Network masks IF you own the address: `10.0.0.0` But want to share the address so that: `10.1.0.0` and `10.2.0.0` are different networks. `10.0.0.0` is a class A address. therefore is is actually: `10.0.0.0/8`. So if we change the networks to: `10.1.0.0/16` the network id will be expanded to the first 16 bits. #### Loopback Interface Most TCP implementations have a loopback interface with the IP address `127.0.0.1` (`127.X`) and name localhost (or anything else in `/etc/hosts` which specifies `127.X` as an address). The localhost behaves as a separate data link interface, A packet that is sent to the loopback interface moves down the protocol stack and is returned back by the driver software for the localhost "device". Used for debugging. Loopback is done in software, never appears on the network. #### Subnets: An example Take the university address: `144.124.76.0` Say we want a network per department, can't use a 24 bit address as that doesn't leave enough addresses. So they can use a 22 bit network id. `144.124.76.30/22` or `144.124.76.30/255.255.252.0` Apply the mask `255.255.252.0` to the IP address to get the host id. Invert for the host address. Network address is: `144.124.76.0` But what's the broadcast address? `144.124.79.255` *(Look at the bits for this)* Here, `255.255.252.0` is the subnet mask. Used to be allowed to have subnet masks like `255.0.255.255` as it doesn't add anything. #### Classless Interdomain Routing (CIDR) - Supernetting Treat two contiguous class C networks as a single network. This eases routing (supernets). For example supernet on geographic locations to make routing tables easier at a router level. Usual class C is /24, we make them a /23 if we have to contiguous networks This is the answer to the 192 soup. Non-contiguous versions of this are disallowed. Must choose numbers for which the maths works. #### Private Address Space There are some special IP addresses * e.g. loopback RFC 1918 written when started IP addresses to reserve certain IP addresses: These can be used for private addressing schemes, but not routable on the internet. * 10.0.0.0 - 10.255.255.255 (10/8) * 172.16.0.0 - 172.31.255.255 (172.16/12) * 192.168.0.0 - 192.168.255.255 (192.168/16) #### Network Address Translation Router has a pool of public IP addresses, when a private IP address attempts to access an external resource. The router maps the private IP address to a public IP address which accesses the resource and returns. The public IP address is then translated back to the private IP address and routed. In industry the pool may be a class C network. In home networks there is only a pool of one. They also do port address translation. ### Address Resolution Protocol (ARP) IPv4 has the problem that we know *our* IP address and the ones which we want to talk to, either router or machines on the link. Very first header is the Ethernet header, need this because every machine would need to look at the IP header, which is a process which needs to be done in software. Other network protocols exist too. Ethernet headers can be processed in hardware. Ethernet headers don't contain much, but the machines can automate the process of deciding to ignore the packet in firmware not by the CPU. There is a problem of mapping IPv4 address to Ethernet address. There's no direct link between IP addresses and Ethernet addresses so we need more at the data link layer. #### ARP Packets ![ARP Packet](http://misc.alexanderdbrown.com/arp.png "ARP Packet") ARP is cached for a limited amount of time. If no reply, retransmit after stand-off. ##### Hardware Type Which type of hardware are we using ##### Protocol Type ARP can be used with other protocols (other than IP). ##### Hardware Size & Protocol Size Size of the addresses (hardware and protocol). Doesn't map just MAC addresses, etc. ##### Sender Hardware/IP Address Hardware address is repeated as the Ethernet header is harder to access in software. ##### Operation Code to differentiate request/reply. 1. ARP Request 2. ARP Reply 3. RARP Request 4. RARP Reply RARP - Reverse ARP (looking up IP from MAC). Can be used to discover the machine's own IP address, for example. Used before DHCP. ##### Target Hardware Address If unknown to the sender, filled with `0`s, otherwise the actual address ##### Target IP Address The address being looked for #### ARP Delay First packet: 1. ARP Request. 2. Processing Time 3. ARP Reply 4. ICMP Echo Request 5. ICMP Echo Reply Following packets: 1. ICMP Echo Request 2. ICMP Echo Reply ### Fooling ARP Locate an interface in promiscuous mode, ARP request not broadcast, a dummy address is used. Some TCP/IP stacks pass the ARP request up the line. A reply indicates a machine is in promiscuous mode. `neped` ### Gratuitous ARP A host sends a request for its own MAC. The sender's IP and MAC address are broadcast, and other hosts will insert this mapping into their ARP tables. Useful to detect duplicate IP addresses (should be no reply). Causes other hosts to update their ARP cache (useful if the network adapter has been changed or if this is a hot spare). #### Why switched networks are not safe Man-in-the-Middle attacks. ![](http://upload.wikimedia.org/wikipedia/commons/thumb/e/e7/Man_in_the_middle_attack.svg/200px-Man_in_the_middle_attack.svg.png) ARP allows for MITMA as the IP and MAC are learned by the switch. Sending out a false ARP reply **last** will allow you to to spoof the IP address to your MAC address. #### Proxy ARP Same problems with MITMA ### MAC Addresses (IEEE 802) ![MAC Address](http://misc.alexanderdbrown.com/mac.png "MAC Address") * Gotcha: each Octet of MAC frame transmitted low order first (back-to-front). * I/G - Individual (`0`), Group (`1`) - Unicast or Multicast * U/L - Universal (`0`), Local (`1`) ### Internet Control Message Protocol (ICMP) The Internet Control Message Protocol (ICMP) is the protocol used for error and control messages in the Internet ICMP provides an error reporting mechanism of routers to the sources. All ICMP packets are encapsulated as IP datagrams. #### ICMP Message Types ICMP messages are either query messages or error messages. * ICMP query messages: * Echo request / Echo reply * Router advertisement / Router solicitation * Timestamp request / Timestamp reply * Address mask request / Address mask reply * ICMP error messages: * Host unreachable * Source quench * Time exceeded * Parameter problem #### Message Codes Specific to message type. E.g. for message type 3 (destination unreachable): * 0 = network unreachable * 1 = host unreachable * 3 = port unreachable * 4 = fragmentation needed but don’t fragment bit is set * Etc #### ICMP Error Codes Each ICMP error message contains the header and at least the first 8 bytes of the IP datagram payload that triggered the error message. To avoid too many ICMP messages, ICMP error messages are not sent * for multiple fragments of the same IP datagrams * in response to an error message * in response to a broadcast packet #### ICMP Codes ![](http://misc.alexanderdbrown.com/icmp-codes.png) #### Echo Request/Reply Message Format Identifier is set to process the ID of querying process. Sequence numbers are created for each new echo request. #### Port unreachable error Reply telling that a port is not available. Network scanners may make use of this, e.g. NMAP. Many hosts will not reply. ### Traceroute Sends out ICMP ping messages with increasing TTL starting at 1. For each host that isn't the target, an ICMP time exceeded message is sent back. On UNIX uses the port unreachable message instead (using a UDP packet) by default. ### Fragmentation If a packet needs to be split then a flag is set to say "more fragments". Identification is the same through the fragments. More fragments flag is not set for the last fragment, but is known to be a fragment due to the fragmentation offset. If a part of a fragment is missing, after an amount of time packet loss is assumed. Occurs if MTU < datagram size to be sent out on an interface, "Don't fragment" flag allowed: causes ICMP "Destination unreachable: fragmentation needed but don't fragment bit set". Reassembly takes place at destination. Fragments may be fragmented. Experience shows that fragmentation is best avoided. #### Avoiding Fragmentation * Never send a datagram which is greater than the minimum MTU maximum size it can be sent wholly. * Path MTU discovery to make sure you never send a datagram which is grater than the smallest MTU maximum size. ### Path MTU Discovery Intermediate routers may have lower MTUs. Uses ICMP uncreachable Error: Fragmentation Required `ping -s n -M do`: * Set packet size and "do not fragment" bi * Returns "message too long" is `s > MTU`. * RFC 1191 has likely values `traceroute.pmtu` * Version of tracerout that automates discovery. Firewalls can break PMTU discovery. ### Sock Traffic Generator Sock is a test program which can be run as a client or as a server. ### Ping TTL doesn't always start at 255, might be around 64 but not standardised. #### Ping through a router Routers have many different interfaces. This means, with traceroute, only the nearside interface can be found. There is a record route option which can find this: `ping -R` ## Unicast Routing in the Internet *Example routing problems. Interior and exterior routing protocols. Protocols covered will include RIP, OSPF and BGP.* ### IP Routing Split into two parts: 1. Packet Forwarding or Packet Switching 2. Routing #### Packet Forwarding Needs to be fast and efficient, every packet involves a forwarding option. Based on prior knowledge the packet is moved towards the destination. #### Routing Filling routing tables. This requires knowledge of the connectivity in some sense. This needs to be stable and should converge relatively quickly. This task should not be performed frequently ### Classifying Routes Routes can be classified into three different categories: 1. Default Routes 2. Static Routes 3. Dynamic Routes Could also classify by interior and exterior routing. #### Default Routes A default route which is used as a fallback if all other routing fails. #### Static Routes Manually configured information. Not very scalable. #### Dynamic Routes Automatically use information from other participating routers. ### Routing Algorithms/Protocols Different methods for finding out routes. #### Distance View Routing Uses the Bellman-Ford algorithm. Regularly broadcast the entire routing table, containing known networks and the "distance" to each (usually a number of hops). Has the problem of routing loops and convergence after change. #### Routing Internet Protocol (RIP) Uses Distance View Routing ##### RIPv1 (RFC 1058) Classful routing with no ability to pass netmasks. Typically broadcasts every 30 seconds. Metrics range from 1 to 16 (infinity). "Split Horizon" often used. "Triggered Updates" send an update when something (close) changes to improve convergence. Sent via UDP to the IP broadcast address (usually) to port 520. ###### Single Splint Horizon Don't repeat information to the originator. ###### Poison Reverse Horizon Repeat back information, but metric is set to infinity ##### RIPv2 (RFC 1723) Refinement of RIPv1. Includes netmasks and some support for authentication and multicast. #### Link State Routing Each router sends the state of all its links to all adjacent routers. * Link State Advertisements (LSA) Each router independently calculates its routing table based on the link state database it has constructed. Normally uses Dijkstra Shortest Path (Shortest Path First). #### Open Shortest Path First (OSPF) Uses IP directly. Early versions could calculate a different set of routes for each value of the IP Type-Of-Service (ToS) field, but this was abandoned in the latest RFC. Interfaces have a dimension-less costs (these were potentially different for each value of ToS). Supports equal cost load balancing. Supports subnet mask and thus CIDR. Point-to-point links can be used without IP addresses. Supports authentication Uses multicasting to reduce load. On a multi-access network (e.g. ethernet), two routers are elected as Designated Router and Backup Designated Router. #### IP Address Class Based Only #### Classless Inter-domain Routing ## Multicast Routing in the Internet *Example routing problems. Protocols covered will include PIM-DM, PIM-SM and MSDP. We will also cover the role of the Rendezvous Point, Anycast IP, and issues still under debate in the technical community.* ### Multicast Refers to the sending of data from one to many or many to many registered recipients. Contrast to broadcast, which floods the network with data, which all hosts will receive whether they want it or not. Unicast is the sending of data to a single recipient. Unlike unicast, multicast is scalable,if ten clients request a connection then unicast must provide those ten connections with their own copy of the data. Multicast, whether there are 1, 10 or 100 members of a ground, only one copy of the data is transmitted and is only replicated where there are group members on different paths to and from the router. ### Media Access Control (MAC) All devices have a unique 48-bit MAC address. Devices on the LAN keep a table that maps unicast IP to MAC. A special range of MAC addresses is used for multicast as a multicast address does not correspond to a single end host. ### 1:32 IP Multicast to MAC Multicast Mapping Class D address range: `224.0.0.0 - 239.255.255.255` High order 4 bits of the first octet of a class D address are always `1110` (`0xE`) To provide a 1:1 mapping between MAC and multicast IP addresses the remaining 28 bits of the IP address would need a unique representation in MAC addresses. MAC address range assigned for multicast is only 24 bits. The high order bit is reserved, leaving 23 bits. Thus, 28 bits of IP addresses need to be mapped to just 23 bits of the MAC address. This corresponds to a 1:32 mapping MAC multicast addresses to IP multicast addresses. #### Translation Six octets in an Ethernet header. Three high order octets are constant `01-00-5E`. The remaining three octets have a range of `00-00-00` to `7F-FF-FF` (high order bit is always `0` for IP multicast. An IP address of `224.1.1.1` maps to `01-00-5E-01-01-01`. Byt the MAC address also maps to 31 other IP addresses. #### Host Behaviour Reception Discarding of unwanted packets has to be handled by the IP stack. Hosts interesting in 224.1.1.1 will also receive (if on LAN) the other 31 possibilities. After de-capsulation the Ethernet fram and discovering the IP address of an unwanted packet, the host discards the packet. The probability of this are quite slim at the moment ### Special Multicast Addresses #### Local Network Control Block `224.0.0.0 - 244.0.0.255 (224.0.0.0/24)` Should never leave the local network. #### Internetwork Control Block `224.0.1.0 - 244.0.1.255 (244.0.0.1/24)` #### Session Description Protocol (SDP)/Session Announcement Protocol (SAP) Block `224.2.0.0 - 224.2.255.255 (224.2/16)` #### Source Specific Multicast `232.000.000.000 - 232.255.255.255` Specifically permit or block source addresses. #### GLOP Block `233.000.000.000 - 233.255.255.255` Ensure that addresses are specific to an administrative domain. Low number of specific addresses (255). #### Administratively Scoped `239.000.000.000 - 239.255.255.255` Private multicast addresses. ### IP Group Management Protocol used is Internet Group Management Protocol (IGMP) * IGMPv1 - RFC1112 * IGMPv2 - RFC2236 * IGMPv3 - RFC3376 #### IGMPv1 No explicit leave message. IGMP Report Suppression ##### Membership Queries Sent by routers to ask for existence of hosts interested in receiving multicast groups. ##### Membership Reports Sent by hosts in response to Queries, Also sent by the host if they wish to join a new group. ##### Message Format ![](http://routemyworld.com/wp-content/uploads/2009/01/igmpv1header.png) #### IGMPv2 Major change, introduces an explicit Leave Message ##### Membership Queries General Query as before. Group-Specific Query. Used to query if there are any members of a group left after a leave message. #### IGMPv1 and IGMPv2 Interoperability IGMPv2 Hosts **must** send IGMPv1 if they spot an IGMPv1 router is the querier. IGMPv2 Hosts **may** suppress Leave Messages if they spot an IGMPv1 router is the querier. IGMPv2 Hosts **must** suppress reports if they see others using IGMPv1 or IGMPv2 reports for a given group. IGMPv2 routers **must** ignore Leave messages if IGMPv1 hosts are present. If any IGMPv1 routers are present must use IGMPv1 #### IGMPv3 Major change: supports the specification of sources in messages. Host now do **not** suppress Reports. Thus, supports the concept of Source Specific Multicast. Protocol messages are now more complex. ##### Membership Queries Now contain source address information. ##### Membership Reports Now contain information relating to multiple groups and each group report can contain source address information. ### Multicast Distribution Trees The set of links and connections that traffic will follow to get to a destination. #### Source Trees Sometimes called Shortest Path Trees (SPT) A tree where the root is the source of the data. A different forwarding tree for every combination of source (S) and group (G). N groups and M sources implies ![](http://smarturl.it/math?M\times~N\) trees. This makes it difficult to calculate the best tree. Routers need to maintain separate states for all trees. (S,G) notation. #### Shared Trees Rendezvous Point Trees or Core Based Trees. Traffic follows some form of common path, unlike source trees in which two different trees can send data through different paths, at least most of the route will follow a common tree. Each group in the system uses the same tree. N trees implies N groups, no matter how many sources. Normally, each group has a nominated router as the Rendezvous Point, which is the root of the tree. All sources send towards the Rendezvous Point. Can potentially lead to a single point of failure if no backup is specified. (*, G) Notation ### Multicast Forwarding Routers must no just forward all multicast traffic. Potentially there could be customers anywhere, unlike with unicast. However, forwarding all multicast would be broadcast instead and would lead to chaos. Techniques include: * Reverse Path Forwarding * Multicast Forwarding Caches * TTL Thresholds * Administratively Scoped Boundaries #### Reverse Path Forwarding Note the source address of the arriving packet and interface of arrival. The IP tables are checked for the correct interface towards the source. If the packet has arrived into the expected interface, it can be forwarded onwards. If not it can be discarded. #### Multicast Forwarding Caches Used to help avoid some RPF calculations Sometimes called multicast routing tables. Used to help make decisions for routing tables. #### TTL Thresholds TTL normally used by IP and decremented as each router passed. When this TTL reaches zero, the packet is dropped. Multicast routers often set TTL Thresholds on interfaces, if the TTL of the packet is less than this threshold then the packet isn't forwarding. If traffic was only designed to reach internal locations, the TTL would be set such that it could reach all internal routers, but external routers would have high enough threshold to stop this traffic, but not those expected to reach external locations. #### Administratively Scoped Boundaries These boundaries don't pass certain multicast addresses. ### Multicast Routing Protocols #### Dense Mode Protocol Send all valid packets out of all interfaces, flooding the network. When a router receives unwanted packets it sends a prune message upstream. When a prune message is received, the router removes the interface it was received on from the forwarding table for the specified group. Prune times out in typically 2 to 3 minutes. ##### DVMRP ##### Protocol Independant Multicast - Dense Mode (PIM-DM) #### Sparse Mode Protocol Routers send no traffic onwards unless asked. Shared tree branch constructed from rendezvous point (root) to any interested receiver. Join messages are sent from the receiver to the root via other routes, creating a shared tree where it goes. Prune sent when the traffic is no longer wanted. ##### Protocol Independant Multicast - Sparse Mode (PIM-SM) Has a rendezvous point using a shared tree, but uses source trees to get data from sources to the rendezvous point. Also allows the final router feeding receivers to switchover to a source tree from the source if the traffic level exceeds some threshold (in Cisco routers the default value for the threshold is zero). ##### CBT #### Link State Protocol ##### MOSPF ## Transport Level Protocols *An in-depth study addressing the behaviour of TCP and UDP. Connection establishment and termination, flow control under various load conditions, timeouts and retransmission, newer features and performance.* ### User Datagram Protocol (UDP) RFC 768 * Datagram orientated * Unreliable, connectionless * Simple * Unicast and multicast * Useful for only a few application * Used a lot for services No handshake required which allows it to be fast, especially for many short connections. Applications must handle failures. #### UDP Header ![UDP Header](http://misc.alexanderdbrown.com/udp-header.png "UDP Header") Checksum is the same as the IPv4 checksum, but is over the UDP header, the data, and the UDP pseudo header ##### IPv4 UDP Pseudo Header ![IPv4 UDP Pseudo Header](http://misc.alexanderdbrown.com/udp-pseudo-header.png "IPv4 UDP Pseudo Header") ##### IPv6 UDP Pseudo Header ![IPv6 UDP Pseudo Header](http://misc.alexanderdbrown.com/ipv6-udp-pseudo-header.png "IPv6 UDP Pseudo Header") #### Example: Trivial File Transfer Protocol (TFTP) TFTP used UDP, uses a stop-and-wait flow window control algorithm: * Stop for ACK before sending the next data packet * A lost packet causes timeout and retransmission. Designed for diskless systems to download configuration files during bootstrapping. Does waste a lot of network time. ### Transmission Control Protocol (TCP) * Stream orientated * Reliable, connection-orientated * Complex * Only unicast * Used for most internet applications Done by creating connections between two points and aim to provide data integrity. Also provides flow control. #### TCP Header ![TCP Header](http://misc.alexanderdbrown.com/tcp-header.png "TCP Header") ##### Source Port Number ##### Destination Port Number ##### Sequence Number Ensures ordering of packets (is actually a byte count of the first byte count in the packet of data). Doesn't start at 1. ##### Acknowledgement Number Helps find missing packets, is the last byte of the packet. ##### Header Length Describes the length of the header. ##### Reserved ##### Flags ###### Urgent (URG) The value in the Urgent field is valid. ###### Acknowledge (ACK) The value in the Acknowledge field is valid. ###### Push (PSH) Push the data up to the application as quickly as possible. ###### Reset (RST) Reset the connection ###### Synchronise (SYN) Used to set up connections ###### Finish (FIN) Used to tear down connections ##### Window Size ##### TCP Checksum Same as UDP Checksum ##### Urgent Pointer ##### Options #### Connections How TCP establishes, keeps and tears down connections. ##### States ![](http://upload.wikimedia.org/wikipedia/commons/thumb/a/a2/Tcp_state_diagram_fixed.svg/796px-Tcp_state_diagram_fixed.svg.png) ###### Listening Waiting for a connection ###### SYN Sent Waiting for acknowledgement of the first send of a `SYN` ###### SYN Received ###### Established ###### FIN Wait 1 Wait for an `ACK` of a sent `FIN` ###### FIN Wait 2 Wait for a `FIN` from an acknowledged `FIN`. ###### Close Wait Acknowledged a `FIN` but still need the connection to send data. ###### Last ACK Final `FIN` sent, waiting for the last `ACK`. ###### Time Wait Final `FIN` received and `ACK` sent, wait for a number of seconds before closing. Done in case the `ACK` is not received by the other party. ###### Closed Not listening, connection closed. ##### Wait States `TIME_WAIT` is also called the 2MSL wait state. * MSL - Maximum segment life. * RFC 793 specifies MSL as 2 minutes. * 30 seconds, 1 minute, 2 minutes are common implementations. ##### Connection Establishment 1. **Server** listens on a given port (no network traffic). 2. **Client** sends a TCP packet with the `SYN` flag set to a random number. 3. **Server** receives the `SYN`, the 0 offset is the value of the `SYN` flag. Server goes into `SYN Received` state. 4. **Server** sends a TCP packet with a different `SYN` flag and an `ACK` flag, returning the Client's `SYN` flag plus one*. 5. **Client** returns an `ACK` flag. \* The `SYN` flag is assumed to have consumed 1 byte. If no `ACK` is received at `SYN Received` state then the `SYN` is resent after a timeout. Known as the *three way handshake*. **Note:** Initial sequence numbers are important! Both client and server choose different ISNs. ##### Half-Open When you have a connection between two machines, but one crashes. There's no data sent between them so one still believes it is connected. ##### Simultaneous Open Possible (but unlikely) that two applications will perform active open to one another. Not Client/Server, each opens to a known port. Results in a single open connection. Four-way handshake required. ##### Connection Termination 1. **Client** sends a `FIN` 2. **Server** sends an `AWK` and a `FIN` 3. **Client** sends an `AWK`. *Three way closedown*. `ACK` and `FIN` might not be sent together (server might still be transmitting data). ##### Half-Close When a `FIN` is sent to the server but there is still more data to be sent should not close fully until a `FIN AWK` is sent and the `AWK` returned. Microsoft broke this by sending a `RST` instead of a proper `AWK` ##### Simultaneous Close Both sides of a connection might perform an active close. Four segments required, which s not unusual. #### TCP Options TCP Header also supports options. Originally only `MSS`, `NOOP` and `EOS` were specified. ##### End of Option List (0) ##### No Operation (NOOP) (1) ##### Maximum Segment Size (MSS) (2) Length is 4 2 byte MSS. Usually based on the MTU size. ##### Window Scale Factor (3) Length is 3 1 byte shift count. ##### Timestamp (4) Length is 10 4 bytes timestamp value 4 bytes timestamp echo reply #### Bulk Data Transfer Bulk data transfer (e.g. FTP) has different requirements. Low overhead from headers. Receiving systems have limited buffers. Sliding windows used for flow control. Packet loss is expensive; timeouts are the simplest way, but there are better. Want to send as many packets as possible without flooding the network. #### Flow Control In TCP for Bulk Data Transfer Sliding window flow control is carried out by the receiver. CWND flow control is carried out by the receiver. ##### Sliding Windows Flow Control The idea is that in every `ACK` we also advertise a window. The window is a number of bytes, usually a multiple of the `MSS`. The initial window size will be the size of the buffer TCP has available. Sender can send up to the limit of the window. When a segment is acknowledged, the window slides making another slot in the window available. The window is said to close when the left-hand edge moves to the right. The window is said to open when the right-hand edge moves to the right. The window is said to shrink if the right-hand edge moves to the left. ###### Window Scaling Scale factor is actually a shift. Largest scaling factor is 14 (maximum window size of 1,073,741,823 bytes). Remember this increases the size of the header. Used for high capacity networks. RFC 1323, TCP Extensions for High Performance. ###### Calculating Optimal Window Size Can never reach optimal capacity but can approach it. ###### Bandwidth delay product `Capacity(bits) = bandwidth (bits/s) * RTT (sec)` Increase Round Trip Time (RTT) and data travels more slowly, thus higher capacity. Likewise increased bandwidth involves higher capacity. Max window size of 65535 bytes, but window scale option can increase this to 1024MB. Around 10% of the capacity is a good size for the TCP buffer size. ##### TCP Slow Start (RFC 2001) Intermediate routers must queue packets - congestion may occur at routers. Congestion window (CWND). 1. Initially set to 1 segment (based on announced MSS). 2. On first ACK, increase CWND by 1 segments of bytes. 3. Now 2 segments are sent and ACK'd. CWND is increased to 4 segments. 4. Exponentiall increase until peak flow reached or router discards packets. 5. Sender can transmit up to the minimu of the sliding window size and CWND. Has two phases: 1. Exponential increase. 2. Linear increase. Based on a threshold (the slow start threshold). Drops drastically when there is packet loss. ##### Fast Retransmit and Fast Recovery (TCP Reno) Problem: TCP timeouts lead to idle periods. Fast retransmit: use 3 duplicate ACKs to trigger retransmission. Fast recovery: start CWND at SSTHRESH and do incremental increase after fast retransmit. ###### Problems with Fast Retransmit and Fast Recovery WiFi and very fast networks. Can be a problem on high capacity networks. #### TCP Westwood + Improved congestion control algorithm for fast, high latency links and lossy links. Can take many hours to reach optimal throughput with TCP Reno as packet loss has a large affect on throughput. Westwood carries out end to end bandwidth estimate (BWE) using received ACKs and RTT monitoring. TCP Reno overreacts to random loss by cutting cwnd in half. A small fraction of random packet loss does not impact the BWE. Thus the ssthresh remains unchanged, allowing Westwood to be much more efficient than Reno. ###### On ACK Reception Increase cwnd according to Reno algorithm. Estimate available bandwidth. ###### When 3 Duplicate ACKs received ![](http://www.texify.com/img/%5Cnormalsize%5C%21%5Ctext%7BSsthresh%7D%20%3D%20%5Ctext%7Bmax%7D%5Cleft%282%2C%20%5Cfrac%7B%5Ctext%7BBWE%7D%20%5Ctimes%20%5Ctext%7BRTT%7D_%7B%5Ctext%7Bmin%7D%7D%7D%7B%5Ctext%7BSeg%5C_size%7D%5Cright%29.gif) cwnd is then set to ssthresh. ###### On RTO (coarse timeout) ![](http://www.texify.com/img/%5Cnormalsize%5C%21%5Ctext%7BSsthresh%7D%20%3D%20%5Ctext%7Bmax%7D%5Cleft%282%2C%20%5Cfrac%7B%5Ctext%7BBWE%7D%20%5Ctimes%20%5Ctext%7BRTT%7D_%7B%5Ctext%7Bmin%7D%7D%7D%7B%5Ctext%7BSeg%5C_size%7D%5Cright%29.gif) cwnd is then set to 1. #### Flow Control in TCP for Interactive Data Transfer Interactive data transfer can result in many small segments which, together with their ACKs, can lead to congestion. ACKs are, therefore, piggybacked onto data segments - delayed ACKs. Nagle's algorithm result in some collection of data to produce larger segments. ##### Nagle Algorithm Many small datagrams results in very large overhead. Can cause congestion, particularly on a WAN. * Only one outstanding segement not ACK'd * Cannot send any more small segments until ACK received. * Whilst waiting for ACK, TCP will collect small segments together to send as a single segment. * On a fast network, more segments are sent. * On a congested network less segments are sent, but data still get through (with less overhead). Sometimes desirable to disable Nagle algorithm, e.g., X mouse movements need to be sent without delay. #### Persist Timer Keeps window size information flowing even if the other end closes its receive window. Possible situation where an `ACK` is lost and both ends waiting. * Sender waiting for acknowledgement with window update so that it can send more data. * Receiver waiting to receive the data as it indicated a non-zero window in the acknowledgement that go lost. Sending TCP uses a persist timer to periodically query the receiver to see if the window size has updated. #### Timeout and Retransmission Fundamental to the use of acknowledgements is the need for timeout and re-transmission of packets. TCP keeps four different timers for a connection and performs exponential backoff when a packet goes unacknowledged. TCP also has a congestion avoidance algorithms to complement the slow start algorithm to cope with packet loss caused by congestion. ##### Timers ###### Retransmission Timer Used when expecting an acknowledgement from the other end. ###### Persist Timer Keeps window size information flowing even if the other end closes its receive window. ###### Keepalive Timer Detects when the other end of a connection has crashed or re-booted. ###### 2MSL Timer ##### Round Trip Time Measurement Used to calculate retransmission timeout (RTO). Mean deviation is used to allow arithmetic to be carried out with integers and without square roots. ![](http://www.texify.com/img/%5Cnormalsize%5C%21Err%20%3D%20M%20-%20A%5C%5CA%20%5Cleftarrow%20A%20%2B%20gErr%20%5C%5C%20D%20%5Cleftarrow%20D%20%2B%20h%20%28%20%7C%20Err%20%7C%20-%20D%20%29%5C%5C%20RTO%20%3D%20A%20%2B%204D.gif) Where: * D is the smoothed mean deviation * M is the latest measured RTT value * Gain (g) for average is set to 1/8 * Gain (h) for deviation is set to 1/4 ##### Karn's Algorithm If a packet times out and there is a retransmission, when an ACK is received, whose was it. Karn's algorithms specifies that RTT estimate cannot be update when a timeout and retransmission occur. Re-use the RTO after such an exponential back off until an acknowledgement is received. ### Transactional Transmission Control Protocol (T/TCP) RFC 1644 TCP for transactions. Nearly as fast as UDP. #### TCP Accelerated Open (TAO) Reduces minimum number of packets required to three in many circumstances. Data and FIN are piggybacked on the ACK. Connection Count (CC) used to avoid duplicate SYNs. Security issues caused due to this. #### Truncation of TIME_WAIT State 2MSL state is abandonded. 8 times RTO used instead. Allows retransmission of final ACK. A new incarnation of the same connection, using TAO, implicitly acknowledges the ACK of FIN. ### Port Numbers ... ### Demultiplexing Layered like the OSI model (but pre-dates the OSI model). Ethernet driver captures the incoming frame, strips the Ethernet header and passes to IP IP layer strips out the IP header and passes it to the transport layer, etc. Levels: * Application * Transport * Network * Link * Physical The process of moving things up and down layers is demultiplexing and multiplexing. ### Data Encapsulation Don't sent arbitrary length methods, to allow the multiplexing of networking. (In TCP) Data is encapsulated into frames, frames have a frame header, trailer and a datagram. This datagram contains an IP Header and a segment or protocol data unit (PDU) The PDU has the (TCP) protocol header and the actual data. The frame header is used to drop the packet onto the local link. The address used in the frame header is embedded in the hardware in the network card (MAC address), this is why the IP address is not used. This is for efficiency. Frame headers are: source address, destination address, protocol and checksum. ## Naming and Directory Services *Including the DNS and LDAP and their use.* ### Domain Name Service (DNS) A distributed database mapping hostnames to IP addresses and vice versa. #### How DNS is used * Applications access the information in the DNS by way of resolver programs * `gethostbyname()` returns an IP address in response on a hostname * `gethostbyaddr()` returns a hostname in response to an IP address. * People access information by way of client programs * `nslookup` * `dig` * Common implementation of DNS client and server is BIND (Berkeley Internet Domain Server). * Server is **named** * The DNS defines a protocol (see RFCs) that is used for communication between client and server. #### The DNS namespace DNS is stored as a tree: ![](http://bio3d.colorado.edu/tor/sadocs/dns/dns-1.png) 13 canonical name servers which server information for the root "unnamed" root. ##### .arpa A special domain which is used for reverse lookup. Now `.ipv6` ##### Fully qualified domain names `www.google.com.` is a fully qualified domain name. Missing the trailing `.` allows the DNS to lookup other entries based on the domain it resides on. #### DNS Lookup Is a recursive process and therefore quite slow. Results are cached to improve performance. To try and reduce the effects of this when changing hostnames the TTL of the cache is reduced prior to performing this. #### Name Ownership No one company owns all names #### Name Server Zones and Boundaries Can have zones within a name server. #### Resource Record This is where DNS gets its information from. ![](http://www.tcpipguide.com/free/diagrams/dnsrrformat.png) Examples: `3www4aber2ac2uk0` `3www6google3com0` The `0` occurs because the top-level domain has no name. This is how the end of the string is reached. ##### SOA Name of primary source of info for zeon ##### A IP address of host ##### CNAME Canonical name ##### PTR Alias for an IP address for reverse lookups ##### MX Mail exchange information ##### NS The Name of the nameserver ##### AAAA IPv6 address of host ##### A6 Experimental IPv6 address. ##### HINFO Host information (CPU and OS). ##### LOC Location of host. ### Dynamic Host Configuration Protocol (DHCP) ### DHCP Timeline ![](http://www.tarunz.org/~vassilii/TAU/protocols/dhcp/alloc.gif) Request are broadcast address `255.255.255.255`. Often used during host configuration. Never forwarded by a route, by a DHCP/BOOTP relay agent can be used to forward this onto the DHCP server. Replies are routed normally. ### Lightweight Directory Access Protocol (LDAP) ### The Directory Information Tree Similar to the DNS tree, but the whole world doesn't typically use the same tree. ### The Information Model Service model based on **entries**. An **entry** is a collection of **attributes**, that has a **distinguished name (DN)**. #### Attributes Type/value pairs, e.g.: `C = GB` `o = University of Wales` Attributes described by a schema. ## Quality of Service *The need for and the provision of Quality of Service (QoS) within packet based networks such as the Internet which are inherently best efforts at heart.* ### Challenges * QoS needs to be end-to-end, involving numerous management domains. * QoS service should address all requirements - not just 'top quality' * QoS *policy* distribution, maintenance and monitoring will increase in importance. * Better to implement QoS earlier - provides scalability, experience. Pro-active not reactive. ### Queueing Disciplines Ways in which resources can be allocated to priority traffic best. #### Priority Queue Other queues only get any resources at all when the priority queue is empty. Simple and effective. Traffic must be policed to stay within low limit as other traffic can get starved or the priority queue reverts to Best Effort #### Weighted Round Robin Multiple queues, each queue gets a guaranteed minimum of resource even under congestion conditions. Can be combined with Priority Queues. ## Security Issues *The inherent risks within networks such as the Internet, cracking, viruses, trojans, worms and denial of service attacks. The role of the Firewall and the problems it can bring.* ### Data Encryption Four aspects of security to consider: 1. Privacy 2. Authentication 3. Integrity 4. Nonrepudiation #### Aspects of Security #### Privacy Snoopers should not be able to read confidential data #### Authentication Verifies that the apparent sender really sent a message, and not an imposter. #### Integrity Verifies that data has not been corrupted or altered in transmission #### Nonrepudiation Ensures that the sender or receiver cannot deny sending or receiving a piece of information. #### Encryption Techniques Classical encryption techniques: * Permutation: The order of the plaintext characters is changed * Substitution: a plaintext alphabet is mapped to a different one. A cipher used to perform the encryption. * Stream ciphers encrypt data bit by bit or byte by byte. * Block ciphers first pack data bits into a fixed length block, then encrypt the whole block into a ciphertext block. #### Symmetric Key Encryption Use a shared secret key to encrypt and decrypt. For ![](http://smarturl.it/math?n\) users, ![](http://smarturl.it/math?\sum{n}\) keys required. Can be done in hardware. #### Public Key Encryption The key is split into two: a public key and a private key. Anything encrypted by the public key can only be decrypted by the private key. The private key must be kept private, but the public key can be shared without worry. Cannot be done in hardware. #### Combining Both Create a shared secret based from public key encryption. ### Transport Layer Security (TLS) Client sends a "hello" message with TLS version number and prefernces. Server sends a certificate including a public key. * Public key encrypted by some CAs. * Client has a list of CAs and their keys and uses a corresponding key to decrypt the server key. * This authenticates the server, unless it is self-signed. Client sends a secret key encrypted with the server's public key. Server decrypts message and then encrypts a response with secret key which the client decrypts. ### Hashing and Message Authentication Hashing is the operation that maps the message of variable length into a hash value with fixed length. Hashing is not reversible. Hashing can be used to generate a digest of the message, called the Message Authentication Code. The receiver can use the digest to verify if the message is authentic. ## Current and Future Issues *The (still) emerging IPv6 protocol and other active issues.* ### Internet Protocol Version 6 (IPv6) Problems with IPv4 is that 32 bit addresses is too small. * 128 bit address space solves the problem for the long term. * Large space allows addresses to be more structured. ### An IPv6 Address `fe80::2c0:dfff:fee4:bd87/10` Loopback address: `::1/128` `::` is a string of `0`s of indeterminate length. IPv4 addresses are encapsulated in IPv6 are expressed with dotted decimal for last four octlets: ``::194.123.1.2` IPv6 addresses are leased (possibly infinitely). #### Current Allocation * `0000 0000` - Reserved * `0000 001 ` - NSAP Allocation * `001 ` - Aggregatable Global Unicast * `1111 1110 10` - Link-Local Unicast * `1111 1110 11` - Site-Local Unicast * `1111 1111` - Multicast ### Link-Local Addresses Format Prefix: `1111 1110 10` or `FE80::/64` Postfix is usually the last 24 bits of MAC address. middle byte is `fffe` due to IEEE standards. `fe80::00ff:fe00:0000` for MAC address: `00:00:00:00:00:00` ### Site-Local Addresses Format Prefix `1111 1110 11` or `FEC0` ### Neighbour Discovery ### Router Advertisements ### DHCPv6 Where autoconfiguration is undesirable for operational reasons. Similar in concept to DHCP for IPv4 Compatible with autoconfiguration Nodes may request multiple addresses. Authentication of nodes. Makes use of address deprecation and reconfiguration-init message... ### Co-existence and Tunnelling ## Exam Preparation *Some chances to get a feel for what is expected in the exam.* ### Steve Kingston `s/(Use a diagram) is necessary/\1/g` ## Past Paper 2012-113 ### 1. This is a question about the Transport Layer Protocols #### a) With reference to the User Datagram Protocol (UDP) and the Transmission Control Protocol (TCP), describe features that are provided by transport layer protocols. In your answer distinguish between features provided just by TCP and those provided by both UDP and TCP. * Both use port numbers. Server listens of a specific port and client sends data to that port. * UDP Connectionless * No acknowledgement that data is recieved * Up to the application to manage successful transmission of data * TCP sets up a connection between two machines * Connection must first be established * All sent packets (except ACK's) must be acknowledged by the other party * Support for flow control using sliding window protocol * Must also terminate the connection * Protocol ensures sucessful delivery of data (or failure). b) Describe with the aid of diagrams the TCP/IP connection establishment processes, and explain how the initial sequence number (ISN) is exchanged between two nodes during connection establishment. Label the diagrams with the TCP connection states at each stage. * Server listens on a specified port * Client sends a SYN packet to that with a random ISN and a receiving port number (usually fairly high). * Server responds to the client with an ACK with a sequence one higher than the SYN's ISN (the SYN is seen to have taken up one byte of data). * Server also responds with a SYN to the client, the ISN here is generated by the server and is different to the clients. * Client responds to the server's SYN with an ACK, again here the sequence number of this packet will be the ISN of the Servers SYN packet + 1. * At this point the connection is established. The ISNs must be different as they are used to acknowledge how much data was actually received later in the connection. Sequence numbers returned in the ACK are the sequence number of the acknowledge packets plus the size of that packet (which can then be used for flow control). #### c) What is meant by a flow control mechanism? In your answer describe the likely effect of not employing flow control on TCP connections. * Flow control used to send multiple packets of data and acknowledging them all in a single packet. * Flow control is also used to stop flooding the network with packets that would just be dropped by the receiver because it's buffers are full. Not using flow control would either: 1. Require the sender to wait for the receiver to acknowledge a single packet before sending the next one, meaning the whole transmission would be very slow. 2. Allow the sender to flood the network with packets which would eventually be dropped by the receiver as its buffers fill up to maximum (or by intermediary devices to the same effect). The sender would receive acknowledgements for some of the packets and would have to constantly resend dropped packets after a timeout period, leading to a lot of traffic on the network, especially when the receiver does not process the information quickly. #### d) Describe, with the aid of diagrams, the problem with transferring very large files using the slow start /congestion avoidance flow control mechanism on fast (e.g. gigabit) but high latency networks. Explain in your answer how the TCP Westwood+ algorithm modifies the mechanism to provide better throughput whilst retaining a concept of fair access. * Slow start sets the window to 1 to begin with and grows it exponentially after each ACK received. Once it reaches the slow start threshold (ssthres), this growth is linear (by 1 from each ACK) - this is congestion avoidance. * Eventually the congestion will be encountered, and a packet will not be ACK'd before a timeout. The size of sstresh is set to one half of the current window size and then the size of the window is reset to 1. The process then repeats endlessly. * This tends to lead to "sawtooth" graphs. * Problem of slow start is that it has to wait for a timeout to occur before it knows congestion has occurred. * To change this, TCP Reno was implemented; if the receiver does not receive the next expected packet (i.e. it receives one with a higher sequence number than expected) it will send a duplicate ACK of the last ACK it sent. * If this ACK is received by the sender 3 times it triggers retransmission and will retransmit the next packet from that sequence number. * The receiver may then acknowledge further packets in the sequence in the next ACK. * Instead of reseting the window size to 1, it is started at sstresh instead and grown incramentally. * TCP Reno still leads to sawtooth graphs, but is less affected by congestion. * TCP Westwood+ implemented to further improve TCP connections. * Westwood+ estimates the bandwidth available using received ACKs and Round Trip Time monitoring. * In both cases of 3 duplicate ACKs or coarse timeout, the ssthresh is set to `max(2, (BWE * RTTmin) / Seg_size)`. The congestion window is set to ssthresh in the case of 3 dup ACKs or 1 in the case of coarse timeout. * A small fraction of randomly lost packets doesn't affect the BWE and therefore the ssthresh, unlike with Reno. # SEM6120 - Introduction to Intelligent Systems This module introduces the key ideas in Artificial Intelligence and ensures all students are at roughly the same level before moving on to the specialist modules. ## Introduction *General introduction to Artificial Intelligence (AI), including discussion of what AI is, its history, definitions, and philosophical debates on the issue (the Turing test and the Chinese room). Ethical issues (3 hours).* ### Staff - **[Richard Jensen](mailto:rkj@aber.ac.uk)** - [Adrian Shaw](ais@aber.ac.uk) ### Learning Outcomes 1. Describe and use the basic principles of Artificial Intelligence and Machine Learning. 2. Be able to reflect on project needs. 3. Practically apply AI and ML principles to meet those needs. 4. Present the material they have learned in an informed, clear manner. 5. Demonstrate understanding and insight into the material that they are presenting. ### Assessment - Presentation **20%** *01/10/2013-18/10/2013* - Report **20%** *01/10/2013-25/10/2013* - Essay **60%** (coding + report) *10/10/2013-01/11/2013* *Tips: if you disagree with a paper, you should comment on why, this will be a good thing :)* ### Commitment * 26h seminars * 6h practicals * The rest of the time is spent on background reading and assignments. ### Booklist * **Artificial Intelligence: A Modern Approach** Russell, S. and Norvig, P. * **Artificial Intelligence: Structures and Strategies for Complex Problem Solving.** - Luger G. * **Artificial Intelligence Illuminated** - Coppin B. ### What is Artificial Intelligence? * Understand intelligent entities; learning more about ourselves? * Building intelligent entites; creating things which exhibit "intelligence". Two ways of looking at this: * Scientific goal * Engineering goal Many definitions, all potentially valid. Problem: what is intelligence and how do we prove a system is intelligent. ### Strong versus Weak AI * Strong AI can actually think intelligently. * Weak AI can possibly act intelligently. ### Turing Test Human interrogator talks to another system (human or AI), if the interrogator cannot tell the difference then the argument is we must acknowledge it is able to think like a human. Has not yet been passed. ### The Chinese Room Behaving intelligently may not be enough. Computers are just a symbol manipulation device and therefore cannot have mental states. ### Ethics and AI We have investigated whether we *can* develop AI, but not whether we *should*. Problems of AI: * Job losses * Too much/little leisure * Privacy rights * Accountability * End of the human race? * Harm in the wrong hands ### Branches of AI * Logical AI * Search * Pattern recognition * Representation * Inference * Common-sense knowledge and reasoning * Learning from experience * Planning * Epistemology * Ontology * Heuristics * Genetic Programming ## Search *Why search is important in AI and how to go about it. This includes both informed and uninformed strategies. Evolutionary search (6 hours)*. ### Defining the Problem Many AI problems can be framed in terms of a search problem. Representation is very important. The search strategy is the way in which searching is performed: * Uninformed * Informed (Heuristic) Can't always evaluate all the search space. Massive search space (e.g. Chess) or even infinite. ### Terminology #### Search State Summarises the state of search. May not lead to a solution. *e.g. Representation of a chess board.* #### Initial State The first search state. #### Solution A special example of a search state. It solves the problem. #### Goal State The state trying to be reached. #### State Space All possible search states. #### Successor Function Ways to move around in the state space (action/operators). #### Goal Function To check if the goal has been reached. #### Cost Function Measures the path cost. ### Search Trees Visualise the progression of a particular algorithm. Shouldn't evaluate every possibility of the state space. Initial state is the root, goal is a leaf. Don't store the whole search tree. Requires a lot of space. Can discard explored nodes. Store the frontier of search (i.e. nodes in search tree with some unexplored children). ### Evaluation #### Time Complexity In big O notation. Number of nodes generated during a search (worst case). #### Space complexity In big O notation. Maximum number of nodes stored in memory. #### Optimality Is it guaranteed to find the optimal solution? #### Completeness If there is a solution, will it be found? #### Branching Factor `b` Maximum number of successors of any node #### Depth of shallowest goal `d` #### Maximum length of any path in the state space `m` ### Breadth First Search Branching factor `b` and depth of solution `d`. * Time complexity: `O(b^d)` * Space complexity: `O(b^d)` * Optimal (given step costs are identical) * Complete (provided `b` is finite) ### Depth First Search Branching factor `b`, depth of solution `d` and maximum depth `m`. * Time complexity: `O(b^m)` * Space complexity: `O(bm)` * Not optimal * Not complete (complete if no loops) ### Depth Limited Search A Depth First Search with a limited maximum depth. Depth limit `l`. * Time Complexity: `O(b^l)` * Space Complexity: `O(bl)` * Complete if `l >= d` * Not optimal ### Uniform Cost Search A form of Breadth First Search, using a Priority Queue. Node with the lowest total path cost is expanded. If all the cost steps are equal, it is exactly the same as BFS. Complete and Optimal if no negative path costs. ### Iterative Deepening A depth-limited search where the limit is increased iteratively. Avoids the space complexity of BFS. * Time Complexity: `O(b^d)` * Space Complexity: `O(bd)` * Optimal * Complete ### Iterative Improvement Paths not retained - low memory ### Heuristic Search Heuristic is a sort of rule of thumb. ### Hill Climbing Heuristic best child chosen at each point until goal reached or no change in current state. No backtracking. ### Best First Search Priority queue based on heuristic. Like BFS other than that. ### Greedy Best First Search `f(n) = h(n)` ### A* A best first search which takes into account current path cost. `f(n) = g(n) + h(n)` ### Dominance/Informedness If `h2(n) >= h1(n)` then `h2` dominates `h1`. ### Genetic Algorithms Encode solutions in Chromosomes. Mutate and crossover chromosomes. Evaluate the new population and select the best. ### Genetic Programming A branch of genetic algorithms which uses programming statements as genes. Easy to do in LISP. Usually represented as trees. #### Koza's Algorithms * Tree consists of functions and terminals * Choose a set of functions and terminals * e.g.: `{+,-,*,/,sqrt},{A,B}` * Generate random population which are syntactically correct. * Follow GA-like procedure. ### Ant Colony Optimisation In nature, ants could solve complex problems unsupervised. Capable of finding the shortest route between a food source and the nest. Can react to changes in the environment, * Each ant moves "randomly" * Pheromone is deposited on path * Ants detect lead ant's path, inclined to follow. * More pheromone on a path increases the probability of that path being followed. * Pheromone decays over time. #### Mechanics of ACO * Graph Representation * Heuristic desirability of edges * Construction of feasible solutions * Pheromone update rule (attached to edges). #### Algorithm for one ant * Select starting node at random. * While not finished: * Evaluate all edges from this node. * Select the best-looking edge via probabilistic transition rule * Deposite artificial pheromon on the chosen edge. * Finished path is a potential solution, analysed for optimality. ### Particle Swarm Optimisation Each particle is searching for the optimum and encodes a solution (like a GA). Each particle is *moving* (can't search otherwise), and hence has a *velocity*. It also maintains the position it was in where it had its best result so far (its *personal best*). The particles co-operate, exchanging information about what they've discovered in the places they've visited. This co-operation only needs to be very simple; * A particle has a *neighbourhood* associated with it * A particle knows the fitness of those in its neighbourhood, and uses the *position* of the one with the best fitness. * Adjusts using this. ### Multi-objective Optimisation Sometimes an answer has to be optimal in several aspects. Examples: * Quickest and cheapest flights * Lightest and strongest construction material. ## Knowledge Representation *Ways of representing knowledge in a computer-understandable way. Semantic networks, rules. Examples of the importance of KR (4 hours).* ### What is Knowledge? AI Agents deal with knowledge (data). * Facts * Procedures * Meaning ### Logic Representation ### Non-logical Representation Logical representations have restrictions which can be hard to work with. #### Classes Ignored in Logic Objects in the world tend to be related to each other: * Classes, super- and sub- classes. * Part or whole hierarchies * Properties which are inherited. The state of the world changing over time. * Explicit representation of time. * Frame problem. * Non-monotonic reasoning. Closed world assumption. Uncertainty or fuzzy knowledge. #### Classes and Object-Orientated Representations Classes define Objects, Objects are instances of Classes. Object ∈ Class Class ⊂ Superclass Facts and rules can be encoded. #### Semantic Networks Essentially a generalisation of inheritance hierarchies. Each node is an object, class, concept or event. Each link is a relationship which makes sense in context. Inheritance is as expected. Example: ![](http://upload.wikimedia.org/wikipedia/commons/thumb/6/67/Semantic_Net.svg/320px-Semantic_Net.svg.png) **Note:** pre-dated OOP. #### Frames Incorporates certain valuable human thinking characteristics: * Expectations * Assumptions * Stereotypes * Exceptions * Fuzzy boundaries Represent what is *typical* unless an *exception* is known. Frames allow more convenient "packaging" of facts about an object. Frames often allow things which are typical of a class and which are definitional and can't be overridden. Frames also support multiple inheritance. Frames are represented as semantic networks where nodes have structure. A frame has a number of slots (age, height, etc.), each of these slots stores specific information. When new information is gained slots can be filled in, this can cause the triggering of actions, which may trigger the retrieval of other frames. #### Scripts Can fill in missing detail that is assumed. ### Non-monotonic Logic Once `true` doesn't mean always `true`. As information arrives, truth values can change. A number of implementations for this, * Circumscription: * `Brid(x) and not abnormal(x) -> flies(x)` * We can assume `not abnormal(x)` unless we know `abnormal(x)`. * Default logic: * "`x` is true given `x` does not conflict with anything we already know." #### Truth Maintenance Systems These systems allow truth values to be changed during reasoning (belief revision). When retracting to a fact, must also retract to any other fact derived from it.
Penelope is a bird    (can fly)
Penelope is a penguin (cannot fly)
Penelope is magical   (can fly)
Retract magical       (cannot fly)
Retract penguin       (can fly)
##### Justification-based TMS For each fact, track its justification When a fact is retracted, retract all facts that have justifications leading back to that fact, unless they have independent justifications. ##### Assumption-based TMS Represent all possible states simultaneously ## Neural Networks and subsymbolic learning *We can find solutions using search, but how can we remember solutions, learn from them and adapt them to new situations? This will cover perceptrons, single-layer and multi-layer networks (5 hours).* ### Symbolic Learning When we use some sort of rule-based system, we generally have to understand the rules. This means we understand the conclusions it draws, because it can tell us. When a system learns from such rules, it processes in a way which can be understood. ### Subsymbolic Learning Don't really understand, or have control over, the way in which solutions are found. ANNs, GAs, GP and sometimes statistical methods. Might be related to the randomness factor. ### Artificial Neural Networks *(A: time to not understand backprop again)* Inputs (variables) -> network -> Outputs (results). X and Y matrices of the statistical models are analogous to the training inputs and outputs of ANNs. ### Backpropagation Most common learning rule for ANNs. Connections between nodes given random initial weights. We therefore get a value at the output node(s) which happens when these random weights are applied to the data at the input. ### Epoch An iteration, that is, finding the error then adjusting weighting, is called an epoch. There may be many thousands of epochs in one training. ### Overfitting Need to be able to *generalise* the model to unseen data. Too much training will lead to a lack of generalisation. ### Kohonen Neural Networks Self-organising neural networks. Random initialisation of a grid, see which nodes best match the input and change nodes around it. ## Propositional and First-Order Logic *The backbone of knowledge representation (4 hours).* ### Types of Knowledge * Declarative - Facts * Procedural * Meta - knowledge about knowledge * Heuristic - rules of thumb ### Representing Knowledge: Object-attribute-value Encoding a fact in three pieces of information; the object, it's attributes and the values of these attributes. The facts a human knows are not obvious to a computer; needs to be encoded. e.g.: `Tree(species, oak)` The encoding doesn't matter, so long as its uniform to the system. Can include an uncertainty factors is a number which can be taken into account by the system when making decisions. The final conclusion of any program where uncertainty was used in the input is likely to also have an uncertainty factor (if you're not sure of the facts, can can the result be certain?) Encoding uncertainty might be encoded something like this: `Tree(species, oak, 0.8)` (the certainty that the tree is an oak tree is 80%). Again encoding doesn't matter, so long as its uniform to the system. ### Rules A knowledge base may have rules associated. `IF` premises `THEN` conclusion. There may be more than one premises and may contain logical function * `AND`, `OR` and `NOT` for example If a premise evaluates to `TRUE` the rule *fires*. e.g. `IF tree(species, oak) THEN tree(type, deciduous)` Rules may contradict another rule. Different strategies can be applied to choose the most specific or most relevant rules. ### Meta Rules
IF
    tree is conifer
THEN
    load conifer data
ELSE
    load deciduous data
### Logic We can represent knowledge using logic. There are two types: *propositional* and *predicate* (or first-order logic or predicate calculus) ### Propositional Logic In propositional logic formulas are constructed using variables, `TRUE` and `FALSE` constants and connectors: * `AND` (`∧`) * `OR` (`∨`) * `NOT` (`¬`) * `IMPLIES` (`→`) ### Predicate Logic Prolog is based on this. A predicate is like a function that returns `TRUE` or `FALSE` `Tree(a)` is true if `a` is oak, false if `a` is daffodil. ### Implication `Oak(a) → Tree(a)` If the first clause is satisfied, the second clause is also satisfied. ### Assertions The symbol `∀` can be read as "for all". `∀a(Oak(a) → Tree(a)` ### Existence The symbol `∃` can be read as "exists" `∃a(Beech(a) ∧ ¬Green(a))` ### The Atomic Formula `beech(a)` is known as an atomic formula. Can have multiple parameters. ### Human Reasoning We use two standard rules: * Deductive 1. Modus Ponens - if we know `P→Q` then if `P` is true, `Q` must also be true 2. Modus Tollens - if we know `P→Q` then if `Q` is false, `P` must also be false * Inductive * Difficult for machines * Observations: * Oak trees have green leaves * Pine trees have green leaves * Induce * All trees have green leaves * Unfortunately, that's not true, but it is useful. ### Non-monotonic Classic monotonic reasoning cannot contain contradictions Put formally: `X ⊆ Y → Deriv(X) ⊆ Deriv(Y)` where `Deriv(X)` is a set of facts derived from `X` ### Temporal Reasoning Reasoning changes over time. Can introduce this into machines by introducing a concept of time. ### Machine Inference Machine inference is used to deduce new facts from a knowledge base which is held in working memory. `Knowledge Base -> Inference Engine -> Working memory` Can be very complex ### Deducing New Facts Two principal methods 1. Forwards chaining - based on modus ponens. 2. Backwards chaining - based on modus tollens. ### Forward Chaining Modens Ponens: `student(S) ∧ studies(S, ai) → studies(S, prolog)` `student(T) ∧ studies(S, expsys) → studies(T, ai)` `student(joe)` `studies(joe, expsys)` Therefore we can deduce: `studies(joe, prolog)` Proof: `student(joe) ∧ studies(joe, expsys) → studies(joe, ai)` `student(joe) ∧ studies(joe, ai) → studies(joe, prolog)` Q.E.D. Forwards chaining can fire any rules which match the knowledge held in its working memory. This can potentially come up with a huge amount of new knowledge, most of which is probably completely irrelevant. ### Backwards chaining Backwards chaining sets out to prove a piece of information. The information will either be true or false, but doesn't generate unwanted results. Use the *resolution* proof method for now. Say we have: `A1 ∨ A2 ∨ ... ∨ An ∨ B` and ` ¬B ∨ C1 ∨ C2 ∨ ... ∨ Cm` Resolvent of clauses is: `A1 ∨ .. ∨ An ∨ C1 ∨ ... ∨ Cm` ### Resolution Now take the two clauses: `A1 ∨ A2 ∨ ... ∨ An ∨ B` and `D ∨ C1 ∨ C2 ∨ ... ∨ Cm` If there is some subset where `B` and `D` are negations of each other `Theta` If we have two clauses `Clause1` and `Clause2`, and these both have a resolvant `R`, then if `Clause1` and `Clause2` are both satisfiable, so must `R` be. The idea: take a clause, containing a goal we want to prove, and negate that goal. If we then resole this with other clauses, over and over and we get to the empty clause (which is never satisfiable), we have proved our goal. ### Clause form We can express any predicate calculus statement in clause form. This enables us to work with OR and NOT rather than any other clause. `p → q ≡ ¬p ∨ q` `A ∧ B ≡ ¬(¬A ∨ ¬B)` ### Example of Resolution Use a previous example in clause form: 1. `¬student(S) ∨ ¬studies(S, ai) ∨ studies(S, prolog)` 2. `¬student(T) ∨ ¬studies(T, expsys) ∨ studies(T, ai)` 3. `student(joe)` 4. `studies(joe, expsys)` Solution to `studies(S, prolog)` means we must negate it: `¬studies(S, prolog)` Resolve the clause 1: `¬student(S) ∨ ¬studies(S, ai)` Resolve with clause (2) (`S=T`) `¬student(S) ∨ ¬studies(S, expsys)` Resolve with clause (4) (`S = joe`): `¬student(joe)` Resolve with clause (3): `Ø` Therefore `studies(joe, prolog)` is true. Q.E.D. ### Horn Clauses The same thing, but expressed differently. This is how Prolog does it. A horn clause is a series of disjuncts (ORs) We can take: `A ∨ ¬B ∨ ¬C` and write it: `A ← B ∧ C` If `B` and `C` then `A` All the same thing: 1. `studies(S, prolog) ← student(S) and studies(S, ai)` 2. `studies(T, ai) ← student(T and studies(T, expsys)` 3. `student(joe) ←` 4. `studies(joe, expsys) ←` ### Resolving Horn Clauses All the negatives are one side of the `←` sign. `← studies(S, prolog)` `studies(S, prolog) ← student(S) ∧ studies(S, ai)` `← student(S) ∧ studies(S, ai)` `studies(T, ai) ← student(T) ∧ studies(T, expsys)` `S = T` `← student(S) ∧ studies(S, expsys)` `student(joe) ←` `S = joe` `← studies(joe, expsys)` `studies(joe, expsys) ←` `←` Q.E.D. ### Prolog Example [Prolog Tutorial](http://users.aber.ac.uk/ais/Prolog/) ### Prolog **Pro**gramming with **log**ic. Uses backwards chaining through horn clauses. ## Programming for Intelligent Systems *Practical introduction to programming for Intelligent Systems, used to illustrate search, KR and first-order logic (3 hours).* ## Rule-based Systems *How can human expertise be automated? How to build an expert system - system concepts and architectures. Rule-based systems: design, operation, reasoning, backward and forward chaining (3 hours).* ### RBS Knowledge base contains the rules. Database contains the facts. Inference engine uses both of these to match facts to rules to derive new facts, etc. Justifiable and transparent. ## Knowledge Acquisition *Knowledge Acquisition and its importance in KR and RBS (2 hours).* ## Statistical Methods *Multivariate analysis and statistical methods for solving problems.* ### Multivariate Analysis Analysis of high-dimensional data. Often not possible to identify or quantify an object from one dimension. Two methods: 1. Statistical 2. AI methods ### Statistical Models Always produces the same result from a given set of data. Will always find the best result within the constraints of their abilities. ### X Values All the factors which might affect the outcome so that the analysis can find some form of *correlation*. ### Y Values The set of objects and associated variables. ### Realigning Axes. Usually statistics are in high-dimensional space. No way of visualising this data. In a 3D space, can fit a plane to any 3 points to make the problem a 2D one. Can approximate the plane to more than 3 points and fit the best plane using root mean square error. This is done using regression, but does mean there will be **error** in the model. Can do this with a 3D plane, but the idea is to keep the model as simple as possible. ### Principal Component Analysis The line through multi-dimensional space which describes the most variation in the data (the one with the widest space) is the First Principal Component. The following Principal Components are always perpendicular and are ordered by the amount of variation from the data and have less importance. Can analyse the principal components to see which of the original X values contributed most. Principal components help understand the data better. ### Principal Components Regression The main principal components help to explain what factors had most bearing on the results. They form a model that could be used for predicting results. Using a PCA model to predict is known as Principal Components Regression (PRC). Closely related is PCL. ### Principal of Parsimony Fewer variables, if they produce a model just as good in testing, should be preferred. ## Fuzzy Logic Not clear cut for a definite value. (i.e. not boolean logic). ![](http://upload.wikimedia.org/wikipedia/commons/6/61/Fuzzy_logic_temperature_en.svg) # Seminars Seminars from Aberystwyth University ## Learning from Big Random Weight Networks and Their Training Algorithms by Xizhao Wang *Fellow; IEEE Editor-in-Chief, IJMLC* ### Introduction to Big Data The internet, financial institutions, media, medical treatment, scientific research, etc. provide a huge amount of data. This data is only increasing in size. Big Data is the explosion of information. Storing Big Data is, as expected, very difficult. #### Variety Structured data is becoming unstructured. #### Velocity Moving from batch jobs to the streaming of data. #### Volume Now around Zettabytes of data around. #### Value ### Uncertainty of Big Data There is lots of uncertainty in Big Data, there can be a lot of ambiguity in symbolic data, etc. ## Some cybersecurity issues of Smart Buildings, Smart Metering, not-so-Smart Cars and the Smart Grid *Martyn Thomas CBE FREng* ### Smart Buildings Offer significant benefits **and new vulnerabilities**. #### Benefits * Access Passes, Attendance monitoring, follow me printing * Calendar, room booking, etc. * CCTV, WiFi, Remote and central monitoring and control * Dynamic displays, emergency messages, fire alarms #### Vulnerabilities * Denial of access, unauthorised access, data compromise * Disruption, wasted energy, damage * Intrusion, loss of control, * Safety. ### Smart Cars Michael Barr's expert report in the Bookout v Toyota lawsuit. [See here](http://www.safetyresearch.net/Library/BarrSlides_FINAL_SCRUBBED.pdf). ### Safety World Safety world works on probabilities, but in security there is a direct attacker - can't argue independence. ### Smart Meter Security End-to-end security: "No trusted components". Whether this works or not is unknown. #### Vulnerabilities * Meters contain clocks and billing data, if these are changed, bills can be manipulated. * Meters contain an off switch, misuse could cause distress, harm or substantial disruption. * Meter firmware will need to be updated. * The Gas meter can only handle low-grade encryption. ##### Who controls the off switch? Anyone who has the authority to send the relevant messaged through the DCC. Anyone who can mount a successful **cyber attack** on the network. ##### How will security be assured? * Human review of the spec.s * Testing the system * Focuesed pentesting Testing only shows that faults **do** exist, not that there are no faults. For high confidence you need formal methods. ### Cyber Security Is a through-life discipline. * Planning * Control * Monitoring * Response * From design through to decommissioning * Lifetimes of 20+ years * Don't run anti-virus * Monitoring is difficult (and uncommon) * How to respond to incidents? * Insider threats * Lifetime security needs discipline and excellent configuration control through years of maintenance and upgrades. ### [M2M and the IoT](http://www.wired.com/opinion/2014/01/theres-no-good-way-to-patch-the-internet-of-things-and-thats-a-huge-problem/) ### Cyber Security Is a tier one threat.