The Centre for Intelligent Data Analytics' Previously Funded Projects

Find out about TCIDA's previous projects below

HR Analytics and Insider Threat

Insider Threat is a formidable risk to business because it threatens both customer and employee trust. Accidental or malicious misuse of a firm’s most sensitive and valuable data can result in customer identity theft, financial fraud, intellectual property theft, or damage to infrastructure. Because insiders have privileged access to data in order to do their jobs, it’s usually difficult for security professionals to detect suspicious activity.

This undertaking (commencing March 2018) is a two-year joint project (gross budget circa [£840k – £1.26m]) between deep360, Harrman Cyber, Oxford University and TCIDA (alongside senior consultants, ex-GCHQ) to develop advanced tools using state-of-the-art techniques from Artificial Intelligence (AI), Machine Learning (ML) and Natural Language Processing (NLP) to accurately and reliably identify Insider Threat.

Entity Matching, Onboarding & Analytics (Tungsten Corporation)

In March 2015 Tungsten entered into a Joint Venture (JV) with Goldsmiths, University of London, to fund TCIDA – a new research centre based at Goldsmiths with a remit to investigate the application of state-of-the-art artificial intelligence technologies to problems in procurement analytics. The JV defined an initial roadmap for the first 24 months which specified that, subsequent to its setup, TCIDA should deliver:

  1. A new framework to provide improved future spend analytics research and development for Tungsten, by appropriately structuring existing Tungsten data and adding functions for manipulating and reporting on this data;

  2. A new backend for the Tungsten Analytics system with near to real-time functionality;

  3. A replacement, provided using the framework referred to in (1) above alongside modern machine learning algorithms, for the existing Tungsten SmartAlec system to provide a better “company matching system”;

  4. An AI Risk Analytics Framework designed to assess and score the creditworthiness of suppliers and so minimise risk and optimise pricing of Tungsten Early Payment (TEP).

However, due to commercial imperatives at Tungsten, there was a switch of research emphasis away from risk analysis and analytics, towards developing a state-of-the-art Supplier Matching engine (SmartAlec4: SA4). In response to this shift, TCIDA has helped Tungsten develop the infrastructure to ensure that, for both supplier e-invoicing campaigns and sales loads, the matching process is now significantly more accurate and streamlined, resulting in fewer touchpoints for Tungsten staff and a better onboarding experience for the buyer. In summary, the JV substantially reduced both onboarding times for new suppliers and the amount of data preparation required from buyers.

‘SpendInsight’ – Procurement Analytics (KTP)

Launched in 2011 as a commercial service by KTP industrial partners @UK PLC, SpendInsight has been used by over 380 organisations, including Basingstoke and North Hampshire NHS Foundation Trust, which cut procurement spend by £300k via savings identified using SpendInsight. An analysis produced by SpendInsight for the National Audit Office identified gross inefficiencies in NHS procurement, yielding potential annual overall savings of at least £500m. The findings of this report were discussed in parliament and changes to NHS purchasing policy were recommended as a result.

The research that led to the development of the SpendInsight spend analysis system focused on one central (ontological) problem: how to recognise as the same, entities specified in different ways; e.g. to identify ‘BD Plasticpak 50ml syringes’ as identical to ‘50 millilitre BD Plasticpak syringe’. The loci of the SpendInsight research were natural language understanding, data re-structuring, de-duplication and product classification problems. At Goldsmiths this work built on two long-established computing research themes: ‘machine understanding’ and ‘semantics’.
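The matching problem in the syringe example can be illustrated with a minimal sketch. It is not the SpendInsight method: the `normalise` and `same_entity` helpers and the `UNIT_SYNONYMS` table are hypothetical names introduced here, and the approach (lower-casing, splitting fused quantity/unit pairs, mapping unit synonyms, crude singularisation, order-free comparison) is an assumption chosen only to show why two differently worded descriptions can denote the same entity.

```python
import re

# Hypothetical unit synonyms; a real system would need a far larger table.
UNIT_SYNONYMS = {"millilitre": "ml", "millilitres": "ml", "mls": "ml"}


def normalise(description: str) -> frozenset:
    """Reduce a free-text product description to an order-free token set."""
    text = description.lower()
    # Split fused quantity/unit pairs such as "50ml" into "50 ml".
    text = re.sub(r"(\d+)([a-z]+)", r"\1 \2", text)
    canonical = []
    for token in re.findall(r"[a-z0-9]+", text):
        token = UNIT_SYNONYMS.get(token, token)
        # Crude singularisation: drop a trailing "s" from longer words.
        if len(token) > 3 and token.endswith("s"):
            token = token[:-1]
        canonical.append(token)
    return frozenset(canonical)


def same_entity(a: str, b: str) -> bool:
    """Treat two descriptions as the same entity if their token sets agree."""
    return normalise(a) == normalise(b)
```

Under these assumptions, `same_entity("BD Plasticpak 50ml syringes", "50 millilitre BD Plasticpak syringe")` holds, while a description with a different volume does not match. Exact set equality is deliberately strict; a practical matcher would need to tolerate misspellings and missing attributes.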

AI for Fraud Detection (KTP)

The aim of the project was to develop a fraud prevention system that analyses customer behaviour as customers navigate through an e-commerce website and highlights behaviour associated with fraud. A number of methods were explored to correlate data collected from buyers’ interactions with the website with the likelihood of a fraudulent transaction taking place as a result of those interactions. It was discovered through analysis of known fraud cases that many fraudsters set up multiple accounts on the e-commerce marketplace, and that these accounts can often be related via a common email address, delivery address or username. It was decided to exploit these patterns of behaviour as a basis for automated fraud prevention.

The solution involves collecting categorical parameters related to user inputs and weighting them by their closeness to known fraud cases and their frequency of occurrence within the wider user population. The likelihoods of fraud for the individual parameters are combined to give an overall measure of the likelihood of fraud for each transaction as it occurs on the website. When suppliers flag new fraudulent transactions, the individual parameter likelihoods are updated, forming a dynamical systems classifier. The system dramatically reduced the incidence of fraud (over 90% in the first 3 months) and achieved the highest KTP rating of ‘Outstanding’.
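The scoring scheme described above can be sketched as follows. This is an illustration under stated assumptions rather than the deployed system: the `FraudScorer` class, its smoothing prior, and the noisy-OR combination rule are all hypothetical choices, and "closeness" to known fraud cases is simplified here to exact matching of parameter values.

```python
from collections import defaultdict


class FraudScorer:
    """Toy per-parameter fraud likelihoods combined with a noisy-OR (assumed rule)."""

    def __init__(self, prior: float = 0.01, prior_weight: float = 1.0):
        self.prior = prior                # assumed base rate for unseen values
        self.prior_weight = prior_weight  # how strongly the prior is trusted
        self.seen = defaultdict(int)      # (parameter, value) -> transactions seen
        self.flagged = defaultdict(int)   # (parameter, value) -> flagged as fraud

    def observe(self, transaction: dict, is_fraud: bool = False) -> None:
        """Record a transaction; flagged frauds shift the stored likelihoods."""
        for key in transaction.items():
            self.seen[key] += 1
            if is_fraud:
                self.flagged[key] += 1

    def likelihood(self, parameter: str, value: str) -> float:
        """Smoothed share of transactions with this value that were fraudulent."""
        key = (parameter, value)
        fraud = self.flagged.get(key, 0) + self.prior * self.prior_weight
        total = self.seen.get(key, 0) + self.prior_weight
        return fraud / total

    def score(self, transaction: dict) -> float:
        """Overall fraud likelihood: 1 minus the chance that no parameter signals fraud."""
        not_fraud = 1.0
        for parameter, value in transaction.items():
            not_fraud *= 1.0 - self.likelihood(parameter, value)
        return 1.0 - not_fraud
```

In this sketch a new transaction that reuses an email address from a flagged transaction scores far higher than one that shares only a widely used, never-flagged delivery address, because frequent clean use drives a value's likelihood down.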

Cognition as Interaction

This project has sought to address a philosophical question with pressing social implications: what might cognition be if not computation? For most of the preceding 60 years since Turing’s seminal paper ‘Computing Machinery and Intelligence’, Cognitive Scientists have been drawn to the computational model of the mind. Turing’s mechanistic conception of mind effectively views thought and cognition as a ballistic process: once the chains are in motion – the initial values in place and the computations in execution – the outcome is fully determined. In embracing this ballistic model, however, humankind’s place in the universe seems concomitantly diminished.

Recent advances in theories of cognition have challenged the philosophical doctrines underlying computationalism. Work from TCIDA constituents and colleagues [Bishop, Nasuto, 2006; Nasuto et al, 2009] has suggested a new metaphor for cognitive processes that potentially side-steps at least some of these philosophical issues with computationalism. A key hypothesis is that ‘communication’, rather than ‘computation’, is a metaphor better suited to describing the operation of the mind.

In particular, the project demonstrated that the Stochastic Diffusion Process (SDP) model, based on a massive population of simple interacting agents who arrive collectively at a solution by co-operation and competition, is Turing complete (i.e. it can solve any problem that traditional computational systems can solve). The project specifically investigated swarm, thermodynamic and connectionist implementations of the SDP meta-heuristic, showing SDP performing search optimisation, addition, ordering, and strategic game-play (HeX). This research led to recent work in operationalising a procedure to translate any flowchart program (cf. NORMA2) into a network of linked SDP populations (demonstrating Turing completeness).
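To make the idea of agents reaching a solution by co-operation and competition concrete, the sketch below applies a basic stochastic diffusion search to a toy string-matching task. It is only an illustration of the search use of the meta-heuristic, not of the Turing-completeness construction; the function name, the parameter defaults and the string-search task itself are assumptions.

```python
import random


def stochastic_diffusion_search(text, pattern, agents=100, iterations=200, seed=0):
    """Return the start position in `text` backed by the largest agent cluster."""
    rng = random.Random(seed)
    positions = range(len(text) - len(pattern) + 1)
    # Every agent starts with a random hypothesis about where the pattern sits.
    hypotheses = [rng.choice(positions) for _ in range(agents)]
    active = [False] * agents
    for _ in range(iterations):
        # Test phase: each agent checks a single randomly chosen character.
        for i, start in enumerate(hypotheses):
            offset = rng.randrange(len(pattern))
            active[i] = text[start + offset] == pattern[offset]
        # Diffusion phase: inactive agents poll one random agent; they copy an
        # active agent's hypothesis (co-operation) or else re-sample at random.
        for i in range(agents):
            if not active[i]:
                other = rng.randrange(agents)
                if active[other]:
                    hypotheses[i] = hypotheses[other]
                else:
                    hypotheses[i] = rng.choice(positions)
    # The best-supported hypothesis is the population's collective answer.
    return max(set(hypotheses), key=hypotheses.count)
```

Each agent performs only a cheap partial test, so no single agent ever evaluates a full match; the answer emerges from the size of the cluster that forms around a hypothesis which keeps passing its tests.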

Hardware Artificial Neural Network (Meta Technology)

This work focussed on the construction of a real-time image inspection system using bespoke neural network hardware. The hardware implemented a RAM-based neural network method (developed by Igor Aleksander at Imperial College in the 1980s and early 1990s). RAM-based neural networks essentially use look-up tables to store the function computed by each neuron and hence are easily implemented in digital hardware, permitting very efficient training algorithms.
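The look-up-table idea can be sketched in a few lines. The class below is a hypothetical, minimal n-tuple discriminator written for illustration only; it is not the inspection system's design, and the random wiring, the set-based stand-in for RAM contents and the class and method names are all assumptions.

```python
import random


class RamDiscriminator:
    """Minimal weightless (n-tuple) discriminator over binary input vectors."""

    def __init__(self, input_bits, tuple_size, seed=0):
        order = list(range(input_bits))
        random.Random(seed).shuffle(order)
        # Each RAM node is wired to a fixed random tuple of input bit positions.
        self.tuples = [order[i:i + tuple_size]
                       for i in range(0, input_bits, tuple_size)]
        # A node's look-up table is modelled as the set of addresses it has seen.
        self.rams = [set() for _ in self.tuples]

    def _addresses(self, bits):
        """Read the bits each node is wired to; the bit tuple is its RAM address."""
        return [tuple(bits[i] for i in wiring) for wiring in self.tuples]

    def train(self, bits):
        """One-shot training: write a 1 at the addressed location of every RAM."""
        for ram, address in zip(self.rams, self._addresses(bits)):
            ram.add(address)

    def response(self, bits):
        """Count how many RAM nodes recognise their part of the input."""
        return sum(address in ram
                   for ram, address in zip(self.rams, self._addresses(bits)))
```

Training is a single write per node with no iterative weight adjustment, which is what makes this family of networks straightforward to realise in digital hardware.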

Artificial Neural Network (ANN) for Face Recognition (BT)

This project involved the development of a Hybrid Stochastic Network to locate eye regions in images of human faces. The solution developed – a combination of a Stochastic Search network with an n-tuple network – was demonstrated to accurately locate all eye features on which it had been trained, and achieved approximately seventy per cent success in locating eye features on which it had not been explicitly trained. One of the aims of the BT CONNEX project was the eventual production and demonstration of hardware Neural Network systems: the system developed was a hybrid of two extant neural technologies, both of which lend themselves to simple implementation in electronic hardware.

Distributional Semantics (NUFFIELD Foundation)

The project aimed at resolving misunderstandings in expert communication arising due to domain-specific language differences. It involved collecting a corpus of sample conversations and analysing them in order to design an appropriate data-representation, which was subsequently used in the construction of an electronic communication monitoring and support system.

While designing the data representation, the need arose to detect syntactic relationships between the words of the sample messages. This required a method for disambiguating sentence boundaries and a part-of-speech tagger to categorise each word by its grammatical class (e.g. noun, verb, adjective). Within the duration of the project, both the sentence boundary detector and the part-of-speech tagger were designed and implemented, and a simple statistical semantic model was deployed to resolve ambiguous word meanings.

Hyperplane estimation (Thales/EPSRC)

Robust parameter estimation has found many uses in computer vision. Indeed, one of the most powerful robust estimation algorithms, RANSAC, was developed for registering 3D to 2D point sets. RANSAC has also been used extensively for tasks such as estimation of epipolar geometry and motion model selection, and has spawned a variety of robust algorithms, all based on random sampling but minimising different criteria, e.g. LMS, MINPRAN and MLESAC.

Whilst the robust estimation problem has been largely solved in low dimensional cases by random sampling methods, less is known about generic robust estimators that will work for large numbers (>10) of parameters. This project explores one way of extending the random sampling formalism to higher dimensions (NAPSAC).
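For readers unfamiliar with the random sampling formalism, the sketch below shows plain RANSAC in its simplest low-dimensional setting, fitting a 2D line in the presence of outliers. It does not implement NAPSAC or any higher-dimensional extension; the function name, iteration count and inlier tolerance are assumptions made for illustration.

```python
import random


def ransac_line(points, iterations=200, tolerance=0.1, seed=0):
    """Fit y = m*x + c by plain RANSAC; returns (m, c, inliers)."""
    rng = random.Random(seed)
    best = (0.0, 0.0, [])
    for _ in range(iterations):
        # Hypothesise a model from a minimal sample: two points define a line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical pair cannot be written as y = m*x + c
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        # Score the hypothesis by its consensus set of points within tolerance.
        inliers = [(x, y) for x, y in points
                   if abs(y - (m * x + c)) <= tolerance]
        if len(inliers) > len(best[2]):
            best = (m, c, inliers)
    return best
```

A minimal sample here needs only two points, so an all-inlier sample is drawn quickly. As the number of parameters grows, the minimal sample grows with it and the chance of drawing it free of outliers falls sharply, which is the difficulty the paragraph above refers to.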

ANNs for ‘Real-Time’ Control (The Science Museum)

This project set out to design and build a set of robots for the Science Museum of London, each controlled by a small hardware neural network, that could learn to move around their corral in real-time. Forming the centre-piece of a major display at the Science Museum, the project deliverable had to operate reliably and with minimal human intervention, over many months. Hence the neural technology deployed – weightless neural networks – had to be extremely robust.

Deep Neural Networks for Image Colourisation (IDS)

In this work, we develop and implement an approach inspired by Zhang et al. (2016). Given a grayscale photograph as input, we generate a plausible colour version of the photograph. This problem is under-constrained and, following Zhang, we embrace the underlying uncertainty by posing the problem as a classification task, using class-rebalancing at training time to increase the diversity of colours in the resulting image. The system is implemented as a feed-forward pass in a CNN at test time and is trained using a large corpus of colour images. We evaluate our algorithm using a “colourisation Turing test”, asking human participants to choose between a generated and a ground truth colour image.
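The class-rebalancing step can be illustrated with a short sketch. It follows the weighting scheme as described by Zhang et al., in which each colour class is weighted inversely to a mixture of its empirical probability and a uniform distribution, then normalised so that the expected weight is one. The function name and the default mixing value are assumptions, and the smoothing of the empirical distribution used in the original paper is omitted here.

```python
def rebalancing_weights(class_probs, lam=0.5):
    """Per-class loss weights that boost rare colour classes.

    class_probs: empirical probability of each quantised colour class.
    lam: mixing weight between the empirical and uniform distributions
         (an assumed default for this sketch).
    """
    q = len(class_probs)
    # Inverse of the empirical distribution mixed with a uniform one.
    raw = [1.0 / ((1.0 - lam) * p + lam / q) for p in class_probs]
    # Normalise so that the expected weight under class_probs equals 1.
    expected = sum(p * w for p, w in zip(class_probs, raw))
    return [w / expected for w in raw]
```

With class probabilities of 0.7, 0.2 and 0.1, the rarest class receives the largest weight, which counteracts the tendency of the classifier to favour common, desaturated colours.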

ANNs for Colour Physics (Courtaulds)

Conventional mechanisms used for computer colourant formulation typically employ the Kubelka-Munk theory to relate reflectance values to colourant concentrations. However, there are situations where this approach is not applicable, and so an alternative is desirable. One such method is to utilise Artificial Intelligence techniques to mimic the behaviour of the professional colourist. The aim of this project was to deploy Neural Networks on the problem of colour recipe prediction.
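As background to the conventional approach, the sketch below gives the standard Kubelka-Munk relation between the reflectance R of an opaque layer and the ratio of its absorption to scattering coefficients, K/S = (1 - R)^2 / (2R), together with its inverse. The function names are hypothetical, and the sketch covers only this single relation, not a full recipe prediction system or the neural network alternative explored in the project.

```python
import math


def k_over_s(reflectance: float) -> float:
    """Kubelka-Munk K/S ratio for an opaque layer with reflectance in (0, 1]."""
    return (1.0 - reflectance) ** 2 / (2.0 * reflectance)


def reflectance_from_k_over_s(ratio: float) -> float:
    """Inverse relation: recover reflectance from a non-negative K/S ratio."""
    return 1.0 + ratio - math.sqrt(ratio * ratio + 2.0 * ratio)
```

For example, a reflectance of 0.5 gives K/S = 0.25, and applying the inverse to 0.25 returns 0.5. Conventional formulation works in K/S space because, under the usual simplifying assumptions, colourant contributions combine approximately additively there.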

Optimisation for Interplanetary Mission Planning (ESA)

To help spacecraft gain the necessary momentum for a complex mission, it is common to exploit the gravitational pull of other celestial bodies in what is commonly referred to as a gravity assist manoeuvre, a flyby or a planetary kick.

In this project, we explored a particular formalisation of an interplanetary trajectory optimisation problem that we refer to as the Multiple Gravity Assist (MGA) problem. We compared standard implementations of some known global optimisation solvers, introducing a space pruning technique that may be conveniently used to improve their performance. A deterministic search space pruning algorithm was developed, and its polynomial time and space complexity derived. The algorithm was shown to achieve search space reductions of greater than six orders of magnitude, thus significantly reducing the complexity of the subsequent optimisation.

Swarm Intelligence for an Autonomous Vehicle (EU)

The development of an autonomous wheelchair, which is able to self-localise, was one of the goals of the EEC TIDE (Technology and Innovation for the Disabled and Elderly) project named SENARIO. The overall aim of the autonomous wheelchair is to provide a device that people within a hospital or other such institution may wish to use to assist them to independently travel around within a known environment. This project explores the development and application of a novel neural network used to determine the position and orientation of an autonomous wheelchair. A brief overview of other localisation and self-localisation techniques is given as an introduction to the basic localisation problem. The (x, y, θ) space of possible locations is explored in parallel by a set of cells searching in a competitive co-operative manner for the most likely position of the wheelchair in its environment. Trials of the prototype system in a noisy environment demonstrated that the technique was both practical and robust.

Time-series Forecasting (UK National Grid)

There has been long-standing interest in the use of Artificial Neural Networks for short-term load forecasting. This project explored a novel approach, dividing the problem into two phases: first an historic load profile is used to form an initial prediction; subsequently, a simple multilayer perceptron is used to reshape the historic profile by modelling the effect of changes in environmental variables on the value of load.