My primary interests are in the area of Evolutionary Computation (EC), a family of artificial intelligence techniques that use stochastic population-based methods to find good solutions to very difficult problems. I find the idea of population-based search particularly intriguing, as it effectively allows searching to take place in parallel, while gaining further improvements and efficiencies from solutions exchanging information (e.g. using crossover techniques in Evolutionary Algorithms, or swarming behaviour in Swarm Intelligence). A brief history of my involvement in EC research is provided below -- all publications produced can be seen here.
My history in EC research dates back to late 2014, when I was fortunate enough to receive an undergraduate summer research scholarship with Prof. Mengjie Zhang and Harith Al-Sahaf. My research then focused on using Genetic Programming (GP) techniques for automatically detecting the location and quantity of algae in images of local rivers (i.e. image segmentation).
This experience heightened my interest in using GP, particularly for image analysis, and led to my honours project, which investigated using high-level image features in conjunction with GP for image classification. Until recently, GP had only been used with relatively low-level features (single pixel values, pixel means, basic Gabor filters, etc.), which in many cases limits the performance of the classifier. My honours work showed how using high-level SIFT or SURF features directly within a GP tree could improve classification performance compared to existing methods.
After my experience with using image features in GP during my honours year, I realised that I was now most interested in the use of EC techniques for performing feature reduction (selection/construction/extraction) in data mining tasks. While EC has been used fairly extensively for this purpose in supervised learning (in particular, classification tasks), there had been very little work in unsupervised learning tasks, such as clustering. I also discovered that clustering in itself is an interesting and particularly difficult problem; many criteria can be used for measuring the goodness of a clustering solution (partition), a subset of which are included below:
- Compactness of clusters: how far instances in a cluster are from the cluster centre.
- Separation of clusters: how far apart clusters are.
- Instance connectedness: to what extent instances which are close to each other lie in the same cluster.
- Cluster density: a more general case of compactness, where non-hyper-spherical clusters are acceptable if they are consistently dense.
- Number of clusters: the above four criteria are inherently related to the number of clusters produced for a given dataset; a partition with more clusters will generally have clusters that are more compact and denser, but less separated and connected.
Developing good representations and fitness functions which can consider some or all of the above when searching for good partitions is a difficult task to begin with -- when feature reduction is also performed, the problem gains further, competing objectives. For example, consider the relationship between the number of features and the number of clusters: the more features available, the more discriminative power the clustering algorithm will have, and hence the more clusters that will be produced on average. In this regard, prioritising the selection of a small number of features will implicitly encourage a smaller number of clusters, in the same way that separation or connectedness would.
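To make the criteria listed above a little more concrete, here is a minimal Python sketch of three of them. The function names and the specific definitions (mean distance to the centroid for compactness, minimum centroid-to-centroid distance for separation, nearest-neighbour agreement for connectedness) are illustrative assumptions, chosen for simplicity; they are not the fitness functions used in my own work, and many alternative formulations exist.

```python
import math
from collections import defaultdict


def centroids(points, labels):
    """Mean position of the instances assigned to each cluster label."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    return {lab: tuple(sum(c) / len(ps) for c in zip(*ps))
            for lab, ps in groups.items()}


def compactness(points, labels):
    """Mean distance from each instance to its own cluster centre (lower is better)."""
    cents = centroids(points, labels)
    total = sum(math.dist(p, cents[lab]) for p, lab in zip(points, labels))
    return total / len(points)


def separation(points, labels):
    """Smallest distance between two cluster centres (higher is better).

    Assumes the partition contains at least two clusters.
    """
    cents = list(centroids(points, labels).values())
    return min(math.dist(a, b)
               for i, a in enumerate(cents) for b in cents[i + 1:])


def connectedness(points, labels):
    """Fraction of instances whose nearest neighbour shares their cluster (higher is better)."""
    hits = 0
    for i, p in enumerate(points):
        others = (k for k in range(len(points)) if k != i)
        nearest = min(others, key=lambda k: math.dist(p, points[k]))
        hits += labels[i] == labels[nearest]
    return hits / len(points)


# Hypothetical toy data: two well-separated pairs of points.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = [0, 0, 1, 1]  # each pair forms its own cluster
bad = [0, 1, 0, 1]   # each pair is split across the two clusters

print(compactness(points, good), separation(points, good), connectedness(points, good))
print(compactness(points, bad), separation(points, bad), connectedness(points, bad))
```

On this toy data the "good" partition scores better than the "bad" one on all three measures, but on real datasets the criteria frequently disagree, which is why a single fitness function struggles to capture all of them at once.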
My PhD research is still very much in progress, but thus far it has primarily focused on new representations, fitness functions, and initialisation methods for performing simultaneous clustering and feature selection/construction, both when the number of clusters is known in advance and when it is not, using Particle Swarm Optimisation (PSO) for selection and GP for construction.