Browsing by Author "Deka, Lipika"
Now showing 1 - 20 of 32
A preliminary study on crop classification with unsupervised algorithms for time series on images with olive trees and cereal crops (Springer, 2020-08-29) [Metadata only]
Authors: Rivera, Antonio Jesus; Perez-Godoy, Maria Dolores; Elizondo, David; Deka, Lipika; del Jesus, Maria Jose

An Optimised BERT Pretraining Approach for Identification of Targeted Offensive Language: Data Imbalance and Potential Solutions (IEEE, 2023-05-22) [Metadata only]
Authors: Mifsud, Ruth; Deka, Lipika; Lahiri, Indrani
Targeted offensive comments and hate speech on online media platforms are on the rise, with evidential mental health consequences including suicide. Several NLP techniques have been proposed and are in use; however, data imbalance in the training dataset stops them from performing at full potential. Solutions include under-sampling the majority class, over-sampling the minority class, or introducing synthetic samples. Each of these approaches presents its own problems: critical information loss, overfitting, and non-generalised models. The presented research explores these approaches to the data imbalance problem by varying the under-/over-/synthetic-sampling rate and studying the performance as well as the generalisability of the resulting models.

Analysis of clustering methods for crop type mapping using satellite imagery (Elsevier, 2022-04-06) [Metadata only]
Authors: Rivera, Antonio J.; Perez-Godoy, Maria Dolores; Deka, Lipika; del Jesus, Maria J.; Elizondo, David
With the current challenges of population growth and scarcity of food, new technologies are emerging. Remote sensing in general, and satellite imagery more specifically, are among these technologies and can help provide accurate monitoring and classification of cultivars. Part of the increase in the use of these technologies is due to the ongoing improvement in spatial-temporal resolution, together with the free availability of some of these services.
Typically, time series are used as a pre-processing technique and combined with supervised learning techniques in order to build models for crop type identification in remote images. However, these models suffer from the lack of labelled datasets needed to train them. Unsupervised classification can overcome this limitation but has been less frequently used in this research field. This paper proposes to test and analyse the performance of several unsupervised clustering algorithms for crop type identification on remote images. To this end, combinations of clustering algorithms and distance measures (a key element in the behaviour of these algorithms) are studied using an experimental design with more than twenty datasets built from combinations of five crops and more than 45,000 parcels. The results highlight the better-performing clustering methods and distance measures for creating accurate and novel crop mapping models from remote sensing images.

Applications of Machine Learning (ML) and Mathematical Modeling (MM) in Healthcare with Special Focus on Cancer Prognosis and Anticancer Therapy: Current Status and Challenges (MDPI, 2024-02-09) [Open Access]
Authors: Hasan, Jasmin; Mohammed Saeed, Safiya; Deka, Lipika; Uddin, Md Jasim; Das, Diganta B.
The use of data-driven high-throughput analytical techniques, which has given rise to computational oncology, is undisputed. The widespread use of machine learning (ML) and mathematical modeling (MM)-based techniques is widely acknowledged. These two approaches have fuelled advances in cancer research and eventually led to the uptake of telemedicine in cancer care. For diagnostic, prognostic, and treatment purposes across different types of cancer research, vast databases of varied information with manifold dimensions are required, and all this information can only be managed by an automated system developed using ML and MM.
In addition, MM is being used to probe the relationship between the pharmacokinetics and pharmacodynamics (PK/PD interactions) of anti-cancer substances in order to improve cancer treatment, and to refine the quality of existing treatment models by being incorporated at all steps of research and development related to cancer and in routine patient care. This review consolidates the advances and benefits of ML and MM techniques, with a special focus on cancer prognosis and anticancer therapy, leading to the identification of challenges (data quantity, ethical considerations, and data privacy) that are yet to be fully addressed in current studies.

Are Public Intrusion Datasets Fit for Purpose: Characterising the State of the Art in Intrusion Event Datasets (Elsevier, 2020-09-08) [Open Access]
Authors: Kenyon, Anthony; Deka, Lipika; Elizondo, David
In recent years, cybersecurity attacks have caused major disruption and information loss for online organisations, with high-profile incidents in the news. One of the key challenges in advancing the state of the art in intrusion detection is the lack of representative datasets. These datasets typically contain millions of time-ordered events (e.g., network packet traces, flow summaries, log entries), subsequently analysed to identify abnormal behavior and specific attacks [1]. Generating realistic datasets has historically required expensive networked assets, specialised traffic generators, and considerable design preparation. Even with advances in virtualisation, it remains challenging to create and maintain a representative environment. Major improvements are needed in the design, quality, and availability of datasets to assist researchers in developing advanced detection techniques. With the emergence of new technology paradigms, such as intelligent transport and autonomous vehicles, it is also likely that new classes of threat will emerge [2].
Given the rate of change in threat behavior [3], datasets quickly become obsolete, and some of the most widely cited datasets date back over two decades. Older datasets have limited value: they are often heavily filtered and anonymised, with unrealistic event distributions and opaque design methodology. The relative scarcity of Intrusion Detection System (IDS) datasets is compounded by the lack of a central registry and inconsistent information on provenance. Researchers may also find it hard to locate datasets or understand their relative merits. In addition, many datasets rely on simulation, originating from academic or government institutions. The publication process itself often creates conflicts, with the need to de-identify sensitive information in order to meet regulations such as the General Data Protection Regulation (GDPR) [4]. A final issue for researchers is the lack of standardised metrics with which to compare dataset quality. In this paper, we attempt to classify the most widely used public intrusion datasets, providing references to archives and associated literature. We illustrate their relative utility and scope, highlighting threat composition, formats, special features, and associated limitations. We identify best practice in dataset design, and describe the potential pitfalls of designing anomaly detection techniques based on data that may be either inappropriate or compromised due to unrealistic threat coverage.
The contributions made in this paper are expected to facilitate continued research and development towards effectively combating the constantly evolving cyber threat landscape.

Artificial neural network to determine dynamic effect in capillary pressure relationship for two-phase flow in porous media with micro-heterogeneities (Springer, 2014-11-08) [Open Access]
Authors: Das, Diganta; Thirakulchaya, Thanit; Deka, Lipika; Hanspal, Navra
An artificial neural network (ANN) is presented for computing a parameter of dynamic two-phase flow in porous media with water as the wetting phase, namely the dynamic coefficient (τ), by considering micro-heterogeneity in porous media as a key parameter. τ quantifies the dependence of the time derivative of water saturation on the capillary pressures and indicates the rate at which a two-phase flow system may reach flow equilibrium; it is therefore of importance in the study of dynamic two-phase flow in porous media. An attempt has been made in this work to reduce computational and experimental effort by developing and applying an ANN which can predict the dynamic coefficient by "learning" from available data. The data employed for testing and training the ANN were obtained from computational flow-physics-based studies. Six input parameters were used for the training, performance testing, and validation of the ANN: water saturation, intensity of heterogeneity, average permeability depending on this intensity, fluid density ratio, fluid viscosity ratio, and temperature.
It is found that a 15-neuron, single-hidden-layer ANN can characterise the relationship between media heterogeneity and the dynamic coefficient, and it ensures a reliable prediction of the dynamic coefficient as a function of water saturation.

Artificial Neural Networks, Sequence-to-Sequence LSTMs, and Exogenous Variables as Analytical Tools for NO2 (Air Pollution) Forecasting: A Case Study in the Bay of Algeciras (Spain) (MDPI, 2021-03-04) [Embargo]
Authors: Gonzalez-Enrique, Javier; Ruiz-Aguilar, Juan Jesus; Moscoso-Lopez, Jose Antonio; Urda, Daniel; Deka, Lipika; Turias, Ignacio J.
This study aims to produce accurate predictions of NO2 concentrations at a specific station of a monitoring network located in the Bay of Algeciras (Spain). Artificial neural networks (ANNs) and sequence-to-sequence long short-term memory networks (LSTMs) were used to create the forecasting models. Additionally, a new prediction method was proposed, combining LSTMs using a rolling-window scheme with a cross-validation procedure for time series (LSTM-CVT). Two different strategies were followed regarding the input variables: using NO2 data from the station alone, or employing NO2 and other pollutant data from any station of the network plus meteorological variables. The ANN and LSTM-CVT exogenous models used lagged datasets of different window sizes. Several feature-ranking methods were used to select the top lagged variables and include them in the final exogenous datasets. Prediction horizons of t + 1, t + 4, and t + 8 were employed. The inclusion of exogenous variables enhanced the models' performance, especially for t + 4 (ρ ≈ 0.68 to ρ ≈ 0.74) and t + 8 (ρ ≈ 0.59 to ρ ≈ 0.66). The proposed LSTM-CVT method delivered promising results, as the best-performing models per prediction horizon employed this new methodology.
Additionally, for each parameter combination, it obtained lower error values than the ANNs in 85% of the cases.

Characterising Payload Entropy in Packet Flows (2024-04-29) [Metadata only]
Authors: Kenyon, Anthony; Deka, Lipika; Elizondo, David
Accurate and timely detection of cyber threats is critical to keeping our online economy and data safe. A key technique in early detection is the classification of unusual patterns of network behaviour, often hidden as low-frequency events within complex time-series packet flows. One of the ways in which such anomalies can be detected is to analyse the information entropy of the payload within individual packets, since changes in entropy can often indicate suspicious activity, such as whether session encryption has been compromised, or whether a plaintext channel has been co-opted as a covert channel. To decide whether activity is anomalous, we need to compare real-time entropy values with baseline values, and while the analysis of entropy in packet data is not particularly new, to the best of our knowledge there are no published baselines for payload entropy across common network services. We offer two contributions: (1) we analyse several large packet datasets to establish baseline payload information entropy values for common network services; (2) we describe an efficient method for engineering entropy metrics when performing flow recovery from live or offline packet data, which can be expressed within feature subsets for subsequent analysis and machine learning applications.

Characterising Payload Entropy in Packet Flows—Baseline Entropy Analysis for Network Anomaly Detection (MDPI, 2024-12-16) [Metadata only]
Authors: Kenyon, Anthony; Deka, Lipika; Elizondo, David
The accurate and timely detection of cyber threats is critical to keeping our online economy and data safe. A key technique in early detection is the classification of unusual patterns of network behaviour, often hidden as low-frequency events within complex time-series packet flows.
One of the ways in which such anomalies can be detected is to analyse the information entropy of the payload within individual packets, since changes in entropy can often indicate suspicious activity, such as whether session encryption has been compromised, or whether a plaintext channel has been co-opted as a covert channel. To decide whether activity is anomalous, we need to compare real-time entropy values with baseline values, and while the analysis of entropy in packet data is not particularly new, to the best of our knowledge there are no published baselines for payload entropy across commonly used network services. We offer two contributions: (1) we analyse several large packet datasets to establish baseline payload information entropy values for standard network services, and (2) we present an efficient method for engineering entropy metrics from packet flows from real-time and offline packet data. Such entropy metrics can be included within feature subsets, thus making the feature set richer for subsequent analysis and machine learning applications.

Consistent Online Backup in Transactional File Systems (IEEE, 2014-11) [Open Access]
Authors: Deka, Lipika; Barua, Gautam
The backup taken of a file system must be consistent, preserving data integrity across files in the file system. With file system sizes getting very large, and with demand for continuous access to data, backup has to be taken while the file system is active (online). An arbitrarily taken online backup may result in an inconsistent backup copy. We propose a scheme referred to as mutual serializability to take a consistent backup of an active file system, assuming that the file system supports transactions. The scheme extends the set of conflicting operations to include read-read conflicts, and it is shown that if the backup transaction is mutually serializable with every other transaction individually, a consistent backup copy is obtained.
The user transactions continue to serialize among themselves using a standard concurrency control protocol such as Strict 2PL. We place our scheme in a formal framework to prove its correctness, and both the formalisation and the correctness proof are independent of the concurrency control protocol used to serialize user transactions. The scheme has been implemented, and experiments show that consistent online backup is possible with reasonable overhead.

Container Based Electronic Control Unit Virtualisation: A Paradigm Shift Towards a Centralised Automotive E/E Architecture (MDPI, 2024-10-31) [Open Access]
Authors: Ayre, Nicholas; Deka, Lipika; Paluszczyszyn, Daniel
The past 40 years have seen automotive Electronic Control Units (ECUs) move from being purely mechanically controlled to being primarily digitally controlled. While the safety of passengers and the efficiency of vehicles have seen significant improvements, rising ECU numbers have resulted in increased vehicle weight, greater demands on power, more complex hardware and software, ad hoc methods for updating software, and subsequent increases in costs for both vehicle manufacturers and consumers. To address these issues, the research presented in this paper proposes that virtualisation technologies be applied within the automotive electrical/electronic (E/E) architecture. The proposed approach is evaluated through a comprehensive study of the CPU and memory resource requirements needed to support container-based ECU automotive functions. This performance evaluation reveals that lightweight container virtualisation has the potential to bring about a paradigm shift in E/E architecture, promoting consolidation and enhancing the architecture by facilitating power, weight, and cost savings.
Container-based virtualisation will also enable efficient and robust online dynamic software updates throughout a vehicle's lifetime.

Continuous Automotive Software Updates through Container Image Layers (MDPI, 2021-03-20) [Open Access]
Authors: Ayres, Nicholas; Deka, Lipika; Paluszczyszyn, Daniel
The vehicle-embedded system, also known as the electronic control unit (ECU), has transformed the humble motorcar, making it more efficient, environmentally friendly, and safer, but has led to a system which is highly dependent on software. As new technologies and features are included with each new vehicle model, this increased reliance on software will no doubt continue. It is an undeniable fact that all software contains bugs, errors, and potential vulnerabilities, which when discovered must be addressed in a timely manner, primarily through patching and updates, to preserve vehicle and occupant safety and integrity. However, current automotive software updating practices are ad hoc at best and often follow the same inefficient fix mechanisms associated with a physical component failure: return or recall. Increasing vehicle connectivity heralds the potential for over-the-air (OtA) software updates, but rigid ECU hardware design does not often facilitate or enable OtA updating. To address the associated issues regarding automotive ECU-based software updates, a new approach to how automotive software is deployed to the ECU is required. This paper presents how lightweight virtualisation technologies known as containers can promote efficient automotive ECU software updates. ECU functional software can be deployed to a container built from an associated image. Container images promote efficiency in download size and times through layer sharing, similar to ECU difference or delta flashing.
Through containers, connectivity and future OtA software updates can be completed without inconvenience to the consumer or expense to the manufacturer.

Critical infrastructure risk in NHS England: predicting the impact of building portfolio age (Taylor and Francis, 2015-06-19) [Embargo]
Authors: Mills, Grant; Deka, Lipika; Price, Andrew; Rich-Mahadkar, Sameedha; Pantzartzis, Efthimia; Sellars, Peter
NHS Trusts in England must adopt appropriate levels of continued investment in routine and backlog maintenance if they are to ensure critical backlog does not accumulate. This paper presents the current state of critical backlog maintenance within the National Health Service (NHS) in England through statistical analyses of 115 Acute NHS Trusts. It aims to find empirical support for a causal relationship between building portfolio age and year-on-year increases in critical backlog, and makes recommendations for the use of building portfolio age in strategic asset management. The current trend across this sample of NHS Trusts may be typical of the whole NHS built asset portfolio, and suggests that most Trusts need to invest between 0.5 and 1.5 per cent of income (depending upon current critical backlog levels and Trust age profile) simply to maintain critical backlog at current levels. More robust analytics for building age, condition, and risk-adjusted backlog maintenance are required.

Development and Performance Evaluation of a Connected Vehicle Application Development Platform (CVDeP) (Transportation Research Board, 2020-01-12) [Open Access]
Authors: Islam, Mhafuzul; Rahman, Mizanur; Khan, Sakib Mahmud; Chowdhury, Mashrur; Deka, Lipika
Connected vehicle (CV) application developers need a development platform to build, test, and debug real-world CV applications, such as safety, mobility, and environmental applications, in edge-centric cyber-physical systems.
Our study objective is to develop and evaluate a scalable and secure CV application development platform (CVDeP) that enables application developers to build, test, and debug CV applications in real time. CVDeP ensures that the functional requirements of the CV applications meet the corresponding requirements imposed by the specific applications. We evaluated the efficacy of CVDeP using two CV applications (one safety and one mobility application) and validated them through a field experiment at the Clemson University Connected Vehicle Testbed (CU-CVT). The analyses demonstrate the efficacy of CVDeP, which satisfies the functional requirements (i.e., latency and throughput) of a CV application while maintaining the scalability and security of the platform and applications.

Dynamics of Dialogue, Humanity, and Peace in the Metaverse: Towards a Conceptual Debate (IGI Global, 2022-12) [Embargo]
Authors: Lahiri, Indrani; Deka, Lipika; Chakraborty, Nandini; Lakhanpaul, Monica; Pattni, Kamla; Punwani, Anita
The fundamental aspects of living a fulfilled life have become at odds with progress. This chapter considers the key components of humanity, thereby addressing the fundamental question of whether the metaverse is going to alter our understanding of humanity, peace, and collegiality. The chapter aims to trigger a conceptual debate by applying a digital anthropological framework to examine how human beings have been socially conditioned to think, perceive, and anticipate. In doing so, the authors argue that the social construction of fear of the unknown prevents humans from critically analysing our relationship with humanity and technology.
Additionally, the chapter aims to initiate discussion around (1) the metaverse and what it means in a human world; (2) whether virtually modified human beings will recreate or alter humanity; and (3) what peace would look like in the metaverse.

Estimation of Travel Times for Minor Roads in Urban Areas Using Sparse Travel Time Data (IEEE, 2020-01-30) [Open Access]
Authors: Vu, Luong H.; Passow, Benjamin N.; Paluszczyszyn, D.; Deka, Lipika; Goodyer, E. N.

Fuzzy Logic Applied to System Monitors (IEEE, 2021-04-09) [Metadata only]
Authors: Khan, Noel; Elizondo, David; Deka, Lipika; Molina-Cabello, M. A.
System monitors are applications used to monitor other systems (often mission-critical) and take corrective actions upon a system failure. Rather than reactively taking action after a failure, the potential of fuzzy logic to anticipate failures and proactively take corrective actions is explored here. Failures adversely affect a system's non-functional qualities (e.g., availability, reliability, and usability) and may result in a variety of losses, such as data, productivity, or safety losses. The detection and prevention of failures therefore improves a critical system's non-functional qualities and avoids losses. The paper is self-contained: it reviews set and logic theory and fuzzy inference systems (FIS), explores parameterisation, and tests the neighbourhood of rule thresholds to evaluate the potential for anticipating failures.
The results demonstrate detectable gradients in FIS state spaces, which means fuzzy-logic-based system monitors can anticipate rule violations or system failures.

Healthcare Facility Coverage for Malaria and Sickle Cell Disease Treatment: A Spatial Analysis of Ikorodu Local Government Area of Lagos State (Common Ground, 2019-10-15) [Embargo]
Authors: Olowofoyeku, Olukemi; Shell, Jethro; Goodyer, Eric A.; Deka, Lipika
The escalating population growth in Nigeria calls for urgent attention to malaria control and the provision of accessible public health care for treatment of the disease (appropriate malaria treatment and intervention can, in turn, bring a reduction in sickle cell disease (SCD) crises). Malaria is a major cause of visits to healthcare facilities, which is amplified by the interaction of malaria with SCD. Access to treatment is a basic need of the population of a country; however, in Nigeria, access to health care is generally poor. Healthcare facilities are sparsely distributed and services are inadequate to meet the health needs of the whole population. This article discusses malaria and SCD prevalence in Nigeria and analyses the spatial distribution of primary healthcare facilities in the Ikorodu Local Government Area of Lagos State, Nigeria, using a Geographic Information System (GIS). The analysis is based on existing facility locations in relation to 15 and 30 minutes' walking time within 1-km and 2-km catchment radii, respectively. The results show primary health centre (PHC) facility coverage of 48 percent for the 2-km catchment radius and 15 percent for the 1-km catchment. Based on this analysis, the article argues that there is a need to increase the number of treatment facilities, optimally located to reduce travel distance and expand facility coverage.
This will reduce mortality and morbidity rates due to these diseases.

Heavy Duty Vehicle Fuel Consumption Modelling Using Artificial Neural Networks (IEEE, 2019-09-05) [Open Access]
Authors: Wysocki, Oskar; Deka, Lipika; Elizondo, David
In this paper, an artificial neural network (ANN) approach to modelling the fuel consumption of heavy duty vehicles is presented. The proposed method uses easily accessible data collected via the CAN bus of the truck. As a benchmark, a conventional method based on a polynomial regression model is used. Fuel consumption is measured in two different tests, performed using a unique test bench to apply load to the engine. First, a transient-state test was performed in order to evaluate the polynomial regression model and 25 ANN models with different parameters; based on the results, the best ANN model was chosen. A validation test was then conducted using real duty-cycle loads for model comparison. The neural network model outperformed the conventional method and represents the fuel consumption of the engine operating in transient states significantly better. The presented method can be applied to reduce fuel consumption in utility vehicles by delivering an accurate fuel economy model of truck engines, in particular in the low engine speed and torque range.

Improved Flow Recovery from Packet Data (arXiv, 2023) [Metadata only]
Authors: Kenyon, Anthony; Elizondo, David; Deka, Lipika
Typical event datasets, such as those used in network intrusion detection, comprise hundreds of thousands, sometimes millions, of discrete packet events. These datasets tend to be high-dimensional, stateful, and time-series in nature, holding complex local and temporal feature associations. Packet data can be abstracted into lower-dimensional summary data, such as packet flow records, where some of the temporal complexities of packet data can be mitigated and smaller, well-engineered feature subsets can be created.
This data can be invaluable as training data for machine learning and cyber threat detection techniques. Data can be collected in real-time, or from historical packet trace archives. In this paper we focus on how flow records and summary metadata can be extracted from packet data with high accuracy and robustness. We identify limitations in current methods, how they may impact datasets, and how these flaws may impact learning models. Finally, we propose methods to improve the state of the art and introduce proof of concept tools to support this work.
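Several of the items above (the two payload-entropy papers and the flow-recovery paper) rest on measuring the Shannon entropy of packet payloads and comparing it against per-service baselines. As a rough illustration of the underlying measure only (this is not the authors' tooling, and the sample payloads below are hypothetical), the per-packet computation can be sketched as:

```python
import math
from collections import Counter

def payload_entropy(payload: bytes) -> float:
    """Shannon entropy of a byte sequence, in bits per byte (range 0.0-8.0).

    Values approaching 8 suggest encrypted or compressed content;
    low values are typical of plaintext protocol payloads.
    """
    if not payload:
        return 0.0
    n = len(payload)
    counts = Counter(payload)  # frequency of each byte value
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical example payloads:
plaintext = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
uniform = bytes(range(256))  # every byte value once: maximal entropy

print(payload_entropy(plaintext))  # low: repetitive ASCII text
print(payload_entropy(uniform))    # 8.0: uniform byte distribution
```

In a flow-recovery setting such as the one the papers describe, a value like this would be computed per packet (or aggregated per flow) and compared against a baseline for the service on that port to flag anomalous channels.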