Browsing by Author "Kuhn, Stefan"
Item Open Access
Applying NMR compound identification using NMRfilter to match predicted to experimental data (Springer, 2020-11-21)
Kuhn, Stefan; Colreavy-Donnelly, Simon; de Andrade Silva Quaresma, Lucas Eliseu; de Andrade Silva Quaresma, Ezequiel; Moreira Borges, Ricardo
Introduction: Metabolomics is the approach of choice to guide the understanding of biological systems and their molecular intricacies, but compound identification is still a bottleneck to be overcome. Objective: To assay the use of NMRfilter for confident compound identification based on chemical shift predictions for different datasets. Results: We found comparable results using the lead tool COLMAR and NMRfilter. Then, we successfully assayed the use of HMBC to add confidence to the identified compounds. Conclusions: NMRfilter is currently under development to become a stand-alone interactive software for high-confidence NMR compound identification and this communication gathers part of its application capabilities.

Item Embargo
The C6H6 NMR repository: an integral solution to control the flow of your data from the magnet to the public (Wiley, 2017-10-05)
Patiny, Luc; Zasso, Michael; Kostro, Daniel; Bernal, Andres; Castillo, Andres M.; Bolaños, Alejandro; Asencio, Miguel A.; Pellet, Norman; Todd, Matthew; Schloerer, Nils; Kuhn, Stefan; Holmes, Elaine; Javor, Sacha; Wist, Julian
NMR is a mature technique that is well established and adopted in a wide range of research facilities from laboratories to hospitals. This accounts for large amounts of valuable experimental data that may be readily exported into a standard and open format. Yet the publication of these data faces an important issue: Raw data are not made available; instead, the information is slimmed down into a string of characters (the list of peaks). Although historical limitations of technology explain this practice, it is not acceptable in the era of the Internet.
The idea of modernizing the strategy for sharing NMR data is not new, and some repositories exist, but sharing raw data is still not an established practice. Here, we present a powerful toolbox built on recent technologies that runs inside the browser and provides a means to store, share, analyse, and interact with original NMR data. Stored spectra can be streamlined into the publication pipeline, to improve the revision process for instance. The set of tools is still basic but is intended to be extended. The project is open source under the Massachusetts Institute of Technology (MIT) licence.

Item Open Access
The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching (Springer, 2017-06-06)
Willighagen, Egon L.; Mayfield, John W.; Alvarsson, Jonathan; Berg, Arvid; Carlsson, Lars; Jeliazkova, Nina; Kuhn, Stefan; Pluskal, Tomas; Rojas-Cherto, Miquel; Spjuth, Ola; Torrence, Gilleain; Evelo, Chris T.; Guha, Rajarshi; Steinbeck, Christoph
Background: The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, providing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminformatics algorithms ranging from chemical structure canonicalization to molecular descriptor calculations and pharmacophore perception. It is used in drug discovery, metabolomics, and toxicology. Over the last 10 years, the code base has grown significantly, however, resulting in many complex interdependencies among components and poor performance of many algorithms. Results: We report improvements to the CDK v2.0 since the v1.2 release series, specifically addressing the increased functional complexity and poor performance.
We first summarize the addition of new functionality, such as atom typing and molecular formula handling, and improvement to existing functionality that has led to significantly better performance for substructure searching, molecular fingerprints, and rendering of molecules. Second, we outline how the CDK has evolved with respect to quality control and the approaches we have adopted to ensure stability, including a code review mechanism. Conclusions: This paper highlights our continued efforts to provide a community driven, open source cheminformatics library, and shows that such collaborative projects can thrive over extended periods of time, resulting in a high-quality and performant library. By taking advantage of community support and contributions, we show that an open source cheminformatics project can act as a peer reviewed publishing platform for scientific computing software.

Item Open Access
Computational Simulation of 1H NMR Profiles of Complex Biofluid Analyte Mixtures at Differential Operating Frequencies: Applications to Low-Field Benchtop Spectra (Wiley, 2021-11-30)
Edgar, Mark; Kuhn, Stefan; Page, Georgina; Grootveld, Martin
Estimations of accurate and reliable NMR chemical shift values, coupling patterns and constants within a reasonable timeframe remain significantly challenging, and the unavailability of reliable software strategies for the prediction of low-field (e.g., 60 MHz) spectra from those acquired at higher operating frequencies hampers their direct comparison. Hence, this study explored the applications of accessible software options for predicting these parameters in the 1H NMR profiles of analytes as a function of magnetic field strength; this was performed for individual analytes and also for complex biofluid matrices featured in metabolomics investigations.
For this purpose, results from the very first successful experimental acquisition and simulation of the 1H NMR profiles of intact human salivary supernatant samples on a 60 MHz benchtop spectrometer were evaluated. Using salivary metabolite concentrations determined at 400 MHz, it was demonstrated that simulation of the low-field spectra of five biomolecules with the most prominent 1H resonances detectable allowed multiple component fits to be applied to experimental spectra. Hence, these salivary 1H NMR profiles could be successfully predicted throughout the 45–600 MHz operating frequency range. With the exception of propionate resonance multiplets, which revealed more complex coupling patterns at low field and required more astute computational and fitting options, valuable quantitative metabolomics data on salivary acetate, formate, methanol and glycine could be attained from low-field spectrometers. These studies are both timely and pertinent in view of the recent advancement of low-field benchtop NMR facilities for diagnostically significant biomarker tracking in biofluids. Experiments performed with added ammonium chloride to facilitate the release of salivary metabolites from biopolymer binding sites provided evidence that a small but nevertheless significant proportion of propionate, but not lactate, was bound to such sites, an observation of much relevance to biomolecule quantification in salivary metabolomics investigations.

Item Open Access
Data format standards in analytical chemistry (Walter De Gruyter, 2022-07-11)
Rauh, David; Blankenburg, Claudia; Fischer, Tillmann G.; Jung, Nicole; Kuhn, Stefan; Schatzschneider, Ulrich; Schulze, Tobias; Neumann, Stefan
Research data is an essential part of research and almost every publication in chemistry. The data itself can be valuable for reuse if sustainably deposited, annotated and archived.
Thus, it is important to publish data following the FAIR principles, to make it findable, accessible, interoperable and reusable not only for humans but also in machine-readable form. This also improves transparency and reproducibility of research findings and fosters analytical work with scientific data to generate new insights that are only accessible with manifold and diverse datasets. Research data requires complete and informative metadata and use of open data formats to obtain interoperable data. Generic data formats like AnIML and JCAMP-DX have been used for many applications. Special formats for some analytical methods are already accepted, like mzML for mass spectrometry or nmrML and NMReDATA for NMR spectroscopy data. Other methods still lack common standards for data. Only a joint effort of chemists, instrument and software vendors, publishers and infrastructure maintainers can make sure that the analytical data will be of value in the future. In this review, we describe existing data formats in analytical chemistry and introduce guidelines for the development and use of standardized and open data formats.

Item Open Access
A data-oriented approach to making new molecules as a student experiment: AI-enabling FAIR publication of NMR data for organic esters (Wiley, 2021-08-09)
Rzepa, Henry S.; Kuhn, Stefan
The lack of machine-readable data is a major obstacle in the application of NMR in artificial intelligence. As a way to overcome this, a procedure for capturing primary NMR spectroscopic instrumental data annotated with rich metadata and publication in a FAIR data repository is described as part of an undergraduate student laboratory experiment in a chemistry department. This couples the techniques of chemical synthesis of a never-before-made organic ester with illustration of modern data management practices and serves to raise student awareness of how FAIR data might improve research quality and replicability.
Searches of the registered metadata are shown which enable actionable Finding and Accessing of such data. The potential for Re-use of the data in AI-applications is discussed.

Item Embargo
Dataset Size and Machine Learning - Open NMR Databases as a Case Study (IEEE, 2022-08-10)
Kuhn, Stefan; Borges, Ricardo Moreira; Venturini, Francisco; Sansotera, Maurizio
The amount of data needed for training machine learning methods is an open question. Here, we use a problem from chemistry for examining this question. The problem is a special case of a graph data analysis. It can be tackled inter alia by using graph convolutional networks. We show that newer methods can provide good results, but need large amounts of data, which are not always available. In some cases, older methods may be preferable for low amounts of data. In the longer term, open databases can help with this problem.

Item Open Access
Evolving Deep Learning Convolutional Neural Networks for early COVID-19 detection in chest X-ray images (MDPI, 2021-04-28)
Khishe, Mohammad; Caraffini, Fabio; Kuhn, Stefan
This article proposes a framework that automatically designs classifiers for the early detection of COVID-19 from chest X-ray images. To do this, our approach repeatedly makes use of a heuristic for optimisation to efficiently find the best combination of the hyperparameters of a convolutional deep learning model. The framework starts by optimising a basic convolutional neural network and then adds further layers that require further optimisation. After each optimisation round, the network gets deeper, is trained with relevant COVID-19 chest X-ray images, and is assessed. This iterative process ends when no improvement, in terms of accuracy, is recorded. Hence, the proposed method evolves the best-performing network with the minimum number of convolutional layers.
In this light, we simultaneously achieve high accuracy while minimising the presence of redundant layers to guarantee a fast but reliable model. Our results show that the proposed implementation of such a framework achieves accuracy up to 99.11%, thus being particularly suitable for the early detection of COVID-19.

Item Embargo
Facilitating quality control for spectra assignments of small organic molecules: nmrshiftdb2 – a free in-house NMR database with integrated LIMS for academic service laboratories (Wiley, 2015-05-21)
Kuhn, Stefan; Schloerer, N.E.
nmrshiftdb2 supports with its laboratory information management system the integration of an electronic lab administration and management into academic NMR facilities. Also, it offers the setup of a local database, while full access to nmrshiftdb2’s World Wide Web database is granted. This freely available system allows on the one hand the submission of orders for measurement, transfers recorded data automatically or manually, and enables download of spectra via web interface, as well as the integrated access to prediction, search, and assignment tools of the NMR database for lab users. On the other hand, for the staff and lab administration, flow of all orders can be supervised; administrative tools also include user and hardware management, a statistics functionality for accounting purposes, and a QuickCheck function for assignment control, to facilitate quality control of assignments submitted to the (local) database.
Laboratory information management system and database are based on a web interface as front end and are therefore independent of the operating system in use.

Item Open Access
Identifying Parkinson’s Disease Through the Classification of Audio Recording Data (IEEE, 2020-07)
Bielby, James; Kuhn, Stefan; Colreavy-Donnelly, S.; Caraffini, Fabio; O'Connor, S.; Anastassi, Zacharias
Developments in artificial intelligence can be leveraged to support the diagnosis of degenerative disorders, such as epilepsy and Parkinson’s disease. This study aims to provide a software solution, focused initially towards Parkinson’s disease, which can positively impact medical practice surrounding degenerative diagnoses. Through the use of a dataset containing numerical data representing acoustic features extracted from an audio recording of an individual, it is determined if a neural approach can provide an improvement over previous results in the area. This is achieved through the implementation of a feedforward neural network and a layer recurrent neural network. By comparison with the state-of-the-art, a Bayesian approach providing a classification accuracy benchmark of 87.1%, it is found that the implemented neural networks are capable of average accuracy of 96%, highlighting improved accuracy for the classification process. The solution is capable of supporting the diagnosis of Parkinson’s disease in an advisory capacity and is envisioned to inform the process of referral through general practice.

Item Open Access
An integrated approach for mixture analysis using MS and NMR techniques (Royal Society of Chemistry, 2019-02-08)
Kuhn, Stefan; Colreavy-Donnelly, S.; Santana De Souza, J.; Moreira Borges, R.
We suggest an improved software pipeline for mixture analysis. The improvements include combining tandem MS and 2D NMR data for a reliable identification of its constituents in an algorithm based on network analysis aiming for a robust and reliable identification routine.
An important part of this pipeline is the use of open-data repositories, although it is not totally reliant on them. The NMR identification step emphasizes robustness and is less sensitive towards changes in data acquisition and processing than existing methods. The process starts with a LC-ESI-MSMS based molecular network dereplication using data from the GNPS collaborative collection. We identify closely related structures by propagating structure elucidation through edges in the network. Those identified compounds are added on top of a candidate list for the following NMR filtering method that predicts HSQC and HMBC NMR data. The similarity of the predicted spectra of the set of closely related structures to the measured spectra of the mixture sample is taken as one indication of the most likely candidates for its compounds. The other indication is the match of the spectra to clusters built by a network analysis from the spectra of the mixture. The sensitivity gap between NMR and MS is anticipated and it will be reflected naturally by the eventual identification of fewer compounds, but with a higher confidence level, after the NMR analysis step. The contributions of the paper are an algorithm combining MS and NMR spectroscopy and a robust nJCH network analysis to explore the complementary aspect of both techniques. This delivers good results even if a perfect computational separation of the compounds in the mixture is not possible. All the scripts will be made available online for users to aid studies such as with plants, marine organisms, and microorganism natural product chemistry and metabolomics as those are the driving force for this project.

Item Open Access
Local reversibility in a Calculus of Covalent Bonding (Elsevier, 2018-01-01)
Kuhn, Stefan; Ulidowski, Irek
We introduce a process calculus with a new prefixing operator that allows us to model locally controlled reversibility.
Actions can be undone spontaneously, as in other reversible process calculi, or as pairs of concerted actions, where performing a weak action forces undoing of another action. The new operator in its full generality allows us to model out-of-causal order computation, where causes are undone before their effects are undone, which goes beyond what typical reversible calculi can express. However, the core calculus, which uses only the reduced form of the new operator, is well behaved as it satisfies causal consistency. We demonstrate the usefulness of the calculus by modelling the hydration of formaldehyde in water into methanediol, an industrially important reaction, where the creation and breaking of some bonds are examples of locally controlled out-of-causal order computation.

Item Open Access
Modelling of DNA Mismatch Repair with a reversible process calculus (Elsevier, 2022-06-10)
Kuhn, Stefan; Ulidowski, Irek
We have demonstrated in previous work that the Calculus of Covalent Bonding (CCB) can be used to simulate higher-level biochemical processes. This is significant since CCB was originally devised to model lower level organic chemical reactions. In this paper we extend the use of the calculus to model an important gene repair pathway, namely DNA Mismatch Repair (MMR). This complex pathway involves four helper proteins and needs a distinction between the two chains in a DNA strand. In order to achieve this, we extend the calculus by allowing prefixing with collections of bonding sites.

Item Open Access
Network Intrusion Detection based on Amino Acid Sequence Structure Using Machine Learning (MDPI, 2023-10-17)
Ibaisi, Thaer AL; Kuhn, Stefan; Kaiiali, Mustafa; Kazim, Muhammad
The detection of intrusions in computer networks, known as Network-Intrusion-Detection Systems (NIDSs), is a critical field in network security. Researchers have explored various methods to design NIDSs with improved accuracy, prevention measures, and faster anomaly identification.
Safeguarding computer systems by quickly identifying external intruders is crucial for seamless business continuity and data protection. Recently, bioinformatics techniques have been adopted in NIDSs’ design, enhancing their capabilities and strengthening network security. Moreover, researchers in computer science have found inspiration in molecular biology’s survival mechanisms. These nature-designed mechanisms offer promising solutions for network security challenges, outperforming traditional techniques and leading to better results. Integrating these nature-inspired approaches not only enriches computer science, but also enhances network security by leveraging the wisdom of nature’s evolution. As a result, we have proposed a novel Amino-acid-encoding mechanism that is bio-inspired, utilizing essential Amino acids to encode network transactions and generate structural properties from Amino acid sequences. This mechanism offers advantages over other methods in the literature by preserving the original data relationships, achieving high accuracy of up to 99%, transforming original features into a fixed number of numerical features using bio-inspired mechanisms, and employing deep machine learning methods to generate a trained model capable of efficiently detecting network attack transactions in real-time.

Item Open Access
A Neural Network for Interpolating Light-Sources (IEEE, 2020-07-13)
Colreavy-Donnelly, S.; Kuhn, Stefan; Caraffini, Fabio; O'Connor, S.; Anastassi, Zacharias; Coupland, Simon
This study combines two novel deterministic methods with a Convolutional Neural Network to develop a machine learning method that is aware of directionality of light in images. The first method detects shadows in terrestrial images by using a sliding-window algorithm that extracts specific hue and value features in an image. The second method interpolates light-sources by utilising a line-algorithm, which detects the direction of light sources in the image.
Both of these methods are single-image solutions and employ deterministic methods to calculate the values from the image alone, without the need for illumination-models. They extract real-time geometry from the light source in an image, rather than mapping an illumination-model onto the image, which are the only models used today. Finally, those outputs are used to train a Convolutional Neural Network. This displays greater accuracy than previous methods for shadow detection and can predict light source-direction and thus orientation accurately, which is a considerable innovation for an unsupervised CNN. It is significantly faster than the deterministic methods. We also present a reference dataset for the problem of shadow and light direction detection.
© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Item Open Access
NMReDATA, a standard to report the NMR assignment and parameters of organic compounds (Wiley, 2018-04-14)
Pupier, M; Nuzillard, J-M.; Wist, J.; Schlorer, N.E.; Kuhn, Stefan; Erdelyi, M.; Steinbeck, C.; Williams, A.J.; Butts, C.; Claridge, T.; Mikhova, B.; Robien, W.; Dashti, H.; Eghbalnia, H.R.; Fares, C.; Pavel, K.; Moriaud, F.; Elyashberg, M.; Argyropoulos, D.; Perez, M.; Giraudeau, P.; Gil, R.R.; Trevorrow, P.; Jeannerat, D.
Even though NMR has found countless applications in the field of small molecule characterization, there is no standard file for the NMR data relevant to structure characterization of small molecules. A file format is introduced to associate the NMR parameters extracted from 1D and 2D spectra of organic compounds to the assigned chemical structure.
These NMR parameters, which we shall call NMReDATA, include chemical shift values, signal integrals, intensities, multiplicities, scalar coupling constants, lists of 2D correlations, relaxation times and diffusion rates. The file format is an extension of the existing SDF (Structure Data Format), which is compatible with the commonly used MOL format. The association of an NMReDATA file with the raw and spectral data from which it originates constitutes an NMR record. This format is easily readable by humans and computers and provides a simple and efficient way for disseminating results of structural chemistry investigations, automating the verification of published results, and for assisting the constitution of highly needed open-source structural databases.

Item Open Access
NMReDATA: Tools and applications (Wiley, 2021-03-17)
Kuhn, Stefan; Wieske, Lianne H. E.; Trevorrow, Paul; Schober, Daniel; Schloerer, Nils E.; Nuzillard, Jean-Marc; Kessler, Pavel; Junker, Jochen; Herraez, Angel; Fares, Christophe; Erdelyi, Mate; Jeannerat, Damien
The NMReDATA format has been proposed as a way to store, exchange, and disseminate NMR data and physical and chemical metadata of chemical compounds. In this paper we report on analytical workflows that take advantage of the uniform and standardized NMReDATA format. We also give access to a repository of sample data, which can serve for validating software packages that encode or decode files in NMReDATA format.

Item Metadata only
Particle Swarm Optimisation in Practice: Multiple Applications in a Digital Microscope System (MDPI, 2022-08-04)
Ryan, Louis; Kuhn, Stefan; Colreavy-Donnelly, Simon; Caraffini, Fabio
We demonstrate that particle swarm optimisation (PSO) can be used to solve a variety of problems arising during operation of a digital inspection microscope. This is a use case for the feasibility of heuristics in a real-world product. We show solutions to four measurement problems, all based on PSO.
This allows for a compact software implementation solving different problems. We have found that PSO can solve a variety of problems with small software footprints and good results in a real-world embedded system. Notably, in the microscope application, this eliminates the need to return the device to the factory for calibration.

Item Open Access
A Pilot Study For Fragment Identification Using 2D NMR and Deep Learning (Wiley, 2021-09-04)
Kuhn, Stefan; Tümer, Eda; Colreavy-Donnelly, Simon; Moreira Borges, Ricardo
This paper presents a proof of concept of a method to identify substructures in 2D NMR spectra of mixtures using a bespoke image-based Convolutional Neural Network application. This is done using HSQC and HMBC spectra separately and in combination. The application can reliably detect substructures in pure compounds, using a simple network. Results indicate that it can work for mixtures when trained on pure compounds only. HMBC data and the combination of HMBC and HSQC show better results than HSQC alone in this pilot study.

Item Open Access
Prediction of chemical shift in NMR: a review (Wiley, 2021-11-17)
Jonas, Eric; Kuhn, Stefan; Schloerer, Nils E.
Calculation of solution-state NMR parameters, including chemical shift values and scalar coupling constants, is often a crucial step for unambiguous structure assignment. Data-driven (sometimes called empirical) methods leverage databases of known parameter values to estimate parameters for unknown or novel molecules. This is in contrast to popular ab initio techniques which use detailed quantum computational chemistry calculations to arrive at parameter estimates. Data-driven methods have the potential to be considerably faster than ab initio techniques and have been the subject of renewed interest over the past decade with the rise of high-quality databases of NMR parameters and novel machine learning methods.
Here we review these methods, their strengths and pitfalls, and the databases they are built on.