Machine learning

Notes on Machine Learning

This page collects links to notes on machine learning and artificial intelligence, from foundational concepts to advanced techniques and frameworks.

Machine Learning Overview

The difference between AI and Machine Learning: Clarifying misconceptions and highlighting the distinctions between AI and ML.
- The difference between AI and Machine Learning

Core Concepts

Data Handling and Generation: Techniques for synthetic data creation are crucial for training models where real data is scarce.
- Synthetic Data Generation: Synthetic data with SDV and Gaussian copulas, Synthetic data with SDV and CTGAN, Synthetic data with SDV and CopulaGAN
Model Explainability: Making machine learning models understandable to humans.
- Explainability
Handling Drift: Maintaining model performance over time by addressing concept, model, and data drift.
- Introduction to Concept Drift in Machine Learning, Model drift, Data drift

Model Development and Evaluation

Optimisation Techniques: Algorithms for minimising a model's loss during training, including both gradient-based and gradient-free options.
- Gradient Descent, Stochastic Gradient Descent, Stochastic Gradient Descent with momentum, Mini-Batch Gradient Descent, Adagrad, RMSProp, AdaDelta, Adam, Gradient-free optimisation
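As a quick orientation for the gradient-based entries above, here is a minimal sketch of gradient descent with an optional momentum term. The function names, the quadratic objective, and the hyperparameter values are illustrative assumptions rather than anything taken from the linked notes.

```python
def gradient_descent(grad, x0, lr=0.1, momentum=0.0, steps=500):
    """Minimise a one-dimensional function given its gradient.

    With momentum=0.0 this is plain gradient descent; a non-zero
    momentum keeps a decaying running sum of past updates.
    """
    x, velocity = x0, 0.0
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(x)
        x += velocity
    return x


def grad_f(x):
    # Gradient of the illustrative objective f(x) = (x - 3)^2.
    return 2.0 * (x - 3.0)


x_plain = gradient_descent(grad_f, x0=0.0)
x_momentum = gradient_descent(grad_f, x0=0.0, momentum=0.9)
```

Both runs should settle close to the minimiser at x = 3. Stochastic and mini-batch variants follow the same update but estimate the gradient from a sample of the data.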
Model Selection and Evaluation: Strategies for selecting the best model and assessing its performance.
- Cross-validation, Kernel functions: Interpretation and applications, Kernel functions
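To make the cross-validation entry concrete, the sketch below splits sample indices into k contiguous folds without shuffling. The helper name `k_fold_indices` is hypothetical, and real projects would usually shuffle or stratify the data first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
```

Each sample appears in exactly one test fold, so averaging a metric over the folds uses every observation once for evaluation.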
Error and Performance Metrics: Metrics to evaluate model errors and performance.
- Error metrics: Distance metrics
- Performance: Language performance metrics, Model performance metrics

Machine Learning Methods and Techniques

Time-Series Analysis: Analysing and predicting data that changes over time, and detecting anomalies in streaming data.
- Time-series analysis, Streaming anomaly detection
Clustering: Grouping data points based on their similarities.
- K-means clustering
Fairness: Ensuring models do not perpetuate biases.
- Fairness in Machine Learning, Model fairness
Transformations: Preprocessing steps, such as feature scaling, that many algorithms need in order to perform well.
- Feature scaling
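Two common forms of feature scaling are standardisation (zero mean, unit variance) and min-max scaling (mapping to the range 0 to 1). The sketch below uses only the standard library. The function names and sample data are illustrative assumptions, and the linked note may cover other methods.

```python
from statistics import mean, pstdev


def standardise(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
z_scores = standardise(heights_cm)
scaled = min_max_scale(heights_cm)
```

In practice the scaling parameters are estimated on the training set only and then reused on validation and test data, to avoid leaking information.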
Recurrent Neural Networks (RNN): Recognising patterns in sequences of data with LSTM networks.
- LSTM

Machine Learning Applications and Frameworks

Supervised and Unsupervised Learning: Differentiating these foundational approaches with examples.
- Supervised methods: Random Forest; Regression: Gaussian Process Regression
- Unsupervised methods: Self-organising maps

LLMs (Large Language Models)

Overview of LLMs: Understanding the architecture and capabilities of large language models, including their applications in natural language processing and beyond.
- Introduction to LLMs, Applications of LLMs
- Retrieval-Augmented Generation (RAG): Enhancing LLMs with external knowledge retrieval
LLM Evaluation and Benchmarking: Methods for assessing the performance of LLMs, including metrics and benchmarks relevant to their capabilities.
- Related evaluation metrics: Language performance metrics, Model performance metrics
- LLM evaluation
LLM Security and Safety: Exploring vulnerabilities, attack vectors, and defence mechanisms for large language models, including techniques to prevent misuse and ensure responsible deployment.
- Adversarial Attacks: Understanding how malicious inputs can manipulate model behaviour through techniques like prompt injection and jailbreaking.
  - Universal and Transferable Adversarial Attacks, Prompt injection attacks
- Detection and Prevention: Methods for identifying and mitigating security threats to LLMs in production environments.
  - Task drift detection, LLM security monitoring
- Alignment and Safety: Approaches to ensure LLMs behave according to human values and resist harmful outputs.
  - RLHF for alignment, Constitutional AI, LLM safety benchmarks
Machine Learning Frameworks and Tools: Tools and frameworks for ML development, from model building to deployment.
- KServe, Cookiecutter Data Science, Scikit-learn, Model serving

Fundamental Theory and Statistics

Statistics in Machine Learning: Statistical concepts and methods crucial for ML algorithms and evaluation.
- Statistics, Streaming statistics, Thompson sampling, Statistical dependence
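As a taste of what streaming statistics involves, the sketch below keeps a running mean and sample variance in a single pass using Welford's algorithm. The choice of Welford's method and the class name `RunningStats` are assumptions made for illustration; the linked note may take a different approach.

```python
class RunningStats:
    """Single-pass mean and sample variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the deviation from both the old and the updated mean.
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; defined as 0.0 until two values have been seen.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)
```

Because each update needs only the previous count, mean, and sum of squared deviations, memory use stays constant however long the stream runs.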