Description:
In this role, the successful candidate will focus on optimising the performance and efficiency of large-scale AI models, including large language models (LLMs) and other generative AI systems. This is a unique opportunity to shape the future of next-generation AI technologies within a dynamic and collaborative environment.
Key Responsibilities:
- Research and implement advanced optimisation algorithms to enhance the efficiency and performance of large model inference.
- Identify and address bottlenecks in the inference pipeline, developing innovative solutions for higher throughput, reduced latency, and more efficient memory utilisation.
- Conduct in-depth research on model quantisation techniques (e.g., INT8, INT4) to achieve optimal inference performance while maintaining model stability and accuracy.
- Design and refine speculative sampling algorithms to improve generation speed and output quality for large AI models.
- Translate cutting-edge research into deployable algorithmic tools and solutions for production use.
- Collaborate with internal engineering teams to seamlessly integrate optimised algorithms into scalable, real-world systems.
- Stay at the forefront of industry trends and advancements in AI and large models, fostering a culture of continuous innovation.
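To illustrate the kind of work involved, here is a minimal sketch of symmetric per-tensor INT8 post-training quantisation, one of the techniques named above. It is written in plain Python for clarity; function names and the toy weight values are illustrative, not part of any particular framework. Production work would use per-channel scales, calibration data, and a framework's quantisation toolkit.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantisation: map floats onto [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in q]

# Toy weight tensor (illustrative values only).
weights = [0.5, -1.27, 0.01, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# The worst-case rounding error of this scheme is half a quantisation step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The accuracy/performance trade-off the role targets lives in choices like this: per-tensor vs. per-channel scaling, symmetric vs. asymmetric ranges, and QAT vs. PTQ calibration.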
Required Qualifications:
- Master’s degree or higher in Computer Science, Artificial Intelligence, Mathematics, or a related field (PhD strongly preferred).
- A minimum of 2 years of R&D experience in deep learning or a related field, with particular emphasis on large model optimisation.
- Strong knowledge of mainstream large model architectures (e.g., Transformers, LLMs) and their inference processes.
- Expertise in model quantisation techniques, including Quantisation-Aware Training (QAT) and Post-Training Quantisation (PTQ).
- Deep understanding of sampling and decoding strategies (e.g., top-k, top-p, temperature sampling) as well as speculative decoding, and of their optimisation strategies.
- Proficiency in deep learning frameworks such as PyTorch and TensorFlow, with experience in inference engines such as vLLM and SGLang.
- Advanced Python programming skills; experience with C++ or CUDA is advantageous.
- Solid foundation in algorithms and coding implementation.
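As a concrete picture of the speculative decoding work mentioned above, the sketch below runs one accept/reject round with toy categorical distributions standing in for a small draft model and the large target model. The accept rule (accept a drafted token with probability min(1, p_target/p_draft), otherwise resample from the residual distribution) is the standard speculative sampling scheme; all names and probabilities here are illustrative.

```python
import random

random.seed(0)  # deterministic for the example

def speculative_step(draft_p, target_p, vocab, k=4):
    """One round of speculative decoding with toy categorical models.

    draft_p / target_p: dicts mapping token -> probability (hypothetical
    stand-ins for a cheap draft model and the expensive target model).
    Returns the tokens kept from a k-token draft.
    """
    proposed = random.choices(vocab, weights=[draft_p[t] for t in vocab], k=k)
    accepted = []
    for tok in proposed:
        # Accept with probability min(1, p_target / p_draft).
        if random.random() < min(1.0, target_p[tok] / draft_p[tok]):
            accepted.append(tok)
        else:
            # On rejection: resample once from the residual distribution
            # max(p_target - p_draft, 0), then stop this round.
            residual = {t: max(target_p[t] - draft_p[t], 0.0) for t in vocab}
            z = sum(residual.values())
            if z > 0:
                accepted.append(random.choices(
                    vocab, weights=[residual[t] / z for t in vocab])[0])
            break
    return accepted

vocab = ["a", "b", "c"]
draft = {t: 1.0 / 3 for t in vocab}          # uniform draft model
target = {"a": 0.6, "b": 0.3, "c": 0.1}      # skewed target model
out = speculative_step(draft, target, vocab)
```

The appeal of the method is that accepted draft tokens cost only one target-model forward pass per round, while the accept/resample rule keeps the output distribution exactly that of the target model.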