
"My name is Mark Kurtz. I was the CTO at a startup called Neural Magic. We were acquired by Red Hat at the end of last year, and I'm now working under the CTO arm at Red Hat. I'm going to be talking about GenAI at scale: essentially, what it enables, a quick overview of that, costs, and generally how to reduce the pain. Running through a little more of the structure, we'll go through the state of LLMs and real-world deployment trends."
"LLMs, especially when ChatGPT 3.5 came out, really made a big difference in the usability of these very large models targeted at generating text. Ultimately, they're able to understand and generate natural, human-like text with an accuracy we really hadn't seen in models before. They're trained to predict the next token as part of pre-training. Then the key thing that enabled ChatGPT to blow up is tuning for alignment to human preferences."
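The next-token prediction the speaker describes can be sketched in a few lines. This is a toy illustration with a made-up five-word vocabulary and invented logit scores, not output from any real model: a language model assigns a score to every token in its vocabulary, softmax turns those scores into probabilities, and decoding picks the next token.

```python
import math

# Hypothetical vocabulary and logits (illustrative values only).
vocab = ["the", "cat", "sat", "on", "mat"]
logits = [1.2, 0.3, 2.5, 0.1, 0.9]  # model's raw scores for the next position

# Softmax turns raw logits into a probability distribution over the vocab.
exps = [math.exp(x) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

# Greedy decoding: emit the most probable token as the next token.
next_token = vocab[max(range(len(vocab)), key=lambda i: probs[i])]
print(next_token)  # "sat", since it has the highest logit
```

In practice, sampling strategies (temperature, top-p) replace the greedy argmax here, but the core loop of score, normalize, and select is the same.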
Large language models generate human-like text by predicting next tokens and achieve higher usability after alignment to human preferences. ChatGPT 3.5 highlighted how alignment and ranking of generated answers improve practical outputs. Production deployments demand careful choices around model selection, infrastructure, cost control, and scaling. Optimization techniques include runtime systems such as vLLM, model compression methods to reduce resource use, and fine-tuning tools like InstructLab to tailor behavior. Effective deployment balances performance, expense, and maintainability while applying open-source tooling and targeted tuning to reduce operational pain and improve real-world utility.
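One of the compression methods the summary alludes to can be illustrated with symmetric int8 weight quantization: storing weights as 8-bit integers plus a scale factor cuts memory roughly 4x versus float32, at the cost of a small, bounded reconstruction error. The weight values below are illustrative, not taken from any real model.

```python
# Hypothetical float32 weights from one layer (illustrative values only).
weights = [0.42, -1.37, 0.05, 0.91, -0.66]

# Symmetric quantization: map the largest-magnitude weight to the
# int8 range [-127, 127].
scale = max(abs(w) for w in weights) / 127

# Quantize: round each weight to its nearest 8-bit integer multiple of scale.
q = [round(w / scale) for w in weights]

# Dequantize to approximate the originals at inference time.
deq = [qi * scale for qi in q]

# Rounding error is at most half a quantization step per weight.
max_err = max(abs(w - d) for w, d in zip(weights, deq))
print(q)
print(f"max reconstruction error: {max_err:.4f}")
```

Production tooling applies the same idea per-channel or per-group, and more aggressive schemes (int4, sparsity) push the memory savings further while managing the accuracy trade-off.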
Read at InfoQ