LLaVA-CoT Shows How to Achieve Structured, Autonomous Reasoning in Vision Language Models

LLaVA-CoT improves the reasoning abilities of vision language models by adopting a structured, multistage approach, allowing it to outperform larger counterparts.