#model-evaluation

#ai-safety
from ZDNET
4 hours ago
Artificial intelligence

OpenAI and Anthropic evaluated each other's models - which ones came out on top

from TechCrunch
4 months ago
Artificial intelligence

OpenAI partner says it had relatively little time to test the company's newest AI models | TechCrunch

#ai-benchmarks
from Hackernoon
1 year ago

Real-World Code Performance: Multi-Token Finetuning on CodeContests | HackerNoon

Models pretrained with different losses achieve different optimal temperatures for pass@k evaluation.
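The pass@k metric referenced above has a standard unbiased estimator (popularized by the HumanEval evaluation): given n sampled completions of which c pass the tests, it computes the probability that at least one of k completions drawn without replacement is correct. A minimal sketch in Python; the example numbers are illustrative only:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    completions sampled without replacement from n generated completions
    is correct, given that c of the n completions passed the tests."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so every size-k draw
        # must contain at least one correct completion.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 2 samples and 1 correct, a single draw succeeds half the time.
pass_at_k(2, 1, 1)  # 0.5
```

Sampling temperature enters through n and c: hotter sampling typically lowers c but diversifies completions, which is why the optimal temperature shifts with k and, per the article, with the pretraining loss.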
#pretraining-data
from Hackernoon
1 year ago
Artificial intelligence

AI Models Trained on Synthetic Data Still Follow Concept Frequency Trends | HackerNoon

from Hackernoon
1 year ago
Artificial intelligence

'Let It Wag!' and the Limits of Machine Learning on Rare Concepts | HackerNoon

from Hackernoon
1 year ago

AI Training Data Has a Long-Tail Problem | HackerNoon

Pretraining datasets exhibit a long-tailed distribution of concept frequencies, which drives performance disparities across concepts.
Data science
from Hackernoon
2 years ago

Deep Dive into MS MARCO Web Search: Unpacking Dataset Characteristics | HackerNoon

The MS MARCO dataset reveals considerable multilingual disparity and significant data skew, highlighting challenges in model evaluation and training.
Artificial intelligence
from Hackernoon
1 year ago

Evaluating Multimodal Speech Models Across Diverse Audio Tasks | HackerNoon

The study leverages diverse speech datasets to evaluate model performance across various speech tasks and improve generalization capabilities.
from Hackernoon
2 months ago

AI Learns Common Sense from Touch, Not Just Vision | HackerNoon

Model size significantly impacts OCTOPI's accuracy on physical-understanding tasks.
Incorporating physical property descriptions improves language models' performance on complex understanding tasks.
Data science
from Hackernoon
2 months ago

The Future of Remote Sensing: Few-Shot Learning and Explainable AI | HackerNoon

Few-shot learning techniques for remote sensing enhance model efficiency with limited data, emphasizing the need for explainable AI.
Artificial intelligence
from hackernoon.com
2 months ago

Limited Gains: Multi-Token Training on Natural Language Choice Tasks

Multi-token prediction enhances model performance in natural language processing benchmarks.
Larger models lead to improved scalability and faster inference times.
Artificial intelligence
from Hackernoon
1 year ago

Behind the Scenes: The Prompts and Tricks That Made Many-Shot ICL Work | HackerNoon

GPT4(V)-Turbo demonstrates variable performance in many-shot ICL, with notable failures to scale effectively under certain conditions.
from Hackernoon
3 months ago

Comparing Chameleon AI to Leading Image-to-Text Models | HackerNoon

In evaluating Chameleon, we focus on tasks requiring text generation conditioned on images, particularly image captioning and visual question-answering, with results grouped by task specificity.
Artificial intelligence
Bootstrapping
from Hackernoon
8 months ago

How Many Glitch Tokens Hide in Popular LLMs? Revelations from Large-Scale Testing | HackerNoon

The study reveals that simple indicators can effectively detect under-trained tokens in language models, improving token prediction accuracy.