#model-evaluation

#machine-learning
from WIRED
1 month ago
Artificial intelligence

This Tool Probes Frontier AI Models for Lapses in Intelligence

Scale AI's new tool, Scale Evaluation, automates the testing of AI models to pinpoint weaknesses and guide targeted improvements.
from HackerNoon
1 month ago
Artificial intelligence

When Smaller is Smarter: How Precision-Tuned AI Cracks Protein Mysteries | HackerNoon

QA task performance is evaluated with metrics such as F1 score and MAE, quantifying how accurate the model's answers are.
Model interpretability is analyzed through attention weights, offering insight into the model's reasoning process.
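
As a quick illustration of the metrics mentioned above, here is a minimal sketch of computing F1 and MAE with scikit-learn; the labels and values are made-up placeholders, not data from the article.

```python
# Minimal sketch: F1 for classification-style QA outputs, MAE for
# regression-style ones. All example data here is hypothetical.
from sklearn.metrics import f1_score, mean_absolute_error

# Classification-style answers (e.g., binary correctness labels).
y_true_cls = [1, 0, 1, 1, 0]
y_pred_cls = [1, 0, 0, 1, 0]
print("F1:", f1_score(y_true_cls, y_pred_cls))

# Regression-style answers (e.g., predicted numeric properties).
y_true_reg = [0.9, 0.1, 0.7]
y_pred_reg = [0.8, 0.2, 0.5]
print("MAE:", mean_absolute_error(y_true_reg, y_pred_reg))
```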
from HackerNoon
5 months ago
Bootstrapping

How Many Glitch Tokens Hide in Popular LLMs? Revelations from Large-Scale Testing | HackerNoon

The study reveals that simple indicators can effectively detect under-trained tokens in language models, improving token prediction accuracy.
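
One simple indicator of this kind is an unusually small input-embedding norm. The sketch below illustrates that idea; the model choice, threshold, and variable names are illustrative assumptions, not necessarily the study's exact method.

```python
# Hedged sketch: flag candidate under-trained tokens by looking for
# embedding norms far below the vocabulary median. "gpt2" is just an
# example model; the 0.5 cutoff is a hypothetical threshold.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

emb = model.get_input_embeddings().weight.detach()  # (vocab, dim)
norms = emb.norm(dim=1)

threshold = norms.median() * 0.5  # hypothetical cutoff
suspects = (norms < threshold).nonzero(as_tuple=True)[0]
for idx in suspects[:10]:
    print(idx.item(), repr(tok.decode([idx.item()])), norms[idx].item())
```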
from Medium
5 days ago
Artificial intelligence

Beyond Benchmarks: Really Evaluating AI

Benchmarks standardize test sets for AI models so that performance can be compared on an equal footing.
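
In essence, benchmark-style evaluation means running every model over the same fixed test set and reporting one shared metric. A generic sketch, with a stand-in callable rather than any particular model API:

```python
# Generic benchmark loop: fixed examples, one shared metric (accuracy).
# `model` is a placeholder callable; the test items are made up.
test_set = [("2+2=", "4"), ("capital of France?", "Paris")]

def accuracy(model, examples):
    correct = sum(model(prompt).strip() == answer for prompt, answer in examples)
    return correct / len(examples)

print(accuracy(lambda p: "4" if "2+2" in p else "Paris", test_set))
```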
from HackerNoon
11 months ago
Data science

The Key Differences Between Real and Complex-Valued State Space Models | HackerNoon

Real-valued SSMs can outperform complex-valued ones for discrete data modalities.
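
For context, the recurrence at issue is the diagonal linear SSM x_t = A x_{t-1} + B u_t, y_t = C x_t; a complex-valued variant would simply use complex A, B, C. A minimal numpy sketch of the real-valued case, with illustrative dimensions and random parameters:

```python
# Diagonal, real-valued linear state space model run over a scalar
# input sequence. Stability is ensured by keeping |A| < 1.
import numpy as np

rng = np.random.default_rng(0)
state_dim, seq_len = 8, 16

A = rng.uniform(0.5, 0.99, state_dim)   # diagonal transition, real-valued
B = rng.normal(size=state_dim)
C = rng.normal(size=state_dim)
u = rng.normal(size=seq_len)            # scalar input sequence

x = np.zeros(state_dim)
ys = []
for t in range(seq_len):
    x = A * x + B * u[t]                # elementwise product: diagonal A
    ys.append(C @ x)
print(np.round(ys, 3))
```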
from InfoQ
2 months ago
Software development

OpenAI Introduces Software Engineering Benchmark

The SWE-Lancer benchmark assesses AI language models on real-world freelance software engineering tasks.
Despite rapid progress, AI models still struggle significantly with these tasks.
from HackerNoon
11 months ago
Miscellaneous

Limitations in AI Model Evaluation: Bias, Efficiency, and Human Judgment | HackerNoon

The article presents 12 key aspects for evaluating text-to-image generation models, highlighting the need for continuous research and improvement in assessment metrics.
from HackerNoon
1 year ago
JavaScript

GPT-4 Prompts for Computing Summarization and Dialogue Win Rates | HackerNoon

Direct Preference Optimization (DPO) is introduced as an effective method for preference learning, validated through GPT-4-judged win rates on summarization and dialogue tasks.
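
The DPO objective itself is compact: given sequence log-probabilities of a preferred and a dispreferred response under the policy and a frozen reference model, it minimizes -log sigmoid(beta * [(log pi_chosen - log ref_chosen) - (log pi_rejected - log ref_rejected)]). A minimal PyTorch sketch, with placeholder log-probabilities rather than the paper's data:

```python
# Per-pair DPO loss from precomputed sequence log-probabilities.
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss: push the policy's chosen-vs-rejected log-ratio above the reference's."""
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -F.logsigmoid(logits).mean()

# Dummy log-probs for a batch of two preference pairs.
loss = dpo_loss(
    pi_chosen=torch.tensor([-12.0, -10.5]),
    pi_rejected=torch.tensor([-13.0, -9.8]),
    ref_chosen=torch.tensor([-12.5, -10.0]),
    ref_rejected=torch.tensor([-12.8, -10.2]),
)
print(loss.item())
```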