Data science

fromEntrepreneur
1 day ago

This Common Invisible Barrier Is Sabotaging Your Data-Driven Decisions

AI was everywhere, but I wasn't focused on product launches. I was looking at how companies think about data itself: how it's shared, governed and ultimately turned into decisions. And across conversations with executives and sessions on security and compliance, a pattern emerged: the technical limitations that once justified locking data down have largely been solved. What remains difficult is human. Alignment, trust and confidence inside organizations are now the true barriers.
Data science
fromEntrepreneur
1 day ago

Most Founders Don't Realize They're Giving Away Their Influence - Here's How to Take It Back

Every search, purchase, loyalty swipe, location ping and scroll feeds systems that now shape pricing, product decisions, hiring and marketing strategies. Most founders understand this in theory, but few grasp the practical consequence: whether they intend to or not, they and their customers are already casting votes with their data. And those votes? They're usually cast passively, on someone else's terms.
Data science
fromInfoWorld
2 days ago

How to choose the best LLM using R and vitals

Swap models by creating a new chat solver, clone or create tasks with alternative LLMs, run evaluations, and bind results for comparison and analysis.
fromInfoQ
4 days ago

Panel: Modern Data Architectures

I wrote a book for O'Reilly on scaling machine learning with Spark specifically. My second book is coming out on how to improve high-performance Spark, the second edition. Started my career in the machine learning space 15 years ago, moved into data infrastructure, batch processing, and a year and a half ago I moved into the data streaming space, which I think is what's going to help us pave the future in data.
Data science
fromTreehouse Blog
4 days ago

Portfolio Projects for Entry-Level Data Roles

Most beginner data portfolios look similar. They include a few cleaned datasets, some charts or dashboards, and a notebook with code and commentary. Again, nothing here is wrong. But hiring teams don't review portfolios to check whether you can follow instructions. They review them to see whether you can think like a data analyst. When projects feel generic, reviewers are left guessing:
Data science
fromTheregister
1 week ago

ServiceNow buys Pyramid Analytics

"Pyramid adds an analytics and semantic layer that can define metrics in a way that both humans and AI agents can rely on,"
Data science
fromMedium
3 weeks ago

From Graphs to Generative AI: Building Context That Pays, Part 1

Every year, poor communication and siloed data bleed companies of productivity and profit. Research shows U.S. businesses lose up to $1.2 trillion annually to ineffective communication, or about $12,506 per employee per year. This stems from breakdowns that waste an average of 7.47 hours per employee each week on miscommunications. The damage isn't only interpersonal; it's structural. Disconnected and fragmented data systems mean that employees spend around 12 hours per week just searching for information trapped in those silos.
Data science
fromWIRED
1 week ago

A Wave of Unexplained Bot Traffic Is Sweeping the Web

For a brief moment in October, Alejandro Quintero thought he had made it big in China. The Bogotá-based data analyst owns and manages a website that publishes articles about paranormal activities, like ghosts and aliens. The content is written in "Spanglish," he says, and was never intended for an Asian audience. But last fall, Quintero's site suddenly began receiving a large volume of visits from China and Singapore.
Data science
fromMedium
3 weeks ago

Taking Back the Math: How Everyday Numbers Can Empower Us in an Algorithmic World

Learning basic mathematics empowers individuals to understand, question, and influence algorithms that shape choices, reducing opaque power imbalances in the algorithm-driven economy.
Data science
fromFlowingData
1 week ago

Network map of Bluesky users

A searchable, interactive map visualizes follow-pattern relationships among 3.4 million Bluesky users, revealing topical and regional community clusters.
Data science
fromNextgov.com
1 week ago

FPDS looks old and clunky but that only masks its power

FPDS.gov retains a 1990s-era, clunky interface but remains a powerful, complex federal procurement data repository that requires skill to navigate.
Data science
fromBerlin Startup Jobs
1 week ago

Job Vacancy: Data Platform Specialist (m/f/d) // Stackgini GmbH | Product Management Jobs | Berlin Startup Jobs

Join Stackgini to monitor and improve a rapidly growing B2B SaaS data platform, owning core datasets and driving data quality, integrations, and stakeholder support.
Data science
fromNature
1 week ago

How to stop the survey-taking AI chatbots that threaten to upend social science

Online survey recruitment faces widespread inauthentic and automated responses, increasingly amplified by AI agents, threatening data validity.
Data science
fromFinanceBuzz
1 week ago

9 Remote Jobs That Pay $50 an Hour or More (Yes, They're Legit)

Nine remote-friendly roles pay $50+/hour, leveraging experienced professionals' skills—examples include mathematician/statistician, data scientist, and administrative services manager.
fromFast Company
2 weeks ago

How AWS-powered Next Gen Stats changed the NFL forever

Next Gen Stats began in 2015, when the National Football League deployed RFID chips in player shoulder pads and even in the football itself, enabling the league to capture location data multiple times per second through sensors installed throughout stadiums.
Data science
fromHarvard Gazette
2 weeks ago

Breaking chess's rating stalemate - Harvard Gazette

This is the conundrum of elite chess. The stronger the players, the greater the odds of the match ending in a draw. "What ended up happening," said Mark Glickman, senior lecturer in the Department of Statistics and longtime chess enthusiast, "is that these top players were not having their ratings change very much, just because the games would be drawn all the time."
Data science
fromWIRED
2 weeks ago

Sports Betting Is Skyrocketing. Will It Take Over the Olympics?

Integrity agencies monitor live betting data to detect suspicious patterns and coordinate investigations into match-fixing, collusion, and other gambling malfeasance.
fromNews Center
2 weeks ago

New Computational Biology Track Added to PhD Graduate Program - News Center

A new PhD track is being added to the Walter S. and Lucienne Driskill Graduate Program in Life Sciences (DGP) for the 2026 application cycle, to enhance student learning and build community around computational biology and bioinformatics at Feinberg. The computational biology and bioinformatics (CBB) track in the graduate program will prepare students through coursework and lectures to use modern computational approaches, including machine learning and artificial intelligence, to extract biological insight from large-scale datasets to address complex biological problems.
Data science
fromInfoQ
2 weeks ago

Beyond the Warehouse: Why BigQuery Alone Won't Solve Your Data Problems

Data warehouses like BigQuery perform well initially but become slow, costly, and disorganized at scale, undermining low-latency operational use and innovation.
Data science
fromInfoWorld
2 weeks ago

Snowflake debuts Cortex Code, an AI agent that understands enterprise data context

Cortex Code enables developers to use natural language to build, optimize, and deploy governed, production-ready data pipelines, analytics, ML workloads, and AI agents.
Data science
fromDevOps.com
2 weeks ago

Why Data Contracts Need Apache Kafka and Apache Flink - DevOps.com

Data contracts formalize schemas, types, and quality constraints through early producer-consumer collaboration to prevent pipeline failures and reduce operational downtime.
fromCornell Chronicle
2 weeks ago

Maps offer neighborhood-level insight into American migration | Cornell Chronicle

That local exodus is documented by Cornell-led research that mapped annual moves between U.S. neighborhoods from 2010 to 2019 in detail 4,600 times greater than standard public data. Called MIGRATE, the new, publicly available dataset revealed that most of those displaced remained within the affected county - moves not captured in county-level public migration data aggregated every five years.
Data science
fromBusiness Insider
2 weeks ago

Economic data is getting harder to come by, and the alternative won't help everyone

Erosion of BLS economic data undermines public data reliability and will widen information gaps as costly alternative data favors wealthy investors.
Data science
fromNature
2 weeks ago

Science finds its song

Scientists are translating research data into music, fostering interdisciplinary collaboration, revealing patterns, and increasing accessibility through data-driven music events.
Data science
fromBusiness Insider
2 weeks ago

The under-the-radar risk that could sink America's economy

Government-produced data that underpins markets and decision-making is eroding, risking poorer decisions across economies and households.
fromInfoWorld
3 weeks ago

Google expands BigQuery with conversational agent and custom agent tools

"Instead of treating each prompt as a one-off request, the new agent remembers what was asked earlier, including datasets, filters, time ranges, and assumptions, and uses that context when answering follow-up questions. This lets users refine an analysis progressively rather than starting from scratch each time," Satapathy added. Satapathy pointed out that this eases the pressure on developers to prebuild dashboards or predefined business logic for every possible question that a data analyst or business user could ask.
Data science
fromFlowingData
4 weeks ago

Pentagon Pizza dashboard to track activities

A real-time dashboard (PizzINT) monitors pizza shop popularity around the Pentagon to track potential correlations between late-night pizza orders and military activity.
fromTechzine Global
3 weeks ago

Alteryx and Google Cloud bring analytics closer to BigQuery

With the introduction of Live Query for BigQuery and Alteryx One: Google Edition, users no longer need to move data to run workflows. Companies that standardize cloud platforms for analytics and AI often see a gap between where data is stored and how it is prepared and used. Alteryx wants to change that by bringing analytics workflows directly to BigQuery. The promise: from data to insight to action, without compromising on security or scalability.
Data science
fromComputerworld
3 weeks ago

Great R packages for data import, wrangling, and visualization

A set of R packages (dplyr, purrr, readr/vroom, datapasta, Hmisc) streamline data wrangling, importing, and analysis with faster, standardized, and reproducible tools.
fromTheServerSide.com
3 weeks ago
Data science

Why Java devs should switch to Python or R for data science | TheServerSide

Python and R dominate data science front-end work, offering richer ecosystems and easier data analysis than Java for many statistical and machine learning tasks.
Data science
fromCIO
3 weeks ago

5 perspectives on modern data analytics

Data/business analytics is the top IT investment priority, yet analytics projects often fail due to poor data, vague objectives, and one-size-fits-all solutions.
Data science
fromComputerworld
3 weeks ago

Tableau re-engineers dashboards, adds new analytics tools for business analysts

Tableau 2022.3 adds Data Guide and Table Extension, dynamic dashboards, event auditing, and performance/cost optimization to simplify self-service analytics for business users.
Data science
fromCmxhub
3 weeks ago

Ready to Nerd Out About Community Data? Join Richard Millington's Workshop at CMX Summit 2023

Learn data-driven community management techniques in a hands-on Pre-Summit workshop to increase engagement, prioritize actions, and prove community value.
Data science
fromComputerworld
3 weeks ago

R syntax quirks you'll want to know

R primarily uses <- for assignment; = can sometimes assign, is used for default arguments and some functions; R is case-sensitive; c() combines values into vectors.
Data science
fromBusiness Insider
3 weeks ago

How hedge funds are tapping prediction markets and their data for an edge

Hedge funds primarily use prediction market data rather than trading on platforms like Kalshi and Polymarket.
fromFortune
4 weeks ago

How Walmart is using AI to reroute essential supplies ahead of Winter Storm Fern | Fortune

From a meteorological perspective, the winter storm sweeping across the country this weekend is a supply chain disruption in its own right: A high-pressure system from the north is smashing into a low-pressure system from the south, belting large swaths of the US with heavy snow, sleet, and freezing rain. While the snarl in the upper atmosphere could trickle down to the real supply chain on the ground, some retailers are taking steps to anticipate the impact of the storm and position their products accordingly.
Data science
fromComputerWeekly.com
1 month ago

Interview: Barry Panayi, group chief data officer, Howden | Computer Weekly

Our work is not about producing a list of tables with numbers in rows and columns,
Data science
fromLondon Business News | Londonlovesbusiness.com
1 month ago

Is Maptive the best mapping software to conduct complex spatial analysis - London Business News | Londonlovesbusiness.com

Maptive delivers cloud-based, no-code spatial analysis and mapping that handles large datasets, automated territories, route planning, and enterprise-grade global mapping infrastructure.
Data science
fromTreehouse Blog
1 month ago

Beginning SQL: 10 Essential Query Patterns

Recognizing common SQL query patterns enables beginners to retrieve, filter, summarize, and reason about data effectively across industries.
frommoz.com
1 month ago

Vibe Coding Your Own SEO Tools Whiteboard Friday

You can always make it better. You can improve things. But it does give you a good taste of what can be done in vibe coding. Those are things that I made maybe in 15 minutes, half an hour. It is quite simple to get those first steps and say, "Oh, this works." Maybe you want to do some improvements, and you refine the code and what you're expecting.
Data science
fromInfoQ
1 month ago

How Agoda Unified Multiple Data Pipelines Into a Single Source of Truth

A centralized Apache Spark-based financial pipeline (FINUDP) creates a single source of truth and a multi-layered quality framework to ensure accurate, consistent financial metrics.
fromGael Varoquaux
1 month ago

Stepping up as probabl's CSO to supercharge scikit-learn and its ecosystem

I'm thrilled to announce that I'm stepping up as Probabl's CSO (Chief Science Officer) to supercharge scikit-learn and its ecosystem, pursuing my dreams of tools that help go from data to impact. Scikit-learn is central to data scientists' work: it is the most used machine-learning package. It has grown over more than a decade, supported by volunteers' time, donations, and grant funding, with Inria playing a central role.
Data science
fromMedium
1 month ago

How I Fixed a Critical Spark Production Performance Issue (and Cut Runtime by 70%)

A Spark job slowed roughly 10x after data growth; diagnosing and optimizing Spark execution reduced runtime by about 70% without adding cluster resources.
fromNew Relic
1 month ago

The Power and Cost of Data Cardinality

The more attributes you add to your metrics, the more complex and valuable questions you can answer. Every additional attribute provides a new dimension for analysis and troubleshooting. For instance, adding an infrastructure attribute, such as region, can help you determine if a performance issue is isolated to a specific geographic area or is widespread. Similarly, adding business context, like a store location attribute for an e-commerce platform, allows you to understand if an issue is specific to a particular set of stores
Data science
fromMedium
1 month ago

The Complete Guide to Optimizing Apache Spark Jobs: From Basics to Production-Ready Performance

Optimize Spark jobs by using lazy evaluation awareness, early filter and column pruning, partition pruning, and appropriate join strategies to minimize shuffles and I/O.
Data science
fromwww.bbc.com
1 month ago

Excel: The software that's hard to quit

Excel's ubiquity enables quick analysis but spreadsheet-based workflows and macros create maintenance, security, centralization, and AI integration problems.
Data science
fromComputerworld
1 month ago

Accenture to acquire UK AI startup Faculty

Faculty, renamed from ASI Data Science, built NHS Covid predictive systems and aligns with Accenture's AI-focused Reinvention Services.
fromBusiness Insider
1 month ago

CEO of AI training startup says humans will still be involved in data creation for decades

"When I first started this job, the main push back I always got was that synthetic data will take over and you just will not need human feedback two to three years from now," said Fitzpatrick, who joined the startup last year. "From first principles, that actually doesn't make very much sense." Synthetic data refers to data that is artificially created.
Data science
fromMedium
1 month ago

Migrating from Historical Batch Processing to Incremental CDC Using Apache Iceberg (Glue 4...

Use Apache Iceberg Copy-on-Write tables in AWS Glue 4 to migrate from full historical batch reprocessing to incremental CDC, reducing redundant computation, I/O, and costs.
Data science
fromwww.housingwire.com
1 month ago

The spreadsheet trap: Why investor reporting still operates like it's 2005

Investor reporting offices in loan servicing rely on legacy, spreadsheet-based processes due to historical adoption, cultural inertia, and perceived transparency despite significant operational risk.
fromMedium
2 months ago
Data science

Data Quality on Spark, Part 4: Deequ

Deequ enables scalable, automated data quality checks, profiling, analyzers, and suggestions on Apache Spark for open-source Data Quality assessments.
fromMedium
2 months ago
Data science

Data Quality on Spark, Part 4: Deequ

Deequ provides scalable, Spark-native tools for defining, profiling, and analyzing data quality checks with Scala APIs and an optional Python wrapper (PyDeequ).
fromFlowingData
1 month ago

Best Data Visualization Projects of 2025

Another year. It passed extremely fast and yet, painfully slow. Despite developing tech that some think might take over our day-to-day work, data things got made by people this year. These are my favorites. Inside the Confusing World of Women's Clothing Sizes: They approached the topic from several angles with 3-D models, data collection, and sizing charts. Adding to the visualization genre of variable clothes sizes, this piece helped me appreciate the process that is women's shopping.
Data science
fromWIRED
1 month ago

Billion-Dollar Data Centers Are Taking Over the World

When Sam Altman said one year ago that OpenAI's Roman Empire is the actual Roman Empire, he wasn't kidding. In the same way that the Romans gradually amassed an empire of land spanning three continents and one-ninth of the Earth's circumference, the CEO and his cohort are now dotting the planet with their own latifundia: not agricultural estates, but AI data centers.
Data science
fromInfoQ
1 month ago

Beyond Win Rates: How Spotify Quantifies Learning in Product Experiments

Experiments should be judged by decision-ready learning—valid and actionable outcomes that tell teams to ship, abort, or iterate—rather than by win rates alone.
Data science
fromTheregister
2 months ago

AI has pumped hyperscale - but how long can it last?

Hyperscale datacenter operators nearly tripled infrastructure spending and increased quarterly operational capacity by roughly 170%, driven by surging demand for AI workloads since late 2022.
Data science
fromInfoQ
2 months ago

Decathlon Switches to Polars to Optimize Data Pipelines and Infrastructure Costs

Migrating small-to-medium data workloads from Apache Spark to Polars yields major performance and cost improvements by enabling single-node execution and faster in-memory processing.
Data science
fromMedium
2 months ago

Data Quality on Spark, Part 4: Deequ

Deequ is a Spark-based open-source library for expressing, evaluating, and profiling data quality checks at scale, with analyzers, automatic suggestions, and Scala/Python support.
Data science
fromMedium
2 months ago

Ten Open-Source Business Intelligence Tools for Improved ROI and Productivity

Open-source BI tools deliver flexible, transparent, cost-effective analytics that enable nontechnical users to build dashboards and achieve higher ROI.
Data science
fromThe ODI
4 weeks ago

Data Ethics Professional #10: Advertising & Ethics - Audience Selection & Proxy Data

Digital advertising requires ethical, inclusive audience selection practices to prevent harmful exclusion and prioritize human safety alongside brand safety.
Data science
fromYahoo Creators
2 months ago

Boring remote jobs that pay at least $100,000 a year and employers can't fill fast enough

Numerous data-heavy, low-drama remote careers pay six figures and offer steady, repetitive tasks with strong demand and clear career paths.
fromInfoQ
2 months ago

Breaking Silos: Netflix Introduces Upper Metamodel to Bring Consistency Across Content Engineering

Upper is based on W3C standards such as RDF for conceptual graph representation and SHACL for validation, and it enables the principle of "model once, represent everywhere" across the data ecosystem. Upper organizes concepts through keyed entities, their attributes, and their relationships across domain boundaries. The modeling grammar and validation structure are designed to maintain consistency as definitions evolve. Keyed concepts can be extended monotonically, allowing new attributes or relationships without modifying existing definitions, so domains can expand over time without breaking existing models.
Data science
fromZDNET
2 months ago

This company's AI success was built on 5 essential steps - see how they work for you

AI initiatives succeed when grounded in strong data foundations, clear user-focused goals, measurable value, governance, and an iterative approach that builds confidence and delivers outcomes.
Data science
fromTreehouse Blog
2 months ago

Beginning Data Analysis: From Questions to Insights

Learning data analysis enables beginners to turn raw information into meaningful insights, spot trends, and support evidence-based decision-making across many fields.
Data science
fromMedium
3 months ago

From Zero to Scala Expertise: My Step-by-Step Homework Path

Learning Scala requires overcoming unfamiliar functional syntax and errors, but mastery enables high-performance, cleaner code and access to big data frameworks like Apache Spark.
Data science
fromComputerWeekly.com
2 months ago

Interview: Paul Neville, director of digital, data and technology, The Pensions Regulator | Computer Weekly

TPR is shifting from compliance-based to risk-based regulation by building strong IT foundations, improving data, automation, and cross-organisational information flows.
Data science
fromBarchart.com
2 months ago

Meta Platforms Has Lost $73 Billion on Reality Labs. Are Its Spending Cuts Enough for META Stock?

Save chart setups as templates, switch the Market flag for country-specific data, use the Interactive Chart menu, and navigate symbols with arrow keys.
Data science
fromInfoWorld
2 months ago

OpenAI to acquire AI training tracker Neptune

Neptune's hosted experiment-tracking SaaS will shut down March 4, 2026; users have months to export data while stability and security fixes continue.
fromRealpython
2 months ago

Introduction to pandas - Real Python

The pandas DataFrame is a structure that contains two-dimensional data and its corresponding labels. DataFrames are widely used in data science, machine learning, scientific computing, and many other data-intensive fields. DataFrames are similar to SQL tables or the spreadsheets that you work with in Excel or Calc.
Data science
fromTheregister
2 months ago

MongoDB talks up its AI chops by talking down PostgreSQL

Speaking to investment analysts, he said that while MongoDB had all the elements needed to be the right foundational platform for AI workloads, it was too early to say what might be the platform of choice. However, he said MongoDB had been winning work from AI-native companies, citing a customer that recently "switched from PostgreSQL to MongoDB because PostgreSQL could not just scale."
Data science
fromTheregister
2 months ago

HPE pumps AI cloud lineup with extra Nvidia capabilities

HPE upgrades Private Cloud AI with Nvidia Blackwell GPUs, GPU fractionalization, STIG-hardened NIMs, Juniper networking integration, and Alletra storage for inline data preparation.
Data science
fromInfoQ
2 months ago

Reliable Data Flows and Scalable Platforms: Tackling Key Data Challenges

Uncoordinated data schema changes between application and analytics teams cause silent failures and incorrect analytics; software practices must ensure versioning and compatibility.
fromTechzine Global
2 months ago

Snowflake acquires Select Star for broader data context

Snowflake has signed an agreement to acquire Select Star. This company's technology will expand Snowflake Horizon Catalog by integrating with databases, BI tools, and data pipelines. This will increase the context for AI agents such as Snowflake Intelligence. The full context of data assets is often scattered across upstream and downstream systems. This fragmentation makes it difficult to find the right data and understand the full context. In the AI era, this limited context poses a problem for both humans and agents.
Data science
fromIT Pro
2 months ago

Chief data officers believe they'll be a 'pivotal' force in the C-suite within five years

CDOs will become equal or highly influential C-suite leaders as data, AI, budgets, and teams expand.
Data science
fromFortune
2 months ago

A World Bank expert thinks countries should leverage 'small AI' and avoid competing with the biggest tech giants | Fortune

Smaller Southeast Asian countries can pursue targeted 'small AI' but require expanded data centers, reliable power infrastructure, and regulatory collaboration to scale.