Google BigQuery Adds SQL-Native Managed Inference for Hugging Face Models
Briefly

"Google recently launched third-party generative AI inference for open models in BigQuery, allowing data teams to deploy and run any model from Hugging Face or Vertex AI Model Garden using plain SQL. With this interface in preview, there is no longer a need for separate ML infrastructure, as it automatically spins up compute resources, manages endpoints, and cleans up everything through BigQuery's SQL interface."
"Yet, with BigQuery's SQL interface, the entire workflow boils down to two SQL statements. Users create a model with one CREATE MODEL statement that specifies a Hugging Face model ID (like sentence-transformers/all-MiniLM-L6-v2) or a Vertex AI Model Garden model name. BigQuery automatically provisions compute resources with default configurations, typically completing deployment in 3-10 minutes depending on the model size. Next, users run inference using AI.GENERATE_TEXT for language models or AI.GENERATE_EMBEDDING for embeddings, querying data straight from BigQuery tables."
Beyond deployment and basic inference, the service also manages the endpoint lifecycle: an endpoint_idle_ttl option governs automatic teardown of idle endpoints, and ALTER MODEL supports manual undeployment. For production workloads, configuration options cover machine types, replica counts, endpoint idle times, and Compute Engine GPU reservations.
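The lifecycle controls sketch out similarly, again under assumed syntax: endpoint_idle_ttl and ALTER MODEL are named above, but the INTERVAL value type and the deployed = FALSE option are hypothetical placeholders for whatever the preview actually defines.

    -- Deploy with an idle TTL so the endpoint is reclaimed automatically
    -- after an hour without traffic (the INTERVAL value type is assumed).
    CREATE OR REPLACE MODEL `my_project.my_dataset.minilm_embedder`
    OPTIONS (
      hugging_face_model_id = 'sentence-transformers/all-MiniLM-L6-v2',
      endpoint_idle_ttl = INTERVAL 1 HOUR
    );

    -- Undeploy manually; deployed = FALSE is a hypothetical placeholder
    -- for the actual undeploy clause in the preview documentation.
    ALTER MODEL `my_project.my_dataset.minilm_embedder`
    SET OPTIONS (deployed = FALSE);

Since ALTER MODEL modifies rather than drops the model, undeploying presumably releases the serving compute while leaving the model definition in place in the dataset.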
Read at InfoQ