成都打造乡村旅游、山地旅游等6个产业集群项目
百度 移动互联网给每个人带来或多或少的好处或者红利,我们首先利用好当下的一切,再去迈更快的一步,比如说3D打印机,未来的全球脑,还有机器人,这些我们都要努力,首先要利用好当下,过好当下,可能给予未来更好,感恩大家。See recent articles
Showing new listings for Tuesday, 5 August 2025
- [1] arXiv:2508.01136 [pdf, html, other]
-
Title: DBAIOps: A Reasoning LLM-Enhanced Database Operation and Maintenance System using Knowledge GraphsComments: DBAIOps supports 25 database systems and has been deployed in 20 real-world scenarios, covering domains like finance, energy, and healthcare. See website at: this http URL; See code at: this http URLSubjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
The operation and maintenance (O&M) of database systems is critical to ensuring system availability and performance, typically requiring expert experience (e.g., identifying metric-to-anomaly relations) for effective diagnosis and recovery. However, existing automatic database O&M methods, including commercial products, cannot effectively utilize expert experience. On the one hand, rule-based methods only support basic O&M tasks (e.g., metric-based anomaly detection), which are mostly numerical equations and cannot effectively incorporate literal O&M experience (e.g., troubleshooting guidance in manuals). On the other hand, LLM-based methods, which retrieve fragmented information (e.g., standard documents + RAG), often generate inaccurate or generic results. To address these limitations, we present DBAIOps, a novel hybrid database O&M system that combines reasoning LLMs with knowledge graphs to achieve DBA-style diagnosis. First, DBAIOps introduces a heterogeneous graph model for representing the diagnosis experience, and proposes a semi-automatic graph construction algorithm to build that graph from thousands of documents. Second, DBAIOps develops a collection of (800+) reusable anomaly models that identify both directly alerted metrics and implicitly correlated experience and metrics. Third, for each anomaly, DBAIOps proposes a two-stage graph evolution mechanism to explore relevant diagnosis paths and identify missing relations automatically. It then leverages a reasoning LLM (e.g., DeepSeek-R1) to infer root causes and generate clear diagnosis reports for both DBAs and common users. Our evaluation over four mainstream database systems (Oracle, MySQL, PostgreSQL, and DM8) demonstrates that DBAIOps outperforms state-of-the-art baselines, 34.85% and 47.22% higher in root cause and human evaluation accuracy, respectively.
- [2] arXiv:2508.01405 [pdf, html, other]
-
Title: Balancing the Blend: An Experimental Analysis of Trade-offs in Hybrid SearchSubjects: Databases (cs.DB)
Hybrid search, the integration of lexical and semantic retrieval, has become a cornerstone of modern information retrieval systems, driven by demanding applications like Retrieval-Augmented Generation (RAG). The architectural design space for these systems is vast and complex, yet a systematic, empirical understanding of the trade-offs among their core components--retrieval paradigms, combination schemes, and re-ranking methods--is critically lacking. To address this, and informed by our experience building the Infinity open-source database, we present the first systematic benchmark of advanced hybrid search architectures. Our framework evaluates four retrieval paradigms--Full-Text Search (FTS), Sparse Vector Search (SVS), Dense Vector Search (DVS), and Tensor Search (TenS)--benchmarking their combinations and re-ranking strategies across 11 real-world datasets. Our results reveal three key findings for practitioners and researchers: (1) A "weakest link" phenomenon, where a single underperforming retrieval path can disproportionately degrade overall accuracy, highlighting the need for path-wise quality assessment before fusion. (2) A data-driven map of the performance trade-offs, demonstrating that optimal configurations depend heavily on resource constraints and data characteristics, moving beyond a one-size-fits-all approach. (3) The identification of Tensor-based Re-ranking Fusion (TRF) as a high-efficacy alternative to mainstream fusion methods, offering the semantic power of tensor search at a fraction of the computational and memory cost. Our findings offer concrete guidelines for designing the next generation of adaptive, scalable hybrid search systems while also identifying key directions for future research.
- [3] arXiv:2508.01931 [pdf, html, other]
-
Title: Marlin: Efficient Coordination for Autoscaling Cloud DBMS (Extended Version)Subjects: Databases (cs.DB)
Modern cloud databases are shifting from converged architectures to storage disaggregation, enabling independent scaling and billing of compute and storage. However, cloud databases still rely on external, converged coordination services (e.g., ZooKeeper) for their control planes. These services are effectively lightweight databases optimized for low-volume metadata. As the control plane scales in the cloud, this approach faces similar limitations as converged databases did before storage disaggregation: scalability bottlenecks, low cost efficiency, and increased operational burden.
We propose to disaggregate the cluster coordination to achieve the same benefits that storage disaggregation brought to modern cloud DBMSs. We present Marlin, a cloud-native coordination mechanism that fully embraces storage disaggregation. Marlin eliminates the need for external coordination services by consolidating coordination functionality into the existing cloud-native database it manages. To achieve failover without an external coordination service, Marlin allows cross-node modifications on coordination states. To ensure data consistency, Marlin employs transactions to manage both coordination and application states and introduces MarlinCommit, an optimized commit protocol that ensures strong transactional guarantees even under cross-node modifications. Our evaluations demonstrate that Marlin improves cost efficiency by up to 4.4x and reduces reconfiguration duration by up to 4.9x compared to converged coordination solutions. - [4] arXiv:2508.02280 [pdf, html, other]
-
Title: OnPair: Short Strings Compression for Fast Random AccessSubjects: Databases (cs.DB)
We present OnPair, a dictionary-based compression algorithm designed to meet the needs of in-memory database systems that require both high compression and fast random access. Existing methods either achieve strong compression ratios at significant computational and memory cost (e.g., BPE) or prioritize speed at the expense of compression quality (e.g., FSST). OnPair bridges this gap by employing a cache-friendly dictionary construction technique that incrementally merges frequent adjacent substrings in a single sequential pass over a data sample. This enables fast, memory-efficient training without tracking global pair positions, as required by traditional BPE. We also introduce OnPair16, a variant that limits dictionary entries to 16 bytes, enabling faster parsing via optimized longest prefix matching. Both variants compress strings independently, supporting fine-grained random access without block-level overhead. Experiments on real-world datasets show that OnPair and OnPair16 achieve compression ratios comparable to BPE while significantly improving compression speed and memory usage.
- [5] arXiv:2508.02458 [pdf, html, other]
-
Title: From Stimuli to Minds: Enhancing Psychological Reasoning in LLMs via Bilateral Reinforcement LearningSubjects: Databases (cs.DB)
Large Language Models show promise in emotion understanding, social reasoning, and empathy, yet they struggle with psychologically grounded tasks that require inferring implicit mental states in context-rich, ambiguous settings. These limitations arise from the absence of theory-aligned supervision and the difficulty of capturing nuanced mental processes in real-world narratives. To address this gap, we leverage expert-labeled, psychologically rich scenarios and propose a trajectory-aware reinforcement learning framework that explicitly imitates expert psychological thought patterns. By integrating real-world stimuli with structured reasoning guidance, our approach enables compact models to internalize social-cognitive principles, perform nuanced psychological inference, and support continual self-improvement. Comprehensive experiments across multiple benchmarks further demonstrate that our models achieve expert-level interpretive capabilities, exhibiting strong out-of-distribution generalization and robust continual learning across diverse, challenging, and psychologically grounded tasks.
- [6] arXiv:2508.02508 [pdf, html, other]
-
Title: M2: An Analytic System with Specialized Storage Engines for Multi-Model WorkloadsSubjects: Databases (cs.DB)
Modern data analytic workloads increasingly require handling multiple data models simultaneously. Two primary approaches meet this need: polyglot persistence and multi-model database systems. Polyglot persistence employs a coordinator program to manage several independent database systems but suffers from high communication costs due to its physically disaggregated architecture. Meanwhile, existing multi-model database systems rely on a single storage engine optimized for a specific data model, resulting in inefficient processing across diverse data models. To address these limitations, we present M2, a multi-model analytic system with integrated storage engines. M2 treats all data models as first-class entities, composing query plans that incorporate operations across models. To effectively combine data from different models, the system introduces a specialized inter-model join algorithm called multi-stage hash join. Our evaluation demonstrates that M2 outperforms existing approaches by up to 188x speedup on multi-model analytics, confirming the effectiveness of our proposed techniques.
- [7] arXiv:2508.02548 [pdf, other]
-
Title: The KG-ER Conceptual Schema LanguageEnrico Franconi, Beno?t Groz, Jan Hidders, Nina Pardal, S?awek Staworko, Jan Van den Bussche, Piotr WieczorekSubjects: Databases (cs.DB); Artificial Intelligence (cs.AI)
We propose KG-ER, a conceptual schema language for knowledge graphs that describes the structure of knowledge graphs independently of their representation (relational databases, property graphs, RDF) while helping to capture the semantics of the information stored in a knowledge graph.
New submissions (showing 7 of 7 entries)
- [8] arXiv:2508.01108 (cross-list from cs.DS) [pdf, html, other]
-
Title: Efficient Direct-Access Ranked RetrievalSubjects: Data Structures and Algorithms (cs.DS); Computational Geometry (cs.CG); Databases (cs.DB)
We study the problem of Direct-Access Ranked Retrieval (DAR) for interactive data tooling, where evolving data exploration practices, combined with large-scale and high-dimensional datasets, create new challenges. DAR concerns the problem of enabling efficient access to arbitrary rank positions according to a ranking function, without enumerating all preceding tuples. To address this need, we formalize the DAR problem and propose a theoretically efficient algorithm based on geometric arrangements, achieving logarithmic query time. However, this method suffers from exponential space complexity in high dimensions. Therefore, we develop a second class of algorithms based on $\varepsilon$-sampling, which consume a linear space. Since exactly locating the tuple at a specific rank is challenging due to its connection to the range counting problem, we introduce a relaxed variant called Conformal Set Ranked Retrieval (CSR), which returns a small subset guaranteed to contain the target tuple. To solve the CSR problem efficiently, we define an intermediate problem, Stripe Range Retrieval (SRR), and design a hierarchical sampling data structure tailored for narrow-range queries. Our method achieves practical scalability in both data size and dimensionality. We prove near-optimal bounds on the efficiency of our algorithms and validate their performance through extensive experiments on real and synthetic datasets, demonstrating scalability to millions of tuples and hundreds of dimensions.
- [9] arXiv:2508.01244 (cross-list from cs.SI) [pdf, html, other]
-
Title: Effective and Efficient Conductance-based Community Search at Billion ScaleSubjects: Social and Information Networks (cs.SI); Databases (cs.DB)
Community search is a widely studied semi-supervised graph clustering problem, retrieving a high-quality connected subgraph containing the user-specified query vertex. However, existing methods primarily focus on cohesiveness within the community but ignore the sparsity outside the community, obtaining sub-par results. Inspired by this, we adopt the well-known conductance metric to measure the quality of a community and introduce a novel problem of conductance-based community search (CCS). CCS aims at finding a subgraph with the smallest conductance among all connected subgraphs that contain the query vertex. We prove that the CCS problem is NP-hard. To efficiently query CCS, a four-stage subgraph-conductance-based community search algorithm, SCCS, is proposed. Specifically, we first greatly reduce the entire graph using local sampling techniques. Then, a three-stage local optimization strategy is employed to continuously refine the community quality. Namely, we first utilize a seeding strategy to obtain an initial community to enhance its internal cohesiveness. Then, we iteratively add qualified vertices in the expansion stage to guarantee the internal cohesiveness and external sparsity of the community. Finally, we gradually remove unqualified vertices during the verification stage. Extensive experiments on real-world datasets containing one billion-scale graph and synthetic datasets show the effectiveness, efficiency, and scalability of our solutions.
- [10] arXiv:2508.01856 (cross-list from cs.DC) [pdf, other]
-
Title: Efficient Byzantine Consensus MechanismBased on Reputation in IoT BlockchainJournal-ref: Hindawi Wireless Communications and Mobile Computing 2021Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Databases (cs.DB); Software Engineering (cs.SE)
Blockchain technology has advanced rapidly in recent years and is now widely used in a variety of fields. Blockchain appears to be one of the best solutions for managing massive heterogeneous devices while achieving advanced data security and data reputation, particularly in the field of large-scale IoT (Internet of Things) networks. Despite the numerous advantages, there are still challenges while deploying IoT applications on blockchain systems due to the limited storage, power, and computing capability of IoT devices, and some of these problems are caused by the consensus algorithm, which plays a significant role in blockchain systems by ensuring overall system reliability and robustness. Nonetheless, most existing consensus algorithms are prone to poor node reliability, low transaction per second (TPS) rates, and scalability issues. Aiming at some critical problems in the existing consensus algorithms, this paper proposes the Efficient Byzantine Reputation-based Consensus (EBRC) mechanism to resolve the issues raised above. In comparison to traditional algorithms, we reinvented ways to evaluate node reliability and robustness and manage active nodes. Our experiments show that the EBRC algorithm has lower consensus delay, higher throughput, improved security, and lower verification costs. It offers new reference ideas for solving the Internet of Things+blockchain+Internet court construction problem.
- [11] arXiv:2508.01871 (cross-list from cs.AI) [pdf, html, other]
-
Title: Multi-turn Natural Language to Graph Query Language TranslationComments: 21 pagesSubjects: Artificial Intelligence (cs.AI); Databases (cs.DB)
In recent years, research on transforming natural language into graph query language (NL2GQL) has been increasing. Most existing methods focus on single-turn transformation from NL to GQL. In practical applications, user interactions with graph databases are typically multi-turn, dynamic, and context-dependent. While single-turn methods can handle straightforward queries, more complex scenarios often require users to iteratively adjust their queries, investigate the connections between entities, or request additional details across multiple dialogue turns. Research focused on single-turn conversion fails to effectively address multi-turn dialogues and complex context dependencies. Additionally, the scarcity of high-quality multi-turn NL2GQL datasets further hinders the progress of this field. To address this challenge, we propose an automated method for constructing multi-turn NL2GQL datasets based on Large Language Models (LLMs) , and apply this method to develop the MTGQL dataset, which is constructed from a financial market graph database and will be publicly released for future research. Moreover, we propose three types of baseline methods to assess the effectiveness of multi-turn NL2GQL translation, thereby laying a solid foundation for future research.
- [12] arXiv:2508.02084 (cross-list from cs.DL) [pdf, other]
-
Title: SSBD Ontology: A Two-Tier Approach for Interoperable Bioimaging MetadataComments: Accepted to the 24th International Semantic Web Conference Resource Track (ISWC 2025)Subjects: Digital Libraries (cs.DL); Artificial Intelligence (cs.AI); Databases (cs.DB)
Advanced bioimaging technologies have enabled the large-scale acquisition of multidimensional data, yet effective metadata management and interoperability remain significant challenges. To address these issues, we propose a new ontology-driven framework for the Systems Science of Biological Dynamics Database (SSBD) that adopts a two-tier architecture. The core layer provides a class-centric structure referencing existing biomedical ontologies, supporting both SSBD:repository -- which focuses on rapid dataset publication with minimal metadata -- and SSBD:database, which is enhanced with biological and imaging-related annotations. Meanwhile, the instance layer represents actual imaging dataset information as Resource Description Framework individuals that are explicitly linked to the core classes. This layered approach aligns flexible instance data with robust ontological classes, enabling seamless integration and advanced semantic queries. By coupling flexibility with rigor, the SSBD Ontology promotes interoperability, data reuse, and the discovery of novel biological mechanisms. Moreover, our solution aligns with the Recommended Metadata for Biological Images guidelines and fosters compatibility. Ultimately, our approach contributes to establishing a Findable, Accessible, Interoperable, and Reusable data ecosystem within the bioimaging community.
- [13] arXiv:2508.02091 (cross-list from cs.LG) [pdf, html, other]
-
Title: CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor SearchComments: Preprint VersionSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Databases (cs.DB)
Approximate nearest-neighbor search (ANNS) algorithms have become increasingly critical for recent AI applications, particularly in retrieval-augmented generation (RAG) and agent-based LLM applications. In this paper, we present CRINN, a new paradigm for ANNS algorithms. CRINN treats ANNS optimization as a reinforcement learning problem where execution speed serves as the reward signal. This approach enables the automatic generation of progressively faster ANNS implementations while maintaining accuracy constraints. Our experimental evaluation demonstrates CRINN's effectiveness across six widely-used NNS benchmark datasets. When compared against state-of-the-art open-source ANNS algorithms, CRINN achieves best performance on three of them (GIST-960-Euclidean, MNIST-784-Euclidean, and GloVe-25-angular), and tied for first place on two of them (SIFT-128-Euclidean and GloVe-25-angular). The implications of CRINN's success reach well beyond ANNS optimization: It validates that LLMs augmented with reinforcement learning can function as an effective tool for automating sophisticated algorithmic optimizations that demand specialized knowledge and labor-intensive manual this http URL can be found at this http URL
- [14] arXiv:2508.02270 (cross-list from cs.LG) [pdf, html, other]
-
Title: Skeleton-Guided Learning for Shortest Path SearchSubjects: Machine Learning (cs.LG); Databases (cs.DB)
Shortest path search is a core operation in graph-based applications, yet existing methods face important limitations. Classical algorithms such as Dijkstra's and A* become inefficient as graphs grow more complex, while index-based techniques often require substantial preprocessing and storage. Recent learning-based approaches typically focus on spatial graphs and rely on context-specific features like geographic coordinates, limiting their general applicability. We propose a versatile learning-based framework for shortest path search on generic graphs, without requiring domain-specific features. At the core of our approach is the construction of a skeleton graph that captures multi-level distance and hop information in a compact form. A Skeleton Graph Neural Network (SGNN) operates on this structure to learn node embeddings and predict distances and hop lengths between node pairs. These predictions support LSearch, a guided search algorithm that uses model-driven pruning to reduce the search space while preserving accuracy. To handle larger graphs, we introduce a hierarchical training strategy that partitions the graph into subgraphs with individually trained SGNNs. This structure enables HLSearch, an extension of our method for efficient path search across graph partitions. Experiments on five diverse real-world graphs demonstrate that our framework achieves strong performance across graph types, offering a flexible and effective solution for learning-based shortest path search.
Cross submissions (showing 7 of 7 entries)
- [15] arXiv:2501.16607 (replaced) [pdf, other]
-
Title: MCTS-SQL: Light-Weight LLMs can Master the Text-to-SQL through Monte Carlo Tree SearchComments: 15 pages, 6 figuresSubjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Programming Languages (cs.PL)
Text-to-SQL is a fundamental yet challenging task in the NLP area, aiming at translating natural language questions into SQL queries. While recent advances in large language models have greatly improved performance, most existing approaches depend on models with tens of billions of parameters or costly APIs, limiting their applicability in resource-constrained environments. For real world, especially on edge devices, it is crucial for Text-to-SQL to ensure cost-effectiveness. Therefore, enabling the light-weight models for Text-to-SQL is of great practical significance. However, smaller LLMs often struggle with complicated user instruction, redundant schema linking or syntax correctness. To address these challenges, we propose MCTS-SQL, a novel framework that uses Monte Carlo Tree Search to guide SQL generation through multi-step refinement. Since the light-weight models' weak performance of single-shot prediction, we generate better results through several trials with feedback. However, directly applying MCTS-based methods inevitably leads to significant time and computational overhead. Driven by this issue, we propose a token-level prefix-cache mechanism that stores prior information during iterations, effectively improved the execution speed. Experiments results on the SPIDER and BIRD benchmarks demonstrate the effectiveness of our approach. Using a small open-source Qwen2.5-Coder-1.5B, our method outperforms ChatGPT-3.5. When leveraging a more powerful model Gemini 2.5 to explore the performance upper bound, we achieved results competitive with the SOTA. Our findings demonstrate that even small models can be effectively deployed in practical Text-to-SQL systems with the right strategy.
- [16] arXiv:2506.09226 (replaced) [pdf, html, other]
-
Title: Terabyte-Scale Analytics in the Blink of an EyeSubjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
For the past two decades, the DB community has devoted substantial research to take advantage of cheap clusters of machines for distributed data analytics -- we believe that we are at the beginning of a paradigm shift. The scaling laws and popularity of AI models lead to the deployment of incredibly powerful GPU clusters in commercial data centers. Compared to CPU-only solutions, these clusters deliver impressive improvements in per-node compute, memory bandwidth, and inter-node interconnect performance. In this paper, we study the problem of scaling analytical SQL queries on distributed clusters of GPUs, with the stated goal of establishing an upper bound on the likely performance gains. To do so, we build a prototype designed to maximize performance by leveraging ML/HPC best practices, such as group communication primitives for cross-device data movements. This allows us to conduct thorough performance experimentation to point our community towards a massive performance opportunity of at least 60$\times$. To make these gains more relatable, before you can blink twice, our system can run all 22 queries of TPC-H at a 1TB scale factor!
- [17] arXiv:2312.15959 (replaced) [pdf, html, other]
-
Title: Range (Rényi) Entropy Queries and PartitioningSubjects: Data Structures and Algorithms (cs.DS); Databases (cs.DB)
Data partitioning that maximizes/minimizes the Shannon entropy, or more generally the Rényi entropy is a crucial subroutine in data compression, columnar storage, and cardinality estimation algorithms. These partition algorithms can be accelerated if we have a data structure to compute the entropy in different subsets of data when the algorithm needs to decide what block to construct. Such a data structure will also be useful for data analysts exploring different subsets of data to identify areas of interest. While it is generally known how to compute the Shannon or the Rényi entropy of a discrete distribution in the offline or streaming setting efficiently, we focus on the query setting where we aim to efficiently derive the entropy among a subset of data that satisfy some linear predicates. We solve this problem in a typical setting when we deal with real data, where data items are geometric points and each requested area is a query (hyper)rectangle. More specifically, we consider a set $P$ of $n$ weighted and colored points in $\mathbb{R}^d$. For the range S-entropy (resp. R-entropy) query problem, the goal is to construct a low space data structure, such that given a query (hyper)rectangle $R$, it computes the Shannon (resp. Rényi) entropy based on the colors and the weights of the points in $P\cap R$, in sublinear time. We show conditional lower bounds proving that we cannot hope for data structures with near-linear space and near-constant query time for both the range S-entropy and R-entropy query problems. Then, we propose exact data structures for $d=1$ and $d>1$ with $o(n^{2d})$ space and $o(n)$ query time for both problems. Finally, we propose near linear space data structures for returning either an additive or a multiplicative approximation of the Shannon (resp. Rényi) entropy in $P\cap R$.
- [18] arXiv:2502.13422 (replaced) [pdf, other]
-
Title: Towards Question Answering over Large Semi-structured TablesSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Databases (cs.DB)
Table Question Answering (TableQA) attracts strong interests due to the prevalence of web information presented in the form of semi-structured tables. Despite many efforts, TableQA over large tables remains an open challenge. This is because large tables may overwhelm models that try to comprehend them in full to locate question answers. Recent studies reduce input table size by decomposing tables into smaller, question-relevant sub-tables via generating programs to parse the tables. However, such solutions are subject to program generation and execution errors and are difficult to ensure decomposition quality. To address this issue, we propose TaDRe, a TableQA model that incorporates both pre- and post-table decomposition refinements to ensure table decomposition quality, hence achieving highly accurate TableQA results. To evaluate TaDRe, we construct two new large-table TableQA benchmarks via LLM-driven table expansion and QA pair generation. Extensive experiments on both the new and public benchmarks show that TaDRe achieves state-of-the-art performance on large-table TableQA tasks.
- [19] arXiv:2505.07828 (replaced) [pdf, html, other]
-
Title: AI-Based Crypto Tokens: The Illusion of Decentralized AI?Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Databases (cs.DB)
The convergence of blockchain and artificial intelligence (AI) has led to the emergence of AI-based tokens, which are cryptographic assets designed to power decentralized AI platforms and services. This paper provides a comprehensive review of leading AI-token projects, examining their technical architectures, token utilities, consensus mechanisms, and underlying business models. We explore how these tokens operate across various blockchain ecosystems and assess the extent to which they offer value beyond traditional centralized AI services. Based on this assessment, our analysis identifies several core limitations. From a technical perspective, many platforms depend extensively on off-chain computation, exhibit limited capabilities for on-chain intelligence, and encounter significant scalability challenges. From a business perspective, many models appear to replicate centralized AI service structures, simply adding token-based payment and governance layers without delivering truly novel value. In light of these challenges, we also examine emerging developments that may shape the next phase of decentralized AI systems. These include approaches for on-chain verification of AI outputs, blockchain-enabled federated learning, and more robust incentive frameworks. Collectively, while emerging innovations offer pathways to strengthen decentralized AI ecosystems, significant gaps remain between the promises and the realities of current AI-token implementations. Our findings contribute to a growing body of research at the intersection of AI and blockchain, highlighting the need for critical evaluation and more grounded approaches as the field continues to evolve.