成都打造乡村旅游、山地旅游等6个产业集群项目

New submissions
Cross-lists
Replacements

百度移动互联网给每个人带来或多或少的好处或者红利，我们首先利用好当下的一切，再去迈更快的一步，比如说3D打印机，未来的全球脑，还有机器人，这些我们都要努力，首先要利用好当下，过好当下，可能给予未来更好，感恩大家。

See recent articles

Showing new listings for Tuesday, 5 August 2025

Total of 19 entries

Showing up to 2000 entries per page: fewer | more | all

[1] arXiv:2508.01136 [pdf, html, other]: Title: DBAIOps: A Reasoning LLM-Enhanced Database Operation and Maintenance System using Knowledge Graphs

Wei Zhou, Peng Sun, Xuanhe Zhou, Qianglei Zang, Ji Xu, Tieying Zhang, Guoliang Li, Fan Wu

Comments: DBAIOps supports 25 database systems and has been deployed in 20 real-world scenarios, covering domains like finance, energy, and healthcare. See website at: this http URL; See code at: this http URL

Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)

The operation and maintenance (O&M) of database systems is critical to ensuring system availability and performance, typically requiring expert experience (e.g., identifying metric-to-anomaly relations) for effective diagnosis and recovery. However, existing automatic database O&M methods, including commercial products, cannot effectively utilize expert experience. On the one hand, rule-based methods only support basic O&M tasks (e.g., metric-based anomaly detection), which are mostly numerical equations and cannot effectively incorporate literal O&M experience (e.g., troubleshooting guidance in manuals). On the other hand, LLM-based methods, which retrieve fragmented information (e.g., standard documents + RAG), often generate inaccurate or generic results. To address these limitations, we present DBAIOps, a novel hybrid database O&M system that combines reasoning LLMs with knowledge graphs to achieve DBA-style diagnosis. First, DBAIOps introduces a heterogeneous graph model for representing the diagnosis experience, and proposes a semi-automatic graph construction algorithm to build that graph from thousands of documents. Second, DBAIOps develops a collection of (800+) reusable anomaly models that identify both directly alerted metrics and implicitly correlated experience and metrics. Third, for each anomaly, DBAIOps proposes a two-stage graph evolution mechanism to explore relevant diagnosis paths and identify missing relations automatically. It then leverages a reasoning LLM (e.g., DeepSeek-R1) to infer root causes and generate clear diagnosis reports for both DBAs and common users. Our evaluation over four mainstream database systems (Oracle, MySQL, PostgreSQL, and DM8) demonstrates that DBAIOps outperforms state-of-the-art baselines, 34.85% and 47.22% higher in root cause and human evaluation accuracy, respectively.
[2] arXiv:2508.01405 [pdf, html, other]: Title: Balancing the Blend: An Experimental Analysis of Trade-offs in Hybrid Search

Mengzhao Wang, Boyu Tan, Yunjun Gao, Hai Jin, Yingfeng Zhang, Xiangyu Ke, Xiaoliang Xu, Yifan Zhu

Subjects: Databases (cs.DB)

Hybrid search, the integration of lexical and semantic retrieval, has become a cornerstone of modern information retrieval systems, driven by demanding applications like Retrieval-Augmented Generation (RAG). The architectural design space for these systems is vast and complex, yet a systematic, empirical understanding of the trade-offs among their core components--retrieval paradigms, combination schemes, and re-ranking methods--is critically lacking. To address this, and informed by our experience building the Infinity open-source database, we present the first systematic benchmark of advanced hybrid search architectures. Our framework evaluates four retrieval paradigms--Full-Text Search (FTS), Sparse Vector Search (SVS), Dense Vector Search (DVS), and Tensor Search (TenS)--benchmarking their combinations and re-ranking strategies across 11 real-world datasets. Our results reveal three key findings for practitioners and researchers: (1) A "weakest link" phenomenon, where a single underperforming retrieval path can disproportionately degrade overall accuracy, highlighting the need for path-wise quality assessment before fusion. (2) A data-driven map of the performance trade-offs, demonstrating that optimal configurations depend heavily on resource constraints and data characteristics, moving beyond a one-size-fits-all approach. (3) The identification of Tensor-based Re-ranking Fusion (TRF) as a high-efficacy alternative to mainstream fusion methods, offering the semantic power of tensor search at a fraction of the computational and memory cost. Our findings offer concrete guidelines for designing the next generation of adaptive, scalable hybrid search systems while also identifying key directions for future research.
[3] arXiv:2508.01931 [pdf, html, other]: Title: Marlin: Efficient Coordination for Autoscaling Cloud DBMS (Extended Version)

Wenjie Hu, Guanzhou Hu, Mahesh Balakrishnan, Xiangyao Yu

Subjects: Databases (cs.DB)

Modern cloud databases are shifting from converged architectures to storage disaggregation, enabling independent scaling and billing of compute and storage. However, cloud databases still rely on external, converged coordination services (e.g., ZooKeeper) for their control planes. These services are effectively lightweight databases optimized for low-volume metadata. As the control plane scales in the cloud, this approach faces similar limitations as converged databases did before storage disaggregation: scalability bottlenecks, low cost efficiency, and increased operational burden.
We propose to disaggregate the cluster coordination to achieve the same benefits that storage disaggregation brought to modern cloud DBMSs. We present Marlin, a cloud-native coordination mechanism that fully embraces storage disaggregation. Marlin eliminates the need for external coordination services by consolidating coordination functionality into the existing cloud-native database it manages. To achieve failover without an external coordination service, Marlin allows cross-node modifications on coordination states. To ensure data consistency, Marlin employs transactions to manage both coordination and application states and introduces MarlinCommit, an optimized commit protocol that ensures strong transactional guarantees even under cross-node modifications. Our evaluations demonstrate that Marlin improves cost efficiency by up to 4.4x and reduces reconfiguration duration by up to 4.9x compared to converged coordination solutions.
[4] arXiv:2508.02280 [pdf, html, other]: Title: OnPair: Short Strings Compression for Fast Random Access

Francesco Gargiulo, Rossano Venturini

Subjects: Databases (cs.DB)

We present OnPair, a dictionary-based compression algorithm designed to meet the needs of in-memory database systems that require both high compression and fast random access. Existing methods either achieve strong compression ratios at significant computational and memory cost (e.g., BPE) or prioritize speed at the expense of compression quality (e.g., FSST). OnPair bridges this gap by employing a cache-friendly dictionary construction technique that incrementally merges frequent adjacent substrings in a single sequential pass over a data sample. This enables fast, memory-efficient training without tracking global pair positions, as required by traditional BPE. We also introduce OnPair16, a variant that limits dictionary entries to 16 bytes, enabling faster parsing via optimized longest prefix matching. Both variants compress strings independently, supporting fine-grained random access without block-level overhead. Experiments on real-world datasets show that OnPair and OnPair16 achieve compression ratios comparable to BPE while significantly improving compression speed and memory usage.
[5] arXiv:2508.02458 [pdf, html, other]: Title: From Stimuli to Minds: Enhancing Psychological Reasoning in LLMs via Bilateral Reinforcement Learning

Yichao Feng

Subjects: Databases (cs.DB)

Large Language Models show promise in emotion understanding, social reasoning, and empathy, yet they struggle with psychologically grounded tasks that require inferring implicit mental states in context-rich, ambiguous settings. These limitations arise from the absence of theory-aligned supervision and the difficulty of capturing nuanced mental processes in real-world narratives. To address this gap, we leverage expert-labeled, psychologically rich scenarios and propose a trajectory-aware reinforcement learning framework that explicitly imitates expert psychological thought patterns. By integrating real-world stimuli with structured reasoning guidance, our approach enables compact models to internalize social-cognitive principles, perform nuanced psychological inference, and support continual self-improvement. Comprehensive experiments across multiple benchmarks further demonstrate that our models achieve expert-level interpretive capabilities, exhibiting strong out-of-distribution generalization and robust continual learning across diverse, challenging, and psychologically grounded tasks.
[6] arXiv:2508.02508 [pdf, html, other]: Title: M2: An Analytic System with Specialized Storage Engines for Multi-Model Workloads

Kyoseung Koo, Bogyeong Kim, Bongki Moon

Subjects: Databases (cs.DB)

Modern data analytic workloads increasingly require handling multiple data models simultaneously. Two primary approaches meet this need: polyglot persistence and multi-model database systems. Polyglot persistence employs a coordinator program to manage several independent database systems but suffers from high communication costs due to its physically disaggregated architecture. Meanwhile, existing multi-model database systems rely on a single storage engine optimized for a specific data model, resulting in inefficient processing across diverse data models. To address these limitations, we present M2, a multi-model analytic system with integrated storage engines. M2 treats all data models as first-class entities, composing query plans that incorporate operations across models. To effectively combine data from different models, the system introduces a specialized inter-model join algorithm called multi-stage hash join. Our evaluation demonstrates that M2 outperforms existing approaches by up to 188x speedup on multi-model analytics, confirming the effectiveness of our proposed techniques.
[7] arXiv:2508.02548 [pdf, other]: Title: The KG-ER Conceptual Schema Language

Enrico Franconi, Beno?t Groz, Jan Hidders, Nina Pardal, S?awek Staworko, Jan Van den Bussche, Piotr Wieczorek

Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

We propose KG-ER, a conceptual schema language for knowledge graphs that describes the structure of knowledge graphs independently of their representation (relational databases, property graphs, RDF) while helping to capture the semantics of the information stored in a knowledge graph.

[8] arXiv:2508.01108 (cross-list from cs.DS) [pdf, html, other]: Title: Efficient Direct-Access Ranked Retrieval

Mohsen Dehghankar, Raghav Mittal, Suraj Shetiya, Abolfazl Asudeh, Gautam Das

Subjects: Data Structures and Algorithms (cs.DS); Computational Geometry (cs.CG); Databases (cs.DB)

We study the problem of Direct-Access Ranked Retrieval (DAR) for interactive data tooling, where evolving data exploration practices, combined with large-scale and high-dimensional datasets, create new challenges. DAR concerns the problem of enabling efficient access to arbitrary rank positions according to a ranking function, without enumerating all preceding tuples. To address this need, we formalize the DAR problem and propose a theoretically efficient algorithm based on geometric arrangements, achieving logarithmic query time. However, this method suffers from exponential space complexity in high dimensions. Therefore, we develop a second class of algorithms based on $\varepsilon$-sampling, which consume a linear space. Since exactly locating the tuple at a specific rank is challenging due to its connection to the range counting problem, we introduce a relaxed variant called Conformal Set Ranked Retrieval (CSR), which returns a small subset guaranteed to contain the target tuple. To solve the CSR problem efficiently, we define an intermediate problem, Stripe Range Retrieval (SRR), and design a hierarchical sampling data structure tailored for narrow-range queries. Our method achieves practical scalability in both data size and dimensionality. We prove near-optimal bounds on the efficiency of our algorithms and validate their performance through extensive experiments on real and synthetic datasets, demonstrating scalability to millions of tuples and hundreds of dimensions.
[9] arXiv:2508.01244 (cross-list from cs.SI) [pdf, html, other]: Title: Effective and Efficient Conductance-based Community Search at Billion Scale

Longlong Lin, Yue He, Wei Chen, Pingpeng Yuan, Rong-Hua Li, Tao Jia

Subjects: Social and Information Networks (cs.SI); Databases (cs.DB)

Community search is a widely studied semi-supervised graph clustering problem, retrieving a high-quality connected subgraph containing the user-specified query vertex. However, existing methods primarily focus on cohesiveness within the community but ignore the sparsity outside the community, obtaining sub-par results. Inspired by this, we adopt the well-known conductance metric to measure the quality of a community and introduce a novel problem of conductance-based community search (CCS). CCS aims at finding a subgraph with the smallest conductance among all connected subgraphs that contain the query vertex. We prove that the CCS problem is NP-hard. To efficiently query CCS, a four-stage subgraph-conductance-based community search algorithm, SCCS, is proposed. Specifically, we first greatly reduce the entire graph using local sampling techniques. Then, a three-stage local optimization strategy is employed to continuously refine the community quality. Namely, we first utilize a seeding strategy to obtain an initial community to enhance its internal cohesiveness. Then, we iteratively add qualified vertices in the expansion stage to guarantee the internal cohesiveness and external sparsity of the community. Finally, we gradually remove unqualified vertices during the verification stage. Extensive experiments on real-world datasets containing one billion-scale graph and synthetic datasets show the effectiveness, efficiency, and scalability of our solutions.
[10] arXiv:2508.01856 (cross-list from cs.DC) [pdf, other]: Title: Efficient Byzantine Consensus MechanismBased on Reputation in IoT Blockchain

Xu Yuan, Fang Luo, Muhammad Zeeshan Haider, Zhikui Chen, Yucheng Li

Journal-ref: Hindawi Wireless Communications and Mobile Computing 2021

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Databases (cs.DB); Software Engineering (cs.SE)

Blockchain technology has advanced rapidly in recent years and is now widely used in a variety of fields. Blockchain appears to be one of the best solutions for managing massive heterogeneous devices while achieving advanced data security and data reputation, particularly in the field of large-scale IoT (Internet of Things) networks. Despite the numerous advantages, there are still challenges while deploying IoT applications on blockchain systems due to the limited storage, power, and computing capability of IoT devices, and some of these problems are caused by the consensus algorithm, which plays a significant role in blockchain systems by ensuring overall system reliability and robustness. Nonetheless, most existing consensus algorithms are prone to poor node reliability, low transaction per second (TPS) rates, and scalability issues. Aiming at some critical problems in the existing consensus algorithms, this paper proposes the Efficient Byzantine Reputation-based Consensus (EBRC) mechanism to resolve the issues raised above. In comparison to traditional algorithms, we reinvented ways to evaluate node reliability and robustness and manage active nodes. Our experiments show that the EBRC algorithm has lower consensus delay, higher throughput, improved security, and lower verification costs. It offers new reference ideas for solving the Internet of Things+blockchain+Internet court construction problem.
[11] arXiv:2508.01871 (cross-list from cs.AI) [pdf, html, other]: Title: Multi-turn Natural Language to Graph Query Language Translation

Yuanyuan Liang, Lei Pan, Tingyu Xie, Yunshi Lan, Weining Qian

Comments: 21 pages

Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB)

In recent years, research on transforming natural language into graph query language (NL2GQL) has been increasing. Most existing methods focus on single-turn transformation from NL to GQL. In practical applications, user interactions with graph databases are typically multi-turn, dynamic, and context-dependent. While single-turn methods can handle straightforward queries, more complex scenarios often require users to iteratively adjust their queries, investigate the connections between entities, or request additional details across multiple dialogue turns. Research focused on single-turn conversion fails to effectively address multi-turn dialogues and complex context dependencies. Additionally, the scarcity of high-quality multi-turn NL2GQL datasets further hinders the progress of this field. To address this challenge, we propose an automated method for constructing multi-turn NL2GQL datasets based on Large Language Models (LLMs) , and apply this method to develop the MTGQL dataset, which is constructed from a financial market graph database and will be publicly released for future research. Moreover, we propose three types of baseline methods to assess the effectiveness of multi-turn NL2GQL translation, thereby laying a solid foundation for future research.
[12] arXiv:2508.02084 (cross-list from cs.DL) [pdf, other]: Title: SSBD Ontology: A Two-Tier Approach for Interoperable Bioimaging Metadata

Yuki Yamagata, Koji Kyoda, Hiroya Itoga, Emi Fujisawa, Shuichi Onami

Comments: Accepted to the 24th International Semantic Web Conference Resource Track (ISWC 2025)

Subjects: Digital Libraries (cs.DL); Artificial Intelligence (cs.AI); Databases (cs.DB)

Advanced bioimaging technologies have enabled the large-scale acquisition of multidimensional data, yet effective metadata management and interoperability remain significant challenges. To address these issues, we propose a new ontology-driven framework for the Systems Science of Biological Dynamics Database (SSBD) that adopts a two-tier architecture. The core layer provides a class-centric structure referencing existing biomedical ontologies, supporting both SSBD:repository -- which focuses on rapid dataset publication with minimal metadata -- and SSBD:database, which is enhanced with biological and imaging-related annotations. Meanwhile, the instance layer represents actual imaging dataset information as Resource Description Framework individuals that are explicitly linked to the core classes. This layered approach aligns flexible instance data with robust ontological classes, enabling seamless integration and advanced semantic queries. By coupling flexibility with rigor, the SSBD Ontology promotes interoperability, data reuse, and the discovery of novel biological mechanisms. Moreover, our solution aligns with the Recommended Metadata for Biological Images guidelines and fosters compatibility. Ultimately, our approach contributes to establishing a Findable, Accessible, Interoperable, and Reusable data ecosystem within the bioimaging community.
[13] arXiv:2508.02091 (cross-list from cs.LG) [pdf, html, other]: Title: CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search

Xiaoya Li, Xiaofei Sun, Albert Wang, Chris Shum, Jiwei Li

Comments: Preprint Version

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Databases (cs.DB)

Approximate nearest-neighbor search (ANNS) algorithms have become increasingly critical for recent AI applications, particularly in retrieval-augmented generation (RAG) and agent-based LLM applications. In this paper, we present CRINN, a new paradigm for ANNS algorithms. CRINN treats ANNS optimization as a reinforcement learning problem where execution speed serves as the reward signal. This approach enables the automatic generation of progressively faster ANNS implementations while maintaining accuracy constraints. Our experimental evaluation demonstrates CRINN's effectiveness across six widely-used NNS benchmark datasets. When compared against state-of-the-art open-source ANNS algorithms, CRINN achieves best performance on three of them (GIST-960-Euclidean, MNIST-784-Euclidean, and GloVe-25-angular), and tied for first place on two of them (SIFT-128-Euclidean and GloVe-25-angular). The implications of CRINN's success reach well beyond ANNS optimization: It validates that LLMs augmented with reinforcement learning can function as an effective tool for automating sophisticated algorithmic optimizations that demand specialized knowledge and labor-intensive manual this http URL can be found at this http URL
[14] arXiv:2508.02270 (cross-list from cs.LG) [pdf, html, other]: Title: Skeleton-Guided Learning for Shortest Path Search

Tiantian Liu, Xiao Li, Huan Li, Hua Lu, Christian S. Jensen, Jianliang Xu

Subjects: Machine Learning (cs.LG); Databases (cs.DB)

Shortest path search is a core operation in graph-based applications, yet existing methods face important limitations. Classical algorithms such as Dijkstra's and A* become inefficient as graphs grow more complex, while index-based techniques often require substantial preprocessing and storage. Recent learning-based approaches typically focus on spatial graphs and rely on context-specific features like geographic coordinates, limiting their general applicability. We propose a versatile learning-based framework for shortest path search on generic graphs, without requiring domain-specific features. At the core of our approach is the construction of a skeleton graph that captures multi-level distance and hop information in a compact form. A Skeleton Graph Neural Network (SGNN) operates on this structure to learn node embeddings and predict distances and hop lengths between node pairs. These predictions support LSearch, a guided search algorithm that uses model-driven pruning to reduce the search space while preserving accuracy. To handle larger graphs, we introduce a hierarchical training strategy that partitions the graph into subgraphs with individually trained SGNNs. This structure enables HLSearch, an extension of our method for efficient path search across graph partitions. Experiments on five diverse real-world graphs demonstrate that our framework achieves strong performance across graph types, offering a flexible and effective solution for learning-based shortest path search.

[15] arXiv:2501.16607 (replaced) [pdf, other]: Title: MCTS-SQL: Light-Weight LLMs can Master the Text-to-SQL through Monte Carlo Tree Search

Shuozhi Yuan, Limin Chen, Miaomiao Yuan, Jin Zhao

Comments: 15 pages, 6 figures

Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Programming Languages (cs.PL)

Text-to-SQL is a fundamental yet challenging task in the NLP area, aiming at translating natural language questions into SQL queries. While recent advances in large language models have greatly improved performance, most existing approaches depend on models with tens of billions of parameters or costly APIs, limiting their applicability in resource-constrained environments. For real world, especially on edge devices, it is crucial for Text-to-SQL to ensure cost-effectiveness. Therefore, enabling the light-weight models for Text-to-SQL is of great practical significance. However, smaller LLMs often struggle with complicated user instruction, redundant schema linking or syntax correctness. To address these challenges, we propose MCTS-SQL, a novel framework that uses Monte Carlo Tree Search to guide SQL generation through multi-step refinement. Since the light-weight models' weak performance of single-shot prediction, we generate better results through several trials with feedback. However, directly applying MCTS-based methods inevitably leads to significant time and computational overhead. Driven by this issue, we propose a token-level prefix-cache mechanism that stores prior information during iterations, effectively improved the execution speed. Experiments results on the SPIDER and BIRD benchmarks demonstrate the effectiveness of our approach. Using a small open-source Qwen2.5-Coder-1.5B, our method outperforms ChatGPT-3.5. When leveraging a more powerful model Gemini 2.5 to explore the performance upper bound, we achieved results competitive with the SOTA. Our findings demonstrate that even small models can be effectively deployed in practical Text-to-SQL systems with the right strategy.
[16] arXiv:2506.09226 (replaced) [pdf, html, other]: Title: Terabyte-Scale Analytics in the Blink of an Eye

Bowen Wu, Wei Cui, Carlo Curino, Matteo Interlandi, Rathijit Sen

Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

For the past two decades, the DB community has devoted substantial research to take advantage of cheap clusters of machines for distributed data analytics -- we believe that we are at the beginning of a paradigm shift. The scaling laws and popularity of AI models lead to the deployment of incredibly powerful GPU clusters in commercial data centers. Compared to CPU-only solutions, these clusters deliver impressive improvements in per-node compute, memory bandwidth, and inter-node interconnect performance. In this paper, we study the problem of scaling analytical SQL queries on distributed clusters of GPUs, with the stated goal of establishing an upper bound on the likely performance gains. To do so, we build a prototype designed to maximize performance by leveraging ML/HPC best practices, such as group communication primitives for cross-device data movements. This allows us to conduct thorough performance experimentation to point our community towards a massive performance opportunity of at least 60$\times$. To make these gains more relatable, before you can blink twice, our system can run all 22 queries of TPC-H at a 1TB scale factor!
[17] arXiv:2312.15959 (replaced) [pdf, html, other]: Title: Range (Rényi) Entropy Queries and Partitioning

Aryan Esmailpour, Sanjay Krishnan, Stavros Sintos

Subjects: Data Structures and Algorithms (cs.DS); Databases (cs.DB)

Data partitioning that maximizes/minimizes the Shannon entropy, or more generally the Rényi entropy is a crucial subroutine in data compression, columnar storage, and cardinality estimation algorithms. These partition algorithms can be accelerated if we have a data structure to compute the entropy in different subsets of data when the algorithm needs to decide what block to construct. Such a data structure will also be useful for data analysts exploring different subsets of data to identify areas of interest. While it is generally known how to compute the Shannon or the Rényi entropy of a discrete distribution in the offline or streaming setting efficiently, we focus on the query setting where we aim to efficiently derive the entropy among a subset of data that satisfy some linear predicates. We solve this problem in a typical setting when we deal with real data, where data items are geometric points and each requested area is a query (hyper)rectangle. More specifically, we consider a set $P$ of $n$ weighted and colored points in $\mathbb{R}^d$. For the range S-entropy (resp. R-entropy) query problem, the goal is to construct a low space data structure, such that given a query (hyper)rectangle $R$, it computes the Shannon (resp. Rényi) entropy based on the colors and the weights of the points in $P\cap R$, in sublinear time. We show conditional lower bounds proving that we cannot hope for data structures with near-linear space and near-constant query time for both the range S-entropy and R-entropy query problems. Then, we propose exact data structures for $d=1$ and $d>1$ with $o(n^{2d})$ space and $o(n)$ query time for both problems. Finally, we propose near linear space data structures for returning either an additive or a multiplicative approximation of the Shannon (resp. Rényi) entropy in $P\cap R$.
[18] arXiv:2502.13422 (replaced) [pdf, other]: Title: Towards Question Answering over Large Semi-structured Tables

Yuxiang Wang, Junhao Gan, Jianzhong Qi

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Databases (cs.DB)

Table Question Answering (TableQA) attracts strong interests due to the prevalence of web information presented in the form of semi-structured tables. Despite many efforts, TableQA over large tables remains an open challenge. This is because large tables may overwhelm models that try to comprehend them in full to locate question answers. Recent studies reduce input table size by decomposing tables into smaller, question-relevant sub-tables via generating programs to parse the tables. However, such solutions are subject to program generation and execution errors and are difficult to ensure decomposition quality. To address this issue, we propose TaDRe, a TableQA model that incorporates both pre- and post-table decomposition refinements to ensure table decomposition quality, hence achieving highly accurate TableQA results. To evaluate TaDRe, we construct two new large-table TableQA benchmarks via LLM-driven table expansion and QA pair generation. Extensive experiments on both the new and public benchmarks show that TaDRe achieves state-of-the-art performance on large-table TableQA tasks.
[19] arXiv:2505.07828 (replaced) [pdf, html, other]: Title: AI-Based Crypto Tokens: The Illusion of Decentralized AI?

Rischan Mafrur

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Databases (cs.DB)

The convergence of blockchain and artificial intelligence (AI) has led to the emergence of AI-based tokens, which are cryptographic assets designed to power decentralized AI platforms and services. This paper provides a comprehensive review of leading AI-token projects, examining their technical architectures, token utilities, consensus mechanisms, and underlying business models. We explore how these tokens operate across various blockchain ecosystems and assess the extent to which they offer value beyond traditional centralized AI services. Based on this assessment, our analysis identifies several core limitations. From a technical perspective, many platforms depend extensively on off-chain computation, exhibit limited capabilities for on-chain intelligence, and encounter significant scalability challenges. From a business perspective, many models appear to replicate centralized AI service structures, simply adding token-based payment and governance layers without delivering truly novel value. In light of these challenges, we also examine emerging developments that may shape the next phase of decentralized AI systems. These include approaches for on-chain verification of AI outputs, blockchain-enabled federated learning, and more robust incentive frameworks. Collectively, while emerging innovations offer pathways to strengthen decentralized AI ecosystems, significant gaps remain between the promises and the realities of current AI-token implementations. Our findings contribute to a growing body of research at the intersection of AI and blockchain, highlighting the need for critical evaluation and more grounded approaches as the field continues to evolve.

Total of 19 entries

Showing up to 2000 entries per page: fewer | more | all

为什么阴道会排气	黄芪有什么作用	慢性阑尾炎吃什么消炎药	热闹的什么	股癣用什么药膏效果最好
4月3号什么星座	三文鱼长什么样	egfr是什么意思	肝穿刺检查是什么意思	嗨体水光针有什么功效
河粉是什么	几又念什么	征求是什么意思	脑白质是什么病	750金是什么金
89岁属什么生肖	六是什么意思	游离三碘甲状腺原氨酸是什么意思	表面抗体阳性什么意思	鲔鱼是什么鱼

uhd是什么意思hcv8jop4ns2r.cn	猫癣传染人什么症状hkuteam.com	红鸾星动是什么意思hcv9jop6ns9r.cn	丛林法则是什么意思hcv9jop5ns6r.cn	什么降血糖hcv8jop1ns4r.cn
人彘是什么jiuxinfghf.com	猫吐是什么原因hcv8jop1ns5r.cn	牛奶可以做什么美食hcv8jop7ns0r.cn	寡欲是什么意思hcv8jop1ns4r.cn	植物是什么hcv8jop3ns2r.cn
农历四月是什么月helloaicloud.com	欠是什么意思hcv8jop0ns8r.cn	离岸人民币什么意思hcv7jop7ns4r.cn	古尔邦节什么意思ff14chat.com	小狗什么时候可以洗澡hcv7jop4ns7r.cn
减肥吃什么药瘦得快hcv9jop5ns3r.cn	h1是什么意思hcv8jop8ns7r.cn	venus是什么星球hcv9jop0ns9r.cn	胆囊炎吃什么中成药hcv7jop9ns2r.cn	宫外孕做什么手术hcv9jop0ns0r.cn

成都打造乡村旅游、山地旅游等6个产业集群项目

Showing new listings for Tuesday, 5 August 2025

New submissions (showing 7 of 7 entries)

Cross submissions (showing 7 of 7 entries)

Replacement submissions (showing 5 of 5 entries)