New submissions
Cross-lists
Replacements

Showing new listings for Tuesday, 5 August 2025

Total of 9 entries

[1] arXiv:2508.00904 [pdf, html, other]: Title: Forecasting LLM Inference Performance via Hardware-Agnostic Analytical Modeling

Rajeev Patwari, Ashish Sirasao, Devleena Das

Comments: 10 pages, 9 figures

Subjects: Performance (cs.PF); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Machine Learning (cs.LG)

Large language models (LLMs) have been increasingly deployed as local agents on personal devices with CPUs, NPUs and integrated GPUs. However, forecasting inference performance on devices with such heterogeneity remains challenging due to the dynamic compute and memory demands. Existing approaches rely on GPU benchmarking or machine learning-based latency predictors, which are often hardware-specific and lack generalizability. To this end, we introduce LIFE, a lightweight and modular analytical framework that comprises modular analytical models of operators, configurable to characterize LLM inference workloads in a hardware- and dataset-agnostic manner. LIFE characterizes the influence of software and model optimizations, such as quantization, KV cache compression, LoRA adapters, chunked prefill, different attention variants, and operator fusion, on performance metrics such as time-to-first-token (TTFT), time-per-output-token (TPOT) and tokens-per-second (TPS). LIFE enables performance forecasting using only hardware specifications, such as TOPS and memory bandwidth, without requiring extensive dataset benchmarking. We validate LIFE's forecasting with inference on AMD Ryzen CPUs, NPUs, iGPUs and NVIDIA V100 GPUs, with Llama2-7B variants, demonstrating the utility of LIFE in forecasting LLM performance through the lens of system efficiency to enable efficient LLM deployment across different hardware platforms.
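To make the idea of forecasting from hardware specifications concrete, here is a minimal roofline-style sketch. It is not the LIFE model: the function name `roofline_forecast` and its parameters are hypothetical, and it assumes roughly 2 operations per parameter per token, one full read of the weights per decoding step, peak TOPS and bandwidth with no efficiency loss, and no KV cache or attention cost.

```python
def roofline_forecast(params_billion, bytes_per_param, prompt_tokens,
                      tops, mem_bw_gb_s):
    """Rough TTFT/TPOT/TPS estimates from hardware specs only.

    Assumptions (illustrative, not taken from the paper):
    - ~2 operations per parameter per token
    - weights are streamed from memory once per forward pass
    - KV cache traffic and attention cost are ignored
    """
    ops_per_token = 2.0 * params_billion * 1e9
    weight_bytes = params_billion * 1e9 * bytes_per_param

    compute_s_per_token = ops_per_token / (tops * 1e12)
    weight_read_s = weight_bytes / (mem_bw_gb_s * 1e9)

    # Prefill processes the whole prompt in one pass: usually compute-bound.
    ttft_s = max(prompt_tokens * compute_s_per_token, weight_read_s)
    # Decode produces one token per pass: usually memory-bound.
    tpot_s = max(compute_s_per_token, weight_read_s)
    return {"ttft_s": ttft_s, "tpot_s": tpot_s, "tps": 1.0 / tpot_s}


# Hypothetical device: 50 TOPS, 100 GB/s, 7B model quantized to 1 byte/param.
estimate = roofline_forecast(7, 1, 512, tops=50, mem_bw_gb_s=100)
```

In this sketch quantization enters only through `bytes_per_param`, which shrinks the memory-bound decode time; the abstract describes a much richer set of optimizations.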

[2] arXiv:1905.13011 (cross-list from cs.DB) [pdf, other]: Title: Don't Persist All : Efficient Persistent Data Structures

Pratyush Mahapatra, Mark D. Hill, Michael M. Swift

Comments: 10 pages, 12 figures

Subjects: Databases (cs.DB); Hardware Architecture (cs.AR); Data Structures and Algorithms (cs.DS); Performance (cs.PF)

Data structures used in software development have inbuilt redundancy to improve software reliability and to speed up performance. One example is the doubly linked list, whose previous pointer allows faster deletion. With the introduction of persistent memory, storing the redundant data fields in persistent memory adds significant write overhead and reduces performance. In this work, we focus on three data structures (doubly linked list, B+Tree, and hashmap) and showcase alternative, partly persistent implementations in which we store only a limited set of data fields in persistent memory. After a crash/restart, we use the persistent data fields to recreate the data structures along with the redundant data fields. We compare our implementation with the base implementation and show that we achieve speedups of around 5-20% for some data structures, and up to 165% for a flush-dominated data structure.
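The recovery step can be illustrated with a small sketch for the doubly linked list. The abstract does not say which fields are persisted, so this assumes, purely for illustration, that only the forward (`next`) links survive a crash and the `prev` links are rebuilt in one pass. The dictionary representation and the name `rebuild_prev` are hypothetical.

```python
def rebuild_prev(next_ptr, head):
    """Recreate the redundant prev links from persisted next links.

    next_ptr maps each node id to its successor (None at the tail).
    This stands in for the persisted fields; it is an assumed layout,
    not the paper's implementation.
    """
    prev_ptr = {}
    prev, node = None, head
    while node is not None:
        prev_ptr[node] = prev          # volatile field, rebuilt after restart
        prev, node = node, next_ptr[node]
    return prev_ptr


# Persisted state after a simulated crash: a -> b -> c
persisted_next = {"a": "b", "b": "c", "c": None}
recovered_prev = rebuild_prev(persisted_next, head="a")
```

The trade-off sketched here is one linear pass at recovery time in exchange for not flushing the `prev` field on every update.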

[3] arXiv:2508.01506 (cross-list from cs.LG) [pdf, html, other]: Title: FlashSVD: Memory-Efficient Inference with Streaming for Low-Rank Models

Zishan Shao, Yixiao Wang, Qinsi Wang, Ting Jiang, Zhixu Du, Hancheng Ye, Danyang Zhuo, Yiran Chen, Hai Li

Comments: Technical Report

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Performance (cs.PF)

Singular Value Decomposition (SVD) has recently seen a surge of interest as a simple yet powerful tool for large language model (LLM) compression, with a growing number of works demonstrating 20-80% parameter reductions at minimal accuracy loss. Previous SVD-based approaches have focused primarily on reducing the memory footprint of model weights, largely overlooking the additional activation memory overhead incurred during inference when applying truncated factors via standard dense CUDA kernels. Our experiments demonstrate that this activation overhead, scaling with sequence length and hidden dimension, prevents current SVD compression techniques from achieving any reduction in peak inference memory, thereby limiting their viability for real-world, on-device deployments.

We introduce FlashSVD, a novel, end-to-end rank-aware streaming inference framework specifically designed for SVD-compressed large language models. FlashSVD can be seamlessly integrated with any model that employs SVD-based methods for parameter reduction. By fusing low-rank projection kernels directly into both the self-attention and feed-forward network (FFN) pipelines, FlashSVD avoids materializing full-size activation buffers. Instead, small tiles of the truncated factors are loaded into on-chip SRAM, multiplied and reduced on the fly, and immediately evicted, preserving high GPU occupancy and adding no extra latency. On standard encoder benchmarks (e.g., BERT-Base), FlashSVD cuts peak activation memory by up to 70.2% and intermediate transient memory by 75%, all while incurring no accuracy loss relative to upstream compression methods, offering a practical path toward memory-constrained deployment of low-rank LLMs.
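The tiling idea can be sketched in NumPy for a low-rank FFN. This is an illustration of streaming over tiles, not the authors' fused GPU kernels: the function names, the ReLU activation, and the factor layout (`u1 @ v1` and `u2 @ v2` standing in for the two truncated weight matrices) are all assumptions.

```python
import numpy as np


def ffn_lowrank_dense(x, u1, v1, u2, v2):
    """Reference path: materializes the full (n, d_ff) activation."""
    hidden = np.maximum(x @ u1 @ v1, 0.0)
    return hidden @ u2 @ v2


def ffn_lowrank_streamed(x, u1, v1, u2, v2, tile=64):
    """Streams over d_ff in tiles so only an (n, tile) buffer is live.

    Shapes (assumed): x (n, d), u1 (d, r1), v1 (r1, d_ff),
    u2 (d_ff, r2), v2 (r2, d).
    """
    projected = x @ u1                                # (n, r1), small
    acc = np.zeros((x.shape[0], u2.shape[1]))         # (n, r2), small
    for start in range(0, v1.shape[1], tile):
        stop = start + tile
        # Elementwise activation lets each tile be processed independently.
        h_tile = np.maximum(projected @ v1[:, start:stop], 0.0)
        acc += h_tile @ u2[start:stop, :]             # reduce on the fly
    return acc @ v2


rng = np.random.default_rng(0)
n, d, d_ff, r = 8, 32, 200, 4
x = rng.standard_normal((n, d))
u1, v1 = rng.standard_normal((d, r)), rng.standard_normal((r, d_ff))
u2, v2 = rng.standard_normal((d_ff, r)), rng.standard_normal((r, d))
same = np.allclose(ffn_lowrank_dense(x, u1, v1, u2, v2),
                   ffn_lowrank_streamed(x, u1, v1, u2, v2))
```

The two paths compute the same result because the activation is elementwise and the second projection is a sum over the `d_ff` dimension, which can be accumulated tile by tile.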

[4] arXiv:2508.01635 (cross-list from cs.LG) [pdf, html, other]: Title: Learning Unified System Representations for Microservice Tail Latency Prediction

Wenzhuo Qian, Hailiang Zhao, Tianlv Chen, Jiayi Chen, Ziqi Wang, Kingsum Chow, Shuiguang Deng

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

Microservice architectures have become the de facto standard for building scalable cloud-native applications, yet their distributed nature introduces significant challenges in performance monitoring and resource management. Traditional approaches often rely on per-request latency metrics, which are highly sensitive to transient noise and fail to reflect the holistic behavior of complex, concurrent workloads. In contrast, window-level P95 tail latency provides a stable and meaningful signal that captures both system-wide trends and user-perceived performance degradation. We identify two key shortcomings in existing methods: (i) inadequate handling of heterogeneous data, where traffic-side features propagate across service dependencies and resource-side signals reflect localized bottlenecks, and (ii) the lack of principled architectural designs that effectively distinguish and integrate these complementary modalities. To address these challenges, we propose USRFNet, a deep learning network that explicitly separates and models traffic-side and resource-side features. USRFNet employs GNNs to capture service interactions and request propagation patterns, while gMLP modules independently model cluster resource dynamics. These representations are then fused into a unified system embedding to predict window-level P95 latency with high accuracy. We evaluate USRFNet on real-world microservice benchmarks under large-scale stress testing conditions, demonstrating substantial improvements in prediction accuracy over state-of-the-art baselines.
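For readers unfamiliar with the prediction target, the sketch below shows one way to derive window-level P95 latency from per-request samples. It is not part of USRFNet: the function name `window_percentile`, the fixed non-overlapping windows, and the nearest-rank percentile definition are assumptions chosen for simplicity.

```python
from collections import defaultdict


def window_percentile(samples, window_s=60, pct=95):
    """Group (timestamp_s, latency) pairs into fixed windows and return
    the nearest-rank percentile of each window.

    Returns a dict mapping window index -> percentile latency.
    """
    buckets = defaultdict(list)
    for ts, latency in samples:
        buckets[int(ts // window_s)].append(latency)

    result = {}
    for window, values in sorted(buckets.items()):
        values.sort()
        # Nearest-rank: smallest rank covering pct percent of the samples,
        # computed with integer arithmetic to avoid float rounding.
        rank = (pct * len(values) + 99) // 100
        result[window] = values[rank - 1]
    return result


# Hypothetical request log: (arrival time in seconds, latency in ms).
requests = [(0, 10), (30, 20), (61, 5)]
p95_by_window = window_percentile(requests, window_s=60)
```

Aggregating per window is what makes the signal less sensitive to a single slow request than raw per-request latency.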

[5] arXiv:2507.21895 (replaced) [pdf, html, other]: Title: Beamforming-based Achievable Rate Maximization in ISAC System for Multi-UAV Networking

Shengcai Zhou, Luping Xiang, Kun Yang, Kai Kit Wong, Dapeng Oliver Wu, Chan-Byoung Chae

Subjects: Performance (cs.PF)

Airborne mobile Integrated Sensing and Communication (ISAC) base stations have garnered significant attention recently, with ISAC technology being a crucial application for 6G networks. Since ISAC can sense potential mobile communication users, this paper studies an effective scheme for a multi-UAV network tailored for emergency communication. We develop a temporal-assisted frame structure utilizing integrated omnidirectional and directional beampatterns to facilitate efficient and frequent searching, with extended Kalman filtering (EKF) as an aid to beam alignment. Further, we address an optimization problem to maximize the total achievable rate per slot by jointly designing UAV beamforming, load management, and UAV direction planning, all while adhering to the constraints of the predicted beam coverage. Given that the problem is NP-hard, we introduce three robust mechanisms for its resolution: an enhanced distributed Successive Convex Approximation (SCA)-Iterative Rank Minimization (IRM) algorithm, a coalition game approach, and a Fermat point search method. In particular, the proposed SCA-IRM algorithm decomposes the original complex optimization problem into several sub-problems and assigns them equally to each UAV, so as to realize distributed computing and improve computational efficiency. Our simulations demonstrate the improved system performance in terms of communication rate, fairness, and sensing accuracy, providing design guidelines for UAV-assisted emergency communication networking.

[6] arXiv:2506.02666 (replaced) [pdf, html, other]: Title: Spatially Correlated multi-RIS Communication: The Effect of Inter-Operator Interference

Nikolaos I. Miridakis, Panagiotis A. Karkazis

Comments: Submitted to IEEE Journal. arXiv admin note: text overlap with arXiv:2403.00349

Subjects: Information Theory (cs.IT); Performance (cs.PF)

A multi-operator wireless communication system is studied where each operator is equipped with a reconfigurable intelligent surface (RIS) to enhance its communication quality. RISs controlled by different operators affect the system performance of one another due to the inherently rapid phase shift adjustments that occur on an independent basis. The system performance of such a communication scenario is analytically studied for the practical case where spatial correlation occurs at RISs of arbitrary size. The proposed framework is quite general since it is analyzed under Nakagami-$m$ channel fading conditions. Finally, the derived analytical results are verified via numerical and simulation trials, and some new and useful engineering insights are revealed.

[7] arXiv:2506.09226 (replaced) [pdf, html, other]: Title: Terabyte-Scale Analytics in the Blink of an Eye

Bowen Wu, Wei Cui, Carlo Curino, Matteo Interlandi, Rathijit Sen

Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

For the past two decades, the DB community has devoted substantial research to taking advantage of cheap clusters of machines for distributed data analytics -- we believe that we are at the beginning of a paradigm shift. The scaling laws and popularity of AI models have led to the deployment of incredibly powerful GPU clusters in commercial data centers. Compared to CPU-only solutions, these clusters deliver impressive improvements in per-node compute, memory bandwidth, and inter-node interconnect performance. In this paper, we study the problem of scaling analytical SQL queries on distributed clusters of GPUs, with the stated goal of establishing an upper bound on the likely performance gains. To do so, we build a prototype designed to maximize performance by leveraging ML/HPC best practices, such as group communication primitives for cross-device data movements. This allows us to conduct thorough performance experimentation to point our community towards a massive performance opportunity of at least 60$\times$. To make these gains more relatable, before you can blink twice, our system can run all 22 queries of TPC-H at a 1TB scale factor!

[8] arXiv:2506.10872 (replaced) [pdf, html, other]: Title: The Gittins Index: A Design Principle for Decision-Making Under Uncertainty

Ziv Scully, Alexander Terenin

Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Performance (cs.PF); Probability (math.PR); Machine Learning (stat.ML)

The Gittins index is a tool that optimally solves a variety of decision-making problems involving uncertainty, including multi-armed bandit problems, minimizing mean latency in queues, and search problems like the Pandora's box model. However, despite the above examples and later extensions thereof, the space of problems that the Gittins index can solve perfectly optimally is limited, and its definition is rather subtle compared to those of other multi-armed bandit algorithms. As a result, the Gittins index is often regarded as being primarily a concept of theoretical importance, rather than a practical tool for solving decision-making problems.

The aim of this tutorial is to demonstrate that the Gittins index can be fruitfully applied to practical problems. We start by giving an example-driven introduction to the Gittins index, then walk through several examples of problems it solves - some optimally, some suboptimally but still with excellent performance. Two practical highlights in the latter category are applying the Gittins index to Bayesian optimization, and applying the Gittins index to minimizing tail latency in queues.

[9] arXiv:2507.14959 (replaced) [pdf, html, other]: Title: Polymorph: Energy-Efficient Multi-Label Classification for Video Streams on Embedded Devices

Saeid Ghafouri, Mohsen Fayyaz, Xiangchen Li, Deepu John, Bo Ji, Dimitrios Nikolopoulos, Hans Vandierendonck

Subjects: Computer Vision and Pattern Recognition (cs.CV); Performance (cs.PF)

Real-time multi-label video classification on embedded devices is constrained by limited compute and energy budgets. Yet, video streams exhibit structural properties such as label sparsity, temporal continuity, and label co-occurrence that can be leveraged for more efficient inference. We introduce Polymorph, a context-aware framework that activates a minimal set of lightweight Low Rank Adapters (LoRA) per frame. Each adapter specializes in a subset of classes derived from co-occurrence patterns and is implemented as a LoRA weight over a shared backbone. At runtime, Polymorph dynamically selects and composes only the adapters needed to cover the active labels, avoiding full-model switching and weight merging. This modular strategy improves scalability while reducing latency and energy overhead. Polymorph achieves 40% lower energy consumption and improves mAP by 9 points over strong baselines on the TAO dataset. Polymorph is open source at this http URL.
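Selecting adapters that cover the active labels can be viewed as a set-cover problem. The sketch below uses a greedy heuristic, which is a stand-in chosen for illustration and not necessarily how Polymorph selects adapters; the adapter names, label sets, and the function `select_adapters` are hypothetical.

```python
def select_adapters(active_labels, adapters):
    """Greedily pick adapters until every active label is covered.

    adapters maps an adapter name to the set of classes it specializes in.
    Ties are broken by alphabetical adapter name for determinism.
    """
    remaining = set(active_labels)
    chosen = []
    while remaining:
        # Adapter covering the most still-uncovered labels.
        best = max(sorted(adapters), key=lambda name: len(adapters[name] & remaining))
        if not adapters[best] & remaining:
            raise ValueError(f"no adapter covers: {sorted(remaining)}")
        chosen.append(best)
        remaining -= adapters[best]
    return chosen


# Hypothetical adapters grouped by label co-occurrence.
adapter_classes = {
    "road": {"car", "bus", "person"},
    "pets": {"dog", "cat"},
    "sky": {"bird", "plane"},
}
picked = select_adapters({"car", "person"}, adapter_classes)
```

Because labels in a video stream tend to be sparse and temporally stable, the selected set would typically be small and change rarely between frames.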

New submissions (showing 1 of 1 entries)

Cross submissions (showing 3 of 3 entries)

Replacement submissions (showing 5 of 5 entries)