Abstract:
We introduce a new statistical model for time series that iteratively segments data into regimes with approximately linear dynamics and learns the parameters of each of these linear regimes. This model combines and generalizes two of the most widely used stochastic time-series models, hidden Markov models and linear dynamical systems, and is closely related to models that are widely used in the control and econometrics literatures.
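As a hedged illustration of the generative process such a switching model describes (a hidden Markov chain over regimes, with linear-Gaussian dynamics within each regime), here is a minimal NumPy sketch; the two regimes, transition matrix, and noise scales below are invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-regime switching linear dynamics (parameters are illustrative).
A = [np.array([[0.99]]), np.array([[0.7]])]   # per-regime dynamics matrices
P = np.array([[0.95, 0.05],                   # regime transition probabilities
              [0.10, 0.90]])
q, r = 0.1, 0.2                               # process / observation noise std

def sample(T=200):
    z, x = 0, np.zeros(1)
    zs, ys = [], []
    for _ in range(T):
        z = rng.choice(2, p=P[z])                    # discrete regime: hidden Markov chain
        x = A[z] @ x + q * rng.standard_normal(1)    # linear dynamics of the active regime
        ys.append(x[0] + r * rng.standard_normal())  # noisy observation
        zs.append(z)
    return np.array(zs), np.array(ys)

zs, ys = sample()
```

Fitting such a model means recovering both the segmentation `zs` and the per-regime parameters `A` from `ys` alone, which is the learning problem the abstract describes.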
Abstract:
We study sampling as optimization in the space of measures. We focus on gradient flow-based optimization, with the Langevin dynamics as a case study. We investigate the source of the bias of the unadjusted Langevin algorithm (ULA) in discrete time, and consider how to remove or reduce the bias. We point out that the difficulty is that the heat flow is exactly solvable, but neither its forward nor its backward method is implementable in general, except for Gaussian data. We propose the symmetrized Langevin algorithm (SLA), which should have a smaller bias than ULA, at the price of implementing a proximal gradient step in space. We show that SLA is in fact consistent for a Gaussian target measure, whereas ULA is not. We also illustrate various algorithms explicitly for a Gaussian target measure with Gaussian data, including gradient descent, proximal gradient, and Forward-Backward, and show that they are all consistent.
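To make the discrete-time bias concrete: for a target $\pi \propto e^{-f}$, one ULA step is $x \leftarrow x - \eta \nabla f(x) + \sqrt{2\eta}\,\xi$ with $\xi \sim N(0,1)$. The sketch below uses an assumed 1-D Gaussian target $N(0, \sigma^2)$ (so $\nabla f(x) = x/\sigma^2$); the step size and target are illustrative choices of ours. For this target the chain's stationary variance can be computed in closed form and differs from $\sigma^2$ at order $\eta$, which is exactly the kind of bias the abstract investigates (the paper's SLA correction is not reproduced here).

```python
import numpy as np

rng = np.random.default_rng(1)

# Unadjusted Langevin algorithm (ULA) for a 1-D Gaussian target N(0, sigma^2),
# i.e. f(x) = x^2 / (2 sigma^2) and grad f(x) = x / sigma^2.
sigma2, eta, n_steps = 1.0, 0.1, 200_000

x = 0.0
samples = np.empty(n_steps)
for k in range(n_steps):
    x = x - eta * (x / sigma2) + np.sqrt(2 * eta) * rng.standard_normal()
    samples[k] = x

# For this Gaussian target the ULA iterates are Gaussian with stationary
# variance sigma^2 / (1 - eta / (2 sigma^2)), so the empirical variance
# exceeds sigma^2 by an O(eta) bias that vanishes only as eta -> 0.
print(samples[1000:].var())  # ~1.05 rather than 1.0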
Abstract:
Inference problems with conjectured statistical-computational gaps are ubiquitous throughout modern statistics, computer science, statistical physics and discrete probability. While there has been success evidencing these gaps from the failure of restricted classes of algorithms, progress towards a more traditional reduction-based approach to computational complexity in statistical inference has been limited. These average-case problems are each tied to a different natural distribution, high-dimensional structure and conjecturally hard parameter regime, leaving reductions among them technically challenging. Despite a flurry of recent success in developing such techniques, existing reductions have largely been limited to inference problems with similar structure – primarily mapping among problems representable as a sparse submatrix signal plus a noise matrix, a structure shared by the common starting hardness assumption of planted clique ($\textsc{pc}$). The insight in this work is that a slight generalization of the planted clique conjecture – secret leakage planted clique ($\textsc{pc}_\rho$), wherein a small amount of information about the hidden clique is revealed – gives rise to a variety of new average-case reduction techniques, yielding a web of reductions relating statistical problems with very different structure. Based on generalizations of the planted clique conjecture to specific forms of $\textsc{pc}_\rho$, we deduce tight statistical-computational tradeoffs for a diverse range of problems including robust sparse mean estimation, mixtures of sparse linear regressions, robust sparse linear regression, tensor PCA, variants of dense $k$-block stochastic block models, negatively correlated sparse PCA, semirandom planted dense subgraph, detection in hidden partition models and a universality principle for learning sparse mixtures. This gives the first reduction-based evidence for a number of conjectured statistical-computational gaps. We introduce a number of new average-case reduction techniques that also reveal novel connections to combinatorial designs based on the incidence geometry of $\mathbb{F}_r^t$ and to random matrix theory. In particular, we show a convergence result between Wishart and inverse Wishart matrices that may be of independent interest. The specific hardness conjectures for $\textsc{pc}_\rho$ implying our statistical-computational gaps are all in correspondence with natural graph problems such as $k$-partite, bipartite and hypergraph variants of $\textsc{pc}$. Hardness in a $k$-partite hypergraph variant of $\textsc{pc}$ is the strongest of these conjectures and is sufficient to establish all of our computational lower bounds. We also give evidence for our $\textsc{pc}_\rho$ hardness conjectures from the failure of low-degree polynomials and statistical query algorithms. Our work raises a number of open problems and suggests that previous technical obstacles to average-case reductions may have arisen because planted clique is not the right starting point. An expanded set of hardness assumptions, such as $\textsc{pc}_\rho$, may be a key first step towards a more complete theory of reductions among statistical problems.
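For readers less familiar with the starting assumption: a standard planted clique instance is an Erdős-Rényi graph $G(n, 1/2)$ with all edges added among a hidden set of $k$ vertices, and detection means distinguishing this from a plain $G(n, 1/2)$ sample. A minimal sketch of that instance distribution follows; it covers only vanilla $\textsc{pc}$, not the secret-leakage variant $\textsc{pc}_\rho$, whose leakage structure varies across the paper's conjectures.

```python
import numpy as np

def planted_clique(n, k, rng):
    """Adjacency matrix of G(n, 1/2) with a clique planted on k random vertices."""
    upper = rng.random((n, n)) < 0.5          # i.i.d. fair-coin edges
    adj = np.triu(upper, 1)
    adj = adj | adj.T                         # symmetrize; no self-loops
    clique = rng.choice(n, size=k, replace=False)
    adj[np.ix_(clique, clique)] = True        # connect every pair inside the clique
    np.fill_diagonal(adj, False)
    return adj, clique

rng = np.random.default_rng(2)
adj, clique = planted_clique(n=500, k=30, rng=rng)
```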
Abstract:
Social media platforms such as Twitter, Facebook, and Weibo are being increasingly embraced by individuals, groups, and organizations as a valuable source of information. This social-media-generated information comes in the form of tweets or posts and is normally characterized as short, voluminous, sparse, and low-density text. Since many real-world applications need semantic interpretation of such short texts, research in Short Text Topic Modeling (STTM) has recently gained a lot of interest in order to reveal unique and cohesive latent topics. This article examines the current state of the art in STTM algorithms. It presents a comprehensive survey and taxonomy of STTM algorithms for short text topic modeling. The article also includes a qualitative and quantitative study of the STTM algorithms, as well as analyses of the various strengths and drawbacks of STTM techniques. Moreover, a comparative analysis of the topic quality and performance of representative STTM models is presented. The performance evaluation is conducted on two real-world Twitter datasets, the Real-World Pandemic Twitter (RW-Pand-Twitter) dataset and the Real-World Cyberbullying Twitter (RW-CB-Twitter) dataset, in terms of several metrics such as topic coherence, purity, NMI, and accuracy. Finally, the open challenges and future research directions in this promising field are discussed to highlight the trends of research in STTM. The work presented in this paper is useful both for researchers interested in learning the state of the art of short text topic modeling and for researchers focusing on developing new algorithms for short text topic modeling.
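As a hedged illustration of two of the cluster-quality metrics such evaluations rely on, purity and NMI can be computed from predicted topic assignments and gold labels as below. scikit-learn's `normalized_mutual_info_score` exists with this signature; the `purity` helper and the toy labels are our own, not the article's evaluation code.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, normalized_mutual_info_score

def purity(y_true, y_pred):
    """Fraction of documents whose cluster's majority gold label matches theirs."""
    cm = confusion_matrix(y_true, y_pred)     # rows: gold labels, cols: clusters
    return cm.max(axis=0).sum() / cm.sum()

# Toy assignments standing in for STTM output on labeled tweets.
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2])
y_pred = np.array([1, 1, 1, 0, 0, 2, 2, 2])

print("purity:", purity(y_true, y_pred))                    # 0.875
print("NMI:", normalized_mutual_info_score(y_true, y_pred))
```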
Abstract:
In the field of natural language processing and text mining, sentiment analysis (SA) has received huge attention from researchers across the globe. With the prevalence of Web 2.0, users have become more eager to share, promote, and express themselves, along with any issues or challenges encountered in daily activities, through the Internet (social media, micro-blogs, e-commerce, etc.). Expressions and opinions are a complex sequence of acts that convey a huge volume of data, posing a challenge for computational researchers to decode. Over time, researchers from various segments of the public and private sectors have been involved in the exploration of SA, with the aim of understanding the behavioral perspective of various stakeholders in society. Though the efforts to advance SA have been successful, challenges to its efficiency still prevail. This article presents an organized survey of SA (also known as opinion mining) along with its methodologies and algorithms. The survey classifies SA into categories based on levels, tasks, and sub-tasks, along with the various techniques used for performing them. The survey explicitly focuses on the different directions in which research has been explored in the area of cross-domain opinion classification. The article concludes with an exclusive and exhaustive analysis of the area of opinion mining, covering the approaches, datasets, languages, and applications used. The observations made are expected to help researchers gain a greater understanding of emerging trends and of the state-of-the-art methods to be applied in future exploration.
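To ground the document-level task the survey's taxonomy starts from, here is a toy sketch of the simplest family of techniques such surveys cover, a lexicon-based polarity classifier; the tiny word list is invented for illustration and is not a method from the article.

```python
# Purely illustrative sentiment lexicon (made up for this sketch).
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}

def polarity(text: str) -> str:
    """Document-level polarity: sum word scores and threshold at zero."""
    score = sum(LEXICON.get(w, 0) for w in text.lower().split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(polarity("I love this phone, great battery"))  # positive
print(polarity("awful service, I hate waiting"))     # negative
```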
Abstract:
While the concept of swarm intelligence was introduced in the 1980s, the first swarm optimisation algorithm was introduced a decade later, in 1992. In this paper, nineteen representative original swarm optimisation algorithms are analysed to extract their common features and design a taxonomy for swarm optimisation. We use twenty-nine benchmark problems to compare the performance of these nineteen algorithms, in the form in which they were first introduced in the literature, against five state-of-the-art swarm algorithms. This comparison reveals the advancements made in this field over three decades. It shows that, while the state-of-the-art swarm optimisation algorithms are indeed competitive in terms of the quality of the solutions they find, they have evolved to be more computationally demanding than the nineteen original swarm optimisation algorithms. The investigation suggests that there is a pressing need to continue to design swarm optimisation algorithms that are simpler, while maintaining their current competitive performance.
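As a hedged sketch of the kind of algorithm being compared, here is a minimal particle swarm optimisation (one of the early swarm algorithms, Kennedy and Eberhart 1995) on the sphere benchmark. The inertia and acceleration coefficients are common textbook defaults, and the dimensions and budgets are our own choices, not the survey's experimental settings.

```python
import numpy as np

rng = np.random.default_rng(3)

def sphere(x):
    return np.sum(x * x, axis=-1)   # classic benchmark; minimum 0 at the origin

def pso(f, dim=10, n_particles=30, iters=500, w=0.729, c1=1.49, c2=1.49):
    """Minimal particle swarm optimisation with an inertia weight."""
    x = rng.uniform(-5, 5, (n_particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), f(x)            # personal bests
    gbest = pbest[pbest_val.argmin()].copy()     # global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        val = f(x)
        better = val < pbest_val
        pbest[better], pbest_val[better] = x[better], val[better]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, f(gbest)

best, best_val = pso(sphere)
print(best_val)  # should be close to 0
```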
Abstract:
Recently, research in unsupervised learning has gravitated towards exploring statistical-computational gaps induced by sparsity. A line of work initiated in Berthet and Rigollet (2013) has aimed to explain these gaps through reductions to conjecturally hard problems from complexity theory. However, the delicate nature of average-case reductions has limited the development of techniques and often led to weaker hardness results that only apply to algorithms robust to different noise distributions or that do not need to know the parameters of the problem. We introduce several new techniques to give a web of average-case reductions showing strong computational lower bounds based on the planted clique conjecture. Our new lower bounds include:

- Planted Independent Set: We show tight lower bounds for detecting a planted independent set of size $k$ in a sparse Erdős-Rényi graph of size $n$ with edge density $\tilde{\Theta}(n^{-\alpha})$.
- Planted Dense Subgraph: If $p > q$ are the edge densities inside and outside of the community, we show the first lower bounds for the general regime $q = \tilde{\Theta}(n^{-\alpha})$ and $p - q = \tilde{\Theta}(n^{-\gamma})$ where $\gamma \ge \alpha$, matching the lower bounds predicted in Chen and Xu (2016). Our lower bounds apply to a deterministic community size $k$, resolving a question raised in Hajek et al. (2015).
- Biclustering: We show strong lower bounds for Gaussian biclustering as a simple hypothesis testing problem to detect a uniformly at random planted flat $k \times k$ submatrix.
- Sparse Rank-1 Submatrix: We show that detection in the sparse spiked Wigner model is often harder than biclustering, and are able to obtain two different tight lower bounds for these problems with different reductions from planted clique.
- Sparse PCA: We give a reduction between rank-1 submatrix and sparse PCA to obtain tight lower bounds in the less sparse regime $k \gg \sqrt{n}$, when the spectral algorithm is optimal over the SDP. We give an alternate reduction recovering the lower bounds of Berthet and Rigollet (2013) and Gao et al. (2017) in the simple hypothesis testing variant of sparse PCA. We also observe a subtlety in the complexity of sparse PCA that arises when the planted vector is biased.
- Subgraph Stochastic Block Model: We introduce a model where two small communities are planted in an Erdős-Rényi graph of the same average edge density and give tight lower bounds yielding different hard regimes than planted dense subgraph.

Our results demonstrate that, despite the delicate nature of average-case reductions, using natural problems as intermediates can often be beneficial, as is the case in worst-case complexity. Our main technical contribution is to introduce a set of techniques for average-case reductions that: (1) maintain the level of signal in an instance of a problem; (2) alter its planted structure; and (3) map two initial high-dimensional distributions simultaneously to two target distributions approximately under total variation. We also give algorithms matching our lower bounds and identify the information-theoretic limits of the models we consider.
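To make the biclustering hypothesis testing problem above concrete: under the alternative, a flat mean shift $\mu$ is added on a uniformly random $k \times k$ submatrix of an $n \times n$ Gaussian noise matrix, and the task is to distinguish this from pure noise. A minimal sketch of the two hypotheses follows; the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def biclustering_instance(n, k, mu, planted, rng):
    """H0: i.i.d. N(0,1) matrix; H1: the same plus mu on a random k x k submatrix."""
    X = rng.standard_normal((n, n))
    if planted:
        rows = rng.choice(n, size=k, replace=False)
        cols = rng.choice(n, size=k, replace=False)
        X[np.ix_(rows, cols)] += mu   # flat planted signal on the hidden block
    return X

X0 = biclustering_instance(n=200, k=20, mu=0.5, planted=False, rng=rng)
X1 = biclustering_instance(n=200, k=20, mu=0.5, planted=True, rng=rng)
```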
Abstract:
We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states, known as non-stochastic control. We introduce a controller parametrization based on the denoised observations, and prove that applying online gradient descent to this parametrization yields a new controller which attains sublinear regret against a large class of closed-loop policies. In the fully adversarial setting, our controller attains an optimal regret bound of $\sqrt{T}$ when the system is known, and, when combined with an initial stage of least-squares estimation, $T^{2/3}$ when the system is unknown; both yield the first sublinear regret bounds for the partially observed setting. Our bounds are the first in the non-stochastic control setting that compete with \emph{all} stabilizing linear dynamical controllers, not just state feedback. Moreover, in the presence of semi-adversarial noise containing both stochastic and adversarial components, our controller attains the optimal regret bounds of $\mathrm{poly}(\log T)$ when the system is known, and $\sqrt{T}$ when it is unknown. To our knowledge, this gives the first end-to-end $\sqrt{T}$ regret bound for the online Linear Quadratic Gaussian (LQG) control problem, and applies in a more general setting with adversarial losses and semi-adversarial noise.
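As a hedged sketch of the overall shape of such controllers in the simpler known-system, fully observed case: a disturbance-action controller plays $u_t = \sum_i M_i w_{t-i}$ over past perturbations recovered from the dynamics, and updates $M$ by online gradient descent on the observed loss. Everything below (the 1-D system, loss, horizon, step size, and the simplified state-frozen gradient) is invented for illustration and omits the paper's denoised-observation parametrization for partial observability.

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative known, stable 1-D system x' = a x + b u + w with quadratic loss.
a, b, m, eta, T = 0.9, 1.0, 5, 0.01, 2000

M = np.zeros(m)        # disturbance-action controller weights
w_hist = np.zeros(m)   # most recent perturbations, newest first
x = 0.0
total_loss = 0.0
for t in range(T):
    u = M @ w_hist                      # u_t = sum_i M_i w_{t-i}
    w = 0.1 * rng.standard_normal()     # perturbation (stochastic in this toy)
    x_next = a * x + b * u + w
    total_loss += x_next**2 + u**2
    # Online gradient step on the instantaneous loss w.r.t. M, holding the
    # current state fixed (a simplification of the memory-aware gradients
    # used in this line of work): d/dM (x'^2 + u^2) = 2 (x' b + u) w_hist.
    M -= eta * 2 * (x_next * b + u) * w_hist
    # Recover w exactly since (a, b) are known, then shift the history.
    w_hist = np.roll(w_hist, 1)
    w_hist[0] = x_next - a * x - b * u  # equals w
    x = x_next

print(total_loss / T)  # average cost under the learned controller
```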