The GeoNetwork opensource (GNOS) project is a free and open source (FOSS) cataloging application for spatially referenced resources. It is a catalog of location-oriented information. == Outline == It is a standardized and decentralized spatial information management environment designed to enable access to geo-referenced databases, cartographic products and related metadata from a variety of sources, enhancing the spatial information exchange and sharing between organizations and their audience, using the capacities of the internet. Using the Z39.50 protocol it both accesses remote catalogs and makes its data available to other catalog services. As of 2007, OGC Web Catalog Service are being implemented. Maps, including those derived from satellite imagery, are effective communicational tools and play an important role in the work of decision makers (e.g., sustainable development planners and humanitarian and emergency managers) in need of quick, reliable and up-to-date user-friendly cartographic products as a basis for action and to better plan and monitor their activities; GIS experts in need of exchanging consistent and updated geographical data; and spatial analysts in need of multidisciplinary data to perform preliminary geographical analysis and make reliable forecasts. == Deployment == The software has been deployed to various organizations, the first being FAO GeoNetwork and WFP VAM-SIE-GeoNetwork, both at their headquarters in Rome, Italy. Furthermore, the WHO, CGIAR, BRGM, ESA, FGDC and the Global Change Information and Research Centre (GCIRC) of China are working on GeoNetwork opensource implementations as their spatial information management capacity. It is used for several risk information systems, in particular in the Gambia. Several related tools are packaged with GeoNetwork, including GeoServer. GeoServer stores geographical data, while GeoNetwork catalogs collections of such data.
JAX (software)
JAX is a Python library for accelerator-oriented array computation and program transformation, designed for high-performance numerical computing and large-scale machine learning. It is developed by Google with contributions from Nvidia and other community contributors. It is described as bringing together a modified version of the automatic differentiation system autograd and OpenXLA's XLA (Accelerated Linear Algebra). It is designed to follow the structure and workflow of NumPy as closely as possible and works with various existing frameworks such as TensorFlow and PyTorch. The primary features of JAX are: Providing a unified NumPy-like interface to computations that run on CPU, GPU, or TPU, in local or distributed settings. Built-in Just-In-Time (JIT) compilation via OpenXLA, an open-source machine learning compiler ecosystem. Efficient evaluation of gradients via its automatic differentiation transformations. Automatic vectorization to efficiently map functions over arrays representing batches of inputs. == Libraries using Jax == Flax Equinox Optax
Mehryar Mohri
Mehryar Mohri is a professor and theoretical computer scientist at the Courant Institute of Mathematical Sciences. He is also heading the Machine Learning Theory (ML Theory) team at Google Research. == Career == Prior to joining the Courant Institute, Mohri was a research department head and later technology leader at AT&T Bell Labs, where he was a member of the technical staff for about ten years. Mohri has also taught as an assistant professor at the University of Paris 7 (1992-1993) and Ecole Polytechnique (1992-1994). == Research == Mohri's main area of research is machine learning, in particular learning theory. He is also an expert in automata theory and algorithms. He is the author of several core algorithms that have served as the foundation for the design of many deployed speech recognition and natural language processing systems. == Publications == Mohri is the author of the reference book Foundations of Machine Learning used as a textbook in many graduate-level machine learning courses. Mohri is also a member of the Lothaire group of mathematicians with the pseudonym M. Lothaire and contributed to the book on Applied Combinatorics on Words. He is the author of more than 250 conference and journal publications. == Organizational affiliations == Mohri is currently the President of the Association for Algorithmic Learning Theory (AALT) and the Steering Committee Chair for the ALT conference. He is also Editorial Board member of Machine Learning and TheoretiCS, Action Editor of the Journal of Machine Learning Research (JMLR) and a member of the advisory board for the Journal of Automata, Languages and Combinatorics.
Adam Tauman Kalai
Adam Tauman Kalai is an American computer scientist who specializes in artificial intelligence and works at OpenAI. == Education and career == Kalai graduated from Harvard University in 1996 with a BA in computer science and received a MA and PhD, both in computer science, from Carnegie Mellon University in 1999 and 2001, respectively. His doctoral advisor was Avrim Blum. After graduation, Kalai did his postdoctoral research at Massachusetts Institute of Technology under Santosh Vempala until 2003. Kalai became a faculty member at the Toyota Technological Institute at Chicago from 2003 to 2006, followed by a stint as an assistant professor at Georgia Institute of Technology from 2007 to 2008. He joined Microsoft Research in 2008 and subsequently moved to OpenAI in 2023. == Contributions == Kalai is known for his algorithm for generating random factored numbers (see Bach's algorithm), for co-inventing the cooperative-competitive value (coco value), for efficiently learning learning mixtures of Gaussians, for the Blum-Kalai-Wasserman algorithm for learning parity with noise, and for the intractability of the folk theorem in game theory. More recently, Kalai is known for identifying and reducing gender bias in word embeddings, which are a representation of words commonly used in AI systems. In 2026, he coauthored a Nature paper on hallucinations in large language models. == Personal life == Kalai is the son of game theorist Ehud Kalai and is married to cryptographer Yael Tauman Kalai.
Claire Cardie
Claire Cardie is an American computer scientist specializing in natural language processing. Since 2006, she has been a professor of computer science and information science at Cornell University, and from 2010 to 2011 she was the first Charles and Barbara Weiss Chair of Information Science at Cornell. Her research interests include coreference resolution and sentiment analysis. == Education and career == Cardie is a 1982 graduate of Yale University, majoring in computer science. After working for several companies as a computer programmer, she returned to graduate study in the late 1980s and completed her Ph.D. at the University of Massachusetts Amherst in 1994. Her dissertation, Domain-Specific Knowledge Acquisition for Conceptual Sentence Analysis, was supervised by Wendy Lehnert. She has been on the Cornell University faculty since 1994, initially in computer science and since 2005 also in information science. She was an assistant professor (1994–2000) and associate professor (2000–06), before being promoted to a full professorship in 2006. In 2007 she founded a start-up company, Appinions, and she was its chief scientist until 2015. Her doctoral students at Cornell have included Amit Singhal and Kiri Wagstaff. == Recognition == Cardie became a Fellow of the Association for Computational Linguistics in 2016. She was elected as an ACM Fellow in 2019 "for contributions to natural language processing, including coreference resolution, information and opinion extraction". She was named to the 2021 class of Fellows of the American Association for the Advancement of Science.
TIMIT
TIMIT is a corpus of phonemically and lexically transcribed speech of American English speakers of different sexes and dialects. Each transcribed element has been delineated in time. TIMIT was designed to further acoustic-phonetic knowledge and automatic speech recognition systems. It was commissioned by DARPA and corpus design was a joint effort between the Massachusetts Institute of Technology, SRI International, and Texas Instruments (TI). The speech was recorded at TI, transcribed at MIT, and verified and prepared for publishing by the National Institute of Standards and Technology (NIST). There is also a telephone bandwidth version called NTIMIT (Network TIMIT). TIMIT and NTIMIT are not freely available — either membership of the Linguistic Data Consortium, or a monetary payment, is required for access to the dataset. == Data == TIMIT contains ~5 hours of speech, of 10 sentences spoken by each of 630 speakers. The sentences were randomly sampled from a corpus of 2342 sentences. The speakers were native speakers of American English, classified under 8 major dialect regions: New England, Northern, North Midland, South Midland, Southern, New York City, Western, Army Brat (moved around). The speakers were 70% male and 30% female. Recordings were made in a noise-isolated recording booth at Texas Instrument, using a semi-automatic computer system (STEROIDS) to control the presentation of prompts to the speaker and the recording. Two-channel recordings were made using a Sennheiser HMD 414 headset-mounted microphone and a Brüel & Kjær 1/2" far-field pressure microphone (#4165). The speech was digitized at a sample rate of 20 kHz then and downsampled to 16 kHz. == History == The TIMIT telephone corpus was an early attempt to create a database with speech samples. It was published in the year 1988 on CD-ROM and consists of only 10 sentences per speaker. Two 'dialect' sentences were read by each speaker, as well as another 8 sentences selected from a larger set Each sentence averages 3 seconds long and is spoken by 630 different speakers. It was the first notable attempt in creating and distributing a speech corpus and the overall project has produced costs of 1.5 million US$. An update was released in October 1990. It included full 630-speaker corpus; checked and corrected transcriptions; word-alignment transcriptions; NIST SPHERE-headered waveform files and header manipulation software; phonemic dictionary; new test and training subsets balanced for dialectal and phonetic coverage; more extensive documentation. The full name of the project is DARPA-TIMIT Acoustic-Phonetic Continuous Speech Corpus and the acronym TIMIT stands for Texas Instruments/Massachusetts Institute of Technology. The main reason why a corpus of telephone speech was created was to train speech recognition software. In the Blizzard challenge, different software has the obligation to convert audio recordings into textual data and the TIMIT corpus was used as a standardized baseline.
MRF optimization via dual decomposition
In dual decomposition a problem is broken into smaller subproblems and a solution to the relaxed problem is found. This method can be employed for MRF optimization. Dual decomposition is applied to markov logic programs as an inference technique. == Background == Discrete MRF Optimization (inference) is very important in Machine Learning and Computer vision, which is realized on CUDA graphical processing units. Consider a graph G = ( V , E ) {\displaystyle G=(V,E)} with nodes V {\displaystyle V} and Edges E {\displaystyle E} . The goal is to assign a label l p {\displaystyle l_{p}} to each p ∈ V {\displaystyle p\in V} so that the MRF Energy is minimized: (1) min Σ p ∈ V θ p ( l p ) + Σ p q ∈ ε θ p q ( l p ) ( l q ) {\displaystyle \min \Sigma _{p\in V}\theta _{p}(l_{p})+\Sigma _{pq\in \varepsilon }\theta _{pq}(l_{p})(l_{q})} Major MRF Optimization methods are based on Graph cuts or Message passing. They rely on the following integer linear programming formulation (2) min x E ( θ , x ) = θ . x = ∑ p ∈ V θ p . x p + ∑ p q ∈ ε θ p q . x p q {\displaystyle \min _{x}E(\theta ,x)=\theta .x=\sum _{p\in V}\theta _{p}.x_{p}+\sum _{pq\in \varepsilon }\theta _{pq}.x_{pq}} In many applications, the MRF-variables are {0,1}-variables that satisfy: x p ( l ) = 1 {\displaystyle x_{p}(l)=1} ⇔ {\displaystyle \Leftrightarrow } label l {\displaystyle l} is assigned to p {\displaystyle p} , while x p q ( l , l ′ ) = 1 {\displaystyle x_{pq}(l,l^{\prime })=1} , labels l , l ′ {\displaystyle l,l^{\prime }} are assigned to p , q {\displaystyle p,q} . == Dual Decomposition == The main idea behind decomposition is surprisingly simple: decompose your original complex problem into smaller solvable subproblems, extract a solution by cleverly combining the solutions from these subproblems. A sample problem to decompose: min x Σ i f i ( x ) {\displaystyle \min _{x}\Sigma _{i}f^{i}(x)} where x ∈ C {\displaystyle x\in C} In this problem, separately minimizing every single f i ( x ) {\displaystyle f^{i}(x)} over x {\displaystyle x} is easy; but minimizing their sum is a complex problem. So the problem needs to get decomposed using auxiliary variables { x i } {\displaystyle \{x^{i}\}} and the problem will be as follows: min { x i } , x Σ i f i ( x i ) {\displaystyle \min _{\{x^{i}\},x}\Sigma _{i}f^{i}(x^{i})} where x i ∈ C , x i = x {\displaystyle x^{i}\in C,x^{i}=x} Now we can relax the constraints by multipliers { λ i } {\displaystyle \{\lambda ^{i}\}} which gives us the following Lagrangian dual function: g ( { λ i } ) = min { x i ∈ C } , x Σ i f i ( x i ) + Σ i λ i . ( x i − x ) = min { x i ∈ C } , x Σ i [ f i ( x i ) + λ i . x i ] − ( Σ i λ i ) x {\displaystyle g(\{\lambda ^{i}\})=\min _{\{x^{i}\in C\},x}\Sigma _{i}f^{i}(x^{i})+\Sigma _{i}\lambda ^{i}.(x^{i}-x)=\min _{\{x^{i}\in C\},x}\Sigma _{i}[f^{i}(x^{i})+\lambda ^{i}.x^{i}]-(\Sigma _{i}\lambda ^{i})x} Now we eliminate x {\displaystyle x} from the dual function by minimizing over x {\displaystyle x} and dual function becomes: g ( { λ i } ) = min { x i ∈ C } Σ i [ f i ( x i ) + λ i . x i ] {\displaystyle g(\{\lambda ^{i}\})=\min _{\{x^{i}\in C\}}\Sigma _{i}[f^{i}(x^{i})+\lambda ^{i}.x^{i}]} We can set up a Lagrangian dual problem: (3) max { λ i } ∈ Λ g ( λ i ) = Σ i g i ( x i ) , {\displaystyle \max _{\{\lambda ^{i}\}\in \Lambda }g({\lambda ^{i}})=\Sigma _{i}g^{i}(x^{i}),} The Master problem (4) g i ( x i ) = m i n x i f i ( x i ) + λ i . x i {\displaystyle g^{i}(x^{i})=min_{x^{i}}f^{i}(x^{i})+\lambda ^{i}.x^{i}} where x i ∈ C {\displaystyle x^{i}\in C} The Slave problems == MRF optimization via Dual Decomposition == The original MRF optimization problem is NP-hard and we need to transform it into something easier. τ {\displaystyle \tau } is a set of sub-trees of graph G {\displaystyle G} where its trees cover all nodes and edges of the main graph. And MRFs defined for every tree T {\displaystyle T} in τ {\displaystyle \tau } will be smaller. The vector of MRF parameters is θ T {\displaystyle \theta ^{T}} and the vector of MRF variables is x T {\displaystyle x^{T}} , these two are just smaller in comparison with original MRF vectors θ , x {\displaystyle \theta ,x} . For all vectors θ T {\displaystyle \theta ^{T}} we'll have the following: (5) ∑ T ∈ τ ( p ) θ p T = θ p , ∑ T ∈ τ ( p q ) θ p q T = θ p q . {\displaystyle \sum _{T\in \tau (p)}\theta _{p}^{T}=\theta _{p},\sum _{T\in \tau (pq)}\theta _{pq}^{T}=\theta _{pq}.} Where τ ( p ) {\displaystyle \tau (p)} and τ ( p q ) {\displaystyle \tau (pq)} denote all trees of τ {\displaystyle \tau } than contain node p {\displaystyle p} and edge p q {\displaystyle pq} respectively. We simply can write: (6) E ( θ , x ) = ∑ T ∈ τ E ( θ T , x T ) {\displaystyle E(\theta ,x)=\sum _{T\in \tau }E(\theta ^{T},x^{T})} And our constraints will be: (7) x T ∈ χ T , x T = x | T , ∀ T ∈ τ {\displaystyle x^{T}\in \chi ^{T},x^{T}=x_{|T},\forall T\in \tau } Our original MRF problem will become: (8) min { x T } , x Σ T ∈ τ E ( θ T , x T ) {\displaystyle \min _{\{x^{T}\},x}\Sigma _{T\in \tau }E(\theta ^{T},x^{T})} where x T ∈ χ T , ∀ T ∈ τ {\displaystyle x^{T}\in \chi ^{T},\forall T\in \tau } and x T ∈ x | T , ∀ T ∈ τ {\displaystyle x^{T}\in x_{|T},\forall T\in \tau } And we'll have the dual problem we were seeking: (9) max { λ T } ∈ Λ g ( { λ T } ) = ∑ T ∈ τ g T ( λ T ) , {\displaystyle \max _{\{\lambda ^{T}\}\in \Lambda }g(\{\lambda ^{T}\})=\sum _{T\in \tau }g^{T}(\lambda ^{T}),} The Master problem where each function g T ( . ) {\displaystyle g^{T}(.)} is defined as: (10) g T ( λ T ) = min x T E ( θ T + λ T , x T ) {\displaystyle g^{T}(\lambda ^{T})=\min _{x^{T}}E(\theta ^{T}+\lambda ^{T},x^{T})} where x T ∈ χ T {\displaystyle x^{T}\in \chi ^{T}} The Slave problems == Theoretical Properties == Theorem 1. Lagrangian relaxation (9) is equivalent to the LP relaxation of (2). min { x T } , x { E ( x , θ ) | x p T = s p , x T ∈ CONVEXHULL ( χ T ) } {\displaystyle \min _{\{x^{T}\},x}\{E(x,\theta )|x_{p}^{T}=s_{p},x^{T}\in {\text{CONVEXHULL}}(\chi ^{T})\}} Theorem 2. If the sequence of multipliers { α t } {\displaystyle \{\alpha _{t}\}} satisfies α t ≥ 0 , lim t → ∞ α t = 0 , ∑ t = 0 ∞ α t = ∞ {\displaystyle \alpha _{t}\geq 0,\lim _{t\to \infty }\alpha _{t}=0,\sum _{t=0}^{\infty }\alpha _{t}=\infty } then the algorithm converges to the optimal solution of (9). Theorem 3. The distance of the current solution { θ T } {\displaystyle \{\theta ^{T}\}} to the optimal solution { θ ¯ T } {\displaystyle \{{\bar {\theta }}^{T}\}} , which decreases at every iteration. Theorem 4. Any solution obtained by the method satisfies the WTA (weak tree agreement) condition. Theorem 5. For binary MRFs with sub-modular energies, the method computes a globally optimal solution.