WomanStats Project

WomanStats Project

The WomanStats Project is a donor-funded research and database project housed at Brigham Young University that "seeks to collect detailed statistical data on the status of women around the world, and to connect that data with data on the security of states." The WomanStats Database aims to provide a comprehensive compilation of information on the status of women in the world. Coders comb the extant literature and conduct expert interviews to find qualitative and quantitative information on over 300 indicators of women's status in 174 countries with populations of at least 200,000. Access to the online database is free. == History and structure == WomanStats began as an outgrowth of a paper Dr. Valerie M. Hudson (of the Brigham Young University Political Science department) and one of her graduate students, Andrea den Boer, published in International Security on the association between national security and the abnormal sex ratio in Asia. After the success and influence of their first article, (later added as one of their top twenty national security articles of that journal of all time), Hudson and den Boer did further research on the connection between the status of women and national security, but found that there was no single database that covered the range of topics that they needed for their research. Consequently, they began compiling information on variables regarding the status of women around the world. The database was officially formed in 2001 and grew exponentially as it later added more variables. The Project went live on the Internet in July 2007. The principal investigators are: Valerie M. Hudson (International Relations), Bonnie Ballif-Spanvill (Psychology, emeritus), and Chad F. Emmett (Geography) all from Brigham Young University, Mary Caprioli from the University of Minnesota, Duluth (International Relations), Rose McDermott from Brown University (International Relations), Andrea Den Boer from the University of Kent at Canterbury in the United Kingdom (International Relations) and S. Matthew Stearmer from the Ohio State University (Sociology; doctoral student). Approximately a dozen undergraduate and graduate students at Brigham Young University and Texas A&M University work at any one time as coders for the project. The coders take the raw quantitative and qualitative data collected in government reports, news articles, research papers, etc. and sort the applicable information on women into categories. They may also implement scales developed by the principal investigators, or that they (the students) themselves have developed. == Database == As of February 2011, the database has 307 variables, covers 174 nations with populations over 200,000, uses 18,015 sources and contains over 111,000 individual data points. All data is referenced to original sources. Not every variable has information for each country; similarly, not all countries have information for each variable: overall, about 70% of country-variable combinations have information. These database coding gaps exist where information is not available or is incomplete, or variables are not collected and reported by governments or international organizations. At times, information from different sources may be contradictory, and the WomanStats Database records this discrepant information for triangulation purposes. == Users and role of the database == The database is meant to help fill a hole in the extant data on the situation of women around the world. WomanStats data and research has been vetted and/or used by the United Nations, the United States Department of Defense, the Central Intelligence Agency, and the World Bank. Their data and research were also used by the United States Senate Committee on Foreign Relations in crafting the International Violence Against Women’s Act. The Inter-Agency Network on Women and Gender Equality (IANWGE) of the United Nations has stated that the WomanStats project "filled a major gap in the availability of data on women" (2007). Victor Asal and Mitchell Brown, researchers not affiliated with WomanStats, stated in an article published in Politics and Policy that "one of the most significant challenges of cross-national empirical studies of the prevalence of interpersonal violence is the paucity of available data, particularly reliable data," and that "WomanStats has allowed for an important first glimpse at analyzing the factors related to interpersonal violence." They conclude by stating that "Our findings suggest that, in the same way that larger disciplinary resources have invested in interstate and intrastate war, disciplinary resources need to be expended in creating a data set exploring interpersonal violence. Until the rights and the lives of women and children are taken as seriously as the survival of states by more proactively collaborating on projects like WomanStats, we will continue to only have a small lens through which to understand problems like this." Princeton University professor Evan S. Liberman wrote, "Although data on political regimes and group conflict have been in far greater demand by political scientists than data on gender politics and policies, two gender-related databases provide...examples of innovative HIRDs. Both the Womanstats database project (Hudson et al. 2009) and the Research Network on Gender Politics and the State (RNGS) project (McBride et al. 2008) are well-integrated presentations of quantitative and qualitative data characterizing the quality of gender relations around the world and, in particular, analytic descriptions of the treatment of women."." == Research == The research component of WomanStats focuses on exploring the relationship between the situation of women and the behavior and security of states. Current research initiatives include: Exploring the relationship between violent instability and inequity and family law. Examining the effect of polygyny and marriage market dislocations on the rise of suicide terrorism. Documenting discrepancies between laws on the books and cultural practices on the ground concerning gender issues. Investigating how well the situation of women predicts the peacefulness of nations-states, compared to their variables such as democracy, wealth, and civilization. The Project has published articles in International Security, International Studies Quarterly, Peace and Conflict, Journal of Peace Research, Political Psychology, Cumberland Law Review, and World Political Review, and has a forthcoming book from Columbia University Press.

POP-11

POP-11 is a reflective, incrementally compiled programming language with many of the features of an interpreted language. It is the core language of the Poplog programming environment developed originally by the University of Sussex, and recently in the School of Computer Science at the University of Birmingham, which hosts the main Poplog website. POP-11 is an evolution of the language POP-2, developed in Edinburgh University, and features an open stack model (like Forth, among others). It is mainly procedural, but supports declarative language constructs, including a pattern matcher, and is mostly used for research and teaching in artificial intelligence, although it has features sufficient for many other classes of problems. It is often used to introduce symbolic programming techniques to programmers of more conventional languages like Pascal, who find POP syntax more familiar than that of Lisp. One of POP-11's features is that it supports first-class functions. POP-11 is the core language of the Poplog system. The availability of the compiler and compiler subroutines at run-time (a requirement for incremental compiling) gives it the ability to support a far wider range of extensions (including run-time extensions, such as adding new data-types) than would be possible using only a macro facility. This made it possible for (optional) incremental compilers to be added for Prolog, Common Lisp and Standard ML, which could be added as required to support either mixed language development or development in the second language without using any POP-11 constructs. This made it possible for Poplog to be used by teachers, researchers, and developers who were interested in only one of the languages. The most successful product developed in POP-11 was the Clementine data mining system, developed by ISL. After SPSS bought ISL, they renamed Clementine to SPSS Modeler and decided to port it to C++ and Java, and eventually succeeded with great effort, and perhaps some loss of the flexibility provided by the use of an AI language. POP-11 was for a time available only as part of an expensive commercial package (Poplog), but since about 1999 it has been freely available as part of the open-source software version of Poplog, including various added packages and teaching libraries. An online version of ELIZA using POP-11 is available at Birmingham. At the University of Sussex, David Young used POP-11 in combination with C and Fortran to develop a suite of teaching and interactive development tools for image processing and vision, and has made them available in the Popvision extension to Poplog. == Simple code examples == Here is an example of a simple POP-11 program: define Double(Source) -> Result; Source2 -> Result; enddefine; Double(123) => That prints out: 246 This one includes some list processing: define RemoveElementsMatching(Element, Source) -> Result; lvars Index; [[% for Index in Source do unless Index = Element or Index matches Element then Index; endunless; endfor; %]] -> Result; enddefine; RemoveElementsMatching("the", [[the cat sat on the mat]]) => ;;; outputs [[cat sat on mat]] RemoveElementsMatching("the", [[the cat] [sat on] the mat]) => ;;; outputs [[the cat] [sat on] mat] RemoveElementsMatching([[= cat]], [[the cat]] is a [[big cat]]) => ;;; outputs [[is a]] Examples using the POP-11 pattern matcher, which makes it relatively easy for students to learn to develop sophisticated list-processing programs without having to treat patterns as tree structures accessed by 'head' and 'tail' functions (CAR and CDR in Lisp), can be found in the online introductory tutorial. The matcher is at the heart of the SimAgent (sim_agent) toolkit. Some of the powerful features of the toolkit, such as linking pattern variables to inline code variables, would have been very difficult to implement without the incremental compiler facilities.

Portable Format for Analytics

The Portable Format for Analytics (PFA) is a JSON-based predictive model interchange format conceived and developed by Jim Pivarski. PFA provides a way for analytic applications to describe and exchange predictive models produced by analytics and machine learning algorithms. It supports common models such as logistic regression and decision trees. Version 0.8 was published in 2015. Subsequent versions have been developed by the Data Mining Group. As a predictive model interchange format developed by the Data Mining Group, PFA is complementary to the DMG's XML-based standard called the Predictive Model Markup Language or PMML. == Release history == == Data Mining Group == The Data Mining Group is a consortium managed by the Center for Computational Science Research, Inc., a nonprofit founded in 2008. == Examples == reverse array: # reverse input array of doubles input: {"type": "array", "items": "double"} output: {"type": "array", "items": "double"} action: - let: { x : input} - let: { z : input} - let: { l : {a.len: [x]}} - let: { i : l} - while : { ">=" : [i,0]} do: - set : {z : {attr: z, path : [i] , to: {attr : x ,path : [ {"-":[{"-" : [l ,i]},1]}] } } } - set : {i : {-:[i,1]}} - z Bubblesort input: {"type": "array", "items": "double"} output: {"type": "array", "items": "double"} action: - let: { A : input} - let: { N : {a.len: [A]}} - let: { n : {-:[N,1]}} - let: { i : 0} - let: { s : 0.0} - while : { ">=" : [n,0]} do : - set : { i : 0 } - while : { "<=" : [i,{-:[n,1]}]} do : - if: {">": [ {attr: A, path : [i]} , {attr: A, path:[{+:[i,1]}]} ]} then : - set : {s : {attr: A, path: [i]}} - set : {A : {attr: A, path: [i], to: {attr: A, path:[{+:[i,1]}]} } } - set : {A : {attr: A, path: [{+:[i,1]}], to: s }} - set : {i : {+:[i,1]}} - set : {n : {-:[n,1]}} - A == Implementations == Hadrian (Java/Scala/JVM) - Hadrian is a complete implementation of PFA in Scala, which can be accessed through any JVM language, principally Java. It focuses on model deployment, so it is flexible (can run in restricted environments) and fast. Titus (Python 2.x) - Titus is a complete, independent implementation of PFA in pure Python. It focuses on model development, so it includes model producers and PFA manipulation tools in addition to runtime execution. Currently, it works for Python 2. Titus 2 (Python 3.x) - Titus 2 is a fork of Titus which supports PFA implementation for Python 3. Aurelius (R) - Aurelius is a toolkit for generating PFA in the R programming language. It focuses on porting models to PFA from their R equivalents. To validate or execute scoring engines, Aurelius sends them to Titus through rPython (so both must be installed). Antinous (Model development in Jython) - Antinous is a model-producer plugin for Hadrian that allows Jython code to be executed anywhere a PFA scoring engine would go. It also has a library of model producing algorithms.

Responsible AI Safety and Education Act

The Responsible AI Safety and Education Act (RAISE Act) is a New York State law that imposes transparency, safety, and reporting requirements on developers of large frontier artificial intelligence models. The law was signed by Governor Kathy Hochul on December 19, 2025. It was sponsored by State Senator Andrew Gounardes and Assemblymember Alex Bores. The RAISE Act is the second U.S. state law to regulate frontier AI model developers, following California's Transparency in Frontier Artificial Intelligence Act (TFAIA), which was signed in September 2025. Hochul signed the bill on the condition that the legislature would pass chapter amendments to bring the law closer to the California model. The amending bills (A9449/S8828) were introduced in January 2026; as of February 2026 they remain in committee, though the Governor's office and legal commentators treat the agreed-upon amendments as representing the final form of the law. == Provisions == The following describes the RAISE Act as it is expected to operate after the agreed-upon chapter amendments take effect. The law is expected to take effect on January 1, 2027. === Scope === The law applies to "large frontier developers," defined as companies with annual revenues exceeding $500 million that develop "frontier models," which are foundation models trained using more than 1026 floating-point operations (FLOPs). The version passed by the legislature in June 2025 had instead defined large developers based on having spent over $100 million in aggregate compute costs, and also included a provision prohibiting deployment of frontier models posing "unreasonable risk of critical harm"; both were removed as part of the negotiations between Hochul and the legislature. Accredited colleges and universities engaged in academic research are exempt, as is the state's Empire AI consortium. === Safety and transparency framework === Large frontier developers must write, implement, and publicly publish a "frontier AI framework" describing how they assess and mitigate catastrophic risks, secure unreleased model weights against unauthorized access, use third-party evaluators, govern internal use of frontier models, and respond to safety incidents. The framework must describe these measures "in detail," a requirement that goes beyond the California TFAIA's requirement to describe a developer's "approach." The framework must be reviewed at least annually, and material modifications must be published with justification within 30 days. Before or concurrently with deploying a new or substantially modified frontier model, developers must publish a transparency report including the model's release date, supported languages and output modalities, intended uses, and any restrictions on use. Large frontier developers must additionally include summaries of catastrophic risk assessments and the extent of third-party involvement. === Catastrophic risk and incident reporting === The law defines "catastrophic risk" as a foreseeable and material risk that a frontier model will contribute to the death of or serious injury to more than 50 people, or more than $1 billion in property damage, arising from a frontier model providing expert-level assistance in creating chemical, biological, radiological, or nuclear weapons; engaging in cyberattacks or conduct equivalent to crimes such as murder, assault, or theft without meaningful human oversight; or evading the control of its developer or user. Loss of equity value is explicitly excluded from the definition of property damage. "Critical safety incidents" include unauthorized access to model weights resulting in death or injury, materialization of a catastrophic risk, loss of control of a frontier model causing death or injury, and a model using deceptive techniques to subvert developer controls outside of an evaluation context in a manner that increases catastrophic risk. Frontier developers must report critical safety incidents within 72 hours, or within 24 hours if the incident poses an imminent risk of death or serious physical injury. === Enforcement === The chapter amendments establish a new office within the New York State Department of Financial Services to oversee compliance, receive incident reports, and publish annual reports on AI safety beginning in 2028. Large frontier developers must file disclosure statements with this office and pay pro rata assessments to fund its operations. The New York Attorney General may bring civil actions, with penalties of up to $1 million for a first violation and $3 million for subsequent violations. The version passed by the legislature in June 2025 had set penalties at up to $10 million and $30 million respectively. The law does not create a private right of action. == Legislative history == The bill was introduced in the Assembly on March 5, 2025, by Assemblymember Alex Bores, and in the Senate on March 27, 2025, by Senator Andrew Gounardes. After a series of amendments, the legislature passed the bill in June 2025. Governor Hochul did not immediately sign the bill, using nearly all the time available under New York law before acting; had she not signed by the end of 2025, the bill would have been pocket vetoed. The tech industry lobbied against the bill during this period, and Hochul initially proposed a near-complete rewrite modeled on California's TFAIA. Legislators resisted the extent of the changes, and the two sides ultimately agreed on a version that used the California law as a base but preserved several provisions that went beyond it, including the 72-hour incident reporting timeline and the creation of a dedicated enforcement office. Hochul signed the original bill (S6953-B/A6453-B) on December 19, 2025, with the legislature committing to pass chapter amendments formalizing the agreed changes in the January 2026 session. The amending bills (A9449 in the Assembly, S8828 in the Senate) were introduced on January 6 and January 8, 2026. OpenAI and Anthropic expressed support for the law. Anthropic's head of external affairs Sarah Heck said the two state laws "should inspire Congress to build on them." The super PAC network Leading the Future, backed by Andreessen Horowitz and OpenAI president Greg Brockman, subsequently announced plans to challenge Bores in a future election. == Federal preemption debate == Hochul signed the RAISE Act eight days after President Donald Trump issued an executive order on December 11, 2025, directing the Department of Justice to challenge state AI laws deemed to conflict with a "minimally burdensome" national AI policy. On January 9, 2026, the Department of Justice announced the establishment of an AI Litigation Task Force as called for by the executive order. The executive order also threatened states with loss of certain federal broadband funding if their AI laws were found to be onerous. Legal commentators have noted several potential avenues for federal challenge, including arguments that the law constitutes compelled speech, violates the dormant Commerce Clause by creating a patchwork of state regulations, or is preempted by federal AI policy. == Comparison with California's TFAIA == The RAISE Act was designed to align with California's Transparency in Frontier Artificial Intelligence Act, signed on September 29, 2025. Both laws use the same 1026 FLOP threshold to define frontier models and the same $500 million revenue threshold to define large developers. Both require public safety frameworks, transparency reports, and incident reporting. The RAISE Act's 72-hour incident reporting window is stricter than the TFAIA's 15-day window, though both require faster reporting for incidents posing imminent physical risk (24 hours under the RAISE Act, immediate under the TFAIA). The RAISE Act establishes a dedicated enforcement office within the Department of Financial Services, whereas California routes reports through the Office of Emergency Services. The RAISE Act requires developers to describe their safety measures "in detail" and how they "handle" various risks, whereas the TFAIA requires developers to describe their "approach."

LipNet

LipNet is a deep neural network for audio-visual speech recognition (ASVR). It was created by University of Oxford researchers Yannis Assael, Brendan Shillingford, Shimon Whiteson, and Nando de Freitas. The researchers stated that could match mouth movements to text with 93 percent accuracy, though it was criticized for its test using a limited dataset of words and grammar. It was used in Nvidia's autonomous "backseat driver" prototype Co-Pilot.

Example-based machine translation

Example-based machine translation (EBMT) is a method of machine translation often characterized by its use of a bilingual corpus with parallel texts as its main knowledge base at run-time. It is essentially a translation by analogy and can be viewed as an implementation of a case-based reasoning approach to machine learning. == Translation by analogy == At the foundation of example-based machine translation is the idea of translation by analogy. When applied to the process of human translation, the idea that translation takes place by analogy is a rejection of the idea that people translate sentences by doing deep linguistic analysis. Instead, it is founded on the belief that people translate by first decomposing a sentence into certain phrases, then by translating these phrases, and finally by properly composing these fragments into one long sentence. Phrasal translations are translated by analogy to previous translations. The principle of translation by analogy is encoded to example-based machine translation through the example translations that are used to train such a system. Other approaches to machine translation, including statistical machine translation, also use bilingual corpora to learn the process of translation. == History == Example-based machine translation was first suggested by Makoto Nagao in 1984. He pointed out that it is especially adapted to translation between two totally different languages, such as English and Japanese. In this case, one sentence can be translated into several well-structured sentences in another language, therefore, it is no use to do the deep linguistic analysis characteristic of rule-based machine translation. == Example == Example-based machine translation systems are trained from bilingual parallel corpora containing sentence pairs like the example shown in the table above. Sentence pairs contain sentences in one language with their translations into another. The particular example shows an example of a minimal pair, meaning that the sentences vary by just one element. These sentences make it simple to learn translations of portions of a sentence. For example, an example-based machine translation system would learn three units of translation from the above example: How much is that X ? corresponds to Ano X wa ikura desu ka. red umbrella corresponds to akai kasa small camera corresponds to chiisai kamera Composing these units can be used to produce novel translations in the future. For example, if we have been trained using some text containing the sentences: President Kennedy was shot dead during the parade. and The convict escaped on July 15th., then we could translate the sentence The convict was shot dead during the parade. by substituting the appropriate parts of the sentences. == Phrasal verbs == Example-based machine translation is best suited for sub-language phenomena like phrasal verbs. Phrasal verbs have highly context-dependent meanings. They are common in English, where they comprise a verb followed by an adverb and/or a preposition, which are called the particle to the verb. Phrasal verbs produce specialized context-specific meanings that may not be derived from the meaning of the constituents. There is almost always an ambiguity during word-to-word translation from source to the target language. As an example, consider the phrasal verb "put on" and its Hindustani translation. It may be used in any of the following ways: Ram put on the lights. (Switched on) (Hindustani translation: Jalana) Ram put on a cap. (Wear) (Hindustani translation: Pahenna)

Production Rule Representation

The Production Rule Representation (PRR) is a proposed standard of the Object Management Group (OMG) that aims to define a vendor-neutral model for representing production rules within the Unified Modeling Language (UML), specifically for use in forward-chaining rule engines. == History == The OMG set up a Business Rules Working Group in 2002 as the first standards body to recognize the importance of the "Business Rules Approach". It issued 2 main RFPs in 2003 – a standard for modeling production rules (PRR), and a standard for modeling business rules as business documentation (BSBR, now SBVR). PRR was mostly defined by and for vendors of Business Rule Engines (BREs) (sometimes termed Business Rules Engine(s), like in Wikipedia). Contributors have included all the major BRE vendors, members of RuleML, and leading UML vendors. == Evolution == The PRR RFP originally suggested that PRR use a combination of UML OCL and Action Semantics for rule conditions and actions. However, expecting modellers to learn 2 relatively obscure UML languages in order to define a production rule proved unpalatable. Therefore, PRR OCL was defined that included OCL extensions for simple rule actions (as well as external functions). PRR OCL is currently considered "non-normative" i.e. is not part of the PRR standard per se. PRR beta applies just to a PRR Core that excludes an explicit expression language. The PRR RFP envisaged covering both forward and backward chaining rule engines. However, the lack of vendor support for / interest in backward chaining caused this to be revise to forward chaining and "sequential" semantics. The latter is simply the scripting mode provided by many BPM tools, where rules are listed and executed sequentially as if programmed. This provides PRR with better compatibility with typical BPM scripting engines (and acknowledges the fact that most BREs today support a "sequential" mode of operation, improving performance in some circumstances). == Status == PRR is currently at version 1.0.