AI Essay Verification

AI Essay Verification — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Non-separable wavelet

    Non-separable wavelet

    Non-separable wavelets are multi-dimensional wavelets that are not directly implemented as tensor products of wavelets on some lower-dimensional space. They have been studied since 1992. They offer a few important advantages. Notably, using non-separable filters leads to more parameters in design, and consequently better filters. The main difference, when compared to the one-dimensional wavelets, is that multi-dimensional sampling requires the use of lattices (e.g., the quincunx lattice). The wavelet filters themselves can be separable or non-separable regardless of the sampling lattice. Thus, in some cases, the non-separable wavelets can be implemented in a separable fashion. Unlike separable wavelet, the non-separable wavelets are capable of detecting structures that are not only horizontal, vertical or diagonal (show less anisotropy). == Examples == Red-black wavelets Contourlets Shearlets Directionlets Steerable pyramids Non-separable schemes for tensor-product wavelets

    Read more →
  • Artificial intelligence in Brazilian industry

    Artificial intelligence in Brazilian industry

    In 2022, 16.9% (1,620) of the 9,586 Brazilian industrial companies with 100 or more employees used artificial intelligence in their operations Among the companies that used AI, the areas of administration (73.8%), product project development (65.9%), processes, services and marketing (65.1%) were those that used it the most, followed by the areas of production (56.4%) and logistics (48.4%). == Current scenario == === Adoption in Brazilian industrial sectors === In senior management, the majority (56%) of executives have a long-term vision for its use. The study also shows that IT, Innovation, and Marketing are the areas where AI use is most widespread, and that 43% of companies are developing or adapting the algorithms they use. The majority of large institutions that reported some type of AI use purchased these solutions from other companies (76%). Some factors for the adoption of artificial intelligence in companies include the establishment of an autonomous strategy by the company (87.0%), and the influence of suppliers and/or customers (63.0%) and the main difficulties in using technologies were high costs (80.8%), lack of qualified personnel in the company (54.6%) and excessive economic risks (49.5%). Three variables are considered the most relevant to explain the option to use AI: the implementation of a digital security policy, the size of companies with 250 or more employees and the characteristics of the company related to information and communication. When analyzing AI use by company size in Brazil, large companies have the highest proportion of AI use, mainly due to their investment capacity and technology experimentation. However, when comparing Brazil and Europe, indicators show an acceleration in AI use among large European companies, while in Brazil the situation remains stable. In 2023, 30% of large companies in the European bloc used some type of AI, a figure that rose to 41% in 2024, while in Brazil these proportions were 41% in 2023 and 38% in 2024. === Workforce === The challenge of upskilling begins with employees who are capable of understanding recent technological changes. Similarly, companies must create the environment and conditions for workforce development conducive to innovation, and universities must be prepared to provide knowledge aligned with the transition process, which in turn must be supported by public policies. The concern with training a specialized workforce in AI can be seen in the low number of graduates and PhDs in computer science and computer engineering in Brazil, compared to the number shown in other countries. As recorded in the document Recommendations for the Advancement of Artificial Intelligence in Brazil, 2019 data from the Coordination for the Improvement of Higher Education Personnel (CAPES) indicate that "the number of PhDs graduated annually in computing remained below 400 in 2016, and is not expected to have increased during the Covid-19 pandemic" (ABC, 2023). In the United States, by contrast, the number of PhDs graduated in these two areas has remained around 1,800 for the past 11 years, and during this period, the number of PhDs specializing in AI jumped from 10% to 19%. Based on data from the CNPq Lattes Platform (October 2019), it is possible to observe that the number of professionals in the AI field in Brazil is 4,429 specialists. This is still a small number compared to the 415,166 IT jobs in the country's business sector alone. === R&D, scientific production and integration with industry === China and the United States lead in the number of publications. These two countries are followed by the G7 members: India, Austria, South Korea, and Spain. Brazil appears in the next group, alongside the Netherlands, Russia, Indonesia, and Ireland. Regarding the promotion of research and technologies related to AI, public entities such as the Coordination for the Improvement of Higher Education Personnel (Capes) and the National Council for Scientific and Technological Development (CNPq) stood out as the main funders. Currently, different countries and territories have been promoting the development of Artificial Intelligence (AI). In the Brazilian case, one of the main initiatives is the creation of Engineering Research Centers/Applied Research Centers (CPE/CPA) in AI by the São Paulo Research Foundation (FAPESP), in collaboration with the Ministry of Science, Technology and Innovation (MCTI), the Ministry of Communications (MC) and the Brazilian Internet Steering Committee (CGI.br). In terms of the number of patents filed and the volume of investments, the leading nations in AI are the United States, China, France, Germany, the United Kingdom, Russia, India, Switzerland, Japan, South Korea, the Netherlands, Sweden, Finland, Ireland, Singapore, Canada, Israel, and Italy. Brazil appears among the top twenty countries in some rankings, mainly due to its good number of publications (approximately 10% of the number of articles published by the United States). The US is home to approximately 60% of the world's top AI researchers, followed by China (11%), Europe (10%), and Canada (6%). To change this scenario, in August 2024, the Brazilian government announced an investment of R$23 billion until 2028 in artificial intelligence, seeking to “transform the country into a global reference in innovation”. == Future challenges == The Organization for Economic Cooperation and Development (2020) report highlighted three factors that hinder the digital transformation journey and application of AI in Brazil: insufficient infrastructure, high costs due to the tax system, and financial limitations, such as limited access to financing. The costs of adopting technology, its incompatibility with the business, and the lack of training also represent obstacles that Brazilian industry must overcome. There are also inherent obstacles for companies. A McKinsey review emphasizes that once a company chooses one or more sectors to focus on, it must select specific applications. Buyers aren't interested in artificial intelligence simply because it's a breakthrough technology; they want AI to generate a good return on investment, whether by solving specific problems, saving money, or increasing sales. If an AI vendor tried to offer a horizontal solution, the value proposition might not be as compelling. Part of the solution to Brazil's technological backwardness involves building an ecosystem fueled by private institutions, universities, and governments.

    Read more →
  • CENDI

    CENDI

    CENDI (Commerce, Energy, NASA, Defense Information Managers Group) is an interagency group of senior Scientific and Technical Information (STI) managers from 14 United States federal agencies. CENDI managers cooperate by exchanging information and ideas, collaborating to address common issues, and undertaking joint initiatives. CENDI's accomplishments range from impacting federal information policy to educating a broad spectrum of stakeholders on all aspects of federal STI systems, including its value to research and the taxpayer, and to operational improvements in agency and interagency STI operations. == History == CENDI traces its roots to the Committee on Scientific and Technical Information (COSATI) of the Federal Council on Science and Technology. COSATI was established in the early 1960s to coordinate the management of the results from the U.S. government's increasing commitment to scientific research and technology development. The scientific and technical information (STI) managers of the government's major research and development (R&D) agencies worked within COSATI to standardize guidelines for cataloging and indexing technical reports. COSATI ceased formal operations in the early 1970s. To continue the cooperation begun under COSATI, managers of agency STI programs from Commerce (National Technical Information Service), Energy (Office of Scientific and Technical Information), NASA (HQ/STI Division), and Defense (Defense Technical Information Center) began meeting periodically to discuss common topics and stimulate more effective cooperation. In 1985, a Memorandum of Understanding was signed by the four charter agencies and CENDI was established. From this small core of STI managers, CENDI has grown to its current membership, which represents the major science agencies, the national libraries, and agencies involved in the dissemination and long-term management of scientific and technical information. The vision of CENDI is to facilitate cooperative enterprise where capabilities are shared and challenges are faced together so that the sum of the accomplishments is greater than each individual agency can achieve on its own amongst federal STI agencies. The abbreviation CENDI refers to the "Commerce, Energy, NASA, Defense Information Managers Group". == Membership == New members from other federal R&D information organizations may be admitted by unanimous agreement of the members. However, it is the intent of the group that membership in CENDI should remain small and focus on organizations with STI or supporting responsibilities. Each agency provides funding to CENDI. == Members == The members of CENDI are: Defense Technical Information Center (United States Department of Defense) Office of Research and Development and Office of Environmental Information (United States Environmental Protection Agency) Government Printing Office Library of Congress NASA Scientific and Technical Information Program National Agricultural Library (United States Department of Agriculture) National Archives and Records Administration National Library of Education (United States Department of Education) National Library of Medicine (United States Department of Health and Human Services) National Science Foundation National Technical Information Service (United States Department of Commerce) National Transportation Library (United States Department of Transportation) Office of Scientific and Technical Information (United States Department of Energy) USGS/Biological Resources Discipline (United States Department of the Interior) == Mission and operation == CENDI's mission is to help improve the productivity of federal science- and technology-based programs through effective scientific, technical, and related information support systems. In fulfilling its mission, CENDI agencies play an important role in addressing science- and technology-based national priorities and strengthening U.S. competitiveness. === Goals === STI Coordination and Leadership: Provide coordination and leadership for information exchange on important STI policy issues. Improvement of STI Systems: Promote the development of improved STI systems through the productive interrelationship of content and technology. STI Understanding: Promote better understanding of STI and STI management. === Principals and Alternates === CENDI is made up of senior federal STI managers and each organization appoints a Principal representative. This person is the point of contact for that organization within CENDI. Each Principal has an Alternate. The Principals and Alternates comprise the main group that meets on a regular basis, usually every other month. === Secretariat === A Tennessee-based information management company, -- Information International Associates, Inc., currently serves as the CENDI Secretariat. The Secretariat provides day-to-day operations to CENDI. The Secretariat prepares the necessary materials for the Principals' meetings, provides support for the working group and task group meetings, assists in developing papers, and maintains the CENDI files and outreach tools. === Task Groups and Working Groups === The chair(s) of a working group is appointed by the Principals and has the overall responsibility for the group's activities. The Secretariat provides support at the request of the Working Group chair(s). The Working Groups and Task Groups that are currently operating are: Copyright and Intellectual Property Working Group Distribution Markings Task Group Digital Preservation Task Group Digitization Specifications Task Group Image Metadata Task Group Science.gov (see below) STI Policy Working Group Terminology Resources Task Group === Science.gov and Worldwidescience.org === In 2001, in response to the April 2001 workshop on "Strengthening the Public Information Infrastructure for Science", and taking into consideration a request from Firstgov (now USA.gov) to develop specialized topical portals, CENDI formed an alliance to develop an interagency website for access to STI. This website, called Science.gov, is a one-stop source of STI, including both selected, authoritative government websites and deep Web databases of technical reports, journal articles, conference proceedings, and other published materials. Through the volunteer efforts of members and involving over 100 staff, content and architecture is developed for the site. The Science.gov website is hosted by the Department of Energy (DOE) Office of Scientific and Technical Information (OSTI). The site was formally launched in December 2002. As a result of the success of Science.gov, under DOE leadership and in cooperation with the International Council of Scientific and Technical Information, a worldwide coordination across national portals called WorldWideScience was launched in 2008. === Work with non-member organizations === CENDI works with several cooperating non-member organizations on a regular basis. These agencies are in academia, federal government, legal and policy analysis, international, non-governmental, and private organizations.

    Read more →
  • Information school

    Information school

    Information school (sometimes abbreviated I-school or iSchool) is a university-level institution committed to understanding the role of information in nature and human endeavors. Synonyms include school of information, department of information studies, or information department. Information schools faculty conduct research into the fundamental aspects of information and related technologies. In addition to granting academic degrees, information schools educate information professionals, researchers, and scholars for an increasingly information-driven world. Information school can also refer, in a more restricted sense, to the members of the iSchools organization (formerly the "iSchools Project"), as governed by the iCaucus. Members of this group share a fundamental interest in the relationships between people, information, technology, and science. These schools, colleges, and departments have been either newly established or have evolved from programs focused on information systems, library science, informatics, computer science, library and information science and information science. Information schools promote an interdisciplinary approach to understanding the opportunities and challenges of information management, with a core commitment to concepts like universal access and user-centered organization of information. The field is concerned broadly with questions of design and preservation across information spaces, from digital and virtual spaces like online communities, the World Wide Web, and databases to physical spaces such as libraries, museums, archives, and other repositories. Information school degree programs include course offerings in areas such as data science, information architecture, design, economics, policy, retrieval, security, and telecommunications; knowledge management, user experience design, and usability; conservation and preservation, including digital preservation; librarianship and library administration; the sociology of information; and human–computer interaction.

    Read more →
  • Robotic process automation

    Robotic process automation

    Robotic process automation (RPA) is a form of business process automation that is based on software robots (bots) or artificial intelligence (AI) agents. RPA should not be confused with artificial intelligence as it is based on automation technology following a predefined workflow. It is sometimes referred to as software robotics (not to be confused with robot software). In traditional workflow automation tools, a software developer produces a list of actions to automate a task and interface to the back end system using internal application programming interfaces (APIs) or dedicated scripting language. In contrast, RPA systems develop the action list by watching the user perform that task in the application's graphical user interface (GUI) and then perform the automation by repeating those tasks directly in the GUI. This can lower the barrier to the use of automation in products that might not otherwise feature APIs for this purpose. RPA tools have strong technical similarities to graphical user interface testing tools. These tools also automate interactions with the GUI, and often do so by repeating a set of demonstration actions performed by a user. RPA tools differ from such systems in that they allow data to be handled in and between multiple applications, for instance, receiving email containing an invoice, extracting the data, and then typing that into a bookkeeping system. == Historic evolution == As a form of automation, the concept has been around for a long time in the form of screen scraping, so long that to early PC users the reminder of it often blurs with the idea of malware infection. Yet compared to screen scraping, RPA is much more extensible, consisting of API integration into other enterprise applications, connectors into ITSM systems, terminal services and even some types of AI (e.g. machine learning) services such as image recognition. It is considered to be a significant technological evolution in the sense that new software platforms are emerging which are sufficiently mature, resilient, scalable and reliable to make this approach viable for use in large enterprises (who would otherwise be reluctant due to perceived risks to quality and reputation). == Use == The hosting of RPA services also aligns with the metaphor of a software robot, with each robotic instance having its own virtual workstation, much like a human worker. The robot uses keyboard and mouse controls to take actions and execute automations. Normally, all of these actions take place in a virtual environment and not on screen; the robot does not need a physical screen to operate, rather it interprets the screen display electronically. The scalability of modern solutions based on architectures such as these owes much to the advent of virtualization technology, without which the scalability of large deployments would be limited by the available capacity to manage physical hardware and by the associated costs. The implementation of RPA in business enterprises has shown dramatic cost savings when compared to traditional non-RPA solutions. === RPA actual use === Banking and finance process automation Mortgage and lending processes Customer care automation eCommerce merchandising operations Social media marketing Optical character recognition applications Data extraction process Fixed automation process Manual and repetitive tasks automation Voice recognition and digital dictation software linked to join up business processes for straight through processing without manual intervention Specialised remote infrastructure management software featuring automated investigation and resolution of problems, using robots for the first line IT support Chatbots used by internet retailers and service providers to service customer requests for information. Also used by companies to service employee requests for information from internal databases Presentation layer automation software, increasingly used by business process outsourcers to displace human labour Interactive voice response (IVR) systems incorporating intelligent interaction with callers == Impact on employment == According to Harvard Business Review, most operations groups adopting RPA have promised their employees that automation would not result in layoffs. Instead, workers have been redeployed to do more interesting work. One academic study highlighted that knowledge workers did not feel threatened by automation: they embraced it and viewed the robots as team-mates. The same study highlighted that, rather than resulting in a lower "headcount", the technology was deployed in such a way as to achieve more work and greater productivity with the same number of people. Conversely, however, some analysts proffer that RPA represents a threat to the business process outsourcing (BPO) industry. The thesis behind this notion is that RPA will enable enterprises to "repatriate" processes from offshore locations into local data centers, with the benefit of this new technology. The effect, if true, will be to create high-value jobs for skilled process designers in onshore locations (and within the associated supply chain of IT hardware, data center management, etc.) but to decrease the available opportunity to low-skilled workers offshore. On the other hand, this discussion appears to be healthy ground for debate as another academic study was at pains to counter the so-called "myth" that RPA will bring back many jobs from offshore. === Impact on society === Academic studies project that RPA, among other technological trends, is expected to drive a new wave of productivity and efficiency gains in the global labour market. Although not directly attributable to RPA alone, Oxford University conjectures that up to 35% of all jobs might be automated by 2035. There are geographic implications to the trend in robotic automation. In the example above where an offshored process is "repatriated" under the control of the client organization (or even displaced by a business process outsourcer) from an offshore location to a data centre, the impact will be a deficit in economic activity to the offshore location and an economic benefit to the originating economy. On this basis, developed economies – with skills and technological infrastructure to develop and support a robotic automation capability – can be expected to achieve a net benefit from the trend. In a TEDx talk hosted by University College London (UCL), entrepreneur David Moss explains that digital labour in the form of RPA is likely to revolutionize the cost model of the services industry by driving the price of products and services down, while simultaneously improving the quality of outcomes and creating increased opportunity for the personalization of services. In a separate TEDx in 2019 talk, Japanese business executive, and former CIO of Barclays bank, Koichi Hasegawa noted that digital robots can be a positive effect on society if we start using a robot with empathy to help every person. He provides a case study of the Japanese insurance companies – Sompo Japan and Aioi – both of whom introduced bots to speed up the process of insurance pay-outs in past massive disaster incidents. Meanwhile, Professor Willcocks, author of the LSE paper cited above, speaks of increased job satisfaction and intellectual stimulation, characterising the technology as having the ability to "take the robot out of the human", a reference to the notion that robots will take over the mundane and repetitive portions of people's daily workload, leaving them to be used in more interpersonal roles or to concentrate on the remaining, more meaningful, portions of their day. It was also found in a 2021 study observing the effects of robotization in Europe that, the gender pay gap increased at a rate of .18% for every 1% increase in robotization of a given industry. == Unassisted RPA == Unassisted RPA, or RPAAI, is the next generation of RPA related technologies. Technological advancements around artificial intelligence allow a process to be run on a computer without needing input from a user. == Hyperautomation == Hyperautomation is the application of advanced technologies like RPA, artificial intelligence, machine learning (ML) and process mining to augment workers and automate processes in ways that are significantly more impactful than traditional automation capabilities. Hyperautomation is the combination of technologies that allow faster application authorship (like low-code and no-code) with automation technologies that coordinate different worker types (i.e. human and artificial) for intelligent and strategic workflow optimization. Gartner's report notes that this trend was kicked off with robotic process automation (RPA). The report notes that, "RPA alone is not hyperautomation. Hyperautomation requires a combination of tools to help support replicating pieces of where the human is involved in a task." == Outsourcing == Back office clerical processes outsourced by large organisations

    Read more →
  • Ballin' (Mustard and Roddy Ricch song)

    Ballin' (Mustard and Roddy Ricch song)

    "Ballin'" is a song by American record producer Mustard featuring American rapper Roddy Ricch. The track was released as the third single from Mustard's third studio album, Perfect Ten, on August 20, 2019, though it was available as early as the end of June 2019. The song and its accompanying video received acclaim from music critics, with Complex magazine naming it the Best Song of 2019. It peaked at number 11 on the Billboard Hot 100, marking Mustard's highest charting song in the US. The song received a nomination for Best Rap/Sung Performance at the 2020 Grammy Awards, making it the first time Ricch has been nominated for a Grammy and Mustard's first nomination as an artist. Later in 2019, the two released another collaboration, "High Fashion". == Background == Roddy Ricch revealed in an interview that the song was composed in late 2018, but Mustard wanted to keep it for his album, Perfect Ten, which he was still working on. The song was later included on the album, released in June 2019. Ricch said he knew the song was "hard enough" the first time he heard it, while Mustard proclaimed "this is going to be the one". == Composition and lyrics == "Ballin'" has a "rags to riches" theme. In its intro, the song samples girl group 702's 1997 top ten hit "Get It Together". The song features a "smooth, bouncy beat", with Roddy Ricch rapping about his come-up and ascent in the music industry. In the first verse, Ricch salutes fellow Los Angeles rapper, the late Nipsey Hussle and his girlfriend Lauren London: "I run these racks up with my queen like London and Nip". The line simultaneously references Ricch and Hussle's collaboration "Racks in the Middle", released earlier in 2019 as Hussle's last single before his death. Billboard's Heran Mamo noted that "in typical Hussle fashion", Roddy Ricch "narrates his life's hardships before delving into his newfound treasures". == Critical reception == The song was widely acclaimed by music critics. Charles Holmes of Rolling Stone magazine called it "a song of the year contender", while Complex and Billboard both named it as a "standout track" on the album. Pitchfork magazine included "Ballin'" in its list of The Best Rap Songs of 2019 and called it "the centerpiece of Mustard's underappreciated album Perfect Ten". Complex later named it the Best Song of 2019, calling it "a feel-good anthem so infectious you'll need antibiotics just to stop running it back". == Chart performance == "Ballin'" was at the time Mustard's highest charting song in the US, peaking at number 11 on the Billboard Hot 100. It was also Roddy Ricch's highest charting song, until he surpassed it a week later, with the release of his album track "The Box", which eventually reached number 1 on the chart. It reached number one on Billboard's Rhythmic Songs chart, becoming Mustard's second number one following "Pure Water" and Ricch's first number one. The song also topped the Rap Airplay chart. == Music video == The music video for the track was teased by Mustard on his Instagram page on September 29, 2019. The music video for the track was eventually released on October 2, 2019 to critical acclaim. The video features Mustard and Roddy Ricch driving a Lamborghini Aventador in Los Angeles, where they both are from, playing poker in a casino, and going to a strip club. This is contrasted with scenes in which Mustard and Roddy Ricch as children play cards with Monopoly money and playing with miniature toy Lamborghinis together, aspiring for wealth and luxury, representing how they went from "rags to riches". The video also pays tribute to rapper Nipsey Hussle, who had been killed a few months ago. == Live performances == On December 16, 2019, Roddy Ricch performed the song live, alongside an 8-piece orchestra, at Peppermint Club in Los Angeles for Audiomack's Trap Symphony series. Along with Mustard, he performed it at The Pop Out: Ken & Friends on June 19, 2024. == Other uses == The song can be heard on "Elyse's Skit", track 10 off Roddy Ricch's debut album Please Excuse Me for Being Antisocial. In the skit, which is an actual voicenote recording, the mother of a woman named Elyse sends her daughter a voicenote, with "Ballin'" playing in the background, while the mother proceeds to say "I can't get that damn song out my head", jokingly calling it "inappropriate music". Ricch called the skit "something natural". In 2023, AI covers of the song using models based on pop culture characters and real-world celebrities gained viral popularity. == Awards and nominations == 62nd Annual Grammy Awards == Charts == == Certifications ==

    Read more →
  • Leiden algorithm

    Leiden algorithm

    The Leiden algorithm is a community detection algorithm developed by Traag et al at Leiden University. It was developed as a modification of the Louvain method. Like the Louvain method, the Leiden algorithm attempts to optimize modularity in extracting communities from networks; however, it addresses key issues present in the Louvain method, namely poorly connected communities and the resolution limit of modularity. == Improvement over Louvain method == Broadly, the Leiden algorithm uses the same two primary phases as the Louvain algorithm: a local node moving step (though, the method by which nodes are considered in Leiden is more efficient) and a graph aggregation step. However, to address the issues with poorly-connected communities and the merging of smaller communities into larger communities (the resolution limit of modularity), the Leiden algorithm employs an intermediate refinement phase in which communities may be split to guarantee that all communities are well-connected. Consider, for example, the following graph: Three communities are present in this graph (each color represents a community). Additionally, the center "bridge" node (represented with an extra circle) is a member of the community represented by blue nodes. Now consider the result of a node-moving step which merges the communities denoted by red and green nodes into a single community (as the two communities are highly connected): Notably, the center "bridge" node is now a member of the larger red community after node moving occurs (due to the greedy nature of the local node moving algorithm). In the Louvain method, such a merging would be followed immediately by the graph aggregation phase. However, this causes a disconnection between two different sections of the community represented by blue nodes. In the Leiden algorithm, the graph is instead refined: The Leiden algorithm's refinement step ensures that the center "bridge" node is kept in the blue community to ensure that it remains intact and connected, despite the potential improvement in modularity from adding the center "bridge" node to the red community. == Graph components == Before defining the Leiden algorithm, it will be helpful to define some of the components of a graph. === Vertices and edges === A graph is composed of vertices (nodes) and edges. Each edge is connected to two vertices, and each vertex may be connected to zero or more edges. Edges are typically represented by straight lines, while nodes are represented by circles or points. In set notation, let V {\displaystyle V} be the set of vertices, and E {\displaystyle E} be the set of edges: V := { v 1 , v 2 , … , v n } E := { e i j , e i k , … , e k l } {\displaystyle {\begin{aligned}V&:=\{v_{1},v_{2},\dots ,v_{n}\}\\E&:=\{e_{ij},e_{ik},\dots ,e_{kl}\}\end{aligned}}} where e i j {\displaystyle e_{ij}} is the directed edge from vertex v i {\displaystyle v_{i}} to vertex v j {\displaystyle v_{j}} . We can also write this as an ordered pair: e i j := ( v i , v j ) {\displaystyle {\begin{aligned}e_{ij}&:=(v_{i},v_{j})\end{aligned}}} === Community === A community is a unique set of nodes: C i ⊆ V C i ⋂ C j = ∅ ∀ i ≠ j {\displaystyle {\begin{aligned}C_{i}&\subseteq V\\C_{i}&\bigcap C_{j}=\emptyset ~\forall ~i\neq j\end{aligned}}} and the union of all communities must be the total set of vertices: V = ⋃ i = 1 C i {\displaystyle {\begin{aligned}V&=\bigcup _{i=1}C_{i}\end{aligned}}} === Partition === A partition is the set of all communities: P = { C 1 , C 2 , … , C n } {\displaystyle {\begin{aligned}{\mathcal {P}}&=\{C_{1},C_{2},\dots ,C_{n}\}\end{aligned}}} == Partition quality == How communities are partitioned is an integral part on the Leiden algorithm. How partitions are decided can depend on how their quality is measured. Additionally, many of these metrics contain parameters of their own that can change the outcome of their communities. === Modularity === Modularity is a highly used quality metric for assessing how well a set of communities partition a graph. The equation for this metric is defined for an adjacency matrix, A, as: Q = 1 2 m ∑ i j ( A i j − k i k j 2 m ) δ ( c i , c j ) {\displaystyle Q={\frac {1}{2m}}\sum _{ij}(A_{ij}-{\frac {k_{i}k_{j}}{2m}})\delta (c_{i},c_{j})} where: A i j {\displaystyle A_{ij}} represents the edge weight between nodes i {\displaystyle i} and j {\displaystyle j} ; see Adjacency matrix; k i {\displaystyle k_{i}} and k j {\displaystyle k_{j}} are the sum of the weights of the edges attached to nodes i {\displaystyle i} and j {\displaystyle j} , respectively; m {\displaystyle m} is the sum of all of the edge weights in the graph; c i {\displaystyle c_{i}} and c j {\displaystyle c_{j}} are the communities to which the nodes i {\displaystyle i} and j {\displaystyle j} belong; and δ {\displaystyle \delta } is Kronecker delta function: δ ( c i , c j ) = { 1 if c i and c j are the same community 0 otherwise {\displaystyle {\begin{aligned}\delta (c_{i},c_{j})&={\begin{cases}1&{\text{if }}c_{i}{\text{ and }}c_{j}{\text{ are the same community}}\\0&{\text{otherwise}}\end{cases}}\end{aligned}}} === Reichardt Bornholdt Potts Model (RB) === One of the most well used metrics for the Leiden algorithm is the Reichardt Bornholdt Potts Model (RB). This model is used by default in most mainstream Leiden algorithm libraries under the name RBConfigurationVertexPartition. This model introduces a resolution parameter γ {\displaystyle \gamma } and is highly similar to the equation for modularity. This model is defined by the following quality function for an adjacency matrix, A, as: Q = ∑ i j ( A i j − γ k i k j 2 m ) δ ( c i , c j ) {\displaystyle Q=\sum _{ij}(A_{ij}-\gamma {\frac {k_{i}k_{j}}{2m}})\delta (c_{i},c_{j})} where: γ {\displaystyle \gamma } represents a linear resolution parameter === Constant Potts Model (CPM) === Another metric similar to RB is the Constant Potts Model (CPM). This metric also relies on a resolution parameter γ {\displaystyle \gamma } The quality function is defined as: H = − ∑ i j ( A i j w i j − γ ) δ ( c i , c j ) {\displaystyle H=-\sum _{ij}(A_{ij}w_{ij}-\gamma )\delta (c_{i},c_{j})} === Understanding Potts Model resolution parameters/Resolution limit === Typically Potts models such as RB or CPM include a resolution parameter in their calculation. Potts models are introduced as a response to the resolution limit problem that is present in modularity maximization based community detection. The resolution limit problem is that, for some graphs, maximizing modularity may cause substructures of a graph to merge and become a single community and thus smaller structures are lost. These resolution parameters allow modularity adjacent methods to be modified to suit the requirements of the user applying the Leiden algorithm to account for small substructures at a certain granularity. The figure on the right illustrates why resolution can be a helpful parameter when using modularity based quality metrics. In the first graph, modularity only captures the large scale structures of the graph; however, in the second example, a more granular quality metric could potentially detect all substructures in a graph. == Algorithm == The Leiden algorithm starts with a graph of disorganized nodes (a) and sorts it by partitioning them to maximize modularity (the difference in quality between the generated partition and a hypothetical randomized partition of communities). The method it uses is similar to the Louvain algorithm, except that after moving each node it also considers that node's neighbors that are not already in the community it was placed in. This process results in our first partition (b), also referred to as P {\displaystyle {\mathcal {P}}} . Then the algorithm refines this partition by first placing each node into its own individual community and then moving them from one community to another to maximize modularity. It does this iteratively until each node has been visited and moved, and each community has been refined - this creates partition (c), which is the initial partition of P refined {\displaystyle {\mathcal {P}}_{\text{refined}}} . Then an aggregate network (d) is created by turning each community into a node. P refined {\displaystyle {\mathcal {P}}_{\text{refined}}} is used as the basis for the aggregate network while P {\displaystyle {\mathcal {P}}} is used to create its initial partition. Because we use the original partition P {\displaystyle {\mathcal {P}}} in this step, we must retain it so that it can be used in future iterations. These steps together form the first iteration of the algorithm. In subsequent iterations, the nodes of the aggregate network (which each represent a community) are once again placed into their own individual communities and then sorted according to modularity to form a new P refined {\displaystyle {\mathcal {P}}_{\text{refined}}} , forming (e) in the above graphic. In the case depicted by the graph, the nodes were already sorted optimally, so no change too

    Read more →
  • Parchive

    Parchive

    Parchive (a portmanteau of parity archive, and formally known as Parity Volume Set Specification) is an erasure code system that produces par files for checksum verification of data integrity, with the capability to perform data recovery operations that can repair or regenerate corrupted or missing data. Parchive was originally written to solve the problem of reliable file sharing on Usenet, but it can be used for protecting any kind of data from data corruption, disc rot, bit rot, and accidental or malicious damage. Despite the name, Parchive uses more advanced techniques (specifically error correction codes) than simplistic parity methods of error detection. As of 2015, PAR1 is obsolete, PAR2 is mature for widespread use, and PAR3 is a discontinued experimental version developed by MultiPar author Yutaka Sawada. The original SourceForge Parchive project has been inactive since April 30, 2015. A new PAR3 specification has been worked on since April 28, 2019 by PAR2 specification author Michael Nahas. An alpha version of the PAR3 specification has been published on January 29, 2022 while the program itself is being developed. == History == Parchive was intended to increase the reliability of transferring files via Usenet newsgroups. Usenet was originally designed for informal conversations, and the underlying protocol, NNTP was not designed to transmit arbitrary binary data. Another limitation, which was acceptable for conversations but not for files, was that messages were normally fairly short in length and limited to 7-bit ASCII text. Various techniques were devised to send files over Usenet, such as uuencoding and Base64. Later Usenet software allowed 8 bit Extended ASCII, which permitted new techniques like yEnc. Large files were broken up to reduce the effect of a corrupted download, but the unreliable nature of Usenet remained. With the introduction of Parchive, parity files could be created that were then uploaded along with the original data files. If any of the data files were damaged or lost while being propagated between Usenet servers, users could download parity files and use them to reconstruct the damaged or missing files. Parchive included the construction of small index files (.par in version 1 and .par2 in version 2) that do not contain any recovery data. These indexes contain file hashes that can be used to quickly identify the target files and verify their integrity. Because the index files were so small, they minimized the amount of extra data that had to be downloaded from Usenet to verify that the data files were all present and undamaged, or to determine how many parity volumes were required to repair any damage or reconstruct any missing files. They were most useful in version 1 where the parity volumes were much larger than the short index files. These larger parity volumes contain the actual recovery data along with a duplicate copy of the information in the index files (which allows them to be used on their own to verify the integrity of the data files if there is no small index file available). In July 2001, Tobias Rieper and Stefan Wehlus proposed the Parity Volume Set specification, and with the assistance of other project members, version 1.0 of the specification was published in October 2001. Par1 used Reed–Solomon error correction to create new recovery files. Any of the recovery files can be used to rebuild a missing file from an incomplete download. Version 1 became widely used on Usenet, but it did suffer some limitations: It was restricted to handle at most 255 files. The recovery files had to be the size of the largest input file, so it did not work well when the input files were of various sizes. (This limited its usefulness when not paired with the proprietary RAR compression tool.) The recovery algorithm had a bug, due to a flaw in the academic paper on which it was based. It was strongly tied to Usenet and it was felt that a more general tool might have a wider audience. In January 2002, Howard Fukada proposed that a new Par2 specification should be devised with the significant changes that data verification and repair should work on blocks of data rather than whole files, and that the algorithm should switch to using 16 bit numbers rather than the 8 bit numbers that PAR1 used. Michael Nahas and Peter Clements took up these ideas in July 2002, with additional input from Paul Nettle and Ryan Gallagher (who both wrote Par1 clients). Version 2.0 of the Parchive specification was published by Michael Nahas in September 2002. Peter Clements then went on to write the first two Par2 implementations, QuickPar and par2cmdline. Abandoned since 2004, Paul Houle created phpar2 to supersede par2cmdline. Yutaka Sawada created MultiPar to supersede QuickPar. MultiPar uses par2j.exe (which is partially based on par2cmdline's optimization techniques) to use as MultiPar's backend engine. == Versions == Versions 1 and 2 of the file format are incompatible. (However, many clients support both.) === Par1 === For Par1, the files f1, f2, ..., fn, the Parchive consists of an index file (f.par), which is CRC type file with no recovery blocks, and a number of "parity volumes" (f.p01, f.p02, etc.). Given all of the original files except for one (for example, f2), it is possible to create the missing f2 given all of the other original files and any one of the parity volumes. Alternatively, it is possible to recreate two missing files from any two of the parity volumes and so forth. Par1 supports up to a total of 256 source and recovery files. === Par2 === Par2 files generally use this naming/extension system: filename.vol000+01.PAR2, filename.vol001+02.PAR2, filename.vol003+04.PAR2, filename.vol007+06.PAR2, etc. The number after the "+" in the filename indicates how many blocks it contains, and the number after "vol" indicates the number of the first recovery block within the PAR2 file. If an index file of a download states that 4 blocks are missing, the easiest way to repair the files would be by downloading filename.vol003+04.PAR2. However, due to the redundancy, filename.vol007+06.PAR2 is also acceptable. There is also an index file filename.PAR2, it is identical in function to the small index file used in PAR1. Par2 specification supports up to 32,768 source blocks and up to 65,535 recovery blocks. Input files are split into multiple equal-sized blocks so that recovery files do not need to be the size of the largest input file. Although Unicode is mentioned in the PAR2 specification as an option, most PAR2 implementations do not support Unicode. Directory support is included in the PAR2 specification, but most or all implementations do not support it. === Par3 === The Par3 specification was originally planned to be published as an enhancement over the Par2 specification. However, to date, it has remained closed source by specification owner Yutaka Sawada. A discussion on a new format started in the GitHub issue section of the maintained fork par2cmdline on January 29, 2019. The discussion led to a new format which is also named as Par3. The new Par3 format's specification is published on GitHub, but remains being an alpha draft as of January 28, 2022. The specification is written by Michael Nahas, the author of Par2 specification, with the help from Yutaka Sawada, animetosho and malaire. The new format claims to have multiple advantages over the Par2 format, including support for: More than 216 files and more than 216 blocks. Packing small files into one block, as well as deduplication when a block appears in multiple files. UTF-8 file names. File permissions, hard links, symbolic/soft links, and empty directories. Embedding PAR data inside other formats, like ZIP archives or ISO disk images. "Incremental backups", where a user creates recovery files for some file or folder, change some data, and create new recovery files reusing some of the older files. More error correction code algorithms (such as LDPC and sparse random matrix). BLAKE3 hashes, dropping support for the MD5 hashes used in PAR2. == Software == === Multi-platform === par2+tbb (GPLv2) — a concurrent (multithreaded) version of par2cmdline 0.4 using TBB. Only compatible with x86 based CPUs. It is available in the FreeBSD Ports system as par2cmdline-tbb. Original par2cmdline — (obsolete). Available in the FreeBSD Ports system as par2cmdline. par2cmdline maintained fork by BlackIkeEagle. par2cmdline-mt is another multithreaded version of par2cmdline using OpenMP, GPLv2, or later. Currently merged into BlackIkeEagle's fork and maintained there. ParPar (CC0) is a high performance, multithreaded PAR2 client and Node.js library. Does not support verifying or repair, it can currently only create PAR2 archives. par2deep (LGPL-3.0) — Produce, verify and repair par2 files recursively, both on the command line as well as with the aid of a graphical user interface. It is available in the Python Package Index system as par2deep. par2cron (MIT License) is an o

    Read more →
  • LMArena

    LMArena

    Arena (formerly LMArena and Chatbot Arena) is a public, web-based platform that evaluates large language models (LLMs). Users enter prompts for two anonymous models to respond to and vote on the model that gave the better response, after which the models' identities are revealed. Users can also choose models to test themselves via the "Direct" selection. Companies which have supplied the company with their large language models include OpenAI, Google DeepMind, and Anthropic. The website has been used for preview releases of upcoming models. Chinese company DeepSeek tested its prototype models in the Arena months before its R1 model gained attention in Western media. Other notable pre-release models include OpenAI's GPT-5 under the codename "summit" and Google DeepMind's Gemini 2.5 Flash Image (an image-generation and editing model) under the codename "Nano Banana". Research has identified specific limitations in Arena's methodology. == History == Chatbot Arena was released on April 24, 2023. In June 2024, Chatbot Arena added image support. In September 2024, Chatbot Arena moved to its own dedicated domain name, lmarena.ai (or LMArena). In April 2025, Meta released Llama 4. Llama 4 Maverick beat GPT-4o and Gemini 2.0 Flash on LMArena, but the version of Maverick on LMArena unfairly differed from the publicly available version. LMArena updated their policies in response. In April 2025, LMArena incorporated as an independent company. That May, LMArena raised $100 million in a seed funding round, valuing the company at $600 million. Participants in the seed funding round included Andreessen Horowitz, UC Investments, Lightspeed Venture Partners, Felicis Ventures, and Kleiner Perkins. On January 6, 2026, LMArena announced the closing of a $150 million Series A funding round, bringing the company’s post-money valuation to approximately $1.7 billion. The round was led by Felicis and UC Investments (University of California), with participation from Andreessen Horowitz, The House Fund, LDVP, Kleiner Perkins, Lightspeed Venture Partners, and Laude Ventures. In January 2026, LMArena added video support. On January 28, 2026, LMArena rebranded to "Arena".

    Read more →
  • Algorithm IMED

    Algorithm IMED

    In multi-armed bandit problems, IMED (for Indexed Minimum Empirical Divergence) is an algorithm developed in 2015 by Junya Honda and Akimichi Takemura. It is the first algorithm proved to be asymptotically optimal respect to the problem-dependant Lai–Robbins lower bound for distributions in ( − ∞ , 1 ] {\displaystyle (-\infty ,1]} . == Multi-armed bandit problem == The Multi-armed bandit problem is a sequential game where one player has to choose at each turn between K {\displaystyle K} actions (arms). Behind every arm a {\displaystyle a} there is an unknown distribution ν a {\displaystyle \nu _{a}} that lies in a set D {\displaystyle {\mathcal {D}}} known by the player (for example, D {\displaystyle {\mathcal {D}}} can be the set of Gaussian distributions or Bernoulli distributions). At each turn t {\displaystyle t} the player chooses (pulls) an arm a t {\displaystyle a_{t}} , he then gets an observation X t {\displaystyle X_{t}} of the distribution ν a t {\displaystyle \nu _{a_{t}}} . === Regret minimization === The goal is to minimize the regret at time T {\displaystyle T} that is defined as R T := ∑ a = 1 K Δ a E [ N a ( T ) ] {\displaystyle R_{T}:=\sum _{a=1}^{K}\Delta _{a}\mathbb {E} [N_{a}(T)]} where μ a := E [ ν a ] {\displaystyle \mu _{a}:=\mathbb {E} [\nu _{a}]} is the mean of arm a {\displaystyle a} μ ∗ := max a μ a {\displaystyle \mu ^{}:=\max _{a}\mu _{a}} is the highest mean Δ a := μ ∗ − μ a {\displaystyle \Delta _{a}:=\mu ^{}-\mu _{a}} N a ( t ) {\displaystyle N_{a}(t)} is the number of pulls of arm a {\displaystyle a} up to turn t {\displaystyle t} The player has to find an algorithm that chooses at each turn t {\displaystyle t} which arm to pull based on the previous actions and observations ( a s , X s ) s < t {\displaystyle (a_{s},X_{s})_{s μ } {\displaystyle {\mathcal {K}}_{inf}(\nu ,\mu ,{\mathcal {D}}):=\inf \left\{\mathrm {KL} (\nu ,{\tilde {\nu }})\ |\ {\tilde {\nu }}\in {\mathcal {P}}([-\infty ,1]),\ \mathbb {E} [{\tilde {\nu }}]>\mu \right\}} K L {\displaystyle \mathrm {KL} } is the Kullback–Leibler divergence P ( [ − ∞ , 1 ] ) {\displaystyle {\mathcal {P}}([-\infty ,1])} is the set of distribution in [ − ∞ , 1 ] {\displaystyle [-\infty ,1]} ν ^ a ( t ) {\displaystyle {\hat {\nu }}_{a}(t)} is the empirical distribution of arm a {\displaystyle a} at turn t {\displaystyle t} μ ^ ∗ ( t ) {\displaystyle {\hat {\mu }}^{}(t)} is the highest empirical mean of turn t {\displaystyle t} Remark : For arms a {\displaystyle a} that verify μ ^ a ( t ) = μ ^ ∗ ( t ) {\displaystyle {\hat {\mu }}_{a}(t)={\hat {\mu }}^{}(t)} we have K i n f ( ν ^ a ( t ) , μ ^ ∗ ( t ) ) = 0 {\displaystyle K_{inf}({\hat {\nu }}_{a}(t),{\hat {\mu }}^{}(t))=0} . Then there index is equal to ln ⁡ ( N a ( t ) ) {\displaystyle \ln(N_{a}(t))} === Pseudocode === for each arm i do: n[i] ← 1; nu[i] ← None; mu[i] ← None for t from 1 to K do: select arm t observe reward r n[t] ← n[t] + 1 nu[t] ← update empirical distribution mu[t] ← update empirical mean for t from K+1 to T do: mu ← highest mu for each arm i do: scoreK[i] ← n[i] K_inf(nu[i],mu) scoreN[i] ← ln(n[i]) index[i] ← scoreK[i] + scoreN[i] select arm a with smallest index[a] observe reward r n[a] ← n[a] + 1 nu[a] ← update empirical distribution mu[a] ← update empirical mean == Theoretical results == In the multi-armed bandit problem we have the asymptotic Lai–Robbins lower bound asymptotic lower bound on regret. The algorithm IMED is the first algorithm that matches this lower bound for distribution in ( − ∞ , 1 ] {\displaystyle (-\infty ,1]} in the first order. If the distribution are also bounded then it also match the second order. It is the first algorithm that match the second under of this lower bound. === Lai–Robbins lower bound === In 1985 Lai and Robbins proved an asymptotic, problem-dependent lower bound on regret. In 2018, Aurelien Garivier, Pierre Menard and Gilles Stoltz proved a refined lower bound that gives the second order It states that for every consistent algorithm on the set P ( [ − ∞ , 1 ] ) {\displaystyle {\mathcal {P}}([-\infty ,1])} — that is, an algorithm for which, for every ( ν 1 , … , ν K ) ∈ P ( [ − ∞ , 1 ] ) K {\displaystyle (\nu _{1},\dots ,\nu _{K})\in {\mathcal {P}}([-\infty ,1])^{K}} , the regret R T {\displaystyle R_{T}} is subpolynomial (i.e. R T = o T → + ∞ ( T α ) {\displaystyle R_{T}=o_{T\to +\infty }(T^{\alpha })} for all α > 0 {\displaystyle \alpha >0} ) — we have: R T ≥ ( ∑ a : μ a < μ ∗ Δ a K inf ( ν a , μ ∗ ) ) ln ⁡ T − Ω T → + ∞ ( ln ⁡ ln ⁡ T ) . {\displaystyle R_{T}\geq \left(\sum _{a:\mu _{a}<\mu ^{}}{\frac {\Delta _{a}}{{\mathcal {K}}_{\inf }(\nu _{a},\mu ^{})}}\right)\ln T-\Omega _{T\to +\infty }(\ln \ln T).} This bound is asymptotic (as T → + ∞ {\displaystyle T\to +\infty } ) and gives a first-order lower bound of order ln ⁡ T {\displaystyle \ln T} with the optimal constant in front of it and the second order in − Ω ( ln ⁡ ln ⁡ T ) {\displaystyle -\Omega (\ln \ln T)} . === Regret bound for IMED === If the distribution of every arm a {\displaystyle a} is ( − ∞ , 1 ] {\displaystyle (-\infty ,1]} ( i.e. ν a ∈ P ( [ − ∞ , 1 ] ) ) {\displaystyle \nu _{a}\in {\mathcal {P}}([-\infty ,1]))} then the regret of the algorithm IMED verify R T ≤ ( ∑ a : μ a < μ ∗ Δ a K inf ( ν a , μ ∗ ) ) ln ⁡ T + O ( 1 ) {\displaystyle R_{T}\leq \left(\sum _{a:\mu _{a}<\mu ^{}}{\frac {\Delta _{a}}{{\mathcal {K}}_{\inf }(\nu _{a},\mu ^{})}}\right)\ln T+O(1)} If all the distribution ν a {\displaystyle \nu _{a}} are bounded then it exists a constant C > 0 {\displaystyle C>0} such that for T {\displaystyle T} large enough the regret of IMED is upper bounded by R T ≤ ( ∑ a : μ a < μ ∗ Δ a K inf ( ν a , μ ∗ ) ) ln ⁡ T − C ln ⁡ ln ⁡ T {\displaystyle R_{T}\leq \left(\sum _{a:\mu _{a}<\mu ^{}}{\frac {\Delta _{a}}{{\mathcal {K}}_{\inf }(\nu _{a},\mu ^{})}}\right)\ln T-C\ln \ln T} == Computation time == The algorithm only requiere to compute the K i n f {\displaystyle K_{inf}} for suboptimal arms who are pulled O ( ln ⁡ T ) {\displaystyle O(\ln T)} times, which make it a lot faster than KL-UCB. A faster version of IMED was developed in 2023 to make it even faster, using a Taylor development of the K i n f {\displaystyle K_{inf}} in the first order .

    Read more →
  • Five safes

    Five safes

    The Five Safes is a framework for helping make decisions about making effective use of data which is confidential or sensitive. It is mainly used to describe or design research access to statistical data held by government and health agencies, and by data archives such as the UK Data Service. It is not an internationally accepted standard. Two of the Five Safes refer to statistical disclosure control, and so the Five Safes is usually used to contrast statistical and non-statistical controls when comparing data management options. == Concept == The Five Safes proposes that data management decisions be considered as solving problems in five 'dimensions': projects, people, settings, data and outputs. The combination of the controls leads to 'safe use'. These are most commonly expressed as questions, for example: These dimensions are scales, not limits. That is, solutions can have a mix of more or fewer controls in each dimension, but the overall aim of 'safe use' independent of the particular mix. For example, a public use file available for open download cannot control who uses it, where or for what purpose, and so all the control (protection) must be in the data itself. In contrast, a file which is only accessed through a secure environment with certified users can contain very sensitive information: the non-statistical controls allow the data to be 'unsafe'. One academic likened the process to a graphic equalizer, where bass and treble can be combined independently to produce a sound the listener likes, which has proven to be a very useful metaphor. This 2023 Data Foundation webinar is an expert discussion of how the elements interact, including an excellent introductory representation. There is no 'order' to the Five Safes, in that one is necessarily more important than the others. However, Ritchie argued that the 'managerial' controls (projects, people, setting) should be addressed before the 'statistical' controls (data, output). The Five Safes concept is associated with other topics which developed from the same programme at ONS, although these are not necessarily implemented. Safe people is associated with 'active researcher management', while safe outputs is linked with principles-based output statistical disclosure control. The Five Safes is a positive framework, describing what is and is not. The EDRU ('evidence-based, default-open, risk-managed, user-centred') attitudinal model is sometimes used to give a normative context == The 'data access spectrum' == From 2003 the Five Safes was also represented in a simpler form as a 'Data Access Spectrum'. The non-data controls (project, people, setting, outputs) tend to work together, in that organisations often see these as a complementary set of restrictions on access. These can then be contrasted with choices about data anonymisation to present a linear representation of data access options. This presentation is consistent with the idea of 'data as a residual', as well as data protection laws of the time which often characterised data simply as anonymous or not anonymous. A similar idea had already been developed independently in 2001 by Chuck Humphrey of the Canadian RDC network, the 'continuum of access'. More recently, The Open Data Institute has developed a 'Data Spectrum toolkit' which includes industry-specific examples. == History and terminology == The Five Safes was devised in the winter of 2002/2003 by Felix Ritchie at the UK Office for National Statistics (ONS) to describe its secure remote-access Virtual Microdata Laboratory (VML). It was described at this time as the 'VML Security Model'. This was adopted by the NORC data enclave, and more widely in the US, as the 'portfolio model' (although this is now also used to refer to a slightly different legal/statistical/educational breakdown). In 2012 the framework as was still being referred to as the 'VML security model', but its increasing use among non-UK organisations led to the adoption of the more general and informative phrase 'Five Safes'. The original framework only had four safes (projects, people, settings and outputs): the framework was used to describe highly detailed data access through a secure environment, and so the 'data' dimension was irrelevant. From 2007 onwards, 'safe data' was included as the framework was used to a describe a wider range of ONS activities. As the US version was based upon the 2005 specification, some US iterations uses have the original four dimensions (eg). Some discussions, such as the OECD, use the term 'secure' instead 'safe'. However, the use of both these terms can cause presentational problems: less control in a particular dimension could be seen to imply 'unsafe users' or 'insecure settings', for example, which distracts from the main message. Hence, the Australian government uses the term "five data sharing principles". The 'Anonymisation Decision-Making Framework' uses a framework based on the Five Safes but relabelling "projects", "people", and "settings" as "governance", "agency" and "infrastructure", respectively; "Output" is omitted, and "safe use" becomes "functional anonymisation". There is no reference to the Five Safes or any associated literature. The Australian version was required to include references to the Five Safes, and presented it as an alternative without comment. == Application == The framework has had three uses: pedagogical, descriptive, and design. Since 2016, it has also been used, directly and indirectly in legislation. See for more detailed examples. === Pedagogy === The first significant use of the framework, other than internal administrative use, was to structure researcher training courses at the UK Office for National Statistics from 2003. UK Data Archive, Administrative Data Research Network, Eurostat, Statistics New Zealand, the Mexican National Institute of Statistics and Geography, NORC, Statistics Canada and the Australian Bureau of Statistics, amongst others, have also used this framework. Most of these courses are for researchers using restricted-access facilities; the Eurostat courses are unusual in that they are designed for all users of sensitive data. === Description === The framework is often used to describe existing data access solutions (e.g. UK HMRC Data Lab, UK Data Service, Statistics New Zealand) or planned/conceptualised ones (e.g. Eurostat in 2011). An early use was to help identify areas where ONS' still had 'irreducible risks' in its provision of secure remote access. The framework is mostly used for confidential social science data. To date it appears to have made little impact on medical research planning, although it is now included in the revised guidelines on implementing HIPAA regulations in the US, and by Cancer Research UK and the Health Foundation in the UK. It has also been used to describe a security model for the Scottish Health Informatics Programme. === Design === In general the Five Safes has been used to describe solutions post-factum, and to explain/justify choices made, but an increasing number of organisations have used the framework to design data access solutions. For example, the Hellenic Statistical Agency developed a data strategy built around the Five Safes in 2016; the UK Health Foundation used the Five Safes to design its data management and training programmes. Use in the private sector is less common but some organisations have incorporated the Five Safes into consulting services. In 2015 the UK Data Service organized a workshop to encourage data users from the academic and private sectors to think about how to manage confidential research data, using the Five Safes to demonstrate alternative options and best practice. Early adopters for strategic design use were in Australia: both the Australian Bureau of Statistics and the Australian Department of Social Service used the Five Safes as an ex ante design tool. In 2017 the Australian Productivity Commission recommended adopting a version of the framework to support cross-government data sharing and re-use. This underwent extensive consultation and culminated in the DAT Act 2022. Since 2020 the Five Safes has been the overriding framework for the design of new secure facilities and data sharing arrangements in the UK for public health and social sciences. This has been promoted by the Office for Statistics Regulation, the UK Statistics Authority, NHS DIgital, and the research funding bodies Administrative Data Research UK and DARE UK. === Regulation and legislation === Three laws have incorporated the Fives Safes. They are explicit in the South Australian Public Sector (Data Sharing) Act 2016, and implicit in the research provisions of the UK Digital Economy Act 2017. The Australian Data Availability and Transparency Act 2022 renames the Five Safes as the Five Data Sharing Principles.A 2025 statutory review of the DAT Act 2022 found "that the DAT Act has not been effective in achieving its objectives.". The review includes specific referen

    Read more →
  • List of information schools

    List of information schools

    This list of information schools, sometimes abbreviated to iSchools, includes members of the iSchools organization. The iSchools organization reflects a consortium of over 130 information schools across the globe. == History == The first iSchools Caucus was formed in 1988 by Syracuse, Pittsburgh, and Drexel and was called the Gang of Three (sometimes gang of four with Rutgers). Syracuse renamed the School of Library Science as the School of Information Studies in 1974, and is considered as the first “iSchool” in history. The group was formally named "the iSchools Caucus" or more casually, the iCaucus. By 2003, the group expanded to include the Universities of Michigan, Washington, Illinois, UNC, Florida State, Indiana, and Texas, and was called the Gang of Ten. The current iSchools Caucus organization was formalized by 2005, with additions of UC Berkeley, UC Irvine, UCLA, Penn State, Georgia Tech, Maryland, Toronto, Carnegie Mellon and Singapore Management University. == iSchools organization == The iSchools promote an interdisciplinary approach to understanding the opportunities and challenges of information management, with a core commitment to concepts like universal access and user-centered organization of information. The field is concerned broadly with questions of design and preservation across information spaces, from digital and virtual spaces such as online communities, social networking, the World Wide Web, and databases to physical spaces such as libraries, museums, collections, and other repositories. "School of Information", "Department of Information Studies", or "Information Department" are often the names of the participating organizations. Degree programs at iSchools include course offerings in areas such as information architecture, design, policy, and economics; knowledge management, user experience design, and usability; preservation and conservation; librarianship and library administration; the sociology of information; and human-computer interaction and computer science. === Leadership === The executive committee of the iSchools is made up of the current chair (Ina Fourie, University of Pretoria, South Africa), past chair (Gillian Oliver, Monash University, Australia) and the chair elect (Javed Mostafa, University of Toronto Canada), plus representatives from the three regions (North America, Europe, and Asia-Pacific). The current executive director is Slava Sterzer. == Member institutions == Between 2010 and 2026, the organization expanded globally beyond North America, growing to 133 member schools as of March 2026. For an updated and complete list of member schools, please visit the member database of the iSchools. == iConferences == Members of the iSchools organize a regular academic conference, known as the iConference, hosted by a different member institution each year. September 2005: Pennsylvania State University October 2006: University of Michigan February 2008: University of California, Los Angeles February 2009: University of North Carolina February 2010: University of Illinois at Urbana-Champaign February 2011: University of Washington, Seattle February 2012: University of Toronto February 2013: University of North Texas March 2014: Humboldt-Universität zu Berlin March 2015: University of California, Irvine March 2016: Drexel University March 2017: Wuhan University March 2018: University of Sheffield and Northumbria University March 2019: University of Maryland March 2020: University of Borås (virtual only) March 2021: Renmin University of China (virtual only) February/March 2022: University of Texas at Austin, University College Dublin & Kyushu University (virtual only) March 2023: Universitat Oberta de Catalunya March 2024: Jilin University March 2025: Indiana University March/April 2026: Edinburgh Napier University 2027: Victoria University of Wellington == Other schools of information == Other information schools and programs include: Documentation Research and Training Centre, Indian Statistical Institute, Bangalore San Jose State University, School of Information University of Southern California Library Science Degree Ankara University, Department of Information and Records Management, Ankara/Turkey Marmara University, Department of Information and Records Management, Istanbul/Turkey University of Kelaniya, Department of Library and Information Science, Kelaniya/Sri Lanka University of Colombo, National Institute of Library and Information Science (NILIS), Colombo/Sri Lanka Chicago State University, Department of Information Studies

    Read more →
  • Outline of robotics

    Outline of robotics

    The following outline is provided as an overview of and topical guide to robotics: Robotics is a branch of mechanical engineering, electrical engineering and computer science that deals with the design, construction, operation, and application of robots, as well as computer systems for their control, sensory feedback, and information processing. These technologies deal with automated machines that can take the place of humans in dangerous environments or manufacturing processes, or resemble humans in appearance, behaviour, and or cognition. Many of today's robots are inspired by nature contributing to the field of bio-inspired robotics. The word "robot" was introduced to the public by Czech writer Karel Čapek in his play R.U.R. (Rossum's Universal Robots), published in 1920. The term "robotics" was coined by Isaac Asimov in his 1941 science fiction short-story "Liar!" == Nature of robotics == Robotics can be described as: An applied science – scientific knowledge transferred into a physical environment. A branch of computer science – A branch of electrical engineering – A branch of mechanical engineering – Research and development – A branch of technology – == Branches of robotics == Adaptive control – control method used by a controller which must adapt to a controlled system with parameters which vary, or are initially uncertain. For example, as an aircraft flies, its mass will slowly decrease as a result of fuel consumption; a control law is needed that adapts itself to such changing conditions. Aerial robotics – development of unmanned aerial vehicles (UAVs), commonly known as drones, aircraft without a human pilot aboard. Their flight is controlled either autonomously by onboard computers or by the remote control of a pilot on the ground or in another vehicle. Android science – interdisciplinary framework for studying human interaction and cognition based on the premise that a very humanlike robot (that is, an android) can elicit human-directed social responses in human beings. Anthrobotics – science of developing and studying robots that are either entirely or in some way human-like. Artificial intelligence – the intelligence of machines and the branch of computer science that aims to create it. Artificial neural networks – a mathematical model inspired by biological neural networks. Autonomous car – an autonomous vehicle capable of fulfilling the human transportation capabilities of a traditional car Autonomous research robotics – Bayesian network – BEAM robotics – a style of robotics that primarily uses simple analogue circuits instead of a microprocessor in order to produce an unusually simple design (in comparison to traditional mobile robots) that trades flexibility for robustness and efficiency in performing the task for which it was designed. Behavior-based robotics – the branch of robotics that incorporates modular or behavior based AI (BBAI). Bio-inspired robotics – making robots that are inspired by biological systems. Biomimicry and bio-inspired design are sometimes confused. Biomimicry is copying the nature while bio-inspired design is learning from nature and making a mechanism that is simpler and more effective than the system observed in nature. Biomimetic – see Bionics. Biomorphic robotics – a sub-discipline of robotics focused upon emulating the mechanics, sensor systems, computing structures and methodologies used by animals. Bionics – also known as biomimetics, biognosis, biomimicry, or bionical creativity engineering is the application of biological methods and systems found in nature to the study and design of engineering systems and modern technology. Biorobotics – a study of how to make robots that emulate or simulate living biological organisms mechanically or even chemically. Cloud robotics – is a field of robotics that attempts to invoke cloud technologies such as cloud computing, cloud storage, and other Internet technologies centered around the benefits of converged infrastructure and shared services for robotics. Cognitive robotics – views animal cognition as a starting point for the development of robotic information processing, as opposed to more traditional Artificial Intelligence techniques. Clustering – Computational neuroscience – study of brain function in terms of the information processing properties of the structures that make up the nervous system. Robot control – a study of controlling robots Robotics conventions – Data mining Techniques – Degrees of freedom – in mechanics, the degree of freedom (DOF) of a mechanical system is the number of independent parameters that define its configuration. It is the number of parameters that determine the state of a physical system and is important to the analysis of systems of bodies in mechanical engineering, aeronautical engineering, robotics, and structural engineering. Developmental robotics – a methodology that uses metaphors from neural development and developmental psychology to develop the mind for autonomous robots Digital control – a branch of control theory that uses digital computers to act as system controllers. Digital image processing – the use of computer algorithms to perform image processing on digital images. Dimensionality reduction – the process of reducing the number of random variables under consideration, and can be divided into feature selection and feature extraction. Distributed robotics – Electronic stability control – is a computerized technology that improves the safety of a vehicle's stability by detecting and reducing loss of traction (skidding). Evolutionary computation – Evolutionary robotics – a methodology that uses evolutionary computation to develop controllers for autonomous robots Extended Kalman filter – Flexible Distribution functions – Feedback control and regulation – Human–computer interaction – a study, planning and design of the interaction between people (users) and computers Human robot interaction – a study of interactions between humans and robots Intelligent vehicle technologies – comprise electronic, electromechanical, and electromagnetic devices - usually silicon micromachined components operating in conjunction with computer controlled devices and radio transceivers to provide precision repeatability functions (such as in robotics artificial intelligence systems) emergency warning validation performance reconstruction. Computer vision – Machine vision – Kinematics – study of motion, as applied to robots. This includes both the design of linkages to perform motion, their power, control and stability; also their planning, such as choosing a sequence of movements to achieve a broader task. Laboratory robotics – the act of using robots in biology or chemistry labs Robot learning – learning to perform tasks such as obstacle avoidance, control and various other motion-related tasks Direct manipulation interface – In computer science, direct manipulation is a human–computer interaction style which involves continuous representation of objects of interest and rapid, reversible, and incremental actions and feedback. The intention is to allow a user to directly manipulate objects presented to them, using actions that correspond at least loosely to the physical world. Manifold learning – Microrobotics – a field of miniature robotics, in particular mobile robots with characteristic dimensions less than 1 mm Motion planning – (a.k.a., the "navigation problem", the "piano mover's problem") is a term used in robotics for the process of detailing a task into discrete motions. Motor control – information processing related activities carried out by the central nervous system that organize the musculoskeletal system to create coordinated movements and skilled actions. Nanorobotics – the emerging technology field creating machines or robots whose components are at or close to the scale of a nanometer (10−9 meters). Passive dynamics – refers to the dynamical behavior of actuators, robots, or organisms when not drawing energy from a supply (e.g., batteries, fuel, ATP). Programming by Demonstration – an End-user development technique for teaching a computer or a robot new behaviors by demonstrating the task to transfer directly instead of programming it through machine commands. Quantum robotics – a subfield of robotics that deals with using quantum computers to run robotics algorithms more quickly than digital computers can. Rapid prototyping – automatic construction of physical objects via additive manufacturing from virtual models in computer aided design (CAD) software, transforming them into thin, virtual, horizontal cross-sections and then producing successive layers until the items are complete. As of June 2011, used for making models, prototype parts, and production-quality parts in relatively small numbers. Reinforcement learning – an area of machine learning in computer science, concerned with how an agent ought to take actions in an environment so as to maximize some notion of cumulative reward. Robot

    Read more →
  • Known-item search

    Known-item search

    Known-item search is a specialization of information exploration which represents the activities carried out by searchers who have a particular item in mind. In the context of library catalogs, known‐item search means a search for an item for which the author or title is known. Although the concept of known-item search originated in library science, it is now applied in the context of web search and other online search activities. Known-item search is distinguished from exploratory search, in which a searcher is unfamiliar with the domain of their search goal, unsure about the ways to achieve their goal, and/or unsure about what their goal is.

    Read more →
  • Algorithmic Puzzles

    Algorithmic Puzzles

    Algorithmic Puzzles is a book of puzzles based on computational thinking. It was written by computer scientists Anany and Maria Levitin, and published in 2011 by Oxford University Press. == Topics == The book begins with a "tutorial" introducing classical algorithm design techniques including backtracking, divide-and-conquer algorithms, and dynamic programming, methods for the analysis of algorithms, and their application in example puzzles. The puzzles themselves are grouped into three sets of 50 puzzles, in increasing order of difficulty. A final two chapters provide brief hints and more detailed solutions to the puzzles, with the solutions forming the majority of pages of the book. Some of the puzzles are well known classics, some are variations of known puzzles making them more algorithmic, and some are new. They include: Puzzles involving chessboards, including the eight queens puzzle, knight's tours, and the mutilated chessboard problem Balance puzzles River crossing puzzles The Tower of Hanoi Finding the missing element in a data stream The geometric median problem for Manhattan distance == Audience and reception == The puzzles in the book cover a wide range of difficulty, and in general do not require more than a high school level of mathematical background. William Gasarch notes that grouping the puzzles only by their difficulty and not by their themes is actually an advantage, as it provides readers with fewer clues about their solutions. Reviewer Narayanan Narayanan recommends the book to any puzzle aficionado, or to anyone who wants to develop their powers of algorithmic thinking. Reviewer Martin Griffiths suggests another group of readers, schoolteachers and university instructors in search of examples to illustrate the power of algorithmic thinking. Gasarch recommends the book to any computer scientist, evaluating it as "a delight".

    Read more →