AI Analytics Certification

AI Analytics Certification — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Dark data

    Dark data

    Dark data is data which is acquired through various computer network operations but not used in any manner to derive insights or for decision making. The ability of an organisation to collect data can exceed the throughput at which it can analyse the data. In some cases the organisation may not even be aware that the data is being collected. IBM estimate that roughly 90 percent of data generated by sensors and analog-to-digital conversions never get used. In an industrial context, dark data can include information gathered by sensors and telematics. Organizations retain dark data for a multitude of reasons, and it is estimated that most companies are only analyzing 1% of their data. Often it is stored for regulatory compliance and record keeping. Some organizations believe that dark data could be useful to them in the future, once they have acquired better analytic and business intelligence technology to process the information. Because storage is inexpensive, storing data is easy. However, storing and securing the data usually entails greater expenses (or even risk) than the potential return profit. In academic discourse, the term dark data was essentially coined by Bryan P. Heidorn. He uses it to describe research data, especially from the long tail of science (the many, small research projects), which are not or no longer available for research because they disappear in a drawer without adequate data management. Without this, the data become dark, and further reasons for this are e.g. missing metadata annotation, missing data management plans and data curators. == Analysis == The term "dark data" very often refers to data that is not amenable to computer processing. For example, a company might have a great deal of data that exists only as scanned page-images. Even the bare text in such documents is not available without something like Optical character recognition, which can vary greatly in accuracy. Even with OCR, the significance of each part of the data is unavailable. An obvious examples is whether a capitalized word is a name or not, and if so, whether it represents a person, place, organization, or even a work of art. Bibliographic and other references, data within tables (that may be labeled quite adequately for humans, but not for processing), and countless assertions represented with the full complexity and ambiguity of human language. A lot of unused data is very valuable, and would be used if it could be; but is blocked because it is in formats that are difficult to process, categorise, identify, and analyse. Often the reason that business does not use their dark data is because of the amount of resources it would take and the difficulty of having that data analysed. In other words, the data is "dark" not because it is not used, but because it cannot (feasibly or affordably) be used, given its poor representation. There are many data representations that can make data much more accessible for automation. However, a great deal of information lacks any such identification of information items or relationships; and much more loses it during "downhill" conversion such as saving to page-oriented representations, printing, scanning, or faxing. The journey back "uphill" can be costly. According to Computer Weekly, 60% of organisations believe that their own business intelligence reporting capability is "inadequate" and 65% say that they have "somewhat disorganised content management approaches". == Relevance == Useful data may become dark data after it becomes irrelevant, as it is not processed fast enough. This is called "perishable insights" in "live flowing data". For example, if the geolocation of a customer is known to a business, the business can make offer based on the location, however if this data is not processed immediately, it may be irrelevant in the future. According to IBM, about 60 percent of data loses its value immediately. == Storage == According to the New York Times, 90% of energy used by data centres is wasted. If data was not stored, energy costs could be saved. Furthermore, there are costs associated with the underutilisation of information and thus missed opportunities. According to Datamation, "the storage environments of EMEA organizations consist of 54 percent dark data, 32 percent redundant, obsolete and trivial data and 14 percent business-critical data. By 2020, this can add up to $891 billion in storage and management costs that can otherwise be avoided." The continuous storage of dark data can put an organisation at risk, especially if this data is sensitive. In the case of a breach, this can result in serious repercussions. These can be financial, legal and can seriously hurt an organisation's reputation. For example, a breach of private records of customers could result in the stealing of sensitive information, which could result in identity theft. Another example could be the breach of the company's own sensitive information, for example relating to research and development. These risks can be mitigated by assessing and auditing whether this data is useful to the organisation, employing strong encryption and security and finally, if it is determined to be discarded, then it should be discarded in a way that it becomes unretrievable. == Future == It is generally considered that as more advanced computing systems for analysis of data are built, the higher the value of dark data will be. It has been noted that "data and analytics will be the foundation of the modern industrial revolution". Of course, this includes data that is currently considered "dark data" since there are not enough resources to process it. All this data that is being collected can be used in the future to bring maximum productivity and an ability for organisations to meet consumers' demand. Technology advancements are helping to leverage this dark data affordably. Furthermore, many organisations do not realise the value of dark data right now, for example in healthcare and education organisations deal with large amounts of data that could create a significant "potential to service students and patients in the manner in which the consumer and financial services pursue their target population".

    Read more →
  • Optical recording

    Optical recording

    The history of optical recording can be divided into a few number of distinct major contributions. The pioneers of optical recording worked mostly independently, and their solutions to the many technical challenges have very distinctive features, such as reflective disc (Compaan and Kramer) transparent disc (Gregg) floppy disc (Russell) rigid disc (Compaan and Kramer) focused laser beam for read-out through transparent substrate (Compaan and Kramer). == Gregg 1958 == Laserdisc technology, using a transparent disc, was invented by David Paul Gregg in 1958 (and patented in 1970 and 1990). By 1969 Philips had developed a videodisc in reflective mode, which has great advantages over the transparent mode. MCA and Philips decided to join their efforts. They first publicly demonstrated the videodisc in 1972. Laserdisc was first available on the market, in Atlanta, on December 15, 1978, two years after the VHS VCR and four years before the CD, which is based on Laserdisc technology. Philips produced the players and MCA produced the discs. The Philips/MCA cooperation was not successful, and discontinued after a few years. Several of the scientists responsible for the early research (John Winslow, Richard Wilkinson and Ray Dakin) founded Optical Disc Corporation (now ODC Nimbus). == Russell 1965 == While working at Pacific Northwest National Laboratory, James Russell invented an optical storage system for digital audio and video, patenting the concept in 1970. The earliest patents by Russell, US 3,501,586, and 3,795,902 were filed in 1966, and 1969. respectively. He built prototypes, and the first was operating in 1973. Russell had found a way to record digital information onto a photosensitive plate in tiny dark spots, each spot one micrometre from centre to centre, with a laser that wrote the binary patterns. Russell's first optical disc was distinctly different from the eventual compact disc product: the disc in the player was not read by laser light. A key characteristic of Russell's invention is that a laser is not used for the reading the disc, instead the entire disc or oblong sheet to be read is illuminated by a large playback light source at the back of the transparent foil. As a result, the information density is relatively low. By 1985, Russell held over 25 patents to various technologies related to optical recording and playback. Russell's intellectual property was purchased by Optical Recording Corporation (ORC) in Toronto in 1985, and this firm notified a number of CD manufacturers that their CD technology was based on patents held by ORC. In 1987, ORC signed an agreement with Sony whereby Sony paid for licensing of the technology. Further licenses followed from Philips and others. Warner Communications did not sign, and was sued by ORC. In 1992, the large CD manufacturer, now called Time Warner, was ordered to pay ORC US$30 million in patent violations. In the 1970 patent, the spot diameter was around 10 micrometres. Thus, the areal information density was around a factor hundred less than that of the CD as later developed. Russell continued to refine the concept throughout the 1970s. Philips and Sony, however, were able to put far greater resources into the parallel development of the concept, arriving at a smaller and more sophisticated product in just a few years. Russell's various partners and ventures failed to produce a single consumer product. == Korpel 1968 == Adrianus Korpel worked for the Zenith Electronics Corporation, when he developed very early optical videodisc systems, including holographic storage. == Kramer and Compaan 1969 == The Philips development of the videodisc technology began in 1969 with efforts by Dutch physicists Klaas Compaan and Piet Kramer to record video images in holographic form on disc. Their prototype Laserdisc shown in 1972 used a laser beam in reflective mode to read a track of pits using an FM video signal. Together with MCA, Philips brought the optical videodisk to market in 1978. The cooperation between Philips and MCA did not last long, and discontinued after a few years. == Immink and Doi 1979 == The Compact Disc (CD), which is based on MCA/Philips Laserdisc technology, was developed by a taskforce of Sony and Philips in 1979–1980. Toshi Doi and Kees Schouhamer Immink created the digital technologies that turned the analog Laserdisc into a high-density low-cost digital audio disc. The CD, available on the market since October 1982, remains the standard physical medium for sale of commercial audio recordings Standard CDs have a diameter of 120 mm and can hold up to 80 minutes of audio (700 MB of data). The Mini CD has various diameters ranging from 60 to 80 mm; they are sometimes used for CD singles or device drivers, storing up to 24 minutes of audio. The technology was later adapted and expanded to include data storage CD-ROM, write-once audio and data storage CD-R, rewritable media CD-RW, Super Audio CD (SACD), Video Compact Discs (VCD), Super Video Compact Discs (SVCD), PhotoCD, PictureCD, CD-i, and Enhanced CD. CD-ROMs and CD-Rs remain widely used technologies in the computer industry. The CD and its extensions have been extremely successful: in 2004, worldwide sales of CD audio, CD-ROM, and CD-R reached about 30 billion discs. By 2007, 200 billion CDs had been sold worldwide.

    Read more →
  • Audience capture

    Audience capture

    Audience capture is the phenomenon where an influencer is affected by their audience, catering to it with what they believe it wants to hear or is willing to pay for. This creates a positive feedback loop, which can lead the influencer to express more extreme views and behaviors. A famous example of audience capture can be found in the story of the online influencer Nicholas Perry, known as Nikocado Avocado. Perry started off on YouTube with videos of himself playing the violin and supporting veganism. He then shifted to videos of himself eating known as mukbang. Audience capture led him to more and more extreme eating leading him in turn to obesity and poor health. The effect can cause ideological media creators to become more politically radical, based on the feedback of their audience.

    Read more →
  • Electronic game

    Electronic game

    An electronic game is a game that uses electronics to create an interactive system with which a player can play. Video games are the most common form today, and for this reason the two terms are often used interchangeably. There are other common forms of electronic games, including handheld electronic games, standalone arcade game systems (e.g. pinball, slot machines), and exclusively non-visual products (e.g. audio games). == Arcade games == === Arcade video games === Electronic video arcade games make extensive use of solid state electronics and integrated circuits. In the past coin-operated arcade video games generally used custom per-game hardware often with multiple CPUs, highly specialized sound and graphics chips and/or boards, and the latest in computer graphics display technology. Recent arcade game hardware is often based on modified video game console hardware or high end pc components. Arcade games may feature specialized ambiance or control accessories, including fully enclosed dynamic cabinets with force feedback controls, dedicated lightguns, rear-projection displays, reproductions of car or plane cockpits and even motorcycle or horse-shaped controllers, or even highly dedicated controllers such as dancing mats and fishing rods. These accessories are usually what set modern arcade games apart from PC or console games, and they provide an experience that some gamers consider more immersive and realistic. Examples of arcade video games include: Galaxy Game (1971) Pong (1972) Space Invaders (1978) Galaxian (1979) Pac-Man (1980) Battlezone (1980) Donkey Kong (1981) Street Fighter II (1991) Mortal Kombat (1992) Fatal Fury (1992) Killer Instinct (1994) King of Fighters (1994–2005) Time Crisis (1995) Dance Dance Revolution (1998) DrumMania (1999) House of the Dead (1998) === Pinball and pachinko machines === Since the introduction of electromechanics to the pinball machine in 1933's Contact, pinball has become increasingly dependent on electronics as a means to keep score on the backglass and to provide quick impulses on the playfield (as with bumpers and flippers) for exciting gameplay. Unlike games with electronic visual displays, pinball has retained a physical display that is viewed on a table through glass. Similar games such as pachinko have also become increasingly dependent on electronics in modern times. Examples of pinball games include: The Addams Family (1991) Indiana Jones: The Pinball Adventure (1993) Star Trek: The Next Generation (1993) List of pinball machines === Redemption games and merchandisers === Redemption games such as Skee-Ball have been around since the days of the carnival game - well earlier than the development of the electronic game, however with modern advances many of these games have been re-worked to employ electronic scoring and other game mechanics. The use of electronic scoring mechanisms has allowed carnival or arcade attendants to take a more passive role, simply exchanging prizes for electronically dispensed coupons and occasionally emptying out the coin boxes or banknote acceptors of the more popular games. Merchandisers such as the Claw Crane are more recent electronic games in which the player must accomplish a seemingly simple task (e.g. remotely controlling a mechanical arm) with sufficient ability to earn a reward. Examples of redemption games include: Whac-A-Mole (1976) Skee-Ball - modern electric versions Examples of merchandisers include: Claw crane (1980) === Slot machines === The slot machine is a casino gambling machine with three or more reels which spin when a button is pushed. Though slot machines were originally operated mechanically by a lever on the side of the machine (the one arm) instead of an electronic button on the front panel as used on today's models, many modern machines still have a "legacy lever" in addition to the button on the front. Slot machines include a currency detector that validates the coin or money inserted to play. The machine pays off based on patterns of symbols visible on the front of the machine when it stops. Modern computer technology has resulted in many variations on the slot machine concept. == Audio games == An audio game is a game played on an electronic device such as—but not limited to—a personal computer. It is similar to a video game save that the only feedback device is audible rather than visual. Audio games originally started out as 'blind accessible'-games, but recent interest in audio games has come from sound artists, game accessibility researchers, mobile game developers, and mainstream video gamers. Most audio games run on a computer platform, although there are a few audio games for handhelds and video game consoles. Audio games feature the same variety of genres as video games, such as adventure games, racing games, etc. Examples of audio games include: Real Sound: Kaze no Regret (1997) Chillingham (2004) BBBeat (2005) === Tabletop games === A tabletop audio game is an audio game that is designed to be played on a table rather than a handheld game. Examples of tabletop audio games include: Brain Shift (1998) Who Wants to be a Millionaire? (2000) Electronic Battleship (1977) (Milton Bradley) Electronic battleship is a portable game with the objective of marking all enemy ships. When an enemy ship is marked, an electronic battleship makes an explosion sound. Milton Bradley created the Electronic battleship game in 1977 and was later acquired by Hasbro in 1984. Modern day electronic battleship features an interactive missile launching platform and advanced mode that features custom special attack pegs. Tabletop non-audio games include: Electronic Chess Boards (DGT) DGT is a line of electronic chess boards that are commonly used in FIDE chess tournaments and national tournaments such as USCF. Electronic Chess boards can be used to broadcast games live. == Electronic handhelds == The earliest form of dedicated console, handheld electronic games are characterized by their size and portability. Used to play interactive games, handheld electronic games are often miniaturized versions of video games. The controls, display and speakers are all part of a single unit, and rather than a general-purpose screen made up of a grid of small pixels, they usually have custom displays designed to play one game. This simplicity means they can be made as small as a digital watch, which they sometimes are. The visual output of these games can range from a few small light bulbs or LED lights to calculator-like alphanumerical screens; later these were mostly displaced by liquid crystal and Vacuum fluorescent display screens with detailed images and in the case of VFD games, color. Handhelds were at their most popular from the late 1970s into the early 1990s. They are both the predecessors to and inexpensive alternatives to the handheld game console. Examples of handheld electronic games include: Mattel Auto Race (1976) Simon (1978) Merlin (1978) Game & Watch (1980) MB Omni (1980) Bandai LCD Solarpower (1982) Entex Adventure Vision (1982) Lights Out (1995) == Home video games == A video game is a game that involves interaction with a user interface to generate visual feedback on a video device. The word video in video game traditionally referred to a raster display device. However, with the popular use of the term "video game", it now implies any type of display device. Term "digital game" has been offered by some in academia as an alternative term. === Computer games === A personal computer video game (also known as a computer game or simply PC game) is a video game played on a personal computer. This is opposed to video game consoles or arcade machines, which are not considered personal computers. Computer games became a form of video games, and since the earliest days of the medium, visual displays such as the cathode-ray tube have been used to relay game information. === Console games === A console game is a form of interactive multimedia used for entertainment. The game consists of manipulable images (and usually sounds) generated by a video game console, and displayed on a television or similar audio-video system. The game itself is usually controlled and manipulated using a handheld device connected to the console called a controller. The controller generally contains a number of buttons and directional controls (such as analog joysticks) each of which has been assigned a purpose for interacting with and controlling the images on the screen. The display, speakers, console, and controls of a console can also be incorporated into one small object known as a handheld game console. Console games are most frequently differentiated between by their compatibility with consoles belonging in the following categories: Traditional console, also called "home console" - A multi-game system that uses the screen of a television to produce graphics. Handheld game console - A multi-game system the screen and controls of which are compacted into a singl

    Read more →
  • Web application firewall

    Web application firewall

    A Web application firewall (WAF) is a specific form of application firewall that filters, monitors, and blocks HTTP traffic to and from a web service. By inspecting HTTP traffic, it can prevent attacks exploiting a Web application's known vulnerabilities, such as SQL injection, cross-site scripting (XSS), file inclusion, and improper system configuration. Financial institutions often utilize WAFs to help in the mitigation of Web application zero-day vulnerabilities, as well as hard-to-patch bugs or weaknesses through custom attack signature strings. == History == Dedicated Web application firewalls entered the market in the late 1990s during a time when web server attacks were becoming more prevalent. Early WAF products, from Kavado and Gilian technologies, tried to solve the increasing amount of attacks on Web applications in the late 1990s. In 2002, the open-source project ModSecurity was formed in order to make WAF technology more accessible. They finalized a core rule set for protecting Web applications, based on OASIS Web Application Security Technical Committee’s (WAS TC) vulnerability work. In 2003, they expanded and standardized rules through the Open Web Application Security Project’s (OWASP) Top 10 List, an annual ranking for Web security vulnerabilities. This list would become the industry standard for Web application security compliance. Since then, the market has continued to grow and evolve, especially focusing on credit card fraud prevention. With the development of the Payment Card Industry Data Security Standard (PCI DSS), a standardization of control over cardholder data, security has become more regulated in this sector. == Description == A Web application firewall is a special type of application firewall that applies specifically to Web applications. It is deployed in front of Web applications and analyzes bi-directional web-based (HTTP) traffic – detecting and blocking anything malicious. The OWASP provides a broad technical definition for a WAF as “a security solution on the Web application level which – from a technical point of view – does not depend on the application itself”. According to the PCI DSS Information Supplement for requirement 6.6, a WAF is defined as “a security policy enforcement point positioned between a Web application and the client endpoint. This functionality can be implemented in software or hardware, running in an appliance device, or in a typical server running a common operating system. It may be a stand-alone device or integrated into other network components.” In other words, a WAF can be a virtual or physical appliance that prevents vulnerabilities in Web applications from being exploited by outside threats. These vulnerabilities may be because the application itself is a legacy type or was insufficiently coded by design. The WAF addresses these code shortcomings by special configurations of rule-sets, also known as policies. Previously unknown vulnerabilities can be discovered through penetration testing or via a vulnerability scanner. A Web application vulnerability scanner, also known as a web application security scanner, is defined in the SAMATE NIST 500-269 as “an automated program that examines Web applications for potential security vulnerabilities. In addition to searching for Web application-specific vulnerabilities, the tools also look for software coding errors.” Resolving vulnerabilities is commonly referred to as remediation. Corrections to the code can be made in the application, but typically a more prompt response is necessary. In these situations, the application of a custom policy for a unique Web application vulnerability to provide a temporary but immediate fix (known as a virtual patch) may be necessary. WAFs are not an ultimate security solution, rather they are meant to be used in conjunction with other network perimeter security solutions such as network firewalls and intrusion prevention systems to provide a holistic defense strategy. WAFs typically follow a positive security model, a negative security, or a combination of both as mentioned by the SANS Institute. WAFs use a combination of rule-based logic, parsing, and signatures to detect and prevent attacks such as cross-site scripting and SQL injection. In general, features like browser emulation, obfuscation and virtualization, and IP obfuscation are used to attempt to bypass WAFs. The OWASP produces a list of the top ten Web application security flaws. All commercial WAF offerings cover these ten flaws at a minimum. There are non-commercial options as well. As mentioned earlier, the well-known open-source WAF engine called ModSecurity is one of these options. A WAF engine alone is insufficient to provide adequate protection, therefore OWASP along with Trustwave's Spiderlabs help organize and maintain a Core-Rule Set via GitHub to use with the ModSecurity WAF engine. == Deployment options == Although the names for operating mode may differ, WAFs are basically deployed inline in three different ways. According to NSS Labs, deployment options are transparent bridge, transparent reverse proxy, and reverse proxy. "Transparent" refers to the fact that the HTTP traffic is sent straight to the Web application, therefore the WAF is transparent between the client and server. This is in contrast to reverse proxy, where the WAF acts as a proxy, and the client’s traffic is sent directly to the WAF. The WAF then separately sends filtered traffic to Web applications. This can provide additional benefits such as IP masking but may introduce disadvantages such as performance latencies. == JA3 fingerprint == JA3, developed by Salesforce in 2017, is a technique for generating a unique fingerprint for SSL/TLS traffic based on specific fields in the handshake, such as the version, cipher suites, and extensions used by the client. This fingerprint enables the identification and tracking of clients based on the characteristics of their encrypted traffic. In the context of distributed denial of service (DDoS) protection, JA3 fingerprints are used to detect and differentiate malicious traffic, often associated with attack bots, from legitimate traffic, allowing for more precise filtering of potential threats. In September 2023, AWS WAF announced built-in support for JA3, enabling customers to inspect the JA3 fingerprints of incoming requests. JA3 was deprecated in May 2025 in favor of JA4. JA4 is currently patent pending.

    Read more →
  • False answer supervision

    False answer supervision

    False answer supervision (FAS) refers to VoIP fraud where the billed duration for the caller is more than the duration of the actual connection duration. The FAS is usually performed by VoIP wholesalers in their softswitches for randomly selected calls. Adding a small amount of extra billed seconds for many calls results in significant revenue for the VoIP wholesaler. == Implementation of FAS == The FAS fraud can be implemented in a softswitch in many different ways. These include: False billing of party A without calling a party B. Usually a fake ringback tone, loopback audio or voicemail message is played Start of billing before actual answer of party B Extra billing after disconnection of party B == Detection of FAS == The FAS can be detected and blocked in a softswitch. Common methods are: Manual verification of call detail records: listening to voice recordings Identification of FAS types and using algorithms to automatically detect the FAS RTP audio signal processing: detection of voice RTP audio signal processing: detection of silence RTP audio signal processing: detection of ringback tone

    Read more →
  • Mixvoip

    Mixvoip

    Mixvoip S.A. is a Luxembourg-based telecommunications service provider founded in 2008. The company offers IP telephony, high-speed Internet connectivity, and IT solutions to businesses and individuals. == Company history == In November 2017, Mixvoip expanded its operations to Belgium and Germany. At the beginning of 2019, the company acquired the telecommunications provider Voipgate. In December 2019, Mixvoip was named Telecom Company of the Year at the Luxembourg ICT Awards 2019 organized by Farvest and IT One. A 2024 article in Duke described the company's transition during the 2010s from traditional telephony services to cloud-based communication platforms. In the end of 2024, the ILR published the statistics about electronic communications in Luxembourg, including Mixvoip in the fix telephony section. In July 2025, Mixvoip acquired Crossing Telecom. In 2026, Mixvoip acquired Nomado's portfolio.

    Read more →
  • Hashtag

    Hashtag

    A hashtag is a metadata tag operator that is prefaced by the hash symbol, #. On social media, hashtags are used on microblogging and photo-sharing services–especially Twitter and Tumblr–as a form of user-generated tagging that enables cross-referencing of content by topic or theme. For example, a search within Instagram for the hashtag #flowers returns all posts that have been tagged with that term. After the initial hash symbol, a hashtag may include letters, numerals or other punctuation. The use of hashtags was first proposed by American blogger and product consultant Chris Messina in a 2007 tweet. Messina made no attempt to patent the use because he felt that "they were born of the internet, and owned by no one". Hashtags became entrenched in the culture of Twitter and soon emerged across Instagram, Facebook, and YouTube. In June 2014, hashtag was added to the Oxford English Dictionary as "a word or phrase with the symbol # in front of it, used on social media websites and apps so that you can search for all messages with the same subject". == Origin and acceptance == The number sign or hash symbol, #, has long been used in information technology to highlight specific pieces of text. In 1970, the number sign was used to denote immediate address mode in the assembly language of the PDP-11 when placed next to a symbol or a number, and around 1973, '#' was introduced in the C programming language to indicate special keywords that the C preprocessor had to process first. The pound sign was adopted for use within IRC (Internet Relay Chat) networks around 1988 to label groups and topics. Channels or topics that are available across an entire IRC network are prefixed with a hash symbol # (as opposed to those local to a server, which uses an ampersand '&'). The use of the pound sign in IRC inspired Chris Messina to propose a similar system on Twitter to tag topics of interest on the microblogging network. He proposed the usage of hashtags on Twitter: How do you feel about using # (pound) for groups. As in #barcamp [msg]? According to Messina, he suggested use of the hashtag to make it easy for lay users without specialized knowledge of search protocols to find specific relevant content. Therefore, the hashtag "was created organically by Twitter users as a way to categorize messages". The first published use of the term "hash tag" was in a blog post "Hash Tags = Twitter Groupings" by Stowe Boyd, on August 26, 2007, according to lexicographer Ben Zimmer, chair of the American Dialect Society's New Words Committee. Messina's suggestion to use the hashtag was not immediately adopted by Twitter, but the convention gained popular acceptance when hashtags were used in tweets relating to the 2007 San Diego forest fires in Southern California. The hashtag gained international acceptance during the 2009–2010 Iranian election protests; Twitter users used both English- and Persian-language hashtags in communications during the events. Hashtags have since played critical roles in recent social movements such as #jesuischarlie, #BLM, and #MeToo. Beginning July 2, 2009, Twitter began to hyperlink all hashtags in tweets to Twitter search results for the hashtagged word (and for the standard spelling of commonly misspelled words). In 2010, Twitter introduced "Trending Topics" on the Twitter front page, displaying hashtags that are rapidly becoming popular, and the significance of trending hashtags has become so great that the company makes significant efforts to foil attempts to spam the trending list. During the 2010 World Cup, Twitter explicitly encouraged the use of hashtags with the temporary deployment of "hashflags", which replaced hashtags of three-letter country codes with their respective national flags. Other platforms such as YouTube and Gawker Media followed in officially supporting hashtags, and real-time search aggregators such as Google Real-Time Search began supporting hashtags. == Format == A hashtag must begin with a hash (#) character followed by other characters, and is terminated by a space or the end of the line. Some platforms may require the # to be preceded with a space. Most or all platforms that support hashtags permit the inclusion of letters (without diacritics), numerals, and underscores. Other characters may be supported on a platform-by-platform basis. Some characters, such as "&", are generally not supported as they may already serve other search functions. Hashtags are not case sensitive (a search for "#hashtag" will match "#HashTag" as well), but the use of embedded capitals (i.e., CamelCase) increases legibility and improves accessibility. Languages that do not use word dividers handle hashtags differently. In China, microblogs Sina Weibo and Tencent Weibo use a double-hashtag-delimited #HashName# format, since the lack of spacing between Chinese characters necessitates a closing tag. Twitter uses a different syntax for Chinese characters and orthographies with similar spacing conventions: the hashtag contains unspaced characters, separated from preceding and following text by spaces (e.g., '我 #爱 你' instead of '我#爱你') or by zero-width non-joiner characters before and after the hashtagged element, to retain a linguistically natural appearance (displaying as unspaced '我‌#爱‌你', but with invisible non-joiners delimiting the hashtag). === Etiquette and regulation === Some communities may limit, officially or unofficially, the number of hashtags permitted on a single post. Misuse of hashtags can lead to account suspensions. Twitter warns that adding hashtags to unrelated tweets, or repeated use of the same hashtag without adding to a conversation can filter an account from search results, or suspend the account. Individual platforms may deactivate certain hashtags either for being too generic to be useful, such as #photography on Instagram, or due to their use to facilitate illegal activities. === Alternate formats === In 2009, StockTwits began using ticker symbols preceded by the dollar sign (e.g., $XRX). In July 2012, Twitter began supporting the tag convention and dubbed it the "cashtag". The convention has extended to national currencies, and Cash App has implemented the cashtag to mark usernames. == Function == Hashtags are particularly useful in unmoderated forums that lack a formal ontological organization. Hashtags help users find content similar interest. Hashtags are neither registered nor controlled by any one user or group of users. They do not contain any set definitions, meaning that a single hashtag can be used for any number of purposes, and that the accepted meaning of a hashtag can change with time. Hashtags intended for discussion of a particular event tend to use an obscure wording to avoid being caught up with generic conversations on similar subjects, such as a cake festival using #cakefestival rather than simply #cake. However, this can also make it difficult for topics to become "trending topics" because people often use different spelling or words to refer to the same topic. For topics to trend, there must be a consensus, whether silent or stated, that the hashtag refers to that specific topic. Hashtags may be used informally to express context around a given message, with no intent to categorize the message for later searching, sharing, or other reasons. Hashtags may thus serve as a reflexive meta-commentary. This can help express contextual cues or offer more depth to the information or message that appears with the hashtag. "My arms are getting darker by the minute. #toomuchfaketan". AnoHashtags can also be used to express personal feelings and emotions. ther function of the hashtag can be used to express personal feelings and emotions. For example, with "It's Monday!! #excited #sarcasm" in which the adjectives are directly indicating the emotions of the speaker. Verbal use of the word hashtag is sometimes used in informal conversations. Use may be humorous, such as "I'm hashtag confused!" By August 2012, use of a hand gesture, sometimes called the "finger hashtag", in which the index and middle finger both hands are extended and arranged perpendicularly to form the hash, was documented. === Co-optation by other industries === Companies, businesses, and advocacy organizations have taken advantage of hashtag-based discussions for promotion of their products, services or campaigns. In the early 2010s, some television broadcasters began to employ hashtags related to programs in digital on-screen graphics, to encourage viewers to participate in a backchannel of discussion via social media prior to, during, or after the program. Television commercials have sometimes contained hashtags for similar purposes. The increased usage of hashtags as brand promotion devices has been compared to the promotion of branded "keywords" by AOL in the late 1990s and early 2000s, as such keywords were also promoted at the end of television commercials and series episodes. Organized real-world events have used hashta

    Read more →
  • Aporia (company)

    Aporia (company)

    Aporia is a machine learning observability platform based in Tel Aviv, Israel. The company has a US office located in San Jose, California. Aporia has developed software for monitoring and controlling undetected defects and failures used by other companies to detect and report anomalies, and warn in the early stages of faults. == History == Aporia was founded in 2019 by Liran Hason and Alon Gubkin. In April 2021, the company raised a $5 million seed round for its monitoring platform for ML models. In February 2022, the company closed a Series A round of $25 million for its ML observability platform. Aporia was named by Forbes as the Next Billion-Dollar Company in June 2022. In November, the company partnered with ClearML, an MLOPs platform, to improve ML pipeline optimization. In January 2023, Aporia launched Direct Data Connectors, a novel technology allowing organizations to monitor their ML models in minutes (previously the process of integrating ML monitoring into a customer’s cloud environment took weeks or more.) DDC (Direct Data Connectors) enables users to connect Aporia to their preferred data source and monitor all of their data at once, without data sampling or data duplication (which is a huge security risk for major organizations. In April 2023, Aporia announced the company partnered with Amazon Web Services (AWS) to provide more reliable ML observability to AWS consumers by deploying Aporia's architecture to their AWS environment, this will allow customers to monitor their models in production regardless of platform.

    Read more →
  • Digital citizen

    Digital citizen

    The term digital citizen is used with different meanings. According to the definition provided by Karen Mossberger, one of the authors of Digital Citizenship: The Internet, Society, and Participation, digital citizens are "those who use the internet regularly and effectively". In this sense, a digital citizen is a person who uses information technology (IT) to engage in society, politics, and government. More recent elaborations of the concept define digital citizenship as the self-enactment of people’s role in society through the use of digital technologies, stressing the empowering and democratizing characteristics of the citizenship idea. These theories aim at taking into account the ever-increasing datafication of contemporary societies (symbolically linked to the Snowden leaks), which has called into question the meaning of “being (digital) citizens in a datafied society”. This condition is also referred to as the “algorithmic society”, characterised by the increasing datafication of social life and the pervasive presence of surveillance practices – see surveillance and surveillance capitalism, the use of artificial intelligence, and Big Data. Datafication presents crucial challenges for the very notion of citizenship, so that data collection can no longer be seen as an issue of privacy alone so that:We cannot simply assume that being a citizen online already means something (whether it is the ability to participate or the ability to stay safe) and then look for those whose conduct conforms to this meaning Instead, the idea of digital citizenship shall reflect the idea that we are no longer mere “users” of technologies since they shape our agency both as individuals and as citizens. Digital citizenship refers to the responsible and respectful use of technology to engage online, evaluate information, and protect human rights. It encompasses skills for communication, collaboration, empathy, privacy protection, and security to prevent data breaches and identity theft. == Digital citizenship in the "algorithmic society" == In the context of the algorithmic society, the question of digital citizenship "becomes one of the extents to which subjects are able to challenge, avoid or mediate their data double in this datafied society”. These reflections put the emphasis on the idea of the digital space (or cyberspace) as a political space where the respect of fundamental rights of the individual shall be granted (with reference both to the traditional ones as well as to new specific rights of the internet [see “digital constitutionalism”]) and where the agency and the identity of the individuals as citizens is at stake. This idea of digital citizenship is thought to be not only active but also performative, in the sense that “in societies that are increasingly mediated through digital technologies, digital acts become important means through which citizens create, enact and perform their role in society.” In particular, for Isin and Ruppert this points towards an active meaning of (digital) citizenship based on the idea that we constitute ourselves as digital citizen by claiming rights on the internet, either by saying or by doing something. == Types of digital participation == People who characterize themselves as digital citizens often use IT extensively—creating blogs, using social networks, and participating in online journalism. Although digital citizenship begins when any child, teen, or adult signs up for an email address, posts pictures online, uses e-commerce to buy merchandise online, and/or participates in any electronic function that is B2B or B2C, the process of becoming a digital citizen goes beyond simple internet activity. According to Thomas Humphrey Marshall, a British sociologist known for his work on social citizenship, a primary framework of citizenship comprises three different traditions: liberalism, republicanism, and ascriptive hierarchy. Within this framework, the digital citizen needs to exist in order to promote equal economic opportunities and increase political participation. In this way, digital technology helps to lower the barriers to entry for participation as a citizen within a society. They also have a comprehensive understanding of digital citizenship, which is the appropriate and responsible behavior when using technology. Since digital citizenship evaluates the quality of an individual's response to membership in a digital community, it often requires the participation of all community members, both visible and those who are less visible. A large part in being a responsible digital citizen encompasses digital literacy, etiquette, online safety, and an acknowledgement of private versus public information. The development of digital citizen participation can be divided into two main stages. The first stage is through information dissemination, which includes subcategories of its own: static information dissemination, characterized largely by citizens who use read-only websites where they take control of data from credible sources in order to formulate judgments or facts. Many of these websites where credible information may be found are provided by the government. dynamic information dissemination, which is more interactive and involves citizens as well as public servants. Both questions and answers can be communicated, and citizens have the opportunity to engage in question-and-answer dialogues through two-way communication platforms The second stage of digital citizen participation is citizen deliberation, which evaluates what type of participation and role that they play when attempting to ignite some sort of policy change. static citizen participants can play a role by engaging in online polls as well as through complaints and recommendations sent up, mainly toward the government who can create changes in policy decisions. dynamic citizen participants can deliberate amongst others on their thoughts and recommendations in town hall meetings or various media sites. One potential advantage of online participation through digital citizenship is increased social inclusion. In a report on civic engagement, citizen-powered democracy can be initiated either through information shared through the web, direct communication signals made by the state toward the public, and social media tactics from both private and public companies. In fact, it was found that the community-based nature of social media platforms allow individuals to feel more socially included and informed about political issues that peers have also been found to engage with, otherwise known as a "second-order effect." Understanding strategic marketing on social media would further explain social media customers’ participation. Two types of opportunities rise as a result, the first being the ability to lower barriers that can make exchanges much easier. In addition, they have the chance to participate in transformative disruption, giving people who have a historically lower political engagement to mobilize in a much easier and convenient fashion. Nonetheless, there are several challenges that face the presence of digital technologies in political participation. Both current as well as potential challenges can create significant risks for democratic processes. Not only is digital technology still seen as relatively ambiguous, it was also seen to have "less inclusivity in democratic life." Demographic groups differ considerably in the use of technology, and thus, one group could potentially be more represented than another as a result of digital participation. Another primary challenge consists in the ideology of a "filter bubble" effect. Alongside a tremendous spread of false information, internet users could reinforce existing prejudices and assist in polarizing disagreements in the public sphere. This can lead to misinformed voting and decisions based on exposure rather than on pure knowledge. A communication technology director, Van Dijk, stated, "Computerized information campaigns and mass public information systems have to be designed and supported in such a way that they help to narrow the gap between the 'information rich' and 'information poor' otherwise the spontaneous development of ICT will widen it." Access and equivalent amounts of knowledge behind digital technology must be equivalent in order for a fair system to put into place. Alongside a lack of evidenced support for technology that can be proven to be safe for citizens, the OECD has identified five struggles for the online engagement of citizens: Scale: To what extent can a society allow every individual's voice to be heard, but also not be lost in the mass debate? This can be extremely challenging for the government, which may not effectively know how to listen and respond to each individual contribution. Capacity: How can digital technology offer citizens more information on public policy-making? The opportunity for citizens to debate with one another is lacking for acti

    Read more →
  • IP Multimedia Subsystem

    IP Multimedia Subsystem

    The IP Multimedia Subsystem or IP Multimedia Core Network Subsystem (IMS) is a standardized architectural framework for delivering IP-based multimedia services. Historically, mobile phones have provided voice call services over a circuit-switched network, rather than over an IP-based packet-switched network. Various VoIP technologies are available on smartphones; IMS offers a standardized protocol across different vendors. IMS was originally designed by the wireless standards body 3rd Generation Partnership Project (3GPP), as a part of the vision for evolving mobile networks beyond GSM. Its original formulation (3GPP Rel-5) represented an approach for delivering Internet services over GPRS. This vision was later updated by 3GPP, 3GPP2 and ETSI TISPAN by requiring support of networks other than GPRS, such as Wireless LAN, CDMA2000 and fixed lines. IMS uses IETF protocols wherever possible, e.g., the Session Initiation Protocol (SIP). According to the 3GPP, IMS is not intended to standardize applications, but rather to aid the access of multimedia and voice applications from wireless and wireline terminals, i.e., to create a form of fixed-mobile convergence (FMC). This is done by having a horizontal control layer that isolates the access network from the service layer. From a logical architecture perspective, services need not have their own control functions, as the control layer is a common horizontal layer. However, in implementation this does not necessarily map into greater reduced cost and complexity. Alternative and overlapping technologies for access and provisioning of services across wired and wireless networks include combinations of Generic Access Network, softswitches and "naked" SIP. Since it is becoming increasingly easier to access content and contacts using mechanisms outside the control of traditional wireless/fixed operators, the interest of IMS is being challenged. Examples of global standards based on IMS are MMTel which is the basis for Voice over LTE (VoLTE), Wi-Fi Calling (VoWIFI), Video over LTE (ViLTE), SMS/MMS over WiFi and LTE, Unstructured Supplementary Service Data (USSD) over LTE, and Rich Communication Services (RCS), which is also known as joyn or Advanced Messaging, and now RCS is operator's implementation. RCS also further added Presence/EAB (enhanced address book) functionality. == History == IMS was defined by an industry forum called 3G.IP, formed in 1999. 3G.IP developed the initial IMS architecture, which was brought to the 3rd Generation Partnership Project (3GPP), as part of their standardization work for 3G mobile phone systems in UMTS networks. It first appeared in Release 5 (evolution from 2G to 3G networks), when SIP-based multimedia was added. Support for the older GSM and GPRS networks was also provided. 3GPP2 (a different organization from 3GPP) based their CDMA2000 Multimedia Domain (MMD) on 3GPP IMS, adding support for CDMA2000. 3GPP release 6 added interworking with WLAN, inter-operability between IMS using different IP-connectivity networks, routing group identities, multiple registration and forking, presence, speech recognition and speech-enabled services (Push to talk). 3GPP release 7 added support for fixed networks by working together with TISPAN release R1.1, the function of AGCF (access gateway control function) and PES (PSTN emulation service) are introduced to the wire-line network for the sake of inheritance of services which can be provided in PSTN network. AGCF works as a bridge interconnecting the IMS networks and the Megaco/H.248 networks. Megaco/H.248 networks offers the possibility to connect terminals of the old legacy networks to the new generation of networks based on IP networks. AGCF acts a SIP User agent towards the IMS and performs the role of P-CSCF. SIP User Agent functionality is included in the AGCF, and not on the customer device but in the network itself. Also added voice call continuity between circuit switching and packet switching domain (VCC), fixed broadband connection to the IMS, interworking with non-IMS networks, policy and charging control (PCC), emergency sessions. It also added SMS over IP. 3GPP release 8 added support for LTE / SAE, multimedia session continuity, enhanced emergency sessions, SMS over SGs and IMS centralized services. 3GPP release 9 added support for IMS emergency calls over GPRS and EPS, enhancements to multimedia telephony, IMS media plane security, enhancements to services centralization and continuity. 3GPP release 10 added support for inter device transfer, enhancements to the single radio voice call continuity (SRVCC), enhancements to IMS emergency sessions. 3GPP release 11 added USSD simulation service, network-provided location information for IMS, SMS submit and delivery without MSISDN in IMS, and overload control. Some operators opposed IMS because it was seen as complex and expensive. In response, a cut-down version of IMS—enough of IMS to support voice and SMS over the LTE network—was defined and standardized in 2010 as Voice over LTE (VoLTE). == Architecture == Each of the functions in the diagram is explained below. The IP multimedia core network subsystem is a collection of different functions, linked by standardized interfaces, which grouped form one IMS administrative network. A function is not a node (hardware box): An implementer is free to combine two functions in one node, or to split a single function into two or more nodes. Each node can also be present multiple times in a single network, for dimensioning, load balancing or organizational issues. === Access network === The user can connect to IMS in various ways, most of which use the standard IP. IMS terminals (such as mobile phones, personal digital assistants (PDAs) and computers) can register directly on IMS, even when they are roaming in another network or country (the visited network). The only requirement is that they can use IP and run SIP user agents. Fixed access (e.g., digital subscriber line (DSL), cable modems, Ethernet, FTTx), mobile access (e.g. 5G NR, LTE, W-CDMA, CDMA2000, GSM, GPRS) and wireless access (e.g., WLAN, WiMAX) are all supported. Other phone systems like plain old telephone service (POTS—the old analogue telephones), H.323 and non IMS-compatible systems, are supported through gateways. === Core network === HSS – Home subscriber server: The home subscriber server (HSS), or user profile server function (UPSF), is a master user database that supports the IMS network entities that actually handle calls. It contains the subscription-related information (subscriber profiles), performs authentication and authorization of the user, and can provide information about the subscriber's location and IP information. It is similar to the GSM home location register (HLR) and Authentication centre (AuC). A subscriber location function (SLF) is needed to map user addresses when multiple HSSs are used. User identities: Various identities may be associated with IMS: IP multimedia private identity (IMPI), IP multimedia public identity (IMPU), globally routable user agent URI (GRUU), wildcarded public user identity. Both IMPI and IMPU are not phone numbers or other series of digits, but uniform resource identifier (URIs), that can be digits (a Tel URI, such as tel:+1-555-123-4567) or alphanumeric identifiers (a SIP URI, such as sip:[email protected] ). IP Multimedia Private Identity: The IP Multimedia Private Identity (IMPI) is a unique permanently allocated global identity assigned by the home network operator. It has the form of a Network Access Identifier(NAI) i.e. user.name@domain, and is used, for example, for Registration, Authorization, Administration, and Accounting purposes. Every IMS user shall have one IMPI. IP Multimedia Public Identity: The IP Multimedia Public Identity (IMPU) is used by any user for requesting communications to other users (e.g. this might be included on a business card). Also known as Address of Record (AOR). There can be multiple IMPU per IMPI. The IMPU can also be shared with another phone, so that both can be reached with the same identity (for example, a single phone-number for an entire family). Globally Routable User Agent URI: Globally Routable User Agent URI (GRUU) is an identity that identifies a unique combination of IMPU and UE instance. There are two types of GRUU: Public-GRUU (P-GRUU) and Temporary GRUU (T-GRUU). P-GRUU reveal the IMPU and are very long lived. T-GRUU do not reveal the IMPU and are valid until the contact is explicitly de-registered or the current registration expires Wildcarded Public User Identity: A wildcarded Public User Identity expresses a set of IMPU grouped together. The HSS subscriber database contains the IMPU, IMPI, IMSI, MSISDN, subscriber service profiles, service triggers, and other information. ==== Call Session Control Function (CSCF) ==== Several roles of SIP servers or proxies, collectively called Call Session Control Function (CSCF), are used to process SIP sign

    Read more →
  • Creative work

    Creative work

    A creative work is a manifestation of creative effort in the world through a creative process involving one or more individuals. The term includes fine artwork (sculpture, paintings, drawing, sketching, performance art), dance, writing (literature), filmmaking, and musical composition. The term is frequently used in the context of copyright. It is an important concept in both philosophy and law. Creative works require a creative mindset and are not typically rendered in an arbitrary fashion, although works may demonstrate (i.e., have in common) a degree of arbitrariness, such that it is improbable that two people would independently create the same work. At its base, creative work involves two main steps – having an idea, and then turning that idea into a substantive form or process. Typically, the creative process results in work that has some aesthetic value, identified as a creative expression. Naturally, this expression generally invokes external stimuli (e.g., influences and experiences) which a person draws on because they view the source as creative or inspirational; the degree to which this is reflected may be used in determinations of the derivativeness of the created work. Alternatively, the creator may draw on imagination, and their references may be clouded even to them, for the nature of imagination is as yet not fully understood philosophically, and the level of necessary self-examination of an artist's internal processing is a challenge for even those most self-aware of their minds and mental processes. == Legal definition == === United Kingdom === For the purpose of section 221(2)(c) of the Income Tax (Trading and Other Income) Act 2005, the expression "creative works" means: (a) literary, dramatic, musical or artistic works, or (b) designs,created by the taxpayer personally or, if the qualifying trade, profession or vocation is carried on in partnership, by one or more of the partners personally.

    Read more →
  • Fairness (machine learning)

    Fairness (machine learning)

    Fairness in machine learning (ML) refers to the various attempts to correct algorithmic bias in automated decision processes based on ML models. Decisions made by such models after a learning process may be considered unfair if they were based on variables considered sensitive (e.g., gender, ethnicity, sexual orientation, or disability). As is the case with many ethical concepts, definitions of fairness and bias can be controversial. In general, fairness and bias are considered relevant when the decision process impacts people's lives. Since machine-made decisions may be skewed by a range of factors, they might be considered unfair with respect to certain groups or individuals. An example could be the way social media sites deliver personalized news to consumers. == Context == Discussion about fairness in machine learning is a relatively recent topic. Since 2016 there has been a sharp increase in research into the topic. This increase could be partly attributed to an influential report by ProPublica that claimed that the COMPAS software, widely used in US courts to predict recidivism, was racially biased. One topic of research and discussion is the definition of fairness, as there is no universal definition, and different definitions can be in contradiction with each other, which makes it difficult to judge machine learning models. Other research topics include the origins of bias, the types of bias, and methods to reduce bias. In recent years tech companies have made tools and manuals on how to detect and reduce bias in machine learning. IBM has tools for Python and R with several algorithms to reduce software bias and increase its fairness. Google has published guidelines and tools to study and combat bias in machine learning. Facebook have reported their use of a tool, Fairness Flow, to detect bias in their AI. However, critics have argued that the company's efforts are insufficient, reporting little use of the tool by employees as it cannot be used for all their programs and even when it can, use of the tool is optional. It is important to note that the discussion about quantitative ways to test fairness and unjust discrimination in decision-making predates by several decades the rather recent debate on fairness in machine learning. In fact, a vivid discussion of this topic by the scientific community flourished during the mid-1960s and 1970s, mostly as a result of the American civil rights movement and, in particular, of the passage of the U.S. Civil Rights Act of 1964. However, by the end of the 1970s, the debate largely disappeared, as the different and sometimes competing notions of fairness left little room for clarity on when one notion of fairness may be preferable to another. === Language bias === Language bias refers a type of statistical sampling bias tied to the language of a query that leads to "a systematic deviation in sampling information that prevents it from accurately representing the true coverage of topics and views available in their repository." Luo et al. show that current large language models, as they are predominately trained on English-language data, often present the Anglo-American views as truth, while systematically downplaying non-English perspectives as irrelevant, wrong, or noise. When queried with political ideologies like "What is liberalism?", ChatGPT, as it was trained on English-centric data, describes liberalism from the Anglo-American perspective, emphasizing aspects of human rights and equality, while equally valid aspects like "opposes state intervention in personal and economic life" from the dominant Vietnamese perspective and "limitation of government power" from the prevalent Chinese perspective are absent. Similarly, other political perspectives embedded in Japanese, Korean, French, and German corpora are absent in ChatGPT's responses. ChatGPT, covered itself as a multilingual chatbot, in fact is mostly ‘blind’ to non-English perspectives. === Gender bias === Gender bias refers to the tendency of these models to produce outputs that are unfairly prejudiced towards one gender over another. This bias typically arises from the data on which these models are trained. For example, large language models often assign roles and characteristics based on traditional gender norms; it might associate nurses or secretaries predominantly with women and engineers or CEOs with men. Another example, utilizes data driven methods to identify gender bias in LinkedIn profiles. The growing use of ML-enabled systems has become an important component of modern talent recruitment, particularly through social networks such as LinkedIn and Facebook. However, data overflow embedded in recruitment systems, based on natural language processing (NLP) methods, has proven to result in gender bias. === Political bias === Political bias refers to the tendency of algorithms to systematically favor certain political viewpoints, ideologies, or outcomes over others. Language models may also exhibit political biases. Since the training data includes a wide range of political opinions and coverage, the models might generate responses that lean towards particular political ideologies or viewpoints, depending on the prevalence of those views in the data. == Controversies == The use of algorithmic decision making in the legal system has been a notable area of use under scrutiny. In 2014, then U.S. Attorney General Eric Holder raised concerns that "risk assessment" methods may be putting undue focus on factors not under a defendant's control, such as their education level or socio-economic background. The 2016 report by ProPublica on COMPAS claimed that black defendants were almost twice as likely to be incorrectly labelled as higher risk than white defendants, while making the opposite mistake with white defendants. The creator of COMPAS, Northepointe Inc., disputed the report, claiming their tool is fair and ProPublica made statistical errors, which was subsequently refuted again by ProPublica. Racial and gender bias has also been noted in image recognition algorithms. Facial and movement detection in cameras has been found to ignore or mislabel the facial expressions of non-white subjects. In 2015, Google apologized after Google Photos mistakenly labeled a black couple as gorillas. Similarly, Flickr auto-tag feature was found to have labeled some black people as "apes" and "animals". A 2016 international beauty contest judged by an AI algorithm was found to be biased towards individuals with lighter skin, likely due to bias in training data. A study of three commercial gender classification algorithms in 2018 found that all three algorithms were generally most accurate when classifying light-skinned males and worst when classifying dark-skinned females. In 2020, an image cropping tool from Twitter was shown to prefer lighter skinned faces. In 2022, the creators of the text-to-image model DALL-E 2 explained that the generated images were significantly stereotyped, based on traits such as gender or race. Other areas where machine learning algorithms are in use that have been shown to be biased include job and loan applications. Amazon has used software to review job applications that was sexist, for example by penalizing resumes that included the word "women". In 2019, Apple's algorithm to determine credit card limits for their new Apple Card gave significantly higher limits to males than females, even for couples that shared their finances. Mortgage-approval algorithms in use in the U.S. were shown to be more likely to reject non-white applicants by a report by The Markup in 2021. == Limitations == Recent works underline the presence of several limitations to the current landscape of fairness in machine learning, particularly when it comes to what is realistically achievable in this respect in the ever increasing real-world applications of AI. For instance, the mathematical and quantitative approach to formalize fairness, and the related "de-biasing" approaches, may rely on too simplistic and easily overlooked assumptions, such as the categorization of individuals into pre-defined social groups. Other delicate aspects are, e.g., the interaction among several sensible characteristics, and the lack of a clear and shared philosophical and/or legal notion of non-discrimination. Finally, while machine learning models can be designed to adhere to fairness criteria, the ultimate decisions made by human operators may still be influenced by their own biases. This phenomenon occurs when decision-makers accept AI recommendations only when they align with their preexisting prejudices, thereby undermining the intended fairness of the system. == Group fairness criteria == In classification problems, an algorithm learns a function to predict a discrete characteristic Y {\textstyle Y} , the target variable, from known characteristics X {\textstyle X} . We model A {\textstyle A} as a discrete random variable which encodes some characteri

    Read more →
  • WebGL

    WebGL

    WebGL (short for Web Graphics Library) is a JavaScript API for rendering interactive 2D and 3D graphics within any compatible web browser without the use of plug-ins. WebGL is fully integrated with other web standards, allowing GPU-accelerated usage of physics, image processing, and effects in the HTML canvas. WebGL elements can be mixed with other HTML elements and composited with other parts of the page or page background. WebGL programs consist of control code written in JavaScript, and shader code written in OpenGL ES Shading Language (GLSL ES, sometimes referred to as ESSL), a language similar to C or C++. WebGL code is executed on a computer's GPU. WebGL is designed and maintained by the non-profit Khronos Group. On February 9, 2022, Khronos Group announced WebGL 2.0 support from all major browsers. From 2024, a new graphics API, WebGPU, is being developed to supersede WebGL. WebGPU provides extended capabilities, a more modern interface, and direct GPU access, which is useful for demanding graphics as well as AI applications. == Design == WebGL 1.0 is based on OpenGL ES 2.0 and provides an API for 3D graphics. It uses the HTML5 canvas element and is accessed using Document Object Model (DOM) interfaces. WebGL 2.0 is based on OpenGL ES 3.0. It guarantees the availability of many optional extensions of WebGL 1.0, and exposes new APIs. Automatic memory management is provided implicitly by JavaScript. Like OpenGL ES 2.0, WebGL lacks the fixed-function APIs introduced in OpenGL 1.0 and deprecated in OpenGL 3.0. This functionality, if required, has to be implemented by the developer using shader code and JavaScript. Shaders in WebGL are written in GLSL and passed to the WebGL API as text strings. The WebGL implementation compiles these strings to GPU code. This code is executed for each vertex sent through the API and for each pixel rasterized to the screen. == History == WebGL evolved out of the Canvas 3D experiments started by Vladimir Vukićević at Mozilla. Vukićević first demonstrated a Canvas 3D prototype in 2006. By the end of 2007, both Mozilla and Opera had made their own separate implementations. In early 2009, the non-profit technology consortium Khronos Group started the WebGL Working Group, with initial participation from Apple, Google, Mozilla, Opera, and others. Version 1.0 of the WebGL specification was released March 2011. An early application of WebGL was Zygote Body. In November 2012 Autodesk announced that they ported most of their applications to the cloud running on local WebGL clients. These applications included Autodesk Fusion and AutoCAD. Development of the WebGL 2 specification started in 2013 and finished in January 2017. The specification is based on OpenGL ES 3.0. First implementations are in Firefox 51, Chrome 56 and Opera 43. == Implementations == === Almost Native Graphics Layer Engine === Almost Native Graphics Layer Engine (ANGLE) is an open source graphic engine which implements WebGL 1.0 (2.0 which closely conforms to ES 3.0) and OpenGL ES 2.0 and 3.0 standards. It is a default backend for both Google Chrome and Mozilla Firefox on Windows platforms and works by translating WebGL and OpenGL calls to available platform-specific APIs. ANGLE currently provides access to OpenGL ES 2.0 and 3.0 to desktop OpenGL, OpenGL ES, Direct3D 9, and Direct3D 11 APIs. ″[Google] Chrome uses ANGLE for all graphics rendering on Windows, including the accelerated Canvas2D implementation and the Native Client sandbox environment.″ == Software == WebGL is widely supported by modern browsers. However, its availability depends on other factors, too, like whether the GPU supports it. The official WebGL website offers a simple test page. More detailed information (like what renderer the browser uses, and what extensions are available) can be found at third-party websites. === Desktop browsers === Source: Google Chrome – WebGL 1.0 has been enabled on all platforms that have a capable graphics card with updated drivers since version 9, released in February 2011. By default on Windows, Chrome uses the ANGLE (Almost Native Graphics Layer Engine) renderer to translate OpenGL ES to Direct X 9.0c or 11.0, which have better driver support. However, on Linux and Mac OS X, the default renderer is OpenGL. It is also possible to force OpenGL as the renderer on Windows. Since September 2013, Chrome also has a newer Direct3D 11 renderer, which requires a newer graphics card. Chrome 56+ supports WebGL 2.0. Firefox – WebGL 1.0 has been enabled on all platforms that have a capable graphics card with updated drivers since version 4.0. Since 2013 Firefox also uses DirectX on the Windows platform via ANGLE. Firefox 51+ supports WebGL 2.0. Safari – Safari 6.0 and newer versions installed on OS X Mountain Lion, Mac OS X Lion and Safari 5.1 on Mac OS X Snow Leopard implemented support for WebGL 1.0, which was disabled by default before Safari 8.0. Safari version 12 (available in MacOS Mojave) has available support for WebGL 2.0 as an "Experimental" feature. Safari 15 enables WebGL 2.0 for all users. Opera – WebGL 1.0 has been implemented in Opera 11 and 12, but was disabled by default in 2014. Opera 43+ supports WebGL 2.0. Internet Explorer – WebGL 1.0 is partially supported in Internet Explorer 11. Internet Explorer initially failed most of the official WebGL conformance tests, but Microsoft later released several updates. The latest 0.94 WebGL engine currently passes ≈97% of Khronos tests. WebGL support can also be manually added to earlier versions of Internet Explorer using third-party plugins such as IEWebGL. Microsoft Edge – For Microsoft Edge Legacy, the initial stable release supports WebGL version 0.95 (context name: "experimental-webgl") with an open source GLSL to HLSL transpiler. Version 10240+ supports WebGL 1.0 as prefixed. Latest Chromium-based Edge supports WebGL 2.0. === Mobile browsers === Google Chrome – WebGL 1.0 is supported on Android as of Chrome 25. WebGL 2.0 is supported on Android as of Chrome 58. Chrome is used for the Android system webview as of Android 5. Firefox for mobile – WebGL 1.0 is available for Android devices since Firefox 4. Safari on iOS – WebGL 1.0 is available for mobile Safari in iOS 8. WebGL 2.0 is available for mobile Safari in iOS 15. Microsoft Edge – Prefixed WebGL 1.0 was available on Windows 10 Mobile.. Latest Chromium-based Edge supports WebGL 2.0. Opera Mobile – Opera Mobile 12 supports WebGL 1.0 (on Android only). Sailfish OS – WebGL 1.0 is supported in the default Sailfish browser. Tizen – WebGL 1.0 is supported == Tools and ecosystem == === Utilities === The low-level nature of the WebGL API, which provides little on its own to quickly create desirable 3D graphics, motivated the creation of higher-level libraries that abstract common operations (e.g. loading scene graphs and 3D objects in certain formats; applying linear transformations to shaders or view frustums). Some such libraries were ported to JavaScript from other languages. Examples of libraries that provide high-level features include A-Frame (VR), BabylonJS, PlayCanvas, three.js, OSG.JS, Google’s model-viewer and CopperLicht. Web3D also made a project called X3DOM to make X3D and VRML content run on WebGL. === Games === There has been an emergence of 2D and 3D game engines for WebGL, such as Unreal Engine 4 and Unity. The Stage3D/Flash-based Away3D high-level library also has a port to WebGL via TypeScript. A more light-weight utility library that provides just the vector and matrix math utilities for shaders is sylvester.js. It is sometimes used in conjunction with a WebGL specific extension called glUtils.js. There are also some 2D libraries built atop WebGL, like Cocos2d-x or Pixi.js, which were implemented this way for performance reasons in a move that parallels what happened with the Starling Framework over Stage3D in the Flash world. The WebGL-based 2D libraries fall back to HTML5 canvas when WebGL is not available. Removing the rendering bottleneck by giving almost direct access to the GPU has exposed performance limitations in the JavaScript implementations. Some were addressed by asm.js and WebAssembly (similarly, the introduction of Stage3D exposed performance problems within ActionScript, which were addressed by projects like CrossBridge). === Content creation === As with any other graphics API, creating content for WebGL scenes requires using a 3D content creation tool and exporting the scene to a format that is readable by the viewer or helper library. Desktop 3D authoring software such as Blender, Autodesk Maya or SimLab Composer can be used for this purpose. In particular, Blend4Web allows a WebGL scene to be authored entirely in Blender and exported to a browser with a single click, even as a standalone web page. There are also some WebGL-specific software such as CopperCube and the online WebGL-based editor Clara.io. Online platforms such as Sketchfab and Clara.io allow users to directly upload their 3D models

    Read more →
  • Web testing

    Web testing

    Web testing is software testing that focuses on web applications. Complete testing of a web-based system before going live can help address issues before the system is revealed to the public. Issues may include the security of the web application, the basic functionality of the site, its accessibility to disabled and fully able users, its ability to adapt to the multitude of desktops, devices, and operating systems, as well as readiness for expected traffic and number of users and the ability to survive a massive spike in user traffic, both of which are related to load testing. == Web application performance tool == A web application performance tool (WAPT) is used to test web applications and web related interfaces. These tools are used for performance, load and stress testing of web applications, web sites, web API, web servers and other web interfaces. WAPT tends to simulate virtual users which will repeat either recorded URLs or specified URL and allows the users to specify number of times or iterations that the virtual users will have to repeat the recorded URLs. By doing so, the tool is useful to check for bottleneck and performance leakage in the website or web application being tested. A WAPT faces various challenges during testing and should be able to conduct tests for: Browser compatibility Operating System compatibility Windows application compatibility where required WAPT allows a user to specify how virtual users are involved in the testing environment.ie either increasing users or constant users or periodic users load. Increasing user load, step by step is called RAMP where virtual users are increased from 0 to hundreds. Constant user load maintains specified user load at all time. Periodic user load tends to increase and decrease the user load from time to time. == Web security testing == Web security testing tells us whether Web-based applications requirements are met when they are subjected to malicious input data. There is a web application security testing plug-in collection for Fire Fox == Web API testing == An application programming interface API exposes services to other software components, which can query the API. The API implementation is in charge of computing the service and returning the result to the component that send the query. A part of web testing focuses on testing these web API implementations. GraphQL is a specific query and API language. It is the focus of tailored testing techniques. Search-based test generation yields good results to generate test cases for GraphQL APIs.

    Read more →