AI Headshot Enhancer

AI Headshot Enhancer — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • List of Fortran software and tools

    List of Fortran software and tools

    This is a list of Fortran software and tools, including IDEs, compilers, libraries, debugging tools, numerical and scientific computing tools, and related projects. == Fortran compilers == Absoft Pro Fortran — Absoft Pro Fortran is discontinued and ran on Linux and macOS AOCC — from AMD Classic Flang — part of the LLVM Project LLVM Flang — part of the LLVM Project Fortran 77 — Fortran 77 was developed by Digital Equipment Corporation, it is discontinued. G95 – portable open-source Fortran 95 compiler GCC (GNU Fortran) PGI compilers – NVIDIA developed compilers after acquiring The Portland Group IBM XL Fortran — IBM XL Fortran is current and runs on Linux (Power/AIX) and integrates with Eclipse Intel Fortran Compiler – part of Intel OneAPI HPC toolkit LFortran — LFortran is current, cross-platform, and has IDE support. MinGW – cross compiler and forked into Mingw-w64 nAG Fortran Compiler - from nAG Open64 — Open64 is an open-source compiler that has been terminated and ran on Linux Open Watcom — Open Watcom is current, runs on MS-DOS and OS/2, and has IDE support. Oracle Fortran — Oracle Fortran is discontinued, ran on Linux and Solaris. ROSE — source-to-source compiler framework developed at Lawrence Livermore National Laboratory Silverfrost FTN95 — FTN95 from Silverfrost is current, runs on Windows, and has IDE support. == Integrated development environments (IDEs) and editors == Code::Blocks — supports Fortran with plugins Eclipse IDE — with Fortran support via Photran Emacs — extensible text editor with built-in Fortran modes and support for modern tooling via language servers Geany — lightweight cross-platform IDE based on GTK IntelliJ IDEA — cross-platform IDE by JetBrains with Fortran pluggin KDevelop — KDE-based IDE NetBeans — Apache software foundation IDE with Fortran configuration OpenWatcom — IDE and compiler suite for C, C++, and Fortran Simply Fortran — standalone Fortran IDE for Windows, Linux, and macOS Vim — modal text editor with native Fortran syntax support and extensive plugin-based development features Visual Studio — with Intel Fortran integration Visual Studio Code — supports Fortran via extensions == Mathematical libraries == == Scientific libraries == ABINIT — software suite to calculate optical, mechanical, vibrational, and other observable properties of materials Cantera — chemical kinetics, thermodynamics, and transport tool suite CERN Program Library — collection of Fortran libraries for physics applications from CERN CP2K — quantum chemistry and solid-state physics software package for atomistic simulations Dalton — molecular electronic structure program FFTPACK — subroutines for the fast Fourier transform Kinetic PreProcessor – open-source software tool used in atmospheric chemistry MESA — Modules for Experiments in Stellar Astrophysics Nek5000 — MPI parallel higher-order spectral element CFD solver NWChem — open-source high-performance computational chemistry software Octopus — real-space Time-Dependent Density Functional Theory code MODTRAN – model atmospheric propagation of electromagnetic radiation MOLCAS — quantum chemistry software package for multiconfigurational electronic structure calculations NOVAS – software library for astrometry-related numerical computations Physics Analysis Workstation – data analysis and graphical presentation in high-energy physics Quantum ESPRESSO — integrated suite for electronic-structure calculations and materials modeling SIESTA — first-principles materials simulation code using density functional theory Tinker — software tools for molecular design == Debugging and performance tools == GDB — GNU Debugger with Fortran support Valgrind — memory debugging and profiling tool VTune Profiler — performance analysis tool Allinea Forge — debugger and profiler for HPC applications == Build and package management == Autotools — build system supporting Fortran projects CMake — cross-platform build system supporting Fortran Make — build automation tool Spack — package manager for HPC software including Fortran libraries == Machine learning and AI libraries == Athena Fiats (Functional Inference And Training for Surrogates) FNN (Fortran Neural Network) FortNN Fortran-TF-lib (Fortran interface to TensorFlow) FTorch (Fortran interface to PyTorch) MlFortran RoseNNa == Parallel and high-performance computing tools == MPI Fortran bindings — standard interface for distributed-memory parallelism OpenMP — shared-memory parallel programming support through compiler directives Coarray Fortran — parallel programming model introduced in Fortran 2008 ScaLAPACK — parallel linear algebra package built on top of LAPACK == Testing frameworks == FUnit — open-source unit testing framework developed at NASA’s Langley Research Center, for Fortran 90, 95, and 2003. pFUnit — unit testing framework for Fortran, modeled after JUnit == Documentation and code analysis tools == FORD — automatic documentation generator for modern Fortran projects SQuORE — software quality and management platform with code analysis support Understand — static analysis and code comprehension tool for large Fortran projects

    Read more →
  • JasPer

    JasPer

    JasPer is a computer software project to create a reference implementation of the codec specified in the JPEG-2000 Part-1 standard (i.e. ISO/IEC 15444-1) - started in 1997 at Image Power Inc. and at the University of British Columbia. It consists of a C library and some sample applications useful for testing the codec. The copyright owner began licensing the code to the public under an MIT License-style license in 2004 in response to requests from the open-source community. As of 2011 JasPer operated as a component of many software projects, both free and proprietary, including (but not limited to) netpbm (as of release 10.12), ImageMagick and KDE (as of version 3.2). As of 22 June 2010 the GEGL graphics library supported JasPer in its latest Git versions. In a series of objective JPEG-2000-compression quality tests conducted in 2004, "JasPer was the best codec, closely followed by IrfanView and Kakadu". However, Jasper remains one of the slowest implementations of the JPEG-2000 codec, as it was designed for reference, not performance. == Etymology == The name "JasPer" has simultaneous connotations with Canada's Jasper National Park, with the semi-precious gemstone, jasper, and with "JP" as an abbreviation of the JPEG-2000 standard.

    Read more →
  • Automated attendant

    Automated attendant

    In telephony, an automated attendant (also auto attendant, auto-attendant, autoattendant, automatic phone menus, AA, or virtual receptionist) allows callers to be automatically transferred to an extension without the intervention of an operator/receptionist. Many AAs will also offer a simple menu system ("for sales, press 1, for service, press 2," etc.). An auto attendant may also allow a caller to reach a live operator by dialing a number, usually "0". Typically the auto attendant is included in a business's phone system such as a PBX, but some services allow businesses to use an AA without such a system. Modern AA services (which now overlap with more complicated interactive voice response or IVR systems) can route calls to mobile phones, VoIP virtual phones, other AAs/IVRs, or other locations using traditional land-line phones or voice message machines. == Feature description == Telephone callers will recognize an automated attendant system as one that greets calls incoming to an organization with a recorded greeting of the form, "Thank you for calling .... If you know your party's extension, you may dial it any time during this message." Callers who have a touch-tone (DTMF) phone can dial an extension number or, in most cases, wait for operator ("attendant") assistance. Since the telephone network does not transmit the DC signals from rotary dial telephones (except for audible clicks), callers who have rotary dial phones have to wait for assistance. On a purely technical level it could be argued that an automated attendant is a very simple kind of IVR however, in the telecom industry the terms IVR and auto attendant are generally considered distinct. An automated attendant serves a very specific purpose (replace live operator and route calls), whereas an IVR can perform all sorts of functions (telephone banking, account inquiries, etc.). An AA will often include a directory which will allow a caller to dial by name in order to find a user on a system. There is no standard format to these directories, and they can use combinations of first name, last name, or both. The following lists common routing steps that are components of an automated attendant: Transfer to extension Transfer to voicemail Play message (i.e., "our address is ...") Go to a sub-menu Repeat choices In addition, an automated attendant would be expected to have values for the following: '0' – where to go when the caller dials '0' Timeout – what to do if the caller does nothing (usually go to the same place as '0') Default mailbox – where to send calls if '0' is not answered (or is not pointing to a live person) == Background == PBXs (private branch exchanges) or PABXs (private automatic branch exchanges) are telephone systems that serve an organization that has many telephone extensions but fewer telephone lines (sometimes called "trunks") that connect that organization to the rest of the global telecommunications network. While persons within an enterprise served by a PBX can call each other by dialing their extension numbers, incoming calls, i.e., calls originating from a telephone not served by the PBX but intended for a party served by the PBX, required assistance from a switchboard operator (also called a "switchboard attendant") or a telephone service called DID ("direct inward dialing"). Direct inward dialing has advantages such as rapid connection to the destination party and disadvantages including cost, lack of identification of the called organization and use of ten-digit telephone numbers. Automated attendants provide, among many other things, a way for an external caller to be directed to an extension or department served by a PBX system without using direct inward dialing or without switchboard attendant assistance. == History == Automated attendants are not part of voicemail systems. Voice messaging (or voicemail or VM) technology has existed since the late 1970s; in the early 1980s companies provided voice-prompting systems that allowed callers to reach (route the call) to an intended party, not necessarily to leave a message. Automated attendant systems are also referred to as automated menu systems and much early work in this field was done by Michael J. Freeman, Ph.D. == Time-based routing == Many auto attendants will have options to allow for time-of-day routing, as well as weekend and holiday routing. The specifics of these features will depend entirely on the particular automated attendant, but typically there would be a normal greeting and routing steps that would take place during normal business hours, and a different greeting and routing for non-business hours.

    Read more →
  • Dispo

    Dispo

    Dispo (formerly David's Disposable) is an American photo sharing and social networking app owned by Dispo, Inc. and co-founded by CEO Daniel Liss, YouTuber David Dobrik, and Natalie Mariduena. When the app initially launched on iOS in December 2019, it briefly charted as the most downloaded free app on the App Store, ahead of both Disney+ and Instagram. The app was rebranded and relaunched as Dispo, expanding from a simple camera app to a full social network in March 2021. It is based on the disposable camera. == History == On December 21, 2019, the app was first launched on the App Store under the name "David's Disposable." In its first week of release, it was downloaded more than a million times, reaching number one among free apps in the App Store. In June 2020, the team decided to rename the app to Dispo, purchasing the Dispo.fun domain on June 21, 2020. The company announced the change in September 2020. The early Dispo team consisted of Dobrik's longtime friend and business associate Natalie Mariduena as its treasurer, entrepreneur and venture capitalist Daniel Liss as chief executive officer, Regynald Augustin as first engineer, and Briana Hokanson as lead designer. In October 2020, the company raised a $4M seed round with backing from Alexis Ohanian's venture fund Seven Seven Six alongside other investors including Unshackled Ventures, Shrug Capital, and Weekend Fund. In February 2021, Axios reported that the app had generated US$20 million in its series A round, led by Spark Capital. At this time, the app was valued at US$200 million. A New York Times profile asked, "Are Disposables the Future of Photosharing?" In March 2021, the app was officially relaunched with new social network features and its invite-only feature was dropped. On March 21, 2021, it was announced that Spark Capital would sever all ties with Dispo in light of several disparaging allegations against David Dobrik and The Vlog Squad. The same day, it was announced that Dobrik would leave the company and step down from the company's board of directors. On March 22, 2021, Seven Seven Six and Unshackled Ventures announced they would be standing by the company and its remaining employees but donating profits to charity. In June, 2021, CEO Daniel Liss announced Dispo's official Series A. Investors and advisors in the new Dispo include Ohanian's Seven Seven Six, Unshackled, Endeavor, photographers Annie Leibovitz and Raven B. Varona, NBA stars Kevin Durant and Andre Iguodala (through their 35 Ventures and F9 Strategies venture firms, respectively). Other participants include Cara Delevingne, Sofia Vergara, Shade Room CEO Angelica Nwandu, Latin World Entertainment CEO Luis Balaguer, and Amplify Africa co-founders Damilare Kujembola and Timi Adeyeba. == Overview == Dispo has been compared to other image sharing and social networking services, most notably Instagram and VSCO, although users cannot immediately see the photos they have taken using the app. When a user attempts to take a photo, the interface mimics the developing process of a disposable camera. Users can take as many photos on the app as they want; they do not appear on the app however, until 9 am the next day. Once the set of photos appear on the app, users can choose to save them or share them with other users in a "roll". == Reception == Screen Rant has called the app "like Clubhouse [referring to the app] but for photos," comparing the early invite-only features of the apps. As it greatly restricts the user's editing options and sets out to offer a more authentic social networking experience, the app has been widely dubbed the "anti-Instagram". Between March 2021 and June 2021, the app reached the top ten in the App Store's photo/video rankings on 5 continents including in the US, Japan, Spain, Germany, Brazil, and Australia. It has been a notable success in Japan, where it opened its first international office in July 2021. In July 2021, NBA number one draft pick Cade Cunningham announced he had selected Dispo as his exclusive social media partner for the NBA draft.

    Read more →
  • The Master Algorithm

    The Master Algorithm

    The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World is a book by Pedro Domingos released in 2015. Domingos wrote the book in order to generate interest from people outside the field. == Overview == The book outlines five approaches of machine learning: inductive reasoning, connectionism, evolutionary computation, Bayes' theorem and analogical modelling. The author explains these tribes to the reader by referring to more understandable processes of logic, connections made in the brain, natural selection, probability and similarity judgments. Throughout the book, it is suggested that each different tribe has the potential to contribute to a unifying "master algorithm". Towards the end of the book the author pictures a "master algorithm" in the near future, where machine learning algorithms asymptotically grow to a perfect understanding of how the world and people in it work. Although the algorithm doesn't yet exist, he briefly reviews his own invention of the Markov logic network. == In the media == In 2016 Bill Gates recommended the book, alongside Nick Bostrom's Superintelligence, as one of two books everyone should read to understand AI. In 2018 the book was noted to be on Chinese Communist Party general secretary Xi Jinping's bookshelf. === Reception === A computer science educator stated in Times Higher Education that the examples are clear and accessible. In contrast, The Economist agreed Domingos "does a good job" but complained that he "constantly invents metaphors that grate or confuse". Kirkus Reviews praised the book, stating that "Readers unfamiliar with logic and computer theory will have a difficult time, but those who persist will discover fascinating insights." A New Scientist review called it "compelling but rather unquestioning".

    Read more →
  • Final Cut Express

    Final Cut Express

    Final Cut Express was a video editing software suite created by Apple Inc. It was the consumer version of Final Cut Pro and was designed for advanced editing of digital video as well as high-definition video, which was used by many amateur and professional videographers. Final Cut Express was considered a step above iMovie in terms of capabilities, but a step underneath Final Cut Pro and its suite of applications. As of June 21, 2011, Final Cut Express was discontinued in favor of Final Cut Pro X. == History == Final Cut Express 1.0, based on Final Cut Pro 3, was released at Macworld Conference and Expo in San Francisco in 2003. The second version, based on Final Cut Pro 4, was released at Macworld San Francisco in 2004. The third version, capable of editing high definition video, was also announced at Macworld San Francisco a year later, and was released as Final Cut Express HD in February 2005. It was based on Final Cut Pro HD (version 4.5) and included LiveType 1.2 and Soundtrack 1.2. Final Cut Express version 3.5 was released with little fanfare in May 2006 as a Universal Binary. In addition to improving real-time rendering with Dynamic RT, version 3.5 upgraded LiveType to version 2.0 and Soundtrack to version 1.5. In November 2007, Apple released Final Cut Express 4, which although it did not support real-time editing in the AVCHD format (it only allowed for transcoding AVCHD to Apple Intermediate Codec (AIC) provided that the camera was actually attached to the computer - it did not convert AVCHD files stored elsewhere and is currently for Intel processors only), imported iMovie '08 projects and included 50 new filters. It did not include Soundtrack 1.5, but it still included LiveType which enables users to create advanced text for the movies they created in Final Cut. The price was dropped from $299 for version 3.5 to $199 for version 4.0. In June 2011, Final Cut Express was officially discontinued, in favor of Final Cut Pro X. == Features == Final Cut Express' interface was identical to that of Final Cut Pro, but lacks some film-specific features, including Cinema Tools, multi-cam editing, batch capture, and a time code view. The program performed 32 undo operations, while Final Cut Pro did 99 [2]. Features the program did include were: The ability to keyframe filters Dynamic RT, which changes real-time settings on-the-fly Motion path keyframing Opacity keyframing Ripple, roll, slip, slide and blade edits Picture-in-picture and split-screen effects Up to 99 video tracks and 12 compositing modes Up to 99 audio tracks Motion project import Two-way color correction. Chroma key One feature of Final Cut Express that was not available in Final Cut Pro is the ability to import iMovie '08 projects (though transitions are not preserved). === RT Extreme === Inherited from Final Cut Pro, Final Cut Express features RT Extreme, which allows previews of some video filters and transitions without rendering. Audio that is not in the native AIFF file format needs rendering before it can be played back. RT Extreme has three modes: 'Safe', for seeing multiple video layers at a quality that more or less guarantees a smooth playback; 'Unlimited', which allows the maximum number of composited video layers to be viewed at the same time; and 'Dynamic', which alternates between these settings depending on how many simultaneous video tracks are present. Frame dropping may result from using 'Unlimited' on low-resource machines. === Boris Calligraphy === Like Final Cut Pro, Express also comes with Boris Calligraphy, a plugin for advanced titling and scrolling/crawling titles more sophisticated than the ones that can be created with the built-in title overlays. Calligraphy has a WYSIWYG interface and features wrapping, alignment, leading, kerning and tracking features, as well as allowing up to five custom outlines and five custom drop shadows to be defined for a selected portion of the title. == Soundtrack == Prior to version 4, Final Cut Express included Soundtrack 1.5, a music program similar to the consumer-level GarageBand, but designed for videographers who wish to add music to their films. Soundtrack comes with around 4,000 professionally recorded instrument loops and sound effects that can be arranged in multiple tracks beneath the video track. To use Soundtrack, users export their Final Cut Express sequence, or a marked portion thereof, as a reference file, which can include scoring markers defined in the timeline. This reference file can be imported as the video track in Soundtrack. Soundtrack is functionally and visually identical to Soundtrack Pro's multitrack editing mode, but includes fewer Logic plugins and lacks the highly regarded noise removal tool. Soundtrack was removed from Final Cut Express 4, which lowered its price and may have encouraged people to buy Logic Express.

    Read more →
  • Structural similarity index measure

    Structural similarity index measure

    The structural similarity index measure (SSIM) is a method for predicting the perceived quality of digital television and cinematic pictures, as well as other kinds of digital images and videos. It is also used for measuring the similarity between two images. The SSIM index is a full reference metric; in other words, the measurement or prediction of image quality is based on an initial uncompressed or distortion-free image as reference. SSIM is a perception-based model that considers image degradation as perceived change in structural information, while also incorporating important perceptual phenomena, including both luminance masking and contrast masking terms. This distinguishes from other techniques such as mean squared error (MSE) or peak signal-to-noise ratio (PSNR) that instead estimate absolute errors. Structural information is the idea that the pixels have strong inter-dependencies especially when they are spatially close. These dependencies carry important information about the structure of the objects in the visual scene. Luminance masking is a phenomenon whereby image distortions (in this context) tend to be less visible in bright regions, while contrast masking is a phenomenon whereby distortions become less visible where there is significant activity or "texture" in the image. == History == The predecessor of SSIM was called Universal Quality Index (UQI), or Wang–Bovik index, which was developed by Zhou Wang and Alan Bovik in 2001. This evolved, through their collaboration with Hamid Sheikh and Eero Simoncelli, into the current version of SSIM, which was published in April 2004 in the IEEE Transactions on Image Processing. In addition to defining the SSIM quality index, the paper provides a general context for developing and evaluating perceptual quality measures, including connections to human visual neurobiology and perception, and direct validation of the index against human subject ratings. The basic model was developed in the Laboratory for Image and Video Engineering (LIVE) at The University of Texas at Austin and further developed jointly with the Laboratory for Computational Vision (LCV) at New York University. Further variants of the model have been developed in the Image and Visual Computing Laboratory at University of Waterloo and have been commercially marketed. SSIM subsequently found strong adoption in the image processing community and in the television and social media industries. The 2004 SSIM paper has been cited over 50,000 times according to Google Scholar, making it one of the highest cited papers in the image processing and video engineering fields. It was recognized with the IEEE Signal Processing Society Best Paper Award for 2009. It also received the IEEE Signal Processing Society Sustained Impact Award for 2016, indicative of a paper having an unusually high impact for at least 10 years following its publication. Because of its high adoption by the television industry, the authors of the original SSIM paper were each accorded a Primetime Engineering Emmy Award in 2015 by the Television Academy. == Algorithm == The SSIM index is calculated between two windows of pixel values x {\displaystyle x} and y {\displaystyle y} of common size, from corresponding locations in two images to be compared. These SSIM values can be aggregated across the full images by averaging or other variations. === Special-case formula === In one simple special case, further explained in the next section, the SSIM measure between x {\displaystyle x} and y {\displaystyle y} is: SSIM ( x , y ) = ( 2 μ x μ y + c 1 ) ( 2 σ x y + c 2 ) ( μ x 2 + μ y 2 + c 1 ) ( σ x 2 + σ y 2 + c 2 ) {\displaystyle {\hbox{SSIM}}(x,y)={\frac {(2\mu _{x}\mu _{y}+c_{1})(2\sigma _{xy}+c_{2})}{(\mu _{x}^{2}+\mu _{y}^{2}+c_{1})(\sigma _{x}^{2}+\sigma _{y}^{2}+c_{2})}}} with: μ x {\displaystyle \mu _{x}} the pixel sample mean of x {\displaystyle x} ; μ y {\displaystyle \mu _{y}} the pixel sample mean of y {\displaystyle y} ; σ x 2 {\displaystyle \sigma _{x}^{2}} the sample variance of x {\displaystyle x} ; σ y 2 {\displaystyle \sigma _{y}^{2}} the sample variance of y {\displaystyle y} ; σ x y {\displaystyle \sigma _{xy}} the sample covariance of x {\displaystyle x} and y {\displaystyle y} ; c 1 = ( k 1 L ) 2 {\displaystyle c_{1}=(k_{1}L)^{2}} , c 2 = ( k 2 L ) 2 {\displaystyle c_{2}=(k_{2}L)^{2}} two variables to stabilize the division with weak denominator; L {\displaystyle L} the dynamic range of the pixel-values (typically this is 2 # b i t s p e r p i x e l − 1 {\displaystyle 2^{\#bits\ per\ pixel}-1} ); k 1 = 0.01 {\displaystyle k_{1}=0.01} and k 2 = 0.03 {\displaystyle k_{2}=0.03} by default. === General formula and components === The SSIM formula is based on three comparison measurements between the samples of x {\displaystyle x} and y {\displaystyle y} : luminance ( l {\displaystyle l} ), contrast ( c {\displaystyle c} ), and structure ( s {\displaystyle s} ). The individual comparison functions are: l ( x , y ) = 2 μ x μ y + c 1 μ x 2 + μ y 2 + c 1 {\displaystyle l(x,y)={\frac {2\mu _{x}\mu _{y}+c_{1}}{\mu _{x}^{2}+\mu _{y}^{2}+c_{1}}}} c ( x , y ) = 2 σ x σ y + c 2 σ x 2 + σ y 2 + c 2 {\displaystyle c(x,y)={\frac {2\sigma _{x}\sigma _{y}+c_{2}}{\sigma _{x}^{2}+\sigma _{y}^{2}+c_{2}}}} s ( x , y ) = σ x y + c 3 σ x σ y + c 3 {\displaystyle s(x,y)={\frac {\sigma _{xy}+c_{3}}{\sigma _{x}\sigma _{y}+c_{3}}}} The SSIM for each block is then a weighted combination of those comparative measures: SSIM ( x , y ) = l ( x , y ) α ⋅ c ( x , y ) β ⋅ s ( x , y ) γ {\displaystyle {\text{SSIM}}(x,y)=l(x,y)^{\alpha }\cdot c(x,y)^{\beta }\cdot s(x,y)^{\gamma }} Choosing the third denominator stabilizing constant as: c 3 = c 2 / 2 {\displaystyle c_{3}=c_{2}/2} leads to a simplification when combining the c and s components with equal exponents ( β = γ {\displaystyle \beta =\gamma } ), as the numerator of c is then twice the denominator of s, leading to a cancellation leaving just a 2. Setting the weights (exponents) α , β , γ {\displaystyle \alpha ,\beta ,\gamma } to 1, the formula can then be reduced to the special case shown above. === Mathematical properties === SSIM satisfies the identity of indiscernibles, and symmetry properties, but not the triangle inequality or non-negativity, and thus is not a distance function. However, under certain conditions, SSIM may be converted to a normalized root MSE measure, which is a distance function. The square of such a function is not convex, but is locally convex and quasiconvex, making SSIM a feasible target for optimization. === Application of the formula === In order to evaluate the image quality, this formula is usually applied only on luma, although it may also be applied on color (e.g., RGB) values or chromatic (e.g. YCbCr) values. The resultant SSIM index is a decimal value between -1 and 1, where 1 indicates perfect similarity, 0 indicates no similarity, and -1 indicates perfect anti-correlation. For an image, it is typically calculated using a sliding Gaussian window of size 11×11 or a block window of size 8×8. The window can be displaced pixel-by-pixel on the image to create an SSIM quality map of the image. In the case of video quality assessment, the authors propose to use only a subgroup of the possible windows to reduce the complexity of the calculation. === Variants === ==== Multi-scale SSIM ==== A more advanced form of SSIM, called Multiscale SSIM (MS-SSIM) is conducted over multiple scales through a process of multiple stages of sub-sampling, reminiscent of multiscale processing in the early vision system. It has been shown to perform equally well or better than SSIM on different subjective image and video databases. ==== Multi-component SSIM ==== Three-component SSIM (3-SSIM) is a form of SSIM that takes into account the fact that the human eye can see differences more precisely on textured or edge regions than on smooth regions. The resulting metric is calculated as a weighted average of SSIM for three categories of regions: edges, textures, and smooth regions. The proposed weighting is 0.5 for edges, 0.25 for the textured and smooth regions. The authors mention that a 1/0/0 weighting (ignoring anything but edge distortions) leads to results that are closer to subjective ratings. This suggests that edge regions play a dominant role in image quality perception. The authors of 3-SSIM have also extended the model into four-component SSIM (4-SSIM). The edge types are further subdivided into preserved and changed edges by their distortion status. The proposed weighting is 0.25 for all four components. ==== Structural dissimilarity ==== Structural dissimilarity (DSSIM) may be derived from SSIM, though it does not constitute a distance function as the triangle inequality is not necessarily satisfied. DSSIM ( x , y ) = 1 − SSIM ( x , y ) 2 {\displaystyle {\hbox{DSSIM}}(x,y)={\frac {1-{\hbox{SSIM}}(x,y)}{2}}} ==== Video quality metrics and temporal variants ==== It is worth noting that the original vers

    Read more →
  • Video browsing

    Video browsing

    Video browsing, also known as exploratory video search, is the interactive process of skimming through video content in order to satisfy some information need or to interactively check if the video content is relevant. While originally proposed to help users inspecting a single video through visual thumbnails, modern video browsing tools enable users to quickly find desired information in a video archive by iterative human–computer interaction through an exploratory search approach. Many of these tools presume a smart user that wants features to interactively inspect video content, as well as automatic content filtering features. For that purpose, several video interaction features are usually provided, such as sophisticated navigation in video or search by a content-based query. Video browsing tools often build on lower-level video content analysis, such as shot transition detection, keyframe extraction, semantic concept detection, and create a structured content overview of the video file or video archive. Furthermore, they usually provide sophisticated navigation features, such as advanced timelines, visual seeker bars or a list of selected thumbnails, as well as means for content querying. Examples of content queries are shot filtering through visual concepts (e.g., only shots showing cars), through some specific characteristics (e.g., color or motion filtering), through user-provided sketches (e.g., a visually drawn sketch), or through content-based similarity search. == History == Video browsing was originally proposed by Iranian engineer Farshid Arman, Taiwanese computer scientist Arding Hsu, and computer scientist Ming-Yee Chiu, while working at Siemens, and it was presented at the ACM International Conference in August 1993. They described a shot detection algorithm for compressed video that was originally encoded with discrete cosine transform (DCT) video coding standards such as JPEG, MPEG and H.26x. The basic idea was that, since the DCT coefficients are mathematically related to the spatial domain and represent the content of each frame, they can be used to detect the differences between video frames. In the algorithm, a subset of blocks in a frame and a subset of DCT coefficients for each block are used as motion vector representation for the frame. By operating on compressed DCT representations, the algorithm significantly reduces the computational requirements for decompression and enables effective video browsing. The algorithm represents separate shots of a video sequence by an r-frame, a thumbnail of the shot framed by a motion tracking region. A variation of this concept was later adopted for QBIC video content mosaics, where each r-frame is a salient still from the shot it represents. === Video Notebook === Modern video browsing solutions include Video Notebook, a Menlo Park startup founded in 2021 by Mike Lanza, which uses computer vision to extract slides and optical character recognition and speech recognition to facilitate video search. The software can be either used on the client side (using a browser extension), where the slides and text are extracted while the video is watched (e.g. on a video platform like YouTube or Udemy), or on the server side. Processed videos, which can be viewed in the Video Notebook web app, feature a video browsing user interface with extracted timestamped slides, a search bar for querying the video (or a collection of videos), and text chapters. Video Notebook customers include organisations like Ernst & Young. === Video Browser Showdown === The Video Browser Showdown (VBS) is an annual live evaluation competition for exploratory video search tools, where international researchers use video browsing tools to solve ad-hoc video search tasks on a moderately large data set as fast as possible. The main goal of the VBS, which started in 2012 at the International Conference on MultiMedia Modeling (MMM), is to advance the performance of video browsing tools. Since 2016, the VBS also collaborates with TRECVID. The aim of the VBS is to evaluate video browsing tools for efficiency at known-item search (KIS) tasks with a well-defined data set in direct comparison to other tools.

    Read more →
  • Conditional random field

    Conditional random field

    Conditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without considering "neighbouring" samples, a CRF can take context into account. To do so, the predictions are modelled as a graphical model, which represents the presence of dependencies between the predictions. The kind of graph used depends on the application. For example, in natural language processing, "linear chain" CRFs are popular, for which each prediction is dependent only on its immediate neighbours. In image processing, the graph typically connects locations to nearby and/or similar locations to enforce that they receive similar predictions. Other examples where CRFs are used are: labeling or parsing of sequential data for natural language processing or biological sequences, part-of-speech tagging, shallow parsing, named entity recognition, gene finding, peptide critical functional region finding, and object recognition and image segmentation in computer vision. == Description == CRFs are a type of discriminative undirected probabilistic graphical model. Lafferty, McCallum and Pereira define a CRF on observations X {\displaystyle {\boldsymbol {X}}} and random variables Y {\displaystyle {\boldsymbol {Y}}} as follows: Let G = ( V , E ) {\displaystyle G=(V,E)} be a graph such that Y = ( Y v ) v ∈ V {\displaystyle {\boldsymbol {Y}}=({\boldsymbol {Y}}_{v})_{v\in V}} , so that Y {\displaystyle {\boldsymbol {Y}}} is indexed by the vertices of G {\displaystyle G} . Then ( X , Y ) {\displaystyle ({\boldsymbol {X}},{\boldsymbol {Y}})} is a conditional random field when each random variable Y v {\displaystyle {\boldsymbol {Y}}_{v}} , conditioned on X {\displaystyle {\boldsymbol {X}}} , obeys the Markov property with respect to the graph; that is, its probability is dependent only on its neighbours in G and not its past states: P ( Y v | X , { Y w : w ≠ v } ) = P ( Y v | X , { Y w : w ∼ v } ) {\displaystyle P({\boldsymbol {Y}}_{v}|{\boldsymbol {X}},\{{\boldsymbol {Y}}_{w}:w\neq v\})=P({\boldsymbol {Y}}_{v}|{\boldsymbol {X}},\{{\boldsymbol {Y}}_{w}:w\sim v\})} , where w ∼ v {\displaystyle {\mathit {w}}\sim v} means that w {\displaystyle w} and v {\displaystyle v} are neighbors in G {\displaystyle G} . What this means is that a CRF is an undirected graphical model whose nodes can be divided into exactly two disjoint sets X {\displaystyle {\boldsymbol {X}}} and Y {\displaystyle {\boldsymbol {Y}}} , the observed and output variables, respectively; the conditional distribution p ( Y | X ) {\displaystyle p({\boldsymbol {Y}}|{\boldsymbol {X}})} is then modeled. === Inference === For general graphs, the problem of exact inference in CRFs is intractable. The inference problem for a CRF is basically the same as for an MRF and the same arguments hold. However, there exist special cases for which exact inference is feasible: If the graph is a chain or a tree, message passing algorithms yield exact solutions. The algorithms used in these cases are analogous to the forward-backward and Viterbi algorithm for the case of HMMs. If the CRF only contains pair-wise potentials and the energy is submodular, combinatorial min cut/max flow algorithms yield exact solutions. If exact inference is impossible, several algorithms can be used to obtain approximate solutions. These include: Loopy belief propagation Alpha expansion Mean field inference Linear programming relaxations === Parameter learning === Learning the parameters θ {\displaystyle \theta } is usually done by maximum likelihood learning for p ( Y i | X i ; θ ) {\displaystyle p(Y_{i}|X_{i};\theta )} . If all nodes have exponential family distributions and all nodes are observed during training, this optimization is convex. It can be solved for example using gradient descent algorithms, or Quasi-Newton methods such as the L-BFGS algorithm. On the other hand, if some variables are unobserved, the inference problem has to be solved for these variables. Exact inference is intractable in general graphs, so approximations have to be used. === Examples === In sequence modeling, the graph of interest is usually a chain graph. An input sequence of observed variables X {\displaystyle X} represents a sequence of observations and Y {\displaystyle Y} represents a hidden (or unknown) state variable that needs to be inferred given the observations. The Y i {\displaystyle Y_{i}} are structured to form a chain, with an edge between each Y i − 1 {\displaystyle Y_{i-1}} and Y i {\displaystyle Y_{i}} . As well as having a simple interpretation of the Y i {\displaystyle Y_{i}} as "labels" for each element in the input sequence, this layout admits efficient algorithms for: model training, learning the conditional distributions between the Y i {\displaystyle Y_{i}} and feature functions from some corpus of training data. decoding, determining the probability of a given label sequence Y {\displaystyle Y} given X {\displaystyle X} . inference, determining the most likely label sequence Y {\displaystyle Y} given X {\displaystyle X} . The conditional dependency of each Y i {\displaystyle Y_{i}} on X {\displaystyle X} is defined through a fixed set of feature functions of the form f ( i , Y i − 1 , Y i , X ) {\displaystyle f(i,Y_{i-1},Y_{i},X)} , which can be thought of as measurements on the input sequence that partially determine the likelihood of each possible value for Y i {\displaystyle Y_{i}} . The model assigns each feature a numerical weight and combines them to determine the probability of a certain value for Y i {\displaystyle Y_{i}} . Linear-chain CRFs have many of the same applications as conceptually simpler hidden Markov models (HMMs), but relax certain assumptions about the input and output sequence distributions. An HMM can loosely be understood as a CRF with very specific feature functions that use constant probabilities to model state transitions and emissions. Conversely, a CRF can loosely be understood as a generalization of an HMM that makes the constant transition probabilities into arbitrary functions that vary across the positions in the sequence of hidden states, depending on the input sequence. Notably, in contrast to HMMs, CRFs can contain any number of feature functions, the feature functions can inspect the entire input sequence X {\displaystyle X} at any point during inference, and the range of the feature functions need not have a probabilistic interpretation. == Variants == === Higher-order CRFs and semi-Markov CRFs === CRFs can be extended into higher order models by making each Y i {\displaystyle Y_{i}} dependent on a fixed number k {\displaystyle k} of previous variables Y i − k , . . . , Y i − 1 {\displaystyle Y_{i-k},...,Y_{i-1}} . In conventional formulations of higher order CRFs, training and inference are only practical for small values of k {\displaystyle k} (such as k ≤ 5), since their computational cost increases exponentially with k {\displaystyle k} . However, another recent advance has managed to ameliorate these issues by leveraging concepts and tools from the field of Bayesian nonparametrics. Specifically, the CRF-infinity approach constitutes a CRF-type model that is capable of learning infinitely-long temporal dynamics in a scalable fashion. This is effected by introducing a novel potential function for CRFs that is based on the Sequence Memoizer (SM), a nonparametric Bayesian model for learning infinitely-long dynamics in sequential observations. To render such a model computationally tractable, CRF-infinity employs a mean-field approximation of the postulated novel potential functions (which are driven by an SM). This allows for devising efficient approximate training and inference algorithms for the model, without undermining its capability to capture and model temporal dependencies of arbitrary length. There exists another generalization of CRFs, the semi-Markov conditional random field (semi-CRF), which models variable-length segmentations of the label sequence Y {\displaystyle Y} . This provides much of the power of higher-order CRFs to model long-range dependencies of the Y i {\displaystyle Y_{i}} , at a reasonable computational cost. Finally, large-margin models for structured prediction, such as the structured Support Vector Machine can be seen as an alternative training procedure to CRFs. === Latent-dynamic conditional random field === Latent-dynamic conditional random fields (LDCRF) or discriminative probabilistic latent variable models (DPLVM) are a type of CRFs for sequence tagging tasks. They are latent variable models that are trained discriminatively. In an LDCRF, like in any sequence tagging task, given a sequence of observations x = x 1 , … , x n {\displaystyle x_{1},\dots ,x_{n}} , the main problem the model must solve is how to assign a sequence of labels y = y 1 , … , y n {\displaystyle y_{1},\dots ,y_{n}} from one finite set

    Read more →
  • The Triple Revolution

    The Triple Revolution

    "The Triple Revolution" was an open memorandum sent to U.S. President Lyndon B. Johnson and other government figures on March 22, 1964. It concerned three megatrends of the time: increasing use of automation, the nuclear arms race, and advancements in human rights. Drafted under the auspices of the Center for the Study of Democratic Institutions, it was signed by an array of noted social activists, professors, and technologists who identified themselves as the Ad Hoc Committee on the Triple Revolution. The chief initiator of the proposal was W. H. "Ping" Ferry, at that time a vice-president of CSDI, basing it in large part on the ideas of the futurist Robert Theobald. == Overview == The statement identified three revolutions underway in the world: the cybernation revolution of increasing automation; the weaponry revolution of mutually assured destruction; and the human rights revolution. It discussed primarily the cybernation revolution. The committee claimed that machines would usher in "a system of almost unlimited productive capacity" while continually reducing the number of manual laborers needed, and increasing the skill needed to work, thereby producing increasing levels of unemployment. It proposed that the government should ease this transformation through large-scale public works, low-cost housing, public transit, electrical power development, income redistribution, union representation for the unemployed, and government restraint on technology deployment. == Legacy == Martin Luther King Jr.'s final Sunday sermon, delivered six days before his April 1968 assassination, explicitly references the thesis of "The Triple Revolution": There can be no gainsaying of the fact that a great revolution is taking place in the world today. In a sense it is a triple revolution: that is, a technological revolution, with the impact of automation and cybernation; then there is a revolution in weaponry, with the emergence of atomic and nuclear weapons of warfare; then there is a human rights revolution, with the freedom explosion that is taking place all over the world. Yes, we do live in a period where changes are taking place. And there is still the voice crying through the vista of time saying, "Behold, I make all things new; former things are passed away." In Harlan Ellison's 1967 anthology Dangerous Visions, Philip José Farmer's story "Riders of the Purple Wage" uses the Triple Revolution document as the premise of a future society, in which the "purple wage" of the title is a guaranteed income dole on which most of the population lives. At the 1968 World Science Fiction Convention in San Francisco, Farmer delivered a lengthy Guest of Honor speech in which he called for the founding of a grassroots activist organization called REAP which would work for implementation of the Ad Hoc Committee's recommendations. Looking back on the proposal in his 2008 book, Daniel Bell wrote: "the cybernetic revolution quickly proved to be illusory. There were no spectacular jumps in productivity. ... Cybernation had proved to be one more instance of the penchant for overdramatizing a momentary innovation and blowing it up far out of proportion to its actuality. ... The image of a completely automated production economy—with an endless capacity to turn out goods—was simply a social-science fiction of the early 1960s. Paradoxically, the vision of Utopia was suddenly replaced by the spectre of Doomsday. In place of the early-sixties theme of endless plenty, the picture by the end of the decade was one of a fragile planet of limited resources whose finite stocks were being rapidly depleted, and whose wastes from soaring industrial production were polluting the air and waters." In his 2015 book Rise of the Robots, Martin Ford claims The Triple Revolution's predictions of steady decline in future employment were not wrong, but rather premature. He cites "Seven Deadly Trends" that began in the 1970s-1980s and by the mid-2010s appeared set to continue: Stagnation in real wages Decline in labor's share of national income in many countries (breakdown of Bowley's law), while corporate profits increased Declining labor force participation Diminishing job creation, lengthening jobless recoveries, and soaring long-term unemployment Rising inequality Declining incomes, and underemployment for recent college graduates Polarization and part-time jobs (middle-class jobs are disappearing, to be replaced by a small number of high-paying jobs and large number of low-paying jobs) According to Ford, the 1960s were part of what in retrospect seems like a golden age for labor in the United States, when productivity and wages rose together in near lockstep, and unemployment was low. But after about 1980, wages began stagnating while productivity continued to rise. Labor's share of the economic output began to decline. Ford describes the role that automation and information technology play in these trends, and how new technologies including narrow AI threaten to destroy jobs faster than displaced workers can be retrained for new jobs, before automation takes the new jobs as well. This includes many job categories, such as in transportation, that were never threatened by automation before. According to a 2013 study, about 47% of US jobs are susceptible to automation. == Signatories ==

    Read more →
  • NER model

    NER model

    NER is one of several formulas for accessing live subtitles in television broadcasts and events that are produced using speech recognition. The three letters stand for number, edit error and recognition error. It has been promoted as an alternative to Word error rate (Word Error Rate) which is a more objective measure. The overall score is calculated as follows: Firstly, the number of edit and recognition errors is deducted from the total number of words in the live subtitles. This number is then divided by the total number of words in the live subtitles and finally multiplied by one hundred. N E R v a l u e = N − E − R N ∗ 100 {\displaystyle NERvalue={\frac {N-E-R}{N}}100} . The acronyms stand for the following: N (number) = total number of words in the live subtitles E (Edit error) = edit error R (Recognition error) = recognition error This measurement process has been used for public television broadcasts in European countries like Italy and Switzerland. One major drawback with NER is that it requires a human assessor to rate errors as either: 1 Minor edition or recognition errors 2 Normal edition or recognition errors 3 Serious errors which are then weighted in the assessment process. This is both subjective, time consuming and costly. Also, NER fails to account for words left out subtitles which is something that does not take account of the D/deaf audience who want verbatim subtitles. As a result, NER cannot accurately reflect the audience's experience of subtitles. Another problem is the inconsistency of human evaluation of subtitles, particularly with live subtitles, where there are differing opinions of the importance of subtitle errors. By way of contrast, Word error rate is an objective measure of subtitle errors, since it measures the textual discrepancy between the subtitles and the speech.

    Read more →
  • Fyuse

    Fyuse

    Fyuse is a spatial photography app which lets users capture and share interactive 3D images. By tilting or swiping one's smartphone, one can view such "fyuses" from various angles — as if one were walking around an object or subject. The app blends photography and video to create an interactive medium and was first published for iOS in April 2014. The Android version was released at the end of 2014. == The app == Fyuse lets users capture panoramas, selfies, and full 360° views of objects and allows one to view captured moments from different angles. It has its own personal gallery, social network and standalone web integration. With the app, Fyusion also created a social networking platform similar to Instagram. Fyuses can be shared, commented on, liked and re-shared to one's followers (called Echoes). One can build a network of followers and with engagement tracking, one can see how many times an image has been interacted with The images can also be saved for private, offline view, or shared to other social networks, like Facebook or Twitter, or embedded on a website where the images can be interacted with by desktop users via dragging the mouse. Furthermore, in the compass tab other fyuses can be discovered using the app's system of tags and categories. One's Fyuse feed is prepopulated with top users, and one can follow people to see when they post a new fyuse. The app will also find one's friends if one signs up with Facebook or connects it with one's Twitter account. To create a fyuse one moves around a person or object with one's phone's camera in one direction or moving/tilting one's phone around while holding one's finger on the screen. By combining photography and video the app allows one to capture moments that one may not have otherwise been able to capture by recording not one moment in time but stitched together little moments. According to Fyusion CEO Radu Rusu, a photo freezes a moment in time, while a video captures moments in a linear timeline — both still flat, when viewed. A fyuse image captures a moment in space, where one can not only see one side of something, but also around it. When it is done rendering, fyuses can also be edited – one can trim the fyuse for length and edit the brightness, contrast, exposure, saturation and sharpness. One can also add a vignette and apply a filters, with options to adjust their intensity. After editing, one can write a description, add hashtags, and tag parts of the fyuse before one can (voluntarily) publish and share it. Version 1.0 has been described as "alpha prototype" and version 2.0 was released on 17 December 2014. Version 3.0 introduced 3D tagging by which users can layer 3D graphic that animate accordingly with each interaction to add some context to the content. Version 4.0 was released on December 21, 2016 for iOS. Since January 2016 (v3.2) the app allows the export of fyuses as Live Photos. The app has also been described as a more sophisticated version of 3D stickers and flip images. == Applications == The app has many applications for e-commerce such as for fashion designers who want to showcase a garment from every angle, or real estate listings and Airbnb-type sites that want to make their rental properties seem as enticing as possible. The app can also be used for interactive art, 360° panoramas and selfies. == History == San Francisco-based Fyusion Inc.'s three founders — Radu B. Rusu, CTO Stefan Holzer, and VP of Engineering Stephen Miller — worked together at Willow Garage, the robotics research lab started by early Google employee Scott Hassan in the area of "personal robotics" — Hassan decided to turn the lab into more of an incubator, suggesting that the members spin off their technologies into consumer-facing enterprises. Rusu first set out with an open-source 3D perception software startup called Open Perception. Fyusion was officially founded in 2013, and soon after Rusu and his cofounders patented the technology for spatial photography. The company closed a seed funding round at the end of May, raising $3.35 million from investors, including an angel investment from Sun Microsystems cofounder Andreas Bechtolsheim. In 2014 the Fyuse team consisted of 13 employees, mostly engineers and designers, recruited from around the globe. In March 2015 the team displayed their app at Katy Perry's premiere for the movie "Prismatic World Tour on Epix" where Perry also took Fyuse for a test run. == Augmented reality == In September 2016 Fyusion unveiled its platform for creating augmented reality content using ones smartphone. It takes the images from ones smartphone and converts them into 3D holographic images, which one can then view on an AR headset. According to Rusu "by making it easy for people to capture their surroundings on any mobile device, [Fyusion is] revolutionizing the way that people view the world around them" and also states that for "AR to be successful, anyone should be able to create content for it" opposed to the current "small number of content creators and an even smaller number of hardware players". According to him "the applications of [Fyusion's] technology for consumers and businesses are incredibly limitless". The platform uses the company's patented 3D spatio-temporal platform that uses advanced sensor fusion, machine learning and computer vision algorithms and part of the platform is built into the Fyuse app. Before committing to releasing a separate consumer product the company intends to wait until the HoloLens device becomes available to the public. Until then any Fyuse representation created using Fyuse is AR ready and will be able to be shown in HoloLens in the future. == Fyuse - Point of No Return == Fyuse - Point of No Return is a science fiction short advert for Fyuse 3.0 in which Fyuse's digital medium is extrapolated into the future. In the film a woman uses a mini scanning-drone to 3D scan a tree with Fyuse and later recreate it as an augmented reality object at another place.

    Read more →
  • Decision tree pruning

    Decision tree pruning

    Pruning is a data compression technique in machine learning and search algorithms that reduces the size of decision trees by removing sections of the tree that are non-critical and redundant to classify instances. Pruning reduces the complexity of the final classifier, and hence improves predictive accuracy by the reduction of overfitting. One of the questions that arises in a decision tree algorithm is the optimal size of the final tree. A tree that is too large risks overfitting the training data and poorly generalizing to new samples. A small tree might not capture important structural information about the sample space. However, it is hard to tell when a tree algorithm should stop because it is impossible to tell if the addition of a single extra node will dramatically decrease error. This problem is known as the horizon effect. A common strategy is to grow the tree until each node contains a small number of instances then use pruning to remove nodes that do not provide additional information. Pruning should reduce the size of a learning tree without reducing predictive accuracy as measured by a cross-validation set. There are many techniques for tree pruning that differ in the measurement that is used to optimize performance. == Techniques == Pruning processes can be divided into two types (pre- and post-pruning). Pre-pruning procedures prevent a complete induction of the training set by replacing a stop () criterion in the induction algorithm (e.g. max. Tree depth or information gain (Attr)> minGain). Pre-pruning methods are considered to be more efficient because they do not induce an entire set, but rather trees remain small from the start. Prepruning methods share a common problem, the horizon effect. This is to be understood as the undesired premature termination of the induction by the stop () criterion. Post-pruning (or just pruning) is the most common way of simplifying trees. Here, nodes and subtrees are replaced with leaves to reduce complexity. Pruning can not only significantly reduce the size but also improve the classification accuracy of unseen objects. It may be the case that the accuracy of the assignment on the train set deteriorates, but the accuracy of the classification properties of the tree increases overall. The procedures are differentiated on the basis of their approach in the tree (top-down or bottom-up). === Bottom-up pruning === These procedures start at the last node in the tree (the lowest point). Following recursively upwards, they determine the relevance of each individual node. If the relevance for the classification is not given, the node is dropped or replaced by a leaf. The advantage is that no relevant sub-trees can be lost with this method. These methods include Reduced Error Pruning (REP), Minimum Cost Complexity Pruning (MCCP), or Minimum Error Pruning (MEP). === Top-down pruning === In contrast to the bottom-up method, this method starts at the root of the tree. Following the structure below, a relevance check is carried out which decides whether a node is relevant for the classification of all n items or not. By pruning the tree at an inner node, it can happen that an entire sub-tree (regardless of its relevance) is dropped. One of these representatives is pessimistic error pruning (PEP), which brings quite good results with unseen items. == Pruning algorithms == === Reduced error pruning === One of the simplest forms of pruning is reduced error pruning. Starting at the leaves, each node is replaced with its most popular class. If the prediction accuracy is not affected then the change is kept. While somewhat naive, reduced error pruning has the advantage of simplicity and speed. === Cost complexity pruning === Cost complexity pruning generates a series of trees ⁠ T 0 … T m {\displaystyle T_{0}\dots T_{m}} ⁠ where ⁠ T 0 {\displaystyle T_{0}} ⁠ is the initial tree and ⁠ T m {\displaystyle T_{m}} ⁠ is the root alone. At step ⁠ i {\displaystyle i} ⁠, the tree is created by removing a subtree from tree ⁠ i − 1 {\displaystyle i-1} ⁠ and replacing it with a leaf node with value chosen as in the tree building algorithm. The subtree that is removed is chosen as follows: Define the error rate of tree ⁠ T {\displaystyle T} ⁠ over data set ⁠ S {\displaystyle S} ⁠ as ⁠ err ⁡ ( T , S ) {\displaystyle \operatorname {err} (T,S)} ⁠. The subtree t {\displaystyle t} that minimizes err ⁡ ( prune ⁡ ( T , t ) , S ) − err ⁡ ( T , S ) | leaves ⁡ ( T ) | − | leaves ⁡ ( prune ⁡ ( T , t ) ) | {\displaystyle {\frac {\operatorname {err} (\operatorname {prune} (T,t),S)-\operatorname {err} (T,S)}{\left\vert \operatorname {leaves} (T)\right\vert -\left\vert \operatorname {leaves} (\operatorname {prune} (T,t))\right\vert }}} is chosen for removal. The function ⁠ prune ⁡ ( T , t ) {\displaystyle \operatorname {prune} (T,t)} ⁠ defines the tree obtained by pruning the subtrees ⁠ t {\displaystyle t} ⁠ from the tree ⁠ T {\displaystyle T} ⁠. Once the series of trees has been created, the best tree is chosen by generalized accuracy as measured by a training set or cross-validation. == Examples == Pruning could be applied in a compression scheme of a learning algorithm to remove the redundant details without compromising the model's performances. In neural networks, pruning removes entire neurons or layers of neurons.

    Read more →
  • GamePigeon

    GamePigeon

    GamePigeon is a mobile app for iOS devices, developed by Vitalii Zlotskii and released on September 13, 2016. The game takes advantage of the iOS 10 update, which expanded how users could interact with Apple's Messages app. GamePigeon is only available through the Messages app, which allows players to start and respond to different party games in conversations. == Release == The app was first released on September 13, 2016, coinciding with the launch of iOS 10. The app was released for free, although it includes in-app purchases to unlock additional items, such as cosmetic skins, avatar items, new game modes, and an option to remove ads. == Games in the app == The following is a list of games that users can play within GamePigeon: Sources: Poker was one of the games included in GamePigeon at launch, although it has since been removed and is no longer listed on the game's App Store description. == Reception == GamePigeon has enjoyed commercial success, with VentureBeat noting that GamePigeon was ranked number-one in the "Top Free" category of the iMessage App Store, six months after its release. Critically, GamePigeon has been generally well received, being highlighted by online media publications early on shortly after the iOS 10 launch. It has since been included on many "best iMessage apps" lists. Based on over 162,000 ratings, the game holds a 4.0 out of 5 rating on the App Store. Julian Chokkattu of Digital Trends wrote "GamePigeon should be like the pre-installed versions of Solitaire and Minesweeper that used to come with older iterations of Windows." On its launch day, Boy Genius Report included it on a list of "10 of the best iMessage apps, games and stickers for iOS 10 on launch day." The Daily Dot wrote, "GamePigeon is easily the best current gaming option within iMessages." 8-ball and cup pong have been particularly well received by media outlets. The Daily Dot had specific praise for the app's billiards game: "8-Ball controls shockingly smoothly with your fingers, and there’s nothing quite like destroying a dear friend in poker." During his 2020 U.S. presidential campaign, Cory Booker was cited as playing the game with his family. In 2017, CNBC cited one teenager who expressed that GamePigeon was one of just a few reasons that those in her age range use the iMessage app. The game has received particular positive reception for allowing introverted individuals to exercise a form social activity; similarly, the game was highlighted as a way to maintain social distancing guidelines during the COVID-19 pandemic. As an April Fools' Day joke in 2020, The Chronicle, a Duke University newspaper, published that Duke's athletic program adopted GamePigeon's Cup Pong as an official varsity sport.

    Read more →
  • Direct voice input

    Direct voice input

    Direct voice input (DVI), sometimes called voice input control (VIC), is a style of human–machine interaction "HMI" in which the user makes voice commands to issue instructions to the machine through speech recognition. In the field of military aviation, DVI has been introduced into the cockpits of several modern military aircraft, such as the Eurofighter Typhoon, the Lockheed Martin F-35 Lightning II, the Dassault Rafale, the KF-21 Boramae and the Saab JAS 39 Gripen. Such systems have also been used for various other purposes, including industry control systems and speech recognition assistance for impaired individuals. == Overview == DVI systems can be divided into two major categories of functionality: "user-dependent" or "user-independent". A user-dependent system requires that a personal voice template to be generated for a specific person; the template for this individual has to be loaded onto their assigned machine prior to use of the DVI system for it to function properly. In contrast, a user-independent system does not require any personal voice template, being intended to respond correctly to the voice of any user. They can also be categorised between "discrete recognition" and "continuous recognition". Users of a discrete recognition system must pause between each word so that the DVI system can identify the separations between each word, while a continuous speech recognition system is capable of understanding a normal rate of speech. During the mid-2000s, researchers at the National Aerospace Laboratory in the Netherlands examined the use of DVI in the "GRACE" simulator; a total of twelve pilots participated in the ensuing experiment. The tests performed reportedly revealed that, while the hardware itself functioned well, several improvements were desirable prior to real-world deployment on aircraft since DVI operations actually consumed more time in comparison to traditional existing methods. Recommendations for improvements included the adoption of simpler syntax, the achievement of a greater recognition rate, and a decrease in response times; all of the issues encountered were determined to be of a technological nature, and were deemed feasible to resolve. The researchers concluded that in cockpits, especially during emergencies where pilots have to operate entirely on their own, a DVI system could be highly relevant, but that it was not of crucial importance during most other conceivable scenarios. Around the same time, evaluations of DVI systems for civil aviation purposes were conducted within the framework of Project SafeSound, coordinated by the European Union. It involved the observation of pilot workloads in real-world cockpits and contrasting them against pilot activity in flight simulators using both conventional systems and DVI assistance. The project aimed to enhance aviation safety and to decrease the workload in both ground and flight operations via the application of enhanced audio functions. == Applications == === Aviation === Prior to its widespread deployment, a handful of conventional military aircraft were converted to trial DVI systems; examples include the Harrier AV-8B and F-16 VISTA. In another case, a General Dynamics F-16 Fighting Falcon simulator was modified with DVI for a voice control study that was undertaken by the Royal Netherlands Air Force. DVI trials have also been conducted on helicopters, including the Boeing AH-64 Apache, showing the potential to improve flight safety and mission effectiveness. Numerous modern fighter aircraft have been outfitted with DVI systems, often in combination with various other man-machine interface schemes, such as HOTAS-compliant controls and other advanced control technologies. The combination of Voice and HOTAS control schemes has sometimes been referred to as the "V-TAS" concept. A prominent fighter aircraft to be furnished with a V-TAS cockpit is the Eurofighter Typhoon. The Lockheed Martin F-35 Lightning II also features a DVI system, which was developed by Adacel. Other examples includes the Dassault Rafale and the Saab JAS 39 Gripen. Numerous aircraft have been planned to use DVI. At one stage, the United States Air Force had sought to integrate DVI upon the Lockheed Martin F-22 Raptor; however, the technology was eventually judged to pose too many technical risks at that point in time, and thus such efforts were abandoned. === Personal === By 1990, working prototypes of speech recognition systems were being demonstrated; these were being promoted for the purpose of providing an effective man-machine interface for individuals with impaired speech. Techniques employed included time-encoded digital speech and automatic token set selection. Investigations of these early DVI systems reportedly included the use of automatic diagnostic routines and limited-scale trials using volunteers. During the 2010s, various companies were offering voice recognition systems to the general public in the form of personal digital assistants. One example is the Google Voice service, which allows users to pose questions via a DVI package installed on either a personal computer, tablet, or mobile phone. Numerous digital assistants have been developed, such as Amazon Echo, Siri, and Cortana, that use DVI to interact with users. === Commercial === DVI technology has enabled automated telephone systems to be widely deployed. Many companies commonly use centralised phone systems that route callers to the correct department via such methods. Various car manufacturers have also furnished their road vehicles with DVI systems; these typically allow drivers to control infotainment systems and interact with mobile phones with more convenience than legacy methods. During the late 1980s, investigations into the use of DVI systems for controlling CNC machines and other manufacturing apparatus were underway. During the 2010s, such systems were being used for logistics and warehouse management purposes.

    Read more →