AI Art Zelda

AI Art Zelda — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Quantum robotics

    Quantum robotics is an interdisciplinary field that investigates the intersection of robotics and quantum mechanics. In particular, it explores applications of quantum phenomena such as quantum entanglement within robotics. Examples include quantum communication in multi-agent cooperative robotic scenarios, the use of quantum algorithms in performing robotics tasks, and the integration of quantum devices (e.g., quantum detectors) in robotic systems.

    == Introduction ==

    Free-space quantum communication between mobile platforms was proposed in 2017 for reconfigurable quantum key distribution (QKD) applications using unmanned aerial vehicles (UAVs, a.k.a. drones). The technology was later advanced on mobile drone and vehicle platforms in several configurations, such as drone-to-drone, drone-to-moving-vehicle, and vehicle-to-vehicle systems. Some research has contributed to low-size, low-weight, and low-power quantum key distribution systems for small-form UAVs, the characterization of a polarization-based receiver for mobile free-space optical QKD, and optical-relayed entanglement distribution using drones as mobile nodes.

    The topic of free-space quantum communication between mobile platforms, initially developed to meet the need for free-space QKD and entanglement distribution using mobile nodes, was brought into the robotics domain as an emerging interdisciplinary mechatronics topic to investigate the interface between quantum technologies and robotic systems. The main advantage of such integrated technology is the guaranteed security of communication between multi-agent and cooperative autonomous systems. Other advances are anticipated.

    == Quantum entanglement ==

    According to quantum mechanics, entanglement occurs when two or more particles become linked so that their states cannot be described independently. A measurement on one particle is then correlated with the outcomes of measurements on the others, regardless of the distance between them. Entangled sensors exploit these correlations to achieve high sensitivity. A group of quantum robots can use entangled sensors to measure magnetic fields, gravitational fields and other physical properties with high accuracy. Entanglement also strengthens the link between one robot and another.

    == Quantum teleportation ==

    Quantum teleportation is the transfer of quantum information (not physical objects). It is used in multi-robot processes: one robot is programmed with a complex quantum update and can then teleport that quantum information (the update) to other robots. This form of communication is considered very secure because the information is carried entirely in quantum states.

    == Kinematics ==

    Quantum computing has been proposed as being optimal for calculating inverse kinematics values.

    == Alice and Bob robots ==

    In quantum mechanics, the names Alice and Bob are frequently used to illustrate various phenomena, protocols, and applications, including QKD, quantum cryptography, entanglement, and teleportation. The terms "Alice Robot" and "Bob Robot" are analogous expressions that merge the concepts of Alice and Bob from quantum mechanics with mechatronic mobile platforms (such as robots, drones, and autonomous vehicles). For example, the Alice Robot functions as a transmitter platform that communicates with the Bob Robot, which houses the receiving detectors.
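    The transmitter and receiver roles of the Alice Robot and Bob Robot can be illustrated with a small classical simulation of key sifting. The text above does not name a specific QKD protocol, so this sketch assumes the well-known BB84 scheme, an ideal noise-free channel, and no eavesdropper. The function name `bb84_sift` and its parameters are hypothetical and chosen only for illustration.

```python
import random


def bb84_sift(n_bits, seed=0):
    """Classically simulate BB84 basis sifting between Alice and Bob.

    Assumptions (not from the source text): ideal channel, no eavesdropper,
    two bases labelled '+' (rectilinear) and 'x' (diagonal).
    """
    rng = random.Random(seed)

    # Alice Robot (transmitter): random bits encoded in random bases.
    alice_bits = [rng.randint(0, 1) for _ in range(n_bits)]
    alice_bases = [rng.choice("+x") for _ in range(n_bits)]

    # Bob Robot (receiver): measures each incoming state in a random basis.
    bob_bases = [rng.choice("+x") for _ in range(n_bits)]
    bob_bits = []
    for bit, a_basis, b_basis in zip(alice_bits, alice_bases, bob_bases):
        if a_basis == b_basis:
            bob_bits.append(bit)                # same basis: outcome matches
        else:
            bob_bits.append(rng.randint(0, 1))  # wrong basis: random outcome

    # Sifting: both sides publicly compare bases and keep matching positions.
    keep = [i for i in range(n_bits) if alice_bases[i] == bob_bases[i]]
    alice_key = [alice_bits[i] for i in keep]
    bob_key = [bob_bits[i] for i in keep]
    return alice_key, bob_key


if __name__ == "__main__":
    a_key, b_key = bb84_sift(32, seed=1)
    print("keys agree:", a_key == b_key, "| sifted length:", len(a_key))
```

    On average about half of the transmitted positions survive sifting. A real system would additionally estimate the error rate on a sample of the sifted key to detect eavesdropping, which this sketch omits.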

  • Adobe Encore

    Adobe Encore (previously Adobe Encore DVD) was a DVD authoring software tool produced by Adobe Systems and targeted at professional video producers. Video and audio resources could be used in their current format for development, allowing the user to transcode them to MPEG-2 video and Dolby Digital audio upon project completion. DVD menus could be created and edited in Adobe Photoshop using special layering techniques. Adobe Encore did not support writing to a Blu-ray Disc using AVCHD 2.0. Encore was bundled with Adobe Premiere Pro CS6, and Adobe Encore CS6 was the last release. While Premiere Pro CC moved to the Creative Cloud, Encore was discontinued.

    == Licensing ==

    All forms of Adobe Encore used a proprietary licensing system from its developer, Adobe Systems. Versions 1.0 and 1.5 each required a separate license fee; version 1.5 was not made available as a free update. Version 3, also known as CS3, was sold only in a bundle with Premiere CS3. Encore CS4, CS5, CS5.5 and CS6 were sold only in the Premiere Pro CS4, CS5, CS5.5 and CS6 bundles, respectively. Adobe CC subscribers no longer have access to Adobe Encore CS6, and Adobe Encore is not included with Premiere Pro CC.

    == Functionality ==

    Adobe Encore allowed for creating interactive DVD menus from Photoshop documents, which could be tweaked from within Encore. Video and audio streams could be embedded in the DVD and made to play when certain elements of the menu were interacted with. It had similar functionality to Adobe Flash and Premiere Pro, due to its ability to both edit video on a timeline and embed interactive content.

  • Sorenson Squeeze

    Sorenson Squeeze was a software video encoding tool used to compress and convert video and audio files on Mac OS X or Windows operating systems. It was sold as a standalone tool and was also long bundled with Avid Media Composer.

    == History ==

    Sorenson Squeeze was first announced on July 17, 2001, as the first variable bit rate (VBR) compression application for Mac OS X, and was released on October 29 of that same year. By March 2002, Sorenson Squeeze had become available for Windows. It was originally released as a tool for encoding videos for the Web and QuickTime playback, and new codecs were added as more versions were released. The software was discontinued by Sorenson in January 2019, and correspondingly was no longer offered as part of Avid Media Composer.

    == Features ==

    Squeeze included a number of features to improve video and audio quality, among them GPU-accelerated H.264 encoding, adaptive bitrate encoding, HD encoding and Dolby-certified AC3 audio. Intelligent encoding presets available in Squeeze included x265 (H.265), MainConcept H.264, and MainConcept H.264 CUDA. Adaptive bitrate encoding allows for optimal bitrate and error resilience based on network conditions, resulting in a dynamic adjustment of the video bitstream being delivered.

    Squeeze encoded to multiple formats, including QuickTime, Windows Media, Flash Video, Silverlight, WebM and WMV. It used multiple codecs, including the Sorenson codecs SV3 Pro and Spark, H.265, H.264, H.263, VP6, VC1, MPEG2, and many others. Squeeze ran on the Apple Macintosh and Microsoft Windows operating systems and offered native plugins for the Avid, Apple Final Cut Pro and Adobe Premiere (CS4, CS5) NLEs. Each copy of Squeeze included the Dolby Certified AC3 Consumer encoder. Squeeze also included a simplified review and approval process, which allowed the user to automatically send secure, password-protected videos for immediate review, with instant feedback received via Web or mobile.
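    The idea behind adaptive bitrate delivery described above can be sketched as a simple selection rule: the video is encoded at several bitrates, and the stream served at any moment is the highest one that fits the measured network throughput. This is only an illustration of the general technique, not Squeeze's actual algorithm. The function name `pick_rendition`, the `safety` margin, and the bitrate ladder values are all hypothetical.

```python
def pick_rendition(ladder_kbps, throughput_kbps, safety=0.8):
    """Return the highest bitrate in the ladder that fits the network budget.

    ladder_kbps     -- bitrates (kbit/s) the video was encoded at (assumed values)
    throughput_kbps -- currently measured network throughput (kbit/s)
    safety          -- fraction of throughput we allow the stream to use
    """
    budget = throughput_kbps * safety
    fitting = [rate for rate in sorted(ladder_kbps) if rate <= budget]
    # Fall back to the lowest rendition when even that exceeds the budget.
    return fitting[-1] if fitting else min(ladder_kbps)


if __name__ == "__main__":
    ladder = [400, 800, 1500, 3000, 6000]  # hypothetical bitrate ladder
    for measured in (300, 2500, 10000):
        print(measured, "kbit/s available ->", pick_rendition(ladder, measured), "kbit/s stream")
```

    Re-running the selection whenever the throughput estimate changes gives the dynamic adjustment of the delivered bitstream that the paragraph above refers to.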
    == Versions ==

      • Sorenson Squeeze was released on October 29, 2001.
      • Sorenson Squeeze for Macromedia Flash MX was released on March 14, 2002.
      • Sorenson Squeeze 3 for MPEG-4 was released in January 2003.
      • Sorenson Squeeze 3 Compression Suite was released in January 2003.
      • Sorenson Squeeze 5 was released on March 31, 2008.
      • Sorenson Squeeze was updated to version 5.1 on May 11, 2009.
      • Sorenson Squeeze 6 was released on November 3, 2009.
      • Sorenson Squeeze 7 was released January 25, 2011.
      • Sorenson Squeeze 11 was released August 27, 2016.

    == Awards ==

      • Streaming Media magazine Readers’ Choice Award for Encoding Software for 2007, 2008, 2009 and 2010
      • 2008 Vanguard Award from Digital Content Producer magazine

    == Squeeze 7 system requirements ==

    Windows

      • Pentium IV-based computer or greater
      • Windows XP, Vista or 7
      • 32- and 64-bit compatible (including AVID 64-bit update); faster performance on 64-bit systems
      • 512 MB RAM
      • 120 MB available hard drive space
      • QuickTime 7.2 or later
      • DirectX 9.0b or later

    Macintosh

      • Intel-based processor
      • Mac OS 10.4 or later
      • 32- and 64-bit compatible; faster performance on 64-bit systems
      • 512 MB RAM
      • 120 MB available hard drive space
      • QuickTime 7.2 or later

  • Verbal overshadowing

    Verbal overshadowing is a phenomenon where giving a verbal description of sensory input impairs the formation of memories of that input. It was first reported by Schooler and Engstler-Schooler (1990), who showed that the effect can be observed across multiple domains of cognition that are known to rely on non-verbal knowledge and perceptual expertise. One example is memory, which is known to be influenced by language. Seminal work by Carmichael and collaborators (1932) demonstrated that connecting verbal labels to non-verbal forms during encoding could bias the way those forms are reproduced. Because of this, memory performance relying on reportable aspects of memory that encode visual forms should be vulnerable to the effects of verbalization.

    == Initial findings ==

    Schooler and Engstler-Schooler (1990) were the first to report findings of verbal overshadowing. In their study, participants watched a video of a simulated robbery and were instructed to either verbally describe the robber or engage in a control task. Those who gave a verbal description were less likely to correctly identify the robber from a test lineup than those who engaged in the control task. A larger effect was detected when the verbal description was provided 20, rather than 5, minutes after the video, and immediately before the test lineup. A meta-analysis by Meissner and Brigham (2001) supported the effects of verbal overshadowing, showing a small but reliable negative effect.

    == General effects of verbal overshadowing ==

    The effects of verbal overshadowing have been generalized across multiple domains of cognition that are known to rely on non-verbal knowledge and perceptual expertise, such as memory. Memory is known to be influenced by language. Seminal work by Carmichael and collaborators (1932) demonstrated that labels attached to, or associated with, non-verbal forms during memory encoding can affect the way the forms are subsequently reproduced. Because of this, memory performance that relies on reportable aspects of memory that encode visual forms should be vulnerable to the effects of verbalization.

    Pelizzon, Brandimonte, and Luccio (2002) found that visual memory representations appear to incorporate visual, spatial, and temporal characteristics. It is explained as follows:

      With the temporal code (where the only information available is the sequence of the stimuli), performance levels remain high, unless participants are required to retrieve the stimuli in a different order from that used at encoding (visual cue). In this case, performance is significantly impaired, even in the presence of a visual cue. The study showed that order information acts as a link between the two separate representations of figure and background, hence preventing verbal overshadowing at encoding (temporal component) or attenuating its influence at retrieval (spatial component). (p. 960)

    Hatano, Ueno, Kitagami, and Kawaguchi found that verbal overshadowing is likely to occur when participants verbally describe targets in detail. Detailed verbal descriptions were more frequently inaccurate, which in turn created inaccurate representations in participants' memories. Inaccuracies are also likely to occur when face recognition comes immediately after verbalization.
    Other forms of non-verbal knowledge affected by verbal overshadowing include the following:

      [Verbal overshadowing] has also been observed when participants attempt to generate descriptions of other 'difficult-to-describe' stimuli such as colors (Schooler and Engstler-Schooler, 1990) or abstract figures (Brandimonte et al., 1997), or other non-visual tasks such as wine tasting (Melcher and Schooler, 1996), decision making (Wilson and Schooler, 1991), and insight problem-solving (Schooler et al., 1993). (p. 871)

    Verbalization of stimuli leads to the disruption of non-reportable processes that are necessary for achieving insight solutions, which are distinct from language processes. Schooler, Ohlsson, and Brooks (1993) found that face recognition requires information that cannot be adequately verbalized, giving rise to difficulty in describing factors in recognition judgments. Subjects were less effective in solving insight problems when compelled to put their thoughts in words, which suggests that language may interfere with thought.

    The verbal overshadowing effect was not seen when participants engaged in articulatory suppression. Performance was reduced in both the verbal and non-verbal description conditions. This is evidence that verbal encoding plays a role in face recognition. By testing with distracting faces presented between study and test, Lloyd-Jones and Brown (2008) suggested a dual-process account of recognition memory in which verbalization first influenced familiarity-based processes, with its effects later seen on recollection, when discrimination between items became more difficult.

    == Verbal overshadowing in facial recognition ==

    The verbal overshadowing effect can be found for facial recognition because faces are predominantly processed in a holistic or configural manner (Tanaka & Farah, 1993; Tanaka & Sengco, 1997). Verbalizing one's memory for a face is done using a featural or analytic strategy, leading to a drift away from the configural information about the face and to impaired recognition performance. However, Fallshore & Schooler (1995) found that the verbal overshadowing effect did not occur when participants described faces of races different from their own.

    A study by Brown and Lloyd-Jones (2003) found no verbal overshadowing effect for car descriptions; it was seen only for facial descriptions. The authors noted that descriptions were no different on any measure, including accuracy. It is suggested that having less expertise in verbalizing faces than cars invokes a stronger shift in verbal and featural processing. This supports the concept of a transfer inappropriate retrieval framework and addresses some limitations of the effect.

    Wickham and Swift (2006) suggested that the verbal overshadowing effect is not seen in describing all faces, and that one aspect that determines this is distinctiveness. Results showed that typical faces produced verbal overshadowing, while distinctive faces did not. In studies of eyewitness reports, variation in the response criteria used by participants influenced the quality of the descriptions generated and accuracy on the identification task, known as the retrieval-based effect. Face recognition was also impaired when subjects described a familiar face, such as a parent, or a previously seen but novel face.

    Dodson, Johnson, and Schooler (1997) found that recognition was also impaired when participants were provided with a description of a previously seen face, and that participants could ignore provided descriptions more easily than self-generated ones. This finding suggested that eyewitness recognition is affected not only by witnesses' own descriptions but also by descriptions heard from others, such as other eyewitnesses' testimonies.
    == Voice recognition ==

    The verbal overshadowing effect has also been found to affect voice identification. Research shows that describing a non-verbal stimulus leads to a decrease in recognition accuracy. In an unpublished study by Schooler, Fiore, Melcher, and Ambadar (1996), participants listened to a tape-recorded voice, after which they were asked either to verbally describe it or not to do so, and were then asked to distinguish the voice from three similar distractor voices. The results showed that verbalization impaired the accuracy of recognition based on gut feeling, suggesting an overall verbal overshadowing effect for voice recognition.

    Due to the forensic relevance of voices heard over the telephone and of harassing phone calls, which are often a problem for police, Perfect, Hunt, and Harris (2002) examined the influence of three factors on accuracy and confidence in voice recognition from a line-up. They expected to find an effect because voice represents a class of stimuli that is difficult to describe verbally. This meets Schooler et al.'s (1997) modality mismatch criterion, meaning that describing the speaker's age, gender, or accent is difficult, making voice recognition susceptible to the verbal overshadowing phenomenon. It was found that the method of memory encoding had no impact on performance, and that hearing a telephone voice reduced confidence but did not affect accuracy. They also found that providing a verbal description impaired accuracy but had no effect on confidence. The data showed an effect of verbal overshadowing in voice recognition and provided yet another dissociation between confidence and performance. Although there was a difference in confidence level, witnesses were able to identify voices over the telephone as accurately as voices heard direc

  • Ed (chatbot)

    Ed was a chatbot co-developed by the Los Angeles Unified School District and AllHere Education. Described as a learning acceleration platform, it was the first personal assistant for students in the United States. Part of the district's Individual Acceleration Plan, it was able to interact with students both verbally and visually, offering support in 100 languages. The chatbot was launched on March 20, 2024, as part of the district's plan for academic recovery from the COVID-19 pandemic and to improve overall academic performance. Utilizing artificial intelligence, Ed organized data and reports on grades, test scores, and attendance, creating individualized plans for each student. After the company behind it, AllHere, collapsed, the district shuttered operations of the chatbot on June 14, 2024. The firm is under investigation by the US Federal Bureau of Investigation.

    == History ==

    On February 14, 2022, Alberto M. Carvalho became the Superintendent of the Los Angeles Unified School District, pledging to give the district a full academic recovery from the COVID-19 pandemic. In December 2022, he announced the Individual Acceleration Plan for the district, which aimed to provide each student with a unique progress report and help them determine if they were on track to graduate. The district faced criticism from disability advocates for its management of Individualized Education Programs, and in April 2022, the United States Department of Education announced that the district had failed to provide appropriate educational services to students with disabilities during the pandemic. The district had been grappling with significant absenteeism issues since the pandemic, which led to declining academic performance and disengagement among students.

    On February 17, 2023, the district issued a request for proposals to develop a fully integrated portal system. Later that year, it signed a $6 million, five-year contract with AllHere Education, a Boston-based company founded in 2016. The introduction of Ed followed the public launch of ChatGPT, which has been utilized by both teachers and students in educational settings. On August 4, 2023, during an annual address at the Walt Disney Concert Hall, Carvalho and the Los Angeles Unified School District announced the launch of Ed. The district invested $4 million into the chatbot, with Carvalho noting that this cost would be halved thanks to donor and grant funding.

    The chatbot was launched on March 20, 2024. Following its launch, a press conference was held to address security and technology concerns. Carvalho stated that the district had collaborated with security companies and incorporated filters to screen for threatening language. Months after the launch, on June 14, AllHere Education furloughed most of its staff, citing its “current financial position” on its website as the reason. After learning about the furlough, the district terminated its dealings with AllHere Education. However, it stated its intention to bring the chatbot back in the future once officials determine the best course of action. Carvalho announced that he would appoint an independent task force to review what went wrong with AllHere Education and the chatbot. On February 25, 2026, the FBI served a search warrant on Carvalho’s home and office in connection with AllHere. The FBI also raided the LAUSD's headquarters.

    == Service ==

    The chatbot was described as a personal assistant and a "one-stop shop for parents and students" who want to see information about a student's attendance and grades, as well as other resources from the district. Additionally, the application could function as an alarm clock, provide daily lunch menus from the school cafeteria, and offer updates on the location of school buses. The chatbot also helped students and parents who do not speak English as their first language by translating displayed information into approximately 100 different languages. The application could also help with submitting applications and give updates on progress and upcoming assignments. The district stated that the primary goal of Ed was to actively motivate students to complete homework and other tasks.

    == Reception ==

    The chatbot received a mostly positive reception among parents and observers upon its launch. Some parents and teachers expressed caution about the technology, voicing concerns that the district's push for its implementation lacked public accountability. Rob Nelson from the University of Pennsylvania described the district's strategy as risky, saying that the release felt "like the beginning of a Clippy-level disaster". After the chatbot's shutdown, The 74 criticized it for misusing student data. Chris Whiteley, a former software engineer at AllHere Education, alleged that the data collected by the chatbot likely violated the district's data privacy rules.

  • Picture Prowler

    Picture Prowler was an early piece of photo management software developed during the early 1990s around Xing Technology's JPEG image decompression library and meant to show that library off. Little known today, it featured thumbnail-based picture management and printing, among other functions. The primary developer was Ray Bunnage, working from compression/decompression libraries developed by Howard Gordon and Chris Eddy.

  • Aphelion (software)

    The Aphelion Imaging Software Suite is a software suite that includes three base products (Aphelion Lab, Aphelion Dev, and Aphelion SDK) for image processing and image analysis applications. The suite also includes a set of extension programs to implement specific vertical applications that benefit from imaging techniques. The Aphelion software products can be used to prototype and deploy applications, or can be integrated, in whole or in part, into a user's system as processing and visualization libraries whose components are available as both DLLs and .Net components.

    == History and evolution ==

    The development of Aphelion started in 1995 as a joint project of a French company, ADCIS S.A., and an American company, Amerinex Applied Imaging, Inc. (AAI). Aphelion's image processing and analysis functions were built from operators available in the KBVision software developed and sold by Amerinex's predecessor, Amerinex Artificial Intelligence Inc. In the 1990s, the XLim software library was developed at the Center of Mathematical Morphology of Mines ParisTech, and both companies carried out its development tasks. The first version of Aphelion was completed and released in April 1996. Successive versions followed before the first official stable release, which was presented in December 1996 at the Photonics East conference in Boston and in January 1997 at the Solutions Vision show in Paris, where it competed with Stemmer Imaging's CVB imaging toolbox. In 1998, version 2.3 of Aphelion for Windows 98 was released, and its user base was growing in both France and the United States.

    Version 3.0, totally rewritten to take advantage of Microsoft's then-recent ActiveX technology, was officially released in 2000. It also became available as a « Developer » version, for rapid prototyping of applications using its intuitive GUI and the macro recording capability, and a « Core » version, including the full library as a set of ActiveX components to be used by software developers, integrators and original equipment manufacturers (OEM). As AAI turned its focus to security in 2001, ADCIS took the lead on developing Aphelion. AAI focused on millimeter wave scanners for concealed weapon detection at airports, and eventually merged with Millimetrics to become Millivision.

    In 2004, ADCIS specified version 4.0 of Aphelion. The set of image processing and analysis functions was rewritten once more to be compatible with the .NET technology and the emergence of 64-bit architecture PCs. In addition, the GUI was redesigned to address two usage types: a semi-automatic use where the user is guided through the different steps of functions, and a fully automatic use where the expert user can quickly invoke imaging functions. Its first release was presented at the IPOT exhibition in Birmingham, UK the same year.

    During the Vision Show in Paris in October 2008, the new Aphelion Lab product was launched for users who are not specialists in image processing. It is easier to use and includes fewer image processing functions. It was then included in the Aphelion Image Processing Suite, consisting of Aphelion Dev (replacing Aphelion Developer), Aphelion Lab, Aphelion SDK (replacing Aphelion Core), and a set of extensions. ADCIS is still working on the suite, and updated versions with new extensions and functionalities continually become available from the websites of both companies. In 2015, support was added for very large images and scan microscope images (virtual slides compounded into a very large JPEG 2000 image) for high throughput imaging, and new specific extensions were also added.
    In late 2015, ADCIS announced Aphelion's port for tablets and smartphones, for vertical applications.

    The name "Aphelion" comes from the astronomical term of the same name, meaning the point in a planet's orbit around the Sun where the planet lies farthest from it; the term is applied in a metaphorical sense. Unix was the operating system used on scientific workstations in the 1990s, such as those manufactured by market leader Sun Microsystems, and Aphelion, as a Windows suite, was quite removed from it.

    == Description ==

    Aphelion is a software suite to be used for image processing and image analysis. It supports 2D and 3D, monochrome, color, and multi-band images. It is developed by ADCIS, a French software house located in Saint-Contest, Calvados, Normandy. Aphelion is widely used in the scientific and industrial communities to solve basic and complex imaging problems. First, the imaging application is quickly developed from the Graphical User Interface, involving a set of functions that can be automatically recorded into a macro command. The macro languages available in Aphelion (i.e. BasicScript, Python, and C#) help to process batches of images, and prompt the user if needed for specific parameters that are applied to the imaging functions. All Aphelion image processing functions are written in C++, and the Aphelion user interface is written in C#. C++ functions can be called from the C# language thanks to the use of dedicated wrappers.

    The main principle of image processing is to automatically process pixels of a digital image, then extract one or more objects of interest (e.g. cells in the field of biology, inclusions in the field of material science) and compute one or more measurements on those objects to quantify the image and generate a verdict (good image, image with defects, cancerous cells).
    In other words, starting from an image, pixels are processed by a set of successive functions or operators until only measurements are computed and used as the input of a third-party system or classification software that will classify the objects of interest extracted during the imaging process. An acquisition system such as a digital camera, a video camera, an optical or electron microscope, a medical scanner, or a smartphone can be used to capture images. The set of values or pixels can be processed as a 1D image (1D signal), a 2D image (array of pixel values corresponding to a monochrome or color image), or a 3D image displayed using volume rendering (array of voxels in the 3D space) or displaying surfaces by using 3D rendering. A 2D color image is made of 3-value pixels (typically Red, Green, and Blue information or another color space), and a 3D image is made of monochrome, color (indexed colors are often used), multispectral, or hyperspectral data. When dealing with videos, an additional band is added corresponding to temporal information.

    The Aphelion Software Suite includes three base products, and a set of optional extensions for specific applications:

      • Aphelion Lab: Entry-level package for non-experts in image processing. It helps to quickly segment an image in a semi-automatic or manual way, and to compute a set of measurements on the objects of interest extracted during the segmentation process. A set of wizards guides the user from image acquisition to report generation.
      • Aphelion Dev: Full imaging environment including over 450 functions to develop and deploy an application that involves image processing and analysis. It also includes a set of macro-command languages to automate any application to be invoked from the user interface. It also helps to run the imaging algorithm on multiple images that are stored on disk, available on the network, or captured by an acquisition device. Aphelion libraries for image processing and visualization are provided in Aphelion Dev as DLLs and .Net components.
      • Aphelion SDK: A set of libraries to develop a stand-alone application with a custom interface based on the Aphelion libraries. This software development kit includes display, processing and analysis functions that can be used by software developers and OEMs. It is provided as DLLs and .Net components. The stand-alone application is typically developed in C# on one computer, and then deployed on multiple PCs and systems.

    A set of optional extensions can be added to the « Aphelion Dev » product, depending on the application. An evaluation version of Aphelion can be run on a PC for 30 days. A permanent version of Aphelion is available based on a perpetual license. Upgrades are available through a maintenance agreement based on a yearly fee. Technical support is provided by the engineers who are developing the product.

    The goal of image processing is usually to extract object(s) of interest in an image, and then to classify them based on some characteristics such as shape, density, position, etc. Using Aphelion, this goal is achieved by performing the following tasks:

      • Load an image from disk or acquire an image using an acquisition device.
      • Enhance the image by removing noise or modifying its contrast.
      • Segment the image by extracting objects of interest to be measured and analyzed. Typically, for simple applications, a threshold is performed to generate a binary image. Then, morphological operators are applied to clean the image and only keep obj
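    The threshold-then-clean step described above can be sketched in a few lines. This is a generic, dependency-free illustration of thresholding followed by a morphological opening (erosion, then dilation) with a 3x3 window. It does not use or represent the Aphelion API, and all function names and the sample image are hypothetical.

```python
def threshold(image, level):
    """Binary image: 1 where the pixel value is at least `level`, else 0."""
    return [[1 if px >= level else 0 for px in row] for row in image]


def _window(binary, r, c):
    """Values in the 3x3 window around (r, c); pixels outside the image count as 0."""
    h, w = len(binary), len(binary[0])
    return [binary[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0
            for rr in range(r - 1, r + 2)
            for cc in range(c - 1, c + 2)]


def erode(binary):
    """Keep a pixel only if its whole 3x3 window is set (removes small specks)."""
    return [[int(all(_window(binary, r, c))) for c in range(len(binary[0]))]
            for r in range(len(binary))]


def dilate(binary):
    """Set a pixel if any pixel in its 3x3 window is set (regrows eroded objects)."""
    return [[int(any(_window(binary, r, c))) for c in range(len(binary[0]))]
            for r in range(len(binary))]


def opening(binary):
    """Morphological opening: erosion followed by dilation."""
    return dilate(erode(binary))


if __name__ == "__main__":
    # Hypothetical 6x6 grey-level image: one bright 3x3 object plus one noisy pixel.
    image = [
        [10, 12, 11, 10, 12, 11],
        [11, 200, 210, 205, 10, 12],
        [12, 198, 220, 207, 11, 10],
        [10, 202, 215, 209, 12, 11],
        [11, 10, 12, 11, 10, 190],
        [12, 11, 10, 12, 11, 10],
    ]
    binary = threshold(image, 128)
    cleaned = opening(binary)
    print("object pixels before cleaning:", sum(map(sum, binary)))
    print("object area after cleaning:", sum(map(sum, cleaned)))
```

    The pixel count of the cleaned binary image is a simple example of a measurement that could then be passed on to a classification step.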

    Read more →
  • Scientific Working Group – Imaging Technology

    Scientific Working Group – Imaging Technology

    The Scientific Working Group on Imaging Technology was convened by the Federal Bureau of Investigation in 1997 to provide guidance to law enforcement agencies and others in the criminal justice system regarding the best practices for photography, videography, and video and image analysis. This group was terminated in 2015. == History == As technology has advanced through the years, law enforcement has needed to stay abreast of emerging technological advances and use these in the investigation of crime. A factor that is considered when new technology is used in these investigations is the determination of whether the use of that new technology will be admissible in court. The judicial system in the United States currently has two standards used in the determination of admissibility of testimony regarding scientific evidence; the Daubert Standard and the Frye Standard. These standards guide the courts in the admissibility of testimony derived from the use of new technologies and scientific techniques. The Federal Bureau of Investigation (FBI), seeking to address possible admissibility issues with such testimony, established Scientific Working Groups starting with the Scientific Working Group on DNA Analysis and Methods (SWGDAM) in 1988. The goal of these groups is to open lines of communication between law enforcement agencies and forensic laboratories around the world while providing guidance on the use of new and innovative technologies and techniques. This guidance can lead to admissibility of evidence and/or testimony, provided proper methods in the collection of evidence and its analysis are employed. In 2009, the National Academy of Sciences released a report entitled, "Strengthening Forensic Science in the United States: A Path Forward." 
This report addresses many topics including challenges and disparities facing the forensic science community, standardization, certification of practitioners and accreditation of their respective entities, problems related to the interpretation of forensic evidence, the need for research, and the admission of forensic science evidence in litigation. This report mentions the Scientific Working Groups and their role in forensic science. The history of imaging technology (photography) can be said to extend back to the times of Chinese philosopher Mo-Ti (470-390 B.C.) who described the principles behind the precursor to the camera obscura. Since that time, advances in imaging technology include the discovery of chemical photographic processes in the 19th century and the use of electronic imaging technology that includes analog video cameras and digital video and still cameras. By the mid 1990s, it was apparent that technologically advanced camera systems such as these were being adopted for use in the criminal justice system. This led the FBI to convene a meeting of individuals working in the field of forensic imaging from federal, state, local, and foreign law enforcement, and the U.S. military, during the summer of 1997. As a result of this meeting, the Technical Working Group on Imaging Technology was formed from a core group of the meeting’s participants. This group later became the Scientific Working Group on Imaging Technology (SWGIT). Prior to the inception of SWGIT, some law enforcement agencies began adopting digital imaging technology. Due to the lack of guidelines or standards, some of these agencies attempted to replace all their film cameras with substandard digital cameras, only to find that the equipment they had purchased was not capable of accomplishing the mission for which they were intended. At that time only low resolution digital cameras were deemed affordable by some law enforcement agencies. 
Some of these agencies were forced to rethink their photography procedures and reverted to the use of film cameras or replaced their low-resolution digital cameras with higher quality, more expensive equipment. Also lacking at this early stage was guidance on how to store and archive digital image files. When SWGIT was formed, it was tasked with providing guidance to law enforcement and others in the criminal justice system by releasing documents that describe the best practices and guidelines for the use of imaging technology, addressing these concerns and many others. == SWGIT Function == During its existence, SWGIT provided information on the appropriate use of various imaging technologies, both established and new. This was accomplished through the release of documents such as the SWGIT Best Practices documents. As changes in technology occurred, these documents were updated. Over the course of its existence, SWGIT collaborated with other Scientific Working Groups to address imaging concerns within their respective disciplines. SWGIT published over 20 documents that dealt specifically with imaging technology. SWGIT also co-published documents with the Scientific Working Group on Digital Evidence (SWGDE) that had a component or components dealing with imaging technology. SWGIT also provided imaging technology guidance and input for documents from the Scientific Working Group on Friction Ridge Analysis, Study and Technology (SWGFAST), the Scientific Working Group for Forensic Document Examination (SWGDOC), and the Scientific Working Group on Shoeprint and Tire Tread Evidence (SWGTREAD). SWGIT assisted the American Society of Crime Lab Directors/Laboratory Accreditation Board (ASCLD/LAB) in the writing of definitions and standards for the accreditation of Digital and Multimedia Evidence sections of crime laboratories.
In addition to releasing documents, SWGIT members disseminated best practices for law enforcement professionals where imaging technology was concerned. This was carried out by attending and lecturing at meetings and conferences of various forensic organizations, including the American Academy of Forensic Sciences (AAFS), the International Association for Identification (IAI), the Law Enforcement and Emergency Services Video Association (LEVA), and the American Society of Crime Lab Directors (ASCLD). The SWGIT membership consisted of approximately fifty scientists, photographers, instructors, and managers from more than two dozen federal, state, and local law enforcement agencies, as well as from the academic and research communities. The membership elected its officers from within. SWGIT was composed of the Executive Committee, four standing subcommittees, and ad hoc subcommittees appointed on an as-needed basis. The standing subcommittees were: Image Analysis, Forensic Photography, Video, and Outreach. == Legal Proceedings == SWGIT best practice documents have been cited as accepted protocol, methodology, and generally accepted techniques in the forensic community in Daubert hearings (Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579 (1993)) held in the following court cases: U.S. v. Rudy Frabizio, U.S. District Court, Boston, MA, 2008 (Image Authentication); U.S. v. Nobumochi Furukawa, U.S. District Court, Minnesota, 2007 (Video Authentication); U.S. v. John Stroman, U.S. District Court, South Carolina, 2007 (Facial Comparison Analysis); State of Texas v. Daniel Day, Tarrant County Texas, 2005 (Camera Identification to Images); U.S. v. Marc Watzman, U.S. District Court, Northern Illinois, 2004 (Video Authentication); U.S. v. McKreith, U.S. District Court, Fort Lauderdale, FL, 2002 (Photo comparison of shirt). == Termination == The FBI stopped funding this group in 2015.

    Read more →
  • Multi-focus image fusion

    Multi-focus image fusion

    Multi-focus image fusion is a multiple image compression technique using input images with different focus depths to make one output image that preserves all information. == Overview == The main idea of image fusion is gathering the important and essential information from the input images into one single image which ideally has all of the information of the input images. The research history of image fusion spans over 30 years and many scientific papers. Image fusion generally has two aspects: image fusion methods and objective evaluation metrics. In visual sensor networks (VSN), sensors are cameras which record images and video sequences. In many applications of VSN, a camera can't give a perfect illustration including all details of the scene. This is because of the limited depth of focus of the optical lens of cameras. Therefore, only the object located at the focal length of the camera is in focus and clear, and other parts of the image are blurred. A VSN captures images with different depths of focus using several cameras. Because cameras generate a large amount of data compared to other sensors such as pressure and temperature sensors, and because bandwidth, energy consumption and processing time are limited, it is essential to process the local input images to decrease the amount of transmitted data. == Multi-Focus image fusion in the spatial domain == Huang and Jing have reviewed and applied several focus measurements in the spatial domain for the multi-focus image fusion process, suitable for real-time applications. They mentioned some focus measurements including variance, energy of image gradient (EOG), Tenenbaum's algorithm (Tenengrad), energy of Laplacian (EOL), sum-modified-Laplacian (SML), and spatial frequency (SF). Their experiments showed that EOL gave better results than other methods like variance and spatial frequency.
== Multi-Focus image fusion in multi-scale transform and DCT domain == Image fusion based on the multi-scale transform is the most commonly used and promising technique. Laplacian pyramid transform, gradient pyramid-based transform, morphological pyramid transform and the premier ones, discrete wavelet transform, shift-invariant wavelet transform (SIDWT), and discrete cosine harmonic wavelet transform (DCHWT) are some examples of image fusion methods based on multi-scale transform. These methods are complex and have some limitations e.g. processing time and energy consumption. For example, multi-focus image fusion methods based on DWT require a lot of convolution operations, so they take more time and energy to process. Therefore, most methods in multi-scale transform are not suitable for real-time applications. Moreover, these methods are not very successful along edges, due to the wavelet transform process missing the edges of the image. They create ringing artefacts in the output image and reduce its quality. Due to the aforementioned problems in the multi-scale transform methods, researchers are interested in multi-focus image fusion in the DCT domain. DCT-based methods are more efficient in terms of transmission and archiving images coded in Joint Photographic Experts Group (JPEG) standard to the upper node in the VSN agent. A JPEG system consists of a pair of an encoder and a decoder. In the encoder, images are divided into non-overlapping 8×8 blocks, and the DCT coefficients are calculated for each. Since the quantization of DCT coefficients is a lossy process, many of the small-valued DCT coefficients are quantized to zero, which corresponds to high frequencies. DCT-based image fusion algorithms work better when the multi-focus image fusion methods are applied in the compressed domain. In addition, in the spatial-based methods, the input images must be decoded and then transferred to the spatial domain. 
After implementation of the image fusion operations, the output fused images must again be encoded. DCT domain-based methods do not require complex and time-consuming consecutive decoding and encoding operations. Therefore, the image fusion methods based on DCT domain operate with much less energy and processing time. Recently, a lot of research has been carried out in the DCT domain. DCT+Variance, DCT+Corr_Eng, DCT+EOL, and DCT+VOL are some prominent examples of DCT based methods.
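To make the idea of a spatial-domain focus measure concrete, here is a minimal sketch that fuses two grayscale images block by block using variance as the focus measure. It is not taken from the methods cited above: the function names, the block-wise "pick the higher-variance block" rule, and the plain list-of-lists image representation are all assumptions made for illustration.

```python
from statistics import pvariance


def block_variance(img, r0, c0, size):
    """Variance of the pixel values in one block (a simple focus measure)."""
    pixels = [img[r][c]
              for r in range(r0, min(r0 + size, len(img)))
              for c in range(c0, min(c0 + size, len(img[0])))]
    return pvariance(pixels)


def fuse_by_variance(img_a, img_b, size=8):
    """Fuse two same-sized grayscale images block by block.

    Assumption: the block with the higher variance is the better-focused one,
    so its pixels are copied into the output image.
    """
    rows, cols = len(img_a), len(img_a[0])
    fused = [[0] * cols for _ in range(rows)]
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            va = block_variance(img_a, r0, c0, size)
            vb = block_variance(img_b, r0, c0, size)
            src = img_a if va >= vb else img_b
            for r in range(r0, min(r0 + size, rows)):
                for c in range(c0, min(c0 + size, cols)):
                    fused[r][c] = src[r][c]
    return fused


# Toy example: each input is sharp (high contrast) in one half only.
img_a = [[0, 255, 128, 128],
         [255, 0, 128, 128]]
img_b = [[128, 128, 0, 255],
         [128, 128, 255, 0]]
fused = fuse_by_variance(img_a, img_b, size=2)
```

The default block size of 8 mirrors the 8×8 blocks mentioned for JPEG, but this sketch works on raw pixel values, not on DCT coefficients; a DCT-domain method would compute its focus measure from the coefficients instead.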

    Read more →
  • Eaze

    Eaze

    Eaze is an American company based in San Francisco, California that launched a medical cannabis delivery app of the same name in 2014. == History == Eaze was launched in 2014 by Keith McCarty to deliver medical marijuana to patients in California. McCarty started the company in his San Francisco apartment with four employees. The company provides a mobile app to connect users with cannabis dispensaries, but does not grow or sell marijuana itself, and has been nicknamed “the Uber of Weed”. As of 2017, the company operates in more than 100 cities within California. In 2017, Eaze reported 300 percent growth over the previous year. It has 81 employees, and performs 120,000 deliveries per month to 250,000 users. A survey of Eaze users revealed that 66% are male, 57% are between 22 and 34, just over half have a bachelor's degree, and 49% have an annual income over $75,000. The company's vaporizer cartridge sales reached $1 million in sales in 4 months, and 31% of customers had ordered a vaporizer by the end of 2016. In 2016, Eaze founder Keith McCarty stepped down from his position as CEO and was replaced by Jim Patterson, who served as the company's chief product and technology officer. == EazeMD == EazeMD is a service that helps people acquire a medical marijuana card. It is a California-based telemedicine service in which physicians assess patients through an online video chat. It is California's largest telemedicine service for marijuana referrals. In June 2017, a former employee of one of these physicians accessed patient data in the physician's records system, causing a security breach. However, there was no evidence that Eaze data was accessed. == Eaze Insights == Eaze Insights conducts surveys of their users and compiles data into reports on cannabis use. Statistics from their reports have been cited in Seattle Weekly, Forbes, The Huffington Post, Business Insider, Fortune, and other general interest publications. 
== Financing == The company announced its $10 million Series A funding in April 2015 from multiple venture capital firms, including the Snoop Dogg-backed Casa Verde Capital. In October 2016, Eaze announced its Series B funding in the amount of $13 million from five investors, making the company "the highest-funded startup in the history of the cannabis industry, as well as its fastest-growing one". The Series B funding was led by Bailey Capital, joined by DCM Ventures, Kaya Ventures, and FJ Labs. In September 2017, the company raised another $27 million in venture funding. According to the company's officials in 2017, Eaze had raised more than $52 million since its inception in 2014.

    Read more →
  • Automation

    Automation

    Automation describes a wide range of technologies that reduce human intervention in processes, mainly by predetermining decision criteria, subprocess relationships, and related actions, as well as embodying those predeterminations in machines. Automation has been achieved by various means including mechanical, hydraulic, pneumatic, electrical, electronic devices, and computers, usually in combination. Complicated systems, such as modern factories, airplanes, and ships typically use combinations of all of these techniques. The benefits of automation include labor savings, reduced waste, savings in electricity costs, savings in material costs, and improvements to quality, accuracy, and precision. Automation includes the use of various equipment and control systems such as machinery, processes in factories, boilers, and heat-treating ovens, switching on telephone networks, steering, stabilization of ships, aircraft and other applications and vehicles with reduced human intervention. Examples range from a household thermostat controlling a boiler to a large industrial control system with tens of thousands of input measurements and output control signals. In the simplest type of an automatic control loop, a controller compares a measured value of a process with a desired set value and processes the resulting error signal to change some input to the process, in such a way that the process stays at its set point despite disturbances. This closed-loop control is an application of negative feedback to a system. The mathematical basis of control theory began in the 18th century and advanced rapidly in the 20th. The term automation, inspired by the earlier word automatic (coming from automaton), was not widely used before 1947, when Ford established an automation department. It was during this time that industry was rapidly adopting feedback controllers. Technological advancements introduced in the 1930s revolutionized various industries significantly.
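The simplest control loop described above (measure, compare with the set value, act on the error) can be sketched as a small simulation. This is only an illustrative toy model under stated assumptions: the function name, the purely proportional control law, and the one-line process model are hypothetical and do not correspond to any particular industrial controller.

```python
def simulate_proportional_control(set_point, initial, gain, disturbance, steps):
    """Simulate a discrete-time closed loop with negative feedback.

    Assumed toy process: each step, the process value changes by the
    control action plus a constant disturbance.
    """
    value = initial
    history = []
    for _ in range(steps):
        error = set_point - value               # compare measurement with set value
        control = gain * error                  # turn the error into a control action
        value = value + control + disturbance   # process responds; disturbance acts
        history.append(value)
    return history


# Without a disturbance the process value approaches the set point.
undisturbed = simulate_proportional_control(20.0, 15.0, 0.5, 0.0, 60)

# With a constant disturbance, this toy loop settles near the set point
# but with a small remaining offset.
disturbed = simulate_proportional_control(20.0, 15.0, 0.5, -0.2, 60)
```

In this model the steady state satisfies `gain * (set_point - value) + disturbance = 0`, so the disturbed run settles at 19.6, not 20. Removing that residual offset is one reason practical controllers often add further terms to the control law.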
The World Bank's World Development Report of 2019 shows evidence that the new industries and jobs in the technology sector outweigh the economic effects of workers being displaced by automation. Job losses and downward mobility blamed on automation have been cited as one of many factors in the resurgence of nationalist, protectionist and populist politics in the US, UK and France, among other countries since the 2010s. == History == === Early history === It was a preoccupation of the Greeks and Arabs (in the period between about 300 BC and about 1200 AD) to keep an accurate track of time. In Ptolemaic Egypt, about 270 BC, Ctesibius described a float regulator for a water clock, a device not unlike the ball and cock in a modern flush toilet. This was the earliest feedback-controlled mechanism. The appearance of the mechanical clock in the 14th century made the water clock and its feedback control system obsolete. The Persian Banū Mūsā brothers, in their Book of Ingenious Devices (850 AD), described a number of automatic controls. Two-step level controls for fluids, a form of discontinuous variable structure controls, were developed by the Banu Musa brothers. They also described a feedback controller. The design of feedback control systems up through the Industrial Revolution was by trial-and-error, together with a great deal of engineering intuition. It was not until the mid-19th century that the stability of feedback control systems was analyzed using mathematics, the formal language of automatic control theory. The centrifugal governor was invented by Christiaan Huygens in the seventeenth century, and used to adjust the gap between millstones. 
=== Industrial Revolution in Western Europe === The introduction of prime movers, or self-driven machines advanced grain mills, furnaces, boilers, and the steam engine created a new requirement for automatic control systems including temperature regulators (invented in 1624; see Cornelius Drebbel), pressure regulators (1681), float regulators (1700) and speed control devices. Another control mechanism was used to tent the sails of windmills. It was patented by Edmund Lee in 1745. Also in 1745, Jacques de Vaucanson invented the first automated loom. Around 1800, Joseph Marie Jacquard created a punch-card system to program looms. In 1771 Richard Arkwright invented the first fully automated spinning mill driven by water power, known at the time as the water frame. An automatic flour mill was developed by Oliver Evans in 1785, making it the first completely automated industrial process. A centrifugal governor was used by Mr. Bunce of England in 1784 as part of a model steam crane. The centrifugal governor was adopted by James Watt for use on a steam engine in 1788 after Watt's partner Boulton saw one at a flour mill Boulton & Watt were building. The governor could not actually hold a set speed; the engine would assume a new constant speed in response to load changes. The governor was able to handle smaller variations such as those caused by fluctuating heat load to the boiler. Also, there was a tendency for oscillation whenever there was a speed change. As a consequence, engines equipped with this governor were not suitable for operations requiring constant speed, such as cotton spinning. Several improvements to the governor, plus improvements to valve cut-off timing on the steam engine, made the engine suitable for most industrial uses before the end of the 19th century. Advances in the steam engine stayed well ahead of science, both thermodynamics and control theory. 
The governor received relatively little scientific attention until James Clerk Maxwell published a paper that established the beginning of a theoretical basis for understanding control theory. === 20th century === Relay logic was introduced with factory electrification, which underwent rapid adaptation from 1900 through the 1920s. Central electric power stations were also undergoing rapid growth and the operation of new high-pressure boilers, steam turbines and electrical substations created a great demand for instruments and controls. Central control rooms became common in the 1920s, but as late as the early 1930s, most process controls were on-off. Operators typically monitored charts drawn by recorders that plotted data from instruments. To make corrections, operators manually opened or closed valves or turned switches on or off. Control rooms also used color-coded lights to send signals to workers in the plant to manually make certain changes. The development of the electronic amplifier during the 1920s, which was important for long-distance telephony, required a higher signal-to-noise ratio, which was solved by negative feedback noise cancellation. This and other telephony applications contributed to the control theory. In the 1940s and 1950s, German mathematician Irmgard Flügge-Lotz developed the theory of discontinuous automatic controls, which found military applications during the Second World War to fire control systems and aircraft navigation systems. Controllers, which were able to make calculated changes in response to deviations from a set point rather than on-off control, began being introduced in the 1930s. Controllers allowed manufacturing to continue showing productivity gains to offset the declining influence of factory electrification. Factory productivity was greatly increased by electrification in the 1920s. U.S. manufacturing productivity growth fell from 5.2%/yr 1919–29 to 2.76%/yr 1929–41. 
Alexander Field notes that spending on non-medical instruments increased significantly from 1929 to 1933 and remained strong thereafter. The First and Second World Wars saw major advancements in the field of mass communication and signal processing. Other key advances in automatic controls include differential equations, stability theory and system theory (1938), frequency domain analysis (1940), ship control (1950), and stochastic analysis (1941). Starting in 1958, various systems based on solid-state digital logic modules for hard-wired programmed logic controllers (the predecessors of programmable logic controllers [PLC]) emerged to replace electro-mechanical relay logic in industrial control systems for process control and automation, including early Telefunken/AEG Logistat, Siemens Simatic, Philips/Mullard/Valvo Norbit, BBC Sigmatronic, ACEC Logacec, Akkord Estacord, Krone Mibakron, Bistat, Datapac, Norlog, SSR, or Procontic systems. In 1959 Texaco's Port Arthur Refinery became the first chemical plant to use digital control. Conversion of factories to digital control began to spread rapidly in the 1970s as the price of computer hardware fell. === Significant applications === The automatic telephone switchboard was introduced in 1892 along with dial telephones. By 1929, 31.9% of the Bell system was automatic. Automatic telephone switching originally used vacuum tube amplifiers and electro-mechanical switches, which consumed a large amount of electricity. Call volume eve

    Read more →
  • Eat App

    Eat App

    Eat App is a global restaurant technology company that provides a cloud-based management platform for restaurants, hotels, and other venues. The platform enables venues to accept online reservations, manage tables, and enhance customer relationship management (CRM). It utilizes AI to improve operational efficiency, provides marketing automation, and helps build a comprehensive guestbook. The company also offers a consumer app and website for discovering and booking restaurant tables online. According to the company, the system has seated over 100 million guests, and the number continues to grow. Eat was founded by Nezar Kadhem and David Feuillard in 2015 and has raised $13M to date from Silicon Valley's 500 Startups, Middle East Venture Partners (MEVP), and Derayah VC, as well as business angels. The company operates worldwide, with offices in Dubai and the United States. == Product overview == === For restaurants === Eat App’s reservation system keeps a digital record of all reservations and of all guests who have previously visited the restaurant, and provides analytics on the restaurant's performance. The table management feature simplifies traditional restaurant operations by providing a live snapshot of current status, seating optimization, and shift management. The CRM and analytics suite gathers and monitors data to build a segmented guestbook for personalized marketing and provides dashboards for data-driven decision-making. The review feature lets restaurants automatically collect reviews from their guests and manage online reviews directly within the platform. Eat App also includes a chit printer function that prints reservation details at host stands. == History == In February 2015, Eat App raised $300k from Bahrain-based business angel group TENMOU.
In June 2018, Eat raised $1.2 million from Dubai-based Middle East Venture Partners (MEVP). In February 2020, Eat App raised $5 million in a Series B funding round led by 500 Startups, Derayah Venture Fund, and MEVP, with participation from a few angel investors and family members. In February 2021, Eat App launched its technology with The Emaar Hospitality Group, implementing it across over 50 restaurants in Emaar properties and hotels. The cloud-based system runs natively on iPads in each restaurant, providing Emaar staff access to reservations and guest information, and integrates with the U by Emaar loyalty app to personalize service. On September 28, 2022, Eat App announced the closing of an $11 million Series B funding round. The investment was led by Middle East Venture Partners (MEVP), 500 Startups, Derayah Venture Capital, Dallah Albaraka, Ali Zaid Al Quraishi & Brothers Company, and Rasameel Investment Company, with participation from existing investors.

    Read more →
  • Jive (software)

    Jive (software)

    Jive (formerly known as Clearspace, then Jive SBS, then Jive Engage) is a commercial Java EE-based Enterprise 2.0 collaboration and knowledge management tool produced by Jive Software. It was first released as "Clearspace" in 2006, then renamed SBS (for "Social Business Software") in March 2009, then renamed "Jive Engage" in 2011, and renamed simply to "Jive" in 2012. Jive integrates the functionality of online communities, microblogging, social networking, discussion forums, blogs, wikis, and IM under one unified user interface. Content placed into any of the systems (blog, wiki, documentation, etc.) can be found through a common search interface. Other features include RSS capability, email integration, a reputation and reward system for participation, personal user profiles, JAX-WS web service interoperability, and integration with the Spring Framework. The product is a pure-Java server-side web application and will run on any platform where Java (JDK 1.5 or higher) is installed. It does not require a dedicated server - users have reported successful deployment in both shared environments and multiple machine clusters. As of Jive 8, released March 30, 2015, there is a Jive-n version which is for internal use (hosted by the consumer or hosted by Jive as a service) and a Jive-x version which is an external version hosted as a service. Jive no longer supports wiki markup language. == Server requirements for Jive 8-n == The following are the server requirements for Jive 8-n Operating systems: RHEL version 6 or 7 for x86_64, CentOS version 6 or 7 for x86_64 or SuSE Enterprise Linux Server (SLES) 11 and 12 for x86_64 Application Servers: Jive ships with its own embedded Apache HTTPD and Tomcat servers as part of the install package. It is not possible to deploy the application onto other appservers. 
Databases: MySQL (5.1, 5.5, 5.6); Oracle (11gR2, 12c); Postgres (9.0, 9.1, 9.2, 9.3, 9.4; 9.2 or higher recommended); Microsoft SQL Server (2008R2, 2012, 2014). Environment: Jive recommends a server with at least 4GB of RAM and a dual-core 2 GHz processor with x86_64 architecture. The product integrates with an LDAP repository or Active Directory. For optimal deployment with a large community, Jive Software recommends using dedicated cache and document-conversion servers, and hosting the application and database servers separately. == Releases == Jive 7, released in October 2013; Jive 8, released on March 30, 2015; Jive 9 (9.0.x), released in November 2016 and currently supported.

    Read more →
  • Word error rate

    Word error rate

    Word error rate (WER) is a common metric of the performance of a speech recognition or machine translation system. The WER metric typically ranges from 0 to 1, where 0 indicates that the compared pieces of text are exactly identical, and 1 (or larger) indicates that they are completely different with no similarity. Thus, a WER of 0.8 means that there is an 80% error rate for the compared sentences. The general difficulty of measuring performance lies in the fact that the recognized word sequence can have a different length from the reference word sequence (supposedly the correct one). The WER is derived from the Levenshtein distance, working at the word level instead of the phoneme level. The WER is a valuable tool for comparing different systems as well as for evaluating improvements within one system. This kind of measurement, however, provides no details on the nature of translation errors and further work is therefore required to identify the main source(s) of error and to focus any research effort. The problem of differing lengths is solved by first aligning the recognized word sequence with the reference (spoken) word sequence using dynamic string alignment. Examination of this issue is seen through a theory called the power law that states the correlation between perplexity and word error rate. Word error rate can then be computed as: $\mathit{WER} = \frac{S + D + I}{N} = \frac{S + D + I}{S + D + C}$ where S is the number of substitutions, D is the number of deletions, I is the number of insertions, C is the number of correct words, and N is the number of words in the reference (N = S + D + C). The intuition behind 'deletion' and 'insertion' is how to get from the reference to the hypothesis. So if we have the reference "This is wikipedia" and hypothesis "This _ wikipedia", we call it a deletion.
Note that since N is the number of words in the reference, the word error rate can be larger than 1.0, namely if the number of insertions I is larger than the number of correct words C. When reporting the performance of a speech recognition system, sometimes word accuracy (WAcc) is used instead: $\mathit{WAcc} = 1 - \mathit{WER} = \frac{N - S - D - I}{N} = \frac{C - I}{N}$ Since the WER can be larger than 1.0, the word accuracy can be smaller than 0.0. == Experiments == It is commonly believed that a lower word error rate shows superior accuracy in recognition of speech, compared with a higher word error rate. However, at least one study has shown that this may not be true. In a Microsoft Research experiment, it was shown that, if people were trained under "that matches the optimization objective for understanding" (Wang, Acero and Chelba, 2003), they would show a higher accuracy in understanding of language than other people who demonstrated a lower word error rate, showing that true understanding of spoken language relies on more than just high word recognition accuracy. == Other metrics == One problem with using a generic formula such as the one above, however, is that no account is taken of the effect that different types of error may have on the likelihood of successful outcome, e.g. some errors may be more disruptive than others and some may be corrected more easily than others. These factors are likely to be specific to the syntax being tested. A further problem is that, even with the best alignment, the formula cannot distinguish a substitution error from a combined deletion plus insertion error.
Hunt (1990) has proposed the use of a weighted measure of performance accuracy in which errors of substitution are weighted at unity but errors of deletion and insertion are both weighted only at 0.5, thus:

$$\mathit{WER} = \frac{S + 0.5D + 0.5I}{N}$$

There is some debate, however, as to whether Hunt's formula may properly be used to assess the performance of a single system, as it was developed as a means of comparing competing candidate systems more fairly. A further complication is whether a given syntax allows for error correction and, if it does, how easy that process is for the user. There is thus some merit to the argument that performance metrics should be developed to suit the particular system being measured.

Whichever metric is used, however, one major theoretical problem in assessing the performance of a system is deciding whether a word has been “mis-pronounced,” i.e. whether the fault lies with the user or with the recogniser. This may be particularly relevant in a system designed to cope with non-native speakers of a given language or with strong regional accents.

The pace at which words should be spoken during the measurement process is also a source of variability between subjects, as is the need for subjects to rest or take a breath. All such factors may need to be controlled in some way.

For text dictation it is generally agreed that performance accuracy at a rate below 95% is not acceptable, but this again may be syntax and/or domain specific, e.g. whether there is time pressure on users to complete the task, whether there are alternative methods of completion, and so on.

The term "Single Word Error Rate" is sometimes used to refer to the percentage of incorrect recognitions for each different word in the system vocabulary.

== Edit distance ==

The word error rate may also be referred to as the length normalized edit distance.
The normalized edit distance between X and Y, d(X, Y), is defined as the minimum of W(P) / L(P), where P is an editing path between X and Y, W(P) is the sum of the weights of the elementary edit operations of P, and L(P) is the number of these operations (the length of P).
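The definition above can be evaluated directly with dynamic programming over the number of operations in a path. The sketch below is illustrative only and rests on assumptions not stated in the text: the function name `normalized_edit_distance` is hypothetical, insertions, deletions and substitutions each have weight 1, and a match is treated as an elementary operation of weight 0 that still counts toward L(P). Two empty sequences are assigned a distance of 0.0 by convention.

```python
import math


def normalized_edit_distance(x, y):
    """Minimum of W(P) / L(P) over all editing paths P between x and y.

    Assumptions: unit weights for insertion, deletion and substitution;
    a match is a zero-weight operation that counts in the path length.
    x and y may be strings or lists of words.
    """
    n, m = len(x), len(y)
    if n == 0 and m == 0:
        return 0.0  # convention chosen for this sketch

    inf = math.inf
    # prev[i][j] = minimum weight of a path that turns x[:i] into y[:j]
    # using exactly k - 1 operations (inf if no such path exists).
    prev = [[inf] * (m + 1) for _ in range(n + 1)]
    prev[0][0] = 0.0
    best = inf

    # A path contains between max(n, m) and n + m operations.
    for k in range(1, n + m + 1):
        cur = [[inf] * (m + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            for j in range(m + 1):
                if i > 0 and j > 0:
                    w = 0.0 if x[i - 1] == y[j - 1] else 1.0
                    cur[i][j] = min(cur[i][j], prev[i - 1][j - 1] + w)
                if i > 0:  # deletion of x[i - 1]
                    cur[i][j] = min(cur[i][j], prev[i - 1][j] + 1.0)
                if j > 0:  # insertion of y[j - 1]
                    cur[i][j] = min(cur[i][j], prev[i][j - 1] + 1.0)
        if cur[n][m] < inf:
            best = min(best, cur[n][m] / k)
        prev = cur
    return best
```

With these weights, `normalized_edit_distance(["a", "b"], ["a"])` is 0.5: the cheapest path is one match followed by one deletion, giving W(P) = 1 and L(P) = 2. The run time grows as (n + m) · n · m, which is acceptable for sentence-length inputs but not tuned for long sequences.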

    Read more →