AI For Students Gemini

AI For Students Gemini — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Adobe Prelude

    Adobe Prelude

    Adobe Prelude was an ingest and logging software application for tagging media with metadata for searching, post-production workflows, and footage lifecycle management. Adobe Prelude is also made to work closely with Adobe Premiere Pro. It is part of the Adobe Creative Cloud and is geared towards professional video editing alone or with a group. The software also offers features like rough cut creation. A speech transcription feature was removed in December 2014. == History == Adobe announced that on April 23, 2012 Adobe OnLocation would be shut down and Adobe Prelude would launch on May 7, 2012. Adobe stated OnLocation's production was stopping because of the growing trend in the industry toward tapeless, native workflows, Adobe stresses that Adobe Prelude is not a direct replacement for OnLocation. Adobe OnLocation was available in CS5 but not in CS6 and Adobe Prelude is only available in CS6. Adobe still offers technical support for OnLocation. In 2021, Adobe announced they would be discontinuing Adobe Prelude, starting by removing it from their website on September 8, 2021. Support for existing users will continue through September 8, 2024. == Features == Prelude is used to tag media, log data, create and export metadata and generate rough cuts that can be sent to Adobe Premiere Pro. A user can add a tag to a piece of media that will show up on Premiere Pro or if another user opens that media with Prelude. Ingest Footage Prelude can ingest all kinds of file types. Once ingested, Prelude can duplicate, transcode and verify the files. Log Footage Prelude can log data only using the keyboard. Create Rough Cuts Prelude is able to generate Rough Cuts. Rough Cuts are a combination of sub clips that will hold any metadata a user feeds into it. Rough cuts can hold metadata such as markers and comments, and this metadata will stay on this footage. Workflow Accessibility Prelude is an XMP - based open platform that allows for custom integration into many video editing platforms. == Features from OnLocation == Many features from Adobe OnLocation went to Adobe Prelude or Adobe Premiere Pro. Adobe OnLocation thrived on tape - based cameras and setting up a shot before shooting it, with the change in the industry, this problem is irrelevant in post production. Adobe OnLocation also allowed the user to add tags and scripting metadata that would carry over to Premiere Pro. OnLocation also had a Media Browser pane, which is the standard for any Adobe program today, Prelude has this Media Browser as well. == Prelude Live Logger == Prelude Live Logger is an application integrated with Prelude CC. Prelude Live Logger is designed to capture notes to use during video logging and editing while you shoot footage on an iPad's camera. Editors can import and combine this metadata with footage from Prelude throughout editing to facilitate various tasks.

    Read more →
  • JustWatch

    JustWatch

    JustWatch is a website that provides information on the availability of films and TV shows on various streaming platforms such as Netflix, HBO Max, Disney+, Hulu, Peacock, Fandango at Home, Apple TV, and Amazon Prime Video, among others. It is also available as a mobile application and smart TV application. JustWatch provides a search engine that allows users to discover which digital platforms host a particular movie or TV series. As of November 2023, JustWatch is available to users in 139 countries. == Features == JustWatch functions as a search engine by aggregating information about the online availability of films and TV series from video-on-demand streaming services. It aggregates information from more than 100 video content libraries, as well providing information about video resolution quality, pricing, and purchase or rental options. The website includes various filters for searching, including genre, price, release date, rating, and popularity. Users are also able to create lists of shows and movies and to share these lists with other users. == History == JustWatch GmbH is an international database company that is privately held and headquartered in Berlin, Germany. The company specializes in the online availability of movies and TV series. In addition to its user-facing website, the company also has an advertising-focused arm, JustWatch Media, that works with corporate clients, using data about what people watch that it gleans from user behavior to help entertainment companies tailor their marketing strategies. Its clients include Universal Pictures, Paramount Pictures, and Sony Pictures, among others. Development of the website began in 2014, and it was launched in the U.S. and Germany in February 2015. In 2018, the company received funding to improve databases within the European Union. In December 2019, the company acquired a rival streaming aggregation service, GoWatchIt, from Plexus Entertainment. JustWatch also used the acquisition to open its first New York office. In 2019, JustWatch had over 30 million users across 38 countries. By 2020, the company's streaming aggregation service was available in over 45 countries. By November 2023, it was available in 139 countries, and had over 40 million monthly users. === Founding === JustWatch was co-founded in 2013 by David Croyé, Cristoph Hoyer, Kevin Hiller, Dominik Raute, Ingke Weimert, and Michael Wilken. In a company blog post from February 2017, Croyé described the group of co-founders as all having previously "worked in leading roles at successful international tech-startups in Berlin." Croyé, who currently holds the title of CEO at JustWatch GmbH, had previously worked as the chief marketing officer at kaufDA, a European location-based mobile coupon and promotion service, and the background of other co-founders included time at the adtech company Trademob and the streaming site MyVideo. Startup capital for the website initially came from the founders themselves. Croyé in particular was able to reinvest funds he had obtained from the sale of kaufDA to Axel Springer, a European media company, in March 2011. Since 2015, the company has had at least one additional round of seed funding, with investors including venture capital groups CG Partners and STS Ventures.

    Read more →
  • Distributed concurrency control

    Distributed concurrency control

    Distributed concurrency control is the concurrency control of a system distributed over a computer network (Bernstein et al. 1987, Weikum and Vossen 2001). In database systems and transaction processing (transaction management) distributed concurrency control refers primarily to the concurrency control of a distributed database. It also refers to the concurrency control in a multidatabase (and other multi-transactional object) environment (e.g., federated database, grid computing, and cloud computing environments. A major goal for distributed concurrency control is distributed serializability (or global serializability for multidatabase systems). Distributed concurrency control poses special challenges beyond centralized one, primarily due to communication and computer latency. It often requires special techniques, like distributed lock manager over fast computer networks with low latency, like switched fabric (e.g., InfiniBand). The most common distributed concurrency control technique is strong strict two-phase locking (SS2PL, also named rigorousness), which is also a common centralized concurrency control technique. SS2PL provides both the serializability and strictness. Strictness, a special case of recoverability, is utilized for effective recovery from failure. For large-scale distribution and complex transactions, distributed locking's typical heavy performance penalty (due to delays, latency) can be saved by using the atomic commitment protocol, which is needed in a distributed database for (distributed) transactions' atomicity.

    Read more →
  • GEPIR

    GEPIR

    GEPIR (Global Electronic Party Information Registry) was a distributed database operated and owned by GS1 that contains basic information on over 1,000,000 companies in over 100 countries. The database could be searched by Global Trade Item Number (GTIN) code (including Universal Product Code (UPC) and EAN-13 codes), container Code (Serial Shipping Container Code (SSCC)), location number (Global Location Number (GLN)), and (in some countries) the company name. A SOAP webservice existed for API access. As of end December 2023, GEPIR was replaced by a service called Verified by GS1. While it operated, GEPIR had more than 1 million members in more than 100 countries. In 2013, all GS1 111 member organisations joined GEPIR. == Access == GEPIR was accessible for free in almost all countries but the number of request per day was limited (from 20 to 30). Since October 2013, GS1 France restricts access to GEPIR to companies (registration with SIREN code was required to use it). A premium access service had been created by GS1 France in January 2010 which allows companies to use GS1 web and SOAP interface without any limit. == System architecture == GEPIR was a lookup service coordinated by the GS1 GO that provided all end users with the ability to look up information about GS1 Identification Keys. Depending on the service, systems were provided by GS1 Member Organisations (MOs) or 3rd party service providers, or both. Where a GS1 MO did not choose to provide the service directly to its end users, the GS1 Global Office provided the service for that geography. Some services involved a technical component deployed by the GS1 Global Office that coordinates the systems provided by GS1 MOs and/or 3rd party service providers. The GEPIR service was provided by systems deployed by GS1 MOs, with the GS1 GO providing a central point of coordination to federate the local systems. The GS1 GO also provides the MO-level service for MOs that could not or did not wish to deploy their own system.

    Read more →
  • ClearForest

    ClearForest

    ClearForest was an Israeli software company that developed and marketed text analytics and text mining solutions. == History == Founded in 1998, ClearForest had its headquarters just outside Boston and a development center in Or Yehuda. The company was acquired by Reuters in April, 2007. It now markets its services under the names Calais, OpenCalais, and OneCalais. ClearForest was previously venture-backed; its last funding round was led by Greylock Ventures and closed in 2005. Other investors included DB Capital Partners, Pitango, Walden Israel, Booz Allen, JP Morgan Partners and HarbourVest Partners. On February 7, 2008 Reuters announced the launch of Open Calais, a named-entity recognition and semantic analysis service that uses ClearForest technology. On April 30, 2007, Reuters announced that it would acquire ClearForest. Sources estimate the acquisition to be for $25 Million. == Solutions and products == ClearForest offers several hosted solutions, including: OpenCalais, a free web service and open API (for commercial and non-commercial use) that performs named-entity recognition and enables automatic metadata generation using the ClearForest financial module. Semantic Web Services (SWS), an on-demand service that makes ClearForest's natural language processing tools available as a standard web service. A subset of ClearForest's capabilities is available via SWS at no cost. Gnosis, a free Firefox extension that uses SWS to analyze the content of a web page. Gnosis identifies named entities such as people, companies, organizations, geographies and products on the page being viewed. Gnosis also automatically processes pages from Wikipedia, providing additional links for people, geographies and other entities which were not explicitly linked within the subject article. Harvest, a real-time machine-readable news service that uses SWS to process a company's news and document feeds and return machine-readable information about people, companies, locations and over 200 other entities facts and events. ClearForest also offers Text Analytics solutions targeted at specific business problems, including: Equity valuation for hedge funds and alternative investments firms Metadata & database creation for publishers and information providers/services Tapping "voice of customer" for market and survey research firms Quality Early Warning for vehicle, capital equipment & durable goods manufacturers

    Read more →
  • Database index

    Database index

    A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time said table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records. An index is a copy of selected columns of data, from a table, that is designed to enable very efficient search. An index normally includes a "key" or direct link to the original row of data from which it was copied, to allow the complete row to be retrieved efficiently. Some databases extend the power of indexing by letting developers create indexes on column values that have been transformed by functions or expressions. For example, an index could be created on upper(last_name), which would only store the upper-case versions of the last_name field in the index. Another option sometimes supported is the use of partial index, where index entries are created only for those records that satisfy some conditional expression. A further aspect of flexibility is to permit indexing on user-defined functions, as well as expressions formed from an assortment of built-in functions. == Usage == === Support for fast lookup === Most database software includes indexing technology that enables sub-linear time lookup to improve performance, as linear search is inefficient for large databases. Suppose a database contains N data items and one must be retrieved based on the value of one of the fields. A simple implementation retrieves and examines each item according to the test. If there is only one matching item, this can stop when it finds that single item, but if there are multiple matches, it must test everything. This means that the number of operations in the average case is O(N) or linear time. Since databases may contain many objects, and since lookup is a common operation, it is often desirable to improve performance. An index is any data structure that improves the performance of lookup. There are many different data structures used for this purpose. There are complex design trade-offs involving lookup performance, index size, and index-update performance. Many index designs exhibit logarithmic (O(log(N))) lookup performance and in some applications it is possible to achieve flat (O(1)) performance. === Policing the database constraints === Indexes are used to police database constraints, such as UNIQUE, EXCLUSION, PRIMARY KEY and FOREIGN KEY. An index may be declared as UNIQUE, which creates an implicit constraint on the underlying table. Database systems usually implicitly create an index on a set of columns declared PRIMARY KEY, and some are capable of using an already-existing index to police this constraint. Many database systems require that both referencing and referenced sets of columns in a FOREIGN KEY constraint are indexed, thus improving performance of inserts, updates and deletes to the tables participating in the constraint. Some database systems support an EXCLUSION constraint that ensures that, for a newly inserted or updated record, a certain predicate holds for no other record. This can be used to implement a UNIQUE constraint (with equality predicate) or more complex constraints, like ensuring that no overlapping time ranges or no intersecting geometry objects would be stored in the table. An index supporting fast searching for records satisfying the predicate is required to police such a constraint. == Index architecture and indexing methods == === Non-clustered === The data is present in arbitrary order, but the logical ordering is specified by the index. The data rows may be spread throughout the table regardless of the value of the indexed column or expression. The non-clustered index tree contains the index keys in sorted order, with the leaf level of the index containing the pointer to the record (page and the row number in the data page in page-organized engines; row offset in file-organized engines). In a non-clustered index, The physical order of the rows is not the same as the index order. The indexed columns are typically non-primary key columns used in JOIN, WHERE, and ORDER BY clauses. There can be more than one non-clustered index on a database table. === Clustered === Clustering alters the data block into a certain distinct order to match the index, resulting in the row data being stored in order. Therefore, only one clustered index can be created on a given database table. Clustered indexes can greatly increase overall speed of retrieval, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items is selected. Since the physical records are in this sort order on disk, the next row item in the sequence is immediately before or after the last one, and so fewer data block reads are required. The primary feature of a clustered index is therefore the ordering of the physical data rows in accordance with the index blocks that point to them. Some databases separate the data and index blocks into separate files, others put two completely different data blocks within the same physical file(s). === Cluster === When multiple databases and multiple tables are joined, it is called a cluster (not to be confused with clustered index described previously). The records for the tables sharing the value of a cluster key shall be stored together in the same or nearby data blocks. This may improve the joins of these tables on the cluster key, since the matching records are stored together and less I/O is required to locate them. The cluster configuration defines the data layout in the tables that are parts of the cluster. A cluster can be keyed with a B-tree index or a hash table. The data block where the table record is stored is defined by the value of the cluster key. == Column order == The order that the index definition defines the columns in is important. It is possible to retrieve a set of row identifiers using only the first indexed column. However, it is not possible or efficient (on most databases) to retrieve the set of row identifiers using only the second or greater indexed column. For example, in a phone book organized by city first, then by last name, and then by first name, in a particular city, one can easily extract the list of all phone numbers. However, it would be very tedious to find all the phone numbers for a particular last name. One would have to look within each city's section for the entries with that last name. Some databases can do this, others just won't use the index. In the phone book example with a composite index created on the columns (city, last_name, first_name), if we search by giving exact values for all the three fields, search time is minimal—but if we provide the values for city and first_name only, the search uses only the city field to retrieve all matched records. Then a sequential lookup checks the matching with first_name. So, to improve the performance, one must ensure that the index is created on the order of search columns. == Applications and limitations == Indexes are useful for many applications but come with some limitations. Consider the following SQL statement: SELECT first_name FROM people WHERE last_name = 'Smith';. To process this statement without an index the database software must look at the last_name column on every row in the table (this is known as a full table scan). With an index the database simply follows the index data structure (typically a B-tree) until the Smith entry has been found; this is much less computationally expensive than a full table scan. Consider this SQL statement: SELECT email_address FROM customers WHERE email_address LIKE '%@wikipedia.org';. This query would yield an email address for every customer whose email address ends with "@wikipedia.org", but even if the email_address column has been indexed the database must perform a full index scan. This is because the index is built with the assumption that words go from left to right. With a wildcard at the beginning of the search-term, the database software is unable to use the underlying index data structure (in other words, the WHERE-clause is not sargable). This problem can be solved through the addition of another index created on reverse(email_address) and a SQL query like this: SELECT email_address FROM customers WHERE reverse(email_address) LIKE reverse('%@wikipedia.org');. This puts the wild-card at the right-most part of the query (now gro.aidepikiw@%), which the index on reverse(email_address) can satisfy. When the wildcard characters are used on both sides of the search word as %wikipedia.org%, the index available on this field is not used. Rather only a sequential search is performed, which takes ⁠ O ( N ) {\displaystyle

    Read more →
  • Security and Privacy in Computer Systems

    Security and Privacy in Computer Systems

    Security and Privacy in Computer Systems is a paper by Willis Ware that was first presented to the public at the 1967 Spring Joint Computer Conference. == Significance == Ware's presentation was the first public conference session about information security and privacy in respect of computer systems, especially networked or remotely-accessed ones. The IEEE Annals of the History of Computing said that Ware's 1967 Spring Joint Computer Conference session, together with 1970's Ware report, marked the start of the field of computer security.

    Read more →
  • Human image synthesis

    Human image synthesis

    Human image synthesis is technology that can be applied to make believable and even photorealistic renditions of human-likenesses, moving or still. It has effectively existed since the early 2000s. Many films using computer generated imagery have featured synthetic images of human-like characters digitally composited onto the real or other simulated film material. Towards the end of the 2010s deep learning artificial intelligence has been applied to synthesize images and video that look like humans, without need for human assistance, once the training phase has been completed, whereas the old school 7D-route required massive amounts of human work. == Timeline of human image synthesis == In 1971 Henri Gouraud made the first CG geometry capture and representation of a human face. Modeling was his wife Sylvie Gouraud. The 3D model was a simple wire-frame model and he applied the Gouraud shader he is most known for to produce the first known representation of human-likeness on computer. The 1972 short film A Computer Animated Hand by Edwin Catmull and Fred Parke was the first time that computer-generated imagery was used in film to simulate moving human appearance. The film featured a computer simulated hand and face (watch film here). The 1976 film Futureworld reused parts of A Computer Animated Hand on the big screen. The 1983 music video for song Musique Non-Stop by German band Kraftwerk aired in 1986. Created by the artist Rebecca Allen, it features non-realistic looking, but clearly recognizable computer simulations of the band members. The 1994 film The Crow was the first film production to make use of digital compositing of a computer simulated representation of a face onto scenes filmed using a body double. Necessity was the muse as the actor Brandon Lee portraying the protagonist was tragically killed accidentally on-stage. In 1999 Paul Debevec et al. of USC captured the reflectance field of a human face with their first version of a light stage. They presented their method at the SIGGRAPH 2000 In 2003 audience debut of photo realistic human-likenesses in the 2003 films The Matrix Reloaded in the burly brawl sequence where up-to-100 Agent Smiths fight Neo and in The Matrix Revolutions where at the start of the end showdown Agent Smith's cheekbone gets punched in by Neo leaving the digital look-alike unnaturally unhurt. The Matrix Revolutions bonus DVD documents and depicts the process in some detail and the techniques used, including facial motion capture and limbal motion capture, and projection onto models. In 2003 The Animatrix: Final Flight of the Osiris a state-of-the-art want-to-be human likenesses not quite fooling the watcher made by Square Pictures. In 2003 digital likeness of Tobey Maguire was made for movies Spider-man 2 and Spider-man 3 by Sony Pictures Imageworks. In 2005 the Face of the Future project was an established. by the University of St Andrews and Perception Lab, funded by the EPSRC. The website contains a "Face Transformer", which enables users to transform their face into any ethnicity and age as well as the ability to transform their face into a painting (in the style of either Sandro Botticelli or Amedeo Modigliani). This process is achieved by combining the user's photograph with an average face. In 2009 Debevec et al. presented new digital likenesses, made by Image Metrics, this time of actress Emily O'Brien whose reflectance was captured with the USC light stage 5 Motion looks fairly convincing contrasted to the clunky run in the Animatrix: Final Flight of the Osiris which was state-of-the-art in 2003 if photorealism was the intention of the animators. In 2009 a digital look-alike of a younger Arnold Schwarzenegger was made for the movie Terminator Salvation though the end result was critiqued as unconvincing. Facial geometry was acquired from a 1984 mold of Schwarzenegger. In 2010 Walt Disney Pictures released a sci-fi sequel entitled Tron: Legacy with a digitally rejuvenated digital look-alike of actor Jeff Bridges playing the antagonist CLU. In SIGGGRAPH 2013 Activision and USC presented a real-time "Digital Ira" a digital face look-alike of Ari Shapiro, an ICT USC research scientist, utilizing the USC light stage X by Ghosh et al. for both reflectance field and motion capture. The end result both precomputed and real-time rendering with the modernest game GPU shown here and looks fairly realistic. In 2014 The Presidential Portrait by USC Institute for Creative Technologies in conjunction with the Smithsonian Institution was made using the latest USC mobile light stage wherein President Barack Obama had his geometry, textures and reflectance captured. In 2014 Ian Goodfellow et al. presented the principles of a generative adversarial network. GANs made the headlines in early 2018 with the deepfakes controversies. For the 2015 film Furious 7 a digital look-alike of actor Paul Walker who died in an accident during the filming was done by Weta Digital to enable the completion of the film. In 2016 techniques which allow near real-time counterfeiting of facial expressions in existing 2D video have been believably demonstrated. In 2016 a digital look-alike of Peter Cushing was made for the Rogue One film where its appearance would appear to be of same age as the actor was during the filming of the original 1977 Star Wars film. In SIGGRAPH 2017 an audio driven digital look-alike of upper torso of Barack Obama was presented by researchers from University of Washington. It was driven only by a voice track as source data for the animation after the training phase to acquire lip sync and wider facial information from training material consisting 2D videos with audio had been completed. Late 2017 and early 2018 saw the surfacing of the deepfakes controversy where porn videos were doctored using deep machine learning so that the face of the actress was replaced by the software's opinion of what another persons face would look like in the same pose and lighting. In 2018 Game Developers Conference Epic Games and Tencent Games demonstrated "Siren", a digital look-alike of the actress Bingjie Jiang. It was made possible with the following technologies: CubicMotion's computer vision system, 3Lateral's facial rigging system and Vicon's motion capture system. The demonstration ran in near real time at 60 frames per second in the Unreal Engine 4. In 2018 at the World Internet Conference in Wuzhen the Xinhua News Agency presented two digital look-alikes made to the resemblance of its real news anchors Qiu Hao (Chinese language) and Zhang Zhao (English language). The digital look-alikes were made in conjunction with Sogou. Neither the speech synthesis used nor the gesturing of the digital look-alike anchors were good enough to deceive the watcher to mistake them for real humans imaged with a TV camera. In September 2018 Google added "involuntary synthetic pornographic imagery" to its ban list, allowing anyone to request the search engine block results that falsely depict them as "nude or in a sexually explicit situation." In February 2019 Nvidia open sources StyleGAN, a novel generative adversarial network. Right after this Phillip Wang made the website ThisPersonDoesNotExist.com with StyleGAN to demonstrate that unlimited amounts of often photo-realistic looking facial portraits of no-one can be made automatically using a GAN. Nvidia's StyleGAN was presented in a not yet peer reviewed paper in late 2018. At the June 2019 CVPR the MIT CSAIL presented a system titled "Speech2Face: Learning the Face Behind a Voice" that synthesizes likely faces based on just a recording of a voice. It was trained with massive amounts of video of people speaking. Since 1 July 2019 Virginia has criminalized the sale and dissemination of unauthorized synthetic pornography, but not the manufacture., as § 18.2–386.2 titled 'Unlawful dissemination or sale of images of another; penalty.' became part of the Code of Virginia. The law text states: "Any person who, with the intent to coerce, harass, or intimidate, maliciously disseminates or sells any videographic or still image created by any means whatsoever that depicts another person who is totally nude, or in a state of undress so as to expose the genitals, pubic area, buttocks, or female breast, where such person knows or has reason to know that he is not licensed or authorized to disseminate or sell such videographic or still image is guilty of a Class 1 misdemeanor.". The identical bills were House Bill 2678 presented by Delegate Marcus Simon to the Virginia House of Delegates on 14 January 2019 and three-day later an identical Senate bill 1736 was introduced to the Senate of Virginia by Senator Adam Ebbin. Since 1 September 2019 Texas senate bill SB 751 amendments to the election code came into effect, giving candidates in elections a 30-day protection period to the elections during which making and distributing digital look-alikes or synthetic fakes of the candidates is an offense. Th

    Read more →
  • Autonomous aircraft

    Autonomous aircraft

    An autonomous aircraft is an aircraft which flies under the control of on-board autonomous robotic systems and needs no intervention from a human pilot or remote control. Most contemporary autonomous aircraft are unmanned aerial vehicles (drones) with pre-programmed algorithms to perform designated tasks, but advancements in artificial intelligence technologies (e.g. machine learning) mean that autonomous control systems are reaching a point where several air taxis and associated regulatory regimes are being developed. == History == === Unmanned aerial vehicles === The earliest recorded use of an unmanned aerial vehicle for warfighting occurred in July 1849, serving as a balloon carrier (the precursor to the aircraft carrier) Significant development of radio-controlled drones started in the early 1900s, and originally focused on providing practice targets for training military personnel. The earliest attempt at a powered UAV was A. M. Low's "Aerial Target" in 1916. Autonomous features such as the autopilot and automated navigation were developed progressively through the twentieth century, although techniques such as terrain contour matching (TERCOM) were applied mainly to cruise missiles. Before the introduction of the Bayraktar Kızılelma some modern drones have a high degree of autonomy, although they were not fully capable and the regulatory environment prohibits their widespread use in civil aviation. However some limited trials had been undertaken. On December 17, 2025, two Bayraktar Kızılelma performed the world's first autonomous close-formation flight by two unmanned fighter jets, using artificial intelligence. This was the first time in the history of aviation when two unmanned aerial vehicles flew in close formation on their own. === Passengers === As flight, navigation and communications systems have become more sophisticated, safely carrying passengers has emerged as a practical possibility. Autopilot systems are relieving the human pilot of progressively more duties, but the pilot currently remains necessary. A number of air taxis are under development and larger autonomous transports are also being planned. The personal air vehicle is another class where from one to four passengers are not expected to be able to pilot the aircraft and autonomy is seen as necessary for widespread adoption. == Control system architecture == The computing capability of aircraft flight and navigation systems followed the advances of computing technology, beginning with analog controls and evolving into microcontrollers, then system-on-a-chip (SOC) and single-board computers (SBC). === Sensors === Position and movement sensors give information about the aircraft state. Exteroceptive sensors deal with external information like distance measurements, while proprioceptive ones correlate internal and external states. Degrees of freedom (DOF) refers to both the amount and quality of sensors on board: 6 DOF implies 3-axis gyroscopes and accelerometers (a typical inertial measurement unit – IMU), 9 DOF refers to an IMU plus a compass, 10 DOF adds a barometer and 11 DOF usually adds a GPS receiver. === Actuators === UAV actuators include digital electronic speed controllers (which control the RPM of the motors) linked to motors/engines and propellers, servomotors (for planes and helicopters mostly), weapons, payload actuators, LEDs and speakers. === Software === UAV software called the flight stack or autopilot. The purpose of the flight stack is to obtain data from sensors, control motors to ensure UAV stability, and facilitate ground control and mission planning communication. UAVs are real-time systems that require rapid response to changing sensor data. As a result, UAVs rely on single-board computers for their computational needs. Examples of such single-board computers include Raspberry Pis, Beagleboards, etc. shielded with NavIO, PXFMini, etc. or designed from scratch such as NuttX, preemptive-RT Linux, Xenomai, Orocos-Robot Operating System or DDS-ROS 2.0. Civil-use open-source stacks include: Due to the open-source nature of UAV software, they can be customized to fit specific applications. For example, researchers from the Technical University of Košice have replaced the default control algorithm of the PX4 autopilot. This flexibility and collaborative effort has led to a large number of different open-source stacks, some of which are forked from others, such as CleanFlight, which is forked from BaseFlight and from which three other stacks are forked from. === Loop principles === UAVs employ open-loop, closed-loop or hybrid control architectures. Open loop – This type provides a positive control signal (faster, slower, left, right, up, down) without incorporating feedback from sensor data. Closed loop – This type incorporates sensor feedback to adjust behavior (reduce speed to reflect tailwind, move to altitude 300 feet). The PID controller is common. Sometimes, feedforward is employed, transferring the need to close the loop further. == Communications == Most UAVs use a radio for remote control and exchange of video and other data. Early UAVs had only narrowband uplink. Downlinks came later. These bi-directional narrowband radio links carried command and control (C&C) and telemetry data about the status of aircraft systems to the remote operator. For very long range flights, military UAVs also use satellite receivers as part of satellite navigation systems. In cases when video transmission was required, the UAVs will implement a separate analog video radio link. In most modern autonomous applications, video transmission is required. A broadband link is used to carry all types of data on a single radio link. These broadband links can leverage quality of service techniques to optimize the C&C traffic for low latency. Usually, these broadband links carry TCP/IP traffic that can be routed over the Internet. Communications can be established with: Ground control – a military ground control station (GCS). The MAVLink protocol is increasingly becoming popular to carry command and control data between the ground control and the vehicle. Remote network system, such as satellite duplex data links for some military powers. Downstream digital video over mobile networks has also entered consumer markets, while direct UAV control uplink over the cellular mesh and LTE have been demonstrated and are in trials. Another aircraft, serving as a relay or mobile control station – military manned-unmanned teaming (MUM-T). As mobile networks have increased in performance and reliability over the years, drones have begun to use mobile networks for communication. Mobile networks can be used for drone tracking, remote piloting, over the air updates, and cloud computing. Modern networking standards have explicitly considered autonomous aircraft and therefore include optimizations. The 5G standard has mandated reduced user plane latency to 1ms while using ultra-reliable and low-latency communications. == Autonomy == Basic autonomy comes from proprioceptive sensors. Advanced autonomy calls for situational awareness, knowledge about the environment surrounding the aircraft from exteroceptive sensors: sensor fusion integrates information from multiple sensors. Civil aviation regulators and standards bodies have published high-level roadmaps and discussion papers focused on assurance, safety and governance of AI-enabled systems in aviation, particularly as autonomy increases in operations and decision support. === Basic principles === One way to achieve autonomous control employs multiple control-loop layers, as in hierarchical control systems. As of 2016 the low-layer loops (i.e. for flight control) tick as fast as 32,000 times per second, while higher-level loops may cycle once per second. The principle is to decompose the aircraft's behavior into manageable "chunks", or states, with known transitions. Hierarchical control system types range from simple scripts to finite state machines, behavior trees and hierarchical task planners. The most common control mechanism used in these layers is the PID controller which can be used to achieve hover for a quadcopter by using data from the IMU to calculate precise inputs for the electronic speed controllers and motors. Examples of mid-layer algorithms: Path planning: determining an optimal path for vehicle to follow while meeting mission objectives and constraints, such as obstacles or fuel requirements Trajectory generation (motion planning): determining control maneuvers to take in order to follow a given path or to go from one location to another Trajectory regulation: constraining a vehicle within some tolerance to a trajectory Evolved UAV hierarchical task planners use methods like state tree searches or genetic algorithms. === Autonomy features === UAV manufacturers often build in specific autonomous operations, such as: Self-level: attitude stabilization on the pitch and roll axes. Altitude hold: The aircraft maint

    Read more →
  • Synonym (database)

    Synonym (database)

    In databases, a synonym is an alias or alternate name for a table, view, sequence, or other schema object. They are used mainly to make it intuitive for users to access database objects owned by other users. They also hide the underlying object's identity and make it harder for a malicious program or user to target the underlying object (security through obscurity). Because a synonym is just an alternate name for an object, it requires no storage other than its definition. When an application uses a synonym, the DBMS forwards the request to the synonym's underlying base object. By coding your programs to use synonyms instead of database object names, you insulate yourself from any changes in the name, ownership, or object locations, at the cost of adding another layer that also needs to be maintained. Users can also have different needs, for example some may wish to use a shorter name to refer to database objects they often query, which can be done with aliases without having to rename the underlying object and alter the code referring to it. Synonyms are very powerful from the point of view of allowing users access to objects that do not lie within their schema. All synonyms have to be created explicitly with the CREATE SYNONYM command and the underlying objects can be located in the same database or in other databases that are connected by database links There are two major uses of synonyms: Object invisibility: Synonyms can be created to keep the original object hidden from the user. Location invisibility: Synonyms can be created as aliases for tables and other objects that are not part of the local database. When a table or a procedure is created, it is created in a particular schema, and other users can access it only by using that schema's name as a prefix to the object's name. The way around for this is for the schema owner creates a synonym with the same name as the table name. == Public synonyms == Public synonyms are owned by special schema in the Oracle Database called PUBLIC. As mentioned earlier, public synonyms can be referenced by all users in the database. Public synonyms are usually created by the application owner for the tables and other objects such as procedures and packages so the users of the application can see the objects The following code shows how to create a public synonym for the employee table: Now any user can see the table by just typing the original table name. If you wish, you could provide a different table name for that table in the CREATE SYNONYM statement. Remember that the DBA must create public synonyms. Just because you can see a table through public (or private) synonym doesn’t mean that you can also perform SELECT, INSERT, UPDATE or DELETE operations on the table. To be able to perform those operations, a user needs specific privileges for the underlying object, either directly or through roles from the application owner. == Private synonyms == A private synonym is a synonym within a database schema that a developer typically uses to mask the true name of a table, view stored procedure, or other database object in an application schema. Private synonyms, unlike public synonyms, can be referenced only by the schema that owns the table or object. You may want to create private synonyms when you want to refer to the same table by different contexts. Private synonym overrides public synonym definitions. You create private synonyms the same way you create public synonyms, but you omit the PUBLIC keyword in the CREATE statement. The following example shows how to create a private synonym called addresses for the locations table. Note that once you create the private synonym, you can refer to the synonym exactly as you would the original table name. == Drop a synonym == Synonyms, both private and public, are dropped in the same manner by using the DROP SYNONYM command, but there is one important difference. If you are dropping a public synonym; you need to add the keyword PUBLIC after the keyword DROP. The ALL_SYNONYMS (or DBA_SYNONYMS) view provides information on all synonyms in your database.

    Read more →
  • Interlacing (bitmaps)

    Interlacing (bitmaps)

    In computing, interlacing (also known as interleaving) is a method of encoding a bitmap image such that a person who has partially received it sees a degraded copy of the entire image. When communicating over a slow communications link, this is often preferable to seeing a perfectly clear copy of one part of the image, as it helps the viewer decide more quickly whether to abort or continue the transmission. Interlacing is supported by the following formats, where it is optional: GIF interlacing stores the lines in the order 0 , 8 , 16 , … , ( 8 n ) , 4 , 12 , … , ( 8 n + 4 ) , 2 , 6 , 10 , 14 , … , ( 4 n + 2 ) , 1 , 3 , 5 , 7 , 9 , … , ( 2 n + 1 ) . {\displaystyle 0,8,16,\dots ,(8n),\ 4,12,\dots ,(8n+4),\ 2,6,10,14,\dots ,(4n+2),\ 1,3,5,7,9,\dots ,(2n+1).} PNG uses the Adam7 algorithm, which interlaces in both the vertical and horizontal direction. TGA uses two optional interlacing algorithms: Two-way: 0 , 2 , 4 , … , ( 2 n ) , 1 , 3 , … , ( 2 n + 1 ) , {\displaystyle 0,2,4,\dots ,(2n),\ 1,3,\dots ,(2n+1),} And four-way: 0 , 4 , 8 , … , ( 4 n ) , 1 , 5 , … , ( 4 n + 1 ) , 2 , 6 , … , ( 4 n + 2 ) , 3 , 7 , … , ( 4 n + 3 ) . {\displaystyle 0,4,8,\dots ,(4n),\ 1,5,\dots ,(4n+1),\ 2,6,\dots ,\ (4n+2),3,7,\dots ,(4n+3).} JPEG, JPEG 2000, and JPEG XR (actually using a frequency decomposition hierarchy rather than interlacing of pixel values) PGF (also using a frequency decomposition) Interlacing is a form of incremental decoding, because the image can be loaded incrementally. Another form of incremental decoding is progressive scan. In progressive scan the loaded image is decoded line for line, so instead of becoming incrementally clearer it becomes incrementally larger. The main difference between the interlace concept in bitmaps and in video is that even progressive bitmaps can be loaded over multiple frames. For example: Interlaced GIF is a GIF image that seems to arrive on your display like an image coming through a slowly opening Venetian blind. A fuzzy outline of an image is gradually replaced by seven successive waves of bit streams that fill in the missing lines until the image arrives at its full resolution. Interlaced graphics were once widely used in web design and before that in the distribution of graphics files over bulletin board systems and other low-speed communications methods. The practice is much less common today, as common broadband internet connections allow most images to be downloaded to the user's screen nearly instantaneously, and interlacing is usually an inefficient method of encoding images. Interlacing has been criticized because it may not be clear to viewers when the image has finished rendering, unlike non-interlaced rendering, where progress is apparent (remaining data appears as blank). Also, the benefits of interlacing to those on low-speed connections may be outweighed by having to download a larger file, as interlaced images typically do not compress as well.

    Read more →
  • Multi-exposure HDR capture

    Multi-exposure HDR capture

    In photography and videography, multi-exposure HDR capture is a technique that creates high dynamic range (HDR) images (or extended dynamic range images) by taking and combining multiple exposures of the same subject matter at different exposures. Combining multiple images in this way results in an image with a greater dynamic range than what would be possible by taking one single image. The technique can also be used to capture video by taking and combining multiple exposures for each frame of the video. The term "HDR" is used frequently to refer to the process of creating HDR images from multiple exposures. Many smartphones have an automated HDR feature that relies on computational imaging techniques to capture and combine multiple exposures. A single image captured by a camera provides a finite range of luminosity inherent to the medium, whether it is a digital sensor or film. Outside this range, tonal information is lost and no features are visible; tones that exceed the range are "burned out" and appear pure white in the brighter areas, while tones that fall below the range are "crushed" and appear pure black in the darker areas. The ratio between the maximum and the minimum tonal values that can be captured in a single image is known as the dynamic range. In photography, dynamic range is measured in exposure value (EV) differences, also known as stops. The human eye's response to light is non-linear: halving the light level does not halve the perceived brightness of a space, it makes it look only slightly dimmer. For most illumination levels, the response is approximately logarithmic. Human eyes adapt fairly rapidly to changes in light levels. HDR can thus produce images that look more like what a human sees when looking at the subject. This technique can be applied to produce images that preserve local contrast for a natural rendering, or exaggerate local contrast for artistic effect. HDR is useful for recording many real-world scenes containing a wider range of brightness than can be captured directly, typically both bright, direct sunlight and deep shadows. Due to the limitations of printing and display contrast, the extended dynamic range of HDR images must be compressed to the range that can be displayed. The method of rendering a high dynamic range image to a standard monitor or printing device is called tone mapping; it reduces the overall contrast of an HDR image to permit display on devices or prints with lower dynamic range. == Benefits == One aim of HDR is to present a similar range of luminance to that experienced through the human visual system. The human eye, through non-linear response, adaptation of the iris, and other methods, adjusts constantly to a broad range of luminance present in the environment. The brain continuously interprets this information so that a viewer can see in a wide range of light conditions. Most cameras are limited to a much narrower range of exposure values within a single image, due to the dynamic range of the capturing medium. With a limited dynamic range, tonal differences can be captured only within a certain range of brightness. Outside of this range, no details can be distinguished: when the tone being captured exceeds the range in bright areas, these tones appear as pure white, and when the tone being captured does not meet the minimum threshold, these tones appear as pure black. Images captured with non-HDR cameras that have a limited exposure range (low dynamic range, LDR), may lose detail in highlights or shadows. Modern CMOS image sensors have improved dynamic range and can often capture a wider range of tones in a single exposure reducing the need to perform multi-exposure HDR. Color film negatives and slides consist of multiple film layers that respond to light differently. Original film (especially negatives versus transparencies or slides) feature a very high dynamic range (in the order of 8 for negatives and 4 to 4.5 for positive transparencies). Multi-exposure HDR is used in photography and also in extreme dynamic range applications such as welding or automotive work. In security cameras the term "wide dynamic range" is used instead of HDR. === Limitations === A fast-moving subject, or camera movement between the multiple exposures, will generate a "ghost" effect or a staggered-blur strobe effect due to the merged images not being identical. Unless the subject is static and the camera mounted on a tripod there may be a tradeoff between extended dynamic range and sharpness. Sudden changes in the lighting conditions (strobed LED light) can also interfere with the desired results, by producing one or more HDR layers that do have the luminosity expected by an automated HDR system, though one might still be able to produce a reasonable HDR image manually in software by rearranging the image layers to merge in order of their actual luminosity. Because of the nonlinearity of some sensors image artifacts can be common. Camera characteristics such as gamma curves, sensor resolution, noise, photometric calibration and color calibration affect resulting high-dynamic-range images. == Process == High-dynamic-range photographs are generally composites of multiple standard dynamic range images, often captured using exposure bracketing. Afterwards, photo manipulation software merges the input files into a single HDR image, which is then also tone mapped in accordance with the limitations of the planned output or display. === Capturing multiple images (exposure bracketing) === Any camera that allows manual exposure control can perform multi-exposure HDR image capture, although one equipped with automatic exposure bracketing (AEB) facilitates the process. Some cameras have an AEB feature that spans a far greater dynamic range than others, from ±0.6 in simpler cameras to ±18 EV in top professional cameras, as of 2020. The exposure value (EV) refers to the amount of light applied to the light-sensitive detector, whether film or digital sensor such as a CCD. An increase or decrease of one stop is defined as a doubling or halving of the amount of light captured. Revealing detail in the darkest of shadows requires an increased EV, while preserving detail in very bright situations requires very low EVs. EV is controlled using one of two photographic controls: varying either the size of the aperture or the exposure time. A set of images with multiple EVs intended for HDR processing should be captured only by altering the exposure time; altering the aperture size also would affect the depth of field and so the resultant multiple images would be quite different, preventing their final combination into a single HDR image. Multi-exposure HDR photography generally is limited to still scenes because any movement between successive images will impede or prevent success in combining them afterward. Also, because the photographer must capture three or more images to obtain the desired luminance range, taking such a full set of images takes extra time. Photographers have developed calculation methods and techniques to partially overcome these problems, but the use of a sturdy tripod is advised to minimize framing differences between exposures. === Merging the images into an HDR image === Tonal information and details from shadow areas can be recovered from images that are deliberately overexposed (i.e., with positive EV compared to the correct scene exposure), while similar tonal information from highlight areas can be recovered from images that are deliberately underexposed (negative EV). The process of selecting and extracting shadow and highlight information from these over/underexposed images and then combining them with image(s) that are exposed correctly for the overall scene is known as exposure fusion. Exposure fusion can be performed manually, relying on the HDR operator's judgment, experience, and training, but usually, fusion is performed automatically by software. === Storing === Information stored in high-dynamic-range images typically corresponds to the physical values of luminance or radiance that can be observed in the real world. This is different from traditional digital images, which represent colors as they should appear on a monitor or a paper print. Therefore, HDR image formats are often called scene-referred, in contrast to traditional digital images, which are device-referred or output-referred. Furthermore, traditional images are usually encoded for the human visual system (maximizing the visual information stored in the fixed number of bits), which is usually called gamma encoding or gamma correction. The values stored for HDR images are often gamma compressed using mathematical functions such as power laws logarithms, or floating point linear values, since fixed-point linear encodings are increasingly inefficient over higher dynamic ranges. HDR images often do not use fixed ranges per color channel, other than traditional images, to represent many more colors over a much wi

    Read more →
  • Vision transformer

    Vision transformer

    A vision transformer (ViT) is a transformer designed for computer vision. A ViT decomposes an input image into a series of patches (rather than text into tokens), serializes each patch into a vector, and maps it to a smaller dimension with a single matrix multiplication. These vector embeddings are then processed by a transformer encoder as if they were token embeddings. ViTs were designed as alternatives to convolutional neural networks (CNNs) in computer vision applications. They have different inductive biases, training stability, and data efficiency. Compared to CNNs, ViTs are less data efficient, but have higher capacity. Some of the largest modern computer vision models are ViTs, such as one with 22B parameters. Subsequent to its publication, many variants were proposed, with hybrid architectures with both features of ViTs and CNNs. ViTs have found application in image recognition, image segmentation, weather prediction, and autonomous driving. == History == Transformers were introduced in Attention Is All You Need (2017), and have found widespread use in natural language processing. A 2019 paper applied ideas from the Transformer to computer vision. Specifically, they started with a ResNet, a standard convolutional neural network used for computer vision, and replaced all convolutional kernels by the self-attention mechanism found in a Transformer. It resulted in superior performance. However, it is not a Vision Transformer. In 2020, an encoder-only Transformer was adapted for computer vision, yielding the ViT, which reached state of the art in image classification, overcoming the previous dominance of CNN. The masked autoencoder (2022) extended ViT to work with unsupervised training. The vision transformer and the masked autoencoder, in turn, stimulated new developments in convolutional neural networks. Subsequently, there was cross-fertilization between the previous CNN approach and the ViT approach. In 2021, some important variants of the Vision Transformers were proposed. These variants are mainly intended to be more efficient, more accurate or better suited to a specific domain. Two studies improved efficiency and robustness of ViT by adding a CNN as a preprocessor. The Swin Transformer achieved state-of-the-art results on some object detection datasets such as COCO, by using convolution-like sliding windows of attention mechanism, and the pyramid process in classical computer vision. == Overview == The basic architecture, used by the original 2020 paper, is as follows. In summary, it is a BERT-like encoder-only Transformer. The input image is of type R H × W × C {\displaystyle \mathbb {R} ^{H\times W\times C}} , where H , W , C {\displaystyle H,W,C} are height, width, channel (RGB). It is then split into square-shaped patches of type R P × P × C {\displaystyle \mathbb {R} ^{P\times P\times C}} . For each patch, the patch is pushed through a linear operator, to obtain a vector ("patch embedding"). The position of the patch is also transformed into a vector by "position encoding" (the paper tried no embedding, 1D embedding, 2D embedding, and relative embedding: 1D was adopted). The two vectors are added, then pushed through several Transformer encoders. The attention mechanism in a ViT repeatedly transforms representation vectors of image patches, incorporating more and more semantic relations between image patches in an image. This is analogous to how in natural language processing, as representation vectors flow through a transformer, they incorporate more and more semantic relations between words, from syntax to semantics. The above architecture turns an image into a sequence of vector representations. To use these for downstream applications, an additional head needs to be trained to interpret them. For example, to use it for classification, one can add a shallow MLP on top of it that outputs a probability distribution over classes. The original paper uses a linear-GeLU-linear-softmax network. == Variants == === Original ViT === The original ViT was an encoder-only Transformer supervise-trained to predict the image label from the patches of the image. As in the case of BERT, it uses a special token in the input side, and the corresponding output vector is used as the only input of the final output MLP head. The special token is an architectural hack to allow the model to compress all information relevant for predicting the image label into one vector. Transformers found their initial applications in natural language processing tasks, as demonstrated by language models such as BERT and GPT-3. By contrast the typical image processing system uses a convolutional neural network (CNN). Well-known projects include Xception, ResNet, EfficientNet, DenseNet, and Inception. Transformers measure the relationships between pairs of input tokens (words in the case of text strings), termed attention. The cost is quadratic in the number of tokens. For images, the basic unit of analysis is the pixel. However, computing relationships for every pixel pair in a typical image is prohibitive in terms of memory and computation. Instead, ViT computes relationships among pixels in various small sections of the image (e.g., 16x16 pixels), at a drastically reduced cost. The sections (with positional embeddings) are placed in a sequence. The embeddings are learnable vectors. Each section is arranged into a linear sequence and multiplied by the embedding matrix. The result, with the position embedding is fed to the transformer. === Architectural improvements === ==== Pooling ==== After the ViT processes an image, it produces some embedding vectors. These must be converted to a single class probability prediction by some kind of network. In the original ViT and Masked Autoencoder, they used a dummy [CLS] token, in emulation of the BERT language model. The output at [CLS] is the classification token, which is then processed by a LayerNorm-feedforward-softmax module into a probability distribution. Global average pooling (GAP) does not use the dummy token, but simply takes the average of all output tokens as the classification token. It was mentioned in the original ViT as being equally good. Multihead attention pooling (MAP) applies a multiheaded attention block to pooling. Specifically, it takes as input a list of vectors x 1 , x 2 , … , x n {\displaystyle x_{1},x_{2},\dots ,x_{n}} , which might be thought of as the output vectors of a layer of a ViT. The output from MAP is M u l t i h e a d e d A t t e n t i o n ( Q , V , V ) {\displaystyle \mathrm {MultiheadedAttention} (Q,V,V)} , where q {\displaystyle q} is a trainable query vector, and V {\displaystyle V} is the matrix with rows being x 1 , x 2 , … , x n {\displaystyle x_{1},x_{2},\dots ,x_{n}} . This was first proposed in the Set Transformer architecture. Later papers demonstrated that GAP and MAP both perform better than BERT-like pooling. A variant of MAP was proposed as class attention, which applies MAP, then feedforward, then MAP again. Re-attention was proposed to allow training deep ViT. It changes the multiheaded attention module. === Masked Autoencoder === The Masked Autoencoder took inspiration from denoising autoencoders and context encoders. It has two ViTs put end-to-end. The first one ("encoder") takes in image patches with positional encoding, and outputs vectors representing each patch. The second one (called "decoder", even though it is still an encoder-only Transformer) takes in vectors with positional encoding and outputs image patches again. ==== Training ==== During training, input images (224px x 224 px in the original implementation) are split along a designated number of lines on each axis, producing image patches. A certain percentage of patches are selected to be masked out by mask tokens, while all others are retained in the image. The network is tasked with reconstructing the image from the remaining unmasked patches. Mask tokens in the original implementation are learnable vector quantities. A linear projection with positional embeddings is then applied to the vector of unmasked patches. Experiments varying mask ratio on networks trained on the ImageNet-1K dataset found 75% mask ratios achieved high performance on both finetuning and linear-probing of the encoder's latent space. The MAE processes only unmasked patches during training, increasing the efficiency of data processing in the encoder and lowering the memory usage of the transformer. A less computationally-intensive ViT is used for the decoder in the original implementation of the MAE. Masked patches are added back to the output of the encoder block as mask tokens and both are fed into the decoder. A reconstruction loss is computed for the masked patches to assess network performance. ==== Prediction ==== In prediction, the decoder architecture is discarded entirely. The input image is split into patches by the same algorithm as in training, but no patches are masked out. A linear projection wi

    Read more →
  • Data item

    Data item

    A data item describes an atomic state of a particular object concerning a specific property at a certain time point. A collection of data items for the same object at the same time forms an object instance (or table row). Any type of complex information can be broken down to elementary data items (atomic state). Data items are identified by object (o), property (p) and time (t), while the value (v) is a function of o, p and t: v = F(o,p,t). Values typically are represented by symbols like numbers, texts, images, sounds or videos. Values are not necessarily atomic. A value's complexity depends on the complexity of the property and time component. When looking at databases or XML files, the object is usually identified by an object name or other type of object identifier, which is part of the "data". Properties are defined as columns (table row), properties (object instance) or tags (XML). Often, time is not explicitly expressed and is an attribute applying to the complete data set. Other data collections provide time on the instance level (time series), column level, or even attribute/property level.

    Read more →
  • Digital on-screen graphics by country

    Digital on-screen graphics by country

    Digital on-screen graphics by country are the varying logos and differences of digital on-screen graphics in different countries and regions. == Overview == Digital on-screen graphics (DOGs; also called a digitally originated graphic, bug, network bug, on-screen bug, or screenbug) are almost always placed in one of four corners: the top left, the top right, the bottom left, or the bottom right. There are few exceptions to this rule: most notably, Saturday! in Russia, which places their DOG in the top center. Many news broadcasters, as well as a few television networks, also place a clock alongside their bug. In the United States, Canada, Australia, and New Zealand, DOGs may also include the show's parental guideline rating. In Australia, this is known as a Program Return Graphic (PRG). It has become common to place text above the station's logo advertising other programs on the network. In many countries, some TV networks insert the word "live" near the DOG to advise viewers that the program is live, rather than pre-recorded. During televised sports events, a DOG may also display game-related statistics such as the current score. This has led people in Canada and the United States to refer to such a DOG as a score bug. In many countries, DOGs are removed in non-program sections such as commercials and program trailers, but TV channels in some other countries have retained in full color or instead replaced them in either of these sections or in both sections (like Turkey, Indonesia, Italy, the entirety of South Asia, Vietnam, Taiwan, and Russia). == MENA == === Arab world === Arabic TV logos are placed in the top-right and top-left except for Al-Jazeera, whose logo appears on the bottom-right of the screen. Some Arabian TV stations hide their logos during commercial breaks and promos/trailers, such as Dubai TV, Dubai One, Funoon, the Egyptian CBC and Nile TV networks, ART Hekayat, ART Hekayat 2, Iqraa, and Al-Jazeera. Abu Dhabi TV and MBC1 initially had their logos at the bottom-right corner from their launch until the mid-2000s, when they were moved to the top-right corner. === Iran === Iranian broadcaster IRIB introduced DOGs in early 2000s. Unlike other Middle Eastern nations that introduced DOGs on their TV networks in 1990s, Iran was very late in this practice. Almost all Iranian TV channels display DOGs at top-left corner of the screen. The few exception is IRIB-owned channels remove DOGs during news broadcasts. === Israel === In Israel, Television DOGs were first introduced in 1991. Israeli channel watermarks most often appear on the top left or the top right corner since Israeli cable and satellite-based services often have the channel description and programming (OSD) on the bottom of the screen. Most channels have an opaque, full-color watermark, though exceptions exist, for example Channel 9, which displays a blue-tinted semi-transparent logo. In ad breaks, it is required to replace the channel watermark with another symbol – sometimes on the other edge of the screen – indicating there are ads at the moment. The Israel Broadcasting Authority, whose channels placed their logos in the top left corner, ceased broadcasting in May 2017. The new public broadcaster, the Israeli Public Broadcasting Corporation, displays its logos at the top right instead. The erstwhile Channel 2 as well as its successors, Keshet 12 and Reshet 13, also use the top right corner. However, Channel 10 used the top left corner before rebranding to Eser (Literally "Ten") in 2017 and simultaneously moving its logo to the top right (Not long after, in January 2019, it ceased broadcasting as it merged with Reshet 13). Channel 14 as well as its predecessor Channel 20 use the top right corner as well. The Knesset Channel, however, uses the top left corner. === Morocco === The SNRT and 2M And Al-Aoula Uses permanent on-screen DOGs for their TV channels. In contrast, other channels such as Medi 1 TV hide their DOGs during commercial breaks. == Asia == === Brunei === Radio Television Brunei introduced DOGs in 1994. Like TV channels from neighbouring Malaysia, all DOGs are removed during advertisement breaks. === Cambodia === Cambodian TV channels introduced DOGs in 1995. Like Thailand, all logos are full-color and displayed on the top-right corner of the screen. Some channels such as TV5 hide their logos during commercial breaks. Hang Meas HDTV Logo on the top-left corner of the screen, CTN (Cambodian Television Network), MyTV, Bayon TV, PNN, Logo on the top-right corner of the screen. === China === TV stations in mainland China always place their logo (usually semi-transparent and sometimes animated) in the top-left corner of the screen in full-color or grey-scale. Regardless of the content being broadcast (program or advertisements), some channels like Phoenix Television hide their logos during commercial breaks; although in some rare cases, the DOG may be placed elsewhere to avoid covering the score bug during the broadcast of a sporting event. China introduced logos in 1983 on the bottom-left corner of the screen, but they were used only during commercial breaks and clock idents. Later China Central Television (CCTV) introduced permanent DOGs for all programs in 1992, on the top-left corner of the screen. China also displays a clock on top-right corner of the screen for 1 minute between 59:30–00:30 & 29:30–30:30 time in transition between programs. === Hong Kong === Hong Kong TV introduced DOGs in 1994. Hong Kong DOGs can be either of full color or semi-transparent and (except for RTHK 31) always be hidden during commercial breaks. Television Broadcasts Limited (TVB) placed their logos at the top-right corner of the screen while now-defunct Asia Television and other channels placed their logos at the top-left corner of the screen. Sometimes, weather information, date, and time clocks had been used alongside DOGs in news programs, continuity & live broadcasts. === India === The first on-screen logo in India was introduced in 1984 by DD2 Metro (now DD News). It was white and slightly transparent. All Indian TV channels have on-screen logos. They are always full-colors, never transparent, and they are almost never removed during commercial breaks (though the channels of the South Indian Sun TV Network did so until 2015). The great majority of Indian TV channels place their logos in the top right corner of the screen, though there are exceptions. The corner used may be broadcaster-dependent. Among the big national broadcasters: Channels from the Sony network always use the top right corner, without exception. Star channels also use the top right, with the exception of National Geographic and Nat Geo Wild, which use the top left corner in line with their international counterparts. Past exceptions include The History Channel, whose logo was placed in the top left until it rebranded to Fox History & Entertainment in 2008; the now-defunct Channel V, which used the top left between 2013 and 2016; and Nat Geo People, Nat Geo Music and BabyTV, were withdrawn from India in June 2019. TV18 and Viacom18 channels use the top right corner as well, with the exceptions of regional-language movie channels (e.g., Colors Kannada Cinema and Colors Gujarati Cinema) as well as Colors Super, which have shown their logos at the top left corner since 2018; and VH1, which has always used the bottom right corner. Also, CNBC-TV18, CNBC Awaaz and CNBC Bajar use the bottom right. Moreover, MTV showed its logo in the top left corner until 23 April 2018, when it was moved to the top right (its HD version, launched in 2017, has always used the top right). Unlike most other major networks, the Zee Network's non-news channels containing 'Zee' in their name display their logos at the top left corner and not the top right. This has been the case since 15 October 2017, when almost all the Zee-branded TV channels of the Zee network rebranded with a new logo and, in many cases, a new graphics package and look. Before then, the logos were shown at the top right, as with other broadcasters. (News channels' logos—i.e., logos of channels owned by Zee Media Corporation—stayed put at the top right corner, with the exception of WION, which uses the bottom left.) All the major Zee-branded channels—such as Zee TV, Zee Cinema, Zee Café and the regional-language channels like Zee Tamil, Zee Telugu, Zee Marathi and Zee Bangla—show their logos at the top left; moreover, the Odia-language channel Sarthak TV rebranded to Zee Sarthak and moved its logo to the top left. Among the Zee channels not containing the word 'Zee' that moved their logos to the top left during the big rebrand in 2017 was English movie channel Zee Studio; when it was renamed to &flix on 3 June 2018, the logo remained at the top left. Moreover, Hindi movie channel &pictures has always shown its logo at the top left since its launch in 2013. However, &privé HD, Zee's other English movie channel, and Hindi entertainment channel &TV place the

    Read more →