AI Writing Helper

AI Writing Helper — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Video browsing

    Video browsing

    Video browsing, also known as exploratory video search, is the interactive process of skimming through video content in order to satisfy some information need or to interactively check if the video content is relevant. While originally proposed to help users inspecting a single video through visual thumbnails, modern video browsing tools enable users to quickly find desired information in a video archive by iterative human–computer interaction through an exploratory search approach. Many of these tools presume a smart user that wants features to interactively inspect video content, as well as automatic content filtering features. For that purpose, several video interaction features are usually provided, such as sophisticated navigation in video or search by a content-based query. Video browsing tools often build on lower-level video content analysis, such as shot transition detection, keyframe extraction, semantic concept detection, and create a structured content overview of the video file or video archive. Furthermore, they usually provide sophisticated navigation features, such as advanced timelines, visual seeker bars or a list of selected thumbnails, as well as means for content querying. Examples of content queries are shot filtering through visual concepts (e.g., only shots showing cars), through some specific characteristics (e.g., color or motion filtering), through user-provided sketches (e.g., a visually drawn sketch), or through content-based similarity search. == History == Video browsing was originally proposed by Iranian engineer Farshid Arman, Taiwanese computer scientist Arding Hsu, and computer scientist Ming-Yee Chiu, while working at Siemens, and it was presented at the ACM International Conference in August 1993. They described a shot detection algorithm for compressed video that was originally encoded with discrete cosine transform (DCT) video coding standards such as JPEG, MPEG and H.26x. The basic idea was that, since the DCT coefficients are mathematically related to the spatial domain and represent the content of each frame, they can be used to detect the differences between video frames. In the algorithm, a subset of blocks in a frame and a subset of DCT coefficients for each block are used as motion vector representation for the frame. By operating on compressed DCT representations, the algorithm significantly reduces the computational requirements for decompression and enables effective video browsing. The algorithm represents separate shots of a video sequence by an r-frame, a thumbnail of the shot framed by a motion tracking region. A variation of this concept was later adopted for QBIC video content mosaics, where each r-frame is a salient still from the shot it represents. === Video Notebook === Modern video browsing solutions include Video Notebook, a Menlo Park startup founded in 2021 by Mike Lanza, which uses computer vision to extract slides and optical character recognition and speech recognition to facilitate video search. The software can be either used on the client side (using a browser extension), where the slides and text are extracted while the video is watched (e.g. on a video platform like YouTube or Udemy), or on the server side. Processed videos, which can be viewed in the Video Notebook web app, feature a video browsing user interface with extracted timestamped slides, a search bar for querying the video (or a collection of videos), and text chapters. Video Notebook customers include organisations like Ernst & Young. === Video Browser Showdown === The Video Browser Showdown (VBS) is an annual live evaluation competition for exploratory video search tools, where international researchers use video browsing tools to solve ad-hoc video search tasks on a moderately large data set as fast as possible. The main goal of the VBS, which started in 2012 at the International Conference on MultiMedia Modeling (MMM), is to advance the performance of video browsing tools. Since 2016, the VBS also collaborates with TRECVID. The aim of the VBS is to evaluate video browsing tools for efficiency at known-item search (KIS) tasks with a well-defined data set in direct comparison to other tools.

    Read more →
  • Autocommit

    Autocommit

    In the context of data management, autocommit is a mode of operation of a database connection. Each individual database interaction (i.e., each SQL statement) submitted through the database connection in autocommit mode will be executed in its own transaction that is implicitly committed. A SQL statement executed in autocommit mode cannot be rolled back. Autocommit mode incurs per-statement transaction overhead and can often lead to undesirable performance or resource utilization impact on the database. Nonetheless, in systems such as Microsoft SQL Server, as well as connection technologies such as ODBC and Microsoft OLE DB, autocommit mode is the default for all statements that change data, in order to ensure that individual statements will conform to the ACID (atomicity-consistency-isolation-durability) properties of transactions. The alternative to autocommit mode (non-autocommit) means that the SQL client application itself is responsible for ending transactions explicitly via the commit or rollback SQL commands. Non-autocommit mode enables grouping of multiple data manipulation SQL commands into a single atomic transaction. Some DBMS (e.g. MariaDB) force autocommit for every DDL statement, even in non-autocommit mode. In this case, before each DDL statement, previous DML statements in transaction are autocommitted. Each DDL statement is executed in its own new autocommit transaction.

    Read more →
  • Database

    Database

    In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a database system. Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database. Before digital storage and retrieval of data became widespread, index cards were used for data storage in a wide range of applications and environments: in the home to record and store recipes, shopping lists, contact information and other organizational data; in business to record presentation notes, project research and notes, and contact information; in schools as flash cards or other visual aids; and in academic research to hold data such as bibliographical citations or notes in a card file. Professional book indexers used index cards in the creation of book indexes until they were replaced by indexing software in the 1980s and 1990s. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases spans formal techniques and practical considerations, including data modeling, efficient data representation and storage, query languages, security and privacy of sensitive data, and distributed computing issues, including supporting concurrent access and fault tolerance. Computer scientists may classify database management systems according to the database models that they support. Relational databases became dominant in the 1980s. These model data as rows and columns in a series of tables, and the vast majority use SQL for writing and querying data. In the 2000s, non-relational databases became popular, collectively referred to as NoSQL, because they use different query languages. == Terminology and overview == Formally, a "database" refers to a set of related data accessed through the use of a "database management system" (DBMS), which is an integrated set of computer software that allows users to interact with one or more databases and provides access to all of the data contained in the database (although restrictions may exist that limit access to particular data). The DBMS provides various functions that allow entry, storage and retrieval of large quantities of information and provides ways to manage how that information is organized. Because of the close relationship between them, the term "database" is often used casually to refer to both a database and the DBMS used to manipulate it. Outside the world of professional information technology, the term database is often used to refer to any collection of related data (such as a spreadsheet or a card index) as size and usage requirements typically necessitate use of a database management system. Existing DBMSs provide various functions that allow management of a database and its data which can be classified into four main functional groups: Data definition – Creation, modification and removal of definitions that detail how the data is to be organized. Update – Insertion, modification, and deletion of the data itself. Retrieval – Selecting data according to specified criteria (e.g., a query, a position in a hierarchy, or a position in relation to other data) and providing that data either directly to the user, or making it available for further processing by the database itself or by other applications. The retrieved data may be made available in a more or less direct form without modification, as it is stored in the database, or in a new form obtained by altering it or combining it with existing data from the database. Administration – Registering and monitoring users, enforcing data security, monitoring performance, maintaining data integrity, dealing with concurrency control, and recovering information that has been corrupted by some event such as an unexpected system failure. Both a database and its DBMS conform to the principles of a particular database model. "Database system" refers collectively to the database model, database management system, and database. Physically, database servers are dedicated computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with generous memory and RAID disk arrays used for stable storage. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in large-volume transaction processing environments. DBMSs are found at the heart of most database applications. DBMSs may be built around a custom multitasking kernel with built-in networking support, but modern DBMSs typically rely on a standard operating system to provide these functions. Since DBMSs comprise a significant market, computer and storage vendors often take into account DBMS requirements in their own development plans. Databases and DBMSs can be categorized according to the database model(s) that they support (such as relational or XML), the type(s) of computer they run on (from a server cluster to a mobile phone), the query language(s) used to access the database (such as SQL or XQuery), and their internal engineering, which affects performance, scalability, resilience, and security. == History == The sizes, capabilities, and performance of databases and their respective DBMSs have grown in orders of magnitude. These performance increases were enabled by the technology progress in the areas of processors, computer memory, computer storage, and computer networks. The concept of a database was made possible by the emergence of direct access storage media such as magnetic disks, which became widely available in the mid-1960s; earlier systems relied on sequential storage of data on magnetic tape. The subsequent development of database technology can be divided into three eras based on data model or structure: navigational, SQL/relational, and post-relational. The two main early navigational data models were the hierarchical model and the CODASYL model (network model). These were characterized by the use of pointers (often physical disk addresses) to follow relationships from one record to another. The relational model, first proposed in 1970 by Edgar F. Codd, departed from this tradition by insisting that applications should search for data by content, rather than by following links. The relational model employs sets of ledger-style tables, each used for a different type of entity. Only in the mid-1980s did computing hardware become powerful enough to allow the wide deployment of relational systems (DBMSs plus applications). By the early 1990s, however, relational systems dominated in all large-scale data processing applications, and as of 2018 they remain dominant: IBM Db2, Oracle, MySQL, and Microsoft SQL Server are the most searched DBMS. The dominant database language, standardized SQL for the relational model, has influenced database languages for other data models. Object databases were developed in the 1980s to overcome the inconvenience of object–relational impedance mismatch, which led to the coining of the term "post-relational" and also the development of hybrid object–relational databases. The next generation of post-relational databases in the late 2000s became known as NoSQL databases, introducing fast key–value stores and document-oriented databases. A competing "next generation" known as NewSQL databases attempted new implementations that retained the relational/SQL model while aiming to match the high performance of NoSQL compared to commercially available relational DBMSs. === 1960s, navigational DBMS === The introduction of the term database coincided with the availability of direct-access storage (disks and drums) from the mid-1960s onwards. The term represented a contrast with the tape-based systems of the past, allowing shared interactive use rather than daily batch processing. The Oxford English Dictionary cites a 1962 report by the System Development Corporation of California as the first to use the term "data-base" in a specific technical sense. As computers grew in speed and capability, a number of general-purpose database systems emerged; by the mid-1960s a number of such systems had come into commercial use. Interest in a standard began to grow, and Charles Bachman, author of one such product, the Integrated Data Store (IDS), founded the Database Task Group within CODASYL, the group responsible for the creation and standardization of COBOL. In 1971, the Database Task Group delivered their standard, which generally became known as the CODASYL approach, and soon a number of commercial products based on this approach entered the market. The CODASYL approach of

    Read more →
  • Hierarchical RBF

    Hierarchical RBF

    In computer graphics, hierarchical RBF is an interpolation method based on radial basis functions (RBFs). Hierarchical RBF interpolation has applications in treatment of results from a 3D scanner, terrain reconstruction, and the construction of shape models in 3D computer graphics (such as the Stanford bunny, a popular 3D model). This problem is informally named as "large scattered data point set interpolation." == Method == The steps of the interpolation method (in three dimensions) are as follows: Let the scattered points be presented as set P = { c i = ( x i , y i , z i ) | i = 1 N ⊂ R 3 } {\displaystyle \mathbf {P} =\{\mathbf {c} _{i}=(\mathbf {x} _{i},\mathbf {y} _{i},\mathbf {z} _{i})\vert _{i=1}^{N}\subset \mathbb {R} ^{3}\}} Let there exist a set of values of some function in scattered points H = { h i | i = 1 N ⊂ R } {\displaystyle \mathbf {H} =\{\mathbf {h} _{i}\vert _{i=1}^{N}\subset \mathbb {R} \}} Find a function f ( x ) {\displaystyle \mathbf {f} (\mathbf {x} )} that will meet the condition f ( x ) = 1 {\displaystyle \mathbf {f} (\mathbf {x} )=1} for points lying on the shape and f ( x ) ≠ 1 {\displaystyle \mathbf {f} (\mathbf {x} )\neq 1} for points not lying on the shape As J. C. Carr et al. showed, this function takes the form f ( x ) = ∑ i = 1 N λ i φ ( x , c i ) {\displaystyle \mathbf {f} (\mathbf {x} )=\sum _{i=1}^{N}\lambda _{i}\varphi (\mathbf {x} ,\mathbf {c} _{i})} where φ {\displaystyle \varphi } is a radial basis function and λ {\displaystyle \lambda } are the coefficients that are the solution of the following linear system of equations: [ φ ( c 1 , c 1 ) φ ( c 1 , c 2 ) . . . φ ( c 1 , c N ) φ ( c 2 , c 1 ) φ ( c 2 , c 2 ) . . . φ ( c 2 , c N ) . . . . . . . . . . . . φ ( c N , c 1 ) φ ( c N , c 2 ) . . . φ ( c N , c N ) ] ∗ [ λ 1 λ 2 . . . λ N ] = [ h 1 h 2 . . . h N ] {\displaystyle {\begin{bmatrix}\varphi (c_{1},c_{1})&\varphi (c_{1},c_{2})&...&\varphi (c_{1},c_{N})\\\varphi (c_{2},c_{1})&\varphi (c_{2},c_{2})&...&\varphi (c_{2},c_{N})\\...&...&...&...\\\varphi (c_{N},c_{1})&\varphi (c_{N},c_{2})&...&\varphi (c_{N},c_{N})\end{bmatrix}}{\begin{bmatrix}\lambda _{1}\\\lambda _{2}\\...\\\lambda _{N}\end{bmatrix}}={\begin{bmatrix}h_{1}\\h_{2}\\...\\h_{N}\end{bmatrix}}} For determination of surface, it is necessary to estimate the value of function f ( x ) {\displaystyle \mathbf {f} (\mathbf {x} )} in specific points x. A lack of such method is a considerable complication on the order of O ( n 2 ) {\displaystyle \mathbf {O} (\mathbf {n} ^{2})} to calculate RBF, solve system, and determine surface. == Other methods == Reduce interpolation centers ( O ( n 2 ) {\displaystyle \mathbf {O} (\mathbf {n} ^{2})} to calculate RBF and solve system, O ( m n ) {\displaystyle \mathbf {O} (\mathbf {m} \mathbf {n} )} to determine surface) Compactly support RBF ( O ( n log ⁡ n ) {\displaystyle \mathbf {O} (\mathbf {n} \log {\mathbf {n} })} to calculate RBF, O ( n 1.2..1.5 ) {\displaystyle \mathbf {O} (\mathbf {n} ^{1.2..1.5})} to solve system, O ( m log ⁡ n ) {\displaystyle \mathbf {O} (\mathbf {m} \log {\mathbf {n} })} to determine surface) FMM ( O ( n 2 ) {\displaystyle \mathbf {O} (\mathbf {n} ^{2})} to calculate RBF, O ( n log ⁡ n ) {\displaystyle \mathbf {O} (\mathbf {n} \log {\mathbf {n} })} to solve system, O ( m + n log ⁡ n ) {\displaystyle \mathbf {O} (\mathbf {m} +\mathbf {n} \log {\mathbf {n} })} to determine surface) == Hierarchical algorithm == A hierarchical algorithm allows for an acceleration of calculations due to decomposition of intricate problems on the great number of simple (see picture). In this case, hierarchical division of space contains points on elementary parts, and the system of small dimension solves for each. The calculation of surface in this case is taken to the hierarchical (on the basis of tree-structure) calculation of interpolant. A method for a 2D case is offered by Pouderoux J. et al. For a 3D case, a method is used in the tasks of 3D graphics by W. Qiang et al. and modified by Babkov V.

    Read more →
  • Instance-based learning

    Instance-based learning

    In machine learning, instance-based learning (sometimes called memory-based learning) is a family of learning algorithms that, instead of performing explicit generalization, compare new problem instances with instances seen in training, which have been stored in memory. Because computation is postponed until a new instance is observed, these algorithms are sometimes referred to as "lazy." It is called instance-based because it constructs hypotheses directly from the training instances themselves. This means that the hypothesis complexity can grow with the data: in the worst case, a hypothesis is a list of n training items and the computational complexity of classifying a single new instance is O(n). One advantage that instance-based learning has over other methods of machine learning is its ability to adapt its model to previously unseen data. Instance-based learners may simply store a new instance or throw an old instance away. Examples of instance-based learning algorithms are the k-nearest neighbors algorithm, kernel machines and RBF networks. These store (a subset of) their training set; when predicting a value/class for a new instance, they compute distances or similarities between this instance and the training instances to make a decision. To battle the memory complexity of storing all training instances, as well as the risk of overfitting to noise in the training set, instance reduction algorithms have been proposed.

    Read more →
  • Jeremy Renner Official

    Jeremy Renner Official

    Jeremy Renner Official (or Jeremy Renner on the Google Play Store) was a mobile app created by American actor Jeremy Renner. He created the app in March 2017 to hear the input and comments of his fans. The app was shut down in September 2019 in part due to the frequent bullying and trolling that the platform had experienced. The app featured optional microtransactions, with some ranging up to roughly US$400 despite the app itself being free. Upon shutting down the app, Renner issued a mass-refund for the collectible "stars" in the app for purchases made within the last ninety days, from the day the announcement was posted. He then posted an apology to the app itself, and the app was deleted from both the Google Play Store and the App Store shortly after. == Usage == Upon downloading the app, the user was faced with a video of Renner speaking about his fans and superfans, regular giveaways, and real-life updates. While the app was active, Renner posted regular questions and comments for fans. Renner occasionally livestreamed about his work and day-to-day life. The community developed to include memes, selfies, and a "Happy Rennsday" event on Wednesdays. == History == === 2017–2019 === The app launched in March 2017 with a promotional contest. Renner's fans were encouraged to download the app and create comments about being Renner's biggest fan; Renner would then choose a winner and transport the winner and a guest to have lunch with him at the Calgary Expo. In the first few months Renner teased behind-the-scenes of projects he was working on, which he now sporadically does on Instagram. The app was similarly designed to Instagram as well, with a near identically styled layout. Around midway through 2019, a hoax account of Renner was made to mock the celebrity, joking about masturbating to porn and defending another hoax account of Casey Anthony. FastCompany wrote extensively about Renner's app in April 2019, calling it "a surprising new kind of social media". The Ringer stated "Jeremy Renner's Jeremy Renner app is the Jeremy Renner of apps." === After deletion (2019–2020) === After the shutdown of the app, a comedy-based pseudo-app with modular endings was released, called "The Jeremy Renner App Experience", in which the player plays as Jeremy Renner on the day of the Jeremy Renner Official app's shutdown. The app details several different choices on how Renner handles the situation. A six-part podcast was also created to mock the app's deletion, called The Renner Files, featuring Carolyn Goldfarb and Sarah Ramos. == Controversies == === Marketing === One of the main controversies of Renner's app was its marketing. The app's developers, Escapex, specialized in and grew famous for making similar monetized apps for celebrities. The marketing campaign was based on direct contact with Renner, whose chances were increased with regular payments for "stars", although very few encounters seemed to happen with Renner himself. The multiple problems with the app led the CEO of Escapex, Sephi Shapira, to call the app a "freak situation", and added "Am I concerned about this? Not more than I'm concerned about 50 other things I'm dealing with as a startup company." Along with the marketing failures, the app was seen as misrepresenting itself as seemingly erotic with some advertisements featuring Renner suggestively staring at the camera, despite the actual app being initially considered safe for children. === Harassment === After its release in 2017, the app was met with waves of harassment and bullying by many users on the app, most frequently by using impersonation — referenced in Renner's apology/deletion notice. Some death threats were made across the app by fraud accounts pretending to be several controversial celebrities, including O. J. Simpson and Casey Anthony. As early as October 2017, there were claims of censorship, bullying, and "contest-rigging". In September 2019, comedian Stefan Heck publicized his discovery of the fact that replies through the app appeared as if they were sent by Renner himself in push notifications. After several users abused this feature, Renner asked Escapex to shut down the app.

    Read more →
  • Qapital

    Qapital

    Qapital is a personal finance mobile application (app) for the iOS and Android operating systems, developed by Qapital, LLC. The app is designed to motivate users to save money through a gamification of their spending behavior. It moves money from a user's checking account to a separate Qapital account, when certain rules are triggered. Its database is used by psychology professor Dan Ariely to study consumer behavior. Qapital was released in Sweden in 2013, then in the US in early 2015. The application was later withdrawn from the Swedish market in April 2015, in order to focus on the US market. == History == The idea for Qapital was conceived by ex-bankers in Sweden. The software was designed by twin brothers Daniel and Andreas Källbom of Studio Källbom and released in Sweden in December 2013. The original software was a personal finance dashboard, similar to Mint.com, to show its users how they spent their money. Qapital introduced the app into the US market with a different design in 2014 and started focusing exclusively on the US market. The app was re-designed to focus on building savings rather than managing personal finances. The Swedish version shut down in April 2015. The app was initially restricted to the iOS platform, but an Android version was released at the end of 2015. Shortly after its US launch, Qapital invited psychology professor Dan Ariely to join its team as its "chief behavioral economist". He uses the app's database to conduct research into behavioral economics and Qapital in turn uses Ariely's research in design and programming decisions. In 2017, Qapital added checking and debit card services to the app. == Concept and features == Qapital is a free personal finance app for iOS and Android devices, intended to encourage its users to save money. Qapital directs each of its users to set savings goals, then automatically transfers money from their checking account to an account for savings, when a rule established in the app is met. It uses the "if this then that" (IFTTT) rule-based web-service. For example, one rule could be that if a user purchases a cup of coffee, then the app will round up the charge to the nearest dollar and deposit the difference into savings. Users connect their bank accounts to Qapital, so it knows when purchases are made. When a rule is met, money for savings are transferred to a Qapital account operated in partnership with Lincoln Savings Bank. As of 2015, Qapital can connect to more than 180 other apps, such as Facebook, Twitter, Dropbox and Instagram. For example, connecting to Jawbone allows the user to set a rule that if they take a certain number of steps during the day, a set amount of money is transferred to savings. The app also allows users to monitor activity among their other financial accounts, such as deposits and withdrawals. == Reception == In an October 2015 review, PC Magazine gave Qapital four out of five marks and an editor rating of "excellent." The review praised the app for having a "lovely design" and criticized it for being a, "bit simplistic in some of its rules." Bankrate, in a May 2015 review, gave the app a score of 3/5 for "ease of use," 5/5 for "features," 4/5 for "effectiveness," 4/5 for "value," for a total score of 16/20. The reviewer criticized Qapital's savings account for providing a low-interest rate, but concluded that its numerous features make the app "intriguing" and "it would be difficult to find a standard bank app more fun to use than Qapital."

    Read more →
  • Digital Michelangelo Project

    Digital Michelangelo Project

    The Digital Michelangelo Project was a pioneering initiative undertaken during the 1998–1999 academic year to digitize the sculptures and architecture of Michelangelo using advanced laser scanning technology. The project was led by a team of 30 faculty, staff, and students from Stanford University and the University of Washington, with the aim of creating high-resolution 3D models of Michelangelo's works for scholarly, educational, and preservation purposes. == Objectives == The primary goals of the Digital Michelangelo Project were: To apply recent advancements in laser rangefinder technology for digitizing large cultural artifacts. To create detailed digital archives of Michelangelo's sculptures and architectural spaces for future study and analysis. To explore potential educational and curatorial applications for 3D scanned data. === Artworks digitized === The project involved scanning several iconic works by Michelangelo, including: David The Unfinished Slaves (Atlas, Awakening, Bearded, and Youthful) St. Matthew The allegorical statues from the Medici tombs (Night, Day, Dawn, and Dusk) The architectural interiors of the Tribuna del David at the Galleria dell'Accademia and the New Sacristy in the Medici Chapels. == Technology and methodology == === 3D scanning === The project's primary scanner was a laser triangulation rangefinder mounted on a motorized gantry, custom-built by Cyberware Inc. The scanner used a laser sheet to project onto an object, capturing its shape through triangulation. Multiple scans were taken from various angles and combined into a single, detailed 3D mesh. The resolution achieved was fine enough to capture even Michelangelo's chisel marks, with triangles approximately 0.25 mm on each side. In addition to shape data, color data was captured using a spotlight and a secondary camera, enabling the creation of textured 3D models. === Data processing === The project developed a software suite for processing the scanned data. This included: Aligning and merging multiple scans into a seamless 3D model. Filling holes in the geometry caused by inaccessible areas. Correcting color data for lighting inconsistencies and shadowing. Non-photorealistic rendering techniques were also applied, highlighting surface features such as Michelangelo’s chisel marks for enhanced visualization. == Logistical challenges == The scale and complexity of the project presented several challenges: Data size: The dataset for David alone comprised 2 billion polygons and 7,000 color images, occupying 60 GB of storage. Artifact safety: Ensuring the safety of the statues during scanning required extensive crew training, foam-encased equipment, and collision-prevention mechanisms. == Applications and impact == The digitized models have numerous potential applications: Art history: Allowing precise measurements and geometric analysis, such as determining chisel types or evaluating structural balance. Education: Providing new ways to study art, including interactive viewing from unconventional angles and with custom lighting. Museum curation: Enhancing visitor experiences through interactive kiosks and virtual models. The project demonstrated the potential for 3D technology to preserve and disseminate cultural heritage. == Data distribution == The project's models are available through Stanford University for scholarly purposes, under strict licensing due to Italian intellectual property laws. === ScanView === To provide public access to the 3D models while respecting usage restrictions, the project developed ScanView, a client/server rendering system. ScanView allows users to view and interact with high-resolution 3D models without downloading the data. The client component consists of a freely available viewer program and simplified 3D models. Users can navigate these models locally, adjusting position, orientation, lighting, and surface appearance. When a user finalizes a view, the client queries a remote server for a high-resolution rendering of the model, which is sent back to overwrite the simplified version on the user’s screen. A typical query-response cycle takes 1–2 seconds, depending on network conditions. To protect the models from unauthorized reconstruction, the system employs several security measures, including: Encrypting queries Perturbing viewpoint and lighting parameters Adding noise and warping rendered images Compressing images before transmission ScanView operates on Windows-based PCs and provides access to selected models, including David and St. Matthew, as well as other artifacts such as fragments of the Forma Urbis Romae and items from the Stanford 3D Scanning Repository. == Sponsors == The Digital Michelangelo Project was supported by Stanford University, Interval Research Corporation, and the Paul G. Allen Foundation for the Arts.

    Read more →
  • Cuboid (computer vision)

    Cuboid (computer vision)

    In computer vision, the term cuboid is used to describe a small spatiotemporal volume extracted for purposes of behavior recognition. The cuboid is regarded as a basic geometric primitive type and is used to depict three-dimensional objects within a three dimensional representation of a flat, two dimensional image. == Production == Cuboids can be produced from both two-dimensional and three-dimensional images. One method used to produce cuboids utilizes scene understanding (SUN) primitive databases, which are collections of pictures that already contain cuboids. By sorting through SUN primitive databases with machine learning tools, computers observe the conditions in which cuboids are produced in images from SUN primitive databases and can learn to produce cuboids from other images. RGB-D images, which are RGB images that also record the depth of each pixel, are occasionally used to produce cuboids because computers no longer need to determine the depth of an object, as they typically do because depth is already recorded. Cuboid production is sensitive to changes in color and illumination, blockage, and background clutter. This means that it is difficult for computers to produce cuboids of objects that are multicolored, irregularly illuminated, or partially covered, or if there are many objects in the background. This is partially due to the fact that algorithms for producing cuboids are still relatively simple. == Usage == Cuboids are created for point cloud-based three-dimensional maps and can be utilized in various situations such as augmented reality, the automated control of cars, drones, and robots, and object detection. Cuboids allow for software to identify a scene through geometric descriptions in an “object-agnostic” fashion. Interest points, locations within images that are identified by a computer as essential to identifying the image, created from two-dimensional images can be used with cuboids for image matching, identifying a room or scene, and instance recognition. Interest points created from three dimensional images can be used with cuboids to recognize activities. This is possible because interest points aid software to focus on only the most important aspects of the images. RGB-D images and SLAM systems are used together in RGB-D SLAM systems, which are employed by Computer-aided design systems to generate point cloud-based three-dimensional maps. Most industrial multi-axis machining tools use computer-aided manufacturing and subsequently work in cuboid work spaces.

    Read more →
  • Language-Theoretic Security

    Language-Theoretic Security

    Language-theoretic security, or LangSec, is an approach to software security that focuses on input handling, complexity, and program design as strategies to improve the verifiability of computer programs. It was introduced in 2005 by Robert J. Hansen and Meredith L. Patterson at BlackHat and in 2011 by Len Sassaman and Patterson. It aims to create a formal description of which software is likely to have security vulnerabilities of particular classes, and why. It considers programs to have an inherent parser component, whether or not explicit, composed of that part of the program which operates on external input before that input is fully parsed. A central hypothesis of language-theoretic security is that vulnerabilities in software increase according to the computational power of the notional input-accepting automaton equivalent to this parser, using the definitions of automata theory. The lower bound on this computational power is the input language complexity of the program. The extent to which reducing this complexity is possible is a function of the specification of the communication protocol or file format the program takes as input. == Parsing as a security mechanism == The behaviour of a program is defined with reference to its expected input. Unexpected input being used by a program is a factor in numerous security bugs, including the so-called Android master key vulnerability (CVE-2013-4787), because accepting unexpected input renders the program's specification ambiguous. In that instance, the unexpected ambiguity came in the form of a ZIP file with duplicate filenames. If a program fully parses its input and only acts on input that unambiguously meets the specification, it follows that the program will avoid these types of vulnerabilities. This is an intentional inversion of the Postel principle. Accepting only unambiguous and valid input is a more formal requirement than input validation or sanitization, and narrows the number of possible but unanticipated program states that can be induced in an application via user input. Conversely, failure to do this is associated with security vulnerabilities. Input sanitization in particular is held to be an inadequate approach to avoiding malicious input because it inherently ignores context-sensitive properties of the input; it can therefore result in paradoxical effects, such as sanitization code activating otherwise inert cross-site scripting payloads in browsers. === Parser differentials === If the language of accepted program input is sufficiently simple, it is possible to verify that two implementations parse the same input language consistently. This is advantageous because it shows no parser differential exists between the two implementations. The requisite level of simplicity is theoretically that for which there is a solution to the equivalence problem. If the two parsers involved in CVE-2013-4787 were equivalent - that is, if they rendered the same output state given the same input state - the vulnerability could not have existed. One strategy for doing this is to publish machine-readable specifications of a format or protocol, and then use a parser generator to generate the parser code. An example of a parser generator built for this purpose is DaeDaLus. The combination of Lex with any of GNU Bison, ANTLR, or Yacc also accomplishes this. However, many parser generators allow the mixing of general purpose code with the parsing definitions, which weakens the guarantees provided by parsing. === Analysis of injection attacks === Injection attacks are generally the result of differences between the serializer (or "unparser") and the corresponding parser at a layer boundary in a system; therefore, they are a special case of parser differentials. In a SQL injection attack, for example, an attacker is able to cause the application with which they are interacting to serialize a SQL query that has different semantics than intended. In the simplest case where the payload ends a string and adds new code, the payload has crossed the code-data boundary in SQL. In language-theoretic security, this is treated as a bug in the serializer of the SQL query, which should instead be written in a way that constrains its possible outputs to those within the scope of the intended query. === Parser combinators === If a parser generator is not used, it is still possible to avoid implementation bugs by using parser combinator such as Nom to implement the parser code. This has the drawback of relying on a programmer correctly translating the specification into the language of the parser generator library, though this task is still less error-prone than hand-coding a parser. == Input format complexity == Complexity in computer programs is associated with security vulnerabilities. Within the domain of language-theoretic security, complexity is described with reference to the computational power of the abstract machine necessary to implement the program, or more particularly, to implement the parser for its input language. This complexity describes whether it is possible to show that there is no unintended or undesired functionality in the program which might be exploitable by an attacker. To be bounded in complexity, the program's input must be well-defined both in terms of form and of semantics. === Weird machines === A weird machine is a model of computation in a program that exists in parallel with, but is distinct from, the intended abstract model of computation in that program. Some classes of weird machine arise from the multi-layered nature of computer programs, or the context in which the programs run; others result from the unanticipated functionality a program has due to its complexity or to software bugs. The more complex the computation model of a program, the more likely it is to implement a weird machine. Depending on context, the weird machine may or may not be concretely useful for an attacker. Since the space of weird machines in the context of some program is the universe of all possible states that are not within the program's intended states, many exploited states including remote code execution and injection attacks belong to the domain of weird machines. A reduction in weird machines is therefore a likely correlate with reduced program vulnerability. === SafeDocs project === SafeDocs is a DARPA project undertaken in 2018 to take existing file formats, create safer subsets of them, and develop programming tools to work for the safer formats. The initial test case for this was PDF. The purpose of creating safer subsets in this case is to lower the minimum bound on parser complexity so that it becomes possible to create tools that will generate correct, normative parsers for them. == Relation to programming languages == The analytic framework of language-theoretic security assumes programs to be virtual machines that execute their input. A document that is read by an application is in this sense a form of machine code, in a generalization of the data as code idea, following the automata theory description of parsers. === Type-safe programming languages === Parsing input and serializing output are operations that consume one data type and emit another. A programming language can therefore check that data is correctly parsed and contains the expected structure by checking data types, and correct serializing (or unparsing) can be implemented as operations on the data types that are relevant to the program's output. This approach can be used to show that the recognizer and unparser patterns have been implemented. It is also possible to implement type checking across a distributed system to enforce parsing and unparsing of the expected structures and to verify that the assumptions made in designing the compositional properties of a distributed system have been followed. === Memory-safe programming languages === In the general case, spatial memory correctness is undecidable. If any proof of spatial memory correctness is to be made, it is therefore necessary to bound the complexity of the code. Interpreted languages such as Java and Python effectively accomplish this via runtime bounds checking, and frameworks for runtime bounds checking also exist for C. The effect of these strategies for spatial memory correctness are to create a halt state in place of a spatial memory correctness violation; therefore, it can be shown that the program will not violate spatial memory correctness, but in exchange, it cannot be shown in the general case that programs will not have runtime bounds checking exceptions. Some programming languages, such as Rust, accomplish this using borrow checking. The borrow checker acts to assure spatial memory correctness by compile-time reference counting. Code for which spatial memory correctness cannot be shown to not be violated therefore does not compile, inherently limiting the complexity of the spatial memory correctness of the program to what is decidable. Thi

    Read more →
  • Color gradient

    Color gradient

    In color science, a color gradient (also known as a color ramp or a color progression) specifies a range of position-dependent colors, usually used to fill a region. In assigning colors to a set of values, a gradient is a continuous colormap, a type of color scheme. In computer graphics, the term swatch has come to mean a palette of active colors. == Definitions == Color gradient is a set of colors arranged in a linear order (ordered) A continuous colormap is a curve through a colorspace === Strict definition === A colormap is a function which associate a real value r with point c in color space C {\displaystyle C} f : [ r m i n , r m a x ] ⊂ R → C {\displaystyle f:[r_{min},r_{max}]\subset \mathbf {R} \to C} which is defined by: a colorspace C an increasing sequence of sampling points r 0 < . . . < r m ∈ [ r m i n , r m a x ] {\displaystyle r_{0}<... Read more →

  • TigerGraph

    TigerGraph

    TigerGraph is a private company headquartered in Redwood City, California. It provides graph database and graph analytics software. == History == TigerGraph was founded in 2012 by programmer Yu, Ruoming, Li, Like and Mingxi, under the name GraphSQL. In September 2017, the company came out of stealth mode under the name TigerGraph with $33 million in funding. It raised an additional $32 million in funding in September 2019 and another $105 million in a series C round in February 2021. Cumulative funding as of March 2021 is $170 million. == Products == TigerGraph's hybrid transactional/analytical processing database and analytics software can scale to hundreds of terabytes of data with trillions of edges, and is used for data intensive applications such as fraud detection, customer data analysis (customer 360), IoT, artificial intelligence and machine learning. It is available using the cloud computing delivery model. The analytics uses C++ based software and a parallel processing engine to process algorithms and queries. It has its own graph query language that is similar to SQL. TigerGraph also provides a software development kit for creating graphs and visual representations. As of Mar 2024, TigerGraph version is up to version 4.2.0 TigerGraph offers free Community Edition for developers, researchers, and educators. It can be obtained from https://dl.tigergraph.com/ == Query Language == GSQL , designed by Mingxi Wu and Alin Deutsch in 2015, is a SQL-like Turing complete query language. GSQL includes additions to make it compliant with the Graph Query Language standard.

    Read more →
  • Subvocal recognition

    Subvocal recognition

    Subvocal recognition (SVR) is the process of taking subvocalization and converting the detected results to a digital output, aural or text-based. A silent speech interface is a device that allows speech communication without using the sound made when people vocalize their speech sounds. It works by the computer identifying the phonemes that an individual pronounces from nonauditory sources of information about their speech movements. These are then used to recreate the speech using speech synthesis. == Input methods == Silent speech interface systems have been created using ultrasound and optical camera input of tongue and lip movements. Electromagnetic devices are another technique for tracking tongue and lip movements. The detection of speech movements by electromyography of speech articulator muscles and the larynx is another technique. Another source of information is the vocal tract resonance signals that get transmitted through bone conduction called non-audible murmurs. They have also been created as a brain–computer interface using brain activity in the motor cortex obtained from intracortical microelectrodes. == Uses == Such devices are created as aids to those unable to create the sound phonation needed for audible speech such as after laryngectomies. Another use is for communication when speech is masked by background noise or distorted by self-contained breathing apparatus. A further practical use is where a need exists for silent communication, such as when privacy is required in a public place, or hands-free data silent transmission is needed during a military or security operation. In 2002, the Japanese company NTT DoCoMo announced it had created a silent mobile phone using electromyography and imaging of lip movement. The company stated that "the spur to developing such a phone was ridding public places of noise," adding that, "the technology is also expected to help people who have permanently lost their voice." The feasibility of using silent speech interfaces for practical communication has since then been shown. In 2019, Arnav Kapur, a researcher from the Massachusetts Institute of Technology, conducted a study known as AlterEgo. Its implementation of the silent speech interface enables direct communication between the human brain and external devices through stimulation of the speech muscles. By leveraging neural signals associated with speech and language, the AlterEgo system deciphers the user's intended words and translates them into text or commands without the need for audible speech. == Research and patents == With a grant from the U.S. Army, research into synthetic telepathy using subvocalization is taking place at the University of California, Irvine under lead scientist Mike D'Zmura. NASA's Ames Research Laboratory in Mountain View, California, under the supervision of Charles Jorgensen is conducting subvocalization research. The Brain Computer Interface R&D program at Wadsworth Center under the New York State Department of Health has confirmed the existing ability to decipher consonants and vowels from imagined speech, which allows for brain-based communication using imagined speech, however using EEGs instead of subvocalization techniques. US Patents on silent communication technologies include: US Patent 6587729 "Apparatus for audibly communicating speech using the radio frequency hearing effect", US Patent 5159703 "Silent subliminal presentation system", US Patent 6011991 "Communication system and method including brain wave analysis and/or use of brain activity", US Patent 3951134 "Apparatus and method for remotely monitoring and altering brain waves". Latter two rely on brain wave analysis. == In fiction == The decoding of silent speech using a computer played an important role in Arthur C. Clarke's story and Stanley Kubrick's associated film A Space Odyssey. In this, HAL 9000, a computer controlling spaceship Discovery One, bound for Jupiter, discovers a plot to deactivate it by the mission astronauts Dave Bowman and Frank Poole through lip reading their conversations. In Orson Scott Card's series (including Ender's Game), the artificial intelligence can be spoken to while the protagonist wears a movement sensor in his jaw, enabling him to converse with the AI without making noise. He also wears an ear implant. In Speaker for the Dead and subsequent novels, author Orson Scott Card described an ear implant, called a "jewel", that allows subvocal communication with computer systems. Author Robert J. Sawyer made use of subvocal recognition to allow silent commands to the cybernetic 'companion implants' used by the advanced Neanderthal characters in his Neanderthal Parallax trilogy of science fiction novels. In Earth, David Brin depicts this technology and its uses as a normal gear in the near future. In Down and Out in the Magic Kingdom, Cory Doctorow has cellphone technology become silent through a cochlear implant and miking the throat to pick up subvocalization. William Gibson's Sprawl Trilogy frequently uses sub-vocalization systems in various devices. In Kage Baker's Company novels, the immortal cyborgs communicate subvocally. In the Hugo Award-winning Hyperion Cantos by Dan Simmons, the characters often use subvocalization to communicate. In the Culture novels by Iain M. Banks, more highly advanced species often communicate subvocally through their technology. In Deus Ex: Human Revolution (2011), the protagonist is augmented with a subvocalization implant for sending covert communications (and a corresponding cochlear implant for receiving covert communications). In the tabletop RPG and video game series Shadowrun, player characters can communicate via subvocal microphones in some instances. In Paranoia, all citizens can speak to the computer via their "cerebral cortech" implants. Alistair Reynolds Revelation Space trilogy frequently uses sub-vocalization systems in various devices.

    Read more →
  • Cyber and Information Domain Service

    Cyber and Information Domain Service

    The Cyber and Information Domain Service (CIDS; German: Cyber- und Informationsraum, lit. 'Cyber and Information space', pronounced [ˈsaɪbɐ ʔʊnt ʔɪnfɔʁmaˈtsi̯oːnsʁaʊm] ; CIR) is the youngest branch of the German Armed Forces, the Bundeswehr. The decision to form an organizational unit was presented by Defense Minister Ursula von der Leyen on 26 April 2016, becoming operational on 1 April 2017. It is headquartered in Bonn. == History == In November 2015, the German Ministry of Defense activated a Staff Group within the ministry tasked with developing plans for a reorganization of the Cyber, IT, military intelligence, geo-information, and operative communication units of the Bundeswehr. On 26 April 2016, Defense Minister Ursula von der Leyen presented the plans for the new military branch to the public and on 5 October 2016 the command's staff became operational as a department within the ministry of defense. On 1 April 2017, the Cyber and Information Domain Service (CIDS) was activated as a "military organizational unit" (Organisationsbereich), indicating its status below a full service branch. The CIDS Headquarters took command of all existing electronic warfare, signals, IT, military intelligence, geoinformation, and psychological operations units. As part of a wider restructuring of higher command in the Bundeswehr in 2024, it was decided to upgrade it from a military organizational unit to the fourth full military service branch, alongside Heer (army), Luftwaffe (air force) and Deutsche Marine (navy). == Organisation == The CIDS is commanded by the Chief of the Cyber and Information Domain Service (Inspekteur des Cyber- und Informationsraum InspCIR), a three-star general position, based in Bonn. As of April 2023, it is structured as follows: Cyber and Information Domain Service Command (Kommando Cyber- und Informationsraum KdoCIR), in Bonn Reconnaissance and Effects Command (Kommando Aufklärung und Wirkung KdoAufkl/Wirk), in Gelsdorf 911th Electronic Warfare Battalion 912th Electronic Warfare Battalion, mans the Oste-class SIGINT/ELINT and reconnaissance ships 931st Electronic Warfare Battalion 932nd Electronic Warfare Battalion, provides airborne troops for operations in enemy territory Cyber-Operations Centre (Zentrum Cyber-Operationen ZSO) Central Imaging Reconnaissance (Zentrale Abbildende Aufklärung ZAbbAufkl), operating the SAR-Lupe satellites Central Bundeswehr Investigation Authority for Technical Reconnaissance (Zentrale Untersuchungsstelle der Bundeswehr für Technische Aufklärung ZU-StelleBwTAufkl) Signals Reconnaissance Centre North (Fernmeldeaufklärungszentrale Nord FmAufklZentr NORD) Signals Reconnaissance Centre South (Fernmeldeaufklärungszentrale Süd FmAufklZentr SÜD) Information Technology Services Command (Kommando Informationstechnik-Services der Bundeswehr KdoIT-SBw), in Bonn 281st Information Technology Battalion 282nd Information Technology Battalion 292nd Information Technology Battalion 293rd Information Technology Battalion 381st Information Technology Battalion 383rd Information Technology Battalion Bundeswehr Geoinformation Centre (Zentrum für Geoinformationswesen der Bundeswehr), in Euskirchen Bundeswehr Cyber-Security Centre (Zentrum für Cyber-Sicherheit der Bundeswehr ZCSBw) Bundeswehr Software Digitalisation Centre (Zentrum Digitalisierung der Bundeswehr und Fähigkeitsentwicklung Cyber- und Informationsraum ZDigBw) Bundeswehr Operational Communications Centre (Zentrum Operative Kommunikation der Bundeswehr ZOpKomBw) Training Centre CIDS (Ausbildungszentrum CIR AusbZ CIR)

    Read more →
  • ImageNet

    ImageNet

    The ImageNet project is a large visual database designed for use in visual object recognition software research. More than 14 million images have been hand-annotated by the project to indicate what objects are pictured and in at least one million of the images, bounding boxes are also provided. ImageNet contains more than 20,000 categories, with a typical category, such as "balloon" or "strawberry", consisting of several hundred images. The database of annotations of third-party image URLs is freely available directly from ImageNet, though the actual images are not owned by ImageNet. Since 2010, the ImageNet project runs an annual software contest, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where software programs compete to correctly classify and detect objects and scenes. The challenge uses a "trimmed" list of one thousand non-overlapping classes. == History == AI researcher Fei-Fei Li began working on the idea for ImageNet in 2006. At a time when most AI research focused on models and algorithms, Li wanted to expand and improve the data available to train AI algorithms. In 2007, Li met with Princeton professor Christiane Fellbaum, one of the creators of WordNet, to discuss the project. As a result of this meeting, Li went on to build ImageNet starting from the roughly 22,000 nouns of WordNet and using many of its features. She was also inspired by a 1987 estimate that the average person recognizes roughly 30,000 different kinds of objects. As an assistant professor at Princeton, Li assembled a team of researchers to work on the ImageNet project. They used Amazon Mechanical Turk to help with the classification of images. Labeling started in July 2008 and ended in April 2010. It took 49K workers from 167 countries filtering and labeling over 160M candidate images. They had enough budget to have each of the 14 million images labelled three times. The original plan called for 10,000 images per category, for 40,000 categories at 400 million images, each verified 3 times. They found that humans can classify at most 2 images/sec. At this rate, it was estimated to take 19 human-years of labor (without rest). They presented their database for the first time as a poster at the 2009 Conference on Computer Vision and Pattern Recognition (CVPR) in Florida, titled "ImageNet: A Preview of a Large-scale Hierarchical Dataset". The poster was reused at Vision Sciences Society 2009. In 2009, Alex Berg suggested adding object localization as a task. Li approached PASCAL Visual Object Classes contest in 2009 for a collaboration. It resulted in the subsequent ImageNet Large Scale Visual Recognition Challenge starting in 2010, which has 1000 classes and object localization, as compared to PASCAL VOC which had just 20 classes and 19,737 images (in 2010). === Significance for deep learning === On 30 September 2012, a convolutional neural network (CNN) called AlexNet achieved a top-5 error of 15.3% in the ImageNet 2012 Challenge, more than 10.8 percentage points lower than that of the runner-up. Using convolutional neural networks was feasible due to the use of graphics processing units (GPUs) during training, an essential ingredient of the deep learning revolution. According to The Economist, "Suddenly people started to pay attention, not just within the AI community but across the technology industry as a whole." In 2015, AlexNet was outperformed by Microsoft's very deep CNN with over 100 layers, which won the ImageNet 2015 contest, having 3.57% error on the test set. Andrej Karpathy estimated in 2014 that with concentrated effort, he could reach 5.1% error rate, and ~10 people from his lab reached ~12-13% with less effort. It was estimated that with maximal effort, a human could reach 2.4%. == Dataset == ImageNet crowdsources its annotation process. Image-level annotations indicate the presence or absence of an object class in an image, such as "there are tigers in this image" or "there are no tigers in this image". Object-level annotations provide a bounding box around the (visible part of the) indicated object. ImageNet uses a variant of the broad WordNet schema to categorize objects, augmented with 120 categories of dog breeds to showcase fine-grained classification. In 2012, ImageNet was the world's largest academic user of Mechanical Turk. The average worker identified 50 images per minute. The original plan of the full ImageNet would have roughly 50M clean, diverse and full resolution images spread over approximately 50K synsets. This was not achieved. The summary statistics given on April 30, 2010: Total number of non-empty synsets: 21841 Total number of images: 14,197,122 Number of images with bounding box annotations: 1,034,908 Number of synsets with SIFT features: 1000 Number of images with SIFT features: 1.2 million === Categories === The categories of ImageNet were filtered from the WordNet concepts. Each concept, since it can contain multiple synonyms (for example, "kitty" and "young cat"), so each concept is called a "synonym set" or "synset". There were more than 100,000 synsets in WordNet 3.0, majority of them are nouns (80,000+). The ImageNet dataset filtered these to 21,841 synsets that are countable nouns that can be visually illustrated. Each synset in WordNet 3.0 has a "WordNet ID" (wnid), which is a concatenation of part of speech and an "offset" (a unique identifying number). Every wnid starts with "n" because ImageNet only includes nouns. For example, the wnid of synset "dog, domestic dog, Canis familiaris" is "n02084071". The categories in ImageNet fall into 9 levels, from level 1 (such as "mammal") to level 9 (such as "German shepherd"). === Image format === The images were scraped from online image search (Google, Picsearch, MSN, Yahoo, Flickr, etc) using synonyms in multiple languages. For example: German shepherd, German police dog, German shepherd dog, Alsatian, ovejero alemán, pastore tedesco, 德国牧羊犬. ImageNet consists of images in RGB format with varying resolutions. For example, in ImageNet 2012, "fish" category, the resolution ranges from 4288 x 2848 to 75 x 56. In machine learning, these are typically preprocessed into a standard constant resolution, and whitened, before further processing by neural networks. For example, in PyTorch, ImageNet images are by default normalized by dividing the pixel values so that they fall between 0 and 1, then subtracting by [0.485, 0.456, 0.406], then dividing by [0.229, 0.224, 0.225]. These are the mean and standard deviations for ImageNet, so this whitens the input data. === Labels and annotations === Each image is labelled with exactly one wnid. Dense SIFT features (raw SIFT descriptors, quantized codewords, and coordinates of each descriptor/codeword) for ImageNet-1K were available for download, designed for bag of visual words. The bounding boxes of objects were available for about 3000 popular synsets with on average 150 images in each synset. Furthermore, some images have attributes. They released 25 attributes for ~400 popular synsets: Color: black, blue, brown, gray, green, orange, pink, red, violet, white, yellow Pattern: spotted, striped Shape: long, round, rectangular, square Texture: furry, smooth, rough, shiny, metallic, vegetation, wooden, wet === ImageNet-21K === The full original dataset is referred to as ImageNet-21K. ImageNet-21k contains 14,197,122 images divided into 21,841 classes. Some papers round this up and name it ImageNet-22k. The full ImageNet-21k was released in Fall of 2011, as fall11_whole.tar. There is no official train-validation-test split for ImageNet-21k. Some classes contain only 1-10 samples, while others contain thousands. === ImageNet-1K === There are various subsets of the ImageNet dataset used in various context, sometimes referred to as "versions". One of the most highly used subsets of ImageNet is the "ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012–2017 image classification and localization dataset". This is also referred to in the research literature as ImageNet-1K or ILSVRC2017, reflecting the original ILSVRC challenge that involved 1,000 classes. ImageNet-1K contains 1,281,167 training images, 50,000 validation images and 100,000 test images. Each category in ImageNet-1K is a leaf category, meaning that there are no child nodes below it, unlike ImageNet-21K. For example, in ImageNet-21K, there are some images categorized as simply "mammal", whereas in ImageNet-1K, there are only images categorized as things like "German shepherd", since there are no child-words below "German shepherd". === Later developments === In the WordNet they built ImageNet on, there were 2832 synsets in the "person" subtree. During 2018--2020 period, they removed the download of the ImageNet-21k as they went through extensive filtering in these person synsets. Out of these 2832 synsets, 1593 were deemed "potentially offensive". Out of the remaining 1239, 1081 were deemed not really "visual". The result was that only 158 syn

    Read more →